CN107004115B - Method and system for face recognition - Google Patents
Method and system for face recognition
- Publication number
- CN107004115B CN107004115B CN201480083717.5A CN201480083717A CN107004115B CN 107004115 B CN107004115 B CN 107004115B CN 201480083717 A CN201480083717 A CN 201480083717A CN 107004115 B CN107004115 B CN 107004115B
- Authority
- CN
- China
- Prior art keywords
- facial image
- feature
- feature extraction
- extraction module
- convolutional layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/192—Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
- G06V30/194—References adjustable by an adaptive method, e.g. learning
Abstract
A device and method for face recognition are disclosed. The device may include an extractor configured with multiple cascaded feature extraction modules, where each of the cascaded feature extraction modules includes a convolutional layer and a fully connected layer. The convolutional layer extracts local features from the input face image or from the features extracted by the previous feature extraction module in the cascade; the fully connected layer is connected to the convolutional layer in the same feature extraction module and extracts global features from the extracted local features. The device may also include an identifier that determines, according to the distance between the extracted global features, whether two input face images come from the same identity, or whether an input image serving as a probe face image and an image in a gallery of face images containing the input image belong to the same identity.
Description
Technical field
This application relates to systems and methods for face recognition.
Background
Recently, deep learning has achieved great success in face recognition and substantially outperforms systems based on low-level features. There have been two notable breakthroughs. The first is large-scale face recognition with deep neural networks: by classifying face images into thousands or even millions of identities, the last hidden layer forms features that are highly discriminative with respect to identity. The second is supervising deep neural networks with both identification and verification tasks. The verification task minimizes the distance between features of the same identity and thus reduces intra-personal variation. By combining features learned from many face regions, the combined identification-verification approach achieves state-of-the-art face verification accuracy of 99.15% on LFW, the most extensively evaluated face recognition dataset.
Earlier work first learned attribute classifiers and then performed face recognition using attribute predictions. In addition, sparse representation-based classification has been widely studied for face recognition with occlusions. A robust Boltzmann machine has also been proposed to distinguish corrupted pixels and learn latent representations. These methods are explicitly designed to handle occluded components.
Summary of the invention
Existing approaches first learn attribute classifiers and then perform face recognition using attribute predictions, but this application attempts the inverse process: predict identity first, then predict attributes using the learned identity-related features. It is observed that features in the higher layers of a deep neural network are highly selective to identities and identity-related attributes (such as gender and race). When presented with an identity (which may be outside the training data) or an attribute, one can identify a subset of features that are consistently excited and another subset of features that are consistently inhibited. Features from either of these two subsets effectively indicate the presence or absence of the identity or attribute, and this application shows that even a single feature recognizes a specific identity or attribute with fairly high accuracy. In other words, features in deep neural networks are sparse over identities and attributes. Although the deep neural networks in this application are not instructed to distinguish attributes during training, they implicitly learn such high-level concepts. Compared with widely used hand-crafted features (such as high-dimensional LBP, local binary patterns), features learned directly by the deep neural network achieve much higher classification accuracy on identity-related attributes.
In contrast to traditional sparse representation-based classification, this application shows that, without adding artificial occlusion patterns during training, a deep neural network trained on face images has implicitly encoded invariance to occlusions.
It is observed in this application that the sparsity of the features learned by the deep neural network is moderate. For an input face image, roughly half of the features in the top hidden layer are activated. On the other hand, each feature is activated on roughly half of the face images. Such a sparsity distribution maximizes both the discriminative power of the deep neural network and the distance between images: different identities activate different subsets of features, and two images of the same identity have similar activation patterns. This motivates this application to binarize the real-valued features in the top hidden layer of the deep neural network and to perform recognition with the binary codes. The result is surprisingly good: verification accuracy on LFW drops only slightly, by less than 1%. This has a significant impact on large-scale face search, where it saves mass storage and computation time. It also indicates that the binary activation patterns matter more than the activation magnitudes in the deep neural network.
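A minimal sketch of the binarization described above; the feature values and dimensions are illustrative assumptions, not taken from the patent. Real-valued ReLU features from the top hidden layer are thresholded at zero into binary codes, which are then compared by Hamming distance:

```python
import numpy as np

def binarize(features):
    """Threshold real-valued ReLU features at zero into a binary code."""
    return (features > 0).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Number of positions where two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))

# Illustrative 8-dimensional top-layer features from two face images.
feat1 = np.array([0.0, 1.2, 0.0, 0.7, 3.1, 0.0, 0.0, 0.5])
feat2 = np.array([0.0, 0.9, 0.4, 0.0, 2.2, 0.0, 0.0, 1.1])

d = hamming_distance(binarize(feat1), binarize(feat2))
print(d)  # codes differ at indices 2 and 3 -> 2
```

Only the on/off activation pattern survives binarization, which is exactly the information the application argues matters most.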
In one aspect of this application, a device for face recognition is disclosed. The device may include a feature extractor and an identifier. The feature extractor is configured with multiple cascaded feature extraction modules, where each feature extraction module includes: a convolutional layer for extracting local features from the input face image or from the features extracted by the previous feature extraction module in the cascade; and a fully connected layer, connected to the convolutional layer in the same feature extraction module, which extracts global features from the extracted local features. According to the distance between the extracted global features, the identifier determines whether two input face images come from the same identity, or whether an input image serving as a probe face image and an image in a gallery of face images containing the input image belong to the same identity.
In one embodiment of this application, the convolutional layer in the first feature extraction module of the cascade is connected to the input face image, and the convolutional layer in each subsequent feature extraction module is connected to the convolutional layer in the previous feature extraction module. The fully connected layer in each feature extraction module is connected to the convolutional layer in the same feature extraction module.
The device may also include a trainer configured to update the neuron weights of the connections between each convolutional layer and the corresponding fully connected layer in the same feature extraction module, by back-propagating identification supervisory signals and verification supervisory signals through the cascaded feature extraction modules.
The updating process may include: inputting two face images into the neural network respectively, obtaining a feature representation for each of the two face images; computing an identification error by classifying the feature representation of each face image in each fully connected layer of the neural network into one of multiple identities; computing a verification error by verifying whether the respective feature representations of the two face images in each fully connected layer come from the same identity, where the identification error and the verification error are regarded as the identification supervisory signal and the verification supervisory signal, respectively; and back-propagating all identification and verification supervisory signals through the neural network to update the neuron weights of the connections between each convolutional layer and the corresponding fully connected layer in the same feature extraction module. This application finds and verifies three properties of the features extracted in the later feature extraction modules, namely sparsity, selectivity, and robustness, all of which are critical for face recognition. The features are sparse in the sense that, for each face image, about half of the features are zero and half take positive values, and each feature is zero about half of the time and positive about half of the time over all face images. The features are selective to identities and identity-related attributes (such as gender and race) in the sense that, for all face images of a given identity or containing a given identity-related attribute, there exist features that consistently take positive values (are activated) or zero (are inhibited). The features are robust to image corruption (such as occlusion) in the sense that, under moderate image corruption, the feature values remain largely unchanged.
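The joint identification-verification supervision described above can be sketched as follows. This is a hedged illustration, not the patent's exact formulation: the layer sizes, margin, and the contrastive form of the verification signal are assumptions. The identification signal is a softmax cross-entropy over identities; the verification signal pulls features of the same identity together and pushes different identities apart by a margin:

```python
import numpy as np

def identification_error(features, weights, label):
    """Softmax cross-entropy of classifying one feature vector into an identity."""
    logits = features @ weights
    logits -= logits.max()                      # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])

def verification_error(f1, f2, same_identity, margin=1.0):
    """Contrastive-style verification signal on a pair of feature vectors."""
    dist = np.linalg.norm(f1 - f2)
    if same_identity:
        return 0.5 * dist ** 2                  # pull same-identity pairs together
    return 0.5 * max(0.0, margin - dist) ** 2   # push different identities apart

rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=16), rng.normal(size=16)   # illustrative FC-layer features
w = rng.normal(size=(16, 10))                       # 10 illustrative identities
total = identification_error(f1, w, label=3) + verification_error(f1, f2, same_identity=True)
print(total > 0)  # True
```

In training, both error terms would be back-propagated through the cascade to update the connection weights, as the paragraph above describes.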
In one aspect of this application, a device for face recognition is disclosed, comprising:
an extractor including a neural network, the neural network configured with multiple cascaded feature extraction modules, wherein each of the cascaded feature extraction modules includes:
a convolutional layer for extracting local features from the input face image or from the features extracted by the previous feature extraction module in the cascade; and
a fully connected layer, connected to the convolutional layer of the same feature extraction module, which extracts global features from the extracted local features; and
an identifier for determining, according to the distance between the extracted global features:
whether two of the input face images come from the same identity, or
whether one of the input face images serving as a probe face image and an image in a gallery of face images containing the input face image belong to the same identity.
In one embodiment of this application, the convolutional layer of the first feature extraction module in the cascade is configured to extract the local features from the input face image, and the convolutional layer in each subsequent feature extraction module is connected to the convolutional layer in the previous feature extraction module of the cascade.
In one embodiment of this application, the fully connected layer in each feature extraction module is connected to the convolutional layer in the same feature extraction module of the cascade.
In one embodiment of this application, the device further includes:
a trainer configured to update, by back-propagating identification supervisory signals and verification supervisory signals through the cascaded feature extraction modules, the neural weights of the following connections:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input face image;
the connections between each convolutional layer in the second through last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connections between each convolutional layer and the corresponding fully connected layer in the same feature extraction module.
In one embodiment of this application, for each input face image, the features extracted in the last feature extraction module are sparsely organized in 2D: about half of the features are zero and half take positive values, and each feature is zero about half of the time and positive about half of the time over all input face images.
In one embodiment of this application, the features extracted in the last feature extraction module are selective to identities and identity-related attributes, so that for all input face images of a given identity or containing a given identity-related attribute there exist features that are consistently activated or inhibited.
In one embodiment of this application, the identity-related attributes include gender and/or race.
In one embodiment of this application, the features extracted in the last feature extraction module are robust to image corruption: under moderate image corruption, the values of the extracted features remain largely unchanged.
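The sparsity embodiment above (about half the features positive per image, each feature positive on about half of the images) can be checked empirically. A sketch with synthetic stand-in data — real top-layer features would come from the trained network, so the numbers here only illustrate the measurement:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for top-layer ReLU features: rows = face images, cols = features.
raw = rng.normal(size=(1000, 512))
features = np.maximum(raw, 0.0)            # ReLU leaves ~half positive, ~half zero

per_image = (features > 0).mean(axis=1)    # fraction of activated features per image
per_feature = (features > 0).mean(axis=0)  # fraction of images activating each feature

print(round(per_image.mean(), 2))    # ~0.5: about half the features fire per image
print(round(per_feature.mean(), 2))  # ~0.5: each feature fires on about half the images
```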
In one embodiment of this application, the identifier determines that two faces belong to the same identity if the determined feature distance is less than a threshold; or determines that one of the input face images serving as a probe face image and an image in a gallery of face images containing the input face image belong to the same identity if their feature distance is the smallest compared with the feature distances between the probe face image and all other face images in the gallery.
In one embodiment of this application, the feature distance is one selected from the group consisting of Euclidean distance, Joint Bayesian distance, cosine distance, and Hamming distance.
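Three of the listed feature distances can be sketched directly (Joint Bayesian requires a separately learned model and is omitted); the feature vectors below are illustrative:

```python
import numpy as np

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity, so smaller means more similar."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming_distance(a, b):
    """For binarized features: count of positions whose on/off state differs."""
    return int(np.count_nonzero((a > 0) != (b > 0)))

a = np.array([1.0, 0.0, 2.0, 0.0])
b = np.array([1.0, 0.0, 2.0, 0.0])
print(euclidean_distance(a, b))  # 0.0
print(cosine_distance(a, b))     # 0.0 (identical direction)
print(hamming_distance(a, b))    # 0
```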
In one embodiment of this application, for an individual face image, the features output from each fully connected layer are classified into one of multiple identities, and the classification error is regarded as the identification supervisory signal.
In one embodiment of this application, for two compared face images, the features output from each fully connected layer are separately verified to determine whether the two compared face images belong to the same identity, and the verification error is regarded as the verification supervisory signal.
In one aspect of this application, a method for face recognition is disclosed, comprising:
extracting local features of two or more input face images by a trained neural network;
extracting global features from the extracted local features by the trained neural network;
determining the distance between the extracted global features; and
determining, according to the determined distance:
whether two of the input face images come from the same identity, for face verification, or
whether one of the input face images serving as a probe face image and an image in a gallery of face images containing the input image belong to the same identity,
wherein the neural network includes multiple cascaded feature extraction modules, each feature extraction module having a convolutional layer; the convolutional layer in the first feature extraction module of the cascade is connected to the input face image, and the convolutional layer in each subsequent feature extraction module is connected to the convolutional layer in the previous feature extraction module.
In one embodiment of this application, each feature extraction module further includes a fully connected layer, and the fully connected layer in each feature extraction module is connected to the convolutional layer in the same feature extraction module.
In one embodiment of this application, the method further comprises:
updating, by back-propagating identification supervisory signals and verification supervisory signals through the cascaded feature extraction modules, the neural weights of the following connections:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input face image;
the connections between each convolutional layer in the second through last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connections between each convolutional layer and the corresponding fully connected layer in the same feature extraction module.
In one embodiment of this application, the updating further comprises:
inputting two face images into the neural network respectively, obtaining a feature representation for each face image;
computing an identification error by classifying the feature representation of each face image in each fully connected layer of the neural network into one of multiple identities;
computing a verification error by verifying whether the respective feature representations of the two face images in each fully connected layer come from the same identity, where the identification error and the verification error are regarded as the identification supervisory signal and the verification supervisory signal, respectively; and
simultaneously back-propagating the identification supervisory signals and the verification supervisory signals through the neural network to update the neural weights of the following connections:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input face image;
the connections between each convolutional layer in the second through last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connections between each convolutional layer and the corresponding fully connected layer in the same feature extraction module.
In one embodiment of this application, for each face image, the features extracted in the last feature extraction module are sparsely organized in 2D: about half of the features are zero and half take positive values, and each feature is zero about half of the time and positive about half of the time over all face images.
In one embodiment of this application, the features extracted in the last feature extraction module are selective to identities and identity-related attributes, so that for all input face images of a given identity or containing a given identity-related attribute there exist features that are consistently activated or inhibited.
In one embodiment of this application, the identity-related attributes include gender and/or race.
In one embodiment of this application, the determining further comprises:
determining that two faces belong to the same identity if the determined feature distance is less than a threshold; or
determining that one of the input face images serving as a probe face image and an image in a gallery of face images containing the input face image belong to the same identity if their feature distance is the smallest compared with the feature distances between the probe face image and all other face images in the gallery.
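The two decision rules above — thresholding for verification, nearest neighbor for identification — can be sketched as follows; the threshold value and feature vectors are illustrative assumptions:

```python
import numpy as np

def verify(f1, f2, threshold=1.0):
    """Face verification: same identity iff the feature distance is below a threshold."""
    return np.linalg.norm(f1 - f2) < threshold

def identify(probe, gallery):
    """Face identification: index of the gallery feature closest to the probe."""
    dists = [np.linalg.norm(probe - g) for g in gallery]
    return int(np.argmin(dists))

probe = np.array([1.0, 0.0, 1.0])
gallery = [np.array([0.0, 1.0, 0.0]),
           np.array([0.9, 0.1, 1.1]),   # nearest to the probe
           np.array([5.0, 5.0, 5.0])]

print(identify(probe, gallery))   # 1
print(verify(probe, gallery[1]))  # True (distance ~0.17 < 1.0)
```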
Description of the Drawings
Exemplary non-limiting embodiments of the invention are described below with reference to the accompanying drawings. The drawings are illustrative and generally not drawn to exact scale. The same or similar elements in different figures are referenced by identical reference numerals.
Fig. 1 is a schematic diagram showing the device for face recognition consistent with some disclosed embodiments.
Fig. 2 is a schematic diagram showing the sparsity, selectivity, and robustness of the features extracted in the later feature extraction modules.
Fig. 3 is a schematic diagram showing the structure of the cascaded feature extraction modules in the feature extractor, and the input face images and supervisory signals in the trainer.
Fig. 4 is a schematic histogram showing the sparsity of the activated features (neurons) on an individual face image and the sparsity of an individual activated feature (neuron) over all face images.
Fig. 5 is a schematic histogram showing selective activation and inhibition on the face images of a particular identity.
Fig. 6 is a schematic histogram showing selective activation and inhibition on face images containing a particular attribute.
Fig. 7 is a schematic diagram showing face images with random block occlusions, used to test the robustness of the features extracted by the feature extractor to image corruption.
Fig. 8 is a schematic diagram showing the average feature activations on the face images of individual identities under various degrees of random block occlusion.
Fig. 9 is a schematic flow diagram showing the trainer shown in Fig. 1, consistent with some disclosed embodiments.
Fig. 10 is a schematic flow diagram showing the feature extractor shown in Fig. 1, consistent with some disclosed embodiments.
Fig. 11 is a schematic flow diagram showing the identifier shown in Fig. 1, consistent with some disclosed embodiments.
Detailed Description
Reference will now be made in detail to some specific embodiments of the invention, including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it should be understood that this is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Numerous details are set forth in the following description in order to provide a thorough understanding of the present application. The invention may be practiced without some or all of these details. In other instances, well-known process operations have not been described in detail so as not to unnecessarily obscure the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. Unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" used herein may also include the plural forms. It should also be understood that the terms "comprises" and/or "comprising" used in this specification specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
As will be appreciated by those skilled in the art, the invention may be embodied as a system, a method, or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "device", "module", or "system". Furthermore, the invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
It should also be understood that relational terms such as first and second (if any) are used solely to distinguish one entity, item, or action from another, and do not necessarily require or imply any actual relationship or order between these entities, items, or actions.
Much of the inventive functionality and many of the inventive principles, when implemented, are best supported by software or integrated circuits (ICs), such as digital signal processors with accompanying software, or application-specific integrated circuits. Despite the possibly significant effort and the many design choices motivated by, for example, available time, current technology, and economic considerations, it is expected that those skilled in the art, when guided by the concepts and principles disclosed herein, will be readily capable of generating such software instructions or ICs with minimal experimentation. Therefore, in the interest of brevity and to minimize any risk of obscuring the principles and concepts of the present invention, further discussion of such software and ICs (if any) is limited to the essentials with respect to the principles and concepts used by the preferred embodiments.
Fig. 1 is a schematic diagram showing an exemplary device 100 for face recognition consistent with some disclosed embodiments. As shown, the device 100 may include a feature extractor 10 and an identifier 20. The feature extractor 10 is configured to extract features from input face images. In one embodiment of this application, the feature extractor 10 may include a neural network, which may be configured with multiple cascaded feature extraction modules. Each feature extraction module in the cascade includes a convolutional layer and a fully connected layer. The cascaded feature extraction modules may be implemented by software, integrated circuits (ICs), or a combination thereof. Fig. 3 shows a schematic diagram of the structure of the cascaded feature extraction modules in the feature extractor 10. As shown, the convolutional layer in the first feature extraction module of the cascade is connected to the input face image, and the convolutional layer in each subsequent feature extraction module is connected to the convolutional layer in the previous feature extraction module. The fully connected layer in each feature extraction module is connected to the convolutional layer in the same feature extraction module.
With reference to Fig. 1, in order for the neural network to work effectively, the device 100 further includes a trainer 30, which is configured to update, by back-propagating identification supervisory signals and verification supervisory signals through the cascaded feature extraction modules, the neural weights of the following connections:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input face image;
the connections between each convolutional layer in the second through last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connections between each convolutional layer and the corresponding fully connected layer in the same feature extraction module,
so that the features extracted in the last/highest feature extraction module of the cascade have sparsity, selectivity, and robustness, as discussed later.
The identifier 20 may be implemented by software, integrated circuits (ICs), or a combination thereof, and is configured to compute the distances between the features extracted from different face images in order to determine, for face verification, whether two face images come from the same identity, or whether an input image serving as a probe face image and an image in a gallery of face images containing the input image belong to the same identity.
Feature Extractor 10
The feature extractor 10 contains multiple cascaded feature extraction modules and extracts features hierarchically from input face images. Fig. 3 shows an example of the structure of the cascaded feature extraction modules in the feature extractor 10: the feature extractor includes four cascaded feature extraction modules, each including a convolutional layer Conv-n and a fully connected layer FC-n, where n = 1, ..., 4. The convolutional layer Conv-1 in the first feature extraction module of the feature extractor 10 is connected to the input face image as the input layer, and the convolutional layer Conv-n (n > 1) in each subsequent feature extraction module is connected to the convolutional layer Conv-(n-1) in the previous feature extraction module. The fully connected layer FC-n in each feature extraction module of the feature extractor 10 is connected to the convolutional layer Conv-n in the same feature extraction module.
Fig. 10 is a schematic flow diagram showing the feature extraction process in the feature extractor 10. In step 201, the feature extractor 10 forward-propagates the input face image through the convolutional layers in all feature extraction modules of the feature extractor 10. Then, in step 202, the feature extractor 10 forward-propagates the output of each convolutional layer to the corresponding fully connected layer in the same feature extraction module. Finally, in step 203, the output/representation of the last fully connected layer is taken as the features, as discussed below.
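The three steps above can be sketched as a toy cascade. This is an illustrative simplification, not the patent's network: each "convolutional layer" is a single-map valid convolution with ReLU, each fully connected layer a random matrix, and the output of the last FC layer is taken as the feature vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_relu(x, kernel):
    """Toy 'valid' 2D convolution followed by ReLU (single feature map)."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel)
    return np.maximum(out, 0.0)

def fully_connected(x, weights):
    """FC layer: global features from the flattened feature map, with ReLU."""
    return np.maximum(x.ravel() @ weights, 0.0)

# A cascade of 4 modules; each module owns a conv kernel and an FC matrix.
image = rng.normal(size=(20, 20))
x = image
features = None
for n in range(4):                       # modules 1..4 (Conv-n, FC-n)
    kernel = rng.normal(size=(3, 3))
    x = conv_relu(x, kernel)             # Conv-n feeds Conv-(n+1)   (step 201)
    fc_w = rng.normal(size=(x.size, 8))
    features = fully_connected(x, fc_w)  # FC-n reads Conv-n         (step 202)

print(features.shape)  # (8,) -- output of the last FC layer is the feature (step 203)
```

A real implementation would use multiple feature maps per layer, max-pooling, and learned weights; the sketch only shows how the cascade is wired.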
Each convolutional layer in the feature extractor 10 is configured to extract local facial features (that is, features extracted from local regions of the input image or input feature maps) from the input image (for the first convolutional layer) or from feature maps (known in the art as the output feature maps of the previous convolutional layer, followed by max-pooling), so as to form the output feature maps of the current convolutional layer. Each feature map is a certain kind of feature organized in 2D. Where the same set of neural connection weights w is used between corresponding input and output feature maps of the previous convolutional layer (followed by max-pooling) and the current convolutional layer, the same features are extracted from the input feature maps everywhere in the output feature maps, or within local regions of the feature maps when weights are only locally shared. The convolution operation in each convolutional layer can be expressed as

y_j = max(0, b_j + Σ_i k_ij * x_i),  (1)

where x_i and y_j are the i-th input feature map and the j-th output feature map, respectively, k_ij is the convolution kernel between the i-th input feature map and the j-th output feature map, * denotes convolution, and b_j is the bias of the j-th output feature map. Here, the ReLU nonlinearity y = max(0, x) is used for the neurons. Weights in the higher convolutional layers of the ConvNets are locally shared:

y_j^r = max(0, b_j^r + Σ_i k_ij^r * x_i^r),  (2)

where r indicates a local region within which the weights are shared.
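Equation (1) can be illustrated with a direct (unoptimized) implementation. This is a sketch with globally shared weights only; the kernel shapes and the test values below are illustrative, and the * operator is realized as "valid" cross-correlation, the usual convention in ConvNet libraries.

```python
import numpy as np

def conv_layer(x, k, b):
    """Equation (1): y_j = max(0, b_j + sum_i k_ij * x_i).

    x: (I, H, W) input feature maps; k: (I, J, kh, kw) kernels;
    b: (J,) biases.  Implemented as 'valid' cross-correlation."""
    I, H, W = x.shape
    _, J, kh, kw = k.shape
    y = np.zeros((J, H - kh + 1, W - kw + 1))
    for j in range(J):
        for i in range(I):
            for r in range(H - kh + 1):
                for c in range(W - kw + 1):
                    y[j, r, c] += np.sum(k[i, j] * x[i, r:r + kh, c:c + kw])
        y[j] += b[j]
    return np.maximum(0.0, y)  # ReLU: y = max(0, x)

# Tiny check: 3x3 input of ones, one 2x2 kernel of ones, bias -3
out = conv_layer(np.ones((1, 3, 3)), np.ones((1, 1, 2, 2)), np.array([-3.0]))
```

Each output position sums a 2x2 window (value 4), adds the bias (-3), and is clipped by the ReLU, so `out` is a (1, 2, 2) map of ones.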
Each convolutional layer may be followed by max-pooling, which is formulated as

y^i_{j,k} = max_{0 ≤ m, n < s} { x^i_{j·s+m, k·s+n} },  (3)

where each neuron in the i-th output feature map y^i pools over an s × s non-overlapping local region in the i-th input feature map x^i.
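Equation (3) amounts to taking the maximum over each non-overlapping s × s block. A compact sketch (assuming map sizes divisible by s):

```python
import numpy as np

def max_pool(x, s):
    """Equation (3): each output neuron pools over an s x s non-overlapping
    region of the corresponding input map.  x: (I, H, W); H, W divisible by s."""
    I, H, W = x.shape
    return x.reshape(I, H // s, s, W // s, s).max(axis=(2, 4))

pooled = max_pool(np.arange(16, dtype=float).reshape(1, 4, 4), 2)
```

For the 4x4 ramp input, each 2x2 block keeps its largest entry, giving [[5, 7], [13, 15]].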
Each of full articulamentum in feature extractor 10 is configured to from being obtained from same characteristic extracting module
Global characteristics (feature extracted from the whole region of input feature vector figure) is extracted in the characteristic pattern of convolutional layer.In other words, Quan Lian
It meets layer FC-n and extracts global characteristics from convolutional layer Conv-n.Full articulamentum also serve as receive during the training period supervisory signals and
The interface of feature is exported during feature extraction.Full articulamentum can be formulated into:
Wherein xiIndicate the output of i-th of neuron in previous convolutional layer (followed by maximum pond).yjIndicate current
Full articulamentum in j-th of neuron output.wi,jIt is i-th of mind in previous convolutional layer (followed by maximum pond)
Through the weight in the connection between j-th of neuron in member and current full articulamentum.bjIt is in current full articulamentum
The deviation of j-th of neuron.Max (0, x) is ReLU non-linear.
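Equation (4) is a single ReLU affine map; the toy weights below are illustrative:

```python
import numpy as np

def fc_layer(x, w, b):
    """Equation (4): y_j = max(0, sum_i x_i * w_ij + b_j), with ReLU."""
    return np.maximum(0.0, x @ w + b)

y = fc_layer(np.array([1.0, 2.0]),
             np.array([[1.0, 0.0], [0.0, -1.0]]),
             np.array([0.0, 0.0]))
```

The pre-activation is [1, -2]; the ReLU clips the negative entry, so y = [1, 0] — which is also how the half-zero, half-positive sparsity discussed next arises.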
The features extracted in the last/highest feature extraction module of the feature extractor 10 (for example, those in the FC-4 layer shown in Fig. 3) have sparsity, selectivity, and robustness. Sparsity: each facial image has approximately half of its feature values equal to zero and half taking positive values, and each feature is zero on approximately half of all facial images and positive on the other half. Selectivity: for a given identity, or for all facial images containing a given identity-related attribute, a feature is consistently either activated (positive) or inhibited (zero); in this sense the features are selective to identities and identity-related attributes (such as gender and race). Robustness: the features are robust to image corruption (such as occlusion), in that under moderate image corruption the feature values largely remain unchanged. The sparse features can be converted into binary codes by comparison with a threshold, and the binary codes can be used for face recognition.
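Thresholding the sparse activations into a binary code can be sketched as follows; the threshold of 0 and the sample activations are illustrative, and Hamming distance is one of the distances the identifier may use on such codes:

```python
import numpy as np

def binarize(features, threshold=0.0):
    """Threshold sparse activations into a binary code
    (1 = activated, 0 = inhibited)."""
    return (np.asarray(features) > threshold).astype(np.uint8)

def hamming_distance(a, b):
    """Distance between two binary codes, usable for fast face search."""
    return int(np.count_nonzero(a != b))

code1 = binarize([0.7, 0.0, 1.3, 0.0])
code2 = binarize([0.9, 0.0, 0.0, 0.0])
```

Here code1 is [1, 0, 1, 0], code2 is [1, 0, 0, 0], and their Hamming distance is 1; bitwise codes are cheap to store and fast to compare.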
Fig. 2 illustrates the three properties of the features extracted in the FC-4 layer: sparsity, selectivity, and robustness. The left side of Fig. 2 shows the features on three facial images of Bush and one facial image of Powell. The second facial image of Bush is partially occluded. In one embodiment of the application, there are 512 features in the FC-4 layer; Fig. 2 shows 32 of these features, subsampled for display. The features are sparsely activated on each facial image, with approximately half of the features positive and half zero. The features of facial images of the same identity have similar activation patterns, which differ between identities. The robustness of the features shows when occlusion is present, as on the face in the second image of Bush: the activation patterns largely remain unchanged. The right side of Fig. 2 shows activation histograms of some selected features over all facial images (as background), over all images belonging to Bush, over all images with the attribute "male", and over all images with the attribute "female". A feature is typically activated on about half of all facial images, but over all images belonging to a particular identity or attribute, it can be constantly activated (or inhibited). In this sense the features are sparse and are selective to identities and attributes.
Moderate sparsity over images maximizes the ability to distinguish faces of different identities, and moderate sparsity over features gives the features maximal discriminative power. The left side of Fig. 4 shows the histogram of the number of features activated (positive) on each of the (for example) 46594 facial images in the validation set, and the right side of Fig. 4 shows the histogram of the number of images on which each feature is activated (positive). The evaluation is based on the features extracted by the FC-4 layer. In one embodiment of the application, out of all 512 (for example) features in the FC-4 layer, the mean and standard deviation of the number of activated neurons on an image are 292 ± 34, and out of all 46594 validation images, the mean and standard deviation of the number of images on which each feature is activated are 26565 ± 5754; both are centered around half of all features/images.
The activation pattern (that is, whether a feature is activated, i.e., has a positive value) is more important than the precise activation value. Converting the feature activations into binary codes by thresholding sacrifices less than 1% of the face verification accuracy. This shows that the activated or inhibited states of the features contain most of the discriminative information. Binary codes are economical for storage and fast for image search.
Fig. 5 and Fig. 6 show examples of activation histograms of features over given identities and attributes, respectively. The histograms over given identities show strong selectivity. For a given identity, some features are constantly activated, with histograms distributed over values larger than zero, as shown in the first two rows of Fig. 5; some other features are constantly inhibited, with histograms accumulated at zero or small values, as shown in the last two rows of Fig. 5. As for attributes, each row of Fig. 6 shows the histograms of a single feature over a few related attributes (those related to gender, race, and age). The feature given on the left of each row is activated on the stated attribute. As shown in Fig. 6, the features show strong selectivity to gender, race, and certain ages (such as children and the elderly): a feature is strongly activated for the given attribute and inhibited for the other attributes of the same category. For some other attributes, such as youth and middle age, the selectivity is weaker, and no feature is activated exclusively for each of these attributes, because age does not correspond exactly to identity; for example, in face recognition, the features should be invariant to images of the same identity taken in youth and in middle age.
Fig. 7 and Fig. 8 show the robustness of the features extracted in the last feature extraction module (the FC-4 layer) to image corruption. Facial images are occluded by random blocks of sizes ranging from 10 × 10 to 70 × 70, as shown in Fig. 7. Fig. 8 shows the mean feature activations over the images occluded by random blocks, where each column lists the mean activations over the facial images of the single identity given at its top, and the left of each row gives the degree of occlusion. Feature values are mapped to a color map, with warm colors indicating positive values and cool colors indicating zero or small values. The features in each column of the figure are sorted by their activation values on the original facial images of each identity. As can be seen in Fig. 8, the activation patterns largely remain unchanged (most activated features remain activated and most inhibited features remain inhibited) until a large degree of occlusion occurs.
Identifier 20
The identifier 20 operates by computing the distances between the global features of different facial images extracted by the fully connected layers of the feature extractor 10, so as to determine whether two facial images come from the same identity, for face verification, or to determine whether one of the input images, serving as the search facial image, belongs to the same identity as one of the images in a facial image gallery including the input image, for face identification. Fig. 11 is a schematic flow diagram showing the identification process of the identifier 20. In step 301, the identifier 20 computes the distances between the features extracted from different facial images by the feature extractor 10 (that is, the global features of the different facial images extracted by the fully connected layers). Then, in step 302, the identifier 20 determines whether two facial images come from the same identity, for face verification; or, in step 303, determines whether one of the input images, serving as the search facial image, belongs to the same identity as one of the images in the facial image gallery including the input image, for face identification.
In the identifier 20, if the feature distance between two facial images is smaller than a threshold, it is determined that they belong to the same identity; or, if the feature distance between the search facial image and one image in the facial image gallery is the smallest compared with the feature distances between the search facial image and all other images in the gallery, it is determined that they belong to the same identity. The feature distance determined by the identifier 20 may be the Euclidean distance, the joint Bayesian distance, the cosine distance, the Hamming distance, or any other distance.
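The two decision rules can be sketched directly. The Euclidean distance is used here as one of the permitted metrics; the threshold and feature vectors below are illustrative:

```python
import numpy as np

def verify(f1, f2, threshold):
    """Face verification (step 302): same identity iff the feature distance
    is below a threshold."""
    return float(np.linalg.norm(f1 - f2)) < threshold

def identify(search_feature, gallery_features):
    """Face identification (step 303): the gallery image with the smallest
    feature distance to the search image is taken as the same identity."""
    dists = [float(np.linalg.norm(search_feature - g)) for g in gallery_features]
    return int(np.argmin(dists))

same = verify(np.array([0.0, 0.0]), np.array([0.0, 0.5]), threshold=1.0)
best = identify(np.array([1.0, 0.0]),
                [np.array([0.0, 1.0]), np.array([0.9, 0.1])])
```

Here the pair passes verification (distance 0.5 < 1.0), and the search feature matches gallery entry 1, its nearest neighbor.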
In one embodiment of the application, the joint Bayesian distance is used as the feature distance. Joint Bayesian has become a popular similarity metric for faces. It represents an extracted facial feature x (after subtracting the mean) as the sum of two independent Gaussian variables:

x = μ + ε,  (5)

where μ ~ N(0, S_μ) represents the face identity and ε ~ N(0, S_ε) represents the intra-personal variation. Joint Bayesian models the joint probability of two faces given the intra-personal or extra-personal hypothesis, P(x1, x2 | H_I) and P(x1, x2 | H_E). It readily follows from equation (5) that these two probabilities are also Gaussian, with variations

Σ_I = [ S_μ + S_ε , S_μ ; S_μ , S_μ + S_ε ]  (6)

and

Σ_E = [ S_μ + S_ε , 0 ; 0 , S_μ + S_ε ],  (7)

respectively. S_μ and S_ε can be learned from data with the EM algorithm. In testing, the likelihood ratio

r(x1, x2) = log [ P(x1, x2 | H_I) / P(x1, x2 | H_E) ]  (8)

is calculated; it has a closed-form solution and is efficient.
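The likelihood ratio (8) can be illustrated by evaluating the two Gaussian densities directly from the covariances in equations (6) and (7). This is a didactic sketch, not the efficient closed-form solution used in practice, and the identity covariances in the example are made up:

```python
import numpy as np

def joint_bayes_llr(x1, x2, S_mu, S_eps):
    """Equation (8): log P(x1,x2|H_I) - log P(x1,x2|H_E), built from the
    block covariances of equations (6) and (7)."""
    d = len(x1)
    z = np.concatenate([x1, x2])
    S = S_mu + S_eps
    cov_I = np.block([[S, S_mu], [S_mu, S]])        # intra-personal, eq. (6)
    cov_E = np.block([[S, np.zeros((d, d))],
                      [np.zeros((d, d)), S]])       # extra-personal, eq. (7)

    def log_gauss(v, C):
        _, logdet = np.linalg.slogdet(C)
        return -0.5 * (v @ np.linalg.solve(C, v) + logdet
                       + len(v) * np.log(2.0 * np.pi))

    return log_gauss(z, cov_I) - log_gauss(z, cov_E)

I1 = np.eye(1)
r_same = joint_bayes_llr(np.array([1.0]), np.array([1.0]), I1, I1)
r_diff = joint_bayes_llr(np.array([1.0]), np.array([-1.0]), I1, I1)
```

With equal identity and intra-personal covariances, identical features give a positive log-ratio (intra-personal hypothesis favored) and opposite features a negative one, so thresholding r at 0 acts as the verification decision.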
Trainer 30
The trainer 30 is configured to update the weights w of the connections between the neurons of the convolutional layers and fully connected layers of the feature extractor 10, starting from initial weights and using a plurality of identification supervisory signals and a plurality of verification supervisory signals, so that the features extracted by the last of the cascaded feature extraction modules in the extractor have sparsity, selectivity, and robustness.
As shown in Fig. 3, the identification supervisory signals and verification supervisory signals in the trainer 30 (denoted "Id" and "Ve") are simultaneously added to each of the fully connected layers FC-n of the feature extraction modules in the feature extractor 10, where n = 1, ..., 4, and are back-propagated to the input facial image, respectively, so as to update the weights of the connections between the neurons of all the cascaded feature extraction modules.
The identification supervisory signal "Id" used by the trainer 30 is generated by classifying the fully connected layer representation/output (that is, formula (4)) of an individual facial image into one of N identities, where the classification error serves as the identification supervisory signal.
The verification supervisory signal in the trainer 30 is generated by separately verifying the fully connected layer representations of two compared facial images in each feature extraction module, determining whether the two compared facial images belong to the same identity, where the verification error serves as the verification supervisory signal. Given a pair of training facial images, the feature extractor 10 extracts two feature vectors f_i and f_j from the two facial images in each feature extraction module, respectively. If f_i and f_j are features of facial images of the same identity, the verification error is (1/2)·‖f_i − f_j‖₂²; if f_i and f_j are features of facial images of different identities, the verification error is (1/2)·max(0, m − ‖f_i − f_j‖₂)², where ‖f_i − f_j‖₂ is the Euclidean distance between the two feature vectors and m is a margin. An error thus exists when f_i and f_j are dissimilar for the same identity, or when f_i and f_j are similar for different identities.
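The two-branch verification error above can be sketched as a single function (the margin value in the example is illustrative):

```python
import numpy as np

def verification_error(fi, fj, same_identity, m=1.0):
    """Verification supervisory error for a pair of feature vectors:
    0.5 * ||fi - fj||^2           for a same-identity pair,
    0.5 * max(0, m - ||fi - fj||)^2  for a different-identity pair,
    where m is the margin."""
    d = float(np.linalg.norm(np.asarray(fi) - np.asarray(fj)))
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, m - d) ** 2

e_same = verification_error([1.0, 0.0], [1.0, 0.0], True)          # identical pair
e_diff = verification_error([1.0, 0.0], [1.0, 0.0], False, m=1.0)  # colliding pair
```

An identical same-identity pair incurs no error (e_same = 0), while the same pair labeled as different identities sits fully inside the margin and incurs the maximal error 0.5·m² = 0.5, matching the stated intent: penalize dissimilar same-identity features and similar different-identity features.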
Fig. 9 is a schematic flow diagram showing the training process of the trainer 30. In step 101, the trainer 30 samples two facial images and inputs them to the feature extractor 10, respectively, to obtain the feature representations of the two facial images in all the fully connected layers of the feature extractor 10. Then, in step 102, the trainer 30 computes the identification error by classifying the feature representation of each facial image in each fully connected layer into one of a plurality of (N) identities. Meanwhile, in step 103, the trainer 30 computes the verification error by verifying whether the corresponding feature representations of the two facial images in each fully connected layer come from the same identity. The identification error and the verification error serve as the identification supervisory signal and the verification supervisory signal, respectively. In step 104, the trainer 30 simultaneously back-propagates all the identification and verification supervisory signals through the feature extractor 10 to update the weights of the connections between the neurons in the feature extractor 10. The identification and verification supervisory signals (or errors) added to the fully connected layers FC-n (where n = 1, 2, 3, 4) are simultaneously back-propagated through the cascade of feature extraction modules until they reach the input image. During back-propagation, the errors obtained at each layer in the cascade of feature extraction modules are accumulated, and the weights of the connections between the neurons in the feature extractor 10 are updated according to the magnitudes of the errors. Finally, in step 105, the trainer 30 evaluates whether the training process has converged; if the convergence point has not been reached, steps 101 to 104 are repeated.
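The control flow of steps 101-105 can be sketched with a deliberately toy model: linear layers stand in for the ConvNet, only same-identity pairs and the verification term are used (the identification term is omitted), and the dimensions, learning rate, and epoch count are all made up. Only the loop structure mirrors the process above.

```python
import numpy as np

rng = np.random.default_rng(1)
# One weight matrix standing in for each supervised layer FC-1..FC-4.
W = [rng.standard_normal((4, 4)) * 0.5 for _ in range(4)]

def train(pairs, lr=0.5, epochs=30):
    """Toy version of steps 101-105: sample pairs, accumulate per-layer
    verification errors, take a gradient step, and track convergence."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for x1, x2 in pairs:                 # step 101: an image pair
            delta = x1 - x2
            for n, w in enumerate(W):        # supervise every layer jointly
                diff = w @ delta             # step 103: verification error
                total += 0.5 * float(diff @ diff)
                # step 104: back-propagate -> gradient step on this layer
                W[n] = w - lr * np.outer(diff, delta)
        history.append(total)                # step 105: monitor convergence
    return history

x = rng.standard_normal(4)
losses = train([(x, x + 0.1 * rng.standard_normal(4))])
```

Because the pair shares an identity, the loss pulls each layer toward mapping the two images to the same feature, so the accumulated error shrinks over the epochs.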
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with the other claimed elements as specifically claimed. The foregoing description of the present invention has been presented for purposes of illustration and description only, and is not intended to be exhaustive or to limit the invention to the forms disclosed. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the invention. The embodiments described above were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others skilled in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (15)
1. An apparatus for face recognition, comprising:
an extractor comprising a neural network, the neural network configured as a plurality of cascaded feature extraction modules, wherein each of the cascaded feature extraction modules comprises:
a convolutional layer for extracting local features from an input facial image or from the features extracted in a previous one of the cascaded feature extraction modules; and
a fully connected layer connected to the convolutional layer of the same feature extraction module and extracting global features from the extracted local features; and
an identifier for determining, according to the distances between the extracted global features:
whether two of the input facial images come from the same identity, or
whether one of the input facial images, serving as a search facial image, belongs to the same identity as one of the images in a facial image gallery including the input facial image,
wherein,
if the determined feature distance is smaller than a threshold, the identifier determines that the two faces belong to the same identity, or
if the feature distance between one of the input facial images, serving as the search facial image, and one of the images in the facial image gallery including the input facial image is the smallest compared with the feature distances between the search facial image and all other facial images in the gallery, it is determined that they belong to the same identity.
2. The apparatus according to claim 1, wherein the convolutional layer of the first feature extraction module in the cascaded feature extraction modules is configured to extract the local features from the input facial image, and the convolutional layer in each subsequent feature extraction module is connected to the convolutional layer in the previous feature extraction module in the cascaded feature extraction modules.
3. The apparatus according to claim 2, further comprising:
a trainer configured to update the neural weights of the following connections by back-propagating identification supervisory signals and verification supervisory signals through the cascaded feature extraction modules:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input facial image;
the connections between each convolutional layer in the second through the last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connection between each convolutional layer and the corresponding fully connected layer in the same feature extraction module.
4. The apparatus according to claim 3, wherein the features extracted from each input facial image in the last feature extraction module are sparsely activated, the features having half zeros and half positive values, and each feature being zero on half of all the input facial images and positive on the other half.
5. The apparatus according to claim 3, wherein the features extracted in the last feature extraction module are selective to identities and identity-related attributes, such that for a given identity, or for all input facial images containing a given identity-related attribute, a feature is consistently activated or inhibited.
6. The apparatus according to claim 5, wherein the identity-related attributes include gender and/or race.
7. The apparatus according to claim 1, wherein the feature distance includes one selected from the group consisting of the Euclidean distance, the joint Bayesian distance, the cosine distance, and the Hamming distance.
8. The apparatus according to claim 3, wherein, for an individual facial image, the features output from each fully connected layer are classified into one of a plurality of identities, and the classification error is taken as the identification supervisory signal.
9. The apparatus according to claim 3, wherein, for two compared facial images, the features output from each fully connected layer are separately verified to determine whether the two compared facial images belong to the same identity, and the verification error is taken as the verification supervisory signal.
10. A method for face recognition, comprising:
extracting local features of two or more input facial images by a trained neural network;
extracting global features from the extracted local features by the trained neural network;
determining the distances between the extracted global features; and
determining, according to the determined distances:
whether two of the input facial images come from the same identity, for face verification, or
whether one of the input facial images, serving as a search facial image, belongs to the same identity as one of the images in a facial image gallery including the input image,
wherein the neural network comprises a plurality of cascaded feature extraction modules, each feature extraction module having a convolutional layer, and wherein the convolutional layer in the first feature extraction module of the cascaded feature extraction modules is connected to the input facial image, and the convolutional layer in each subsequent feature extraction module is connected to the convolutional layer in the previous feature extraction module,
and wherein,
if the determined feature distance is smaller than a threshold, it is determined that the two faces belong to the same identity, or
if the feature distance between one of the input facial images, serving as the search facial image, and one of the images in the facial image gallery including the input facial image is the smallest compared with the feature distances between the search facial image and all other facial images in the gallery, it is determined that they belong to the same identity,
wherein each feature extraction module further comprises a fully connected layer, and the fully connected layer in each feature extraction module is connected to the convolutional layer in the same feature extraction module.
11. The method according to claim 10, further comprising:
updating the neural weights of the following connections by back-propagating identification supervisory signals and verification supervisory signals through the cascaded feature extraction modules:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input facial image;
the connections between each convolutional layer in the second through the last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connection between each convolutional layer and the corresponding fully connected layer in the same feature extraction module.
12. The method according to claim 11, wherein the updating further comprises:
inputting two facial images to the neural network, respectively, and obtaining the feature representations of each facial image;
computing the identification error by classifying the feature representation of each facial image in each fully connected layer of the neural network into one of a plurality of identities;
computing the verification error by verifying whether the corresponding feature representations of the two facial images in each fully connected layer come from the same identity, the identification error and the verification error being taken as the identification supervisory signal and the verification supervisory signal, respectively; and
simultaneously back-propagating the identification supervisory signals and the verification supervisory signals through the neural network, so as to update the neural weights of the following connections:
the connection between the convolutional layer in the first feature extraction module and the input layer containing the input facial image;
the connections between each convolutional layer in the second through the last feature extraction modules and the corresponding convolutional layer in the previous feature extraction module; and
the connection between each convolutional layer and the corresponding fully connected layer in the same feature extraction module.
13. The method according to claim 11, wherein the features extracted from each facial image in the last feature extraction module are sparsely activated, the features having half zeros and half positive values, and each feature being zero on half of all the facial images and positive on the other half.
14. The method according to claim 11, wherein the features extracted in the last feature extraction module are selective to identities and identity-related attributes, such that for a given identity, or for all input facial images containing a given identity-related attribute, a feature is consistently activated or inhibited.
15. The method according to claim 14, wherein the identity-related attributes include gender and/or race.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/001091 WO2016086330A1 (en) | 2014-12-03 | 2014-12-03 | A method and a system for face recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107004115A CN107004115A (en) | 2017-08-01 |
CN107004115B true CN107004115B (en) | 2019-02-15 |
Family
ID=56090783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480083717.5A Active CN107004115B (en) | 2014-12-03 | 2014-12-03 | Method and system for recognition of face |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107004115B (en) |
WO (1) | WO2016086330A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10579785B2 (en) * | 2017-09-29 | 2020-03-03 | General Electric Company | Automatic authentification for MES system using facial recognition |
EP3698268A4 (en) | 2017-11-22 | 2021-02-17 | Zhejiang Dahua Technology Co., Ltd. | Methods and systems for face recognition |
CN108009481A (en) * | 2017-11-22 | 2018-05-08 | 浙江大华技术股份有限公司 | A kind of training method and device of CNN models, face identification method and device |
CN108073917A (en) * | 2018-01-24 | 2018-05-25 | 燕山大学 | A kind of face identification method based on convolutional neural networks |
CN110309692B (en) * | 2018-03-27 | 2023-06-02 | 杭州海康威视数字技术股份有限公司 | Face recognition method, device and system, and model training method and device |
TWI667621B (en) * | 2018-04-09 | 2019-08-01 | 和碩聯合科技股份有限公司 | Face recognition method |
CN111079549B (en) * | 2019-11-22 | 2023-09-22 | 杭州电子科技大学 | Method for carrying out cartoon face recognition by utilizing gating fusion discrimination characteristics |
CN111968264A (en) * | 2020-10-21 | 2020-11-20 | 东华理工大学南昌校区 | Sports event time registration device |
CN116311464B (en) * | 2023-03-24 | 2023-12-12 | 北京的卢铭视科技有限公司 | Model training method, face recognition method, electronic device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1700240A (en) * | 2004-05-17 | 2005-11-23 | 香港中文大学 | Face recognition method based on random sampling |
CN1866270A (en) * | 2004-05-17 | 2006-11-22 | 香港中文大学 | Face recognition method based on video frequency |
CN101763506A (en) * | 2008-12-22 | 2010-06-30 | Nec九州软件株式会社 | Facial image tracking apparatus and method, computer readable recording medium |
CN102629320A (en) * | 2012-03-27 | 2012-08-08 | 中国科学院自动化研究所 | Ordinal measurement statistical description face recognition method based on feature level |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100543707B1 (en) * | 2003-12-04 | 2006-01-20 | 삼성전자주식회사 | Face recognition method and apparatus using PCA learning per subgroup |
EP2672426A3 (en) * | 2012-06-04 | 2014-06-04 | Sony Mobile Communications AB | Security by z-face detection |
- 2014-12-03 CN CN201480083717.5A patent/CN107004115B/en active Active
- 2014-12-03 WO PCT/CN2014/001091 patent/WO2016086330A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
Research on deep learning algorithms and applications based on convolutional neural networks; Chen Xianchang; China Master's Theses Full-text Database, Information Science and Technology; 2014-09-15; vol. 2014, no. 9; pp. 13-17, 23-25 |
Also Published As
Publication number | Publication date |
---|---|
CN107004115A (en) | 2017-08-01 |
WO2016086330A1 (en) | 2016-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107004115B (en) | Method and system for recognition of face | |
Li et al. | Visual semantic reasoning for image-text matching | |
CN107423701B (en) | Face unsupervised feature learning method and device based on generative confrontation network | |
Ramakrishnan et al. | Overcoming language priors in visual question answering with adversarial regularization | |
Hassan et al. | Soft biometrics: A survey: Benchmark analysis, open challenges and recommendations | |
Dekhtyar et al. | Re data challenge: Requirements identification with word2vec and tensorflow | |
CN106415594A (en) | A method and a system for face verification | |
Zeng et al. | Efficient person re-identification by hybrid spatiogram and covariance descriptor | |
CN103778409A (en) | Human face identification method based on human face characteristic data mining and device | |
Abdulrahman et al. | Comparative study for 8 computational intelligence algorithms for human identification | |
Zalasiński et al. | New algorithm for evolutionary selection of the dynamic signature global features | |
Khan et al. | Human Gait Analysis: A Sequential Framework of Lightweight Deep Learning and Improved Moth‐Flame Optimization Algorithm | |
Al-Modwahi et al. | Facial expression recognition intelligent security system for real time surveillance | |
WO2022142903A1 (en) | Identity recognition method and apparatus, electronic device, and related product | |
CN110837777A (en) | Partial occlusion facial expression recognition method based on improved VGG-Net | |
Bharath et al. | Iris recognition using radon transform thresholding based feature extraction with Gradient-based Isolation as a pre-processing technique | |
Venkat et al. | Recognizing occluded faces by exploiting psychophysically inspired similarity maps | |
Wang et al. | Prototype-based intent perception | |
Singh et al. | A sparse coded composite descriptor for human activity recognition | |
KR101938491B1 (en) | Deep learning-based streetscape safety score prediction method | |
Saha et al. | Topomorphological approach to automatic posture recognition in ballet dance | |
Tunc et al. | Age group and gender classification using convolutional neural networks with a fuzzy logic-based filter method for noise reduction | |
Wu et al. | The use of kernel set and sample memberships in the identification of nonlinear time series | |
Liao et al. | Federated hierarchical hybrid networks for clickbait detection | |
Meraoumia et al. | 2D and 3D palmprint information and hidden Markov model for improved identification performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||