CN105550657A - Improved SIFT face feature extraction method based on key points


Info

Publication number: CN105550657A
Application number: CN201510977092.3A
Authority: CN (China)
Prior art keywords: face, pixel, key point, face image, convolutional network
Legal status: Granted
Other languages: Chinese (zh)
Other versions: CN105550657B (en)
Inventors: 李伟 (Li Wei), 王璐 (Wang Lu), 冯复标 (Feng Fubiao)
Assignee (current and original): Beijing University of Chemical Technology
Application CN201510977092.3A filed 2015-12-23 by Beijing University of Chemical Technology; priority date 2015-12-23
Publication of CN105550657A: 2016-05-04
Application granted; publication of CN105550657B: 2019-01-29
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172: Classification, e.g. identification


Abstract

The invention discloses an improved SIFT face feature extraction method based on key points. Five key pixels in a face are located, and each of the five key points is described with an orientation histogram, forming a robust face image feature vector. The similarity score between two face feature vectors is computed by combining a bilinear similarity function with the Mahalanobis distance, and a KELM classifier performs binary classification on the score: for pairs with a high score, the two face images are judged to come from the same person; for pairs with a low score, the two face images are judged to come from different persons. In the face recognition process based on these face feature vectors, computing the similarity score of two feature vectors by combining the bilinear similarity function and the Mahalanobis distance enhances the distinguishability between classes.

Description

Improved SIFT face feature extraction method based on key points
Technical field
The present invention relates to an improved SIFT (scale-invariant feature transform) face feature extraction method based on key points, and belongs to the field of face recognition.
Background art
Face recognition is a biometric technology that performs identification based on a person's facial feature information. Compared with other biometric traits, facial features have the advantages of being natural, convenient, and contactless, which gives face recognition broad application prospects in security monitoring, identity authentication, human-computer interaction, and similar areas; face recognition technology is therefore of great research value. In general, face recognition comprises two stages: face feature extraction and face similarity scoring. Feature extraction derives key features from a face image to form a face feature vector, and similarity scoring computes the similarity between two face feature vectors: the higher the similarity, the more likely the two face images come from the same person; conversely, the lower the similarity, the more likely they come from different persons. In many cases, the feature extraction stage is the primary concern.
Existing face feature extraction methods include LBP (local binary patterns) and its variants. These local texture feature extraction methods compute block-wise statistics over the whole face image to form histogram vectors, and the histogram vectors of all blocks are finally concatenated into the face feature vector. Because local texture features are extracted over the whole face, the resulting feature vector has high dimensionality and contains redundant information. In addition, this approach is not robust to changes in expression or pose under complex environments.
The SIFT feature extraction method has been widely used in general object recognition. Its main idea is to find the key points of an image across different scales and to describe the key points with orientation histograms, which serve as the feature vector of the image. However, when the SIFT method is applied to face images it cannot accurately locate the key points of the face, because it is mainly suited to general objects with high contrast, whereas face images exhibit high similarity to one another.
Summary of the invention
The main purpose of the present invention is to provide an improved SIFT face feature extraction method based on key points.
Unlike traditional feature extraction methods based on the whole face, the specific innovation of the present invention is the adoption of an improved SIFT face feature extraction method based on key points. Five key pixels in the face are located, and the orientation histogram of the SIFT method is used to describe these five key points, forming a robust face image feature vector. The five key pixels are the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner.
The technical scheme of the present invention specifically comprises the following technical contents:
1. A three-layer cascade of deep convolutional networks is adopted to locate the five key pixels in the face (the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner).
2. The SIFT feature extraction method is improved: the key points detected by the SIFT method itself are replaced with the five key pixels in the face, which reduces the feature dimensionality and removes redundant information from the features.
3. The face feature vector is mapped into the intra-personal subspace, guaranteeing within-class invariance between different face images of the same person.
4. The similarity score between two face feature vectors is computed by combining a bilinear similarity function with the Mahalanobis distance. The higher the score, the more likely the two face images come from the same person; conversely, the lower the score, the more likely they come from different persons.
5. A KELM classifier (kernel-based extreme learning machine) performs binary classification on the similarity score: for pairs with a high score, the two face images are judged to come from the same person; for pairs with a low score, the two face images are judged to come from different persons.
The flowchart of the present invention is shown in Figure 1; the implementation procedure is as follows:
Step 1: read the face image and locate the five key pixels on the face image (the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner) with a three-layer cascade of deep convolutional networks.
The cascade of deep convolutional networks used in this step comprises three layers. The first layer uses deep convolutional networks to locate the five key pixels accurately, and the other two layers use convolutional networks to refine the localization results of the first layer. To ensure localization accuracy, the localization results of the individual deep convolutional networks within each layer are fused by averaging to give the final localization result. Each deep convolutional network comprises four convolutional layers, pooling layers, and two fully connected layers; the initial layers capture the global context of the face image. Because a network predicts the five key pixels simultaneously, the relative positions among the key pixels are also encoded during training, which weakens the influence of expression changes, illumination variation, and other environmental factors.
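The patent gives no code for the cascade; the following is a minimal sketch of the first-layer fusion only, under the assumption of a hypothetical interface in which each trained network is a callable returning (x, y) landmark coordinates in whole-image coordinates (the convolutional architectures themselves are omitted):

```python
import numpy as np

def fuse_first_layer(nets, face, top_half, bottom_half):
    """Fuse the first cascade layer's landmark predictions by averaging.

    Hypothetical interface: net_full maps the whole face to all five key
    pixels; net_top maps the upper half to the eyes and nose; net_bottom
    maps the lower half to the nose and mouth corners. Each returns (x, y)
    coordinates already expressed in whole-image coordinates.
    """
    net_full, net_top, net_bottom = nets
    preds = np.full((3, 5, 2), np.nan)
    preds[0] = net_full(face)                      # all 5 key pixels
    preds[1, [0, 1, 2]] = net_top(top_half)        # left eye, right eye, nose
    preds[2, [2, 3, 4]] = net_bottom(bottom_half)  # nose, mouth corners
    return np.nanmean(preds, axis=0)               # average available predictions
```

The second and third cascade layers would apply the same averaging to networks fed with small neighborhoods around these fused coordinates.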
Step 2: for the five key pixels extracted in Step 1, perform feature description with the SIFT method and extract the features of the face image.
The SIFT feature extraction method is a local feature detection method that possesses not only scale invariance but also rotation invariance. The method generally comprises four stages: (1) build the scale space and detect key points; (2) reject unstable key points; (3) assign an orientation parameter to each key point; (4) generate the descriptor of each key point. In general, the SIFT method is suited to recognizing general objects with high contrast, whereas face images have low contrast and strong edge responses; since the SIFT method cannot accurately locate the key points in a face image, stages (1) and (2) are replaced by the key-pixel localization method of Step 1.
Then, for each of the five key pixels obtained in Step 1, take a set of pixels in its neighborhood and compute the gradient magnitude and direction of each pixel. Let the intensity at pixel (x, y) be P(x, y):
$$m(x, y) = \sqrt{\big(P(x+1, y) - P(x-1, y)\big)^2 + \big(P(x, y+1) - P(x, y-1)\big)^2}$$

$$\theta(x, y) = \tan^{-1}\frac{P(x, y+1) - P(x, y-1)}{P(x+1, y) - P(x-1, y)}$$
where m(x, y) is the gradient magnitude of the pixel and θ(x, y) is its gradient direction.
Based on these results, a histogram is used to collect the gradient directions of the pixels in the neighborhood. To reduce the influence of abrupt changes, the histogram is smoothed with a Gaussian function. The peak of the histogram then represents the dominant gradient direction of the pixels in the key pixel's neighborhood, which is taken as the orientation of the key pixel.
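As an illustration, the orientation assignment can be sketched as follows; the 36-bin histogram, the 7-pixel neighborhood radius, and the 5-tap smoothing kernel are illustrative assumptions, since the text only specifies a Gaussian-smoothed gradient-direction histogram whose peak gives the key pixel's orientation:

```python
import numpy as np

def keypoint_orientation(img, cx, cy, radius=7, bins=36, sigma=1.5):
    """Estimate the dominant gradient orientation around a key pixel.

    img is a 2-D grayscale array; (cx, cy) must lie at least radius + 1
    pixels from the image border. Bin count and smoothing width are
    illustrative choices, not values fixed by the patent.
    """
    hist = np.zeros(bins)
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            dx = float(img[y, x + 1]) - float(img[y, x - 1])
            dy = float(img[y + 1, x]) - float(img[y - 1, x])
            m = np.hypot(dx, dy)                       # gradient magnitude
            theta = np.arctan2(dy, dx) % (2 * np.pi)   # gradient direction
            hist[int(theta / (2 * np.pi) * bins) % bins] += m
    # circular Gaussian smoothing of the histogram
    k = np.exp(-0.5 * (np.arange(-2, 3) / sigma) ** 2)
    k /= k.sum()
    hist = sum(k[i + 2] * np.roll(hist, i) for i in range(-2, 3))
    return hist.argmax() * 2 * np.pi / bins            # dominant orientation (rad)
```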
To maintain rotation invariance, the coordinate axes are rotated to the orientation of the key pixel; then a 16 × 16 neighborhood window is taken centered on the key point, and an 8-bin gradient-direction histogram is computed in each 4 × 4 cell, finally forming the 4 × 4 × 8 = 128-dimensional SIFT face feature vector.
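A simplified sketch of the descriptor construction follows; for brevity it keeps the window axis-aligned, so the rotation to the key pixel's orientation (and the Gaussian weighting used in full SIFT) is omitted:

```python
import numpy as np

def sift_descriptor(img, cx, cy):
    """Build the 4 x 4 x 8 = 128-dimensional descriptor of one key pixel.

    Axis-aligned approximation: (cx, cy) must lie at least 9 pixels from
    the image border so the 16 x 16 window and its gradients fit.
    """
    desc = np.zeros((4, 4, 8))
    for j in range(16):
        for i in range(16):
            x, y = cx - 8 + i, cy - 8 + j
            dx = float(img[y, x + 1]) - float(img[y, x - 1])
            dy = float(img[y + 1, x]) - float(img[y - 1, x])
            m = np.hypot(dx, dy)
            theta = np.arctan2(dy, dx) % (2 * np.pi)
            b = int(theta / (2 * np.pi) * 8) % 8   # one of 8 direction bins
            desc[j // 4, i // 4, b] += m           # 4 x 4 grid of cells
    v = desc.ravel()                               # 128-dimensional vector
    return v / (np.linalg.norm(v) + 1e-12)         # normalize for robustness

# Concatenating the descriptors of the five key pixels yields the
# 5 x 128 = 640-dimensional face feature vector used in the embodiment.
```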
Step 3: map the feature vector obtained in Step 2 into the intra-personal subspace.
In this step, to weaken the influence of noise, the feature vectors obtained in Step 2 are first reduced in dimensionality with the PCA method (principal component analysis), forming eigenfaces. The covariance matrix is:
$$C = \sum_{i=1}^{n} (x_i - m)(x_i - m)^T$$
where n is the number of face samples, x_i denotes a face vector, and m is the mean of the n face vectors. Since the covariance matrix describes the correlation between vectors, its eigenvectors form a mapping matrix, and mapping the face image data with this matrix yields the eigenfaces. Then, to guarantee within-class invariance between different face images of the same person, the eigenfaces are mapped into the intra-personal subspace, whose covariance matrix is:
$$C_S = \sum_{(i,j)\in S} (x_i - x_j)(x_i - x_j)^T$$
where S denotes the set of face images of the same person, and x_i and x_j denote the face vectors of two different face images in that set. Let Λ = {λ_1, ..., λ_k} and V = {v_1, ..., v_k} denote the first k eigenvalues and eigenvectors of this covariance matrix. As before, the first k eigenvectors form a mapping matrix through which the eigenface data are mapped, guaranteeing within-class invariance between face images of the same person. If C_S is invertible, the mapping of an eigenface into the intra-personal subspace is expressed as:
$$L_S = V \,\mathrm{diag}\big(\lambda_1^{1/2}, \ldots, \lambda_k^{1/2}\big)$$

$$\tilde{x} = L_S^{-1} x$$
where V is the mapping matrix formed by the k eigenvectors above and diag(λ_1^{1/2}, ..., λ_k^{1/2}) is the diagonal matrix formed by the k eigenvalues; L_S is the final feature matrix.
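Assuming the PCA reduction has already been applied, the intra-personal subspace mapping can be sketched as follows (the function name and interface are hypothetical):

```python
import numpy as np

def intra_personal_mapping(X, same_pairs, k):
    """Map eigenface vectors into the intra-personal subspace.

    X: (n, d) array of PCA-reduced eigenface vectors; same_pairs: list of
    index pairs (i, j) whose two images come from the same person; k: the
    subspace dimensionality. Returns the (n, k) mapped vectors x_tilde.
    """
    d = X.shape[1]
    C_S = np.zeros((d, d))
    for i, j in same_pairs:                      # intra-personal covariance
        diff = (X[i] - X[j])[:, None]
        C_S += diff @ diff.T
    lam, V = np.linalg.eigh(C_S)                 # eigen-decomposition
    order = np.argsort(lam)[::-1][:k]            # keep the k largest
    lam, V = lam[order], V[:, order]
    L_S = V @ np.diag(np.sqrt(lam))              # L_S = V diag(lambda^(1/2))
    # x_tilde = L_S^{-1} x, using the pseudo-inverse for numerical safety
    return X @ np.linalg.pinv(L_S).T
```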
Step 4: compute the similarity score between two face feature vectors with a bilinear similarity function and the Mahalanobis distance.
The Mahalanobis distance is widely used in face recognition, but its recognition performance is not especially good, and recent research shows that bilinear similarity functions achieve good results in image similarity search. Therefore, this method combines a bilinear similarity function with the Mahalanobis distance to compute the similarity score between two face feature vectors:
$$f_{(M,G)}(\tilde{x}_i, \tilde{x}_j) = S_G(\tilde{x}_i, \tilde{x}_j) - d_M(\tilde{x}_i, \tilde{x}_j)$$

$$S_G(\tilde{x}_i, \tilde{x}_j) = \tilde{x}_i^T G \tilde{x}_j$$

$$d_M(\tilde{x}_i, \tilde{x}_j) = (\tilde{x}_i - \tilde{x}_j)^T M (\tilde{x}_i - \tilde{x}_j)$$
where S_G denotes the bilinear similarity between the feature vectors and d_M denotes the Mahalanobis distance between them. G and M are k × k matrices, and suitable M and G must be trained to guarantee within-class invariance while ensuring, as far as possible, maximal between-class distinguishability. The intra-personal subspace similarity metric learning problem is therefore defined in the following form:
$$\min_{M, G \in S_d} \; \sum_{t \in P} \xi_t + \frac{\gamma}{2}\left(\|M - I\|_F^2 + \|G - I\|_F^2\right)$$

$$\text{s.t.} \quad y_{ij} \cdot f_{(M,G)}(\tilde{x}_i, \tilde{x}_j) \ge 1 - \xi_{ij}$$

$$\xi_t \ge 0, \quad \forall\, t = (i, j) \in P = S \cup D$$
where S and D denote the label sets of similar face pairs (i.e. two face images of the same person) and dissimilar face pairs (i.e. two face images of different persons), respectively. ||·||_F is the Frobenius norm of a matrix, i.e. the square root of the sum of the squared absolute values of the matrix elements. Its effect is analogous to the 2-norm of a vector, so the regularization term prevents overfitting while preserving within-class invariance. ξ_t is the empirical discrimination loss; minimizing it strengthens between-class distinguishability. Thus ξ_t ensures maximal between-class distinguishability, the regularizer ensures within-class invariance, and the positive number γ balances the influence of these two terms. As for the inequality constraint: when a pair of face images comes from the same person, y_ij = 1, and a small ξ_ij forces the value of f_(M,G) to be as large as possible; when a pair comes from different persons, y_ij = -1, and a small ξ_ij forces the value of f_(M,G) to be as small as possible. Therefore, the larger the value of f_(M,G), the more likely the pair of face images comes from the same person; conversely, the smaller the value, the more likely the pair comes from different persons.
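The score itself is a direct transcription of the formulas above; a minimal sketch:

```python
import numpy as np

def similarity_score(x_i, x_j, M, G):
    """Similarity score f_(M,G): bilinear similarity minus Mahalanobis
    distance, as defined in Step 4. x_i and x_j are k-dimensional vectors
    in the intra-personal subspace; M and G are the learned k x k matrices.
    """
    s_g = x_i @ G @ x_j                  # bilinear similarity S_G
    diff = x_i - x_j
    d_m = diff @ M @ diff                # Mahalanobis distance d_M
    return s_g - d_m
```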
Step 5: use a KELM classifier to judge whether the two images come from the same person.
ELM is a neural network comprising only one hidden layer and one output layer. Its most distinctive feature is that the hidden-layer parameters need no tuning but are set randomly, which gives it strong generalization ability. Suppose the hidden layer of the ELM contains L nodes; its output function is:
$$f_L(x) = \sum_{i=1}^{L} \beta_i \, h(\omega_i, b_i, x) = h(x)\,\beta = y$$
where x ∈ R^d, y ∈ R^c, β denotes the weights between the L hidden nodes and the output layer, and h(x) denotes the relation between the L nodes and the input x; h is a nonlinear activation function (e.g. the sigmoid function) whose effect, in fact, is to map d-dimensional data into an L-dimensional space. ω_i denotes the connection weights between the i-th hidden node and the input layer, and b_i denotes the bias of the i-th hidden node.
On the basis of ELM, a kernel-based ELM, i.e. the KELM method, has also been proposed. The method hides the original feature mapping H of the ELM behind a kernel function, thereby further improving the generalization ability of the algorithm. For a sample x_i, its output function is expressed as:
$$f_L(x_i) = \big[K(x_i, x_1), \ldots, K(x_i, x_n)\big] \left(\frac{I}{C} + K\right)^{-1} Y$$

$$Y = [y_1; \ldots; y_n] \in R^{n \times c}$$
where C is a regularization coefficient and K is the n × n kernel matrix over the training samples.
The similarity score obtained in Step 4 is taken as the input of the KELM classifier; if the output obtained is 1, the two face images come from the same person, and if it is 0, they come from different persons.
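A minimal KELM sketch follows; the RBF kernel width gamma and the 0.5 decision threshold are illustrative assumptions (the embodiment below fixes only the kernel type and C = 1024):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """RBF kernel matrix between row-vector sets A and B; gamma is an
    illustrative choice, since the patent does not fix the kernel width."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_train(X, Y, C=1024.0):
    """Train a KELM: solve beta = (I/C + K)^(-1) Y.

    X holds the training inputs (here the similarity scores, one per row);
    Y holds the 0/1 labels. C = 1024 follows the embodiment below.
    """
    K = rbf_kernel(X, X)
    beta = np.linalg.solve(np.eye(len(X)) / C + K, Y)
    return X, beta

def kelm_predict(model, X_new):
    """Output f_L(x) = [K(x, x_1), ..., K(x, x_n)] beta, thresholded at
    0.5 (an illustrative cut between the 0/1 training targets)."""
    X_train, beta = model
    return (rbf_kernel(X_new, X_train) @ beta > 0.5).astype(int)
```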
Compared with the prior art, the present invention has the following advantages:
Unlike traditional feature extraction methods based on the whole face, the specific innovation of the present invention is the adoption of an improved SIFT face feature extraction method based on key points. Taking the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner as the five key pixels of the whole face, and extracting the SIFT features of these five key points as the face feature vector, not only improves computational efficiency by reducing the feature dimensionality but also weakens the influence of illumination, expression changes, and other environmental factors on the recognition result. In addition, in the face recognition process based on the face feature vectors, the similarity score of two feature vectors is computed by combining a bilinear similarity function with the Mahalanobis distance, which strengthens between-class distinguishability; at the same time, to take within-class similarity invariance into account, the face feature vectors are mapped into the intra-personal subspace. In cross-validation experiments, the face recognition accuracy achieved by the present invention reaches 80.56%.
Brief description of the drawings
Fig. 1 is the detailed flowchart of the present invention.
Specific embodiment
The basic procedure of the improved SIFT face feature extraction method based on key points of the present invention is shown in Figure 1 and specifically comprises the following steps:
1) The data in the face database are divided into 10 groups for cross-validation experiments: 9 groups of data serve as training data, and the remaining group serves as test data; each group contains 300 pairs of face images from the same person and 300 pairs of face images from different persons. For each face image, the three-layer cascade of deep convolutional networks is used to locate the coordinates of the five key pixels (the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner).
The first layer of the three-layer cascade structure contains three deep convolutional networks. The input of the first deep convolutional network is the whole face, and it outputs the positions of the five key pixels; the input of the second is the upper half of the face, and it outputs the positions of the key pixels at the eyes and nose; the input of the third is the lower half of the face, and it outputs the positions of the key pixels at the nose and mouth. Finally, the outputs of these three networks are averaged to obtain the final output of the first layer. The second and third layers of the cascade each take neighborhoods around the key pixels in the previous layer's output as input and re-estimate the key pixel coordinates, supplementing the first layer's output.
2) For each of the five key pixels from step 1), take a 14 × 14 neighborhood window centered on the key pixel, compute the gradient magnitude and direction of every pixel in the window, and then build the histogram of gradient directions. To reduce the influence of abrupt changes, the histogram is smoothed with a Gaussian function whose parameter σ is 1.5 × 14. Finally, the orientation of each key pixel is determined from the gradient-direction distribution of the pixels in its neighborhood.
After the orientation of a key pixel is determined, the coordinate axes are rotated to that orientation to guarantee rotation invariance. Then a 16 × 16 neighborhood window is taken centered on the key pixel, and an 8-direction gradient histogram is computed on each 4 × 4 block, so that each key pixel yields a 4 × 4 × 8 = 128-dimensional descriptor. Since each face image in the present invention has 5 key pixels, the feature vector of each face image is 5 × 128 = 640-dimensional. This not only greatly reduces the feature dimensionality but also weakens the influence of expression changes, illumination variation, and other environmental factors, enhancing the robustness of the face features.
3) The feature vectors obtained in step 2) are reduced in dimensionality with the PCA (principal component analysis) method, keeping the first 400 principal components to form 400-dimensional eigenfaces. Then, to guarantee within-class invariance between different face images of the same person, the eigenfaces are mapped into the intra-personal subspace, whose feature dimensionality is set to 300.
4) The parameters M and G of the bilinear similarity function and the Mahalanobis distance are trained on the 9 groups of training data. The M and G obtained from training are then used to compute the similarity score between each pair of face images in the training data and the test data.
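The patent states the Step 4 objective but not the optimizer; the following sketch trains M and G by stochastic sub-gradient descent on that objective, with the learning rate, γ, and epoch count as illustrative assumptions:

```python
import numpy as np

def train_metric(X, pairs, labels, gamma=0.1, lr=1e-3, epochs=10):
    """Stochastic sub-gradient descent on the Step 4 hinge objective.

    X: (n, k) mapped feature vectors; pairs: list of index pairs (i, j);
    labels: +1 for same-person pairs, -1 for different-person pairs.
    """
    k = X.shape[1]
    M, G = np.eye(k), np.eye(k)           # initialize at the identity
    for _ in range(epochs):
        for (i, j), y in zip(pairs, labels):
            xi, xj, d = X[i], X[j], X[i] - X[j]
            f = xi @ G @ xj - d @ M @ d   # current score f_(M,G)
            if y * f < 1:                 # hinge loss is active
                G += lr * y * np.outer(xi, xj)
                M -= lr * y * np.outer(d, d)
            G -= lr * gamma * (G - np.eye(k))   # regularize toward I
            M -= lr * gamma * (M - np.eye(k))
    return M, G
```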
5) A KELM classifier judges whether the two images come from the same person. In the KELM classifier, the present invention selects the radial basis function (RBF) kernel as the kernel function, with regularization coefficient C = 1024. The similarity score between two face images obtained in step 4) is taken as the input of the KELM classifier; if the classifier outputs 1, the two face images come from the same person, and if 0, they come from different persons.

Claims (2)

1. An improved SIFT face feature extraction method based on key points, characterized in that the implementation procedure of the method is as follows:
Step 1: read the face image and locate the five key pixels on the face image with a three-layer cascade of deep convolutional networks; the five pixels are the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner;
the cascade of deep convolutional networks used in this step comprises three layers; the first layer uses deep convolutional networks to locate the five key pixels accurately, and the other two layers use convolutional networks to refine the localization results of the first layer; to ensure localization accuracy, the localization results of the individual deep convolutional networks within each layer are fused by averaging to give the final localization result; each deep convolutional network comprises four convolutional layers, pooling layers, and two fully connected layers, and the initial layers capture the global context of the face image; because a network predicts the five key pixels simultaneously, the relative positions among the key pixels are also encoded during training, which weakens the influence of expression changes, illumination variation, and other environmental factors;
Step 2: for the five key pixels extracted in Step 1, perform feature description with the SIFT method and extract the features of the face image;
the SIFT feature extraction method is a local feature detection method that possesses not only scale invariance but also rotation invariance; the method generally comprises four stages: (1) build the scale space and detect key points; (2) reject unstable key points; (3) assign an orientation parameter to each key point; (4) generate the descriptor of each key point; in general, the SIFT method is suited to recognizing general objects with high contrast, whereas face images have low contrast and strong edge responses; since the SIFT method cannot accurately locate the key points in a face image, stages (1) and (2) are replaced by the key-pixel localization method of Step 1;
then, for each of the five key pixels obtained in Step 1, take a set of pixels in its neighborhood and compute the gradient magnitude and direction of each pixel; let the intensity at pixel (x, y) be P(x, y):
$$m(x, y) = \sqrt{\big(P(x+1, y) - P(x-1, y)\big)^2 + \big(P(x, y+1) - P(x, y-1)\big)^2}$$

$$\theta(x, y) = \tan^{-1}\frac{P(x, y+1) - P(x, y-1)}{P(x+1, y) - P(x-1, y)}$$
where m(x, y) is the gradient magnitude of the pixel and θ(x, y) is its gradient direction;
based on these results, a histogram is used to collect the gradient directions of the pixels in the neighborhood; to reduce the influence of abrupt changes, the histogram is smoothed with a Gaussian function; the peak of the histogram then represents the dominant gradient direction of the pixels in the key pixel's neighborhood, which is taken as the orientation of the key pixel;
to maintain rotation invariance, the coordinate axes are rotated to the orientation of the key pixel; then a 16 × 16 neighborhood window is taken centered on the key point, and an 8-bin gradient-direction histogram is computed in each 4 × 4 cell, finally forming the 4 × 4 × 8 = 128-dimensional SIFT face feature vector;
Step 3: map the feature vector obtained in Step 2 into the intra-personal subspace;
in this step, to weaken the influence of noise, the feature vectors obtained in Step 2 are first reduced in dimensionality with the PCA method (principal component analysis), forming eigenfaces; the covariance matrix is:
$$C = \sum_{i=1}^{n} (x_i - m)(x_i - m)^T$$
where n is the number of face samples, x_i denotes a face vector, and m is the mean of the n face vectors; since the covariance matrix describes the correlation between vectors, its eigenvectors form a mapping matrix, and mapping the face image data with this matrix yields the eigenfaces; then, to guarantee within-class invariance between different face images of the same person, the eigenfaces are mapped into the intra-personal subspace, whose covariance matrix is:
$$C_S = \sum_{(i,j)\in S} (x_i - x_j)(x_i - x_j)^T$$
where S denotes the set of face images of the same person, and x_i and x_j denote the face vectors of two different face images in that set; let Λ = {λ_1, ..., λ_k} and V = {v_1, ..., v_k} denote the first k eigenvalues and eigenvectors of this covariance matrix; as before, the first k eigenvectors form a mapping matrix through which the eigenface data are mapped, guaranteeing within-class invariance between face images of the same person; if C_S is invertible, the mapping of an eigenface into the intra-personal subspace is expressed as:
$$L_S = V \,\mathrm{diag}\big(\lambda_1^{1/2}, \ldots, \lambda_k^{1/2}\big)$$

$$\tilde{x} = L_S^{-1} x$$
where V is the mapping matrix formed by the k eigenvectors above and diag(λ_1^{1/2}, ..., λ_k^{1/2}) is the diagonal matrix formed by the k eigenvalues; L_S is the final feature matrix;
Step 4: compute the similarity score between two face feature vectors with a bilinear similarity function and the Mahalanobis distance;
the Mahalanobis distance is widely used in face recognition, but its recognition performance is not especially good, and recent research shows that bilinear similarity functions achieve good results in image similarity search; therefore, this method combines a bilinear similarity function with the Mahalanobis distance to compute the similarity score between two face feature vectors:
$$f_{(M,G)}(\tilde{x}_i, \tilde{x}_j) = S_G(\tilde{x}_i, \tilde{x}_j) - d_M(\tilde{x}_i, \tilde{x}_j)$$

$$S_G(\tilde{x}_i, \tilde{x}_j) = \tilde{x}_i^T G \tilde{x}_j$$

$$d_M(\tilde{x}_i, \tilde{x}_j) = (\tilde{x}_i - \tilde{x}_j)^T M (\tilde{x}_i - \tilde{x}_j)$$
where S_G denotes the bilinear similarity between the feature vectors and d_M denotes the Mahalanobis distance between them; G and M are k × k matrices, and suitable M and G must be trained to guarantee within-class invariance while ensuring, as far as possible, maximal between-class distinguishability; the intra-personal subspace similarity metric learning problem is therefore defined in the following form:
$$\min_{M, G \in S_d} \; \sum_{t \in P} \xi_t + \frac{\gamma}{2}\left(\|M - I\|_F^2 + \|G - I\|_F^2\right)$$

$$\text{s.t.} \quad y_{ij} \cdot f_{(M,G)}(\tilde{x}_i, \tilde{x}_j) \ge 1 - \xi_{ij}$$

$$\xi_t \ge 0, \quad \forall\, t = (i, j) \in P = S \cup D$$
where S and D denote the label sets of similar face pairs (i.e. two face images of the same person) and dissimilar face pairs (i.e. two face images of different persons), respectively; ||·||_F is the Frobenius norm of a matrix, i.e. the square root of the sum of the squared absolute values of the matrix elements; its effect is analogous to the 2-norm of a vector, so the regularization term prevents overfitting while preserving within-class invariance; ξ_t is the empirical discrimination loss, and minimizing it strengthens between-class distinguishability; thus ξ_t ensures maximal between-class distinguishability, the regularizer ensures within-class invariance, and the positive number γ balances the influence of these two terms; as for the inequality constraint, when a pair of face images comes from the same person, y_ij = 1, and a small ξ_ij forces the value of f_(M,G) to be as large as possible; when a pair comes from different persons, y_ij = -1, and a small ξ_ij forces the value of f_(M,G) to be as small as possible; therefore, the larger the value of f_(M,G), the more likely the pair of face images comes from the same person; conversely, the smaller the value, the more likely the pair comes from different persons;
Step 5: use a KELM classifier to judge whether the two images come from the same person;
ELM is a neural network comprising only one hidden layer and one output layer; its most distinctive feature is that the hidden-layer parameters need no tuning but are set randomly, which gives it strong generalization ability; suppose the hidden layer of the ELM contains L nodes; its output function is:
$$f_L(x) = \sum_{i=1}^{L} \beta_i \, h(\omega_i, b_i, x) = h(x)\,\beta = y$$
where x ∈ R^d, y ∈ R^c, β denotes the weights between the L hidden nodes and the output layer, and h(x) denotes the relation between the L nodes and the input x; h is a nonlinear activation function (e.g. the sigmoid function) whose effect, in fact, is to map d-dimensional data into an L-dimensional space; ω_i denotes the connection weights between the i-th hidden node and the input layer, and b_i denotes the bias of the i-th hidden node;
on the basis of ELM, a kernel-based ELM, i.e. the KELM method, has also been proposed; the method hides the original feature mapping H of the ELM behind a kernel function, thereby further improving the generalization ability of the algorithm; for a sample x_i, its output function is expressed as:
$$f_L(x_i) = \big[K(x_i, x_1), \ldots, K(x_i, x_n)\big] \left(\frac{I}{C} + K\right)^{-1} Y$$

$$Y = [y_1; \ldots; y_n] \in R^{n \times c}$$
where C is a regularization coefficient and K is the n × n kernel matrix over the training samples;
the similarity score obtained in Step 4 is taken as the input of the KELM classifier; if the output obtained is 1, the two face images come from the same person, and if it is 0, they come from different persons.
2. The improved SIFT face feature extraction method based on key points according to claim 1, characterized in that the method specifically comprises the following steps:
1) the data in the face database are divided into 10 groups for cross-validation experiments: 9 groups of data serve as training data, and the remaining group serves as test data; each group contains 300 pairs of face images from the same person and 300 pairs of face images from different persons; for each face image, the three-layer cascade of deep convolutional networks is used to locate the coordinates of the five key pixels: the pixel at the center of the left eye, the pixel at the center of the right eye, the pixel at the nose tip, the pixel at the left mouth corner, and the pixel at the right mouth corner;
the first layer of the three-layer cascade structure contains three deep convolutional networks; the input of the first deep convolutional network is the whole face, and it outputs the positions of the five key pixels; the input of the second is the upper half of the face, and it outputs the positions of the key pixels at the eyes and nose; the input of the third is the lower half of the face, and it outputs the positions of the key pixels at the nose and mouth; finally, the outputs of these three networks are averaged to obtain the final output of the first layer; the second and third layers of the cascade each take neighborhoods around the key pixels in the previous layer's output as input and re-estimate the key pixel coordinates, supplementing the first layer's output;
2) for each of the five key pixels from step 1), take a 14 × 14 neighborhood window centered on the key pixel and compute the gradient magnitude and direction of every pixel in the window; then build the histogram of gradient directions; to reduce the influence of abrupt changes, the histogram is smoothed with a Gaussian function whose parameter σ is 1.5 × 14; finally, the orientation of each key pixel is determined from the gradient-direction distribution of the pixels in its neighborhood;
after the orientation of a key pixel is determined, the coordinate axes are rotated to that orientation to guarantee rotation invariance; then a 16 × 16 neighborhood window is taken centered on the key pixel, and an 8-direction gradient histogram is computed on each 4 × 4 block, so that each key pixel yields a 4 × 4 × 8 = 128-dimensional descriptor; since each face image in the present invention has 5 key pixels, the feature vector of each face image is 5 × 128 = 640-dimensional, which not only greatly reduces the feature dimensionality but also weakens the influence of expression changes, illumination variation, and other environmental factors, enhancing the robustness of the face features;
3) the feature vectors obtained in step 2) are reduced in dimensionality with the PCA (principal component analysis) method, keeping the first 400 principal components to form 400-dimensional eigenfaces; then, to guarantee within-class invariance between different face images of the same person, the eigenfaces are mapped into the intra-personal subspace, whose feature dimensionality is set to 300;
4) the parameters M and G of the bilinear similarity function and the Mahalanobis distance are trained on the 9 groups of training data; the M and G obtained from training are then used to compute the similarity score between each pair of face images in the training data and the test data;
5) a KELM classifier judges whether the two images come from the same person; in the KELM classifier, the present invention selects the radial basis function (RBF) kernel as the kernel function, with regularization coefficient C = 1024; the similarity score between two face images obtained in step 4) is taken as the input of the KELM classifier; if the classifier outputs 1, the two face images come from the same person, and if 0, they come from different persons.
CN201510977092.3A, filed 2015-12-23 (priority 2015-12-23): Improved SIFT face feature extraction method based on key points. Status: Active. Granted publication: CN105550657B (en).

Priority Applications (1)

Application number: CN201510977092.3A; priority date: 2015-12-23; filing date: 2015-12-23; title: Improved SIFT face feature extraction method based on key points

Publications (2)

CN105550657A, published 2016-05-04
CN105550657B, published 2019-01-29

Family ID: 55829840

Family Applications (1)

CN201510977092.3A (Active): Improved SIFT face feature extraction method based on key points

Country Status (1)

CN: CN105550657B (en)




Legal Events

C06, PB01: Publication
C10, SE01: Entry into force of request for substantive examination
GR01: Patent grant