CN111860309A - Face recognition method and system - Google Patents

Face recognition method and system

Info

Publication number
CN111860309A
Authority
CN
China
Prior art keywords
image
face
layer
key point
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010696174.1A
Other languages
Chinese (zh)
Inventor
汪秀英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010696174.1A priority Critical patent/CN111860309A/en
Publication of CN111860309A publication Critical patent/CN111860309A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Geometry (AREA)

Abstract

The invention relates to the technical field of face recognition, and discloses a face recognition method, which comprises the following steps: acquiring a face image to be recognized, converting the face image to be recognized into a gray image by using a proportion method, and performing noise reduction on the gray image by using Gaussian filtering; performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized; detecting a face external key point region in a binary image by using a cascaded external key point detection model; detecting key point regions in the human face by using a facial feature detection model, and extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm; and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model. The invention further provides a face recognition system. The invention realizes the recognition of the human face.

Description

Face recognition method and system
Technical Field
The invention relates to the technical field of face recognition, in particular to a face recognition method and a face recognition system.
Background
Face recognition is a technique that extracts facial information and uses a classifier for recognition; the face can serve as a unique identifier of a person. Because face recognition systems are contactless, non-invasive and reliable, they are widely applied in real life, for example in high-speed rail entry, attendance check-in and examinee identification.
The current mainstream image recognition technology is deep-learning-based image recognition, in which a deep convolutional network adaptively extracts local and global image features according to the classification task and achieves good recognition performance. However, image recognition methods based on deep convolutional networks require a large amount of training data and typically discard difficult samples, i.e. samples on which the model easily makes errors but which also carry boundary information; as a result the training samples become insufficient and the image recognition effect is reduced.
Meanwhile, existing face key point localization algorithms achieve a high recognition rate in constrained environments, but in unconstrained environments they are still easily affected by factors such as uneven ambient lighting, a wide range of viewing angles, varied target postures, blur and occlusion. In the prior art, face images are generally recognized by extracting SIFT descriptors from the face images, but the descriptors generated by the traditional SIFT algorithm have high dimensionality, and the computation in the descriptor generation and matching stages is complex and expensive.
In view of this, how to improve the existing feature extraction algorithm, extract effective features in a face image, and improve the detection precision of key points of a face, thereby realizing accurate recognition of the face, is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a face recognition method, which extracts effective features in a face image by improving the existing feature extraction algorithm and provides a face key point detection algorithm to improve the detection precision of face key points, thereby realizing accurate face recognition.
In order to achieve the above object, the present invention provides a face recognition method, including:
acquiring a face image to be recognized, and converting the face image to be recognized into a gray-scale image by using a weighted proportion method;
carrying out noise reduction processing on the gray-scale image by using Gaussian filtering;
performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized;
detecting a face external key point region in a binary image by using a cascaded external key point detection model;
detecting key point regions inside the face by using a facial feature detection model;
Extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm;
and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model.
Optionally, the acquiring the face image to be recognized and converting the face image to be recognized into a gray-scale image by using a weighted proportion method includes:
converting the face image to be recognized into a gray-scale image by using the weighted proportion method, wherein the calculation formula of the weighted proportion method is as follows:
Oi = 0.30*Ri + 0.59*Gi + 0.11*Bi
wherein:
Ri, Gi, Bi are the three color components of the current pixel i;
Oi is the gray value of the current pixel i after conversion.
Optionally, the process of performing noise reduction on the gray-scale image by using Gaussian filtering is as follows:
scanning each pixel in the image with a circular template, and replacing the value of the template's central pixel with the weighted average gray value of the pixels in the neighborhood determined by the template, wherein the weight is computed with the Gaussian function:
G(r) = exp(-r²/(2σ²)) / (2πσ²)^(n/2)
wherein:
σ is the standard deviation of the neighborhood pixel values; the larger its value, the more blurred the image;
n is the dimension of the template, set to 2;
r is the blur radius, i.e. the distance from a template element to the central pixel of the template.
Optionally, the binarizing the image by using the OTSU algorithm includes:
The formula for carrying out binarization processing on the image is as follows:
g(t)=w0*w1*(u0-u1)*(u-u0)
u=w0*u0+w1*u1
wherein:
t is a segmentation threshold of the foreground and the background;
w0 is the proportion of foreground pixels in the image;
u0 is the average gray value of the foreground;
w1 is the proportion of background pixels in the image;
u1 is the average gray value of the background;
u is the overall average gray value of the image;
when the variance g(t) between foreground and background is maximum, the difference between foreground and background can be considered maximum; the gray level t at that moment is the optimal threshold, and the image is binarized with this threshold to obtain the binarized image of the face image to be recognized.
Optionally, the detecting the face external key point region by using the cascaded external key point detection model includes:
the cascaded external key point detection model comprises a face detection layer and an external key point positioning layer;
the face detection layer comprises four convolutional layers, wherein: 1) the 1st convolutional layer consists of 64 convolution kernels of size 3 × 3 with a stride of 2; 2) the 2nd convolutional layer consists of 128 convolution kernels of size 3 × 3 with a stride of 1; 3) the 3rd convolutional layer consists of 256 convolution kernels of size 3 × 3 with a stride of 1; 4) the 4th convolutional layer consists of 600 convolution kernels of size 3 × 3 with a stride of 1; each convolutional layer is followed by a 2 × 2 max-pooling layer with a stride of 2;
the following function is adopted as the detection error function, and the face detection layer is trained iteratively in combination with the coordinates of the facial feature positions, so that the face position and the facial feature positions are detected simultaneously:
Figure BDA0002591069590000031
Figure BDA0002591069590000032
wherein:
λ is used to balance the face detection error errFace and the facial feature detection error errPart; the invention sets it to 1;
I is the set of 12 facial feature key points of the human face, comprising the left and right eyebrows, the left and right eye corners, the nose, and the left and right corners of the mouth;
(x, y) is a detection coordinate point;
(x ', y') is a true coordinate point;
the face position located by the face detection layer is enlarged by a factor of 1.2 about the face center, and the image is cropped and reshaped to a size of 96 × 96 pixels as the input of the external key point positioning layer;
the external key point positioning layer comprises two layers: 1) layer 1: the reshaped face image and the face external contour point coordinates
Figure BDA0002591069590000033
are taken as input and passed sequentially through three convolutional layers to obtain the face external contour bounding box
Figure BDA0002591069590000034
the three convolutional layers consist of 64, 96 and 128 convolution kernels of size 5 × 5, respectively; 2) layer 2: the layer-1 network weights are fixed, the external contour points estimated by layer 1 are enlarged by a factor of 1.2, and the corresponding region is cropped from the original face image to obtain a new image together with the 17 external contour key points
Figure BDA0002591069590000041
which are taken as the input of this layer and passed sequentially through four convolutional layers to obtain a 34-dimensional vector representing the external contour key point region
Figure BDA0002591069590000042
the four convolutional layers consist of 64, 128, 256 and 600 convolution kernels of size 3 × 3, respectively.
Optionally, the detecting of the internal key point region of the face image by using a facial feature detection model includes:
according to the face position located by the face detection layer, the face position coordinates are enlarged by a factor of 1.2, and the region is cropped and reshaped into an image of 96 × 96 pixels;
the same network structure as the face detection layer is adopted and combined with the facial feature position information, so that the bounding box of the internal face contour and the facial feature positions are located at the same time;
the network weights are then fixed, the located facial feature coordinates are enlarged by a factor of 1.5, six local images (the left and right eyebrows, the left and right eyes, the nose and the mouth) are cropped and scaled to 48 × 48 pixels, the spatial transformation parameters between the images are recorded, and the key point regions of the local images are located separately by a convolutional network identical to the external key point positioning layer.
Optionally, the process of extracting the SIFT feature descriptors of the key point regions by using the improved SIFT feature extraction algorithm includes:
1) converting the image of each key point region into an image in scale space and, at the same time, building a Gaussian pyramid of the image, wherein the Gaussian pyramid comprises several octaves, each octave comprises several layers, the scale ratio between two adjacent layers of the same octave is k, and the scale factor between adjacent octaves is kσ²; the formula for converting the key point region image into an image in scale space is:
L(x,y,σ) = G(x,y,σ)*I(x,y)
G(x,y,σ) = exp(-(x²+y²)/(2σ²)) / (2πσ²)
wherein:
σ is the spatial scale factor;
* is the convolution operation between the Gaussian kernel function and the image;
I(x, y) is the key point region image;
G(x, y, σ) is the Gaussian kernel function;
L(x, y, σ) is the image in scale space;
2) detecting the extreme points of the scale space: every point in the image is traversed and checked for an extremum; the criterion is to compare the point with its 26 neighboring points in the same layer and the two adjacent layers, and if the value of the point is greater than the values of all its neighbors, the point is regarded as an extreme point;
3) the extraction of edge feature points is improved by combining the SIFT algorithm with Canny edge extraction; first, the gradient magnitude and gradient direction of the image are calculated for each scale layer by the following formulas:
edge(e) = sqrt(Ix² + Iy²)
dir(θ) = arctan(Iy / Ix)
Wherein:
edge(e) is the gradient magnitude;
dir(θ) is the gradient direction;
Ix and Iy are the gradient values of the image I(x, y) in the x and y directions, respectively;
non-maximum suppression is then applied to the pixels according to the gradient results, and upper and lower thresholds are set to decide whether a pixel is a boundary point, thereby separating foreground and background; finally, the edge feature points extracted in this way are fused with the SIFT feature points, and points extracted by both methods are merged, so that more useful SIFT features are obtained;
4) the dimension reduction of the SIFT feature descriptors is realized by reducing the number of sub-pixel regions by re-dividing the pixel regions, and the specific dividing steps are as follows:
in the first step, a 4 × 4 square pixel region around a feature point is divided into 4 small square sub-pixel regions, i.e., 2 × 2 sub-pixel regions. In each small square, the information in 8 directions contained in the small square is subjected to gradient accumulation, so that a feature point descriptor with dimensions of 4 × 8 ═ 32 is obtained;
secondly, in order to supplement the 32-dimensional feature point descriptor, 4 × 1 rectangular pixel areas close to the feature points are selected from a square pixel area with the size of 4 × 4, information in 8 directions contained in each rectangular pixel area is accumulated in a gradient manner, and then the 4 × 8-32-dimensional feature point descriptor is obtained;
Thirdly, combining the two obtained feature descriptors to obtain a new feature descriptor with dimensions of 32+ 32-64;
5) in order to ensure the illumination invariance of the generated 64-dimensional new descriptor, the invention carries out normalization processing on the generated SIFT feature descriptor:
d′i = di / sqrt(d1² + d2² + … + d64²)
wherein:
d is the SIFT feature descriptor;
d′ is the normalized SIFT feature descriptor;
di is the i-th component of the SIFT feature descriptor.
Optionally, the F-GAN model is:
the F-GAN model is an improvement of the traditional GAN model and consists of a generation network G, a real/fake discrimination network D and a classification network C;
the generation network G is used for generating sample data according to the SIFT feature descriptors, and the real/fake discrimination network D is used for judging whether an input sample is real or generated; the G network uses deconvolution layers to generate images, and the D network uses convolution layers to extract features;
while constructing the G network and the D network, the F-GAN model also constructs a classification network C for distinguishing categories so as to classify images; the C network is a multi-class classifier that can perform multi-classification tasks, and the C network and the D network share all convolutional layers. During training, the three networks are trained adversarially at the same time; because the three networks are optimized alternately and iteratively, in each iteration the authenticity of the input sample is judged once and the category of the input sample is predicted once.
In addition, to achieve the above object, the present invention further provides a face recognition system, including:
the face image acquisition device is used for receiving a face image to be recognized;
the face image processor is used for converting the face image to be recognized into a gray-scale image by using a weighted proportion method and performing noise reduction on the gray-scale image by using Gaussian filtering; performing image contrast enhancement on the gray-scale image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using the OTSU algorithm to obtain a binarized image of the face image to be recognized; detecting the face external key point region in the binarized image by using a cascaded external key point detection model, and detecting the face internal key point region by using a facial feature detection model;
and the face recognition device is used for extracting the SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm and recognizing the face by using a pre-trained F-GAN model according to the extracted SIFT feature descriptors.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium, which stores face recognition instructions, where the face recognition instructions are executable by one or more processors to implement the steps of the implementation method of face recognition as described above.
Compared with the prior art, the invention provides a face recognition method, which has the following advantages:
Firstly, existing face key point localization algorithms achieve a high recognition rate in constrained environments, but in unconstrained environments they are still easily affected by factors such as uneven ambient lighting, a wide range of viewing angles, varied target postures, blur and occlusion. The invention therefore adopts the idea of cascaded convolutional networks and, unlike the conventional 5-point localization, locates 68 key points, namely 17 external key points and 51 internal key points. For the external key points, two convolutional networks are cascaded to perform coarse and fine localization respectively; for the internal key points, the facial feature position information detected by the face detection layer is combined so that facial feature deformation information is introduced and the facial features are detected while the face is detected, which avoids the problem in traditional algorithms that the facial feature positions are detected first and the key points located afterwards, so that the key point localization accuracy depends directly on the facial feature detection result. In addition, during model training the invention introduces multi-channel convolution to extract feature information at different levels, making full use of low-, medium- and high-resolution pixels in the image and improving the detection accuracy of the face key points.
Secondly, in the prior art face images are generally recognized by extracting SIFT descriptors from the face images, but the descriptors generated by the traditional SIFT algorithm have high dimensionality, and the computation in the descriptor generation and matching stages is complex and expensive. The invention therefore improves the traditional SIFT algorithm. Because a feature point descriptor is closely related to the pixels at and around the feature point location, and the closer a pixel is to the feature point the larger its influence on the descriptor, the pixel region around the feature point is sampled several times in the descriptor generation stage and re-divided into fewer sub-regions. Whereas the traditional SIFT algorithm divides the feature point pixel region into 4 × 4 = 16 sub-regions with 8 direction bins each, producing a 128-dimensional SIFT descriptor, the invention re-divides the original 16 sub-regions into 8 sub-regions by changing the division scheme and performs two rounds of gradient accumulation based on the direction information, thereby generating a 64-dimensional SIFT feature descriptor. The image information around the feature point is not lost, so that compared with the original descriptor the new descriptor loses no descriptive power, while its dimensionality is reduced to half that of the original descriptor, which simplifies the computation and reduces the complexity of the algorithm.
Finally, because the D network of the traditional GAN model is a binary classifier and cannot perform multi-class tasks, the F-GAN model proposed by the invention constructs, in addition to the G network and the D network, a classification network C for distinguishing categories so as to classify images; the C network is a multi-class classifier and shares all convolutional layers with the D network. During training the three networks are trained adversarially at the same time; because they are optimized alternately and iteratively, in each iteration the authenticity of the input sample is judged once and its category is predicted once, so that authenticity judgment and classification proceed synchronously. Compared with the traditional GAN model, the input of the generation network G contains, in addition to the random noise z, a constraint condition c that guides the sample generation process, so that the samples generated by the F-GAN model are controllable, i.e. specified sample data can be generated according to the condition; in a specific embodiment of the invention, when the constraint condition c is label data, the F-GAN model generates sample data of a specified class, i.e. the label of the generated sample is known. After adversarial training, the sample data generated by the F-GAN model is very close to the real data while having its own style; if this data is used to supplement the training set and fed, together with the real samples, into the C network and the D network, the amount of training data is enlarged and the C network can learn more data characteristics, achieving a data augmentation effect.
Drawings
Fig. 1 is a schematic flow chart of a face recognition method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a face recognition system according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention improves the existing feature extraction algorithm to extract effective features from the face image, and provides a face key point detection algorithm to improve the detection accuracy of face key points, thereby realizing accurate face recognition. Fig. 1 is a schematic flow chart of a face recognition method according to an embodiment of the present invention.
In this embodiment, the face recognition method includes:
and S1, acquiring the face image to be recognized, converting the face image to be recognized into a gray image by using each proportion method, and performing noise reduction processing on the gray image by using Gaussian filtering.
Firstly, the invention obtains a face image to be recognized, and converts the face image to be recognized into a gray image by utilizing each proportion method, wherein the calculation formula of each proportion method is as follows:
Oi=0.30*Ri+0.59*Gi+0.11*Bi
Wherein:
Ri,Gi,Bithree pixel components, respectively, of a current pixel i;
Oiconverting the gray scale of the current pixel i into a pixel;
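By way of illustration, this weighted conversion can be sketched in Python/NumPy as follows; the function name and the assumption of an 8-bit RGB input are illustrative only:

    import numpy as np

    def to_grayscale(rgb):
        """Weighted-proportion gray conversion: O = 0.30*R + 0.59*G + 0.11*B."""
        rgb = rgb.astype(np.float32)
        gray = 0.30 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]
        return np.clip(gray, 0, 255).astype(np.uint8)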
Further, the invention performs noise reduction on the gray-scale image of the face image to be recognized by using Gaussian filtering. In a specific embodiment of the invention, each pixel in the image is scanned with a circular template, and the value of the template's central pixel is replaced with the weighted average gray value of the pixels in the neighborhood determined by the template, wherein the weight is computed with the Gaussian function:
G(r) = exp(-r²/(2σ²)) / (2πσ²)^(n/2)
wherein:
σ is the standard deviation of the neighborhood pixel values; the larger its value, the more blurred the image;
n is the dimension of the template, set to 2 in the invention;
r is the blur radius, i.e. the distance from a template element to the central pixel of the template.
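In practice this step amounts to a standard Gaussian blur; a brief sketch using OpenCV is given below, where the kernel size and σ are example values rather than values fixed by the invention:

    import cv2

    def denoise(gray, ksize=5, sigma=1.0):
        """Gaussian low-pass filtering of the gray-scale image for noise reduction."""
        # Each output pixel becomes the Gaussian-weighted average of its neighborhood.
        return cv2.GaussianBlur(gray, (ksize, ksize), sigma)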
And S2, performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized.
Further, the invention utilizes a contrast enhancement algorithm based on linear stretching to perform image contrast enhancement on the gray scale image, wherein the linear stretching refers to pixel level operation in which the input gray scale value and the output gray scale value are in a linear relation, and a contrast enhancement formula is as follows:
Db=f(Da)=a*Da+b
Wherein:
Da is the gray value of the input image;
Db is the gray value of the output image;
a is the linear slope: if a > 1 the contrast of the output image is enhanced compared with the original image, and if a < 1 it is weakened; the invention sets a to 2;
b is the intercept, which the invention sets to 0.5;
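A minimal sketch of this linear stretching follows; it assumes the gray values are first normalized to [0, 1], since with a = 2 and b = 0.5 the mapping would otherwise exceed an 8-bit range (this normalization is an assumption, not stated above):

    import numpy as np

    def linear_stretch(gray, a=2.0, b=0.5):
        """Contrast enhancement Db = a*Da + b on a [0, 1]-normalized image."""
        x = gray.astype(np.float32) / 255.0
        y = np.clip(a * x + b, 0.0, 1.0)
        return (y * 255.0).astype(np.uint8)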
further, the invention uses OTSU algorithm to carry out binarization processing on the image to obtain a binarization image of the face image to be recognized, wherein the formula for carrying out binarization processing on the image is as follows:
g(t)=w0*w1*(u0-u1)*(u-u0)
u=w0*u0+w1*u1
wherein:
t is a segmentation threshold of the foreground and the background;
w0 is the proportion of foreground pixels in the image;
u0 is the average gray value of the foreground;
w1 is the proportion of background pixels in the image;
u1 is the average gray value of the background;
u is the overall average gray value of the image;
when the variance g(t) between foreground and background is maximum, the difference between foreground and background can be considered maximum; the gray level t at that moment is the optimal threshold, and the image is binarized with this threshold to obtain the binarized image of the face image to be recognized.
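A compact sketch of the threshold search is given below; it uses the standard OTSU between-class variance w0*w1*(u0-u1)^2 as the criterion g(t), the variable names follow the definitions above, and the implementation itself is only illustrative:

    import numpy as np

    def otsu_binarize(gray):
        """Exhaustive search for the threshold t that maximizes g(t), then binarize."""
        hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
        prob = hist / hist.sum()
        best_t, best_g = 0, -1.0
        for t in range(1, 256):
            w0, w1 = prob[:t].sum(), prob[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            u0 = (np.arange(t) * prob[:t]).sum() / w0          # mean gray of foreground
            u1 = (np.arange(t, 256) * prob[t:]).sum() / w1     # mean gray of background
            g = w0 * w1 * (u0 - u1) ** 2                       # between-class variance g(t)
            if g > best_g:
                best_g, best_t = g, t
        return (gray >= best_t).astype(np.uint8) * 255, best_t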
And S3, detecting the face external key point region in the binarized image by using the cascaded external key point detection model, and detecting the face internal key point region by using the facial feature detection model.
Further, the invention uses a cascade external key point detection model to detect the external key point area of the human face in the binary image, wherein the cascade external key point detection model comprises a human face detection layer and an external key point positioning layer;
the face detection layer comprises four convolutional layers, wherein: 1) the 1st convolutional layer consists of 64 convolution kernels of size 3 × 3 with a stride of 2; 2) the 2nd convolutional layer consists of 128 convolution kernels of size 3 × 3 with a stride of 1; 3) the 3rd convolutional layer consists of 256 convolution kernels of size 3 × 3 with a stride of 1; 4) the 4th convolutional layer consists of 600 convolution kernels of size 3 × 3 with a stride of 1; each convolutional layer is followed by a 2 × 2 max-pooling layer with a stride of 2.
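Since the experiments described later run on TensorFlow, the convolutional backbone of this face detection layer can be sketched with tf.keras as below; the input size and the two output heads (face bounding box and 12 facial-feature coordinates) are illustrative assumptions rather than values fixed above:

    import tensorflow as tf
    from tensorflow.keras import layers

    def face_detection_layer(input_shape=(96, 96, 1)):
        """Four conv layers (64/128/256/600 3x3 kernels), each followed by 2x2 max-pooling."""
        inp = tf.keras.Input(shape=input_shape)
        x = inp
        for filters, stride in [(64, 2), (128, 1), (256, 1), (600, 1)]:
            x = layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
        x = layers.Flatten()(x)
        face_box = layers.Dense(4, name="face_box")(x)        # face bounding box (assumed head)
        part_pts = layers.Dense(24, name="part_points")(x)    # 12 facial-feature points, (x, y) each
        return tf.keras.Model(inp, [face_box, part_pts])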
In the face detection process, the invention adopts the following function as the detection error function and trains the face detection layer iteratively in combination with the coordinates of the facial feature positions, so that the face position and the facial feature positions are detected simultaneously:
Figure BDA0002591069590000101
Figure BDA0002591069590000102
wherein:
λ is used to balance the face detection error errFace and the facial feature detection error errPart; the invention sets it to 1;
I is the set of 12 facial feature key points of the human face, comprising the left and right eyebrows, the left and right eye corners, the nose, and the left and right corners of the mouth;
(x, y) is a detection coordinate point;
(x ', y') is a true coordinate point;
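The exact detection error function appears only as an image in the published text; the sketch below therefore assumes a common squared-error form in which errPart sums the squared distances over the 12 key points and is balanced against errFace by λ = 1, purely for illustration:

    import tensorflow as tf

    def detection_loss(face_true, face_pred, pts_true, pts_pred, lam=1.0):
        """err = errFace + lambda * errPart (assumed squared-error form)."""
        err_face = tf.reduce_sum(tf.square(face_true - face_pred), axis=-1)   # face box error
        err_part = tf.reduce_sum(tf.square(pts_true - pts_pred), axis=-1)     # 12 key points, (x, y) each
        return tf.reduce_mean(err_face + lam * err_part)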
Further, the invention enlarges the face position located by the face detection layer by a factor of 1.2 about the face center, crops and reshapes the image to a size of 96 × 96 pixels, and uses this image as the input of the external key point positioning layer;
the external key point positioning layer comprises two layers: 1) layer 1: the reshaped face image and the face external contour point coordinates
Figure BDA0002591069590000103
are taken as input and passed sequentially through three convolutional layers to obtain the face external contour bounding box
Figure BDA0002591069590000104
the three convolutional layers consist of 64, 96 and 128 convolution kernels of size 5 × 5, respectively; 2) layer 2: the layer-1 network weights are fixed, the external contour points estimated by layer 1 are enlarged by a factor of 1.2, and the corresponding region is cropped from the original face image to obtain a new image together with the 17 external contour key points
Figure BDA0002591069590000105
which are taken as the input of this layer and passed sequentially through four convolutional layers to obtain a 34-dimensional vector representing the external contour key point region
Figure BDA0002591069590000106
the four convolutional layers consist of 64, 128, 256 and 600 convolution kernels of size 3 × 3, respectively;
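The two cascaded stages of the external key point positioning layer can be sketched in the same tf.keras style; the layer widths follow the text above, while the input handling and output heads are illustrative assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_stack(x, filter_list, kernel):
        for filters in filter_list:
            x = layers.Conv2D(filters, kernel, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(2)(x)
        return x

    def external_keypoint_layers():
        """Layer 1: coarse external contour box from 64/96/128 5x5 convs.
           Layer 2: 34-dim vector of the 17 external contour points from 64/128/256/600 3x3 convs."""
        img1 = tf.keras.Input(shape=(96, 96, 1))
        box = layers.Dense(4, name="contour_box")(layers.Flatten()(conv_stack(img1, [64, 96, 128], 5)))
        stage1 = tf.keras.Model(img1, box)

        img2 = tf.keras.Input(shape=(96, 96, 1))
        pts = layers.Dense(34, name="contour_points")(layers.Flatten()(conv_stack(img2, [64, 128, 256, 600], 3)))
        stage2 = tf.keras.Model(img2, pts)
        return stage1, stage2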
Furthermore, according to the face position located by the face detection layer, the face position coordinates are enlarged by a factor of 1.2 and the region is cropped and reshaped into an image of 96 × 96 pixels; the same network structure as the face detection layer is then adopted, combined with the facial feature position information, to locate the bounding box of the internal face contour and the facial feature positions at the same time; finally, the network weights are fixed, the located facial feature coordinates are enlarged by a factor of 1.5, six local images (the left and right eyebrows, the left and right eyes, the nose and the mouth) are cropped and scaled to 48 × 48 pixels, the spatial transformation parameters between the images are recorded, and the key point regions of the local images are located separately by a convolutional network identical to the external key point positioning layer.
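The region expansion and cropping used repeatedly above (a factor of 1.2 around the face, a factor of 1.5 around each facial feature, followed by resizing to a fixed size) can be sketched as a small helper; the box format, border clamping and interpolation choice are assumptions:

    import cv2

    def expand_and_crop(image, box, scale, out_size):
        """Expand a (cx, cy, w, h) box by `scale` about its center, crop and resize."""
        cx, cy, w, h = box
        w, h = w * scale, h * scale
        x0, y0 = int(max(cx - w / 2, 0)), int(max(cy - h / 2, 0))
        x1 = int(min(cx + w / 2, image.shape[1]))
        y1 = int(min(cy + h / 2, image.shape[0]))
        patch = image[y0:y1, x0:x1]
        return cv2.resize(patch, (out_size, out_size), interpolation=cv2.INTER_LINEAR)

    # e.g. face_patch = expand_and_crop(gray, face_box, 1.2, 96)
    #      eye_patch  = expand_and_crop(gray, left_eye_box, 1.5, 48)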
And S4, extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm.
Further, the invention utilizes an improved SIFT feature extraction algorithm to extract SIFT feature descriptors of the key point regions, and the extraction process of the improved SIFT feature descriptors comprises the following steps:
1) converting the image of each key point region into an image in scale space and, at the same time, building a Gaussian pyramid of the image, wherein the Gaussian pyramid comprises several octaves, each octave comprises several layers, the scale ratio between two adjacent layers of the same octave is k, and the scale factor between adjacent octaves is kσ²; the formula for converting the key point region image into an image in scale space is:
L(x,y,σ) = G(x,y,σ)*I(x,y)
G(x,y,σ) = exp(-(x²+y²)/(2σ²)) / (2πσ²)
wherein:
σ is the spatial scale factor;
* is the convolution operation between the Gaussian kernel function and the image;
I(x, y) is the key point region image;
G(x, y, σ) is the Gaussian kernel function;
L(x, y, σ) is the image in scale space;
2) detecting the extreme points of the scale space: every point in the image is traversed and checked for an extremum; the criterion is to compare the point with its 26 neighboring points in the same layer and the two adjacent layers, and if the value of the point is greater than the values of all its neighbors, the point is regarded as an extreme point;
3) the extraction of edge feature points is improved by combining the SIFT algorithm with Canny edge extraction; first, the gradient magnitude and gradient direction of the image are calculated for each scale layer by the following formulas:
edge(e) = sqrt(Ix² + Iy²)
dir(θ) = arctan(Iy / Ix)
wherein:
edge(e) is the gradient magnitude;
dir(θ) is the gradient direction;
Ix and Iy are the gradient values of the image I(x, y) in the x and y directions, respectively;
non-maximum suppression is then applied to the pixels according to the gradient results, and upper and lower thresholds are set to decide whether a pixel is a boundary point, thereby separating foreground and background; finally, the edge feature points extracted in this way are fused with the SIFT feature points, and points extracted by both methods are merged, so that more useful SIFT features are obtained;
4) the dimension reduction of the SIFT feature descriptors is realized by reducing the number of sub-pixel regions by re-dividing the pixel regions, and the specific dividing steps are as follows:
in the first step, the 4 × 4 square pixel region around a feature point is divided into 4 small square sub-regions, i.e. 2 × 2 sub-regions; in each small square, the information of the 8 directions contained in it is accumulated by gradient, so that a feature point descriptor of 4 × 8 = 32 dimensions is obtained;
in the second step, to supplement the 32-dimensional feature point descriptor, four 4 × 1 rectangular pixel strips close to the feature point are selected from the 4 × 4 square pixel region, and the information of the 8 directions contained in each rectangular strip is accumulated by gradient, giving another 4 × 8 = 32-dimensional feature point descriptor;
in the third step, the two descriptors are concatenated to obtain a new feature descriptor of 32 + 32 = 64 dimensions;
5) in order to ensure the illumination invariance of the generated 64-dimensional new descriptor, the invention carries out normalization processing on the generated SIFT feature descriptor:
d′i = di / sqrt(d1² + d2² + … + d64²)
wherein:
d is the SIFT feature descriptor;
d′ is the normalized SIFT feature descriptor;
di is the i-th component of the SIFT feature descriptor.
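A NumPy sketch of the 64-dimensional descriptor construction of steps 4) and 5) follows; it assumes the per-pixel gradient magnitudes and 8-bin orientation indices of the 4 × 4 neighborhood around a feature point are already available, and the choice of rows as the "4 × 1 rectangles close to the feature point" is likewise an assumption:

    import numpy as np

    def improved_sift_descriptor(mag, ori_bin):
        """mag, ori_bin: 4x4 arrays of gradient magnitude and integer orientation bin (0..7)."""
        def block_hist(m, o):
            h = np.zeros(8)
            np.add.at(h, o.ravel(), m.ravel())   # gradient accumulation over 8 directions
            return h

        # Step 1: four 2x2 square sub-regions -> 4 * 8 = 32 dimensions.
        squares = [block_hist(mag[r:r+2, c:c+2], ori_bin[r:r+2, c:c+2])
                   for r in (0, 2) for c in (0, 2)]
        # Step 2: four 4x1 strips close to the feature point -> another 4 * 8 = 32 dimensions.
        strips = [block_hist(mag[r, :], ori_bin[r, :]) for r in range(4)]
        # Step 3: concatenate to a 64-dimensional descriptor.
        d = np.concatenate(squares + strips)
        # Step 5: normalization for illumination invariance.
        return d / (np.linalg.norm(d) + 1e-12)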
And S5, performing face recognition by using a pre-trained F-GAN model according to the extracted SIFT feature descriptors.
Further, for the extracted SIFT feature descriptors, face recognition is carried out by using a pre-trained F-GAN model, wherein the F-GAN model is an improvement of the traditional GAN model and comprises a generation network G, a real/fake discrimination network D and a classification network C;
the generation network G is used for generating sample data according to the SIFT feature descriptors, and the real/fake discrimination network D is used for judging whether an input sample is real or generated; the G network uses deconvolution layers to generate images, and the D network uses convolution layers to extract features. Because the D network of the traditional GAN model is a binary classifier and cannot perform multi-class tasks, the F-GAN model constructs, in addition to the G network and the D network, a classification network C for distinguishing categories so as to classify images; the C network is a multi-class classifier and shares all convolutional layers with the D network. During training the three networks are trained adversarially at the same time; because they are optimized alternately and iteratively, in each iteration the authenticity of the input sample is judged once and its category is predicted once, so that authenticity judgment and classification proceed synchronously.
Compared with the traditional GAN model, the input of the generation network G contains, in addition to the random noise z, a constraint condition c that guides the sample generation process, so that the samples generated by the F-GAN model are controllable, i.e. specified sample data can be generated according to the condition; in a specific embodiment of the invention, when the constraint condition c is label data, the F-GAN model generates sample data of a specified class, i.e. the label of the generated sample is known;
after adversarial training, the sample data generated by the F-GAN model is very close to the real data while having its own style; if this data is used to supplement the training set and fed, together with the real samples, into the C network and the D network, the amount of training data is enlarged and the C network can learn more data characteristics, achieving a data augmentation effect.
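The three-network structure can be sketched in tf.keras as below; the layer counts, image size, noise dimension and number of classes are illustrative assumptions, the essential points being that G is built from deconvolution (transposed convolution) layers and conditioned on c in addition to the noise z, and that D and C share the convolutional trunk:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_f_gan(noise_dim=100, num_classes=15, img_shape=(28, 28, 1)):
        # Generator G: conditioned on noise z and constraint c, built from deconvolution layers.
        z = tf.keras.Input(shape=(noise_dim,))
        c = tf.keras.Input(shape=(num_classes,))
        h = layers.Dense(7 * 7 * 128, activation="relu")(layers.Concatenate()([z, c]))
        h = layers.Reshape((7, 7, 128))(h)
        h = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(h)
        fake = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(h)
        G = tf.keras.Model([z, c], fake, name="G")

        # Shared convolutional trunk for D (real/fake) and C (class), since D and C share all conv layers.
        img = tf.keras.Input(shape=img_shape)
        t = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(img)
        t = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(t)
        t = layers.Flatten()(t)
        real_fake = layers.Dense(1, activation="sigmoid", name="D_out")(t)
        category = layers.Dense(num_classes, activation="softmax", name="C_out")(t)
        D = tf.keras.Model(img, real_fake, name="D")
        C = tf.keras.Model(img, category, name="C")
        return G, D, C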
The following describes embodiments of the invention through an algorithm experiment that tests the processing method of the invention. The hardware test environment of the algorithm is deployed on the TensorFlow deep learning framework; the processor is an Intel(R) Core(TM) i5-7700 CPU with 8 cores, the graphics card is a GeForce GTX1040 with 8 GB of video memory, the development environment is Python 3.5, and the development tool is the Anaconda scientific computing library; the comparison algorithms are a linear classifier (1-layerNN), a K-nearest-neighbor algorithm with Euclidean (L2) distance, and an SVM algorithm.
In the algorithm experiment of the invention, the data set is taken from the Yale face database and comprises 70000 samples, of which 60000 are training samples and 10000 are test samples; each sample is a face image of 28 × 28 pixels. The face image training samples are fed respectively into the linear classifier (1-layerNN), the K-nearest-neighbor algorithm with Euclidean (L2) distance, the SVM algorithm and the face recognition method of the invention for training; the trained models are then used to recognize the test samples, the recognition results are compared with the original labels of the test samples, and the recognition accuracy on the test samples is obtained by statistics, i.e. the face image recognition accuracy of each algorithm.
According to the experimental results, the face image recognition accuracy of the linear classifier (1-layerNN) is 91.85%, that of the K-nearest-neighbor algorithm with Euclidean (L2) distance is 97.00%, that of the SVM algorithm is 92.32%, and that of the algorithm of the invention is 99.31%; compared with the comparison algorithms, the face recognition method proposed by the invention therefore achieves a higher face image recognition accuracy.
The invention also provides a face recognition system. Fig. 2 is a schematic diagram of an internal structure of a face recognition system according to an embodiment of the present invention.
In this embodiment, the face recognition system 1 at least includes a face image acquisition device 11, a face image processor 12, a face recognition device 13, a communication bus 14, and a network interface 15.
The face image acquisition device 11 may be a personal computer (PC), a terminal device such as a smartphone, a tablet computer or a portable computer, or a server.
The face image processor 12 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The face image processor 12 may in some embodiments be an internal storage unit of the face recognition system 1, for example a hard disk of the face recognition system 1. The face image processor 12 may also be an external storage device of the face recognition system 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the face recognition system 1. Further, the face image processor 12 may also include both an internal storage unit and an external storage device of the face recognition system 1. The face image processor 12 may be used not only to store application software installed in the face recognition system 1 and various types of data, but also to temporarily store data that has been output or is to be output.
The face recognition device 13 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip in some embodiments, and is used for running program codes stored in the face image processor 12 or processing data, such as face recognition program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the system 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the face recognition system 1 and for displaying a visual user interface.
Fig. 2 only shows the face recognition system 1 with the components 11-15, and it will be understood by those skilled in the art that the structure shown in Fig. 2 does not constitute a limitation of the face recognition system 1, which may comprise fewer or more components than shown, a combination of certain components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the face image processor 12 stores therein face recognition program instructions; the steps of the face recognition device 13 executing the face recognition program instructions stored in the face image processor 12 are the same as the implementation method of the face recognition method, and are not described here.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has stored thereon face recognition program instructions, where the face recognition program instructions are executable by one or more processors to implement the following operations:
acquiring a face image to be recognized, and converting the face image to be recognized into a gray-scale image by using a weighted proportion method;
carrying out noise reduction processing on the gray-scale image by using Gaussian filtering;
performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized;
Detecting a face external key point region in a binary image by using a cascaded external key point detection model;
detecting key point regions inside the face by using a facial feature detection model;
extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm;
and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A face recognition method, comprising:
acquiring a face image to be recognized, and converting the face image to be recognized into a gray-scale image by using a weighted proportion method;
carrying out noise reduction processing on the gray-scale image by using Gaussian filtering;
performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized;
detecting a face external key point region in a binary image by using a cascaded external key point detection model;
detecting key point regions inside the face by using a facial feature detection model;
extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm;
and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model.
2. The face recognition method according to claim 1, wherein the acquiring of the face image to be recognized and the converting of the face image to be recognized into a gray-scale image by using a weighted proportion method comprises:
converting the face image to be recognized into a gray-scale image by using the weighted proportion method, wherein the calculation formula of the weighted proportion method is as follows:
Oi = 0.30*Ri + 0.59*Gi + 0.11*Bi
wherein:
Ri, Gi, Bi are the three color components of the current pixel i;
Oi is the gray value of the current pixel i after conversion.
3. The face recognition method of claim 2, wherein the process of denoising the gray-scale image by Gaussian filtering comprises:
scanning each pixel in the image with a circular template, and replacing the value of the template's central pixel with the weighted average gray value of the pixels in the neighborhood determined by the template, wherein the weight is computed with the Gaussian function:
G(r) = exp(-r²/(2σ²)) / (2πσ²)^(n/2)
wherein:
σ is the standard deviation of the neighborhood pixel values; the larger its value, the more blurred the image;
n is the dimension of the template, set to 2;
r is the blur radius, i.e. the distance from a template element to the central pixel of the template.
4. A face recognition method as claimed in claim 3, wherein the binarizing process for the image by using OTSU algorithm comprises:
The formula for carrying out binarization processing on the image is as follows:
g(t)=w0*w1*(u0-u1)*(u-u0)
u=w0*u0+w1*u1
wherein:
t is a segmentation threshold of the foreground and the background;
w0 is the proportion of foreground pixels in the image;
u0 is the average gray value of the foreground;
w1 is the proportion of background pixels in the image;
u1 is the average gray value of the background;
u is the overall average gray value of the image;
when the variance g(t) between foreground and background is maximum, the difference between foreground and background is maximum; the gray level t at that moment is the optimal threshold, and the image is binarized with this threshold to obtain the binarized image of the face image to be recognized.
5. The face recognition method of claim 4, wherein the detecting the face external key point region by using the cascaded external key point detection models comprises:
the cascaded external key point detection model comprises a face detection layer and an external key point positioning layer;
the face detection layer comprises four convolution layers, wherein: 1) the 1 st convolutional layer is composed of 64 convolution kernels of 3 × 3, and the span is 2; 2) the 2 nd convolutional layer is composed of 128 3 × 3 convolutional kernels, and has a span of 1; 3) the 3 rd convolutional layer is composed of 256 3 × 3 convolutional kernels, and the span is 1; 4) the 4 th convolutional layer is composed of 600 3 × 3 convolutional kernels, the span is 1, the maximum pooling layer with 2 × 2 span is formed after each convolutional layer;
The following function is adopted as the detection error function; the face detection layer is trained iteratively in combination with the coordinates of the facial key point positions, so that the face position and the positions of the five sense organs are detected simultaneously:
err = errFace + λ·errPart
errFace = (x − x')² + (y − y')²,  errPart = Σi [(xi − xi')² + (yi − yi')²]
wherein:
λ is used to balance the face detection error errFace and the five sense organs detection error errPart; the present invention sets it to 1;
i indexes the 12 key points of the five sense organs, namely the left and right eyebrows, the left and right eye corners, the nose and the left and right corners of the mouth;
(x, y) is a detected coordinate point;
(x', y') is the corresponding true coordinate point;
the face position located by the face detection layer is expanded by a factor of 1.2 about the face center, and the result is cropped and reshaped into a 96 × 96 image that serves as the input of the external key point positioning layer;
the external key point positioning layer comprises two layers: 1) layer 1: the reshaped face image obtained above and the coordinates of the external contour points of the face are taken as input and passed in turn through three convolutional layers to obtain the external contour frame of the face; the three convolutional layers consist of 64, 96 and 128 convolution kernels of size 5 × 5, respectively; 2) layer 2: with the layer-1 network weights fixed, the external contour points estimated by layer 1 are expanded by a factor of 1.2 and the corresponding region is cropped from the original face image, giving a new image together with the 17 key points of the external contour; these are used as the input of this layer, which passes them in turn through four convolutional layers and outputs a 34-dimensional vector (the coordinates of the 17 key points) representing the external contour key point region; the four convolutional layers consist of 64, 128, 256 and 600 convolution kernels of size 3 × 3, respectively.
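A possible reading of the face detection layer in PyTorch is sketched below; the single input channel, ReLU activations, pooling stride, global average pooling and the two output heads (a 4-value face box and 24 values for the 12 key points) are assumptions that the claim does not specify:

```python
import torch
import torch.nn as nn

class FaceDetectionLayer(nn.Module):
    """Sketch of the four-convolution face detection layer of claim 5."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 600, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        self.face_head = nn.Linear(600, 4)    # face bounding box
        self.part_head = nn.Linear(600, 24)   # 12 facial key points (x, y)

    def forward(self, x):
        f = self.features(x).mean(dim=(2, 3))  # global average pooling
        return self.face_head(f), self.part_head(f)

def detection_loss(face_pred, face_true, part_pred, part_true, lam=1.0):
    """err = errFace + lambda * errPart, both as squared coordinate errors."""
    err_face = ((face_pred - face_true) ** 2).sum(dim=1).mean()
    err_part = ((part_pred - part_true) ** 2).sum(dim=1).mean()
    return err_face + lam * err_part
```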
6. The face recognition method of claim 5, wherein detecting the key point regions inside the face image by using the five sense organs detection model comprises:
according to the face position located by the face detection layer, the coordinates of that face position are enlarged by a factor of 1.2, and the region is cropped and reshaped into an image of 96 × 96 pixels;
the same network structure as the face detection layer is adopted and combined with the position information of the five sense organs, so that the bounding box of the internal contour of the face and the positions of the five sense organs are located at the same time;
with the network weights fixed, the located coordinates of the five sense organs are enlarged by a factor of 1.5, six local images (the left and right eyebrows, the left and right eyes, the nose and the mouth) are cropped out and scaled to 48 × 48 pixels, the spatial transformation parameters between the images are recorded, and the key point regions of the local images are located separately with the same convolutional network as the external key point positioning layer.
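The crop-expand-rescale operations used in claims 5 and 6 can be illustrated with a small helper; the (cx, cy, w, h) box format is an assumption made for the sake of the example:

```python
import numpy as np
import cv2

def expand_and_crop(image: np.ndarray, box, scale: float, out_size: int) -> np.ndarray:
    """Expand a (cx, cy, w, h) box about its center by `scale`, crop it and resize it."""
    cx, cy, w, h = box
    half_w, half_h = w * scale / 2.0, h * scale / 2.0
    x0, y0 = int(max(cx - half_w, 0)), int(max(cy - half_h, 0))
    x1, y1 = int(min(cx + half_w, image.shape[1])), int(min(cy + half_h, image.shape[0]))
    return cv2.resize(image[y0:y1, x0:x1], (out_size, out_size))

# Illustrative use following claim 6: the face box is expanded 1.2x and reshaped to 96 x 96,
# and each located five-sense-organs box is expanded 1.5x and reshaped to 48 x 48.
# face_96 = expand_and_crop(gray, face_box, 1.2, 96)
# mouth_48 = expand_and_crop(gray, mouth_box, 1.5, 48)
```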
7. The face recognition method of claim 6, wherein the process of extracting the SIFT feature descriptors of the key point regions by using the improved SIFT feature extraction algorithm comprises the following steps:
1) converting the image of the key point region into an image in scale space and constructing the Gaussian pyramid of the image, wherein the Gaussian pyramid comprises a plurality of octaves and each octave comprises a plurality of layers, the scale ratio between two adjacent layers of the same octave is k, and the scale factor between adjacent octaves is kσ²; the formula for converting the key point region image into an image in scale space is:
L(x,y,σ)=G(x,y,σ)*I(x,y)
G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
wherein:
sigma is a spatial scale factor;
* denotes the convolution operation between the Gaussian kernel function and the image;
I(x, y) is the key point region image;
G(x, y, σ) is the Gaussian kernel function;
L(x, y, σ) is the image in scale space;
2) detecting extreme points in the scale space: each point in the image is traversed once and checked for an extremum; the criterion is to compare the point with its 26 neighboring pixels in its own layer and the two adjacent layers, and if the value of the point is larger than the values of all of these neighboring pixels, the pixel is regarded as an extreme point;
3) the edge feature points are improved by an algorithm that combines the SIFT algorithm with Canny edge extraction; for each scale layer, the gradient magnitude and the gradient direction of the image are calculated by the following formulas:
Edge(e) = √(Ix² + Iy²)
Dir(θ) = arctan(Iy / Ix)
Wherein:
Edge(e) is the gradient magnitude;
Dir(θ) is the gradient direction;
Ix and Iy are the gradient values of the image I(x, y) in the x direction and the y direction, respectively;
non-maximum suppression is applied to the pixels according to the gradient calculation result, and upper and lower threshold values are set to judge whether a pixel is a boundary point, thereby separating the background from the foreground; finally, the feature points extracted in this way are fused with the SIFT feature points, and the points extracted by both are merged, so that more useful SIFT features are obtained;
4) the dimensionality of the SIFT feature descriptor is reduced by re-dividing the pixel region so that fewer sub-regions are used; the specific division steps are:
firstly, the 4 × 4 square pixel area around a feature point is divided into 2 × 2 = 4 small square sub-regions, and the information in the 8 directions contained in each small square is accumulated by gradient, giving a 4 × 8 = 32-dimensional feature point descriptor;
secondly, to supplement this 32-dimensional descriptor, 4 rectangular pixel areas of size 4 × 1 closest to the feature point are selected from the 4 × 4 square pixel area, and the information in the 8 directions contained in each rectangular area is accumulated by gradient, giving another 4 × 8 = 32-dimensional feature point descriptor;
thirdly, the two descriptors are combined, giving a new feature descriptor of 32 + 32 = 64 dimensions;
5) in order to ensure the illumination invariance of the generated 64-dimensional new descriptor, the invention carries out normalization processing on the generated SIFT feature descriptor:
d̂i = di / √(Σj dj²)
wherein:
d is the SIFT feature descriptor;
d̂ is the normalized SIFT feature descriptor;
di is the i-th component of the SIFT feature descriptor.
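One possible reading of the re-divided 64-dimensional descriptor and its normalization is sketched below; it assumes a 16 × 16 pixel patch around the key point (so that the 4 × 4 cell grid of standard SIFT becomes 2 × 2 cells of 8 × 8 pixels), treats the four rectangular areas as horizontal strips of the patch, and assumes gradient directions given in [0, 2π):

```python
import numpy as np

def improved_descriptor(mag: np.ndarray, ang: np.ndarray) -> np.ndarray:
    """64-D descriptor: 2x2 square sub-regions (32 dims) + 4 rectangular strips (32 dims), L2-normalized."""
    hist = lambda m, a: np.histogram(a, bins=8, range=(0.0, 2.0 * np.pi), weights=m)[0]

    # Part 1: 2 x 2 = 4 square sub-regions of the 16 x 16 patch -> 4 x 8 = 32 dimensions.
    square = [hist(mag[r:r + 8, c:c + 8], ang[r:r + 8, c:c + 8])
              for r in (0, 8) for c in (0, 8)]

    # Part 2: 4 rectangular strips -> another 4 x 8 = 32 dimensions.
    rect = [hist(mag[r:r + 4, :], ang[r:r + 4, :]) for r in (0, 4, 8, 12)]

    d = np.concatenate(square + rect)             # 32 + 32 = 64 dimensions
    return d / np.sqrt((d ** 2).sum() + 1e-12)    # normalization for illumination invariance
```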
8. The face recognition method of claim 7, wherein the F-GAN model is:
the F-GAN model is an improvement of the traditional GAN model and consists of a generation network G, a true and false distinguishing network D and a classification distinguishing network C;
the generation network G generates sample data from the SIFT feature descriptors, and the true-false distinguishing network D judges whether an input sample is real or generated; the G network uses deconvolution layers to generate images, while the D network uses convolution layers to extract features;
while constructing the G and D networks, the F-GAN model also constructs a classification network C that distinguishes categories and classifies the images; the C network is a multi-class classifier and shares all convolutional layers with the D network; during training the three networks are trained adversarially at the same time, and because they are optimized alternately and iteratively, every iteration both judges whether an input sample is real and predicts the class of that sample.
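A minimal PyTorch sketch of the three-network layout described for the F-GAN model; the layer sizes, the 32 × 32 image resolution, the latent code dimension and the number of classes are illustrative assumptions, and only the overall shape (G built from deconvolution layers, D and C sharing all convolutional layers) follows the claim:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G: generates an image from a SIFT-descriptor-based code with transposed convolutions."""
    def __init__(self, code_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(code_dim, 256, 4, 1, 0), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),       # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),        # 8x8 -> 16x16
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),          # 16x16 -> 32x32
        )
    def forward(self, code):
        return self.net(code.view(code.size(0), -1, 1, 1))

class DiscriminatorClassifier(nn.Module):
    """D and C share all convolutional layers; D scores real/fake, C predicts the class."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.d_head = nn.Linear(256 * 4 * 4, 1)            # real / fake score
        self.c_head = nn.Linear(256 * 4 * 4, num_classes)  # identity class
    def forward(self, img):
        f = self.shared(img).flatten(1)
        return self.d_head(f), self.c_head(f)
```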
9. A face recognition system, the system comprising:
the face image acquisition device is used for receiving a face image to be recognized;
the face image processor is used for converting the face image to be recognized into a gray-scale image by the weighted-proportion method and denoising the gray-scale image with Gaussian filtering; enhancing the contrast of the gray-scale image with a contrast enhancement algorithm based on linear stretching, and binarizing the image with the OTSU algorithm to obtain the binarized image of the face image to be recognized; detecting the external key point regions of the face in the binarized image with the cascaded external key point detection model, and detecting the internal key point regions of the face with the five sense organs detection model;
and the face recognition device is used for extracting the SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm and recognizing the face by using a pre-trained F-GAN model according to the extracted SIFT feature descriptors.
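Purely as an illustration of how the three devices of this claim fit together, the sketch below composes the helper functions from the earlier claim sketches; the detector objects and the classify method of the F-GAN model are hypothetical interfaces, not defined by the patent:

```python
class FaceRecognitionSystem:
    """Illustrative composition of the acquisition, processing and recognition devices of claim 9."""
    def __init__(self, keypoint_detector, feature_detector, fgan_model):
        self.keypoint_detector = keypoint_detector  # cascaded external key point detection model
        self.feature_detector = feature_detector    # five sense organs detection model
        self.fgan = fgan_model                      # pre-trained F-GAN model

    def recognize(self, image_rgb):
        gray = to_gray(image_rgb)          # weighted-proportion grayscale conversion (sketch above)
        gray = gaussian_denoise(gray)      # Gaussian noise reduction (sketch above)
        # (contrast enhancement by linear stretching would be applied here)
        binary = binarize(gray)            # OTSU binarization (sketch above)
        outer = self.keypoint_detector(binary)   # external key point regions
        inner = self.feature_detector(binary)    # internal (five sense organs) key point regions
        # each region is assumed to provide gradient magnitude and direction arrays
        descriptors = [improved_descriptor(*region) for region in outer + inner]
        return self.fgan.classify(descriptors)   # class prediction from the C network
```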
10. A computer-readable storage medium having stored thereon face recognition program instructions executable by one or more processors to implement the steps of the face recognition method as claimed in any one of claims 1 to 8.
CN202010696174.1A 2020-07-20 2020-07-20 Face recognition method and system Withdrawn CN111860309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010696174.1A CN111860309A (en) 2020-07-20 2020-07-20 Face recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010696174.1A CN111860309A (en) 2020-07-20 2020-07-20 Face recognition method and system

Publications (1)

Publication Number Publication Date
CN111860309A true CN111860309A (en) 2020-10-30

Family

ID=73002346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010696174.1A Withdrawn CN111860309A (en) 2020-07-20 2020-07-20 Face recognition method and system

Country Status (1)

Country Link
CN (1) CN111860309A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112637564A (en) * 2020-12-18 2021-04-09 中标慧安信息技术股份有限公司 Indoor security method and system based on multi-picture monitoring
CN112613459A (en) * 2020-12-30 2021-04-06 深圳艾摩米智能科技有限公司 Method for detecting face sensitive area
CN112801020A (en) * 2021-02-09 2021-05-14 福州大学 Pedestrian re-identification method and system based on background graying
CN112801020B (en) * 2021-02-09 2022-10-14 福州大学 Pedestrian re-identification method and system based on background graying
CN113011356A (en) * 2021-03-26 2021-06-22 杭州朗和科技有限公司 Face feature detection method, device, medium and electronic equipment
CN113269155A (en) * 2021-06-28 2021-08-17 苏州市科远软件技术开发有限公司 End-to-end face recognition method, device, equipment and storage medium
CN113269155B (en) * 2021-06-28 2024-07-16 苏州市科远软件技术开发有限公司 End-to-end face recognition method, device, equipment and storage medium
CN113762205A (en) * 2021-09-17 2021-12-07 深圳市爱协生科技有限公司 Human face image operation trace detection method, computer equipment and readable storage medium
CN114638968A (en) * 2022-01-10 2022-06-17 中国人民解放军国防科技大学 Method and device for extracting geometric structure and key points of space target
CN114638968B (en) * 2022-01-10 2024-01-30 中国人民解放军国防科技大学 Method and device for extracting geometric structure and key points of space target
CN118196872A (en) * 2024-04-08 2024-06-14 陕西丝路众合智能科技有限公司 Face accurate recognition method under hybrid scene

Similar Documents

Publication Publication Date Title
CN111860309A (en) Face recognition method and system
US8873856B1 (en) Determining a class associated with an image
Ye et al. Text detection and recognition in imagery: A survey
Pan et al. A robust system to detect and localize texts in natural scene images
US8452108B2 (en) Systems and methods for image recognition using graph-based pattern matching
CN110232713B (en) Image target positioning correction method and related equipment
CN103390164B (en) Method for checking object based on depth image and its realize device
US9020248B2 (en) Window dependent feature regions and strict spatial layout for object detection
CN101763507B (en) Face recognition method and face recognition system
US20160026899A1 (en) Text line detection in images
Yang et al. A framework for improved video text detection and recognition
US9489566B2 (en) Image recognition apparatus and image recognition method for identifying object
WO2022105521A1 (en) Character recognition method and apparatus for curved text image, and computer device
EP2434431A1 (en) Method and device for classifying image
US9042601B2 (en) Selective max-pooling for object detection
JP2008310796A (en) Computer implemented method for constructing classifier from training data detecting moving object in test data using classifier
US9025882B2 (en) Information processing apparatus and method of processing information, storage medium and program
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
CN110674685B (en) Human body analysis segmentation model and method based on edge information enhancement
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
US9020198B2 (en) Dimension-wise spatial layout importance selection: an alternative way to handle object deformation
CN110717497A (en) Image similarity matching method and device and computer readable storage medium
CN111951283A (en) Medical image identification method and system based on deep learning
Kobchaisawat et al. Thai text localization in natural scene images using convolutional neural network
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20201030)