CN111860309A - Face recognition method and system - Google Patents

Face recognition method and system

Info

Publication number
CN111860309A
Authority
CN
China
Prior art keywords
image
face
layer
key point
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010696174.1A
Other languages
Chinese (zh)
Inventor
汪秀英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010696174.1A priority Critical patent/CN111860309A/en
Publication of CN111860309A publication Critical patent/CN111860309A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Geometry (AREA)

Abstract

The invention relates to the technical field of face recognition, and discloses a face recognition method, which comprises the following steps: acquiring a face image to be recognized, converting the face image to be recognized into a gray image by using a proportion method, and performing noise reduction on the gray image by using Gaussian filtering; performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized; detecting a face external key point region in a binary image by using a cascaded external key point detection model; detecting key point regions in the human face by using a facial feature detection model, and extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm; and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model. The invention further provides a face recognition system. The invention realizes the recognition of the human face.

Description

Face recognition method and system
Technical Field
The invention relates to the technical field of face recognition, in particular to a face recognition method and a face recognition system.
Background
Face recognition is a technique that extracts facial information and uses a classifier for recognition; the face can serve as a unique identifier of a person. Because face recognition systems are contactless, non-invasive and reliable, they are widely applied in real life, for example in high-speed rail entry, attendance check-in and examinee identification.
The current mainstream image recognition technology is deep-learning-based image recognition, in which a deep convolutional network adaptively extracts local and global image features according to the classification task and achieves good recognition performance. However, image recognition methods based on deep convolutional networks require a large amount of training data and typically discard difficult samples, i.e. samples on which the model easily makes errors but which also carry boundary information; as a result the training samples become insufficient and the image recognition effect is reduced.
Meanwhile, existing face key point localization algorithms achieve a high recognition rate in constrained environments, but in unconstrained environments they are still easily affected by factors such as uneven ambient lighting, a wide range of viewing angles, varied target postures, blur and occlusion. In the prior art, face images are generally recognized by extracting SIFT descriptors from the face images, but the descriptors generated by the traditional SIFT algorithm have high dimensionality, and the computation in the descriptor generation and matching stages is complex and expensive.
In view of this, how to improve the existing feature extraction algorithm, extract effective features in a face image, and improve the detection precision of key points of a face, thereby realizing accurate recognition of the face, is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a face recognition method, which extracts effective features in a face image by improving the existing feature extraction algorithm and provides a face key point detection algorithm to improve the detection precision of face key points, thereby realizing accurate face recognition.
In order to achieve the above object, the present invention provides a face recognition method, including:
acquiring a face image to be recognized, and converting the face image to be recognized into a gray-scale image by using a weighted proportion method;
carrying out noise reduction processing on the gray-scale image by using Gaussian filtering;
performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized;
detecting a face external key point region in a binary image by using a cascaded external key point detection model;
detecting key point regions inside the face by using a facial feature detection model;
Extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm;
and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model.
Optionally, the acquiring the face image to be recognized and converting the face image to be recognized into a gray-scale image by using a weighted proportion method includes:
converting the face image to be recognized into a gray-scale image by using the weighted proportion method, wherein the calculation formula of the weighted proportion method is as follows:
Oi = 0.30*Ri + 0.59*Gi + 0.11*Bi
wherein:
Ri, Gi, Bi are the three color components of the current pixel i;
Oi is the gray value of the current pixel i after conversion.
Optionally, the process of performing noise reduction on the gray-scale image by using Gaussian filtering is as follows:
scanning each pixel in the image with a circular template, and replacing the value of the template's central pixel with the weighted average gray value of the pixels in the neighborhood determined by the template, wherein the weight is computed with the Gaussian function:
G(r) = exp(-r²/(2σ²)) / (2πσ²)^(n/2)
wherein:
σ is the standard deviation of the neighborhood pixel values; the larger its value, the more blurred the image;
n is the dimension of the template, set to 2;
r is the blur radius, i.e. the distance from a template element to the central pixel of the template.
Optionally, the binarizing the image by using the OTSU algorithm includes:
The formula for carrying out binarization processing on the image is as follows:
g(t)=w0*w1*(u0-u1)*(u-u0)
u=w0*u0+w1*u1
wherein:
t is a segmentation threshold of the foreground and the background;
w0 is the proportion of foreground pixels in the image;
u0 is the average gray value of the foreground;
w1 is the proportion of background pixels in the image;
u1 is the average gray value of the background;
u is the overall average gray value of the image;
when the variance g(t) between foreground and background is maximum, the difference between foreground and background can be considered maximum; the gray level t at that moment is the optimal threshold, and the image is binarized with this threshold to obtain the binarized image of the face image to be recognized.
Optionally, the detecting the face external key point region by using the cascaded external key point detection model includes:
the cascaded external key point detection model comprises a face detection layer and an external key point positioning layer;
the face detection layer comprises four convolutional layers, wherein: 1) the 1st convolutional layer consists of 64 convolution kernels of size 3 × 3 with a stride of 2; 2) the 2nd convolutional layer consists of 128 convolution kernels of size 3 × 3 with a stride of 1; 3) the 3rd convolutional layer consists of 256 convolution kernels of size 3 × 3 with a stride of 1; 4) the 4th convolutional layer consists of 600 convolution kernels of size 3 × 3 with a stride of 1; each convolutional layer is followed by a 2 × 2 max-pooling layer with a stride of 2;
the following function is adopted as the detection error function, and the face detection layer is trained iteratively in combination with the coordinates of the facial feature positions, so that the face position and the facial feature positions are detected simultaneously:
Figure BDA0002591069590000031
Figure BDA0002591069590000032
wherein:
λ is used to balance the face detection error errFace and the facial feature detection error errPart; the invention sets it to 1;
I is the set of 12 facial feature key points of the human face, comprising the left and right eyebrows, the left and right eye corners, the nose, and the left and right corners of the mouth;
(x, y) is a detection coordinate point;
(x ', y') is a true coordinate point;
the face position located by the face detection layer is enlarged by a factor of 1.2 about the face center, and the image is cropped and reshaped to a size of 96 × 96 pixels as the input of the external key point positioning layer;
the external key point positioning layer comprises two layers: 1) layer 1: the reshaped face image and the face external contour point coordinates
Figure BDA0002591069590000033
are taken as input and passed sequentially through three convolutional layers to obtain the face external contour bounding box
Figure BDA0002591069590000034
the three convolutional layers consist of 64, 96 and 128 convolution kernels of size 5 × 5, respectively; 2) layer 2: the layer-1 network weights are fixed, the external contour points estimated by layer 1 are enlarged by a factor of 1.2, and the corresponding region is cropped from the original face image to obtain a new image together with the 17 external contour key points
Figure BDA0002591069590000041
which are taken as the input of this layer and passed sequentially through four convolutional layers to obtain a 34-dimensional vector representing the external contour key point region
Figure BDA0002591069590000042
the four convolutional layers consist of 64, 128, 256 and 600 convolution kernels of size 3 × 3, respectively.
Optionally, the detecting of the internal key point region of the face image by using a facial feature detection model includes:
according to the face position located by the face detection layer, the face position coordinates are enlarged by a factor of 1.2, and the region is cropped and reshaped into an image of 96 × 96 pixels;
the same network structure as the face detection layer is adopted and combined with the facial feature position information, so that the bounding box of the internal face contour and the facial feature positions are located at the same time;
the network weights are then fixed, the located facial feature coordinates are enlarged by a factor of 1.5, six local images (the left and right eyebrows, the left and right eyes, the nose and the mouth) are cropped and scaled to 48 × 48 pixels, the spatial transformation parameters between the images are recorded, and the key point regions of the local images are located separately by a convolutional network identical to the external key point positioning layer.
Optionally, the process of extracting the SIFT feature descriptors of the key point regions by using the improved SIFT feature extraction algorithm includes:
1) converting the image of each key point region into an image in scale space and, at the same time, building a Gaussian pyramid of the image, wherein the Gaussian pyramid comprises several octaves, each octave comprises several layers, the scale ratio between two adjacent layers of the same octave is k, and the scale factor between adjacent octaves is kσ²; the formula for converting the key point region image into an image in scale space is:
L(x,y,σ) = G(x,y,σ)*I(x,y)
G(x,y,σ) = exp(-(x²+y²)/(2σ²)) / (2πσ²)
wherein:
σ is the spatial scale factor;
* is the convolution operation between the Gaussian kernel function and the image;
I(x, y) is the key point region image;
G(x, y, σ) is the Gaussian kernel function;
L(x, y, σ) is the image in scale space;
2) detecting the extreme points of the scale space: every point in the image is traversed and checked for an extremum; the criterion is to compare the point with its 26 neighboring points in the same layer and the two adjacent layers, and if the value of the point is greater than the values of all its neighbors, the point is regarded as an extreme point;
3) the extraction of edge feature points is improved by combining the SIFT algorithm with Canny edge extraction; first, the gradient magnitude and gradient direction of the image are calculated for each scale layer by the following formulas:
edge(e) = sqrt(Ix² + Iy²)
dir(θ) = arctan(Iy / Ix)
Wherein:
edge(e) is the gradient magnitude;
dir(θ) is the gradient direction;
Ix and Iy are the gradient values of the image I(x, y) in the x and y directions, respectively;
non-maximum suppression is then applied to the pixels according to the gradient results, and upper and lower thresholds are set to decide whether a pixel is a boundary point, thereby separating foreground and background; finally, the edge feature points extracted in this way are fused with the SIFT feature points, and points extracted by both methods are merged, so that more useful SIFT features are obtained;
4) the dimension reduction of the SIFT feature descriptors is realized by reducing the number of sub-pixel regions by re-dividing the pixel regions, and the specific dividing steps are as follows:
in the first step, a 4 × 4 square pixel region around a feature point is divided into 4 small square sub-pixel regions, i.e., 2 × 2 sub-pixel regions. In each small square, the information in 8 directions contained in the small square is subjected to gradient accumulation, so that a feature point descriptor with dimensions of 4 × 8 ═ 32 is obtained;
secondly, in order to supplement the 32-dimensional feature point descriptor, 4 × 1 rectangular pixel areas close to the feature points are selected from a square pixel area with the size of 4 × 4, information in 8 directions contained in each rectangular pixel area is accumulated in a gradient manner, and then the 4 × 8-32-dimensional feature point descriptor is obtained;
Thirdly, combining the two obtained feature descriptors to obtain a new feature descriptor with dimensions of 32+ 32-64;
5) in order to ensure the illumination invariance of the generated 64-dimensional new descriptor, the invention carries out normalization processing on the generated SIFT feature descriptor:
d′i = di / sqrt(d1² + d2² + … + d64²)
wherein:
d is the SIFT feature descriptor;
d′ is the normalized SIFT feature descriptor;
di is the i-th component of the SIFT feature descriptor.
Optionally, the F-GAN model is:
the F-GAN model is an improvement of the traditional GAN model and consists of a generation network G, a real/fake discrimination network D and a classification network C;
the generation network G is used for generating sample data according to the SIFT feature descriptors, and the real/fake discrimination network D is used for judging whether an input sample is real or generated; the G network uses deconvolution layers to generate images, and the D network uses convolution layers to extract features;
while constructing the G network and the D network, the F-GAN model also constructs a classification network C for distinguishing categories so as to classify images; the C network is a multi-class classifier that can perform multi-classification tasks, and the C network and the D network share all convolutional layers. During training, the three networks are trained adversarially at the same time; because the three networks are optimized alternately and iteratively, in each iteration the authenticity of the input sample is judged once and the category of the input sample is predicted once.
In addition, to achieve the above object, the present invention further provides a face recognition system, including:
the face image acquisition device is used for receiving a face image to be recognized;
the face image processor is used for converting the face image to be recognized into a gray-scale image by using a weighted proportion method and performing noise reduction on the gray-scale image by using Gaussian filtering; performing image contrast enhancement on the gray-scale image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using the OTSU algorithm to obtain a binarized image of the face image to be recognized; detecting the face external key point region in the binarized image by using a cascaded external key point detection model, and detecting the face internal key point region by using a facial feature detection model;
and the face recognition device is used for extracting the SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm and recognizing the face by using a pre-trained F-GAN model according to the extracted SIFT feature descriptors.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium, which stores face recognition instructions, where the face recognition instructions are executable by one or more processors to implement the steps of the implementation method of face recognition as described above.
Compared with the prior art, the invention provides a face recognition method, which has the following advantages:
Firstly, existing face key point localization algorithms achieve a high recognition rate in constrained environments, but in unconstrained environments they are still easily affected by factors such as uneven ambient lighting, a wide range of viewing angles, varied target postures, blur and occlusion. The invention therefore adopts the idea of cascaded convolutional networks and, unlike the conventional 5-point localization, locates 68 key points, namely 17 external key points and 51 internal key points. For the external key points, two convolutional networks are cascaded to perform coarse and fine localization respectively; for the internal key points, the facial feature position information detected by the face detection layer is combined so that facial feature deformation information is introduced and the facial features are detected while the face is detected, which avoids the problem in traditional algorithms that the facial feature positions are detected first and the key points located afterwards, so that the key point localization accuracy depends directly on the facial feature detection result. In addition, during model training the invention introduces multi-channel convolution to extract feature information at different levels, making full use of low-, medium- and high-resolution pixels in the image and improving the detection accuracy of the face key points.
Secondly, in the prior art face images are generally recognized by extracting SIFT descriptors from the face images, but the descriptors generated by the traditional SIFT algorithm have high dimensionality, and the computation in the descriptor generation and matching stages is complex and expensive. The invention therefore improves the traditional SIFT algorithm. Because a feature point descriptor is closely related to the pixels at and around the feature point location, and the closer a pixel is to the feature point the larger its influence on the descriptor, the pixel region around the feature point is sampled several times in the descriptor generation stage and re-divided into fewer sub-regions. Whereas the traditional SIFT algorithm divides the feature point pixel region into 4 × 4 = 16 sub-regions with 8 direction bins each, producing a 128-dimensional SIFT descriptor, the invention re-divides the original 16 sub-regions into 8 sub-regions by changing the division scheme and performs two rounds of gradient accumulation based on the direction information, thereby generating a 64-dimensional SIFT feature descriptor. The image information around the feature point is not lost, so that compared with the original descriptor the new descriptor loses no descriptive power, while its dimensionality is reduced to half that of the original descriptor, which simplifies the computation and reduces the complexity of the algorithm.
Finally, because the D network of the traditional GAN model is a binary classifier and cannot perform multi-class tasks, the F-GAN model proposed by the invention constructs, in addition to the G network and the D network, a classification network C for distinguishing categories so as to classify images; the C network is a multi-class classifier and shares all convolutional layers with the D network. During training the three networks are trained adversarially at the same time; because they are optimized alternately and iteratively, in each iteration the authenticity of the input sample is judged once and its category is predicted once, so that authenticity judgment and classification proceed synchronously. Compared with the traditional GAN model, the input of the generation network G contains, in addition to the random noise z, a constraint condition c that guides the sample generation process, so that the samples generated by the F-GAN model are controllable, i.e. specified sample data can be generated according to the condition; in a specific embodiment of the invention, when the constraint condition c is label data, the F-GAN model generates sample data of a specified class, i.e. the label of the generated sample is known. After adversarial training, the sample data generated by the F-GAN model is very close to the real data while having its own style; if this data is used to supplement the training set and fed, together with the real samples, into the C network and the D network, the amount of training data is enlarged and the C network can learn more data characteristics, achieving a data augmentation effect.
Drawings
Fig. 1 is a schematic flow chart of a face recognition method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a face recognition system according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention improves the existing feature extraction algorithm to extract effective features from the face image, and provides a face key point detection algorithm to improve the detection accuracy of face key points, thereby realizing accurate face recognition. Fig. 1 is a schematic flow chart of a face recognition method according to an embodiment of the present invention.
In this embodiment, the face recognition method includes:
and S1, acquiring the face image to be recognized, converting the face image to be recognized into a gray image by using each proportion method, and performing noise reduction processing on the gray image by using Gaussian filtering.
Firstly, the invention obtains a face image to be recognized, and converts the face image to be recognized into a gray image by utilizing each proportion method, wherein the calculation formula of each proportion method is as follows:
Oi=0.30*Ri+0.59*Gi+0.11*Bi
Wherein:
Ri,Gi,Bithree pixel components, respectively, of a current pixel i;
Oiconverting the gray scale of the current pixel i into a pixel;
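By way of illustration, this weighted conversion can be sketched in Python/NumPy as follows; the function name and the assumption of an 8-bit RGB input are illustrative only:

    import numpy as np

    def to_grayscale(rgb):
        """Weighted-proportion gray conversion: O = 0.30*R + 0.59*G + 0.11*B."""
        rgb = rgb.astype(np.float32)
        gray = 0.30 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]
        return np.clip(gray, 0, 255).astype(np.uint8)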
Further, the invention performs noise reduction on the gray-scale image of the face image to be recognized by using Gaussian filtering. In a specific embodiment of the invention, each pixel in the image is scanned with a circular template, and the value of the template's central pixel is replaced with the weighted average gray value of the pixels in the neighborhood determined by the template, wherein the weight is computed with the Gaussian function:
G(r) = exp(-r²/(2σ²)) / (2πσ²)^(n/2)
wherein:
σ is the standard deviation of the neighborhood pixel values; the larger its value, the more blurred the image;
n is the dimension of the template, set to 2 in the invention;
r is the blur radius, i.e. the distance from a template element to the central pixel of the template.
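In practice this step amounts to a standard Gaussian blur; a brief sketch using OpenCV is given below, where the kernel size and σ are example values rather than values fixed by the invention:

    import cv2

    def denoise(gray, ksize=5, sigma=1.0):
        """Gaussian low-pass filtering of the gray-scale image for noise reduction."""
        # Each output pixel becomes the Gaussian-weighted average of its neighborhood.
        return cv2.GaussianBlur(gray, (ksize, ksize), sigma)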
And S2, performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized.
Further, the invention utilizes a contrast enhancement algorithm based on linear stretching to perform image contrast enhancement on the gray scale image, wherein the linear stretching refers to pixel level operation in which the input gray scale value and the output gray scale value are in a linear relation, and a contrast enhancement formula is as follows:
Db=f(Da)=a*Da+b
Wherein:
Da is the gray value of the input image;
Db is the gray value of the output image;
a is the linear slope: if a > 1 the contrast of the output image is enhanced compared with the original image, and if a < 1 it is weakened; the invention sets a to 2;
b is the intercept, which the invention sets to 0.5;
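A minimal sketch of this linear stretching follows; it assumes the gray values are first normalized to [0, 1], since with a = 2 and b = 0.5 the mapping would otherwise exceed an 8-bit range (this normalization is an assumption, not stated above):

    import numpy as np

    def linear_stretch(gray, a=2.0, b=0.5):
        """Contrast enhancement Db = a*Da + b on a [0, 1]-normalized image."""
        x = gray.astype(np.float32) / 255.0
        y = np.clip(a * x + b, 0.0, 1.0)
        return (y * 255.0).astype(np.uint8)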
further, the invention uses OTSU algorithm to carry out binarization processing on the image to obtain a binarization image of the face image to be recognized, wherein the formula for carrying out binarization processing on the image is as follows:
g(t)=w0*w1*(u0-u1)*(u-u0)
u=w0*u0+w1*u1
wherein:
t is a segmentation threshold of the foreground and the background;
w0 is the proportion of foreground pixels in the image;
u0 is the average gray value of the foreground;
w1 is the proportion of background pixels in the image;
u1 is the average gray value of the background;
u is the overall average gray value of the image;
when the variance g(t) between foreground and background is maximum, the difference between foreground and background can be considered maximum; the gray level t at that moment is the optimal threshold, and the image is binarized with this threshold to obtain the binarized image of the face image to be recognized.
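A compact sketch of the threshold search is given below; it uses the standard OTSU between-class variance w0*w1*(u0-u1)^2 as the criterion g(t), the variable names follow the definitions above, and the implementation itself is only illustrative:

    import numpy as np

    def otsu_binarize(gray):
        """Exhaustive search for the threshold t that maximizes g(t), then binarize."""
        hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
        prob = hist / hist.sum()
        best_t, best_g = 0, -1.0
        for t in range(1, 256):
            w0, w1 = prob[:t].sum(), prob[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            u0 = (np.arange(t) * prob[:t]).sum() / w0          # mean gray of foreground
            u1 = (np.arange(t, 256) * prob[t:]).sum() / w1     # mean gray of background
            g = w0 * w1 * (u0 - u1) ** 2                       # between-class variance g(t)
            if g > best_g:
                best_g, best_t = g, t
        return (gray >= best_t).astype(np.uint8) * 255, best_t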
And S3, detecting the face external key point region in the binarized image by using the cascaded external key point detection model, and detecting the face internal key point region by using the facial feature detection model.
Further, the invention uses a cascade external key point detection model to detect the external key point area of the human face in the binary image, wherein the cascade external key point detection model comprises a human face detection layer and an external key point positioning layer;
the face detection layer comprises four convolutional layers, wherein: 1) the 1st convolutional layer consists of 64 convolution kernels of size 3 × 3 with a stride of 2; 2) the 2nd convolutional layer consists of 128 convolution kernels of size 3 × 3 with a stride of 1; 3) the 3rd convolutional layer consists of 256 convolution kernels of size 3 × 3 with a stride of 1; 4) the 4th convolutional layer consists of 600 convolution kernels of size 3 × 3 with a stride of 1; each convolutional layer is followed by a 2 × 2 max-pooling layer with a stride of 2.
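Since the experiments described later run on TensorFlow, the convolutional backbone of this face detection layer can be sketched with tf.keras as below; the input size and the two output heads (face bounding box and 12 facial-feature coordinates) are illustrative assumptions rather than values fixed above:

    import tensorflow as tf
    from tensorflow.keras import layers

    def face_detection_layer(input_shape=(96, 96, 1)):
        """Four conv layers (64/128/256/600 3x3 kernels), each followed by 2x2 max-pooling."""
        inp = tf.keras.Input(shape=input_shape)
        x = inp
        for filters, stride in [(64, 2), (128, 1), (256, 1), (600, 1)]:
            x = layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
        x = layers.Flatten()(x)
        face_box = layers.Dense(4, name="face_box")(x)        # face bounding box (assumed head)
        part_pts = layers.Dense(24, name="part_points")(x)    # 12 facial-feature points, (x, y) each
        return tf.keras.Model(inp, [face_box, part_pts])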
In the face detection process, the invention adopts the following function as the detection error function and trains the face detection layer iteratively in combination with the coordinates of the facial feature positions, so that the face position and the facial feature positions are detected simultaneously:
Figure BDA0002591069590000101
Figure BDA0002591069590000102
wherein:
λ is used to balance the face detection error errFace and the facial feature detection error errPart; the invention sets it to 1;
I is the set of 12 facial feature key points of the human face, comprising the left and right eyebrows, the left and right eye corners, the nose, and the left and right corners of the mouth;
(x, y) is a detection coordinate point;
(x ', y') is a true coordinate point;
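The exact detection error function appears only as an image in the published text; the sketch below therefore assumes a common squared-error form in which errPart sums the squared distances over the 12 key points and is balanced against errFace by λ = 1, purely for illustration:

    import tensorflow as tf

    def detection_loss(face_true, face_pred, pts_true, pts_pred, lam=1.0):
        """err = errFace + lambda * errPart (assumed squared-error form)."""
        err_face = tf.reduce_sum(tf.square(face_true - face_pred), axis=-1)   # face box error
        err_part = tf.reduce_sum(tf.square(pts_true - pts_pred), axis=-1)     # 12 key points, (x, y) each
        return tf.reduce_mean(err_face + lam * err_part)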
Further, the invention enlarges the face position located by the face detection layer by a factor of 1.2 about the face center, crops and reshapes the image to a size of 96 × 96 pixels, and uses this image as the input of the external key point positioning layer;
the external key point positioning layer comprises two layers: 1) layer 1: the reshaped face image and the face external contour point coordinates
Figure BDA0002591069590000103
are taken as input and passed sequentially through three convolutional layers to obtain the face external contour bounding box
Figure BDA0002591069590000104
the three convolutional layers consist of 64, 96 and 128 convolution kernels of size 5 × 5, respectively; 2) layer 2: the layer-1 network weights are fixed, the external contour points estimated by layer 1 are enlarged by a factor of 1.2, and the corresponding region is cropped from the original face image to obtain a new image together with the 17 external contour key points
Figure BDA0002591069590000105
which are taken as the input of this layer and passed sequentially through four convolutional layers to obtain a 34-dimensional vector representing the external contour key point region
Figure BDA0002591069590000106
the four convolutional layers consist of 64, 128, 256 and 600 convolution kernels of size 3 × 3, respectively;
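The two cascaded stages of the external key point positioning layer can be sketched in the same tf.keras style; the layer widths follow the text above, while the input handling and output heads are illustrative assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_stack(x, filter_list, kernel):
        for filters in filter_list:
            x = layers.Conv2D(filters, kernel, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(2)(x)
        return x

    def external_keypoint_layers():
        """Layer 1: coarse external contour box from 64/96/128 5x5 convs.
           Layer 2: 34-dim vector of the 17 external contour points from 64/128/256/600 3x3 convs."""
        img1 = tf.keras.Input(shape=(96, 96, 1))
        box = layers.Dense(4, name="contour_box")(layers.Flatten()(conv_stack(img1, [64, 96, 128], 5)))
        stage1 = tf.keras.Model(img1, box)

        img2 = tf.keras.Input(shape=(96, 96, 1))
        pts = layers.Dense(34, name="contour_points")(layers.Flatten()(conv_stack(img2, [64, 128, 256, 600], 3)))
        stage2 = tf.keras.Model(img2, pts)
        return stage1, stage2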
Furthermore, according to the face position located by the face detection layer, the face position coordinates are enlarged by a factor of 1.2 and the region is cropped and reshaped into an image of 96 × 96 pixels; the same network structure as the face detection layer is then adopted, combined with the facial feature position information, to locate the bounding box of the internal face contour and the facial feature positions at the same time; finally, the network weights are fixed, the located facial feature coordinates are enlarged by a factor of 1.5, six local images (the left and right eyebrows, the left and right eyes, the nose and the mouth) are cropped and scaled to 48 × 48 pixels, the spatial transformation parameters between the images are recorded, and the key point regions of the local images are located separately by a convolutional network identical to the external key point positioning layer.
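The region expansion and cropping used repeatedly above (a factor of 1.2 around the face, a factor of 1.5 around each facial feature, followed by resizing to a fixed size) can be sketched as a small helper; the box format, border clamping and interpolation choice are assumptions:

    import cv2

    def expand_and_crop(image, box, scale, out_size):
        """Expand a (cx, cy, w, h) box by `scale` about its center, crop and resize."""
        cx, cy, w, h = box
        w, h = w * scale, h * scale
        x0, y0 = int(max(cx - w / 2, 0)), int(max(cy - h / 2, 0))
        x1 = int(min(cx + w / 2, image.shape[1]))
        y1 = int(min(cy + h / 2, image.shape[0]))
        patch = image[y0:y1, x0:x1]
        return cv2.resize(patch, (out_size, out_size), interpolation=cv2.INTER_LINEAR)

    # e.g. face_patch = expand_and_crop(gray, face_box, 1.2, 96)
    #      eye_patch  = expand_and_crop(gray, left_eye_box, 1.5, 48)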
And S4, extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm.
Further, the invention utilizes an improved SIFT feature extraction algorithm to extract SIFT feature descriptors of the key point regions, and the extraction process of the improved SIFT feature descriptors comprises the following steps:
1) converting the image of each key point region into an image in scale space and, at the same time, building a Gaussian pyramid of the image, wherein the Gaussian pyramid comprises several octaves, each octave comprises several layers, the scale ratio between two adjacent layers of the same octave is k, and the scale factor between adjacent octaves is kσ²; the formula for converting the key point region image into an image in scale space is:
L(x,y,σ) = G(x,y,σ)*I(x,y)
G(x,y,σ) = exp(-(x²+y²)/(2σ²)) / (2πσ²)
wherein:
σ is the spatial scale factor;
* is the convolution operation between the Gaussian kernel function and the image;
I(x, y) is the key point region image;
G(x, y, σ) is the Gaussian kernel function;
L(x, y, σ) is the image in scale space;
2) detecting the extreme points of the scale space: every point in the image is traversed and checked for an extremum; the criterion is to compare the point with its 26 neighboring points in the same layer and the two adjacent layers, and if the value of the point is greater than the values of all its neighbors, the point is regarded as an extreme point;
3) the extraction of edge feature points is improved by combining the SIFT algorithm with Canny edge extraction; first, the gradient magnitude and gradient direction of the image are calculated for each scale layer by the following formulas:
edge(e) = sqrt(Ix² + Iy²)
dir(θ) = arctan(Iy / Ix)
wherein:
edge(e) is the gradient magnitude;
dir(θ) is the gradient direction;
Ix and Iy are the gradient values of the image I(x, y) in the x and y directions, respectively;
non-maximum suppression is then applied to the pixels according to the gradient results, and upper and lower thresholds are set to decide whether a pixel is a boundary point, thereby separating foreground and background; finally, the edge feature points extracted in this way are fused with the SIFT feature points, and points extracted by both methods are merged, so that more useful SIFT features are obtained;
4) the dimension reduction of the SIFT feature descriptors is realized by reducing the number of sub-pixel regions by re-dividing the pixel regions, and the specific dividing steps are as follows:
in the first step, the 4 × 4 square pixel region around a feature point is divided into 4 small square sub-regions, i.e. 2 × 2 sub-regions; in each small square, the information of the 8 directions contained in it is accumulated by gradient, so that a feature point descriptor of 4 × 8 = 32 dimensions is obtained;
in the second step, to supplement the 32-dimensional feature point descriptor, four 4 × 1 rectangular pixel strips close to the feature point are selected from the 4 × 4 square pixel region, and the information of the 8 directions contained in each rectangular strip is accumulated by gradient, giving another 4 × 8 = 32-dimensional feature point descriptor;
in the third step, the two descriptors are concatenated to obtain a new feature descriptor of 32 + 32 = 64 dimensions;
5) in order to ensure the illumination invariance of the generated 64-dimensional new descriptor, the invention carries out normalization processing on the generated SIFT feature descriptor:
d′i = di / sqrt(d1² + d2² + … + d64²)
wherein:
d is the SIFT feature descriptor;
d′ is the normalized SIFT feature descriptor;
di is the i-th component of the SIFT feature descriptor.
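A NumPy sketch of the 64-dimensional descriptor construction of steps 4) and 5) follows; it assumes the per-pixel gradient magnitudes and 8-bin orientation indices of the 4 × 4 neighborhood around a feature point are already available, and the choice of rows as the "4 × 1 rectangles close to the feature point" is likewise an assumption:

    import numpy as np

    def improved_sift_descriptor(mag, ori_bin):
        """mag, ori_bin: 4x4 arrays of gradient magnitude and integer orientation bin (0..7)."""
        def block_hist(m, o):
            h = np.zeros(8)
            np.add.at(h, o.ravel(), m.ravel())   # gradient accumulation over 8 directions
            return h

        # Step 1: four 2x2 square sub-regions -> 4 * 8 = 32 dimensions.
        squares = [block_hist(mag[r:r+2, c:c+2], ori_bin[r:r+2, c:c+2])
                   for r in (0, 2) for c in (0, 2)]
        # Step 2: four 4x1 strips close to the feature point -> another 4 * 8 = 32 dimensions.
        strips = [block_hist(mag[r, :], ori_bin[r, :]) for r in range(4)]
        # Step 3: concatenate to a 64-dimensional descriptor.
        d = np.concatenate(squares + strips)
        # Step 5: normalization for illumination invariance.
        return d / (np.linalg.norm(d) + 1e-12)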
And S5, performing face recognition by using a pre-trained F-GAN model according to the extracted SIFT feature descriptors.
Further, for the extracted SIFT feature descriptors, face recognition is carried out by using a pre-trained F-GAN model, wherein the F-GAN model is an improvement of the traditional GAN model and comprises a generation network G, a real/fake discrimination network D and a classification network C;
the generation network G is used for generating sample data according to the SIFT feature descriptors, and the real/fake discrimination network D is used for judging whether an input sample is real or generated; the G network uses deconvolution layers to generate images, and the D network uses convolution layers to extract features. Because the D network of the traditional GAN model is a binary classifier and cannot perform multi-class tasks, the F-GAN model constructs, in addition to the G network and the D network, a classification network C for distinguishing categories so as to classify images; the C network is a multi-class classifier and shares all convolutional layers with the D network. During training the three networks are trained adversarially at the same time; because they are optimized alternately and iteratively, in each iteration the authenticity of the input sample is judged once and its category is predicted once, so that authenticity judgment and classification proceed synchronously.
Compared with the traditional GAN model, the input of the generation network G contains, in addition to the random noise z, a constraint condition c that guides the sample generation process, so that the samples generated by the F-GAN model are controllable, i.e. specified sample data can be generated according to the condition; in a specific embodiment of the invention, when the constraint condition c is label data, the F-GAN model generates sample data of a specified class, i.e. the label of the generated sample is known;
after adversarial training, the sample data generated by the F-GAN model is very close to the real data while having its own style; if this data is used to supplement the training set and fed, together with the real samples, into the C network and the D network, the amount of training data is enlarged and the C network can learn more data characteristics, achieving a data augmentation effect.
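The three-network structure can be sketched in tf.keras as below; the layer counts, image size, noise dimension and number of classes are illustrative assumptions, the essential points being that G is built from deconvolution (transposed convolution) layers and conditioned on c in addition to the noise z, and that D and C share the convolutional trunk:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_f_gan(noise_dim=100, num_classes=15, img_shape=(28, 28, 1)):
        # Generator G: conditioned on noise z and constraint c, built from deconvolution layers.
        z = tf.keras.Input(shape=(noise_dim,))
        c = tf.keras.Input(shape=(num_classes,))
        h = layers.Dense(7 * 7 * 128, activation="relu")(layers.Concatenate()([z, c]))
        h = layers.Reshape((7, 7, 128))(h)
        h = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(h)
        fake = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(h)
        G = tf.keras.Model([z, c], fake, name="G")

        # Shared convolutional trunk for D (real/fake) and C (class), since D and C share all conv layers.
        img = tf.keras.Input(shape=img_shape)
        t = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(img)
        t = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(t)
        t = layers.Flatten()(t)
        real_fake = layers.Dense(1, activation="sigmoid", name="D_out")(t)
        category = layers.Dense(num_classes, activation="softmax", name="C_out")(t)
        D = tf.keras.Model(img, real_fake, name="D")
        C = tf.keras.Model(img, category, name="C")
        return G, D, C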
The following describes embodiments of the invention through an algorithm experiment that tests the processing method of the invention. The hardware test environment of the algorithm is deployed on the TensorFlow deep learning framework; the processor is an Intel(R) Core(TM) i5-7700 CPU with 8 cores, the graphics card is a GeForce GTX1040 with 8 GB of video memory, the development environment is Python 3.5, and the development tool is the Anaconda scientific computing library; the comparison algorithms are a linear classifier (1-layerNN), a K-nearest-neighbor algorithm with Euclidean (L2) distance, and an SVM algorithm.
In the algorithm experiment of the invention, the data set is taken from the Yale face database and comprises 70000 samples, of which 60000 are training samples and 10000 are test samples; each sample is a face image of 28 × 28 pixels. The face image training samples are fed respectively into the linear classifier (1-layerNN), the K-nearest-neighbor algorithm with Euclidean (L2) distance, the SVM algorithm and the face recognition method of the invention for training; the trained models are then used to recognize the test samples, the recognition results are compared with the original labels of the test samples, and the recognition accuracy on the test samples is obtained by statistics, i.e. the face image recognition accuracy of each algorithm.
According to the experimental results, the face image recognition accuracy of the linear classifier (1-layerNN) is 91.85%, that of the K-nearest-neighbor algorithm with Euclidean (L2) distance is 97.00%, that of the SVM algorithm is 92.32%, and that of the algorithm of the invention is 99.31%; compared with the comparison algorithms, the face recognition method proposed by the invention therefore achieves a higher face image recognition accuracy.
The invention also provides a face recognition system. Fig. 2 is a schematic diagram of an internal structure of a face recognition system according to an embodiment of the present invention.
In this embodiment, the face recognition system 1 at least includes a face image acquisition device 11, a face image processor 12, a face recognition device 13, a communication bus 14, and a network interface 15.
The face image acquisition device 11 may be a personal computer (PC), a terminal device such as a smartphone, a tablet computer or a portable computer, or a server.
The face image processor 12 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The face image processor 12 may in some embodiments be an internal storage unit of the face recognition system 1, for example a hard disk of the face recognition system 1. The face image processor 12 may also be an external storage device of the face recognition system 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the face recognition system 1. Further, the face image processor 12 may also include both an internal storage unit and an external storage device of the face recognition system 1. The face image processor 12 may be used not only to store application software installed in the face recognition system 1 and various types of data, but also to temporarily store data that has been output or is to be output.
The face recognition device 13 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip in some embodiments, and is used for running program codes stored in the face image processor 12 or processing data, such as face recognition program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the system 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the face recognition system 1 and for displaying a visual user interface.
Fig. 2 only shows the face recognition system 1 with the components 11-15, and it will be understood by those skilled in the art that the structure shown in Fig. 2 does not constitute a limitation of the face recognition system 1, which may comprise fewer or more components than shown, a combination of certain components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the face image processor 12 stores therein face recognition program instructions; the steps of the face recognition device 13 executing the face recognition program instructions stored in the face image processor 12 are the same as the implementation method of the face recognition method, and are not described here.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has stored thereon face recognition program instructions, where the face recognition program instructions are executable by one or more processors to implement the following operations:
acquiring a face image to be recognized, and converting the face image to be recognized into a gray-scale image by using a weighted proportion method;
carrying out noise reduction processing on the gray-scale image by using Gaussian filtering;
performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized;
Detecting a face external key point region in a binary image by using a cascaded external key point detection model;
detecting key point regions inside the face by using a facial feature detection model;
extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm;
and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A face recognition method, comprising:
acquiring a face image to be recognized, and converting the face image to be recognized into a gray-scale image by using a weighted proportion method;
carrying out noise reduction processing on the gray-scale image by using Gaussian filtering;
performing image contrast enhancement on the gray image by using a contrast enhancement algorithm based on linear stretching, and performing binarization processing on the image by using an OTSU algorithm to obtain a binarized image of the face image to be recognized;
detecting a face external key point region in a binary image by using a cascaded external key point detection model;
detecting key point regions inside the face by using a facial feature detection model;
extracting SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm;
and according to the extracted SIFT feature descriptors, carrying out face recognition by using a pre-trained F-GAN model.
2. The face recognition method according to claim 1, wherein the acquiring of the face image to be recognized and the converting of the face image to be recognized into a gray-scale image by using a weighted proportion method comprises:
converting the face image to be recognized into a gray-scale image by using the weighted proportion method, wherein the calculation formula of the weighted proportion method is as follows:
Oi = 0.30*Ri + 0.59*Gi + 0.11*Bi
wherein:
Ri, Gi, Bi are the three color components of the current pixel i;
Oi is the gray value of the current pixel i after conversion.
3. The face recognition method of claim 2, wherein the process of denoising the gray-scale image by Gaussian filtering comprises:
scanning each pixel in the image with a circular template, and replacing the value of the template's central pixel with the weighted average gray value of the pixels in the neighborhood determined by the template, wherein the weight is computed with the Gaussian function:
G(r) = exp(-r²/(2σ²)) / (2πσ²)^(n/2)
wherein:
σ is the standard deviation of the neighborhood pixel values; the larger its value, the more blurred the image;
n is the dimension of the template, set to 2;
r is the blur radius, i.e. the distance from a template element to the central pixel of the template.
4. A face recognition method as claimed in claim 3, wherein the binarizing process for the image by using OTSU algorithm comprises:
The formula for carrying out binarization processing on the image is as follows:
g(t)=w0*w1*(u0-u1)*(u-u0)
u=w0*u0+w1*u1
wherein:
t is a segmentation threshold of the foreground and the background;
w0 is the proportion of foreground pixels in the image;
u0 is the average gray value of the foreground;
w1 is the proportion of background pixels in the image;
u1 is the average gray value of the background;
u is the overall average gray value of the image;
when the variance g(t) between foreground and background is maximum, the difference between foreground and background is maximum; the gray level t at that moment is the optimal threshold, and the image is binarized with this threshold to obtain the binarized image of the face image to be recognized.
5. The face recognition method of claim 4, wherein the detecting the face external key point region by using the cascaded external key point detection models comprises:
the cascaded external key point detection model comprises a face detection layer and an external key point positioning layer;
the face detection layer comprises four convolution layers, wherein: 1) the 1 st convolutional layer is composed of 64 convolution kernels of 3 × 3, and the span is 2; 2) the 2 nd convolutional layer is composed of 128 3 × 3 convolutional kernels, and has a span of 1; 3) the 3 rd convolutional layer is composed of 256 3 × 3 convolutional kernels, and the span is 1; 4) the 4 th convolutional layer is composed of 600 3 × 3 convolutional kernels, the span is 1, the maximum pooling layer with 2 × 2 span is formed after each convolutional layer;
The following function is adopted as the detection error function; the face detection layer is trained iteratively in combination with the coordinates of the facial key point positions, so that the face position and the positions of the five sense organs are detected simultaneously:
err = errFace + λ·errPart
errFace = (x − x')² + (y − y')²,  errPart = Σi [(xi − xi')² + (yi − yi')²]
wherein:
λ is used to balance the face detection error errFace and the five sense organs detection error errPart; the present invention sets it to 1;
i indexes the 12 key points of the five sense organs, namely the left and right eyebrows, the left and right eye corners, the nose and the left and right corners of the mouth;
(x, y) is a detected coordinate point;
(x', y') is the corresponding true coordinate point;
the face position located by the face detection layer is expanded by a factor of 1.2 about the face center, and the result is cropped and reshaped into a 96 × 96 image that serves as the input of the external key point positioning layer;
the external key point positioning layer comprises two layers: 1) layer 1: the reshaped face image obtained above and the coordinates of the external contour points of the face are taken as input and passed in turn through three convolutional layers to obtain the external contour frame of the face; the three convolutional layers consist of 64, 96 and 128 convolution kernels of size 5 × 5, respectively; 2) layer 2: with the layer-1 network weights fixed, the external contour points estimated by layer 1 are expanded by a factor of 1.2 and the corresponding region is cropped from the original face image, giving a new image together with the 17 key points of the external contour; these are used as the input of this layer, which passes them in turn through four convolutional layers and outputs a 34-dimensional vector (the coordinates of the 17 key points) representing the external contour key point region; the four convolutional layers consist of 64, 128, 256 and 600 convolution kernels of size 3 × 3, respectively.
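A possible reading of the face detection layer in PyTorch is sketched below; the single input channel, ReLU activations, pooling stride, global average pooling and the two output heads (a 4-value face box and 24 values for the 12 key points) are assumptions that the claim does not specify:

```python
import torch
import torch.nn as nn

class FaceDetectionLayer(nn.Module):
    """Sketch of the four-convolution face detection layer of claim 5."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 600, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        self.face_head = nn.Linear(600, 4)    # face bounding box
        self.part_head = nn.Linear(600, 24)   # 12 facial key points (x, y)

    def forward(self, x):
        f = self.features(x).mean(dim=(2, 3))  # global average pooling
        return self.face_head(f), self.part_head(f)

def detection_loss(face_pred, face_true, part_pred, part_true, lam=1.0):
    """err = errFace + lambda * errPart, both as squared coordinate errors."""
    err_face = ((face_pred - face_true) ** 2).sum(dim=1).mean()
    err_part = ((part_pred - part_true) ** 2).sum(dim=1).mean()
    return err_face + lam * err_part
```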
6. The face recognition method of claim 5, wherein detecting the key point regions inside the face image by using the five sense organs detection model comprises:
according to the face position located by the face detection layer, the coordinates of that face position are enlarged by a factor of 1.2, and the region is cropped and reshaped into an image of 96 × 96 pixels;
the same network structure as the face detection layer is adopted and combined with the position information of the five sense organs, so that the bounding box of the internal contour of the face and the positions of the five sense organs are located at the same time;
with the network weights fixed, the located coordinates of the five sense organs are enlarged by a factor of 1.5, six local images (the left and right eyebrows, the left and right eyes, the nose and the mouth) are cropped out and scaled to 48 × 48 pixels, the spatial transformation parameters between the images are recorded, and the key point regions of the local images are located separately with the same convolutional network as the external key point positioning layer.
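The crop-expand-rescale operations used in claims 5 and 6 can be illustrated with a small helper; the (cx, cy, w, h) box format is an assumption made for the sake of the example:

```python
import numpy as np
import cv2

def expand_and_crop(image: np.ndarray, box, scale: float, out_size: int) -> np.ndarray:
    """Expand a (cx, cy, w, h) box about its center by `scale`, crop it and resize it."""
    cx, cy, w, h = box
    half_w, half_h = w * scale / 2.0, h * scale / 2.0
    x0, y0 = int(max(cx - half_w, 0)), int(max(cy - half_h, 0))
    x1, y1 = int(min(cx + half_w, image.shape[1])), int(min(cy + half_h, image.shape[0]))
    return cv2.resize(image[y0:y1, x0:x1], (out_size, out_size))

# Illustrative use following claim 6: the face box is expanded 1.2x and reshaped to 96 x 96,
# and each located five-sense-organs box is expanded 1.5x and reshaped to 48 x 48.
# face_96 = expand_and_crop(gray, face_box, 1.2, 96)
# mouth_48 = expand_and_crop(gray, mouth_box, 1.5, 48)
```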
7. The face recognition method of claim 6, wherein the process of extracting the SIFT feature descriptors of the key point regions by using the improved SIFT feature extraction algorithm comprises the following steps:
1) converting the image of the key point region into an image in scale space and constructing the Gaussian pyramid of the image, wherein the Gaussian pyramid comprises a plurality of octaves and each octave comprises a plurality of layers, the scale ratio between two adjacent layers of the same octave is k, and the scale factor between adjacent octaves is kσ²; the formula for converting the key point region image into an image in scale space is:
L(x,y,σ)=G(x,y,σ)*I(x,y)
G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
wherein:
sigma is a spatial scale factor;
* denotes the convolution operation between the Gaussian kernel function and the image;
I(x, y) is the key point region image;
G(x, y, σ) is the Gaussian kernel function;
L(x, y, σ) is the image in scale space;
2) detecting extreme points in the scale space: each point in the image is traversed once and checked for an extremum; the criterion is to compare the point with its 26 neighboring pixels in its own layer and the two adjacent layers, and if the value of the point is larger than the values of all of these neighboring pixels, the pixel is regarded as an extreme point;
3) the edge feature points are improved by an algorithm that combines the SIFT algorithm with Canny edge extraction; for each scale layer, the gradient magnitude and the gradient direction of the image are calculated by the following formulas:
Edge(e) = √(Ix² + Iy²)
Dir(θ) = arctan(Iy / Ix)
Wherein:
Edge(e) is the gradient magnitude;
Dir(θ) is the gradient direction;
Ix and Iy are the gradient values of the image I(x, y) in the x direction and the y direction, respectively;
non-maximum suppression is applied to the pixels according to the gradient calculation result, and upper and lower threshold values are set to judge whether a pixel is a boundary point, thereby separating the background from the foreground; finally, the feature points extracted in this way are fused with the SIFT feature points, and the points extracted by both are merged, so that more useful SIFT features are obtained;
4) the dimensionality of the SIFT feature descriptor is reduced by re-dividing the pixel region so that fewer sub-regions are used; the specific division steps are:
firstly, the 4 × 4 square pixel area around a feature point is divided into 2 × 2 = 4 small square sub-regions, and the information in the 8 directions contained in each small square is accumulated by gradient, giving a 4 × 8 = 32-dimensional feature point descriptor;
secondly, to supplement this 32-dimensional descriptor, 4 rectangular pixel areas of size 4 × 1 closest to the feature point are selected from the 4 × 4 square pixel area, and the information in the 8 directions contained in each rectangular area is accumulated by gradient, giving another 4 × 8 = 32-dimensional feature point descriptor;
thirdly, the two descriptors are combined, giving a new feature descriptor of 32 + 32 = 64 dimensions;
5) in order to ensure the illumination invariance of the generated 64-dimensional new descriptor, the invention carries out normalization processing on the generated SIFT feature descriptor:
d̂i = di / √(Σj dj²)
wherein:
d is the SIFT feature descriptor;
d̂ is the normalized SIFT feature descriptor;
di is the i-th component of the SIFT feature descriptor.
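One possible reading of the re-divided 64-dimensional descriptor and its normalization is sketched below; it assumes a 16 × 16 pixel patch around the key point (so that the 4 × 4 cell grid of standard SIFT becomes 2 × 2 cells of 8 × 8 pixels), treats the four rectangular areas as horizontal strips of the patch, and assumes gradient directions given in [0, 2π):

```python
import numpy as np

def improved_descriptor(mag: np.ndarray, ang: np.ndarray) -> np.ndarray:
    """64-D descriptor: 2x2 square sub-regions (32 dims) + 4 rectangular strips (32 dims), L2-normalized."""
    hist = lambda m, a: np.histogram(a, bins=8, range=(0.0, 2.0 * np.pi), weights=m)[0]

    # Part 1: 2 x 2 = 4 square sub-regions of the 16 x 16 patch -> 4 x 8 = 32 dimensions.
    square = [hist(mag[r:r + 8, c:c + 8], ang[r:r + 8, c:c + 8])
              for r in (0, 8) for c in (0, 8)]

    # Part 2: 4 rectangular strips -> another 4 x 8 = 32 dimensions.
    rect = [hist(mag[r:r + 4, :], ang[r:r + 4, :]) for r in (0, 4, 8, 12)]

    d = np.concatenate(square + rect)             # 32 + 32 = 64 dimensions
    return d / np.sqrt((d ** 2).sum() + 1e-12)    # normalization for illumination invariance
```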
8. The face recognition method of claim 7, wherein the F-GAN model is:
the F-GAN model is an improvement of the traditional GAN model and consists of a generation network G, a true and false distinguishing network D and a classification distinguishing network C;
the generation network G generates sample data from the SIFT feature descriptors, and the true-false distinguishing network D judges whether an input sample is real or generated; the G network uses deconvolution layers to generate images, while the D network uses convolution layers to extract features;
while constructing the G and D networks, the F-GAN model also constructs a classification network C that distinguishes categories and classifies the images; the C network is a multi-class classifier and shares all convolutional layers with the D network; during training the three networks are trained adversarially at the same time, and because they are optimized alternately and iteratively, every iteration both judges whether an input sample is real and predicts the class of that sample.
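A minimal PyTorch sketch of the three-network layout described for the F-GAN model; the layer sizes, the 32 × 32 image resolution, the latent code dimension and the number of classes are illustrative assumptions, and only the overall shape (G built from deconvolution layers, D and C sharing all convolutional layers) follows the claim:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G: generates an image from a SIFT-descriptor-based code with transposed convolutions."""
    def __init__(self, code_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(code_dim, 256, 4, 1, 0), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),       # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),        # 8x8 -> 16x16
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),          # 16x16 -> 32x32
        )
    def forward(self, code):
        return self.net(code.view(code.size(0), -1, 1, 1))

class DiscriminatorClassifier(nn.Module):
    """D and C share all convolutional layers; D scores real/fake, C predicts the class."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.d_head = nn.Linear(256 * 4 * 4, 1)            # real / fake score
        self.c_head = nn.Linear(256 * 4 * 4, num_classes)  # identity class
    def forward(self, img):
        f = self.shared(img).flatten(1)
        return self.d_head(f), self.c_head(f)
```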
9. A face recognition system, the system comprising:
the face image acquisition device is used for receiving a face image to be recognized;
the face image processor is used for converting the face image to be recognized into a gray-scale image by the weighted-proportion method and denoising the gray-scale image with Gaussian filtering; enhancing the contrast of the gray-scale image with a contrast enhancement algorithm based on linear stretching, and binarizing the image with the OTSU algorithm to obtain the binarized image of the face image to be recognized; detecting the external key point regions of the face in the binarized image with the cascaded external key point detection model, and detecting the internal key point regions of the face with the five sense organs detection model;
and the face recognition device is used for extracting the SIFT feature descriptors of the key point regions by using an improved SIFT feature extraction algorithm and recognizing the face by using a pre-trained F-GAN model according to the extracted SIFT feature descriptors.
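Purely as an illustration of how the three devices of this claim fit together, the sketch below composes the helper functions from the earlier claim sketches; the detector objects and the classify method of the F-GAN model are hypothetical interfaces, not defined by the patent:

```python
class FaceRecognitionSystem:
    """Illustrative composition of the acquisition, processing and recognition devices of claim 9."""
    def __init__(self, keypoint_detector, feature_detector, fgan_model):
        self.keypoint_detector = keypoint_detector  # cascaded external key point detection model
        self.feature_detector = feature_detector    # five sense organs detection model
        self.fgan = fgan_model                      # pre-trained F-GAN model

    def recognize(self, image_rgb):
        gray = to_gray(image_rgb)          # weighted-proportion grayscale conversion (sketch above)
        gray = gaussian_denoise(gray)      # Gaussian noise reduction (sketch above)
        # (contrast enhancement by linear stretching would be applied here)
        binary = binarize(gray)            # OTSU binarization (sketch above)
        outer = self.keypoint_detector(binary)   # external key point regions
        inner = self.feature_detector(binary)    # internal (five sense organs) key point regions
        # each region is assumed to provide gradient magnitude and direction arrays
        descriptors = [improved_descriptor(*region) for region in outer + inner]
        return self.fgan.classify(descriptors)   # class prediction from the C network
```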
10. A computer-readable storage medium having stored thereon face recognition program instructions executable by one or more processors to implement the steps of the face recognition method as claimed in any one of claims 1 to 8.
CN202010696174.1A 2020-07-20 2020-07-20 Face recognition method and system Withdrawn CN111860309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010696174.1A CN111860309A (en) 2020-07-20 2020-07-20 Face recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010696174.1A CN111860309A (en) 2020-07-20 2020-07-20 Face recognition method and system

Publications (1)

Publication Number Publication Date
CN111860309A true CN111860309A (en) 2020-10-30

Family

ID=73002346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010696174.1A Withdrawn CN111860309A (en) 2020-07-20 2020-07-20 Face recognition method and system

Country Status (1)

Country Link
CN (1) CN111860309A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112637564A (en) * 2020-12-18 2021-04-09 中标慧安信息技术股份有限公司 Indoor security method and system based on multi-picture monitoring
CN112613459A (en) * 2020-12-30 2021-04-06 深圳艾摩米智能科技有限公司 Method for detecting face sensitive area
CN112801020A (en) * 2021-02-09 2021-05-14 福州大学 Pedestrian re-identification method and system based on background graying
CN112801020B (en) * 2021-02-09 2022-10-14 福州大学 Pedestrian re-identification method and system based on background graying
CN113011356A (en) * 2021-03-26 2021-06-22 杭州朗和科技有限公司 Face feature detection method, device, medium and electronic equipment
CN113269155A (en) * 2021-06-28 2021-08-17 苏州市科远软件技术开发有限公司 End-to-end face recognition method, device, equipment and storage medium
CN113269155B (en) * 2021-06-28 2024-07-16 苏州市科远软件技术开发有限公司 End-to-end face recognition method, device, equipment and storage medium
CN113762205A (en) * 2021-09-17 2021-12-07 深圳市爱协生科技有限公司 Human face image operation trace detection method, computer equipment and readable storage medium
CN114638968A (en) * 2022-01-10 2022-06-17 中国人民解放军国防科技大学 Method and device for extracting geometric structure and key points of space target
CN114638968B (en) * 2022-01-10 2024-01-30 中国人民解放军国防科技大学 Method and device for extracting geometric structure and key points of space target
CN118196872A (en) * 2024-04-08 2024-06-14 陕西丝路众合智能科技有限公司 Face accurate recognition method under hybrid scene

Similar Documents

Publication Publication Date Title
CN111860309A (en) Face recognition method and system
US8873856B1 (en) Determining a class associated with an image
Ye et al. Text detection and recognition in imagery: A survey
Pan et al. A robust system to detect and localize texts in natural scene images
US8452108B2 (en) Systems and methods for image recognition using graph-based pattern matching
CN110232713B (en) Image target positioning correction method and related equipment
CN103390164B (en) Method for checking object based on depth image and its realize device
US9020248B2 (en) Window dependent feature regions and strict spatial layout for object detection
CN101763507B (en) Face recognition method and face recognition system
US20160026899A1 (en) Text line detection in images
Yang et al. A framework for improved video text detection and recognition
US9489566B2 (en) Image recognition apparatus and image recognition method for identifying object
WO2022105521A1 (en) Character recognition method and apparatus for curved text image, and computer device
EP2434431A1 (en) Method and device for classifying image
US9042601B2 (en) Selective max-pooling for object detection
JP2008310796A (en) Computer implemented method for constructing classifier from training data detecting moving object in test data using classifier
US9025882B2 (en) Information processing apparatus and method of processing information, storage medium and program
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
CN110674685B (en) Human body analysis segmentation model and method based on edge information enhancement
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
US9020198B2 (en) Dimension-wise spatial layout importance selection: an alternative way to handle object deformation
CN110717497A (en) Image similarity matching method and device and computer readable storage medium
CN111951283A (en) Medical image identification method and system based on deep learning
Kobchaisawat et al. Thai text localization in natural scene images using convolutional neural network
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20201030)