CN111783748A - Face recognition method and device, electronic equipment and storage medium


Info

Publication number: CN111783748A (application CN202010808939.6A); granted as CN111783748B
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: face, preset, matrix, image, dimensional
Inventors: 王朋远, 彭菲, 黄磊, 张建
Current and original assignee: Hanwang Technology Co Ltd
Application filed by Hanwang Technology Co Ltd; priority to CN202010808939.6A
Legal status: Granted; Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Abstract

The application discloses a face recognition method, belonging to the technical field of face recognition, which is beneficial to improving the accuracy and reliability of face recognition. The method comprises the following steps: acquiring a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a target face; determining the pixel position and two-dimensional image content information of each preset face key point in the two-dimensional face image, and determining the three-dimensional space coordinates of each preset face key point in the three-dimensional face image; constructing a corresponding node for each preset face key point, and constructing an undirected graph by expressing, through undirected edges, the adjacency relations between the nodes corresponding to adjacent preset face key points; extracting face features through a pre-trained graph convolutional neural network based on the node feature matrix of the undirected graph, a preset adjacency matrix, and the identity matrix of the preset adjacency matrix; and performing face recognition on the target face according to the extracted face features.

Description

Face recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of face recognition technologies, and in particular, to a face recognition method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
Face recognition is widely applied in fields such as attendance checking, access control, and security. In order to improve the ability of face recognition to distinguish a living face and to improve its reliability, the prior art combines a face depth image collected by a depth image acquisition device with a face plane image collected by a visible light image acquisition device to perform face recognition comprehensively. For example, the face key points in the face depth image are determined by combining the face key point information in the face plane image; face liveness detection is then performed according to the depth information of the face key points in the face depth image; and when the collected face plane image is determined to be a living face image, face comparison and identification are further performed according to the face features extracted from the face plane image. Because three-dimensional face features in the face depth image are difficult to extract, schemes that perform face recognition using three-dimensional face features are rarely adopted in the prior art.
Disclosure of Invention
The application provides a face recognition method which is beneficial to improving the accuracy and reliability of face recognition.
In order to solve the above problem, in a first aspect, an embodiment of the present application provides a face recognition method, including:
acquiring a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a target face;
determining the pixel position and two-dimensional image content information of each preset face key point in the two-dimensional face image, and determining the three-dimensional space coordinates of each preset face key point in the three-dimensional face image;
constructing a corresponding node for each preset face key point, and constructing an undirected graph by expressing, through undirected edges, the adjacency relations between the nodes corresponding to adjacent preset face key points; the undirected graph comprises: a preset adjacency matrix representing the adjacency relations between the nodes, and a node feature matrix formed by the key point information stored by each node in the undirected graph;
extracting graph features through a pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix;
and performing face recognition on the target face by using the features obtained by the graph feature extraction of the graph convolutional neural network as face features.
In a second aspect, an embodiment of the present application provides a face recognition apparatus, including:
a face image acquisition module, configured to acquire a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a target face;
a face key point information acquisition module, configured to determine the pixel position and two-dimensional image content information of each preset face key point in the two-dimensional face image, and to determine the three-dimensional space coordinates of each preset face key point in the three-dimensional face image;
a face key point undirected graph construction module, configured to construct a corresponding node for each preset face key point and to construct an undirected graph by expressing, through undirected edges, the adjacency relations between the nodes corresponding to adjacent preset face key points; the undirected graph comprises: a preset adjacency matrix representing the adjacency relations between the nodes, and a node feature matrix formed by the key point information stored by each node in the undirected graph;
a graph feature extraction module, configured to perform graph feature extraction through a pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix;
and a face recognition module, configured to perform face recognition on the target face by using the features obtained by the graph feature extraction of the graph convolutional neural network as face features.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program that is stored in the memory and can be run on the processor, and when the processor executes the computer program, the face recognition method according to the embodiment of the present application is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the face recognition method disclosed in the embodiments of the present application.
The embodiment of the application discloses a face recognition method, which comprises: acquiring a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a target face; determining the pixel position and two-dimensional image content information of each preset face key point in the two-dimensional face image, and determining the three-dimensional space coordinates of each preset face key point in the three-dimensional face image; constructing a corresponding node for each preset face key point, and constructing an undirected graph by expressing, through undirected edges, the adjacency relations between the nodes corresponding to adjacent preset face key points, where the undirected graph comprises a preset adjacency matrix representing the adjacency relations between the nodes and a node feature matrix formed by the key point information stored by each node; extracting graph features through a pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix; and performing face recognition on the target face by using the features obtained by the graph feature extraction as face features, which is favorable to improving the accuracy and reliability of face recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
Fig. 1 is a flowchart of a face recognition method according to a first embodiment of the present application;
fig. 2 is a schematic diagram illustrating distribution of key points of a face in the face recognition method according to the first embodiment of the present application;
fig. 3 is a schematic diagram of an adjacency relation of key points of a face in the face recognition method according to the first embodiment of the present application;
fig. 4 is a schematic diagram illustrating a principle of establishing a three-dimensional coordinate system in the face recognition method according to the first embodiment of the present application;
fig. 5 is a flowchart of a face recognition method according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of a face recognition apparatus according to a second embodiment of the present application;
fig. 7 is a second schematic structural diagram of a face recognition apparatus according to a second embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
As shown in fig. 1, the face recognition method disclosed in the embodiment of the present application includes steps 110 to 150.
And step 110, acquiring a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for the target face.
In some embodiments of the present application, the different image acquisition devices may be a binocular camera or a structured light camera on an electronic device, or may be two independent cameras. One of the different image acquisition devices is used for collecting a three-dimensional face image (such as a face depth image), and the other is used for collecting a two-dimensional face image (such as a visible light face image or an infrared face image). The different image acquisition devices synchronously collect a two-dimensional face image and a three-dimensional face image of the target face. For example, the different image acquisition devices may be a structured light image acquisition device, by which a visible light image (i.e., the two-dimensional face image) and a depth image (i.e., the three-dimensional face image) of the face to be recognized are collected simultaneously.
Step 120, determining the pixel position and the two-dimensional image content information of each preset face key point in the two-dimensional face image, and determining the three-dimensional space coordinate of each preset face key point in the three-dimensional face image.
In some embodiments of the present application, determining the pixel position and two-dimensional image content information of each preset face key point in the two-dimensional face image, and determining the three-dimensional space coordinates of each preset face key point in the three-dimensional face image, includes: performing image calibration on the two-dimensional face image and the three-dimensional face image; detecting the preset face key points corresponding to the target face in the calibrated two-dimensional face image, and determining the pixel position and two-dimensional image content information of each preset face key point; and determining, according to the pixel positions of the preset face key points, the three-dimensional space coordinates of each preset face key point in the calibrated three-dimensional face image.
Because different image acquisition devices have a physical position difference relative to the collected object, the image contents of the two-dimensional face image and the three-dimensional face image of the target face collected by the different devices have a certain visual difference. In specific implementation, image calibration is therefore first performed on the collected original two-dimensional face image and three-dimensional face image, based on the relevant camera parameters of the image acquisition devices, according to the imaging principle and the installation positions of the different image acquisition devices. An image calibration method in the prior art can be adopted to calibrate the two-dimensional face image and the three-dimensional face image, and details are not repeated in the embodiments of the present application.
In the embodiment of the application, the image calibration method of the two-dimensional face image and the three-dimensional face image is not limited.
After the calibrated two-dimensional face image is acquired, the face key points in the two-dimensional face image are detected using a method in the prior art. In some embodiments of the present application, the two-dimensional face image may be a visible light face image or an infrared face image; for two-dimensional face images collected under different spectral conditions, the face key points are determined using a corresponding prior-art face key point detection method.
Taking the two-dimensional face image as a visible light face image as an example, the Dlib library (an open-source library for machine learning) can be adopted to perform face detection and face key point alignment on the two-dimensional face image, so as to determine the pixel positions of the face key points in the two-dimensional face image. Different face detection algorithms yield different numbers of face key points; in a specific application, preset face key points can be selected for subsequent face recognition according to the specific face recognition requirements. In some embodiments of the present application, 104 of the detected face key points, as shown in fig. 2, may be selected as the preset face key points for subsequent face recognition.
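By way of illustration, the following Python sketch shows this detection flow using Dlib's stock 68-point shape predictor; the model file name and image path are assumptions, and the patent's 104-point detector would be substituted in practice:

```python
# A minimal sketch of 2D face key point detection with Dlib.
import dlib
import cv2

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed path

img = cv2.imread("face_rgb.png")                 # calibrated 2D face image (assumed path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = detector(gray, 1)                        # detect face rectangles
shape = predictor(gray, faces[0])                # align key points on the first face

# Pixel position plus 2D image content (RGB color value) per key point.
keypoints = []
for p in shape.parts():
    b, g, r = img[p.y, p.x]
    keypoints.append((p.x, p.y, int(r), int(g), int(b)))
```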
In some embodiments of the application, when face key points are detected on a two-dimensional face image, pixel positions of preset face key points in the two-dimensional face image can be acquired, and meanwhile, content information of the two-dimensional image of each preset face key point can also be acquired. For example, for a visible light face image, a color value (such as a value of an RGB color space) and a transparency of each preset face key point may be obtained. For another example, for an infrared light face image, an infrared brightness value of each preset face key point may be acquired.
Because the pixel positions of the two-dimensional face image and the three-dimensional face image which are subjected to the image calibration processing are in one-to-one correspondence, the pixel position of each face key point in the two-dimensional face image obtained by detection is also the pixel position of the corresponding face key point in the three-dimensional face image obtained by the image calibration processing. That is, the pixel position of each face key point in the two-dimensional face image can be used as the pixel position of the corresponding face key point in the three-dimensional face image.
Furthermore, three-dimensional space coordinates can be obtained according to the pixel positions of the corresponding face key points in the three-dimensional face image. For example, the three-dimensional space coordinates of the face key points are determined by reconstructing a face point cloud: each face key point in the point cloud represents the position in the real scene of the corresponding pixel position in the two-dimensional face image, expressed in a right-hand coordinate system established with the optical center of the image acquisition device as the origin. In a coordinate system constructed based on the imaging principle shown in fig. 4, the three-dimensional space coordinates of the corresponding face key points can be obtained by converting the pixel positions of the preset face key points in the three-dimensional image according to the following formula 1:
$$x = \frac{(col - opt_x)\cdot depth}{FocalLength},\qquad y = \frac{(row - opt_y)\cdot depth}{FocalLength},\qquad z = depth \tag{1}$$

wherein $(col, row)$ represents the pixel position of a point in the three-dimensional face image, $depth$ represents the pixel value at pixel position $(col, row)$ in the three-dimensional face image, $opt = (opt_x, opt_y)$ (the optical center) and $FocalLength$ are taken from the parameter matrix of the image acquisition device of the three-dimensional face image, and $(x, y, z)$ are the three-dimensional space coordinates of the face key point at pixel position $(col, row)$ in the three-dimensional face image.
In some embodiments of the present application, other methods may also be used to perform conversion from pixel positions of face key points in a three-dimensional face image to three-dimensional space coordinates, which are not listed in this embodiment. The specific method adopted for converting the pixel position of the face key point in the three-dimensional face image into the three-dimensional space coordinate is not limited.
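As a concrete illustration, a minimal sketch of the pixel-to-coordinate conversion of formula 1; the intrinsic values in the example call are made-up assumptions:

```python
import numpy as np

def pixel_to_3d(col, row, depth, opt_x, opt_y, focal_length):
    """Back-project a depth-image pixel to camera-space coordinates (formula 1)."""
    x = (col - opt_x) * depth / focal_length
    y = (row - opt_y) * depth / focal_length
    z = depth
    return np.array([x, y, z])

# Example with assumed intrinsics (optical center and focal length in pixels).
p3d = pixel_to_3d(col=320, row=240, depth=560.0,
                  opt_x=319.5, opt_y=239.5, focal_length=575.0)
```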
Step 130, constructing a corresponding node for each preset face key point, and constructing an undirected graph by representing, through undirected edges, the adjacency relations between the nodes corresponding to adjacent preset face key points.
Wherein the undirected graph comprises: a preset adjacency matrix representing the adjacency relation between the nodes and a node characteristic matrix formed by key point information stored by each node in the undirected graph. And the elements in the preset adjacency matrix represent whether an adjacency relation exists between any two nodes, namely whether the face key points corresponding to the two nodes are adjacent. The preset adjacency matrix is used for storing data of the relationship (such as whether edges are connected) of each node in the undirected graph (for example, the value of an edge element between nodes corresponding to two adjacent face key points is 1, and the value of an edge element between nodes corresponding to two non-adjacent face key points is 0).
In some embodiments of the present application, in order to extract the face features in the two-dimensional face image and the three-dimensional face image, the data features of the face key points in the two images and the relationships between them are expressed by constructing an undirected graph. Specifically, one node is constructed from the data of each preset face key point, so that the preset face key points correspond one-to-one to the nodes of the undirected graph; the nodes corresponding to face key points having an adjacency relation (i.e., adjacent key points) are then connected through undirected edges, yielding an undirected graph composed of nodes and edges. For example, for face key points 0 and 1, two nodes d0 and d1 can be created; assuming that face key points 0 and 1 are adjacent, an undirected edge L01 is created connecting nodes d0 and d1. That is, in the undirected graph, there is an edge with a value of 1 between nodes d0 and d1.
In some embodiments of the present application, the three-dimensional face key points are first connected within each facial region, and the facial regions having adjacency relations are then connected, so that all the detected face key points are connected into a complete graph, yielding the undirected graph of the face. The preset adjacency matrix indicates whether an adjacency relation exists between any two nodes of the undirected graph. For example, for 104 preset face key points, whether an adjacency relation exists between any two of them can be represented by a 104 × 104 matrix. Taking the preset adjacency matrix A as an example, the matrix element A_ij in A indicates whether an adjacency relation exists between the i-th face key point and the j-th face key point: A_ij = 1 denotes that an adjacency relation exists between the i-th and j-th face key points, and A_ij = 0 denotes that no adjacency relation exists between them (i.e., the i-th face key point is not related to the j-th face key point), where i and j are integers.
The generation process of the preset adjacency matrix a is exemplified below.
In some embodiments of the present application, according to the five sense organs and the distribution of the preset face key points, all the detected preset face key points are divided into 7 facial regions in total: the cheek (the 33 face key points numbered 0 to 32 in fig. 2), the left eyebrow (the 9 face key points numbered 33 to 41 in the upper left corner of fig. 2), the right eyebrow (the 9 face key points numbered 42 to 50 in the upper right corner of fig. 2), the left eye (the 9 face key points numbered 66 to 74 below the left eyebrow in fig. 2), the right eye (the 9 face key points numbered 75 to 83 below the right eyebrow in fig. 2), the nose (the 15 face key points numbered 51 to 65 in the middle of fig. 2), and the mouth (the 20 face key points numbered 84 to 103 in the lower middle region of fig. 2).
In some embodiments of the present application, the face key points within each of the above facial regions are connected, yielding the adjacency relations between some of the face key points, and the values of the corresponding matrix elements in the preset adjacency matrix are set according to these adjacency relations.
For example, the 33 face key points of the cheek are connected in sequence from 0 to 32; that is, among the 33 cheek key points, 0 and 1, 1 and 2, ..., and 31 and 32 respectively have adjacency relations. In the preset adjacency matrix, the edge connecting the nodes corresponding to face key points having an adjacency relation is set to a relation value of 1. For example, the relation value of the edge formed by connecting the two nodes corresponding to face key points 0 and 1 is set to 1; correspondingly, the matrix element A_{0,1} of the preset adjacency matrix A is 1. As another example, the relation value of the edge formed by connecting the two nodes corresponding to face key points 0 and 3 is set to 0; correspondingly, the matrix element A_{0,3} of the preset adjacency matrix A is 0.
As another example, in each of the left and right eyebrow regions, the adjacent face key points among the 9 face key points are connected in sequence and closed to form a ring. If face key points 33 and 34, 34 and 35, ..., and 41 and 33 in fig. 3 are respectively adjacent, the nodes corresponding to each pair of adjacent face key points are connected by an edge with a relation value of 1; for example, the matrix element A_{33,34} of the preset adjacency matrix A is 1. If face key points 33 and 35 in fig. 3 are not adjacent, the nodes corresponding to them are connected by an edge with a relation value of 0; that is, the matrix element A_{33,35} of the preset adjacency matrix A is 0.
As another example, the adjacent face key points among the 8 edge key points of the left eye region (labeled 66 to 73 in fig. 3) are connected in sequence, and the adjacent face key points among the 8 edge key points of the right eye region (labeled 75 to 82 in fig. 3) are connected in sequence, each closed to form a ring; the pupil key points (labeled 74 and 83) are then respectively connected to the 8 edge key points of the left eye region and the 8 edge key points of the right eye region.
The face key points on the outer edge of the nose are connected in sequence and closed into a ring, and the key points of the nose tip and nose bridge regions (51, 52, 53, 54 and 60 in fig. 3) are then connected in sequence.
And for 20 human face key points in the mouth region, sequentially connecting the human face key points at the outer edge and the human face key points at the inner edge respectively, and closing and connecting to form a ring to form a double-ring shape.
In some embodiments of the application, for the face key points which are not in the same facial region, the face key points which are closer to each other may also be considered to have an adjacent relationship, and therefore, the face key points which are not in the same facial region and are closer to each other are sequentially connected into a line, so that all the preset face key points are connected into a complete graph, and the distribution information of the face key points is more comprehensively embodied.
For example, the external eyebrow corner, the external eye corner, and the face key point on the uppermost portion of the cheek are considered to have an adjacent relationship, and the nodes corresponding to the external eyebrow corner and the external eye corner, and the nodes corresponding to the face key point on the uppermost portion of the cheek are connected by the edges having a relationship value of 1, respectively. For another example, the face key points at the inner eyebrow angle, the inner canthus and the uppermost part of the nose bridge are considered to have an adjacent relationship, and the nodes corresponding to the inner eyebrow angle and the inner canthus and the nodes corresponding to the face key points at the inner canthus and the uppermost part of the nose bridge are respectively connected by the edges with the relationship value of 1. For example, it is considered that key points of the face on the central axis of the face, such as the eyebrow point, the nose apex, the nose bridge point, the nose tip point, the upper lip midpoint, the mouth midpoint, the lower lip midpoint, and the lower jaw midpoint, have an adjacent relationship in this order, and the nodes corresponding to the key points of the face having the adjacent relationship on the central axis of the face are connected by the edges having a relationship value of 1.
In other embodiments of the present application, adjacent face key points may also be determined in other manners. For example, regarding the above-mentioned face key points on the central axis of the face, it is considered that the face key points on the central axis of the face, such as the nasal apex, the nasal bridge point, the nasal cusp point, the upper lip midpoint, the mouth midpoint, the lower lip midpoint, and the lower jaw midpoint, have an adjacent relationship in order. Because the process of extracting the features by the graph neural network has strong robustness, the prediction result of the neural network cannot be influenced by only changing the relation value of one edge (namely, only changing the identification of the adjacent relation of a few face key points).
For two face key points with an adjacent relation (namely the two face key points which are connected), the nodes corresponding to the two face key points are connected through an edge with a relation value of 1; for two face key points without adjacency relation (i.e. the two face key points which are not connected), the nodes corresponding to the two face key points are connected by an edge with a relation value of 0. Accordingly, the relationship value of the edge connecting two nodes may be represented by the value of the corresponding matrix element of the preset adjacency matrix.
According to the method, the preset adjacency matrix of the undirected graph of the face can be obtained.
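As an illustration, a minimal sketch of assembling such a 104 × 104 preset adjacency matrix from an edge list follows; the edge list shown is a truncated assumption covering only the cheek chain and eyebrow rings, not the patent's full connection rule:

```python
import numpy as np

N = 104  # number of preset face key points

# Truncated, assumed edge list following the connection rules above.
edges = [(i, i + 1) for i in range(0, 32)]                    # cheek chain: 0-1 ... 31-32
edges += [(i, i + 1) for i in range(33, 41)] + [(41, 33)]     # left eyebrow ring
edges += [(i, i + 1) for i in range(42, 50)] + [(50, 42)]     # right eyebrow ring
# ... remaining regions and cross-region connections omitted.

A = np.zeros((N, N), dtype=np.float32)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0   # undirected edge: relation value 1 in both directions

I = np.eye(N, dtype=np.float32)  # identity matrix of A, added as self-loops in step 140
```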
In some embodiments of the present application, redefining the face key points having the adjacency relationship will result in different adjacency matrices. In a face recognition application, it is preferable that the adjacency matrix of the undirected graph corresponding to each face is the same. That is, in a face recognition application, the adjacency matrix may be generated in advance according to the definition and connection rule of the face key points having the adjacency relationship.
On the other hand, for the graph structure, the data comprises two parts, nodes and edges, and the data stored by the nodes of the undirected graph generated in the previous step is represented by a node feature matrix. The key point information stored by each node includes the three-dimensional space coordinates and the two-dimensional image content information of the preset face key point corresponding to the node; the two-dimensional image content information comprises any one or more of the following items: color value, infrared brightness, transparency. Take the node feature matrix X as an N × d matrix, where N and d are natural numbers greater than 1: N is the number of preset face key points, and d is the dimension of the key point information of each preset face key point, where the key point information includes the three-dimensional space coordinates and the two-dimensional image content information. Specifically, for the present embodiment, N equals 104 and d equals 6; that is, the key point information may be represented as (r, g, b, x, y, z), where r, g, b are the color values of the face key point in the two-dimensional face image, and x, y, z are the three-dimensional space coordinates of the face key point in the three-dimensional face image.
In other embodiments of the present application, the key point information may be represented as (r, g, b, a, x, y, z), where r, g, b are color values of the face key points in the two-dimensional face image, a is a transparency value of the face key points in the two-dimensional face image, and x, y, z are three-dimensional spatial coordinates of the face key points in the three-dimensional face image.
In other embodiments of the present application, if the two-dimensional face image is an infrared image, the key point information may be represented as (i, x, y, z), where i is an infrared brightness value of the face key point in the two-dimensional face image, and x, y, z is a three-dimensional spatial coordinate of the face key point in the three-dimensional face image.
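A minimal sketch of assembling the N × d node feature matrix for the (r, g, b, x, y, z) case; the input lists are assumed to hold the values gathered in step 120:

```python
import numpy as np

def build_node_features(colors_rgb, coords_3d):
    """Stack per-keypoint (r, g, b) and (x, y, z) into the N x 6 node feature matrix X."""
    X = np.hstack([np.asarray(colors_rgb, dtype=np.float32),   # N x 3 color values
                   np.asarray(coords_3d, dtype=np.float32)])   # N x 3 space coordinates
    assert X.shape == (104, 6)
    return X
```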
And step 140, extracting the graph features through a pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix.
In some embodiments of the present application, the graph feature extraction performed through the pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix includes: performing spatial information aggregation on the key point information in the node feature matrix of the undirected graph, based on the preset adjacency matrix of the undirected graph and the identity matrix of the preset adjacency matrix, through the pre-trained graph convolutional neural network.
In an image recognition application, the input object is a picture, a standard two-dimensional structure, from which features are extracted by translating a convolution kernel over the picture and performing convolution. Because a picture has translation invariance, the internal structure of a small window is the same no matter where the window is moved on the picture; a CNN can therefore only process data in Euclidean space. The structure of a graph, by contrast, is an irregular topological structure that can be regarded as data of infinite dimension: it has no translation invariance, and the surrounding structure of each node is unique, so neither a CNN nor an RNN (recurrent neural network) can process data with this structure. A graph convolutional neural network (GCN), through careful design, can extract features from graph data and use them for node classification, graph classification, edge prediction, and the like. In specific implementation, a graph convolutional neural network is therefore adopted to extract the face features in the undirected graph obtained from the face images.
The input data of the graph convolutional neural network in the embodiment of the present application comprises two parts: the node feature matrix and the adjacency matrix of the undirected graph (i.e., the aforementioned preset adjacency matrix). For example, the graph convolutional neural network extracts features by forward propagation through the following formula 2:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right) \tag{2}$$

In the above formula 2, $\tilde{A} = A + I$ is the sum matrix obtained by adding the adjacency matrix $A$ of the undirected graph G and the identity matrix $I$ of the adjacency matrix $A$. $\tilde{D}$ is a degree matrix, computed as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; it is a diagonal matrix, and each element on the diagonal is the degree of the corresponding node (e.g., the node corresponding to matrix element $A_{ij}$ in the adjacency matrix $A$) plus 1. The role of $\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}$ is to normalize $\tilde{A}$: because the matrix $\tilde{A}$ is not normalized, it easily produces unpredictable problems in the calculation process, while after normalization the values of the elements all lie between 0 and 1. $H^{(l)}$ is the input feature of hidden layer $l$; for the input layer, $H^{(l)}$ equals the original node feature matrix $X$. Multiplying by the normalized adjacency matrix realizes the aggregation of spatial information. $W^{(l)}$ is the weight matrix of the $l$-th hidden layer, whose values are determined by pre-training; $\sigma$ is a nonlinear activation function.
In the embodiment of the present application, adding the identity matrix to the preset adjacency matrix adds a self-connection for each node in the undirected graph. When forward propagation is performed according to formula 2 without the identity matrix, a single graph convolution layer extracts, for the face key points in fig. 2, only the data features of the 8 nodes around the pupil for the pupil's corresponding node, ignoring the data features of the node itself. After the identity matrix is added, the graph convolution extracts the data features of both the neighborhood nodes and the pupil node itself.
The input of the 1st hidden layer of the graph convolutional neural network is the original node feature matrix X. The network aggregates information over a first-order neighborhood after each hidden layer, so after $l$ hidden layers, $l$-order neighborhood features have been fused for each node. The weight matrix $W^{(l)}$ has dimension $F_l \times F_{l+1}$; that is, the second dimension of the weight matrix determines the number of features of the next layer, and the network trains as many weight matrices $W$ as it has hidden layers. The activation function may be, for example, Sigmoid or ReLU.
In some embodiments of the present application, a 5-layer graph convolutional neural network may be employed, with network structure GCN×5 + FC1 + FC2, where GCN denotes a graph convolution layer and FC1 and FC2 are fully connected layers. After the 5 GCN layers and FC1, an output vector of a specified dimension (for example, 1024) can be obtained, and this output vector can be used as the face feature of the face corresponding to the input undirected graph.
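The forward pass above can be illustrated with a minimal PyTorch sketch of one GCN layer and the GCN×5 + FC1 + FC2 stack; the layer widths and the single-graph (unbatched) interface are assumptions, since the text fixes only the 104 × 6 input, the 1024-dimensional FC1 output, and the C-way FC2:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = sigma(D~^-1/2 (A + I) D~^-1/2 H W)  (formula 2)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)

    def forward(self, H, A):
        A_tilde = A + torch.eye(A.size(0))            # add self-loops (A + I)
        d = A_tilde.sum(dim=1)                        # diagonal of D~: degree + 1
        D_inv_sqrt = torch.diag(d.pow(-0.5))
        A_norm = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # normalized adjacency
        return torch.relu(A_norm @ self.weight(H))    # aggregate neighbors, transform

class FaceGCN(nn.Module):
    def __init__(self, dims=(6, 64, 128, 256, 512, 512), num_classes=50000):  # assumed widths
        super().__init__()
        self.layers = nn.ModuleList(
            GCNLayer(dims[i], dims[i + 1]) for i in range(5))
        self.fc1 = nn.Linear(104 * dims[-1], 1024)    # FC1: 1024-d face feature
        self.fc2 = nn.Linear(1024, num_classes)       # FC2: class logits (training only)

    def forward(self, X, A):
        H = X                                          # 104 x 6 node feature matrix
        for layer in self.layers:
            H = layer(H, A)
        feat = self.fc1(H.flatten(start_dim=0))       # 1024-d face feature vector
        return feat, self.fc2(feat)                   # feature + classification logits
```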
In some embodiments of the present application, before the step of extracting the graph features through the pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix, as shown in fig. 5, the method further includes:
and step 100, training the graph convolution neural network.
Wherein training the graph convolutional neural network comprises: for a plurality of training faces, constructing the undirected graphs corresponding to the training faces, where each undirected graph comprises the preset adjacency matrix and the node feature matrix of the corresponding training face; for each training face, constructing a training sample by taking the sum matrix of the preset adjacency matrix and the identity matrix, together with the node feature matrix corresponding to the training face, as sample data, and taking the true value of the classification result of the training face as the sample label; and, for each training sample, extracting graph features through the graph convolutional neural network based on the sum matrix and the node feature matrix, calculating a classification result predicted value from the extracted features, then calculating the error between the classification result predicted values and the classification result true values of all training samples through a cross-entropy loss function, and optimizing the network parameters of the graph convolutional neural network with the goal of minimizing the error, until the error converges to meet a preset condition, thereby completing the training process of the graph convolutional neural network.
In the training phase of the graph convolutional neural network, a large number of training samples are constructed first. Each training sample corresponds to a two-dimensional face image and a three-dimensional face image synchronously collected for the same training face. The sample data of each training sample comprises the preset adjacency matrix and the node feature matrix generated from the two-dimensional face image and the three-dimensional face image, and the sample label is set to the true value of the classification result of the training face. Taking training samples generated from the face images of 50,000 persons in the training sample set as an example, the sample label represents the probability that the current training sample belongs to each of the 50,000 categories, and may be represented in a one-hot form such as "0001000…".
The specific implementation of detecting the preset face key points from the two-dimensional and three-dimensional face images of a training face, determining the two-dimensional image content information in the two-dimensional face image, and determining the three-dimensional space coordinates of the preset face key points in the three-dimensional face image is described in the foregoing steps and is not repeated here. For the specific implementation of constructing an undirected graph from the three-dimensional space coordinates and the two-dimensional image content information, refer likewise to the foregoing step descriptions. The adjacency matrix of the undirected graph constructed in the training stage is the same as the preset adjacency matrix used in the recognition process, and the identity matrix of the adjacency matrix can be generated from the adjacency matrix in a conventional manner.
Taking a graph convolutional neural network comprising 5 graph convolution layers and 2 fully connected layers as an example, in the training process, for each input training sample (i.e., a labeled undirected graph), the network performs spatial information aggregation, through each graph convolution layer, on the sum matrix of the preset adjacency matrix and the identity matrix together with the node feature matrix of the input undirected graph, and the 5th graph convolution layer outputs a 1024-dimensional output vector; the two fully connected layers then sequentially perform feature mapping to obtain the face classification result predicted value corresponding to the input undirected graph. Next, the errors between the classification result predicted values and the classification result true values of all training samples are calculated through the cross-entropy loss to obtain the model error of the graph convolutional neural network; the network parameters are then adjusted by gradient descent with the goal of minimizing the model error, and the training samples are recalculated until the model error converges, completing the training process of the graph convolutional neural network.
In some embodiments of the present application, when the 1024-dimensional output vector produced by the preceding fully connected layer is mapped to C (e.g., 50,000) classes through the last fully connected layer, the classification probability may be calculated by softmax. The softmax function may be as shown in the following formula 3.
$$P(i) = \frac{e^{g_i}}{\sum_{k=1}^{C} e^{g_k}} \tag{3}$$

In the above formula 3, $g_i$ is the $i$-th element of the output vector, $P(i)$ is the probability that the output vector is mapped to the $i$-th category, and $1 \le k \le C$.
For example, for 1,000,000 training samples generated from the face images of 50,000 individuals (20 pairs of images per person, each pair comprising one two-dimensional face image and one three-dimensional face image), feature extraction is performed on the training samples through the graph convolution layers and the first fully connected layer of the graph convolutional neural network, and the resulting feature vectors are mapped to the corresponding 50,000 categories (one category per person) through the last fully connected layer. If the mapping accuracy exceeds a certain threshold (for example, 99%), the features extracted by the graph convolution layers and the first fully connected layer can be considered unique features of each person, and the network training is complete.
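Continuing the FaceGCN sketch above, one training step with cross-entropy loss might look as follows; the optimizer choice and learning rate are assumptions:

```python
import torch
import torch.nn.functional as F

model = FaceGCN()                                           # sketch from the previous section
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # assumed optimizer settings

def train_step(X, A, label):
    """One gradient-descent step; label is the one-hot sample label as a class index,
    e.g. torch.tensor([person_idx])."""
    optimizer.zero_grad()
    _, logits = model(X, A)                                 # class scores over C identities
    loss = F.cross_entropy(logits.unsqueeze(0), label)      # softmax (formula 3) + NLL
    loss.backward()
    optimizer.step()
    return loss.item()
```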
And step 150, performing face recognition on the target face by using the features obtained by the graph feature extraction of the graph convolutional neural network as the face features.
In some embodiments of the present application, a face feature library needs to be established in advance, before face recognition is performed. The preset face feature library comprises the face features of a plurality of registered faces, where each group of face features in the library is obtained as follows: acquiring a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a registered face; determining the pixel positions and two-dimensional image content information of the preset face key points in the two-dimensional face image collected for the registered face, and determining the three-dimensional space coordinates of the preset face key points in the three-dimensional face image collected for the registered face; constructing the undirected graph corresponding to the registered face according to the three-dimensional space coordinates and two-dimensional image content information of the preset face key points of the registered face; extracting graph features through the pre-trained graph convolutional neural network based on the node feature matrix and preset adjacency matrix of the undirected graph corresponding to the registered face and the identity matrix of the preset adjacency matrix; taking the features obtained by the graph feature extraction as the face features of the registered face; and storing the face features of the registered face in the face feature library.
For a specific implementation of obtaining the two-dimensional face image and the three-dimensional face image of each registered face, refer to the specific implementation of obtaining the two-dimensional face image and the three-dimensional face image of the target face in the foregoing steps, which is not described herein again.
For the specific implementation of determining the pixel positions and two-dimensional image content information of the preset face key points in the two-dimensional face image collected for the registered face, and of determining the three-dimensional space coordinates of the preset face key points in the three-dimensional face image collected for the registered face, refer to the implementation of determining the two-dimensional image content information and three-dimensional space coordinates in the recognition or training stage, which is not repeated herein.
The specific implementation of constructing the undirected graph according to the two-dimensional image content information and the three-dimensional space coordinates is referred to in the specific implementation of constructing the undirected graph in the recognition stage or the training stage, which is not described herein again.
Then, for each registered face, inputting an undirected graph generated according to the image of the registered face into a graph convolution neural network trained in advance, extracting graph features of the input undirected graph by the graph convolution neural network, then, taking the features obtained by the graph convolution neural network after the graph feature extraction step as a group of face features of the registered face, and storing the group of face features of the registered face in the face feature library.
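Continuing the same sketch, registration can be illustrated as extracting and storing one feature vector per registered face; the dictionary storage format and the L2 normalization are assumptions:

```python
import torch

feature_library = {}  # assumed storage: person_id -> 1024-d face feature

def enroll(person_id, X, A):
    """Extract the face feature of a registered face (model is the FaceGCN sketch above)."""
    with torch.no_grad():
        feat, _ = model(X, A)              # X: 104 x 6 tensor, A: 104 x 104 tensor
    feature_library[person_id] = feat / feat.norm()   # store L2-normalized feature
```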
In the recognition stage, performing face recognition on the target face by using the features extracted by the graph convolutional neural network as face features comprises: comparing the similarity between the features obtained by the graph feature extraction of the graph convolutional neural network and the face features in a preset face feature library; and determining the recognition result of the face recognition of the target face according to the result of the similarity comparison.
In some embodiments of the present application, when similarity comparison is performed between the features extracted by the graph convolution neural network and the face features in the preset face feature library, a method for calculating cosine similarity (for example, the following formula 4) may be adopted to calculate a comparison score:
$$sim(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|} \tag{4}$$

In the above formula 4, $sim(x, y)$ represents the similarity between face feature $x$ and face feature $y$, $x \cdot y$ denotes the dot product, and $\|x\|$ denotes the modulus. The calculated similarity $sim(x, y)$ is a number between 0 and 1; the closer the value of $sim(x, y)$ is to 1, the greater the similarity between face feature $x$ and face feature $y$, and the closer it is to 0, the smaller the similarity.
In some embodiments of the present application, the face features in the face feature library with the largest similarity to the face features extracted by the graph convolution neural network may be taken as successfully matched face features, and the related information of the registered face to which the face features in the face feature library belong, which are successfully matched, is taken as a face recognition result.
In other embodiments of the present application, other methods may also be used to calculate the similarity between the facial features extracted by the graph convolutional neural network and each group of facial features in the facial feature library, and a specific implementation manner of calculating the similarity between two groups of facial features is not limited in this embodiment of the present application.
In other embodiments of the present application, other methods may also be used to determine a face recognition result according to the similarity between the face features extracted by the graph convolution neural network and each group of face features in the face feature library, and the specific implementation manner of determining the face recognition result according to the similarity between the face features is not limited in the present application.
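As an illustration of the basic comparison described above, a recognition-stage sketch using formula 4 over the feature library; the acceptance threshold is an assumption:

```python
import torch

def recognize(X, A, threshold=0.8):   # assumed acceptance threshold
    """Match the target face against the library by cosine similarity (formula 4)."""
    with torch.no_grad():
        feat, _ = model(X, A)          # model / feature_library: sketches above
    feat = feat / feat.norm()
    best_id, best_sim = None, -1.0
    for person_id, ref in feature_library.items():
        sim = torch.dot(feat, ref).item()   # cosine similarity on normalized features
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id if best_sim >= threshold else None
```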
The embodiment of the application discloses a face recognition method, which comprises: acquiring a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a target face; determining the pixel position and two-dimensional image content information of each preset face key point in the two-dimensional face image, and determining the three-dimensional space coordinates of each preset face key point in the three-dimensional face image; constructing a corresponding node for each preset face key point, and constructing an undirected graph by expressing, through undirected edges, the adjacency relations between the nodes corresponding to adjacent preset face key points, where the undirected graph comprises a preset adjacency matrix representing the adjacency relations between the nodes and a node feature matrix formed by the key point information stored by each node; extracting graph features through a pre-trained graph convolutional neural network based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix; and performing face recognition on the target face by using the features obtained by the graph feature extraction as face features, which is favorable to improving the accuracy and reliability of face recognition.
Compared with face recognition methods in the prior art, the face recognition method disclosed in the embodiments of the present application simultaneously uses two-dimensional and three-dimensional face features in the comparison stage, i.e., it utilizes both the texture information and the three-dimensional information of the face. This can improve the recognition accuracy, offers strong advantages in resisting attacks with printed photos and videos, and therefore provides higher security and reliability.
By adopting a graph convolutional neural network in place of an ordinary convolutional neural network, the face recognition method disclosed in the embodiments of the present application is better suited to extracting the data features in the undirected graph constructed from the face key points, i.e., to extracting the positional relationship features among the face key points.
Example two
Corresponding to the method embodiment, another embodiment of the present application discloses a face recognition apparatus, as shown in fig. 6, the apparatus includes:
a face image acquisition module 610, configured to acquire a two-dimensional face image and a three-dimensional face image that are synchronously acquired by different image acquisition devices for a target face;
a face key point information obtaining module 620, configured to determine pixel positions and two-dimensional image content information of preset face key points in the two-dimensional face image, and determine three-dimensional space coordinates of the preset face key points in the three-dimensional face image;
a face key point undirected graph constructing module 630, configured to construct corresponding nodes according to each of the preset face key points, and construct an undirected graph by using undirected edges to represent an adjacency relationship between the nodes in the preset face key points, which are adjacent to the preset face key points; the undirected graph comprises: a preset adjacency matrix representing the adjacency relation between the nodes and a node characteristic matrix formed by key point information stored by each node in the undirected graph;
a graph feature extraction module 640, configured to perform graph feature extraction based on the node feature matrix, the preset adjacency matrix, and a unit matrix of the preset adjacency matrix through a pre-trained graph convolution neural network;
and a face recognition module 650, configured to perform face recognition on the target face by taking the features obtained by the graph feature extraction of the graph convolution neural network as face features.
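For illustration only, the following minimal Python sketch shows one way such an undirected graph might be assembled from the preset face key points. The landmark count of 68, the explicit edge list, and the feature layout (three-dimensional space coordinates followed by RGB color values) are assumptions of this sketch, not details fixed by the embodiment.

import numpy as np

N_KEYPOINTS = 68  # assumed number of preset face key points (hypothetical)

def build_undirected_graph(coords_3d, colors_2d, edges):
    """coords_3d: (N, 3) three-dimensional space coordinates per key point;
    colors_2d: (N, 3) two-dimensional image content information (e.g. RGB);
    edges: (i, j) index pairs of adjacent preset face key points."""
    # Node feature matrix: one row of stored key point information per node.
    X = np.concatenate([coords_3d, colors_2d], axis=1).astype(np.float32)
    # Preset adjacency matrix: symmetric, since the edges are undirected.
    A = np.zeros((N_KEYPOINTS, N_KEYPOINTS), dtype=np.float32)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return X, A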
In some embodiments of the present application, the graph feature extraction module 640 is further configured to:
performing spatial information aggregation on the key point information in the node feature matrix of the undirected graph based on the preset adjacency matrix of the undirected graph and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network (one common form of this aggregation is sketched below).
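One common form of such aggregation is the propagation rule H' = ReLU((A + I) H W), where A is the preset adjacency matrix and I its identity matrix. The weight matrix, layer width and activation below are illustrative choices; the embodiment does not prescribe a specific propagation rule.

import numpy as np

def gcn_layer(X, A, W):
    """One graph convolution step: each node aggregates the key point
    information of itself and of its adjacent nodes, then applies W."""
    A_hat = A + np.eye(A.shape[0], dtype=A.dtype)  # adjacency matrix plus its identity matrix
    H = A_hat @ X @ W                              # spatial information aggregation
    return np.maximum(H, 0.0)                      # ReLU non-linearity

# Example usage with X and A from the construction sketch above:
# rng = np.random.default_rng(0)
# W1 = rng.standard_normal((X.shape[1], 32)).astype(np.float32)
# H1 = gcn_layer(X, A, W1)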
In some embodiments of the present application, the key point information stored by each node includes: the three-dimensional space coordinates and the two-dimensional image content information of the preset face key point corresponding to the node; the two-dimensional image content information comprises any one or more of the following items of information: color value, infrared brightness, transparency.
In some embodiments of the present application, as shown in fig. 7, the apparatus further comprises a graph convolution neural network training module 600, the graph convolution neural network training module 600 being configured to:
for a plurality of training faces, construct, for each training face, an undirected graph corresponding to the training face, wherein the undirected graph comprises the preset adjacency matrix and the node feature matrix of the corresponding training face;
for each training face, construct a training sample corresponding to the training face by taking, as sample data, the sum matrix of the preset adjacency matrix and the identity matrix together with the node feature matrix corresponding to the training face, and taking, as the sample label, the true value of the classification result of the training face;
and for each training sample, perform graph feature extraction through the graph convolution neural network based on the sum matrix and the node feature matrix, calculate a classification result predicted value according to the extracted features, then calculate, through a cross entropy loss function, the error between the classification result predicted values and the classification result true values of all the training samples, and optimize the network parameters of the graph convolution neural network with the goal of minimizing the error until the error converges to meet a preset condition, thereby completing the training process of the graph convolution neural network (an illustrative training-loop sketch follows).
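The training procedure above might look as follows in code. This is a sketch under stated assumptions: PyTorch as the framework, a two-layer network with mean pooling over nodes, the layer sizes, and the Adam optimizer are all hypothetical choices; only the use of the sum matrix of the adjacency and identity matrices, the graph feature extraction, and the cross entropy loss come from the embodiment.

import torch
import torch.nn as nn

class FaceGCN(nn.Module):
    """Hypothetical two-layer graph convolution classifier; in_dim=6 matches
    the assumed 3 coordinates + 3 color values per node in the earlier sketch."""
    def __init__(self, in_dim=6, hidden=32, n_classes=100):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, n_classes, bias=False)

    def forward(self, A_hat, X):
        # A_hat is the sum matrix of the preset adjacency matrix and the identity matrix.
        h = torch.relu(A_hat @ self.w1(X))    # spatial information aggregation
        h = (A_hat @ self.w2(h)).mean(dim=0)  # pool node features into one graph feature
        return h                              # classification logits

model = FaceGCN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # cross entropy between predicted and true classification results

def train_step(A_hat, X, label):
    """A_hat, X: float32 tensors; label: integer class index of the training face."""
    optimizer.zero_grad()
    logits = model(A_hat, X).unsqueeze(0)          # shape (1, n_classes)
    loss = loss_fn(logits, torch.tensor([label]))  # error of predicted vs. true value
    loss.backward()
    optimizer.step()                               # optimize the network parameters
    return loss.item()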
In some embodiments of the present application, the face recognition module 650 is further configured to:
comparing the similarity between the features obtained by the graph feature extraction of the graph convolution neural network and the face features in a preset face feature library;
and determining the recognition result of face recognition of the target face according to the result of the similarity comparison (a minimal similarity-matching sketch is given below).
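A minimal sketch of such a comparison, assuming cosine similarity and a fixed acceptance threshold; both are common but unconfirmed choices here, and feature_library is a hypothetical mapping from registered identity to stored face feature vector.

import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recognize(query_feature, feature_library, threshold=0.6):
    """Return the best-matching registered identity, or None when no
    stored face feature is similar enough to the query feature."""
    best_id, best_sim = None, -1.0
    for identity, stored_feature in feature_library.items():
        sim = cosine_similarity(query_feature, stored_feature)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= threshold else None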
In some embodiments of the present application, the preset face feature library includes the face features of a plurality of registered faces, wherein each set of face features in the face feature library is obtained by the following method:
acquiring a two-dimensional face image and a three-dimensional face image which are synchronously acquired by different image acquisition devices aiming at a registered face;
determining pixel positions and two-dimensional image content information of all preset face key points in the two-dimensional face image acquired aiming at the registered face, and determining three-dimensional space coordinates of all preset face key points in the three-dimensional face image acquired aiming at the registered face;
constructing the undirected graph corresponding to the registered face according to the three-dimensional space coordinates and the two-dimensional image content information of each preset face key point of the registered face;
extracting the graph features based on the node feature matrix and the preset adjacency matrix of the undirected graph corresponding to the registered face and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network;
taking the features obtained by the graph feature extraction of the graph convolution neural network as the face features of the registered face;
and storing the face features of the registered face in the face feature library (an illustrative enrollment sketch follows).
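The registration flow might be exercised as in the sketch below, reusing build_undirected_graph and the FaceGCN model from the earlier sketches. Using the network output directly as the stored face feature is a simplification of this sketch; in practice an intermediate-layer embedding might serve as the face feature instead.

import numpy as np
import torch

feature_library = {}  # registered identity -> stored face feature vector

def register_face(identity, coords_3d, colors_2d, edges, model):
    """Hypothetical enrollment: build the registered face's undirected graph,
    run the trained graph convolution network once, and store the feature."""
    X, A = build_undirected_graph(coords_3d, colors_2d, edges)
    A_hat = torch.from_numpy(A + np.eye(A.shape[0], dtype=A.dtype))
    with torch.no_grad():
        feature = model(A_hat, torch.from_numpy(X)).numpy()  # graph feature extraction
    feature_library[identity] = feature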
The face recognition apparatus disclosed in this embodiment is used to implement the face recognition method described in the foregoing method embodiment; for the specific implementation of each module of the apparatus, reference is made to the corresponding steps of the method, which are not described in detail again in this embodiment.
The face recognition apparatus disclosed by the embodiment of the application acquires a two-dimensional face image and a three-dimensional face image synchronously collected by different image acquisition devices for a target face; determines the pixel positions and two-dimensional image content information of the preset face key points in the two-dimensional face image, and determines the three-dimensional space coordinates of the preset face key points in the three-dimensional face image; constructs a corresponding node for each preset face key point, and constructs an undirected graph by expressing, through undirected edges, the adjacency relations between nodes corresponding to adjacent preset face key points, the undirected graph comprising a preset adjacency matrix representing the adjacency relations between the nodes and a node feature matrix formed by the key point information stored by each node; extracts graph features through a pre-trained graph convolution neural network based on the node feature matrix, the preset adjacency matrix and the identity matrix of the preset adjacency matrix; and performs face recognition on the target face by taking the features obtained by the graph feature extraction as face features, which helps improve the accuracy and reliability of face recognition.
Compared with face recognition methods in the prior art, the face recognition apparatus disclosed by the embodiment of the application uses both two-dimensional and three-dimensional face features in the comparison stage, that is, it exploits both the texture information and the three-dimensional structure of the face. This not only improves recognition accuracy, but also offers clear advantages in resisting printed-photo and video replay attacks, providing higher security and reliability.
Likewise, by adopting the graph convolution neural network in place of an ordinary convolutional neural network, the face recognition apparatus disclosed by the embodiment of the application is better suited to extracting the data features in the undirected graph constructed from the face key points, that is, better suited to extracting the positional relation features among the face key points.
Correspondingly, the application also discloses an electronic device, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the face recognition method according to the first embodiment of the application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like.
The present application also discloses a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the face recognition method according to the first embodiment of the present application.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
The face recognition method and apparatus provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementation of the present application, and the description of the above embodiments is intended only to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and the scope of application according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, or certainly by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.

Claims (10)

1. A face recognition method, comprising:
acquiring two-dimensional face images and three-dimensional face images which are synchronously acquired by different image acquisition devices aiming at a target face;
determining pixel positions and two-dimensional image content information of all preset face key points in the two-dimensional face image, and determining three-dimensional space coordinates of all preset face key points in the three-dimensional face image;
constructing corresponding nodes according to the preset face key points, and constructing an undirected graph by expressing, through undirected edges, the adjacency relations between nodes corresponding to adjacent preset face key points; the undirected graph comprises: a preset adjacency matrix representing the adjacency relations between the nodes and a node feature matrix formed by the key point information stored by each node in the undirected graph;
extracting graph features based on the node feature matrix, the preset adjacency matrix and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network;
and performing face recognition on the target face by taking the features obtained by the graph feature extraction of the graph convolution neural network as face features.
2. The method according to claim 1, wherein the step of performing graph feature extraction based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network comprises:
and carrying out spatial information aggregation on the key point information in the node feature matrix of the undirected graph based on the preset adjacency matrix of the undirected graph and the identity matrix of the preset adjacency matrix through the pre-trained graph convolution neural network.
3. The method of claim 1, wherein the key point information stored by each of the nodes comprises: the three-dimensional space coordinates and the two-dimensional image content information of the preset face key point corresponding to the node; the two-dimensional image content information comprises any one or more of the following items of information: color value, infrared brightness, transparency.
4. The method according to any one of claims 1 to 3, wherein before the step of performing graph feature extraction based on the node feature matrix, the preset adjacency matrix, and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network, the method comprises:
for a plurality of training faces, constructing, for each training face, an undirected graph corresponding to the training face, wherein the undirected graph comprises the preset adjacency matrix and the node feature matrix of the corresponding training face;
for each training face, constructing a training sample corresponding to the training face by taking, as sample data, the sum matrix of the preset adjacency matrix and the identity matrix together with the node feature matrix corresponding to the training face, and taking, as the sample label, the true value of the classification result of the training face;
and for each training sample, performing graph feature extraction through the graph convolution neural network based on the sum matrix and the node feature matrix, calculating a classification result predicted value according to the extracted features, then calculating, through a cross entropy loss function, the error between the classification result predicted values and the classification result true values of all the training samples, and optimizing the network parameters of the graph convolution neural network with the goal of minimizing the error until the error converges to meet a preset condition, thereby completing the training process of the graph convolution neural network.
5. The method according to any one of claims 1 to 3, wherein the step of performing face recognition on the target face by taking the features obtained by the graph feature extraction of the graph convolution neural network as face features comprises:
comparing the similarity between the features obtained by the graph feature extraction of the graph convolution neural network and the face features in a preset face feature library;
and determining a recognition result of face recognition of the target face according to the result of the similarity comparison.
6. The method of claim 5, wherein the preset face feature library comprises the face features of a plurality of registered faces; wherein each set of the face features in the face feature library is obtained by the following method:
acquiring a two-dimensional face image and a three-dimensional face image which are synchronously acquired by different image acquisition devices aiming at a registered face;
determining pixel positions and two-dimensional image content information of all preset face key points in the two-dimensional face image acquired aiming at the registered face, and determining three-dimensional space coordinates of all preset face key points in the three-dimensional face image acquired aiming at the registered face;
constructing the undirected graph corresponding to the registered face according to the three-dimensional space coordinates and the two-dimensional image content information of each preset face key point of the registered face;
extracting the graph features based on the node feature matrix and the preset adjacency matrix of the undirected graph corresponding to the registered face and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network;
taking the features obtained by the graph feature extraction of the graph convolution neural network as the face features of the registered face;
and storing the face features of the registered face in the face feature library.
7. A face recognition apparatus, comprising:
the face image acquisition module is used for acquiring a two-dimensional face image and a three-dimensional face image which are synchronously acquired by different image acquisition devices aiming at a target face;
the face key point information acquisition module is used for determining the pixel position and the two-dimensional image content information of each preset face key point in the two-dimensional face image and determining the three-dimensional space coordinate of each preset face key point in the three-dimensional face image;
the face key point undirected graph construction module is used for constructing corresponding nodes according to the preset face key points, and constructing an undirected graph by expressing, through undirected edges, the adjacency relations between nodes corresponding to adjacent preset face key points; the undirected graph comprises: a preset adjacency matrix representing the adjacency relations between the nodes and a node feature matrix formed by the key point information stored by each node in the undirected graph;
the graph feature extraction module is used for performing the graph feature extraction based on the node feature matrix, the preset adjacency matrix and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network;
and the face recognition module is used for performing face recognition on the target face by taking the features obtained by the graph feature extraction of the graph convolution neural network as the face features.
8. The apparatus of claim 7, wherein the graph feature extraction module is further configured to:
and carrying out spatial information aggregation on the key point information in the node feature matrix of the undirected graph based on the preset adjacency matrix of the undirected graph and the identity matrix of the preset adjacency matrix through a pre-trained graph convolution neural network.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face recognition method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the face recognition method of any one of claims 1 to 6.
CN202010808939.6A 2020-08-12 2020-08-12 Face recognition method and device, electronic equipment and storage medium Active CN111783748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010808939.6A CN111783748B (en) 2020-08-12 2020-08-12 Face recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010808939.6A CN111783748B (en) 2020-08-12 2020-08-12 Face recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111783748A true CN111783748A (en) 2020-10-16
CN111783748B CN111783748B (en) 2023-07-14

Family

ID=72761928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010808939.6A Active CN111783748B (en) 2020-08-12 2020-08-12 Face recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111783748B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329736A (en) * 2020-11-30 2021-02-05 姜召英 Face recognition method and financial system
CN112541575A (en) * 2020-12-06 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
CN112733807A (en) * 2021-02-22 2021-04-30 佳都新太科技股份有限公司 Face comparison graph convolution neural network training method and device
CN112784709A (en) * 2021-01-06 2021-05-11 华南理工大学 Remote multi-target efficient detection and identification method
CN113158908A (en) * 2021-04-25 2021-07-23 北京华捷艾米科技有限公司 Face recognition method and device, storage medium and electronic equipment
CN113537194A (en) * 2021-07-15 2021-10-22 Oppo广东移动通信有限公司 Illumination estimation method, illumination estimation device, storage medium, and electronic apparatus
CN113673610A (en) * 2021-08-25 2021-11-19 上海鹏冠生物医药科技有限公司 Image preprocessing method for tissue cell pathological image diagnosis system
CN116524581A (en) * 2023-07-05 2023-08-01 南昌虚拟现实研究院股份有限公司 Human eye image facula classification method, system, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659723A (en) * 2019-09-03 2020-01-07 腾讯科技(深圳)有限公司 Data processing method, device, medium and electronic equipment based on artificial intelligence
WO2020037937A1 (en) * 2018-08-20 2020-02-27 深圳壹账通智能科技有限公司 Facial recognition method and apparatus, terminal, and computer readable storage medium
CN111079587A (en) * 2019-12-03 2020-04-28 北京迈格威科技有限公司 Face recognition method and device, computer equipment and readable storage medium
CN111243093A (en) * 2020-01-07 2020-06-05 腾讯科技(深圳)有限公司 Three-dimensional face grid generation method, device, equipment and storage medium
CN111259875A * 2020-05-06 2020-06-09 中国人民解放军国防科技大学 Lip reading method based on self-adaptive magnetic spatio-temporal graph convolutional network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020037937A1 (en) * 2018-08-20 2020-02-27 深圳壹账通智能科技有限公司 Facial recognition method and apparatus, terminal, and computer readable storage medium
CN110659723A (en) * 2019-09-03 2020-01-07 腾讯科技(深圳)有限公司 Data processing method, device, medium and electronic equipment based on artificial intelligence
CN111079587A (en) * 2019-12-03 2020-04-28 北京迈格威科技有限公司 Face recognition method and device, computer equipment and readable storage medium
CN111243093A (en) * 2020-01-07 2020-06-05 腾讯科技(深圳)有限公司 Three-dimensional face grid generation method, device, equipment and storage medium
CN111259875A * 2020-05-06 2020-06-09 中国人民解放军国防科技大学 Lip reading method based on self-adaptive magnetic spatio-temporal graph convolutional network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZUHENG MING et al.: "Dynamic Deep Multi-task Learning for Caricature-Visual Face Recognition", 2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDARW) *
李睿; 李科; 孙家炜: "Face recognition after reconstructing three-dimensional face depth images from two-dimensional texture", Modern Computer (Professional Edition) (现代计算机(专业版)), no. 10 *
蝈蝈: "When will I understand your heart: Graph Convolutional Networks (GCN)", Zhihu (《知乎》), 28 June 2019 (2019-06-28) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329736A (en) * 2020-11-30 2021-02-05 姜召英 Face recognition method and financial system
CN112329736B (en) * 2020-11-30 2022-04-12 上海华瑞银行股份有限公司 Face recognition method and financial system
CN112541575A (en) * 2020-12-06 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
CN112541575B (en) * 2020-12-06 2023-03-10 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
CN112784709A (en) * 2021-01-06 2021-05-11 华南理工大学 Remote multi-target efficient detection and identification method
CN112733807A (en) * 2021-02-22 2021-04-30 佳都新太科技股份有限公司 Face comparison graph convolution neural network training method and device
CN113158908A (en) * 2021-04-25 2021-07-23 北京华捷艾米科技有限公司 Face recognition method and device, storage medium and electronic equipment
CN113537194A (en) * 2021-07-15 2021-10-22 Oppo广东移动通信有限公司 Illumination estimation method, illumination estimation device, storage medium, and electronic apparatus
CN113673610A (en) * 2021-08-25 2021-11-19 上海鹏冠生物医药科技有限公司 Image preprocessing method for tissue cell pathological image diagnosis system
CN116524581A (en) * 2023-07-05 2023-08-01 南昌虚拟现实研究院股份有限公司 Human eye image facula classification method, system, equipment and storage medium
CN116524581B (en) * 2023-07-05 2023-09-12 南昌虚拟现实研究院股份有限公司 Human eye image facula classification method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN111783748B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN111783748B (en) Face recognition method and device, electronic equipment and storage medium
CN111709409B (en) Face living body detection method, device, equipment and medium
EP4075324A1 (en) Face recognition method and face recognition device
WO2019227479A1 (en) Method and apparatus for generating face rotation image
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
Han et al. Face recognition with contrastive convolution
KR101901591B1 (en) Face recognition apparatus and control method for the same
CN112801015B (en) Multi-mode face recognition method based on attention mechanism
JP4743823B2 (en) Image processing apparatus, imaging apparatus, and image processing method
CN111444881A (en) Fake face video detection method and device
CN111310731A (en) Video recommendation method, device and equipment based on artificial intelligence and storage medium
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
KR20210025020A (en) Face image recognition using pseudo images
Haque et al. Two-handed bangla sign language recognition using principal component analysis (PCA) and KNN algorithm
WO2021218238A1 (en) Image processing method and image processing apparatus
CN110222718A (en) The method and device of image procossing
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN110796100A (en) Gait recognition method and device, terminal and storage device
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN111639580A (en) Gait recognition method combining feature separation model and visual angle conversion model
CN115205933A (en) Facial expression recognition method, device, equipment and readable storage medium
Satybaldina et al. Deep learning based static hand gesture recognition
CN114170690A (en) Method and device for living body identification and construction of living body identification model
CN112906520A (en) Gesture coding-based action recognition method and device
CN112418250A (en) Optimized matching method for complex 3D point cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant