CN111091075A - Face recognition method and device, electronic equipment and storage medium - Google Patents

Face recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111091075A
Authority
CN
China
Prior art keywords
face
point cloud
cloud data
depth
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911213058.3A
Other languages
Chinese (zh)
Other versions
CN111091075B (en)
Inventor
张彦博
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201911213058.3A priority Critical patent/CN111091075B/en
Publication of CN111091075A publication Critical patent/CN111091075A/en
Application granted granted Critical
Publication of CN111091075B publication Critical patent/CN111091075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a face recognition method and device, an electronic device, and a storage medium. Point cloud data are used to perform pose correction on a face region in a depth map, a target region is cropped from the corrected face region, the point cloud data of the target region are normalized, the normalized point cloud data are mapped into a planar three-channel image, and the three-channel image is input into a pre-trained face recognition model to obtain a face recognition result. The face recognition model is obtained by training on a sample data set labeled with recognition tags; the sample data set comprises first-type sample data of face regions in depth images acquired by an image acquisition device and second-type sample data of enhanced face regions obtained based on those depth images, and each sample datum is a planar three-channel image. The accuracy of 3D face recognition is thereby improved.

Description

Face recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of face recognition technologies, and in particular to a face recognition method and apparatus, an electronic device, and a storage medium.
Background
Face recognition is widely applied in fields such as identity authentication, information security, criminal investigation, and access control.
At present, the mainstream approach is two-dimensional (2D) face recognition. Although 2D face recognition requires little computation and is fast, the features it extracts are relatively limited, so misrecognition easily occurs in complex scenes. Three-dimensional (3D) face recognition has been introduced to compensate for this deficiency.
However, the inventor has found through research that 3D face recognition methods based on conventional techniques (which mainly use mathematical methods to extract features from image data and perform face recognition on the extracted features) only extract a few local features of the face (such as features of the eye, mouth, and nose regions) for face matching, so people with even slightly similar faces are easily misjudged and the recognition accuracy is low. Neural-network-based 3D face recognition can improve the recognition accuracy, but only on the premise of having enough training samples; because 3D face data are currently very scarce, the recognition accuracy of existing neural-network-based 3D face recognition methods is also low.
Disclosure of Invention
The application aims to provide a face recognition method and device, an electronic device, and a storage medium that improve 3D face recognition accuracy. The technical scheme is as follows:
a face recognition method, comprising:
acquiring a depth image;
performing attitude correction on the face area according to the point cloud data of the face area in the depth image;
cutting out a target area from the face area after the posture correction;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
In the above method, preferably, the performing pose correction on the face region according to the point cloud data of the face region in the depth map includes:
carrying out face detection and key point detection in a face region on the color image corresponding to the depth image to obtain face region coordinates and a plurality of key point coordinates;
determining a face area to be processed and a plurality of key points from the depth image according to the face area coordinates and the plurality of key point coordinates;
determining a posture transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points;
and carrying out attitude correction on the point cloud data of the face area to be processed by utilizing the attitude transformation matrix.
In the above method, preferably, the plurality of key points include: two eyeball points, a nose tip and two mouth corner points;
before determining the attitude transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points, the method further comprises the following steps:
acquiring the depth value of each pixel in a preset area around the nose tip point;
and taking the median of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
In the above method, after determining the final depth value of the nose tip point, before performing pose correction on the point cloud data of the face area to be processed by using the pose transformation matrix, the method further includes:
and deleting the pixel points with the depth values smaller than the final depth value of the nose tip point in the depth map and the pixel points with the difference value between the depth value and the final depth value of the nose tip point larger than a preset threshold value.
In the above method, preferably, the point cloud data of the face area is obtained by using the following formula:
z=(double)I(i,j)/s
x=(j-cx)*z/fx
y=(i-cy)*z/fy
where I denotes a frame of depth image; I(i, j) is the depth value at position (i, j) in the image matrix; double denotes data-format conversion; fx, fy are the focal lengths of the camera that collects the depth map; cx, cy are the principal point coordinates; s is a tilt parameter; and x, y, z are the transformed point cloud coordinates.
In the above method, preferably, the second type of sample data is obtained by:
for each frame of depth image acquired by the image acquisition device, taking the nose tip point as the center and cutting out, from the point cloud data of that frame, a first preset number of points closest to the nose tip point as face point cloud data;
calculating the difference between any two frames of depth images acquired by the image acquisition device according to the cut-out face point cloud data;
taking, in descending order of difference, a second preset number of depth image pairs with the largest differences as target depth image pairs;
for each target depth image pair, calculating new face point cloud data from the point cloud data of the two frames of depth images in that pair;
for each new face point cloud, performing multiple enhancement operations on it to obtain enhanced face point cloud data, each enhancement operation performing one pose rotation and one noise addition;
for each enhanced face point cloud data, carrying out normalization processing on the enhanced face point cloud data; and mapping the normalized enhanced human face point cloud data into a planar three-channel image to obtain second type sample data.
In the above method, preferably, the face recognition model is a four-layer convolutional neural network.
A face recognition apparatus comprising:
the acquisition module is used for acquiring a depth image;
the correction module is used for carrying out posture correction on the face area according to the point cloud data of the face area in the depth image;
the cutting module is used for cutting out a target area from the face area after the posture correction;
the normalization module is used for performing normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
the mapping module is used for mapping the normalized point cloud data into a planar three-channel image;
the recognition module is used for inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
An electronic device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the face recognition method according to any one of the above aspects.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the face recognition method as defined in any one of the preceding claims.
According to the scheme, the human face recognition method, the human face recognition device, the electronic equipment and the storage medium provided by the application have the advantages that the point cloud data are used for carrying out posture correction on a human face area in a depth map, a target area is cut out from the corrected human face area, the point cloud data of the target area are normalized, the normalized point cloud data are mapped into a planar three-channel image, and the three-channel image is input into a human face recognition model which is trained in advance to obtain a human face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition label; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image. The second type of sample data of the enhanced face area obtained based on the depth image acquired by the image acquisition device is added in the sample data used for training the face recognition model, so that the number of training samples is increased, and the posture correction is performed on the face in the depth image to be recognized in combination, so that the 3D face recognition precision is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an implementation of a face recognition method according to an embodiment of the present application;
fig. 2 is a flowchart illustrating an implementation of performing pose correction on a face region according to point cloud data of the face region in a depth map according to the embodiment of the present application;
fig. 3 is a flowchart illustrating an implementation of obtaining second type of sample data of an enhanced face region based on a depth image acquired by an image acquisition device according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Face representation models are divided into 2D faces and 3D faces. The advantages of 2D face recognition are that relatively many algorithms have been implemented, a fairly mature pipeline exists, and image data are simple to obtain, requiring only an ordinary camera; face recognition based on 2D image data is therefore the current mainstream and is applied in security, surveillance, access control, attendance, auxiliary identity authentication in finance, entertainment, and many other scenarios. By its technical development, 2D face recognition can be divided into two broad categories: traditional face recognition and neural-network-based face recognition. Traditional 2D face recognition mainly uses mathematical methods to extract corresponding features from the image matrix; these are generally scale-invariant features, and commonly used algorithms include SURF, SIFT, HARRIS, GFTT, and the like. Neural-network-based 2D face recognition reaches a recognition accuracy of 99.80% on various face recognition challenges and open-source data sets, even comparable to human performance, yet in demanding financial settings 2D face recognition is only used as an auxiliary means, and other verification steps, such as entering a mobile phone number, are still required after face recognition.

Because 2D face recognition has inherent limitations, 3D face recognition has emerged to compensate for them. 2D face recognition uses only RGB data, whereas 3D face recognition is essentially based on point cloud data, which add depth information; a model based on point cloud data therefore achieves higher recognition accuracy and higher liveness detection accuracy than a model using RGB data alone.
Neural-network-based face recognition, whether 2D or 3D, requires a large amount of training data. Publicly available 2D face data have reached the million level, so a fairly good 2D face recognition model can be trained. 3D face recognition research, by contrast, is still at an early stage: public data are very scarce, and collecting data takes a long time and is costly. The embodiments of the present application therefore achieve data expansion by performing data enhancement on public data sets or on a small collected data set, and then train a 3D face recognition model with the expanded data, thereby improving the accuracy of the 3D face recognition model. Embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of an implementation of a face recognition method according to an embodiment of the present application, which may include:
step S11: a depth image is acquired.
The depth image is an image to be subjected to 3D face recognition. The depth image can be acquired by the image acquisition device, or the depth image acquired by the image acquisition device in advance can be read from the memory.
Step S12: and carrying out posture correction on the face area according to the point cloud data of the face area in the depth map.
The depth image is a planar image, i.e., a two-dimensional image. In the embodiment of the application, a face region is determined in a two-dimensional depth image, and the face region is usually a rectangular region containing a face. And then converting the human face area into three-dimensional point cloud data, and performing attitude correction on the three-dimensional point cloud data to realize the correction of the human face area.
Optionally, the point cloud data of the face region in the depth map may be obtained as follows:
z=(double)I(i,j)/s
x=(j-cx)*z/fx
y=(i-cy)*z/fy
where I denotes a frame of depth image; I(i, j) is the depth value at position (i, j) in the image matrix; double denotes data-format conversion; fx, fy are the focal lengths of the camera that acquired the depth image (generally fx = fy); cx, cy are the coordinates of the image principal point (i.e., the intersection of the optical axis, perpendicular to the image plane, with the image plane); s is a tilt parameter; and x, y, z are the transformed point cloud coordinates. It should be noted that fx, fy, cx, cy, and s are intrinsic parameters of the image acquisition device that captures the depth image: once the device is fixed, their specific values are also fixed, and they generally differ between devices. For example, one particular image acquisition device is configured with fx = 474.376, fy = 474.376, s = 1000, cx = 317.301, and cy = 245.217.
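For illustration only, the conversion above can be sketched in Python with NumPy as follows; the function name, the vectorized per-pixel form, and the assumption that the depth map is a two-dimensional numeric array are illustrative and not part of the original disclosure.

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, s):
    """Convert a depth image I (rows x cols) into per-pixel (x, y, z)
    point cloud coordinates using the formulas above."""
    rows, cols = depth.shape
    j, i = np.meshgrid(np.arange(cols), np.arange(rows))  # j: column index, i: row index
    z = depth.astype(np.float64) / s                       # z = (double)I(i, j) / s
    x = (j - cx) * z / fx                                  # x = (j - cx) * z / fx
    y = (i - cy) * z / fy                                  # y = (i - cy) * z / fy
    return np.stack([x, y, z], axis=-1)                    # shape: rows x cols x 3

# Example with the intrinsic parameters quoted above (device-specific values):
# cloud = depth_to_point_cloud(depth_image, fx=474.376, fy=474.376,
#                              cx=317.301, cy=245.217, s=1000)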
In order to facilitate the subsequent mapping of the normalized point cloud data into a planar three-channel image, the relative position of each pixel of the face region in the depth map can be recorded while the point cloud data of the face region are obtained. Specifically, for any pixel in the face region (denoted pixel B for convenience, with coordinates (xB, yB)), the relative position of pixel B in the depth map can be characterized as (xB/nc, yB/nr), where nc is the length of the depth map and nr is the width of the depth map.
Step S13: and cutting out a target area from the face area after the posture correction.
The face region determined in the two-dimensional depth image usually contains much redundant information. To reduce the subsequent data processing load, in the embodiment of the present application the face region after pose correction is cropped. Optionally, a region of a preset size may be cut out as the target region with the nose tip as the center, where the target region is smaller than the pose-corrected face region. The inventor has found that when a 180 × 180 region is cut out with the nose tip point as the center (i.e., the length and width of the target region are both 180 pixels), the target region contains the main information of a face; performing face recognition based on this main information ensures recognition accuracy while increasing recognition speed.
Specifically, the face region after the pose correction may be converted into a depth map, and a 180 × 180 region may be cut out as a target region in the depth map with the nose tip as a center.
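As an illustrative sketch only (the clipping at the image border is an added assumption, not part of the original disclosure), cropping a 180 × 180 target region around the nose tip point may look as follows:

import numpy as np

def crop_target_region(depth, nose_row, nose_col, size=180):
    """Crop a size x size target region centred on the nose tip point from the
    pose-corrected depth map, clipped at the image border."""
    half = size // 2
    rows, cols = depth.shape
    r0 = max(0, min(nose_row - half, rows - size))
    c0 = max(0, min(nose_col - half, cols - size))
    return depth[r0:r0 + size, c0:c0 + size]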
Step S14: and carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data.
Optionally, the point cloud data may be normalized by using the maximum value and the minimum value of the coordinate of each dimension of the point cloud data of the face region after the pose correction, and specifically, the point cloud data of the target region may be normalized by using the following formula:
x′=(2x-xmax-xmin)/(xmax-xmin)
y′=(2y-ymax-ymin)/(ymax-ymin)
z′=(2z-zmax-zmin)/(zmax-zmin)
where x, y, z are the coordinates of the three dimensions before normalization; x′, y′, z′ are the coordinates after normalization of x, y, z respectively; xmax and xmin are the maximum and minimum values of the dimension to which x belongs; ymax and ymin are the maximum and minimum values of the dimension to which y belongs; and zmax and zmin are the maximum and minimum values of the dimension to which z belongs.
The coordinates of each point in the point cloud data of the target area are normalized to [-1, 1] by the above normalization method.
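A minimal sketch of this normalization in NumPy, assuming the per-dimension minimum and maximum are taken over the point cloud that is passed in, might look as follows:

import numpy as np

def normalize_point_cloud(points):
    """Map each coordinate dimension of an (N, 3) point cloud into [-1, 1]
    with the per-dimension minimum and maximum, as in the formulas above."""
    p_min = points.min(axis=0)   # (x_min, y_min, z_min)
    p_max = points.max(axis=0)   # (x_max, y_max, z_max)
    return (2.0 * points - p_max - p_min) / (p_max - p_min)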
Step S15: and mapping the normalized point cloud data into a planar three-channel image (namely an RGB image).
Optionally, the normalized point cloud data may be mapped into a planar three-channel image of a target size, where the size of the planar three-channel image is equal to or smaller than the size of the target area. In a preferred embodiment, the size of the planar three-channel image is smaller than the size of the target area.
The normalized point cloud data can be mapped into a planar three-channel image according to the relative position of a pixel point corresponding to the normalized point cloud data in the depth map and the target size of the planar three-channel image.
Specifically, taking pixel B, whose relative position in the depth map is (xB/nc, yB/nr), as an example, the relative position of pixel B in the depth map is multiplied by the target size of the planar three-channel image to obtain the position of pixel B in the planar three-channel image of the target size. Assume the target size of the planar three-channel image is n′c × n′r, i.e., the target length of the planar three-channel image is n′c and the target width is n′r. The position of pixel B in the planar three-channel image is then (xB × n′c/nc, yB × n′r/nr).
Suppose the point cloud coordinates of pixel B in the depth map are (x′B, y′B, z′B) after normalization. Then, at position (xB × n′c/nc, yB × n′r/nr) in the planar three-channel image, the values of the R, G and B channels are x′B, y′B and z′B respectively.
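As an illustrative sketch of this mapping (the array names and the rounding and clipping of pixel positions are assumptions made here, not taken from the original disclosure):

import numpy as np

def map_to_three_channel_image(norm_points, rel_positions, target_size):
    """Scatter normalized (x', y', z') values into a planar three-channel image.

    norm_points   -- (N, 3) normalized point cloud, values in [-1, 1]
    rel_positions -- (N, 2) relative positions (x_B / n_c, y_B / n_r) recorded
                     when the point cloud was extracted from the depth map
    target_size   -- (n'_c, n'_r): target length and width of the output image
    """
    tc, tr = target_size
    image = np.zeros((tr, tc, 3), dtype=np.float32)
    cols = np.clip((rel_positions[:, 0] * tc).astype(int), 0, tc - 1)
    rows = np.clip((rel_positions[:, 1] * tr).astype(int), 0, tr - 1)
    image[rows, cols] = norm_points        # R, G, B channels carry x', y', z'
    return image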
Step S16: inputting the mapped planar three-channel data into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition label; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
The identification tag of each sample data is used to identify to which face the sample data belongs. In the embodiment of the present application, for a depth image (for convenience of description, denoted as M1) acquired by an image acquisition device, according to point cloud data of a face region in the depth image M1, a pose correction is performed on the face region in the depth image M1, a target region is cut out from the face region after the pose correction, point cloud data of the target region is normalized to obtain normalized point cloud data, and the normalized point cloud data is mapped to a planar three-channel image (i.e., first-class sample data).
The enhanced face area may be enhanced face point cloud data obtained by enhancing point cloud data of the face area in the depth image acquired by the image acquisition device, the enhanced face point cloud data is normalized to obtain normalized point cloud data, and the normalized point cloud data is mapped to a planar three-channel image (i.e., second-class sample data).
The face recognition method provided by the embodiment of the application comprises the steps of utilizing point cloud data to carry out posture correction on a face area in a depth map, cutting out a target area from the corrected face area, normalizing the point cloud data of the target area, mapping the normalized point cloud data into a planar three-channel image, inputting the three-channel image into a pre-trained face recognition model, and obtaining a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition label; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image. The second type of sample data of the enhanced face area obtained based on the depth image acquired by the image acquisition device is added in the sample data used for training the face recognition model, so that the number of training samples is increased, and the posture correction is performed on the face in the depth image to be recognized in combination, so that the 3D face recognition precision is improved.
In an alternative embodiment, the flowchart of performing pose correction on a face region according to point cloud data of the face region in a depth map as shown in fig. 2 may include:
step S21: and carrying out face detection and key point detection in the face region on the color image corresponding to the depth image to obtain face region coordinates and a plurality of key point coordinates.
Optionally, the plurality of key points may specifically be 5 key points, and specifically may include: two eyeball points, a nose tip point, and two mouth corner points.
The image acquisition device acquires a corresponding color image (i.e., an RGB image) at the same time as the depth image. In the embodiment of the application, face detection and key point detection can be performed on the color image corresponding to the depth image to determine the face region coordinates and key point coordinates in the color image; because the face region coordinates in the depth image are the same as those in the corresponding color image, the face region coordinates detected in the color image are also the face region coordinates in the depth image.
Optionally, a pre-trained detection model may be used to perform face detection and key point detection on the color image, so as to obtain a face region coordinate and a plurality of key point coordinates.
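Purely for illustration, the following sketch shows how the coordinates detected in the color image could be reused in the registered depth image; the detector callable and its return format are hypothetical placeholders, since no specific detection model is named here.

import numpy as np

def face_and_keypoints_in_depth(color_image, depth_image, detector):
    """Run face / key-point detection on the color image and reuse the same
    coordinates in the registered depth image.

    detector is a stand-in for any pre-trained model returning a face box
    (x, y, w, h) and five landmarks [(col, row), ...]; it is not specified here.
    """
    (x, y, w, h), landmarks = detector(color_image)
    # Color and depth images are registered, so the same coordinates index the
    # face region to be processed and the key points in the depth image.
    face_depth = depth_image[y:y + h, x:x + w]
    key_point_depths = [depth_image[int(r), int(c)] for (c, r) in landmarks]
    return face_depth, landmarks, key_point_depths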
Step S22: a face region to be processed and a plurality of key points are determined from the depth image based on the face region coordinates and the plurality of key point coordinates, the coordinates of the face region to be processed being the same as the coordinates of the face region detected in step S21, and the coordinates of the plurality of key points in the depth image being the same as the coordinates of the plurality of key points detected in step S21.
Step S23: determining a posture transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points.
The preset point cloud data template of a plurality of key points refers to point cloud data of a plurality of key points in a preset standard face area.
In the point cloud data template of the key points, the relative position relationship between the key points is determined, and according to the relative position relationship between the key points in the depth image and the relative position relationship between the key points in the point cloud data template, a posture transformation matrix of the key points can be obtained, wherein the posture transformation matrix embodies the mapping relationship from the point cloud data of a plurality of key points in the depth image to the point cloud data template.
For how to obtain the attitude transformation matrix, reference may be made to the prior art, and details are not described here. In the embodiment of the application, the posture transformation matrix of the key point is used as the posture transformation matrix of the face region.
Step S24: and performing attitude correction on the point cloud data of the face area to be processed by using the attitude transformation matrix.
The process of performing pose correction on point cloud data of a face region to be processed by using a pose transformation matrix can refer to the prior art, and is not described in detail here.
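One common way to realize steps S23 and S24 is a least-squares rigid alignment solved with an SVD (a Kabsch-style solution); the sketch below is such an implementation under that assumption and is not asserted to be the exact prior-art procedure referred to above.

import numpy as np

def estimate_pose_transform(key_points, template_points):
    """Least-squares rigid transform (R, t) mapping the detected key-point
    cloud onto the template key-point cloud, via SVD.
    Both inputs are (K, 3) arrays with rows in the same key-point order."""
    src_mean = key_points.mean(axis=0)
    dst_mean = template_points.mean(axis=0)
    h = (key_points - src_mean).T @ (template_points - dst_mean)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_mean - r @ src_mean
    return r, t

def apply_pose_correction(points, r, t):
    """Apply the pose transformation to the (N, 3) face point cloud."""
    return points @ r.T + t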
The inventor of the present application has found that, due to device defects and environmental changes, when a nose tip point is detected in the color image, a hole is likely to exist at the corresponding position in the depth image, that is, the depth of the nose tip point is zero, which reduces the recognition accuracy. Specifically:
in an optional embodiment, before determining the pose transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points, the method may further include:
and obtaining the depth value of each pixel in a preset area around the nose tip point, and taking the average value of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
The inventors of the present application have found that although this correction method improves the recognition accuracy to some extent, the improvement effect is still not satisfactory.
In another optional embodiment, before determining the pose transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points, the method may further include:
and obtaining the depth value of each pixel in a preset area around the nose cusp. For example, the depth values of the respective pixels in the 10 × 10 area around the nose tip point may be acquired.
And taking the median of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
Compared with using the average of the depth values of all pixels in the preset area as the final depth value of the nose tip point, using the median of those depth values significantly improves face recognition accuracy.
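A minimal sketch of the median-based nose tip depth follows; the exclusion of zero-depth hole pixels inside the window is an added assumption, not stated in the original text.

import numpy as np

def nose_tip_depth(depth, nose_row, nose_col, window=10):
    """Final nose tip depth: the median depth over a window x window
    neighbourhood of the detected nose tip (robust to holes, i.e. zero depth)."""
    half = window // 2
    patch = depth[max(0, nose_row - half):nose_row + half,
                  max(0, nose_col - half):nose_col + half]
    valid = patch[patch > 0]      # excluding hole pixels is an added assumption
    return float(np.median(valid)) if valid.size else 0.0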
In order to further improve the recognition accuracy and the recognition speed, in the embodiment of the present application, before performing pose correction on the point cloud data of the face region to be processed by using the pose transformation matrix, the method may further include:
and deleting the pixel points of which the depth values are smaller than the final depth value of the nose tip point in the depth map and the pixel points of which the difference value between the depth value and the final depth value of the nose tip point is larger than a preset threshold value.
The inventor of the present application has found that the depth range of a typical face is within 30 cm. Therefore, points in the face region to be processed in the depth image whose depth is smaller than the final depth value of the nose tip point, and points whose depth exceeds the final depth value of the nose tip point by more than 30 cm, are regarded as redundant information. These points are deleted, and pose correction is performed only on the remaining points of the face region to be processed, which increases the data processing speed while ensuring face recognition accuracy.
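An illustrative sketch of this depth-based filtering, assuming depth values in millimetres so that 30 cm corresponds to 300:

import numpy as np

def filter_face_depth(depth, nose_depth, max_face_depth=300):
    """Remove pixels closer than the nose tip and pixels farther from it than
    max_face_depth (300 here assumes millimetre depth units, i.e. 30 cm)."""
    keep = (depth >= nose_depth) & (depth - nose_depth <= max_face_depth)
    filtered = depth.copy()
    filtered[~keep] = 0           # zero depth marks the deleted pixels
    return filtered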
In addition, the edges of the pose-corrected face point cloud may still contain noise points that do not belong to the face; such points are generally called outliers. Their presence also reduces recognition accuracy. Therefore, in order to further improve face recognition accuracy and speed, the outliers in the face region to be processed can be identified and deleted. Specifically, for any point in the face region to be processed (denoted point A for convenience), the number of points inside the region determined by taking point A as the center and a set radius is counted; if that number is smaller than a threshold value, point A is determined to be an outlier.
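A simple sketch of such radius-based outlier removal is given below; the radius and neighbour-count threshold are illustrative values only, and the brute-force distance computation is kept for clarity rather than speed.

import numpy as np

def remove_outliers(points, radius=10.0, min_neighbors=5):
    """Drop points with fewer than min_neighbors other points inside the given
    radius (brute-force O(N^2) neighbour count)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    counts = (dists < radius).sum(axis=1) - 1     # exclude the point itself
    return points[counts >= min_neighbors]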
In an alternative embodiment, a flowchart for implementing obtaining the second type of sample data of the enhanced face region based on the depth image acquired by the image acquisition device is shown in fig. 3, and may include:
step S31: for each frame of depth image acquired by the image acquisition device (for convenience of description, the k (k is 1, 2, …, N is the number of depth images acquired by the image acquisition device) frame depth image is denoted as Mk), and point cloud data of a first preset number (for convenience of description, N1, for example, N1 may be 10000) closest to the nose point is extracted from the point cloud data of the frame depth image Mk by taking the nose point as a center, and the point cloud data is used as face point cloud data.
In the embodiment of the application, the distance between each point in the depth image Mk and the nose tip point is calculated, the points in Mk are sorted in ascending order of their distance to the nose tip point, and the first n1 points are taken as the face point cloud data of the depth image Mk (denoted Fk). Here Fk = [xp, yp, zp]T, where xp, yp, zp are the coordinates of the p-th point in the face point cloud data Fk; p = 1, 2, …, P, and P is the number of points in the face point cloud data Fk.
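An illustrative sketch of selecting the n1 points closest to the nose tip point:

import numpy as np

def crop_face_point_cloud(points, nose_tip, n1=10000):
    """Keep the n1 points closest to the nose tip point as face point cloud Fk.

    points   -- (P, 3) point cloud of one frame of depth image Mk
    nose_tip -- (3,) point cloud coordinates of the nose tip point
    """
    dists = np.linalg.norm(points - nose_tip, axis=1)
    order = np.argsort(dists)                     # ascending distance
    return points[order[:min(n1, len(points))]]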
Step S32: for any two frames of depth images (for convenience of description, denoted as the i-th frame depth image Mi and the j-th frame depth image Mj) acquired by the image acquisition device, the difference between the two frames of depth images is calculated according to the face point cloud data captured in step S31.
After the face point cloud data Fi of the ith frame depth image Mi and the face point cloud data Fj of the jth frame depth image Mj are determined, a dense corresponding relation between the two face point cloud data is established, namely, for each point on the face point cloud data Fi, one point on the face point cloud data Fj corresponds to the point, and the two points represent the same object on the face point cloud.
Optionally, a thin-plate spline interpolation function and a constraint function can be used, taking as the objective the minimum bending energy when the corresponding points of the two face point clouds are fully matched; the bending energy value is obtained by solving a system of equations, and the specific calculation process can refer to the prior art and is not detailed here. The thin-plate spline interpolation function determined when the bending energy is minimal is the mapping relationship from one face point cloud to the other, i.e., the dense correspondence between points in one face point cloud and points in the other.
In the embodiment of the present application, the bending energy of establishing the dense correspondence between the two face point clouds is calculated in both directions, namely: the bending energy γij of deforming Fi into Fj, and the bending energy γji of deforming Fj into Fi.
The difference between the i-th frame depth image Mi and the j-th frame depth image Mj is obtained from the bending energy γij and the bending energy γji.
Step S33: in the order of the differences from large to small, the depth image pair with the largest difference and the second preset number (for convenience of description, denoted as n2, for example, the value of n2 may be 10000) is used as the target depth image pair.
Assuming that the number N of depth images acquired by the image acquisition device is 600, differences are calculated for a total of N(N−1)/2 = 179700 depth image pairs. In the embodiment of the present application, all depth image pairs are sorted in descending order of difference, and the top n2 pairs are taken as target depth image pairs.
Step S34: and calculating to obtain new face point cloud data by utilizing the point cloud data of the two frames of depth images in the target depth image pair corresponding to each target depth image pair.
Optionally, the mean value of the coordinates of corresponding points in the two point clouds of a target depth image pair may be used as the coordinates of the points in the new face point cloud data; that is, each point of the new face point cloud is the average of a point in Fi and its corresponding point in Fj under the dense correspondence established above.
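Assuming the dense correspondence has already been used to reorder Fj so that its rows line up with those of Fi, the averaging step can be sketched as:

import numpy as np

def blend_face_point_clouds(fi, fj_matched):
    """New face point cloud: the per-point mean of two corresponded clouds.

    fi         -- (P, 3) face point cloud of depth image Mi
    fj_matched -- (P, 3) points of Fj reordered so that row p corresponds to
                  row p of Fi (the dense correspondence from the TPS step)
    """
    return 0.5 * (fi + fj_matched)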
Step S35: and for each new face point cloud data, carrying out multiple enhancement treatments on the new face point cloud data to obtain enhanced face point cloud data, and carrying out attitude rotation once and increasing noise once for each enhancement treatment.
Compared with the face point cloud data before enhancement, at least one of the face pose and the noise (Gaussian noise) of the enhanced face point cloud data is changed.
Optionally, data enhancement may be performed in two ways:
1. point cloud data of different poses:
and 3D rotation transformation is carried out on the new face point cloud data according to different set angles by utilizing a pre-constructed 3D rotation matrix, so as to generate point cloud data with different postures. And for each new face point cloud data, the new face point cloud data is subjected to 3D rotation transformation by utilizing different 3D rotation transformation relations, so as to generate point cloud data with different postures. Thus, dozens of or even more point cloud data with different postures can be generated corresponding to each new face point cloud data.
The angle comprises three components: a horizontal angle, a pitch angle, and a tilt angle. The horizontal angle ranges from -60 to 60 degrees, the pitch angle from -45 to 45 degrees, and the tilt angle from -45 to 45 degrees. Different 3D rotation transformations can be obtained from different combinations of the horizontal, pitch, and tilt angles.
2. Point cloud data of simulated noise:
In the embodiment of the application, Gaussian noise is added to the face point cloud data to simulate real point cloud data, which enhances the richness of the data. Specifically, Gaussian noise may be added to the point cloud data of each pose generated using the 3D rotation transformations. The Gaussian noise follows a normal distribution with zero mean and an adjustable variance, and the noise level can be controlled by adjusting the variance. In an alternative embodiment, the variance is set to 0.06.
The added noise may be randomly sampled from this Gaussian distribution.
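A sketch of one enhancement pass combining a random pose rotation within the stated angle ranges and zero-mean Gaussian noise with the stated variance is given below; the axis convention of the rotation matrix and the use of uniformly sampled random angles are assumptions made for illustration.

import numpy as np

def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """3D rotation matrix built from horizontal (yaw), pitch and tilt (roll)
    angles; the axis convention is an assumption made for illustration."""
    a, b, c = np.deg2rad([yaw_deg, pitch_deg, roll_deg])
    ry = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
    rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
    rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return rz @ ry @ rx

def augment_face_cloud(points, rng=None, noise_var=0.06):
    """One enhancement pass: a random pose rotation inside the stated angle
    ranges plus zero-mean Gaussian noise with the stated variance."""
    if rng is None:
        rng = np.random.default_rng()
    yaw = rng.uniform(-60, 60)       # horizontal angle
    pitch = rng.uniform(-45, 45)     # pitch angle
    roll = rng.uniform(-45, 45)      # tilt angle
    rotated = points @ rotation_matrix(yaw, pitch, roll).T
    noise = rng.normal(0.0, np.sqrt(noise_var), size=points.shape)
    return rotated + noise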
Step S36: for each enhanced face point cloud data, carrying out normalization processing on the enhanced face point cloud data; and mapping the normalized enhanced human face point cloud data into a planar three-channel image to obtain second type sample data.
The normalization process and the mapping process into a planar three-channel image can be referred to the foregoing embodiments, and will not be described in detail here.
The face recognition model trained on the first type and second type of sample data is robust to variations in pose and illumination, which improves 3D face recognition capability.
In order to increase the Network operation rate, a lightweight Convolutional Neural Network (CNN) is constructed in the embodiments of the present application. In an alternative embodiment, the face recognition model may be a four-layer convolutional neural network. As shown in fig. 4, a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application may include:
the number of convolution kernels in the four convolution layers is as follows from top to bottom: the first convolutional layer is configured with 32 convolution kernels, the second convolutional layer is configured with 64 convolution kernels, the third convolutional layer is configured with 128 convolution kernels, the fourth convolutional layer is configured with 256 convolution kernels, the sizes of the convolution kernels are unified into 3 × 3, in addition, each convolutional layer also comprises a Batch Normalization (BN) module and an activation function (not shown in the figure), each convolutional layer is followed by a pooling layer, a discarding layer (Dropout layer) is connected after the last pooling layer (namely the fourth pooling layer) to prevent overfitting, and finally a full connection layer and an output layer are connected.
Corresponding to the method embodiment, an embodiment of the present application further provides a face recognition apparatus, and a schematic structural diagram of the face recognition apparatus provided in the embodiment of the present application is shown in fig. 5, and may include:
an acquisition module 51, a correction module 52, a clipping module 53, a normalization module 54, a mapping module 55 and a recognition module 56, wherein:
the obtaining module 51 is used for obtaining a depth image;
the correction module 52 is configured to perform pose correction on a face region according to point cloud data of the face region in the depth image;
the cutting module 53 is used for cutting out a target area from the face area after the posture correction;
the normalization module 54 is configured to perform normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
the mapping module 55 is configured to map the normalized point cloud data into a planar three-channel image;
the recognition module 56 is configured to input the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
The face recognition device provided by the embodiment of the application performs attitude correction on a face area in a depth map by using point cloud data, cuts out a target area from the corrected face area, normalizes the point cloud data of the target area, maps the normalized point cloud data into a planar three-channel image, and inputs the three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition label; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image. The second type of sample data of the enhanced face area obtained based on the depth image acquired by the image acquisition device is added in the sample data used for training the face recognition model, so that the number of training samples is increased, and the posture correction is performed on the face in the depth image to be recognized in combination, so that the 3D face recognition precision is improved.
In an alternative embodiment, the calibration module 52 may include:
the detection unit is used for carrying out face detection and key point detection in a face area on the color image corresponding to the depth image to obtain face area coordinates and a plurality of key point coordinates;
the first determining unit is used for determining a face area to be processed and a plurality of key points from the depth image according to the face area coordinates and the plurality of key point coordinates;
the second determining unit is used for determining a posture transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points;
and the first correction unit is used for carrying out attitude correction on the point cloud data of the human face area to be processed by utilizing the attitude transformation matrix.
In an optional embodiment, the plurality of key points include: two eyeball points, a nose tip and two mouth corner points;
the correction module 52 may further include:
the acquiring unit is used for acquiring the depth value of each pixel in a preset area around the nose tip point;
and the second correction unit is used for taking the median of the depth values of all the pixels in the preset area as the final depth value of the nose tip point.
In an alternative embodiment, the calibration module 52 may further include:
and the deleting unit is used for deleting the pixel points with the depth values smaller than the final depth value of the nose tip point in the depth map and the pixel points with the difference value between the depth value and the final depth value of the nose tip point larger than a preset threshold value.
In an optional embodiment, the face recognition apparatus may further include an enhancement module, configured to:
for each frame of depth image acquired by the image acquisition device, taking the nose tip point as the center and cutting out, from the point cloud data of that frame, a first preset number of points closest to the nose tip point as face point cloud data;
calculating the difference between any two frames of depth images acquired by the image acquisition device according to the cut-out face point cloud data;
taking, in descending order of difference, a second preset number of depth image pairs with the largest differences as target depth image pairs;
for each target depth image pair, calculating new face point cloud data from the point cloud data of the two frames of depth images in that pair;
for each new face point cloud, performing multiple enhancement operations on it to obtain enhanced face point cloud data, each enhancement operation performing one pose rotation and one noise addition;
for each enhanced face point cloud data, carrying out normalization processing on the enhanced face point cloud data; and mapping the normalized enhanced human face point cloud data into a planar three-channel image to obtain second type sample data.
The face recognition device provided by the embodiment of the application can be applied to electronic equipment, such as a PC terminal, a smart phone, a translator, a robot, a smart home (household appliance), a remote controller, a cloud platform, a server cluster and the like. Alternatively, fig. 6 shows a block diagram of a hardware structure of the electronic device, and referring to fig. 6, the hardware structure of the electronic device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention;
the memory 3 may include high-speed RAM and may further include non-volatile memory, such as at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
acquiring a depth image;
performing attitude correction on the face area according to the point cloud data of the face area in the depth image;
cutting out a target area from the face area after the posture correction;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
Alternatively, the detailed function and the extended function of the program may be as described above.
Embodiments of the present application further provide a storage medium, where a program suitable for execution by a processor may be stored, where the program is configured to:
acquiring a depth image;
performing attitude correction on the face area according to the point cloud data of the face area in the depth image;
cutting out a target area from the face area after the posture correction;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained by training a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by an image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
Alternatively, the detailed function and the extended function of the program may be as described above.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
It should be understood that the technical problems can be solved by combining features of the embodiments and of the claims with one another.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A face recognition method, comprising:
acquiring a depth image;
performing pose correction on a face region according to point cloud data of the face region in the depth image;
cutting out a target region from the pose-corrected face region;
normalizing the point cloud data of the target region to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is trained on a sample data set labeled with recognition tags; the sample data set comprises first-type sample data of a face region in a depth image acquired by an image acquisition device and second-type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each piece of sample data is a planar three-channel image.
2. The method of claim 1, wherein performing pose correction on the face region according to the point cloud data of the face region in the depth image comprises:
performing face detection, and key point detection within the face region, on a color image corresponding to the depth image to obtain face region coordinates and a plurality of key point coordinates;
determining a face region to be processed and a plurality of key points in the depth image according to the face region coordinates and the plurality of key point coordinates;
determining a pose transformation matrix according to point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points;
and performing pose correction on the point cloud data of the face region to be processed by using the pose transformation matrix.
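
Claim 2 does not fix how the pose transformation matrix is computed from the key points and the template; one common choice is the SVD-based (Kabsch) rigid alignment sketched below in Python/NumPy, given here purely as an assumed illustration rather than the disclosed computation.

import numpy as np

def estimate_pose_transform(keypoints, template):
    """Estimate rotation R and translation t aligning key points (N, 3) to a template (N, 3)
    with the SVD-based Kabsch method (an assumed, not disclosed, computation)."""
    src_center = keypoints.mean(axis=0)
    dst_center = template.mean(axis=0)
    H = (keypoints - src_center).T @ (template - dst_center)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_center - R @ src_center
    return R, t

def apply_pose_transform(points, R, t):
    """Apply the estimated pose transform to the whole face point cloud."""
    return points @ R.T + t
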
3. The method of claim 2, wherein the plurality of key points comprises two eyeball points, a nose tip point and two mouth corner points;
before determining the pose transformation matrix according to the point cloud data of the plurality of key points and the preset point cloud data template of the plurality of key points, the method further comprises:
acquiring the depth value of each pixel in a preset area around the nose tip point;
and taking the median of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
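
A minimal sketch of the nose tip depth refinement in claim 3, assuming a square window and a depth image in which invalid pixels are stored as zero (both assumptions):

import numpy as np

def robust_nose_tip_depth(depth, i, j, half_window=3):
    """Return the median depth in a (2*half_window+1)^2 patch around pixel (i, j);
    the window size is illustrative and zero-valued (hole) pixels are ignored."""
    patch = depth[max(i - half_window, 0): i + half_window + 1,
                  max(j - half_window, 0): j + half_window + 1]
    valid = patch[patch > 0]
    return float(np.median(valid)) if valid.size else 0.0
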
4. The method of claim 3, wherein after determining the final depth value of the nose tip point and before performing pose correction on the point cloud data of the face region to be processed by using the pose transformation matrix, the method further comprises:
deleting, from the depth image, pixels whose depth values are smaller than the final depth value of the nose tip point and pixels whose depth values differ from the final depth value of the nose tip point by more than a preset threshold.
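
Claim 4 amounts to a simple depth mask around the nose tip; a sketch under the assumption that depth is stored in millimetres and the preset threshold is, say, 150 mm:

import numpy as np

def filter_by_nose_depth(depth, nose_depth, threshold=150.0):
    """Zero out pixels closer than the nose tip and pixels more than `threshold`
    depth units behind it; the threshold value is an assumed example."""
    keep = (depth >= nose_depth) & (depth - nose_depth <= threshold)
    return np.where(keep, depth, 0)
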
5. The method of claim 1, wherein the point cloud data of the face region is obtained by using the following formula:
z=(double)I(i,j)/s
x=(j-cx)*z/fx
y=(i-cy)*z/fy
wherein I denotes a frame of the depth image; I(i,j) is the depth value at position (i,j) in the image matrix; double denotes a data format conversion; fx and fy are the focal lengths of the camera that acquires the depth image; cx and cy are the principal point coordinates; s is a tilt parameter; and x, y, z are the transformed point cloud coordinates.
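
In code, the back-projection of claim 5 could be written as below; the default s = 1000 assumes a depth image stored in millimetres being converted to metres, and the intrinsics fx, fy, cx, cy come from the depth camera calibration (both assumptions for illustration).

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, s=1000.0):
    """Back-project a depth image I into an (H*W, 3) point cloud using
    z = I(i, j) / s, x = (j - cx) * z / fx, y = (i - cy) * z / fy."""
    h, w = depth.shape
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth.astype(np.float64) / s
    x = (j - cx) * z / fx
    y = (i - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
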
6. The method of claim 1, wherein the second-type sample data is obtained by:
for each frame of depth image acquired by the image acquisition device, taking the nose tip point as a center and extracting, from the point cloud data of that frame, a first preset number of points closest to the nose tip point as face point cloud data;
calculating the difference between any two frames of depth images acquired by the image acquisition device according to the extracted face point cloud data;
sorting the differences in descending order, and taking a second preset number of depth image pairs with the largest differences as target depth image pairs;
for each target depth image pair, calculating new face point cloud data from the point cloud data of the two frames of depth images in that pair;
for each set of new face point cloud data, performing multiple enhancement operations to obtain enhanced face point cloud data, each enhancement operation applying one pose rotation and adding noise;
and normalizing each set of enhanced face point cloud data, and mapping the normalized enhanced face point cloud data into a planar three-channel image to obtain the second-type sample data.
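
One assumed reading of the enhancement operation in claim 6 — one random small pose rotation plus additive Gaussian noise per pass — is sketched below; the rotation range, noise level and number of passes are illustrative parameters, not values fixed by the claim.

import numpy as np

def augment_face_cloud(points, num_passes=5, max_angle_deg=10.0, noise_std=0.002, rng=None):
    """Produce num_passes enhanced copies of a face point cloud, each with one random
    rotation about the y axis plus Gaussian noise (all parameters are assumptions)."""
    rng = rng or np.random.default_rng()
    enhanced = []
    for _ in range(num_passes):
        angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        c, s = np.cos(angle), np.sin(angle)
        rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        enhanced.append(points @ rot_y.T + rng.normal(0.0, noise_std, points.shape))
    return enhanced
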
7. The method of claim 1, wherein the face recognition model is a four-layer convolutional neural network.
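
The disclosure fixes only the depth of the network (four convolutional layers); a plausible PyTorch sketch consistent with a 128x128 planar three-channel input is given below, where the channel widths, kernel sizes, pooling scheme and number of identities are assumptions.

import torch
import torch.nn as nn

class FaceNet4(nn.Module):
    """Illustrative four-convolutional-layer network for the planar three-channel face image."""
    def __init__(self, num_identities=1000):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
                                 nn.MaxPool2d(2))
        self.features = nn.Sequential(block(3, 32), block(32, 64),
                                      block(64, 128), block(128, 256))
        self.classifier = nn.Linear(256 * 8 * 8, num_identities)

    def forward(self, x):
        x = self.features(x)            # (B, 256, 8, 8) for a 128x128 input
        return self.classifier(x.flatten(1))

logits = FaceNet4()(torch.randn(1, 3, 128, 128))  # example forward pass
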
8. A face recognition apparatus, comprising:
an acquisition module, configured to acquire a depth image;
a correction module, configured to perform pose correction on a face region according to point cloud data of the face region in the depth image;
a cutting module, configured to cut out a target region from the pose-corrected face region;
a normalization module, configured to normalize the point cloud data of the target region to obtain normalized point cloud data;
a mapping module, configured to map the normalized point cloud data into a planar three-channel image;
and a recognition module, configured to input the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; wherein the face recognition model is trained on a sample data set labeled with recognition tags; the sample data set comprises first-type sample data of a face region in a depth image acquired by an image acquisition device and second-type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each piece of sample data is a planar three-channel image.
9. An electronic device, comprising a memory and a processor, wherein:
the memory is configured to store a program;
and the processor is configured to execute the program to implement the steps of the face recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the face recognition method according to any one of claims 1 to 7.
CN201911213058.3A 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium Active CN111091075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911213058.3A CN111091075B (en) 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111091075A true CN111091075A (en) 2020-05-01
CN111091075B CN111091075B (en) 2023-09-05

Family

ID=70393884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911213058.3A Active CN111091075B (en) 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111091075B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080273761A1 (en) * 2004-06-07 2008-11-06 Kozo Kawata Image Recognition Device, Image Recognition Method, and Program for Causing Computer to Execute the Method
CN104751427A (en) * 2015-04-16 2015-07-01 山东师范大学 Improved TPS (thin plate spline) model based gel image correction algorithm
CN108549873A (en) * 2018-04-19 2018-09-18 北京华捷艾米科技有限公司 Three-dimensional face identification method and three-dimensional face recognition system
CN108615016A (en) * 2018-04-28 2018-10-02 北京华捷艾米科技有限公司 Face critical point detection method and face critical point detection device
CN109064434A (en) * 2018-06-28 2018-12-21 广州视源电子科技股份有限公司 Method, apparatus, storage medium and the computer equipment of image enhancement
CN109087261A (en) * 2018-08-03 2018-12-25 上海依图网络科技有限公司 Face antidote based on untethered acquisition scene
CN110163235A (en) * 2018-10-11 2019-08-23 腾讯科技(深圳)有限公司 Training, image enchancing method, device and the storage medium of image enhancement model
CN109753875A (en) * 2018-11-28 2019-05-14 北京的卢深视科技有限公司 Face identification method, device and electronic equipment based on face character perception loss
CN109934776A (en) * 2018-12-25 2019-06-25 北京奇艺世纪科技有限公司 Model generating method, video enhancement method, device and computer readable storage medium
CN109961021A (en) * 2019-03-05 2019-07-02 北京超维度计算科技有限公司 Method for detecting human face in a kind of depth image
CN110147721A (en) * 2019-04-11 2019-08-20 阿里巴巴集团控股有限公司 A kind of three-dimensional face identification method, model training method and device
CN110415342A (en) * 2019-08-02 2019-11-05 深圳市唯特视科技有限公司 A kind of three-dimensional point cloud reconstructing device and method based on more merge sensors
CN110503078A (en) * 2019-08-29 2019-11-26 的卢技术有限公司 A kind of remote face identification method and system based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
L. HAO et al.: "Integrated metric learning with adaptive constraints for person re-identification", 2017 IEEE International Conference on Image Processing (ICIP), pages 161-165 *
袁美玲: "Research on Three-Dimensional Face Recognition Based on Statistical Analysis" (基于统计分析的三维人脸识别研究), China Master's Theses Full-text Database, Information Science and Technology, no. 12, pages 138-854 *
魏昇: "Research on a Contour-Based Three-Dimensional Face Recognition Algorithm" (基于轮廓线的三维人脸识别算法的研究), China Master's Theses Full-text Database, Information Science and Technology, no. 08, pages 138-940 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626241A (en) * 2020-05-29 2020-09-04 北京华捷艾米科技有限公司 Face detection method and device
CN111626241B (en) * 2020-05-29 2023-06-23 北京华捷艾米科技有限公司 Face detection method and device
CN111723691A (en) * 2020-06-03 2020-09-29 北京的卢深视科技有限公司 Three-dimensional face recognition method and device, electronic equipment and storage medium
CN111723691B (en) * 2020-06-03 2023-10-17 合肥的卢深视科技有限公司 Three-dimensional face recognition method and device, electronic equipment and storage medium
CN112200056A (en) * 2020-09-30 2021-01-08 汉王科技股份有限公司 Face living body detection method and device, electronic equipment and storage medium
CN112581357A (en) * 2020-12-16 2021-03-30 珠海格力电器股份有限公司 Face data processing method and device, electronic equipment and storage medium
CN113255512A (en) * 2021-05-21 2021-08-13 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for living body identification
CN113255512B (en) * 2021-05-21 2023-07-28 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for living body identification
CN113837106A (en) * 2021-09-26 2021-12-24 北京的卢深视科技有限公司 Face recognition method, face recognition system, electronic equipment and storage medium
CN114331915A (en) * 2022-03-07 2022-04-12 荣耀终端有限公司 Image processing method and electronic device
CN114331915B (en) * 2022-03-07 2022-08-05 荣耀终端有限公司 Image processing method and electronic device

Also Published As

Publication number Publication date
CN111091075B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN111091075B (en) Face recognition method and device, electronic equipment and storage medium
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
EP3084682B1 (en) System and method for identifying faces in unconstrained media
CN103530599B (en) The detection method and system of a kind of real human face and picture face
CN111274916B (en) Face recognition method and face recognition device
CN105740780B (en) Method and device for detecting living human face
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN104933389B (en) Identity recognition method and device based on finger veins
CN107590430A (en) Biopsy method, device, equipment and storage medium
CN108182397B (en) Multi-pose multi-scale human face verification method
CN110569756A (en) face recognition model construction method, recognition method, device and storage medium
Vretos et al. 3D facial expression recognition using Zernike moments on depth images
CN113298158B (en) Data detection method, device, equipment and storage medium
CN112257696B (en) Sight estimation method and computing equipment
CN112686191B (en) Living body anti-counterfeiting method, system, terminal and medium based on three-dimensional information of human face
CN108470178B (en) Depth map significance detection method combined with depth credibility evaluation factor
CN111476222A (en) Image processing method, image processing device, computer equipment and computer readable storage medium
CN113254491A (en) Information recommendation method and device, computer equipment and storage medium
CN110647782A (en) Three-dimensional face reconstruction and multi-pose face recognition method and device
Ilankumaran et al. Multi-biometric authentication system using finger vein and iris in cloud computing
CN111222433A (en) Automatic face auditing method, system, equipment and readable storage medium
Wu et al. Appearance-based gaze block estimation via CNN classification
EP2790130A1 (en) Method for object recognition
CN111126250A (en) Pedestrian re-identification method and device based on PTGAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant