CN111091075B - Face recognition method and device, electronic equipment and storage medium - Google Patents

Face recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111091075B
CN111091075B
Authority
CN
China
Prior art keywords
point cloud
face
cloud data
image
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911213058.3A
Other languages
Chinese (zh)
Other versions
CN111091075A (en)
Inventor
张彦博
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201911213058.3A priority Critical patent/CN111091075B/en
Publication of CN111091075A publication Critical patent/CN111091075A/en
Application granted granted Critical
Publication of CN111091075B publication Critical patent/CN111091075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiment of the application discloses a face recognition method and device, electronic equipment and a storage medium, wherein the face region in a depth map is subjected to pose correction by utilizing point cloud data, a target region is cut out from the corrected face region, the point cloud data of the target region is normalized, the normalized point cloud data is mapped into a planar three-channel image, and the three-channel image is input into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training on a sample data set marked with recognition tags; the sample data set comprises first-type sample data of face regions in depth images acquired by the image acquisition device and second-type sample data of enhanced face regions obtained based on the depth images acquired by the image acquisition device, and each sample data item is a planar three-channel image. In this way, 3D face recognition accuracy is improved.

Description

Face recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of face recognition technologies, and in particular, to a face recognition method, device, electronic apparatus, and storage medium.
Background
Face recognition is widely applied to the fields of identity authentication, information security, criminal investigation, entrance and exit control and the like.
At present, the mainstream face recognition method is the two-dimensional (2D) face recognition method. Although 2D face recognition requires little computation and is fast, the features it can extract are limited, so recognition errors easily occur in complex scenes. To remedy this deficiency, three-dimensional (3D) face recognition applications have emerged.
However, the inventor has found through research that 3D face recognition methods based on traditional approaches (which mainly use mathematical methods to extract features from image data and perform face recognition based on the extracted features) only extract some local features of the face (such as features of the eye, mouth and nose regions) for face matching, so faces that are only slightly similar are easily misjudged and the recognition accuracy is low. Although 3D face recognition methods based on neural networks can improve the recognition accuracy, this is premised on having enough training samples, and the fact is that very little 3D face data is currently available, so the recognition accuracy of current neural-network-based 3D face recognition methods is also low.
Disclosure of Invention
The application aims to provide a face recognition method, a face recognition device, electronic equipment and a storage medium, so as to improve 3D face recognition precision, and the method specifically comprises the following technical scheme:
a face recognition method, comprising:
acquiring a depth image;
correcting the pose of the face region according to the point cloud data of the face region in the depth image;
cutting out a target area from the face area after posture correction;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by the image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
In the above method, preferably, the correcting the pose of the face area according to the point cloud data of the face area in the depth map includes:
performing face detection and key point detection in a face region on a color image corresponding to the depth image to obtain face region coordinates and a plurality of key point coordinates;
according to the face region coordinates and the plurality of key point coordinates, determining a face region to be processed and a plurality of key points from the depth image;
determining an attitude transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points;
and carrying out posture correction on the point cloud data of the face area to be processed by utilizing the posture transformation matrix.
In the above method, preferably, the plurality of key points include: two eyeball points, a nose tip and two mouth corner points;
before determining the gesture transformation matrix according to the point cloud data of the plurality of key points and the preset point cloud data templates of the plurality of key points, the method further comprises the following steps:
acquiring depth values of all pixels in a preset area around the nose point;
and taking the median value of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
In the above method, preferably, after determining the final depth value of the nose tip point, before performing pose correction on the point cloud data of the face area to be processed by using the pose transformation matrix, the method further includes:
and deleting the pixels in the depth map, the depth value of which is smaller than the final depth value of the nose tip point, and the pixels in the depth map, the difference value between the depth value and the final depth value of the nose tip point of which is larger than a preset threshold value.
In the above method, preferably, the point cloud data of the face area is obtained by using the following formula:
z=(double)I(i,j)/s
x=(j-cx)*z/fx
y=(i-cy)*z/fy
wherein I represents a frame of depth image; I(i, j) refers to the depth value at position (i, j) in the image matrix; double is used for data format conversion; fx, fy are the focal lengths of the camera that collects the depth map; cx, cy are the image principal point coordinates; s is a tilt parameter; x, y, z are the converted point cloud coordinates.
In the above method, preferably, the second type of sample data is obtained by:
capturing a first preset number of point cloud data closest to the nose point from the point cloud data of each frame of depth image acquired by the corresponding image acquisition device by taking the nose point as a center, and taking the point cloud data closest to the nose point as face point cloud data;
For any two frames of depth images acquired by the image acquisition device, calculating the difference of the two frames of depth images according to the intercepted face point cloud data;
taking depth image pairs with the second preset number and the largest difference as target depth image pairs according to the sequence from big to small;
corresponding to each target depth image pair, calculating to obtain new face point cloud data by utilizing the point cloud data of two frames of depth images in the target depth image pair;
for each new face point cloud data, carrying out various enhancement processes on the new face point cloud data to obtain enhanced face point cloud data, and carrying out gesture rotation and noise increase on each enhancement process;
for each enhanced face point cloud data, carrying out normalization processing on the enhanced face point cloud data; and mapping the normalized enhanced face point cloud data into a planar three-channel image to obtain second-class sample data.
In the above method, preferably, the face recognition model is a four-layer convolutional neural network.
A face recognition device, comprising:
the acquisition module is used for acquiring the depth image;
the correction module is used for correcting the pose of the face area according to the point cloud data of the face area in the depth image;
The clipping module is used for clipping a target area from the face area after the posture correction;
the normalization module is used for carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
the mapping module is used for mapping the normalized point cloud data into a plane three-channel image;
the recognition module is used for inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by the image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
An electronic device includes a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the face recognition method according to any one of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the face recognition method as claimed in any one of the preceding claims.
According to the face recognition method, the device, the electronic equipment and the storage medium, gesture correction is carried out on a face region in a depth map by utilizing point cloud data, a target region is cut out from the corrected face region, normalization is carried out on the point cloud data of the target region, the normalized point cloud data are mapped into a planar three-channel image, and the three-channel image is input into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first type sample data of a face region in a depth image acquired by the image acquisition device and second type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a three-channel planar image. As the second type of sample data of the enhanced face region obtained based on the depth image acquired by the image acquisition device is added in the sample data used for training the face recognition model, the number of training samples is increased, and the pose correction is carried out by combining the faces in the depth image to be recognized, so that the 3D face recognition precision is improved.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an implementation of a face recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of an implementation of performing pose correction on a face area according to point cloud data of the face area in a depth map according to an embodiment of the present application;
FIG. 3 is a flowchart of an implementation of obtaining second type sample data for enhancing a face region based on a depth image acquired by an image acquisition device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a face recognition device according to an embodiment of the present application;
fig. 6 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in other sequences than those illustrated herein.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
Face representation models are divided into 2D faces and 3D faces. 2D face recognition has the advantages that more algorithms have been implemented for it, a relatively mature processing pipeline exists, and image data acquisition is simple, requiring only an ordinary camera; therefore face recognition based on 2D image data is the current mainstream and is applied in scenarios such as security, surveillance, access control, attendance, auxiliary identity authentication in finance, and entertainment. According to its technical development, 2D face recognition can be divided into two main categories: traditional face recognition and neural-network-based face recognition. Traditional 2D face recognition mainly uses mathematical methods to extract corresponding features from the image matrix; these features are generally scale-invariant features, and common algorithms include SURF, SIFT, HARRIS, GFTT and the like. Neural-network-based 2D face recognition has reached a recognition accuracy of 99.80% in various face recognition challenges and on various open-source data sets, an accuracy even comparable to humans; nevertheless, in demanding financial environments 2D face recognition is only used as an auxiliary means, and other verification steps such as entering a mobile phone number are still required after face recognition. Because 2D face recognition has certain limitations, 3D face recognition has been developed to make up for the deficiency. 2D face recognition only uses RGB data, whereas 3D face recognition is basically based on point cloud data, which carries additional depth information, so a model based on point cloud data has higher recognition accuracy and higher liveness detection accuracy than a model with RGB data only.
Face recognition based on neural networks requires a large amount of data to train, for both 2D and 3D algorithms. The publicly available 2D face data has already reached the millions level, so a good face recognition model can be trained. Research on 3D face recognition is still in a development stage: the publicly available data is scarce, and collecting data is time-consuming and expensive. Based on this, the present application proposes performing data enhancement on public data sets or on a small amount of collected data in order to expand the data, and then training the 3D face recognition model with the expanded data, thereby improving the accuracy of the 3D face recognition model. Embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of an implementation of a face recognition method according to an embodiment of the present application, which may include:
step S11: a depth image is acquired.
The depth image is an image to be subjected to 3D face recognition. The depth image may be acquired by the image acquisition device, or the depth image acquired in advance by the image acquisition device may be read from the memory.
Step S12: and correcting the pose of the face region according to the point cloud data of the face region in the depth map.
The depth image is a planar image, i.e., a two-dimensional image. In the embodiment of the application, a face area is determined in a two-dimensional depth image, and the face area is a rectangular area containing a face. And then converting the face region into three-dimensional point cloud data, and correcting the three-dimensional point cloud data in posture to realize the correction of the face region.
Optionally, the point cloud data of the face area in the depth map may be obtained by:
z=(double)I(i,j)/s
x=(j-cx)*z/fx
y=(i-cy)*z/fy
where I represents a frame of depth image; I(i, j) refers to the depth value at position (i, j) in the image matrix; double is used for data format conversion; fx, fy are the focal lengths of the camera that acquired the depth image, typically with fx = fy; cx, cy are the coordinates of the image principal point (i.e., the intersection with the image plane of the perpendicular from the camera's optical center to the image plane); s is a tilt parameter; and x, y, z are the converted point cloud coordinates. Here, fx, fy, cx, cy and s are internal parameters of the image acquisition device used to capture the depth image; their specific values are determined by the device and generally differ between devices. For example, the parameters of a certain image acquisition device are configured as follows: fx = 474.376, fy = 474.376, s = 1000, cx = 317.301, cy = 245.217.
In order to facilitate the subsequent mapping of the normalized point cloud data into a planar three-channel image, the relative position of each pixel in the face region within the depth map may also be recorded while obtaining the point cloud data of the face region. Specifically, for any pixel in the face region (denoted as pixel B for convenience of description, with coordinates (x_B, y_B)), the relative position of pixel B in the depth map can be characterized by (x_B/n_c, y_B/n_r), where n_c is the length of the depth map and n_r is the width of the depth map.
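As an illustrative, non-limiting sketch (not part of the claimed method), the above conversion can be expressed in Python roughly as follows; the intrinsic parameter values are the example configuration given above, and the function name and the recording of relative positions as a separate array are implementation choices of this sketch.

```python
import numpy as np

def depth_to_point_cloud(depth, fx=474.376, fy=474.376, cx=317.301, cy=245.217, s=1000.0):
    """Convert a single-frame depth image (H x W array of depth values) into an
    N x 3 point cloud using z = I(i, j)/s, x = (j - cx)*z/fx, y = (i - cy)*z/fy.
    Also returns the relative position (column / n_c, row / n_r) of every pixel,
    which is reused later when mapping normalized points onto a planar image."""
    h, w = depth.shape
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth.astype(np.float64) / s            # z = (double)I(i, j) / s
    x = (j - cx) * z / fx                       # x = (j - cx) * z / fx
    y = (i - cy) * z / fy                       # y = (i - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    rel_pos = np.stack([j / w, i / h], axis=-1).reshape(-1, 2)
    return points, rel_pos
```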
Step S13: and cutting out a target area from the face area after the posture correction.
The face region determined in the two-dimensional depth image generally contains much redundant information. To reduce the subsequent data processing load, in the embodiment of the application the pose-corrected face region is cropped. Optionally, a region of preset size centered on the nose tip point may be cut out as the target region, the target region being smaller than the pose-corrected face region. The inventor has found through research that when a 180×180 region centered on the nose tip point is taken as the target region (i.e., the length and width of the target region are each 180 pixels), the target region contains the main information of the face, and performing face recognition based on it ensures recognition accuracy while improving recognition speed.
Specifically, the face region after posture correction may be converted into a depth map, in which a 180×180 region is truncated as a target region with the nose point as the center.
Step S14: and carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data.
Optionally, the point cloud data of the target area may be normalized by using maximum and minimum values of coordinates of each dimension of the point cloud data of the face area after the posture correction, and specifically, the point cloud data of the target area may be normalized by using the following formula:
x′ = (2x - x_max - x_min)/(x_max - x_min)
y′ = (2y - y_max - y_min)/(y_max - y_min)
z′ = (2z - z_max - z_min)/(z_max - z_min)
where x, y, z are the coordinates of the three dimensions before normalization; x′, y′ and z′ are the normalized values of x, y and z respectively; x_max and x_min are the maximum and minimum values of the dimension to which x belongs; y_max and y_min are the maximum and minimum values of the dimension to which y belongs; and z_max and z_min are the maximum and minimum values of the dimension to which z belongs.
By the above normalization method, the coordinates of each point in the point cloud data of the target area are normalized to [-1, 1].
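A minimal illustrative sketch of this normalization, assuming the point clouds are NumPy arrays and the per-dimension extremes are taken from the pose-corrected face region as described above:

```python
import numpy as np

def normalize_point_cloud(target_points, reference_points=None):
    """Normalize each dimension of the target-area point cloud to [-1, 1] with
    x' = (2x - x_max - x_min) / (x_max - x_min), applied per dimension.
    The extremes may be taken from the whole pose-corrected face region
    (reference_points); if omitted, the target area itself is used."""
    ref = target_points if reference_points is None else reference_points
    p_min, p_max = ref.min(axis=0), ref.max(axis=0)
    return (2.0 * target_points - p_max - p_min) / (p_max - p_min)
```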
Step S15: the normalized point cloud data is mapped into a planar three-channel image (i.e., RGB image).
Alternatively, the normalized point cloud data may be mapped into a planar three-channel image of a target size, where the size of the planar three-channel image is equal to or smaller than the size of the target area. In a preferred embodiment, the size of the planar three-channel image is smaller than the size of the target area.
The normalized point cloud data can be mapped into the planar three-channel image according to the relative position, in the depth map, of the pixel corresponding to each normalized point and the target size of the planar three-channel image.
Specifically, taking the relative position (x_B/n_c, y_B/n_r) of pixel B in the depth map as an example, the relative position of pixel B in the depth map is multiplied by the target size of the planar three-channel image to obtain the position of pixel B in the planar three-channel image of the target size. Assume that the target size of the planar three-channel image is n′_c × n′_r, i.e., the target length of the planar three-channel image is n′_c and the target width is n′_r. Then the position of pixel B in the planar three-channel image is (x_B × n′_c/n_c, y_B × n′_r/n_r).
Suppose the normalized point cloud coordinates of pixel B in the depth map are (x′_B, y′_B, z′_B). Then, at position (x_B × n′_c/n_c, y_B × n′_r/n_r) in the planar three-channel image, the values of the R, G and B channels are x′_B, y′_B and z′_B in turn.
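An illustrative sketch of this mapping; the 128 × 128 target size used here is only an example choice smaller than the 180 × 180 target area, and the clipping of indices is an implementation detail of the sketch:

```python
import numpy as np

def map_to_three_channel_image(norm_points, rel_pos, target_size=(128, 128)):
    """Scatter normalized point-cloud coordinates (x', y', z') into a planar
    three-channel image. rel_pos holds each point's relative position
    (x_B / n_c, y_B / n_r) in the original depth map; multiplying by the
    target size gives the pixel position in the output image. Channel values
    stay in [-1, 1] as produced by the normalization step."""
    n_c, n_r = target_size                      # target length (columns), width (rows)
    img = np.zeros((n_r, n_c, 3), dtype=np.float32)
    cols = np.clip((rel_pos[:, 0] * n_c).astype(int), 0, n_c - 1)
    rows = np.clip((rel_pos[:, 1] * n_r).astype(int), 0, n_r - 1)
    img[rows, cols] = norm_points               # (R, G, B) = (x', y', z')
    return img
```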
Step S16: inputting the planar three-channel data obtained by mapping into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by the image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
The identification tag of each sample data is used to identify which face the sample data belongs to. In the embodiment of the application, for a depth image (for convenience of description, denoted as M1) acquired by an image acquisition device, according to point cloud data of a face region in the depth image M1, posture correction is performed on the face region in the depth image M1, a target region is cut out from the face region after the posture correction, normalization processing is performed on the point cloud data of the target region, normalized point cloud data is obtained, and the normalized point cloud data is mapped into a planar three-channel image (i.e., first-class sample data).
The enhanced face region refers to enhanced face point cloud data obtained by enhancing the point cloud data of a face region in a depth image acquired by the image acquisition device; the enhanced face point cloud data is normalized to obtain normalized point cloud data, and the normalized point cloud data is mapped into a planar three-channel image (i.e., second-type sample data).
According to the face recognition method provided by the embodiment of the application, the pose correction is carried out on the face region in the depth map by utilizing the point cloud data, the target region is cut out from the corrected face region, the point cloud data of the target region are normalized, the normalized point cloud data are mapped into a planar three-channel image, and the three-channel image is input into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first type sample data of a face region in a depth image acquired by the image acquisition device and second type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a three-channel planar image. As the second type of sample data of the enhanced face region obtained based on the depth image acquired by the image acquisition device is added in the sample data used for training the face recognition model, the number of training samples is increased, and the pose correction is carried out by combining the faces in the depth image to be recognized, so that the 3D face recognition precision is improved.
In an alternative embodiment, a flowchart of an implementation of performing pose correction on a face area according to the point cloud data of the face area in the depth map is shown in fig. 2, and may include:
step S21: and carrying out face detection and key point detection in the face region on the color image corresponding to the depth image to obtain face region coordinates and a plurality of key point coordinates.
Optionally, the plurality of keypoints may specifically be 5 keypoints, which specifically may include: two eyeball points, a nose point and two mouth corner points.
The image capture device captures a corresponding color image (i.e., RGB image) while capturing the depth image. In the embodiment of the application, the face detection and the key point detection can be performed by utilizing the color image corresponding to the depth image to determine the face region coordinates and the key point coordinates in the color image, and the face region coordinates in the color image corresponding to the depth image, namely the face region coordinates in the depth image, are the same as the face region coordinates in the corresponding color image.
Alternatively, face detection and key point detection may be performed on the color image using a pre-trained detection model, to obtain face region coordinates and a plurality of key point coordinates.
Step S22: and determining a face region to be processed and a plurality of key points from the depth image according to the face region coordinates and the plurality of key point coordinates, wherein the coordinates of the face region to be processed are the same as the coordinates of the face region detected in the step S21, and the coordinates of the plurality of key points in the depth image are the same as the coordinates of the plurality of key points detected in the step S21.
Step S23, determining an attitude transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points.
The preset point cloud data templates of the plurality of key points refer to point cloud data of the plurality of key points in the preset standard face region.
In the point cloud data templates of the key points, the relative position relation among the key points is determined, and according to the relative position relation among the key points in the depth image and the relative position relation among the key points in the point cloud data templates, an attitude transformation matrix of the key points can be obtained, wherein the attitude transformation matrix reflects the mapping relation from the point cloud data of a plurality of key points in the depth image to the point cloud data templates.
How to obtain the gesture transformation matrix can refer to the prior art, and is not repeated here. In the embodiment of the application, the gesture transformation matrix of the key points is used as the gesture transformation matrix of the face region.
Step S24: and carrying out posture correction on the point cloud data of the face area to be processed by utilizing the posture transformation matrix.
The process of performing posture correction on point cloud data of a face region to be processed by using a posture transformation matrix can be referred to the prior art, and will not be described in detail here.
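Although the computation of the pose transformation matrix is left to the prior art above, one common prior-art choice is a rigid alignment of the detected key points to the template via the Kabsch/SVD method. The following sketch illustrates that option only; it is an assumption, not the patented procedure itself:

```python
import numpy as np

def keypoint_pose_transform(keypoints, template):
    """Estimate a rigid pose transform (R, t) aligning the detected key-point
    points (5 x 3) to the preset key-point template (5 x 3) with the
    Kabsch/SVD method -- one possible prior-art choice."""
    c_src, c_dst = keypoints.mean(axis=0), template.mean(axis=0)
    h = (keypoints - c_src).T @ (template - c_dst)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T     # rotation, reflection removed
    t = c_dst - r @ c_src                       # translation
    return r, t

def apply_pose_transform(points, r, t):
    """Apply the estimated pose transform to the face-region point cloud."""
    return points @ r.T + t
```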
The inventor of the present application has found through research that, due to device defects and environmental changes, when a nose tip point is detected in the color image there is a high probability that a hole exists at the corresponding position in the depth image, i.e., the depth of the nose tip point is zero, which reduces recognition accuracy. To avoid this, the nose tip point can be corrected before the pose transformation matrix is determined according to the point cloud data of the plurality of key points and the preset point cloud data template of the plurality of key points. Specifically:
in an optional embodiment, before determining the gesture transformation matrix according to the point cloud data of the plurality of key points and the preset point cloud data templates of the plurality of key points, the method may further include:
and obtaining the depth value of each pixel in a preset area around the nose tip point, and taking the average value of the depth values of all the pixels in the preset area as the final depth value of the nose tip point.
The inventors of the present application have studied and found that such a correction method improves the recognition accuracy to some extent, but the improvement effect is still not ideal.
In another optional embodiment, before determining the gesture transformation matrix according to the point cloud data of the plurality of key points and the preset point cloud data templates of the plurality of key points, the method may further include:
and acquiring depth values of all pixels in a preset area around the nose point. For example, depth values for individual pixels in a 10 x 10 region around the nasal tip may be obtained.
And taking the median value of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
Compared with the method that the average value of the depth values of all pixels in the preset area is used as the final depth value of the nose tip, the method has the advantage that the face recognition accuracy is obviously improved by taking the median value of the depth values of all pixels in the preset area as the final depth value of the nose tip.
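An illustrative sketch of the median correction; the 10 × 10 window matches the example above, while skipping zero-depth holes inside the window is an added assumption of this sketch:

```python
import numpy as np

def corrected_nose_tip_depth(depth, nose_uv, half_win=5):
    """Take the median depth in a window around the detected nose tip
    (here 10 x 10, as in the example) as its final depth value,
    ignoring zero-depth holes inside the window."""
    u, v = nose_uv                                    # column, row of the nose tip
    patch = depth[max(0, v - half_win):v + half_win,
                  max(0, u - half_win):u + half_win].astype(np.float64)
    valid = patch[patch > 0]                          # skip depth holes
    return float(np.median(valid)) if valid.size else 0.0
```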
In order to further improve the recognition accuracy and the recognition speed, in the embodiment of the present application, before performing gesture correction on the point cloud data of the face area to be processed by using the gesture transformation matrix, the method may further include:
and deleting the pixel points in the depth map, the depth value of which is smaller than the final depth value of the nose tip point, and the pixel points in the depth map, the difference value between the depth value and the final depth value of the nose tip point of which is larger than a preset threshold value.
The inventor of the application researches and discovers that the depth of a general human face is within 30cm, so that points which are smaller than the final depth value of a nose tip point in a human face region to be processed in a depth image and points which are larger than the final depth value of the nose tip point and have the difference of more than 30cm with the final depth value of the nose tip point are regarded as redundant information, the points are deleted, and only the rest points in the human face region to be processed in the depth image are subjected to gesture correction, so that the face recognition accuracy is ensured and the data processing speed is improved.
In addition, the edges of the pose-corrected face point cloud data may still have some noisy points that do not belong to points of the face, such points being commonly referred to as outliers. The presence of such points can also reduce recognition accuracy. Therefore, in order to further improve the face recognition accuracy and speed, the outliers in the face region to be processed can be identified, and the outliers are deleted. Specifically, for any point (for convenience of description, point a) in the face region to be processed, counting the number of points in a circular region with the point a as the center and determined by the set radius, and if the number of points in the circular region is smaller than a threshold value, determining the point a as an outlier.
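The two filtering steps above (depth-based cropping around the nose tip and radius-based outlier removal) could be sketched as follows; the 30 cm range reflects the observation above, whereas the radius and minimum neighbor count are illustrative values not specified here:

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_face_points(points, nose_depth, depth_range=0.30,
                       radius=0.01, min_neighbors=5):
    """Keep points whose depth is not less than the nose-tip depth and not more
    than depth_range beyond it (a face is assumed to lie within ~30 cm), then
    remove outliers with fewer than min_neighbors other points inside the given
    radius. Radius and neighbor count are illustrative, not from the patent."""
    z = points[:, 2]
    kept = points[(z >= nose_depth) & (z - nose_depth <= depth_range)]
    tree = cKDTree(kept)
    counts = np.array([len(tree.query_ball_point(p, radius)) - 1 for p in kept])
    return kept[counts >= min_neighbors]
```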
In an alternative embodiment, an implementation flowchart for obtaining the second type of sample data for enhancing the face area based on the depth image acquired by the image acquisition device is shown in fig. 3, and may include:
step S31: for each frame of depth image acquired by the image acquisition device (for convenience of description, the k (k=1, 2, …, N is the number of depth images acquired by the image acquisition device) frame of depth image is denoted as Mk), taking the nasal tip point as the center, and intercepting the first preset number (for convenience of description, denoted as N1, for example, N1 can take a value of 10000) of point cloud data closest to the nasal tip point from the point cloud data of the frame of depth image Mk as face point cloud data.
In the embodiment of the application, the distance between each point in the depth image Mk and the nose tip point is calculated, the points in Mk are sorted in ascending order of distance to the nose tip point, and the first n1 points are taken as the face point cloud data Fk in the depth image Mk, where Fk = [x_p, y_p, z_p]^T; x_p, y_p, z_p are the coordinates of the p-th point in the face point cloud data Fk; and p = 1, 2, …, P, where P is the number of points in the face point cloud data Fk.
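A short illustrative sketch of this cropping step, assuming the frame's point cloud and the nose tip point are given as NumPy arrays:

```python
import numpy as np

def crop_face_point_cloud(points, nose_point, n1=10000):
    """Keep the n1 points nearest to the nose tip as the face point cloud Fk."""
    dists = np.linalg.norm(points - nose_point, axis=1)
    order = np.argsort(dists)            # ascending distance to the nose tip
    return points[order[:min(n1, len(points))]]
```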
Step S32: for any two frames of depth images (i-th frame of depth image Mi and j-th frame of depth image Mj for convenience of description) acquired by the image acquisition device, the difference of the two frames of depth images is calculated according to the face point cloud data intercepted in the step S31.
After the face point cloud data Fi of the i-th frame depth image Mi and the face point cloud data Fj of the j-th frame depth image Mj are determined, a dense corresponding relation between the two face point cloud data sets is established; that is, for each point in the face point cloud data Fi there is a corresponding point in the face point cloud data Fj, and the two points represent the same part of the face.
Alternatively, the bending energy value can be obtained by solving the equation set by taking the minimum bending energy in the process of completely matching corresponding points in the two face point cloud data as a target through a thin plate spline interpolation function and a constraint function, and the specific calculation process can refer to the prior art and is not described in detail herein. And the thin plate spline interpolation function determined when the bending energy is minimum is the mapping relation from one face point cloud data to another face point cloud data, namely the dense corresponding relation between the points in the face point cloud data and the points of the other face point cloud data.
In the embodiment of the present application, the bending energy involved in establishing the dense correspondence between the two face point cloud data sets is calculated from both directions, namely: the bending energy γ_ij for deforming Fi into Fj and the bending energy γ_ji for deforming Fj into Fi are calculated.
The mean value of the bending energy γ_ij and the bending energy γ_ji is taken as the difference between the i-th frame depth image Mi and the j-th frame depth image Mj.
Step S33: and taking a second preset number (for convenience of description, n2 is recorded as n2, for example, the value of n2 can be 10000) of depth image pairs with the largest difference as target depth image pairs according to the sequence of the difference from large to small.
Assuming that the number of depth images acquired by the image acquisition device is N = 600, there are a total of N(N-1)/2 = 179700 depth image pairs, each with a corresponding difference. In the embodiment of the application, all depth image pairs are sorted in descending order of difference, and the first n2 pairs are taken as target depth image pairs.
Step S34: and corresponding to each target depth image pair, calculating to obtain new face point cloud data by utilizing the point cloud data of the two frames of depth images in the target depth image pair.
Optionally, the average of the coordinates of the points having a corresponding relationship in the target depth image pair may be used as the coordinates of the points in the new face point cloud data. This can be expressed as:
F_new = (Fi + Fj)/2
where F_new represents the new face point cloud data, and the points of Fi and Fj are paired according to the dense correspondence established above.
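An illustrative sketch of this averaging step; the dense correspondence produced by the thin-plate-spline matching is assumed here to be available as an index array that maps each point of Fi to its counterpart in Fj:

```python
import numpy as np

def blend_face_point_clouds(fi, fj, correspondence):
    """Build the new face point cloud as the per-point mean of corresponding
    points: the k-th new point is (fi[k] + fj[correspondence[k]]) / 2.
    correspondence is an integer index array of length len(fi) into fj."""
    return 0.5 * (fi + fj[correspondence])
```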
Step S35: and carrying out various enhancement processes on the new face point cloud data to obtain enhanced face point cloud data, wherein each enhancement process carries out one-time gesture rotation and increases noise.
The enhanced face point cloud data has at least one of face pose and noise (gaussian noise) changed as compared to the face point cloud data before enhancement.
Alternatively, data enhancement may be performed in two ways:
1. point cloud data of different poses:
and 3D rotation transformation is carried out on the new face point cloud data according to different angles by utilizing a pre-constructed 3D rotation matrix, so as to generate point cloud data with different postures. And for each new face point cloud data, respectively carrying out 3D rotation transformation on the new face point cloud data by using different 3D rotation transformation relations to generate point cloud data with different postures. In this way, point cloud data of tens or even more different poses can be generated corresponding to each new face point cloud data.
The angle comprises three components: the horizontal angle, the pitch angle and the tilt angle. The horizontal angle ranges from -60 degrees to 60 degrees; the pitch angle ranges from -45 degrees to 45 degrees; and the tilt angle ranges from -45 degrees to 45 degrees. Different 3D rotation transformation relations can be obtained through different combinations of the horizontal, pitch and tilt angle components.
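An illustrative sketch of generating pose-rotated point clouds; which coordinate axis corresponds to the horizontal, pitch and tilt angles depends on the camera coordinate convention and is an assumption of this sketch, as is rotating about the point cloud's centroid:

```python
import numpy as np

def random_rotation_matrix(rng):
    """Compose a 3D rotation from randomly sampled horizontal (yaw), pitch and
    tilt (roll) angles within the ranges given above: yaw in [-60, 60] degrees,
    pitch and roll in [-45, 45] degrees."""
    yaw, pitch, roll = np.radians([rng.uniform(-60, 60),
                                   rng.uniform(-45, 45),
                                   rng.uniform(-45, 45)])
    rz = np.array([[np.cos(yaw), -np.sin(yaw), 0], [np.sin(yaw), np.cos(yaw), 0], [0, 0, 1]])
    ry = np.array([[np.cos(pitch), 0, np.sin(pitch)], [0, 1, 0], [-np.sin(pitch), 0, np.cos(pitch)]])
    rx = np.array([[1, 0, 0], [0, np.cos(roll), -np.sin(roll)], [0, np.sin(roll), np.cos(roll)]])
    return rz @ ry @ rx

def rotate_point_cloud(points, rotation):
    """Apply one pose rotation to the face point cloud about its centroid."""
    center = points.mean(axis=0)
    return (points - center) @ rotation.T + center
```

For example, calling rotate_point_cloud(points, random_rotation_matrix(np.random.default_rng(0))) repeatedly would yield additional poses of the same face point cloud.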
2. Point cloud data simulating noise:
In the embodiment of the application, real point cloud data is simulated by adding Gaussian noise to the face point cloud data, which enriches the data. Specifically, Gaussian noise may be added separately to the point cloud data of each pose generated using the 3D rotation transformation relations. Gaussian noise follows a normal distribution with zero mean and an adjustable variance, and the noise level can be controlled by adjusting the variance. In an alternative embodiment, the variance is set to 0.06.
The added noise may be obtained by randomly sampling Gaussian noise.
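A minimal sketch of the noise augmentation, using the variance of 0.06 mentioned above; drawing the noise independently per coordinate is an assumption of this sketch:

```python
import numpy as np

def add_gaussian_noise(points, variance=0.06, rng=None):
    """Add zero-mean Gaussian noise to every coordinate; the noise level is
    controlled by the variance (0.06 in the embodiment above)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=points.shape)
    return points + noise
```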
Step S36: for each enhanced face point cloud data, carrying out normalization processing on the enhanced face point cloud data; and mapping the normalized enhanced face point cloud data into a planar three-channel image to obtain second-class sample data.
The normalization process and the mapping process into a planar three-channel image can be referred to the previous embodiments, and will not be described in detail here.
The face recognition model trained on the first-type sample data and the second-type sample data is robust to pose and illumination variations, which improves the 3D face recognition capability.
To increase the network operating rate, embodiments of the present application construct a lightweight convolutional neural network (Convolutional Neural Network, CNN). In an alternative embodiment, the face recognition model may be a four-layer convolutional neural network. As shown in fig. 4, a schematic structural diagram of a convolutional neural network according to an embodiment of the present application may include:
The numbers of convolution kernels in the four convolution layers, from top to bottom, are as follows: the first convolution layer has 32 convolution kernels, the second 64, the third 128 and the fourth 256, and all convolution kernels are uniformly sized 3 × 3. In addition, each convolution layer also contains a batch normalization (BN) module and an activation function (not shown in the figure), and each convolution layer is followed by a pooling layer. A dropout layer is connected after the last pooling layer (i.e., the fourth pooling layer) to prevent overfitting, and finally the fully connected layer and the output layer are connected.
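For illustration only, the described network could be sketched with Keras as follows; the input size, class count, dropout rate, ReLU activation and fully connected width are assumptions not fixed by the description above:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_face_recognition_cnn(input_shape=(128, 128, 3), num_classes=1000,
                               dropout_rate=0.5):
    """Lightweight four-layer CNN matching the description above: four 3x3
    convolution blocks (32/64/128/256 kernels) with batch normalization,
    activation and pooling, a dropout layer after the last pooling layer,
    then a fully connected layer and the output layer."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(pool_size=2)(x)
    x = layers.Dropout(dropout_rate)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)      # fully connected layer
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```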
Corresponding to the method embodiment, the embodiment of the application also provides a face recognition device, and a structural schematic diagram of the face recognition device provided by the embodiment of the application is shown in fig. 5, which may include:
the system comprises an acquisition module 51, a correction module 52, a clipping module 53, a normalization module 54, a mapping module 55 and an identification module 56; wherein, the liquid crystal display device comprises a liquid crystal display device,
the acquisition module 51 is used for acquiring a depth image;
the correction module 52 is configured to perform pose correction on a face area in the depth image according to point cloud data of the face area;
the clipping module 53 is configured to clip a target area from the face area after the posture correction;
the normalization module 54 is configured to normalize the point cloud data of the target area to obtain normalized point cloud data;
the mapping module 55 is configured to map the normalized point cloud data into a three-way planar image;
the recognition module 56 is configured to input the planar three-channel image into a pre-trained face recognition model, so as to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by the image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
According to the face recognition device provided by the embodiment of the application, the pose correction is carried out on the face region in the depth map by utilizing the point cloud data, the target region is cut out from the corrected face region, the point cloud data of the target region are normalized, the normalized point cloud data are mapped into a planar three-channel image, and the three-channel image is input into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first type sample data of a face region in a depth image acquired by the image acquisition device and second type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a three-channel planar image. As the second type of sample data of the enhanced face region obtained based on the depth image acquired by the image acquisition device is added in the sample data used for training the face recognition model, the number of training samples is increased, and the pose correction is carried out by combining the faces in the depth image to be recognized, so that the 3D face recognition precision is improved.
In an alternative embodiment, correction module 52 may include:
the detection unit is used for carrying out face detection on the color image corresponding to the depth image and key point detection in the face area to obtain face area coordinates and a plurality of key point coordinates;
the first determining unit is used for determining a face area to be processed and a plurality of key points from the depth image according to the face area coordinates and the plurality of key point coordinates;
the second determining unit is used for determining an attitude transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points;
and the first correction unit is used for carrying out posture correction on the point cloud data of the face area to be processed by utilizing the posture transformation matrix.
In an alternative embodiment, the plurality of keypoints includes: two eyeball points, a nose tip and two mouth corner points;
the correction module 52 may further include:
the acquisition unit is used for acquiring depth values of all pixels in a preset area around the nose point;
and the second correction unit is used for taking the median value of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
In an alternative embodiment, the correction module 52 may further include:
and the deleting unit is used for deleting the pixel points of which the depth values are smaller than the final depth value of the nose tip point in the depth map and the pixel points of which the difference value between the depth values and the final depth value of the nose tip point is larger than a preset threshold value.
In an alternative embodiment, the face recognition device may further include an enhancement module for:
capturing a first preset number of point cloud data closest to the nose point from the point cloud data of each frame of depth image acquired by the corresponding image acquisition device by taking the nose point as a center, and taking the point cloud data closest to the nose point as face point cloud data;
for any two frames of depth images acquired by the image acquisition device, calculating the difference of the two frames of depth images according to the intercepted face point cloud data;
taking depth image pairs with the second preset number and the largest difference as target depth image pairs according to the sequence from big to small;
corresponding to each target depth image pair, calculating to obtain new face point cloud data by utilizing the point cloud data of two frames of depth images in the target depth image pair;
for each new face point cloud data, carrying out various enhancement processes on the new face point cloud data to obtain enhanced face point cloud data, and carrying out gesture rotation and noise increase on each enhancement process;
For each enhanced face point cloud data, carrying out normalization processing on the enhanced face point cloud data; and mapping the normalized enhanced face point cloud data into a planar three-channel image to obtain second-class sample data.
The face recognition device provided by the embodiment of the application can be applied to electronic equipment such as PC terminals, smart phones, translation machines, robots, intelligent home (household appliances), remote controllers, cloud platforms, servers, server clusters and the like. Alternatively, fig. 6 shows a block diagram of a hardware structure of the electronic device, and referring to fig. 6, the hardware structure of the electronic device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete the communication with each other through the communication bus 4;
processor 1 may be a central processing unit CPU, or a specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application, etc.;
The memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory (non-volatile memory) or the like, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:
acquiring a depth image;
correcting the pose of the face region according to the point cloud data of the face region in the depth image;
cutting out a target area from the face area after posture correction;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by the image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The embodiment of the present application also provides a storage medium storing a program adapted to be executed by a processor, the program being configured to:
acquiring a depth image;
correcting the pose of the face region according to the point cloud data of the face region in the depth image;
cutting out a target area from the face area after posture correction;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is obtained through training of a sample data set marked with a recognition tag; the sample data set comprises first sample data of a face region in a depth image acquired by the image acquisition device and second sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each sample data is a planar three-channel image.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
It should be understood that in the embodiments of the present application, the claims, the various embodiments, and the features may be combined with each other, so as to solve the foregoing technical problems.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A face recognition method, comprising:
acquiring a depth image;
correcting the pose of the face region according to the point cloud data of the face region in the depth image;
cutting out a target area from the pose-corrected face region;
carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
mapping the normalized point cloud data into a planar three-channel image;
inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is trained with a sample data set labeled with recognition tags; the sample data set comprises first-type sample data of a face region in a depth image acquired by an image acquisition device and second-type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each piece of sample data is a planar three-channel image;
The second type of sample data is obtained by the following method:
intercepting, with the nose tip point as the center, a first preset number of point cloud data closest to the nose tip point from the point cloud data of each frame of depth image acquired by the image acquisition device, and taking the intercepted point cloud data as face point cloud data;
for any two frames of depth images acquired by the image acquisition device, calculating the difference between the two frames of depth images according to the intercepted face point cloud data, which comprises: determining face point cloud data Fi of an i-th frame depth image Mi and face point cloud data Fj of a j-th frame depth image Mj, establishing a dense correspondence between the two face point cloud data, calculating the bending energy γij of deforming Fi into Fj and the bending energy γji of deforming Fj into Fi, and taking the mean value of γij and γji as the difference between the i-th frame depth image Mi and the j-th frame depth image Mj;
taking, in descending order of difference, a second preset number of depth image pairs with the largest differences as target depth image pairs;
for each target depth image pair, calculating new face point cloud data using the point cloud data of the two frames of depth images in the target depth image pair;
for each new face point cloud data, performing a plurality of enhancement processes on the new face point cloud data to obtain enhanced face point cloud data, each enhancement process comprising pose rotation and noise addition;
for each enhanced face point cloud data, performing normalization processing on the enhanced face point cloud data, and mapping the normalized enhanced face point cloud data into a planar three-channel image to obtain the second-type sample data.
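For the pair-selection step in the enhancement procedure of claim 1, the following is a minimal sketch under stated assumptions: pair_difference is only a placeholder standing in for the symmetric bending-energy computation (which requires the dense correspondence between the two face point clouds described above), and all names are illustrative.

import itertools
import numpy as np

def pair_difference(fi, fj):
    """Placeholder difference measure: assumes the two face point clouds
    are already in dense correspondence, so the mean point-wise distance
    stands in for the mean of the two bending energies."""
    return float(np.mean(np.linalg.norm(fi - fj, axis=1)))

def select_target_pairs(face_clouds, second_preset_number):
    """Rank all frame pairs by difference and return the most different pairs.

    `face_clouds[i]` is the face point cloud intercepted from the i-th frame.
    """
    diffs = [((i, j), pair_difference(face_clouds[i], face_clouds[j]))
             for i, j in itertools.combinations(range(len(face_clouds)), 2)]
    diffs.sort(key=lambda item: item[1], reverse=True)  # largest difference first
    return [pair for pair, _ in diffs[:second_preset_number]]

Each selected pair would then be used to compute new face point cloud data, which is subsequently pose-rotated, noise-augmented, normalized, and mapped to a planar three-channel image as described above.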
2. The method according to claim 1, wherein performing pose correction on the face region according to the point cloud data of the face region in the depth image comprises:
performing face detection and key point detection in a face region on a color image corresponding to the depth image to obtain face region coordinates and a plurality of key point coordinates;
according to the face region coordinates and the plurality of key point coordinates, determining a face region to be processed and a plurality of key points from the depth image;
determining a pose transformation matrix according to the point cloud data of the plurality of key points and a preset point cloud data template of the plurality of key points;
and performing pose correction on the point cloud data of the face region to be processed by using the pose transformation matrix.
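Claim 2 does not fix how the pose transformation matrix is computed from the key-point point clouds and the template; one common choice, shown here only as an assumed sketch with illustrative names, is a rigid rotation-plus-translation fit using the SVD-based Kabsch alignment:

import numpy as np

def estimate_pose_transform(keypoints, template):
    """Estimate a 4x4 rigid transform mapping `keypoints` (N x 3 point cloud
    coordinates of the detected key points) onto `template` (N x 3 preset
    key-point point cloud template)."""
    src_center = keypoints.mean(axis=0)
    dst_center = template.mean(axis=0)
    src = keypoints - src_center
    dst = template - dst_center

    # Rotation from the SVD of the cross-covariance matrix
    u, _, vt = np.linalg.svd(src.T @ dst)
    r = vt.T @ u.T
    if np.linalg.det(r) < 0:        # guard against a reflection solution
        vt[-1, :] *= -1
        r = vt.T @ u.T
    t = dst_center - r @ src_center

    transform = np.eye(4)
    transform[:3, :3] = r
    transform[:3, 3] = t
    return transform

Pose correction would then apply the transform to every point of the face region to be processed, e.g. corrected = points @ transform[:3, :3].T + transform[:3, 3].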
3. The method of claim 2, wherein the plurality of key points comprises: two eyeball points, a nose tip point, and two mouth corner points;
Before determining the pose transformation matrix according to the point cloud data of the plurality of key points and the preset point cloud data template of the plurality of key points, the method further comprises:
acquiring depth values of all pixels in a preset area around the nose tip point;
and taking the median value of the depth values of all pixels in the preset area as the final depth value of the nose tip point.
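A minimal sketch of the median-depth step in claim 3, assuming a square window around the detected nose tip pixel and that zero depth values mark invalid measurements (both are assumptions, as the claim only speaks of a preset area):

import numpy as np

def nose_tip_final_depth(depth_image, nose_row, nose_col, half_window=3):
    """Return the median depth of valid pixels in a preset square area
    around the nose tip point as its final depth value."""
    r0, r1 = max(nose_row - half_window, 0), nose_row + half_window + 1
    c0, c1 = max(nose_col - half_window, 0), nose_col + half_window + 1
    patch = depth_image[r0:r1, c0:c1]
    valid = patch[patch > 0]                 # ignore invalid (zero) depths
    return float(np.median(valid)) if valid.size else 0.0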
4. The method according to claim 3, wherein after determining the final depth value of the nose tip point and before performing pose correction on the point cloud data of the face region to be processed by using the pose transformation matrix, the method further comprises:
deleting, from the depth image, the pixels whose depth values are smaller than the final depth value of the nose tip point and the pixels whose depth values differ from the final depth value of the nose tip point by more than a preset threshold value.
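A sketch of the depth filtering in claim 4; reading the two conditions together as keeping only the slab between the nose tip depth and nose tip depth plus the threshold, and setting removed pixels to zero rather than literally deleting them, are interpretation assumptions made for the array representation:

import numpy as np

def filter_depth_by_nose(depth_image, nose_depth, threshold):
    """Keep only pixels no closer than the nose tip and no more than
    `threshold` farther than it; zero out everything else."""
    depth = depth_image.astype(np.int64)
    keep = (depth >= nose_depth) & (depth - nose_depth <= threshold)
    out = depth_image.copy()
    out[~keep] = 0
    return out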
5. The method of claim 1, wherein the point cloud data of the face region is obtained using the following formula:
z=(double)I(i,j)/s
x=(j-cx)*z/fx
y=(i-cy)*z/fy
wherein I represents a frame of depth image; I(i,j) is the depth value at position (i,j) in the image matrix; double denotes data format conversion; fx and fy are the focal lengths of the camera that acquired the depth image; cx and cy are the image principal point coordinates; s is a tilt parameter; and x, y, z are the converted point cloud coordinates.
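A direct, vectorized transcription of the formulas in claim 5; the vectorization over all pixels and the use of floating point for the double conversion are the only additions, and parameter names follow the claim:

import numpy as np

def depth_to_point_cloud(depth_image, fx, fy, cx, cy, s):
    """Convert a depth image I to point cloud coordinates using
    z = I(i, j) / s, x = (j - cx) * z / fx, y = (i - cy) * z / fy."""
    rows, cols = depth_image.shape
    i, j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    z = depth_image.astype(np.float64) / s      # (double)I(i, j) / s
    x = (j - cx) * z / fx
    y = (i - cy) * z / fy
    return np.stack([x, y, z], axis=-1)         # (rows, cols, 3)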
6. The method of claim 1, wherein the face recognition model is a four-layer convolutional neural network.
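Claim 6 only states that the recognition model is a four-layer convolutional neural network; the channel counts, kernel sizes, pooling, and classifier head below are assumptions used purely to illustrate such a model on the planar three-channel input:

import torch
import torch.nn as nn

class FourLayerFaceNet(nn.Module):
    """Illustrative network with four convolutional layers over a
    three-channel planar image."""
    def __init__(self, num_identities):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, num_identities)

    def forward(self, x):
        x = self.features(x)                 # (N, 256, 1, 1)
        return self.classifier(x.flatten(1))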
7. A face recognition device, comprising:
the acquisition module is used for acquiring the depth image;
the correction module is used for performing pose correction on the face region according to the point cloud data of the face region in the depth image;
the clipping module is used for clipping a target area from the pose-corrected face region;
the normalization module is used for carrying out normalization processing on the point cloud data of the target area to obtain normalized point cloud data;
the mapping module is used for mapping the normalized point cloud data into a plane three-channel image;
the recognition module is used for inputting the planar three-channel image into a pre-trained face recognition model to obtain a face recognition result; the face recognition model is trained with a sample data set labeled with recognition tags; the sample data set comprises first-type sample data of a face region in a depth image acquired by an image acquisition device and second-type sample data of an enhanced face region obtained based on the depth image acquired by the image acquisition device, and each piece of sample data is a planar three-channel image;
The second type of sample data is obtained by the following method:
intercepting, with the nose tip point as the center, a first preset number of point cloud data closest to the nose tip point from the point cloud data of each frame of depth image acquired by the image acquisition device, and taking the intercepted point cloud data as face point cloud data;
for any two frames of depth images acquired by the image acquisition device, calculating the difference between the two frames of depth images according to the intercepted face point cloud data, which comprises: determining face point cloud data Fi of an i-th frame depth image Mi and face point cloud data Fj of a j-th frame depth image Mj, establishing a dense correspondence between the two face point cloud data, calculating the bending energy γij of deforming Fi into Fj and the bending energy γji of deforming Fj into Fi, and taking the mean value of γij and γji as the difference between the i-th frame depth image Mi and the j-th frame depth image Mj;
taking, in descending order of difference, a second preset number of depth image pairs with the largest differences as target depth image pairs;
for each target depth image pair, calculating new face point cloud data using the point cloud data of the two frames of depth images in the target depth image pair;
for each new face point cloud data, performing a plurality of enhancement processes on the new face point cloud data to obtain enhanced face point cloud data, each enhancement process comprising pose rotation and noise addition;
for each enhanced face point cloud data, performing normalization processing on the enhanced face point cloud data, and mapping the normalized enhanced face point cloud data into a planar three-channel image to obtain the second-type sample data.
8. An electronic device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the face recognition method according to any one of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the face recognition method according to any one of claims 1-6.
CN201911213058.3A 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium Active CN111091075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911213058.3A CN111091075B (en) 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911213058.3A CN111091075B (en) 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111091075A CN111091075A (en) 2020-05-01
CN111091075B true CN111091075B (en) 2023-09-05

Family

ID=70393884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911213058.3A Active CN111091075B (en) 2019-12-02 2019-12-02 Face recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111091075B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626241B (en) * 2020-05-29 2023-06-23 北京华捷艾米科技有限公司 Face detection method and device
CN111723691B (en) * 2020-06-03 2023-10-17 合肥的卢深视科技有限公司 Three-dimensional face recognition method and device, electronic equipment and storage medium
CN112200056B (en) * 2020-09-30 2023-04-18 汉王科技股份有限公司 Face living body detection method and device, electronic equipment and storage medium
CN112581357A (en) * 2020-12-16 2021-03-30 珠海格力电器股份有限公司 Face data processing method and device, electronic equipment and storage medium
CN113255512B (en) * 2021-05-21 2023-07-28 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for living body identification
CN113837106A (en) * 2021-09-26 2021-12-24 北京的卢深视科技有限公司 Face recognition method, face recognition system, electronic equipment and storage medium
CN114331915B (en) * 2022-03-07 2022-08-05 荣耀终端有限公司 Image processing method and electronic device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751427A (en) * 2015-04-16 2015-07-01 山东师范大学 Improved TPS (thin plate spline) model based gel image correction algorithm
CN108549873A (en) * 2018-04-19 2018-09-18 北京华捷艾米科技有限公司 Three-dimensional face identification method and three-dimensional face recognition system
CN108615016A (en) * 2018-04-28 2018-10-02 北京华捷艾米科技有限公司 Face critical point detection method and face critical point detection device
CN109064434A (en) * 2018-06-28 2018-12-21 广州视源电子科技股份有限公司 Method, apparatus, storage medium and the computer equipment of image enhancement
CN109087261A (en) * 2018-08-03 2018-12-25 上海依图网络科技有限公司 Face antidote based on untethered acquisition scene
CN109753875A (en) * 2018-11-28 2019-05-14 北京的卢深视科技有限公司 Face identification method, device and electronic equipment based on face character perception loss
CN109934776A (en) * 2018-12-25 2019-06-25 北京奇艺世纪科技有限公司 Model generating method, video enhancement method, device and computer readable storage medium
CN109961021A (en) * 2019-03-05 2019-07-02 北京超维度计算科技有限公司 Method for detecting human face in a kind of depth image
CN110147721A (en) * 2019-04-11 2019-08-20 阿里巴巴集团控股有限公司 A kind of three-dimensional face identification method, model training method and device
CN110163235A (en) * 2018-10-11 2019-08-23 腾讯科技(深圳)有限公司 Training, image enchancing method, device and the storage medium of image enhancement model
CN110415342A (en) * 2019-08-02 2019-11-05 深圳市唯特视科技有限公司 A kind of three-dimensional point cloud reconstructing device and method based on more merge sensors
CN110503078A (en) * 2019-08-29 2019-11-26 的卢技术有限公司 A kind of remote face identification method and system based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4606779B2 (en) * 2004-06-07 2011-01-05 グローリー株式会社 Image recognition apparatus, image recognition method, and program causing computer to execute the method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Integrated metric learning with adaptive constraints for person re-identification; L. Hao et al.; 2017 IEEE International Conference on Image Processing (ICIP); 161-165 *

Also Published As

Publication number Publication date
CN111091075A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
CN111091075B (en) Face recognition method and device, electronic equipment and storage medium
WO2022161286A1 (en) Image detection method, model training method, device, medium, and program product
US9449432B2 (en) System and method for identifying faces in unconstrained media
CN103530599B (en) The detection method and system of a kind of real human face and picture face
CN111274916B (en) Face recognition method and face recognition device
CN107590430A (en) Biopsy method, device, equipment and storage medium
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN106203242A (en) A kind of similar image recognition methods and equipment
TW202026948A (en) Methods and devices for biological testing and storage medium thereof
CN108388889B (en) Method and device for analyzing face image
CN111680675B (en) Face living body detection method, system, device, computer equipment and storage medium
WO2022252642A1 (en) Behavior posture detection method and apparatus based on video image, and device and medium
CN112257696A (en) Sight estimation method and computing equipment
WO2022052782A1 (en) Image processing method and related device
CN107704813A (en) A kind of face vivo identification method and system
CN113298158A (en) Data detection method, device, equipment and storage medium
Wu et al. Appearance-based gaze block estimation via CNN classification
CN110647782A (en) Three-dimensional face reconstruction and multi-pose face recognition method and device
Ilankumaran et al. Multi-biometric authentication system using finger vein and iris in cloud computing
Chang et al. Salgaze: Personalizing gaze estimation using visual saliency
EP2790130A1 (en) Method for object recognition
CN112017212A (en) Training and tracking method and system of face key point tracking model
CN109669537B (en) A kind of man-machine interactive system based on computer virtual interface
CN109598235B (en) Finger vein image authentication method and device
CN113591562A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant