CN113780095B - Training data expansion method, terminal equipment and medium of face recognition model - Google Patents

Info

Publication number
CN113780095B
CN113780095B
Authority
CN
China
Prior art keywords
face
target object
dimensional
model
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110943769.7A
Other languages
Chinese (zh)
Other versions
CN113780095A (en)
Inventor
程耀
贺菁菁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Hangzhou Information Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110943769.7A priority Critical patent/CN113780095B/en
Publication of CN113780095A publication Critical patent/CN113780095A/en
Application granted granted Critical
Publication of CN113780095B publication Critical patent/CN113780095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects


Abstract

The invention discloses a training data expansion method for a face recognition model, a terminal device, and a computer-readable storage medium. The method comprises the following steps: acquiring a two-dimensional face image corresponding to a target object, and constructing a three-dimensional face model corresponding to the target object from the two-dimensional face image; performing position conversion on the three-dimensional face model based on a rotation matrix, and acquiring face images of different angles corresponding to the target object from the position-converted three-dimensional face model; and expanding the training data set of the face recognition model with the face images of different angles. The invention aims to reduce the cost of acquiring face data when expanding the training data set of a face recognition model.

Description

Training data expansion method, terminal equipment and medium of face recognition model
Technical Field
The present invention relates to the field of face recognition technologies, and in particular, to a training data expansion method for a face recognition model, a terminal device, and a computer readable storage medium.
Background
Face recognition is a biometric technology that performs identity authentication based on the inherent physiological characteristics of the face region. It can quickly locate the position of a face and recognize a person's identity at a distance and without contact, and is widely applied in fields such as payment and security.
With the rapid development of deep learning, face recognition based on convolutional neural networks (CNNs) has become the most mainstream and efficient approach. The processing flow is as follows: a face detection technique locates the face region and its key points; affine transformation based on the key-point positions then aligns the face and resizes the image to a fixed size. The aligned face image is input into a CNN to extract features, and a floating-point feature vector serves as the unique identifier of the face. Finally, the cosine distance or Euclidean distance between the feature vectors of different faces is computed to obtain their similarity, which is compared with a preset threshold to produce the recognition result.
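The final comparison step described above — cosine similarity between two feature vectors, thresholded to a same/different decision — can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the 0.5 threshold default are illustrative:

```python
import math

def cosine_similarity(f1, f2):
    """Cosine similarity between two face feature vectors (range [-1, 1])."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return dot / (n1 * n2)

def is_same_person(f1, f2, threshold=0.5):
    """Compare the similarity with a preset threshold to produce the result."""
    return cosine_similarity(f1, f2) >= threshold
```

In practice the vectors would be CNN embeddings of aligned face crops; any two lists of floats of equal length work for the comparison itself.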
However, because face recognition scenarios are complex and varied, recognition is easily disturbed by factors such as occlusion, angle, and expression, and deep-learning-based face recognition methods require large-scale training data to obtain a model with high accuracy. In particular, to enlarge the inter-class distance and reduce the intra-class distance, each person ID needs many images with different angles and expressions to train a model with good generalization capability and high accuracy. In the related art, face data is generally obtained from public data sets. However, many data sets charge for use; for Asian face data in particular, the publicly available data sets are limited, and large-scale collection is difficult because of data privacy. As a result, acquiring face data is costly.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main purpose of the present invention is to provide a training data expansion method for a face recognition model, a terminal device, and a computer-readable storage medium, aiming to reduce the cost of acquiring face data when expanding the training data set of a face recognition model.
In order to achieve the above object, the present invention provides a training data expansion method for a face recognition model, the training data expansion method for a face recognition model comprising the steps of:
acquiring a two-dimensional face image corresponding to a target object, and constructing a three-dimensional face model corresponding to the target object according to the two-dimensional face image;
performing position conversion on the three-dimensional face model based on a rotation matrix, and acquiring face images of different angles corresponding to the target object based on the three-dimensional face model after position conversion;
and expanding a training data set of the face recognition model based on the face images of the different angles.
Optionally, after the step of expanding the training data set of the face recognition model based on the face images of different angles, the method further includes:
training the face recognition model based on the extended training data set;
and carrying out face recognition on the face image to be recognized through the trained face recognition model.
Optionally, the step of performing face recognition on the image to be recognized through the trained face recognition model includes:
the trained face recognition model determines first face features corresponding to the target object and second face features corresponding to the face image to be recognized based on the training data set;
determining a cosine distance between the first face feature and the second face feature;
and determining whether the object to be identified corresponding to the face image to be identified is the target object according to the cosine distance.
Optionally, the step of determining whether the object to be identified corresponding to the face image to be identified is the target object according to the cosine distance includes:
determining a generalization coefficient and an identification threshold;
determining a similarity score between the first face feature and the second face feature according to the cosine distance, the generalization coefficient and the recognition threshold;
and determining whether the object to be identified corresponding to the face image to be identified is the target object according to the similarity score.
Optionally, when the number of face images corresponding to the target object in the training data set is smaller than a preset threshold, executing the step of performing position conversion on the three-dimensional face model based on the rotation matrix, and acquiring face images of different angles corresponding to the target object based on the three-dimensional face model after position conversion.
Optionally, when the position of the three-dimensional face model is converted based on the rotation matrix, the rotation angle is determined according to a face angle rule and the number of face images.
Optionally, the step of obtaining a two-dimensional face image corresponding to the target object and constructing a three-dimensional face model corresponding to the target object according to the two-dimensional face image includes:
acquiring the two-dimensional face image comprising the front face of the target object;
and carrying out normalization processing on the two-dimensional face image, and constructing the three-dimensional face model based on the two-dimensional face image after normalization processing.
In addition, in order to achieve the above object, the present invention also provides a terminal device, which includes a memory, a processor, and a training data expansion program stored on the memory and executable on the processor, wherein the training data expansion program, when executed by the processor, implements the steps of the training data expansion method of the face recognition model as described above.
In addition, to achieve the above object, the present invention also provides a terminal device including:
the acquisition module acquires a two-dimensional face image corresponding to a target object, and constructs a three-dimensional face model corresponding to the target object according to the two-dimensional face image;
the acquisition module is used for carrying out position conversion on the three-dimensional face model based on the rotation matrix and acquiring face images of a plurality of different angles corresponding to the target object based on the three-dimensional face model;
and the expansion module expands a training data set of the face recognition model based on the face images of the plurality of different angles.
In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a training data expansion program which, when executed by a processor, implements the steps of the training data expansion method of the face recognition model as described above.
According to the training data expansion method, terminal device, and computer-readable storage medium of the face recognition model, a two-dimensional face image corresponding to a target object is obtained, and a three-dimensional face model corresponding to the target object is constructed from it. Position conversion is then performed on the three-dimensional face model based on a rotation matrix, face images of different angles corresponding to the target object are collected from the position-converted model, and the training data set of the face recognition model is expanded with these images. In this way, face images of different angles corresponding to the target object can be obtained for training from a single 2D face image of the target object, without acquiring training data from external sources, thereby reducing the cost of face data when expanding the training data set. Moreover, expanding the training data set with face data at various angles can effectively improve the accuracy of the face recognition model and enhance its generalization capability in various scenarios.
Drawings
FIG. 1 is a schematic diagram of a terminal structure of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a training data expansion method of a face recognition model according to the present invention;
FIG. 3 is a flow chart of an alternative implementation of an embodiment of the present invention;
fig. 4 is a schematic diagram of modularization of a terminal device according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, fig. 1 is a schematic diagram of a terminal structure of a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the control terminal may include: a processor 1001, such as a CPU, a network interface 1003, memory 1004, and a communication bus 1002. Wherein the communication bus 1002 is used to enable connected communication between these components. The network interface 1003 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1004 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1004 may also optionally be a storage device separate from the processor 1001 described above.
It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, and a training data expansion program may be included in the memory 1004, which is a type of computer storage medium.
In the terminal shown in fig. 1, the processor 1001 may be configured to call a training data extension program stored in the memory 1004, and perform the following operations:
acquiring a two-dimensional face image corresponding to a target object, and constructing a three-dimensional face model corresponding to the target object according to the two-dimensional face image;
performing position conversion on the three-dimensional face model based on a rotation matrix, and acquiring face images of different angles corresponding to the target object based on the three-dimensional face model after position conversion;
and expanding a training data set of the face recognition model based on the face images of the different angles.
Further, the processor 1001 may call the training data expansion program stored in the memory 1004, and further perform the following operations:
training the face recognition model based on the extended training data set;
and carrying out face recognition on the face image to be recognized through the trained face recognition model.
Further, the processor 1001 may call the training data expansion program stored in the memory 1004, and further perform the following operations:
the trained face recognition model determines first face features corresponding to the target object and second face features corresponding to the face image to be recognized based on the training data set;
determining a cosine distance between the first face feature and the second face feature;
and determining whether the object to be identified corresponding to the face image to be identified is the target object according to the cosine distance.
Further, the processor 1001 may call the training data expansion program stored in the memory 1004, and further perform the following operations:
determining a generalization coefficient and an identification threshold;
determining a similarity score between the first face feature and the second face feature according to the cosine distance, the generalization coefficient and the recognition threshold;
and determining whether the object to be identified corresponding to the face image to be identified is the target object according to the similarity score.
Further, the processor 1001 may call the training data expansion program stored in the memory 1004, and further perform the following operations:
acquiring the two-dimensional face image comprising the front face of the target object;
and carrying out normalization processing on the two-dimensional face image, and constructing the three-dimensional face model based on the two-dimensional face image after normalization processing.
In order to reduce the cost of face data in a training data set of an extended face recognition model, the embodiment of the invention provides a training data extension method of the face recognition model, which comprises the steps of firstly constructing a three-dimensional face model through two-dimensional face images, then carrying out position conversion on the three-dimensional face model based on a rotation matrix, and acquiring face images with different angles in the position conversion process so as to extend the training data set of the face recognition model through the acquired face images with different angles.
The training data expansion method of the face recognition model provided by the invention is further explained by a specific embodiment.
In an embodiment, referring to fig. 2, the training data expansion method of the face recognition model includes the following steps:
s10, acquiring a two-dimensional face image corresponding to a target object, and constructing a three-dimensional face model corresponding to the target object according to the two-dimensional face image;
step S20, performing position conversion on the three-dimensional face model based on a rotation matrix, and acquiring face images of different angles corresponding to the target object based on the three-dimensional face model after position conversion;
and step S30, expanding a training data set of the face recognition model based on the face images of different angles.
In this embodiment, a three-dimensional (3D) face reconstruction model may be built using a CNN deep learning model. After the two-dimensional face image corresponding to the target object is obtained, it is input into the 3D face reconstruction model, which constructs the three-dimensional face model of the target object from the two-dimensional face image. For example, a PRNet model based on the UV position map can be used to construct a three-dimensional face model from a two-dimensional face image. The texture color of each triangular patch of the constructed 3D face model is the average of the colors of the patch's three vertices.
Optionally, in a specific embodiment, the two-dimensional face image may first be pre-screened. For example, a two-dimensional face image of the target object is acquired, the face region in it is located using a face detection technique, and the pose angles are then determined from the face region using a face key-point localization algorithm. Faces with excessively large angles are thereby screened out, ensuring that the two-dimensional (2D) face image used for reconstruction contains the frontal face of the target object, so that the subsequent face reconstruction by the model is not adversely affected. Optionally, the face angles pitch, yaw, and roll are all required to be less than 5 degrees. If Q indicates whether reconstruction is performed (1: yes, 0: no), then Q = 1 if max(|pitch|, |yaw|, |roll|) < 5°, and Q = 0 otherwise.
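The pre-screening rule above (all three pose angles under 5 degrees) amounts to a one-line check. The sketch below assumes the angles are already estimated in degrees by a key-point localization step; the function name is illustrative:

```python
def should_reconstruct(pitch, yaw, roll, limit_deg=5.0):
    """Return 1 if all three pose angles are small enough that the image
    contains a near-frontal face suitable for 3D reconstruction (the
    quantity Q in the text), else 0."""
    return 1 if max(abs(pitch), abs(yaw), abs(roll)) < limit_deg else 0
```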
alternatively, as an alternative embodiment, the normalization processing may be further performed on the 2D face image used to construct the 3D face model. For example, it may be processed into a uniform size (m×n×3) 2D face image to improve the accuracy of the 3D face model.
Further, after the 3D face model is generated, the rotation matrix R may be used to perform position conversion on the 3D face model. And acquiring face images of different angles corresponding to the target object based on the three-dimensional face model after position conversion.
For example, in order to ensure that the number of person IDs used for face recognition is balanced, so as to improve the accuracy and generalization capability of the model, the step of performing position conversion on the three-dimensional face model based on the rotation matrix and acquiring face images of different angles corresponding to the target object based on the three-dimensional face model after position conversion may be performed when the number of face images corresponding to the target object in the training data set is smaller than a preset threshold.
Optionally, in a specific implementation scenario, position conversion is performed on the reconstructed face using the rotation matrix R to obtain the rotated face region, thereby realizing the collection of multi-angle face data. The rotated face coordinates V_T are calculated as:

V_T = V × R^T

where R = Rz × (Ry × Rx), R^T denotes the transpose of R, and x, y, and z denote the radian values by which the spatial coordinates are to be rotated about the respective axes, with:

Rx = [[1, 0, 0], [0, cos(x), −sin(x)], [0, sin(x), cos(x)]]

In the same way, Ry and Rz can be determined analogously to Rx.
In addition, in order to keep the number of face images used for face recognition balanced and thereby improve the accuracy and generalization capability of the model, after the 3D face model is constructed, the rotation angle step Δ used during the position conversion based on the rotation matrix is determined jointly by the number C_2d of existing face images corresponding to the target object and the number C_q of images conforming to the face angle rule, for example as:

Δ = θ_max / floor((Cth − C_2d) / n)

where floor() denotes rounding down, and Cth denotes the maximum number of face images per target object used for face recognition training; it may be custom set, for example, to 50. θ_max denotes the maximum rotation value in each direction, and n denotes the number of rotation dimensions: if rotation is performed only in the pitch direction, n = 1, and if rotation is performed in both the left-right (yaw) and pitch directions, n = 2. Since face recognition requires alignment correction, only rotation in the left-right and pitch directions is performed, with a maximum left-right rotation angle of 90 degrees and a maximum pitch of 30 degrees.
It should be noted that, the step S20 may be repeated multiple times to obtain face images of multiple different angles corresponding to the target object. And until the number of the face images corresponding to the target object reaches the maximum number in the training data set of the face recognition model. It can be understood that different objects can be set as target objects, so that each object contains the maximum number of face image data in the training dataset of the face recognition model. Thereby achieving the purpose of expanding the training data set of the face recognition model.
The face images of different angles, which are used for training and correspond to the target object, can be obtained based on the 2D face image of the target object. The training data is not required to be acquired through external data, so that the effect of reducing the cost of the face data in the training data set of the extended face recognition model is achieved.
Optionally, referring to fig. 3, in an embodiment, after step S30, the method further includes:
step S40, training the face recognition model based on the extended training data set;
and S50, performing face recognition on the image to be recognized through the trained face recognition model.
In this embodiment, the face images in the expanded training data set of the face recognition model may be preprocessed using methods such as color conversion and random cropping, and the preprocessed face images are then input into a CNN deep neural network that takes 2D images as input for training. For example, the feature extraction network may be ResNet-100, trained iteratively with the ArcFace loss function.
Further, the trained face recognition model determines a first face feature corresponding to the target object and a second face feature corresponding to the face image to be recognized based on the training data set, then determines a cosine distance between the first face feature and the second face feature, and determines whether the object to be recognized corresponding to the face image to be recognized is the target object according to the cosine distance.
For example, for an unknown face image (the face image to be recognized), the trained face recognition model can extract the unique feature of the corresponding object's face information, namely the second face feature. The trained face recognition model can also determine the first face feature corresponding to the target object based on the training data set. The cosine distance between the two face features is then computed, and the similarity α between the first and second face features is determined from it. The value of α lies in the range [−1, 1].
In one embodiment, since the value of α is strongly affected by the training data and the model, a very large data size pulls the α scores down as a whole, and the choice of the recognition threshold T also differs across test data sets. Therefore, an adjusted similarity score α̂ can be obtained by normalizing and adjusting α, for example with the logistic transform:

α̂ = 1 / (1 + e^(−β(α − T)))

where β is a generalization coefficient taking a floating-point value greater than 1; it may be set, for example, to 10. T is the optimal threshold selected on the test set. The resulting α̂ takes values in [0, 1], and the transformed score threshold is 0.5 on different test data sets. When the similarity score is greater than the score threshold, the object to be identified corresponding to the face image to be recognized is determined to be the target object; otherwise, it is determined not to be the target object. Through the adjustment of β, similarity values near the threshold are pulled apart, so that sample scores below the threshold become lower and those above become higher, widening the separation and improving the adaptability of the model.
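The exact transform is garbled in the source; one function consistent with every stated property (range [0, 1], value exactly 0.5 at α = T on any data set, β > 1 pulling scores apart around the threshold) is the logistic function, sketched here as an assumption:

```python
import math

def adjusted_score(alpha, T, beta=10.0):
    """Map a cosine similarity alpha in [-1, 1] to an adjusted score in
    (0, 1) that equals 0.5 exactly at the dataset-specific threshold T;
    beta > 1 sharpens the separation around the threshold."""
    return 1.0 / (1.0 + math.exp(-beta * (alpha - T)))
```

With this transform the decision rule becomes dataset-independent: accept when adjusted_score(alpha, T) > 0.5, which is equivalent to alpha > T.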
In the technical solution disclosed in this embodiment, a two-dimensional face image corresponding to the target object is acquired, a three-dimensional face model corresponding to the target object is constructed from it, position conversion is performed on the three-dimensional face model based on a rotation matrix, face images of different angles corresponding to the target object are acquired from the position-converted model, and the training data set of the face recognition model is expanded with these images. Face images of different angles for training can thus be obtained from a single 2D face image of the target object, without acquiring training data from external sources, reducing the cost of face data when expanding the training data set. Expanding the training data set with face data at various angles can also effectively improve the accuracy of the face recognition model and enhance its generalization capability in various scenarios. In the test stage, corresponding to the parameter regression during training, the similarity value is adjusted with a spatial transformation function, so that the face recognition model generalizes better on the test set.
In addition, the embodiment of the invention also provides a terminal device, which comprises: the training data expansion program comprises a memory, a processor and a training data expansion program which is stored in the memory and can run on the processor, wherein the training data expansion program realizes the steps of the training data expansion method of the face recognition model according to the various embodiments when the training data expansion program is executed by the processor.
In addition, referring to fig. 4, an embodiment of the present invention further proposes a terminal device 100, where the terminal device 100 includes:
an acquisition module 101, configured to acquire a two-dimensional face image corresponding to a target object and construct a three-dimensional face model corresponding to the target object from the two-dimensional face image;
an acquisition module 102, configured to perform position conversion on the three-dimensional face model based on a rotation matrix and acquire face images of a plurality of different angles corresponding to the target object based on the position-converted three-dimensional face model; and
an expansion module 103, configured to expand the training data set of the face recognition model based on the face images of the plurality of different angles.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a training data expansion program, wherein the training data expansion program, when executed by a processor, implements the steps of the training data expansion method of the face recognition model according to the foregoing embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above, including instructions for causing a terminal device (e.g., a PC or a server) to perform the method according to the embodiments of the present invention.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation based on the contents of this specification, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (6)

1. A training data expansion method of a face recognition model, characterized by comprising the following steps:
acquiring a two-dimensional face image corresponding to a target object, wherein a face region in the two-dimensional face image is located by a face detection technique, and a pose angle value of the face region is determined by a face key point localization algorithm so as to reject two-dimensional face images whose pose angle is excessively large;
constructing a three-dimensional face model corresponding to the target object according to the two-dimensional face image;
performing position conversion on the three-dimensional face model based on a rotation matrix, and acquiring face images of different angles corresponding to the target object based on the position-converted three-dimensional face model, wherein the face coordinate formula of the position-converted three-dimensional face model is: v_t = v × R^T, where R = R_z × (R_y × R_x), R^T is the transpose of the rotation matrix R, and x, y and z respectively denote the radian values of rotation about the corresponding spatial coordinate axes; after the three-dimensional face model is constructed, when performing position conversion on the three-dimensional face model based on the rotation matrix, the angle δ of the three-dimensional face model is adjusted according to the number C2d of existing face images corresponding to the target object and the face angle rule Cq, so as to determine the rotation parameters for the position conversion of the three-dimensional face model;
expanding a training data set of a face recognition model based on the face images of different angles, wherein, by acquiring face images of different angles corresponding to the target object multiple times, the target object has the largest number of face images in the training data;
training the face recognition model based on the extended training data set;
performing face recognition on the face image to be recognized through the trained face recognition model;
the step of performing face recognition on the face image to be recognized through the trained face recognition model comprises the following steps:
the trained face recognition model determines first face features corresponding to the target object and second face features corresponding to the face image to be recognized based on the training data set;
determining a cosine distance between the first face feature and the second face feature;
determining whether the object to be identified corresponding to the face image to be identified is the target object according to the cosine distance;
the step of determining whether the object to be identified corresponding to the face image to be identified is the target object according to the cosine distance includes:
determining a generalization coefficient and an identification threshold;
determining a similarity score between the first face feature and the second face feature according to the cosine distance, the generalization coefficient and the recognition threshold;
and determining whether the object to be identified corresponding to the face image to be identified is the target object according to the similarity score.
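The verification branch of claim 1 (a cosine distance between two face features, then a similarity score shaped by a generalization coefficient and a recognition threshold) can be sketched as below. The patent does not specify its "space transformation function", so the logistic mapping, the default coefficient values, and the 0.5 acceptance cutoff here are all assumptions for illustration only.

```python
import numpy as np

def cosine_distance(f1, f2):
    """Cosine distance between two face feature vectors (0 means identical direction)."""
    f1 = f1 / np.linalg.norm(f1)
    f2 = f2 / np.linalg.norm(f2)
    return 1.0 - float(f1 @ f2)

def similarity_score(f1, f2, gen_coeff=10.0, threshold=0.4):
    """Map cosine similarity to a [0, 1] score centred on the recognition
    threshold. The logistic curve is an assumed stand-in for the patent's
    unspecified space transformation function."""
    cos_sim = 1.0 - cosine_distance(f1, f2)
    return 1.0 / (1.0 + np.exp(-gen_coeff * (cos_sim - threshold)))

def is_target(f1, f2, accept=0.5, **kw):
    """Decide whether the object to be identified is the target object."""
    return similarity_score(f1, f2, **kw) >= accept
```

A larger generalization coefficient sharpens the score around the recognition threshold, while a smaller one softens the decision boundary, which is one plausible reading of how the coefficient tunes generalization on the test set.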
2. The training data expansion method of a face recognition model according to claim 1, wherein, when the number of face images corresponding to the target object in the training data set is smaller than a preset threshold, the step of performing position conversion on the three-dimensional face model based on a rotation matrix and acquiring face images of different angles corresponding to the target object based on the position-converted three-dimensional face model is performed.
3. The training data expansion method of a face recognition model according to claim 1, wherein the step of acquiring a two-dimensional face image corresponding to a target object and constructing a three-dimensional face model corresponding to the target object from the two-dimensional face image comprises:
acquiring the two-dimensional face image comprising the front face of the target object;
and performing normalization processing on the two-dimensional face image, and constructing the three-dimensional face model based on the normalized two-dimensional face image.
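Claim 3 leaves the normalization step open. One common choice, assumed here for illustration and not mandated by the patent, is to scale the frontal face crop to floats and standardize it before it is fed to the 3D reconstruction step:

```python
import numpy as np

def normalize_face(img):
    """Scale a uint8 face crop to zero-mean, unit-variance float32.
    This is one conventional normalization; the patent does not fix
    the exact scheme."""
    img = img.astype(np.float32) / 255.0
    return ((img - img.mean()) / (img.std() + 1e-8)).astype(np.float32)
```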
4. A terminal device, characterized in that the terminal device comprises: a memory, a processor and a training data expansion program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the training data expansion method of a face recognition model according to any one of claims 1 to 3.
5. A terminal device, characterized in that the terminal device comprises:
the system comprises an acquisition module, a recognition module and a display module, wherein the acquisition module is used for acquiring a two-dimensional face image corresponding to a target object, positioning a face area in the two-dimensional face image through a face detection technology, and confirming a posture angle value of the face area based on a face key point positioning algorithm so as to eliminate the two-dimensional face image with an oversized angle; constructing a three-dimensional face model corresponding to the target object according to the two-dimensional face image;
an acquisition module, configured to perform position conversion on the three-dimensional face model based on the rotation matrix and acquire face images of different angles corresponding to the target object based on the position-converted three-dimensional face model, wherein the face coordinate formula of the position-converted three-dimensional face model is: v_t = v × R^T, where R = R_z × (R_y × R_x), R^T is the transpose of the rotation matrix R, and x, y and z respectively denote the radian values of rotation about the corresponding spatial coordinate axes; after the three-dimensional face model is constructed, when performing position conversion on the three-dimensional face model based on the rotation matrix, the angle δ of the three-dimensional face model is adjusted according to the number C2d of existing face images corresponding to the target object and the face angle rule Cq, so as to determine the rotation parameters for the position conversion of the three-dimensional face model;
an expansion module, configured to expand a training data set of the face recognition model based on the face images of different angles, wherein, by acquiring face images of different angles corresponding to the target object multiple times, the target object has the largest number of face images in the training data; to train the face recognition model based on the expanded training data set; and to perform face recognition on a face image to be recognized through the trained face recognition model;
the expansion module is further configured to determine, through the trained face recognition model and based on the training data set, a first face feature corresponding to the target object and a second face feature corresponding to the face image to be recognized; determine a cosine distance between the first face feature and the second face feature; and determine, according to the cosine distance, whether the object to be identified corresponding to the face image to be recognized is the target object, which includes: determining a generalization coefficient and a recognition threshold; determining a similarity score between the first face feature and the second face feature according to the cosine distance, the generalization coefficient and the recognition threshold; and determining, according to the similarity score, whether the object to be identified corresponding to the face image to be recognized is the target object.
6. A computer-readable storage medium, wherein a training data expansion program is stored on the computer-readable storage medium, which when executed by a processor, implements the steps of the training data expansion method of the face recognition model according to any one of claims 1 to 3.
CN202110943769.7A 2021-08-17 2021-08-17 Training data expansion method, terminal equipment and medium of face recognition model Active CN113780095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110943769.7A CN113780095B (en) 2021-08-17 2021-08-17 Training data expansion method, terminal equipment and medium of face recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110943769.7A CN113780095B (en) 2021-08-17 2021-08-17 Training data expansion method, terminal equipment and medium of face recognition model

Publications (2)

Publication Number Publication Date
CN113780095A CN113780095A (en) 2021-12-10
CN113780095B true CN113780095B (en) 2023-12-26

Family

ID=78838000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110943769.7A Active CN113780095B (en) 2021-08-17 2021-08-17 Training data expansion method, terminal equipment and medium of face recognition model

Country Status (1)

Country Link
CN (1) CN113780095B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576747B (en) * 2023-11-15 2024-06-28 深圳市紫鹏科技有限公司 Face data acquisition and analysis method, system and storage medium based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020083407A1 (en) * 2018-10-23 2020-04-30 华南理工大学 Three-dimensional finger vein feature extraction method and matching method therefor
WO2020134478A1 (en) * 2018-12-29 2020-07-02 北京灵汐科技有限公司 Face recognition method, feature extraction model training method and device thereof
CN111488810A (en) * 2020-03-31 2020-08-04 长沙千视通智能科技有限公司 Face recognition method and device, terminal equipment and computer readable medium
CN112487923A (en) * 2020-11-25 2021-03-12 奥比中光科技集团股份有限公司 Method and system for acquiring training data of human face head posture
CN113159006A (en) * 2021-06-23 2021-07-23 杭州魔点科技有限公司 Attendance checking method and system based on face recognition, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170000748A (en) * 2015-06-24 2017-01-03 삼성전자주식회사 Method and apparatus for face recognition
US11107241B2 (en) * 2018-12-11 2021-08-31 Seiko Epson Corporation Methods and systems for training an object detection algorithm using synthetic images
US10853631B2 (en) * 2019-07-24 2020-12-01 Advanced New Technologies Co., Ltd. Face verification method and apparatus, server and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020083407A1 (en) * 2018-10-23 2020-04-30 华南理工大学 Three-dimensional finger vein feature extraction method and matching method therefor
WO2020134478A1 (en) * 2018-12-29 2020-07-02 北京灵汐科技有限公司 Face recognition method, feature extraction model training method and device thereof
CN111488810A (en) * 2020-03-31 2020-08-04 长沙千视通智能科技有限公司 Face recognition method and device, terminal equipment and computer readable medium
CN112487923A (en) * 2020-11-25 2021-03-12 奥比中光科技集团股份有限公司 Method and system for acquiring training data of human face head posture
CN113159006A (en) * 2021-06-23 2021-07-23 杭州魔点科技有限公司 Attendance checking method and system based on face recognition, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Face recognition after reconstructing a three-dimensional face depth image from two-dimensional texture; Li Rui; Li Ke; Sun Jiawei; Modern Computer (Professional Edition), (10); full text *
Li Kefeng. Face Image Processing and Recognition Technology. Yellow River Water Conservancy Press, 2018, p. 78. *

Also Published As

Publication number Publication date
CN113780095A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
Ye et al. A local phase based invariant feature for remote sensing image matching
CN108764041B (en) Face recognition method for lower shielding face image
CN110543822A (en) finger vein identification method based on convolutional neural network and supervised discrete hash algorithm
US9558389B2 (en) Reliable fingertip and palm detection
CN102332084B (en) Identity identification method based on palm print and human face feature extraction
CN104794440B (en) A kind of false fingerprint detection method based on the multiple dimensioned LBP of more piecemeals
CN111274915B (en) Deep local aggregation descriptor extraction method and system for finger vein image
CN107784263B (en) Planar rotation face detection method based on improved accelerated robust features
CN113780095B (en) Training data expansion method, terminal equipment and medium of face recognition model
SA117380732B1 (en) Method for fingerprint classification
CN111079626B (en) Living body fingerprint identification method, electronic equipment and computer readable storage medium
Zhang et al. Learning graph matching: Oriented to category modeling from cluttered scenes
CN109035285B (en) Image boundary determining method and device, terminal and storage medium
CN111612083B (en) Finger vein recognition method, device and equipment
CN105373781A (en) Binary image processing method for identity authentication
CN113239840A (en) Handwriting identification method, device, equipment and storage medium
Huang et al. Multi‐class obstacle detection and classification using stereovision and improved active contour models
CN112435283A (en) Image registration method, electronic device and computer-readable storage medium
CN112232119A (en) Remote sensing texture image segmentation method and device
CN111488811A (en) Face recognition method and device, terminal equipment and computer readable medium
CN111428643A (en) Finger vein image recognition method and device, computer equipment and storage medium
CN111079551B (en) Finger vein recognition method and device based on singular value decomposition and storage medium
CN116363175A (en) Polarized SAR image registration method based on attention mechanism
CN107844735B (en) Authentication method and device for biological characteristics
KR20170043256A (en) Image analyze method and apparatus thereby

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant