WO2020199693A1

WO2020199693A1 - Large-pose face recognition method and apparatus, and device

Info

Publication number: WO2020199693A1
Application number: PCT/CN2019/130871
Authority: WO
Inventors: 乔宇; 曾小星; 彭小江
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2019-03-29
Filing date: 2019-12-31
Publication date: 2020-10-08
Also published as: CN110020620A; CN110020620B

Abstract

A large-pose face recognition method, comprising: learning a first image feature of a face training image by means of a texture learning network (S101); reconstructing a corresponding three-dimensional face according to the face training image, and converting shape information of the reconstructed three-dimensional face into a two-dimensional texture image (S102); learning a second image feature of the two-dimensional texture image by means of a shape learning network (S103); and combining the first image feature and the second image feature to recognize a face (S104). A two-dimensional planar feature and a three-dimensional feature can be expressed in a combined manner, such that the accuracy of large-pose face recognition is effectively improved. In addition, a training process is relatively simple, and the occupation of a storage space can be reduced.

Description

Large-pose face recognition method, apparatus, and device

Technical field

This application belongs to the field of face recognition, and in particular relates to a large-pose face recognition method, apparatus, and device.

Background art

In a non-cooperative, uncontrolled environment, the face images collected for recognizing a user are often disturbed by a variety of pose changes; that is, the collected face images show large poses. To improve the accuracy of face recognition in such an environment, faces in large poses need to be recognized.

Current large-pose face recognition methods include pose-aware networks and deep networks that learn pose-robust facial features. In a pose-aware network, each sub-network is responsible for one pose, and the network as a whole covers all face poses. However, because multiple sub-networks must be trained and the training data must be partitioned by pose, the training and testing processes are more complicated and require more storage space. When deep networks are used to learn pose-robust face features, the existing training data does not contain many large-pose face images, so this approach cannot effectively solve face recognition under large poses.

Technical problem

In view of this, the embodiments of the present application provide a large-pose face recognition method, apparatus, and device, to solve the problems in the prior art that the accuracy of face recognition is low, or that the training and testing processes are complicated and require a large storage space.

Technical solutions

A first aspect of the embodiments of the present application provides a large-pose face recognition method, which includes:

learning a first image feature of a face training image through a texture learning network;

reconstructing a corresponding three-dimensional face according to the face training image, and converting shape information of the reconstructed three-dimensional face into a two-dimensional texture image;

learning a second image feature of the two-dimensional texture image through a shape learning network; and

combining the first image feature and the second image feature to recognize the face.

With reference to the first aspect, in a first possible implementation of the first aspect, before the step of learning the first image feature of the face training image through the texture learning network, the method further includes:

performing detection and alignment operations on the face training image, and marking the key points of the face in the face training image.

With reference to the first aspect, in a second possible implementation of the first aspect, the step of learning the first image feature of the face training image through the texture learning network includes:

using a multi-layer residual network structure, optimizing the weights layer by layer through stochastic gradient descent, and obtaining the network's predicted label for the face training image through forward propagation of the network; and

comparing the predicted label with the real label of the face training image, and learning the first image feature of the face training image under the supervision of the cross-entropy loss function.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the cross-entropy loss function is:

$$L_{ce} = -\sum_{i=1}^{C} y_i(x) \log p_i(x)$$

where $x$ represents the face training image, $y_i(x)$ indicates whether the image belongs to the i-th category, $p_i(x)$ indicates the probability that the image belongs to the i-th category, $C$ is the number of categories, and $L_{ce}$ is the calculated loss value.

With reference to the first aspect, in a fourth possible implementation of the first aspect, the step of reconstructing a corresponding three-dimensional face according to the face training image, and converting the shape information of the reconstructed three-dimensional face into a two-dimensional texture image, includes:

reconstructing the three-dimensional face corresponding to the face training image through the key point regression loss function and the prior loss function; and

projecting the vertex coordinates of the reconstructed three-dimensional face into the texture space to obtain a two-dimensional texture image.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the key point regression loss function and the prior loss function are:

$$L_{recon} = \sum_{i=1}^{N} \lVert L^{i}_{gt} - L^{i}_{pr} \rVert_2^2 + \lambda \lVert \alpha \rVert_2^2$$

where $L_{recon}$ is the calculated loss value; the first term on the right is the key point regression loss function, in which $N$ is the number of key points, $L^{i}_{gt}$ denotes the label of the i-th key point, and $L^{i}_{pr}$ denotes the prediction of the i-th key point; the second term on the right is the prior loss function, in which $\alpha$ represents the shape parameter of the three-dimensional morphable model and $\lambda$ represents the set loss function weight.

With reference to the first aspect, in a sixth possible implementation of the first aspect, the step of combining the first image feature and the second image feature to recognize the face includes:

concatenating the first image feature of a first dimension and the second image feature of a second dimension to obtain a fused third image feature of a third dimension, and performing face recognition according to the third image feature, where the third dimension = the first dimension + the second dimension.

A second aspect of the embodiments of the present application provides a large-pose face recognition apparatus, which includes:

a first learning unit, configured to learn a first image feature of a face training image through a texture learning network;

a reconstruction unit, configured to reconstruct a corresponding three-dimensional face according to the face training image, and convert shape information of the reconstructed three-dimensional face into a two-dimensional texture image;

a second learning unit, configured to learn a second image feature of the two-dimensional texture image through a shape learning network; and

a joint recognition unit, configured to combine the first image feature and the second image feature to recognize the face.

A third aspect of the embodiments of the present application provides a face recognition device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the large-pose face recognition method according to any one of the implementations of the first aspect.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium that stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the large-pose face recognition method according to any one of the implementations of the first aspect.

Beneficial effect

Compared with the prior art, the embodiments of this application have the following beneficial effects. The first image feature of the face training image is learned through the texture learning network; the three-dimensional face is then reconstructed, and the shape information of the reconstructed three-dimensional face is converted into a two-dimensional texture image; the second image feature of the two-dimensional texture image is learned through the shape learning network; and the first image feature and the second image feature are then combined to recognize the face. The two-dimensional planar feature and the three-dimensional feature can thus be expressed jointly, which effectively improves the accuracy of large-pose face recognition. In addition, the training process is relatively simple, which can reduce the storage space occupied.

Description of the drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the implementation process of a large-pose face recognition method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a face recognition structure provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a large-pose face recognition apparatus provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a face recognition device provided by an embodiment of the present application.

Embodiments of the invention

In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set out to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of this application.

In order to illustrate the technical solutions described in the present application, specific embodiments are used for description below.

FIG. 1 is a schematic diagram of the implementation process of a large-pose face recognition method provided by an embodiment of the application, detailed as follows:

In step S101, the first image feature of the face training image is learned through the texture learning network;

Specifically, the large pose mentioned in this application refers to the state in which the user's pose is uncontrolled and may take many forms. This application uses the term large pose to describe such multi-pose scenarios.

Before the first image feature is learned, the present application may also include a step of detecting and aligning the face training image: the face in the face training image is aligned, and the key points of the face are detected. In this application, the face may have 21 key points.

The texture learning network can use an N-layer residual network structure (N can be 18), and need not use a pre-trained model. The height and width of the input image can be a predetermined number of pixels (for example, 224), and face detection and face alignment are performed on the faces in the image.

The batch size used in the training process can be 128, and stochastic gradient descent can be used to optimize the weights layer by layer. The face training image is fed into the texture learning network, the predicted label of the image is obtained through forward propagation of the network, the predicted label is compared with the real label of the image, and the classification loss is calculated with the cross-entropy loss function:

$$L_{ce} = -\sum_{i=1}^{C} y_i(x) \log p_i(x)$$

where $x$ represents the image, $y_i(x)$ indicates whether the image belongs to the i-th category, $p_i(x)$ indicates the probability that the image belongs to the i-th category, $C$ is the number of categories, and $L_{ce}$ is the calculated loss value. In this module, under the supervision of the cross-entropy loss function, the deep convolutional network can learn better image features, which provides a basis for the later joint use of the texture features.
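The classification loss described above can be illustrated with a minimal, standard-library Python sketch. It assumes a one-hot real label and a softmax over the network outputs; the function names and the example scores are hypothetical and are not taken from the application.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy with a one-hot label: only the true class contributes."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[true_class], eps))

# Hypothetical network outputs for C = 3 identities.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
loss_correct = cross_entropy(probs, true_class=0)  # label matches the top score
loss_wrong = cross_entropy(probs, true_class=2)    # label is the lowest score
```

With a one-hot label the sum over the C categories reduces to the negative log-probability of the real category, so the loss is small when the network assigns that category a high probability and large otherwise.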

In step S102, a corresponding three-dimensional face is reconstructed according to the face training image, and the shape information of the reconstructed three-dimensional face is converted into a two-dimensional texture image;

The first image feature with semantic expression learned by the texture learning network in step S101 can be used in this application for face recognition, and can also be used for three-dimensional face reconstruction that carries identity information. The three-dimensional face reconstruction network immediately follows the texture learning network, and the two-dimensional face is input into the three-dimensional face reconstruction network. Unlike the texture learning network, the three-dimensional face reconstruction network may omit the face detection and alignment operations. The key point information of the face in the face training image can be annotated, the shape and expression parameters of the three-dimensional morphable model (3DMM) can be predicted by the three-dimensional face reconstruction network, and the three-dimensional face can then be reconstructed from the three-dimensional morphable model. The three-dimensional face reconstruction network is supervised by loss functions. The process may be as shown in FIG. 2, and includes:

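In step S201, the three-dimensional face corresponding to the face training image is reconstructed through the key point regression loss function and the prior loss function;

The two-dimensional face training image is input into the three-dimensional face reconstruction network, and the shape and expression parameters of the three-dimensional morphable model are predicted by the network. Two supervision functions can be used to supervise the reconstruction of the three-dimensional face model, namely the key point regression loss function and the prior loss function, as shown in the following formula:

$$L_{recon} = \sum_{i=1}^{N} \lVert L^{i}_{gt} - L^{i}_{pr} \rVert_2^2 + \lambda \lVert \alpha \rVert_2^2$$

where $N$ is the number of key points, $L^{i}_{gt}$ is the label of the i-th key point, $L^{i}_{pr}$ is the prediction of the i-th key point, $\alpha$ is the shape parameter of the three-dimensional morphable model, and $\lambda$ is the set loss function weight.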
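The two supervision terms can be sketched in plain Python as follows. This is a minimal sketch rather than the implementation of the application: it assumes squared Euclidean distances for the key point term and a squared L2 norm for the prior term, and the names `recon_loss`, `kp_gt`, `kp_pr`, `alpha`, and `lam` are hypothetical.

```python
def recon_loss(kp_gt, kp_pr, alpha, lam):
    """Key point regression term plus a lam-weighted prior on the shape parameters."""
    # Sum of squared distances between labelled and predicted key points.
    kp_term = sum(
        sum((g - p) ** 2 for g, p in zip(gt, pr))
        for gt, pr in zip(kp_gt, kp_pr)
    )
    # Prior term: penalises shape parameters that drift far from zero.
    prior_term = sum(a * a for a in alpha)
    return kp_term + lam * prior_term

# Two hypothetical 3D key points; only the first prediction is off, by 1 along Z.
kp_gt = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
kp_pr = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
alpha = [0.5, -0.5]  # hypothetical shape parameters
loss = recon_loss(kp_gt, kp_pr, alpha, lam=0.1)
```

The prior term keeps the predicted shape parameters small, which discourages implausible faces when the key points alone do not constrain the shape; the weight balances that regularization against the key point fit.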

In the key point regression process, the camera parameters also need to be predicted; these can include rotation parameters, position offset parameters, and a scaling coefficient. The rotation parameter is a three-dimensional output, an offset is predicted for each of the X, Y, and Z coordinate axes, and finally all position coordinates are scaled to obtain the final three-dimensional key point prediction.
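The order of operations described above (rotate, offset, then scale) can be sketched as follows. The text does not say how the three rotation outputs are parameterized, so this sketch assumes Euler angles in radians composed as Rz·Ry·Rx; that convention and all function and variable names are assumptions made for illustration.

```python
import math

def rotation_matrix(rx, ry, rz):
    """3x3 rotation R = Rz @ Ry @ Rx from three angles (assumed Euler convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def transform_keypoints(points, angles, offset, scale):
    """Rotate each 3D point, add the per-axis offset, then scale all coordinates."""
    r = rotation_matrix(*angles)
    out = []
    for p in points:
        rotated = [sum(r[i][j] * p[j] for j in range(3)) for i in range(3)]
        out.append(tuple(scale * (rotated[i] + offset[i]) for i in range(3)))
    return out

# One hypothetical key point on the X axis, rotated 90 degrees about Z.
predicted = transform_keypoints(
    [(1.0, 0.0, 0.0)],
    angles=(0.0, 0.0, math.pi / 2),
    offset=(0.0, 0.0, 1.0),
    scale=2.0,
)
```

The transformed key points are what the key point regression term compares against the annotated labels.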

In step S202, the vertex coordinates of the reconstructed three-dimensional face are projected into the texture space to obtain a two-dimensional texture image.

For the reconstructed three-dimensional face, the mapping between texture coordinates and world coordinates in the three-dimensional morphable model is used to project the vertex coordinates of the reconstructed three-dimensional face into the texture space. In this way, the texture space can fully express the shape information of the three-dimensional face as a two-dimensional map. This map has 3 channels, which hold the X, Y, and Z coordinate values of the three-dimensional face.
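A simplified sketch of this projection is given below. It assumes that each vertex comes with texture coordinates (u, v) in the range [0, 1] and writes each vertex into its nearest pixel; a practical pipeline would typically rasterize the mesh triangles so that every pixel is filled. The function name, the map size, and the sample data are hypothetical.

```python
def build_position_map(vertices, uv_coords, size):
    """Scatter vertex (x, y, z) values into a size x size x 3 map indexed by (v, u)."""
    pos_map = [[[0.0, 0.0, 0.0] for _ in range(size)] for _ in range(size)]
    for (x, y, z), (u, v) in zip(vertices, uv_coords):
        # Map normalised texture coordinates to pixel indices, clamped to the map.
        col = min(int(u * size), size - 1)
        row = min(int(v * size), size - 1)
        pos_map[row][col] = [x, y, z]  # three channels: X, Y, Z
    return pos_map

# Two hypothetical vertices and their texture coordinates.
vertices = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
uv_coords = [(0.0, 0.0), (0.99, 0.5)]
position_map = build_position_map(vertices, uv_coords, size=4)
```

Because neighbouring pixels of the map correspond to neighbouring points on the face surface, the result can be processed by an ordinary convolutional network in the same way as a colour image.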

In step S103, the second image feature of the two-dimensional texture image is learned through a shape learning network;

The two-dimensional texture image obtained in step S102 expresses the reconstructed three-dimensional coordinates. The pose-robust features in this two-dimensional texture image can be extracted by a residual network, and the supervision information can be the same as in step S101.

The three-dimensional face is converted from world coordinates to texture space coordinates, which turns the unordered three-dimensional point cloud into an ordered texture map suitable for processing by a deep neural network. The deep neural network of the shape learning network then extracts features from the reconstructed three-dimensional shape information to obtain features that are robust to pose.

In step S104, the first image feature and the second image feature are combined to recognize the face.

In the testing phase, a joint representation can be formed. Through steps S101 to S103, the framework uses the texture learning network to extract the two-dimensional information of the face, which is the general information used in ordinary face recognition, and also obtains three-dimensional identity information that is robust to pose. In the testing phase, the joint representation is obtained by concatenating the fully connected output features of the corresponding networks. This mines the identity information of the face to the greatest extent, and the joint representation can significantly improve face recognition performance in large-pose scenes.

In summary, the first image feature of the face training image is learned through the texture learning network; the three-dimensional face is then reconstructed, and the shape information of the reconstructed three-dimensional face is converted into a two-dimensional texture image; the second image feature of the two-dimensional texture image is learned through the shape learning network; and the first image feature and the second image feature are then combined to recognize the face, so that the two-dimensional planar feature and the three-dimensional feature can be expressed jointly. For example, if the first image feature has a first dimension and the second image feature has a second dimension, the combined third image feature has a third dimension equal to the sum of the first dimension and the second dimension.

This effectively improves the accuracy of large-pose face recognition, and the training process is relatively simple, which can reduce the storage space occupied.
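The concatenation of the two features can be sketched as follows. The text does not specify how the fused feature is compared between faces, so the cosine similarity used here is an assumption for illustration, as are the feature values and function names.

```python
import math

def fuse(texture_feat, shape_feat):
    """Concatenate the two features; the fused length is the sum of both lengths."""
    return list(texture_feat) + list(shape_feat)

def cosine_similarity(a, b):
    """Similarity of two fused features (assumed matching rule, not from the text)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical features: 4-dimensional texture feature, 2-dimensional shape feature.
probe = fuse([0.2, 0.1, 0.4, 0.3], [0.9, 0.1])
gallery = fuse([0.2, 0.1, 0.5, 0.3], [0.8, 0.2])
score = cosine_similarity(probe, gallery)
```

Here the fused feature has 4 + 2 = 6 entries, which matches the rule that the third dimension is the sum of the first and second dimensions.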

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application.

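FIG. 3 is a schematic structural diagram of a large-pose face recognition apparatus provided by an embodiment of the application, detailed as follows:

The large-pose face recognition apparatus includes:

a first learning unit, configured to learn the first image feature of the face training image through the texture learning network;

a reconstruction unit, configured to reconstruct a corresponding three-dimensional face according to the face training image, and convert the shape information of the reconstructed three-dimensional face into a two-dimensional texture image;

a second learning unit, configured to learn the second image feature of the two-dimensional texture image through the shape learning network; and

a joint recognition unit, configured to combine the first image feature and the second image feature to recognize the face.

The large-pose face recognition apparatus described in FIG. 3 corresponds to the large-pose face recognition method described in FIG. 1.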
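The correspondence between the units and the method steps can be sketched as a small composition of callables. This is only an illustrative structure: the class name, field names, and the stub functions standing in for trained networks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Feature = List[float]

@dataclass
class LargePoseRecognizer:
    texture_net: Callable[[Any], Feature]  # first learning unit (step S101)
    reconstruct: Callable[[Any], Any]      # reconstruction unit (step S102)
    shape_net: Callable[[Any], Feature]    # second learning unit (step S103)

    def extract(self, image: Any) -> Feature:
        """Joint recognition unit (step S104): concatenate both features."""
        first = self.texture_net(image)
        texture_image = self.reconstruct(image)
        second = self.shape_net(texture_image)
        return list(first) + list(second)

# Stub callables standing in for the trained networks.
recognizer = LargePoseRecognizer(
    texture_net=lambda image: [1.0, 2.0],
    reconstruct=lambda image: "texture-image-placeholder",
    shape_net=lambda texture_image: [3.0],
)
features = recognizer.extract("face-image-placeholder")
```

Keeping each unit behind a plain callable mirrors the statement that the units may be realized in hardware or software and may be divided or combined differently in practice.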

FIG. 4 is a schematic diagram of a face recognition device provided by an embodiment of the present application. As shown in FIG. 4, the face recognition device 4 of this embodiment includes a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40, for example, a large-pose face recognition program. When the processor 40 executes the computer program 42, the steps in the foregoing embodiments of the large-pose face recognition method are implemented. Alternatively, when the processor 40 executes the computer program 42, the functions of the modules/units in the foregoing apparatus embodiments are implemented.

Exemplarily, the computer program 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to carry out this application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 42 in the face recognition device 4. For example, the computer program 42 can be divided into the first learning unit, the reconstruction unit, the second learning unit, and the joint recognition unit described above.

The face recognition device 4 can be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The face recognition device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art can understand that FIG. 4 is only an example of the face recognition device 4 and does not constitute a limitation on it. The device may include more or fewer components than shown in the figure, may combine certain components, or may use different components; for example, the face recognition device may also include input and output devices, network access devices, buses, and so on.

The processor 40 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 41 may be an internal storage unit of the face recognition device 4, such as a hard disk or internal memory of the face recognition device 4. The memory 41 may also be an external storage device of the face recognition device 4, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the face recognition device 4. Further, the memory 41 may include both an internal storage unit of the face recognition device 4 and an external storage device. The memory 41 is used to store the computer program and other programs and data required by the face recognition device. The memory 41 can also be used to temporarily store data that has been output or will be output.

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division into the above-mentioned functional units and modules is used as an example. In practical applications, the above-mentioned functions can be allocated to different functional units and modules as required; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, each unit can exist alone physically, or two or more units can be integrated into one unit. The integrated units can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only used to distinguish them from each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are only illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes of the methods in the above embodiments of this application can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when the program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium can be added to or reduced as appropriate in accordance with the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the application, and should be included within the scope of protection of this application.

Claims

1. A large-pose face recognition method, characterized in that the large-pose face recognition method includes:

learning a first image feature of a face training image through a texture learning network;

reconstructing a corresponding three-dimensional face according to the face training image, and converting shape information of the reconstructed three-dimensional face into a two-dimensional texture image;

learning a second image feature of the two-dimensional texture image through a shape learning network; and

combining the first image feature and the second image feature to recognize the face.

2. The large-pose face recognition method according to claim 1, wherein before the step of learning the first image feature of the face training image through the texture learning network, the method further comprises:

performing detection and alignment operations on the face training image, and marking the key points of the face in the face training image.

3. The large-pose face recognition method according to claim 1, wherein the step of learning the first image feature of the face training image through the texture learning network comprises:

using a multi-layer residual network structure, optimizing the weights layer by layer through stochastic gradient descent, and obtaining the network's predicted label for the face training image through forward propagation of the network; and

comparing the predicted label with the real label of the face training image, and learning the first image feature of the face training image under the supervision of the cross-entropy loss function.

4. The large-pose face recognition method according to claim 2, wherein the cross-entropy loss function is:

$$L_{ce} = -\sum_{i=1}^{C} y_i(x) \log p_i(x)$$

where $x$ represents the face training image, $y_i(x)$ indicates whether the image belongs to the i-th category, $p_i(x)$ indicates the probability that the image belongs to the i-th category, $C$ is the number of categories, and $L_{ce}$ is the calculated loss value.

5. The large-pose face recognition method according to claim 1, wherein the step of reconstructing the corresponding three-dimensional face according to the face training image, and converting the shape information of the reconstructed three-dimensional face into a two-dimensional texture image, comprises:

reconstructing the three-dimensional face corresponding to the face training image through the key point regression loss function and the prior loss function; and

projecting the vertex coordinates of the reconstructed three-dimensional face into the texture space to obtain a two-dimensional texture image.

6. The large-pose face recognition method according to claim 5, wherein the key point regression loss function and the prior loss function are:

$$L_{recon} = \sum_{i=1}^{N} \lVert L^{i}_{gt} - L^{i}_{pr} \rVert_2^2 + \lambda \lVert \alpha \rVert_2^2$$

where $L_{recon}$ is the calculated loss value; the first term on the right is the key point regression loss function, in which $N$ is the number of key points, $L^{i}_{gt}$ denotes the label of the i-th key point, and $L^{i}_{pr}$ denotes the prediction of the i-th key point; the second term on the right is the prior loss function, in which $\alpha$ represents the shape parameter of the three-dimensional morphable model and $\lambda$ represents the set loss function weight.

7. The large-pose face recognition method according to claim 1, wherein the step of combining the first image feature and the second image feature to recognize the face comprises:

concatenating the first image feature of a first dimension and the second image feature of a second dimension to obtain a fused third image feature of a third dimension, and performing face recognition according to the third image feature, where the third dimension = the first dimension + the second dimension.

8. A large-pose face recognition apparatus, characterized in that the large-pose face recognition apparatus includes:

a first learning unit, configured to learn a first image feature of a face training image through a texture learning network;

a reconstruction unit, configured to reconstruct a corresponding three-dimensional face according to the face training image, and convert shape information of the reconstructed three-dimensional face into a two-dimensional texture image;

a second learning unit, configured to learn a second image feature of the two-dimensional texture image through a shape learning network; and

a joint recognition unit, configured to combine the first image feature and the second image feature to recognize the face.

9. A face recognition device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the large-pose face recognition method according to any one of claims 1 to 5.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the large-pose face recognition method according to any one of claims 1 to 5.