CN106991364B - Face recognition processing method and device and mobile terminal

Face recognition processing method and device and mobile terminal

Info

Publication number
CN106991364B
Authority
CN
China
Prior art keywords
face
feature map
pose
processing
face image
Prior art date
Legal status: Active
Application number
CN201610040749.8A
Other languages
Chinese (zh)
Other versions
CN106991364A
Inventor
易东
刘荣
张帆
张伦
楚汝峰
Current Assignee
Banma Zhixing Network Hongkong Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610040749.8A
Publication of CN106991364A
Application granted
Publication of CN106991364B
Status: Active
Anticipated expiration

Classifications

    • G06V20/653: Scenes; three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
    • G06F18/2111: Pattern recognition; selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06V40/164: Human faces; detection, localisation, normalisation using holistic features
    • G06V40/169: Human faces; feature extraction; holistic features and representations, i.e. based on the facial image taken as a whole


Abstract

The application provides a face recognition processing method and apparatus. The method includes: determining pose adaptation key points corresponding to a face image; performing convolution processing on the face image with a trained convolutional network, where the number of each type of layer in the network structure matches the processing resources of the terminal device and the parameters of each layer have pose robustness; performing pose-robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key points, generating a second feature map with a pose-robust response; and sending the second feature map back to the convolutional network through a second node adjacent to the first node to extract the face features corresponding to the face image. The method and apparatus improve the robustness and accuracy of face recognition while preserving the efficient processing speed of a small system, improving processing efficiency and user experience.

Description

Face recognition processing method and device and mobile terminal
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a face recognition processing method and apparatus, and a mobile terminal.
Background
With the development of portable terminal devices, many devices now support a face recognition function. Taking a mobile phone as an example, applications such as unlocking the phone are carried out through face recognition.
The face recognition method with the highest recognition accuracy at present is the deep convolutional network. Unlike traditional face recognition methods, a deep convolutional network has more layers; its multi-layer structure can better express the complex patterns and distributions in the data, so more robust and more discriminative face features can be extracted.
However, the computation and memory consumption of a deep convolutional network are much higher than those of traditional methods, generally more than 10 times higher, so face recognition methods based on deep convolutional networks usually run on a high-performance PC or server and are provided to users in the form of a network API. Due to the limitations of power consumption and cost, however, the computing power and storage space of a small system are not enough to carry a face recognition algorithm based on a deep convolutional network, and face recognition performance is therefore poor.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a face recognition processing method that improves the robustness and accuracy of face recognition and improves processing efficiency and user experience while preserving the efficient processing speed of a small system.
A second object of the present application is to provide a face recognition processing apparatus.
A third object of the present application is to provide a terminal device.
In order to achieve the above object, an embodiment of a first aspect of the present application provides a face recognition processing method, including: receiving a face image to be processed, and determining pose adaptation key points corresponding to the face image; performing convolution processing on the face image with a trained convolutional network, where the number of each type of layer in the network structure matches the processing resources of the terminal device and the parameters of each layer have pose robustness; and performing pose-robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key points to generate a second feature map with a pose-robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map.
In the face recognition processing method, a face image to be processed is first received and the pose adaptation key points corresponding to the face image are determined; the face image is then convolved with a trained convolutional network in which the number of each type of layer matches the processing resources of the terminal device and the parameters of each layer have pose robustness; finally, pose-robust processing is performed on the first feature map output from a first node in the convolutional network according to the pose adaptation key points to generate a second feature map with a pose-robust response, which is sent back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the efficient processing speed of a small system is preserved, improving processing efficiency and user experience.
In order to achieve the above object, a second aspect of the present application provides a face recognition processing apparatus, including: a receiving module for receiving a face image to be processed; a first determining module for determining the pose adaptation key points corresponding to the face image; a convolution processing module for performing convolution processing on the face image with a trained convolutional network, where the number of each type of layer in the network structure matches the processing resources of the terminal device and the parameters of each layer have pose robustness; and a pose-robust processing module for performing pose-robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key points, generating a second feature map with a pose-robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map.
In the face recognition processing apparatus, the receiving module receives a face image to be processed; the first determining module determines the pose adaptation key points corresponding to the face image; the convolution processing module performs convolution processing on the face image with a trained convolutional network in which the number of each type of layer matches the processing resources of the terminal device and the parameters of each layer have pose robustness; and the pose-robust processing module performs pose-robust processing on the first feature map output from a first node in the convolutional network according to the pose adaptation key points, generates a second feature map with a pose-robust response, and sends it back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the efficient processing speed of a small system is preserved, improving processing efficiency and user experience.
To achieve the above object, a third aspect of the present application provides a terminal device, including: a device body, on which are provided a face image acquisition apparatus and a face recognition processing apparatus.
According to the terminal device of this embodiment of the application, the face recognition processing apparatus receives a face image to be processed and determines the pose adaptation key points corresponding to the face image; the face image is then convolved with a trained convolutional network in which the number of each type of layer matches the processing resources of the terminal device and the parameters of each layer have pose robustness; finally, the first feature map output from a first node in the convolutional network undergoes pose-robust processing according to the pose adaptation key points to generate a second feature map with a pose-robust response, which is sent back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the efficient processing speed of a small system is preserved, improving processing efficiency and user experience.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a face recognition processing method according to an embodiment of the present application;
FIG. 2 is a flow diagram of a convolutional network training process of one embodiment of the present application;
FIG. 3 is a schematic diagram of the convolutional network training process shown in FIG. 2;
FIG. 4 is a schematic diagram of positioning pose adaptation key points using a three-dimensional face model;
FIG. 5 is a schematic illustration of processing with local normalization;
FIG. 6 is a schematic diagram of pooling the convolved feature maps according to pose-adaptive keypoints;
FIG. 7 is a test flow diagram using the convolutional network trained in FIG. 3;
FIG. 8 is a schematic diagram of the convolutional network test process shown in FIG. 7;
FIG. 9 is a flow chart of a face recognition processing method according to another embodiment of the present application;
fig. 10 is a schematic structural diagram of a face recognition processing device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a face recognition processing device according to another embodiment of the present application;
fig. 12 is a schematic structural diagram of a face recognition processing device according to another embodiment of the present application;
fig. 13 is a schematic structural diagram of a face recognition processing device according to another embodiment of the present application;
fig. 14 is a schematic structural diagram of a face recognition processing device according to another embodiment of the present application;
fig. 15 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a face recognition processing method and apparatus according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart of a face recognition processing method according to an embodiment of the present application.
As shown in fig. 1, the face recognition processing method includes:
step 101, receiving a face image to be processed, and determining a pose adaptation key point corresponding to the face image.
The face recognition processing method provided by the embodiment of the invention is applied to a small terminal device having a face recognition function, such as a mobile phone, a tablet computer, or a vehicle-mounted device. The concrete form of the face recognition function depends on the specific terminal device and application environment; for example, the face unlocking function on a mobile phone verifies the identity of the face in an image captured by the phone, and then determines whether to unlock the screen according to the identification result.
Because, under the limitations of power consumption and cost, the computing power and storage space of a small terminal device are not enough to carry a high-accuracy face recognition algorithm based on a deep convolutional network, the invention provides a high-accuracy face recognition algorithm suited to the processing power of a small terminal device. The specific steps are as follows:
firstly, a face image to be processed is received, and the source of the face image to be processed is not limited in this embodiment, and is different according to the acquisition environment and the source of the face image in specific application. For example: the face image of gathering under controllable environment to and the face image of gathering under uncontrollable environment, gather the source and include: a life photograph, a poster, a television, a monitoring picture, a standard photograph, etc.
The acquisition environment and source of face images are therefore rich and variable. A change of face pose can cause drastic changes in the face image and can even cause self-occlusion, so face pose is an important factor affecting face recognition performance and accuracy.
Accordingly, after the face image to be processed is received, the pose adaptation key points corresponding to the face image are determined. It should be noted that there are many localization algorithm models for pose adaptation key points, which can be selected according to the specific application scenario and are not limited by this embodiment, for example a key-point detection method or a three-dimensional face modeling method. The input of the localization algorithm model is the face image to be processed, and the output is a set of coordinates corresponding to the pose key points. The number and positions of the pose adaptation key points yield various distribution ranges, such as a circle or a rectangle, which can be adjusted according to the actual application requirements.
For each face image to be processed, the corresponding pose adaptation key points can be determined by the above localization approach. Because the semantics of each key point do not change with the change of pose, when features are subsequently extracted from the face image with a convolutional network, the pose adaptation key points are embedded into the network to improve its robustness to face pose. Pose robustness means that the face recognition algorithm can maintain its recognition performance under a certain amount of face pose change.
Step 102, performing convolution processing on the face image with a trained convolutional network, where the number of each type of layer in the network structure matches the processing resources of the terminal device and the parameters of each layer have pose robustness.
After the face image to be processed is received, convolution processing is performed on it with a trained convolutional network. It should be emphasized that the number of each type of layer in the network structure of the convolutional network involved in the embodiments of the invention matches the processing resources of the terminal device, and the parameters of each layer in the network structure are trained in advance to have pose robustness.
First, the matching of the number of each type of layer in the convolutional network structure to the processing resources of the terminal device is described:
Specifically, a convolutional network is a multi-layer feed-forward neural network in which the response of each neuron depends only on a local region of the input layer; such networks are widely used in image and video analysis. Common layer modules in a convolutional network are the convolutional layer, the local connection (non-shared-weight) layer, and the fully connected layer, as follows:
A convolutional layer performs convolution operations on the input signal with multiple filters and outputs a multi-channel signal.
A local connection layer (Local layer), i.e., a locally connected layer with non-shared weights, is similar in structure to a convolutional layer, but its filters do not share weights: different filters are used at different positions of the input signal.
A fully connected layer multiplies the input vector by a weight matrix to obtain an output vector.
The inputs and outputs of convolutional layers and local connection layers are multi-channel two-dimensional data, so the two-dimensional structure of the image is preserved during processing; the inputs and outputs of a fully connected layer are vectors, without regard to image structure. A convolutional network generally includes at least one convolutional layer and/or at least one local connection layer, plus at least one fully connected layer. If all three layer types are present, convolutional layers are generally placed near the input (head), local connection layers in the middle, and fully connected layers at the tail; the number of each type of layer may be arbitrary and is determined by the specific application. A minimal sketch of such a structure is given below.
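As a concrete illustration, the following Python/PyTorch sketch builds a shallow network containing the three layer types in the head/middle/tail order described above. All layer sizes and the locally connected implementation are illustrative assumptions, not the patent's architecture; the patent only requires that the layer counts match the terminal device's processing resources.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    """Like a conv layer, but each spatial position has its own filter
    (no weight sharing), matching the local connection layer above."""
    def __init__(self, in_ch, out_ch, in_h, in_w, k):
        super().__init__()
        out_h, out_w = in_h - k + 1, in_w - k + 1
        # One weight tensor per output position: no weight sharing.
        self.weight = nn.Parameter(
            torch.randn(out_h * out_w, out_ch, in_ch * k * k) * 0.01)
        self.k, self.out_hw = k, (out_h, out_w)

    def forward(self, x):
        patches = F.unfold(x, self.k)        # (N, C*k*k, L), L = out_h*out_w
        patches = patches.permute(2, 0, 1)   # (L, N, C*k*k)
        out = torch.einsum('lnc,loc->lno', patches, self.weight)
        out = out.permute(1, 2, 0)           # (N, out_ch, L)
        return out.reshape(x.size(0), -1, *self.out_hw)

class ShallowFaceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, 3)                      # head: convolutional layer
        self.local = LocallyConnected2d(16, 16, 30, 30, 3)   # middle: local connection layer
        self.fc = nn.Linear(16 * 28 * 28, 128)               # tail: fully connected layer

    def forward(self, x):                    # x: (N, 1, 32, 32) face images
        x = F.relu(self.conv(x))
        x = F.relu(self.local(x))
        return self.fc(x.flatten(1))         # 128-d face feature vector
```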
The greater the number of layers in the convolutional network, the higher the recognition accuracy for face images, but also the higher the computational complexity and the slower the processing speed. Therefore, to achieve efficient computation on a small terminal device, the number of each type of layer in the network structure of the convolutional network involved in the embodiments of the invention is matched to the processing resources of the terminal device.
Second, the pose robustness of the parameters of each layer in the convolutional network structure is explained:
after determining the various layer structures in the convolutional network, it is also necessary to determine unknown parameters in the various layers, so that the convolutional network can be used to process the face image to be processed, i.e. the test process. The parameters of various layers in the network structure specifically include: the parameters of the convolutional layers and the local connection layers are called filters, and the parameters of the fully-connected layers are called projection matrices.
Since the number of layers in the convolutional network in embodiments of the present invention is greatly reduced compared to the number of layers in a deep convolutional network running on a high performance PC or server. In order to maintain the recognition performance of the deep convolutional network and reduce the complexity of operation and storage of the deep convolutional network so as to be suitable for small-sized terminal equipment, the convolutional network provided by the embodiment is subjected to pose robustness training in advance, so that parameters of various layers have pose robustness.
The training set of a convolutional network generally consists of two parts: the system comprises face images and face category labels, wherein each face image corresponds to one face category label, and the category labels represent the identities of people. By comparing the category labels, it can be determined which images are from the same person and which images are from different persons.
In the training process with pose robustness, a face image training set and a face class label training set are input from the head and the tail of a network, a pose robust processing mode for the face image is embedded into a convolutional network, and a preset objective function is optimized through forward propagation and backward propagation. The objective function is an engine in the network training process, and the invention does not limit the specific form of the objective function. Thus determining the parameters of various layers in the convolutional network, and the parameters have attitude robustness. It is emphasized that the network structure and the embedded robust processing mode adopted by the convolutional network in the training process are consistent with the network structure and the embedded robust processing mode adopted in the subsequent testing process.
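The following is a minimal Python/PyTorch sketch of the training procedure just described: images enter at the head of the network, class labels at the tail, and a preset objective is optimized by forward and backward propagation. The softmax cross-entropy objective, the SGD optimizer, the `loader` data source, and all hyper-parameters are assumptions for illustration; the patent leaves the objective function's form open.

```python
import torch
import torch.nn as nn

def train(net, loader, num_ids, feat_dim=128, epochs=10):
    classifier = nn.Linear(feat_dim, num_ids)    # maps features to identity labels
    criterion = nn.CrossEntropyLoss()            # assumed objective function
    opt = torch.optim.SGD(
        list(net.parameters()) + list(classifier.parameters()), lr=0.01)
    for _ in range(epochs):
        for images, labels in loader:            # face images and class labels
            feats = net(images)                  # forward propagation (with the
                                                 # embedded robust processing)
            loss = criterion(classifier(feats), labels)
            opt.zero_grad()
            loss.backward()                      # backward propagation
            opt.step()                           # update layer parameters
    return net
```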
Step 103, performing pose-robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key points to generate a second feature map with a pose-robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map.
Specifically, after the above pose-robustness-trained convolutional network has been used to perform convolution processing on the face image to be processed, a first feature map that has undergone convolution processing is first acquired from a first node in the convolutional network.
To embed the pose adaptation key points corresponding to the face image into the convolutional network, pose-robust processing is performed on the first feature map according to the acquired pose adaptation key points; that is, pooling is applied to the image data corresponding to the pose adaptation key points on the first feature map. Pooling here means performing an integration operation, such as averaging, on each convolution feature on top of the convolutional feature extraction, continuing the process of reducing the feature dimension and thereby generating a second feature map with a pose-robust response, i.e., a second feature map that is invariant to changes of face pose.
Finally, the second feature map is sent back to the convolutional network through a second node adjacent to the first node, so that subsequent nodes in the convolutional network continue to process it; after processing by a fully connected layer in the convolutional network, a vector corresponding to the face image is output, i.e., the face features corresponding to the face image are extracted.
It should be noted that the first node and the second node have many concrete forms, depending on the specific network structure, as follows:
In a first mode, the network structure includes: at least one convolutional layer, at least one local connection layer, and at least one fully connected layer, where the convolutional layer is at the head, the local connection layer is in the middle, and the fully connected layer is at the tail.
In case one, when the first node is the convolutional layer, the second node is the local connection layer; pose-robust processing is performed on the first feature map output from the convolutional layer according to the pose adaptation key points, a second feature map with a pose-robust response is generated, and the second feature map is sent back to the convolutional network through the local connection layer, so that the local connection layer continues to process the second feature map and the face features are extracted through the fully connected layer.
In case two, when the first node is the local connection layer, the second node is the fully connected layer; pose-robust processing is performed on the first feature map output from the local connection layer according to the pose adaptation key points, a second feature map with a pose-robust response is generated, and the second feature map is sent back to the convolutional network through the fully connected layer, so that the fully connected layer processes the second feature map to extract the face features.
In a second mode, the convolutional network includes: at least one convolutional layer and at least one fully connected layer, where the convolutional layer is at the head and the fully connected layer is at the tail.
When the first node is the convolutional layer, the second node is the fully connected layer; pose-robust processing is performed on the first feature map output from the convolutional layer according to the pose adaptation key points to generate a second feature map with a pose-robust response, and the second feature map is sent back to the convolutional network through the fully connected layer, so that the fully connected layer processes it to extract the face features.
In a third mode, the convolutional network includes: at least one local connection layer and at least one fully connected layer, where the local connection layer is at the head and the fully connected layer is at the tail.
When the first node is the local connection layer, the second node is the fully connected layer; pose-robust processing is performed on the first feature map output from the local connection layer according to the pose adaptation key points to generate a second feature map with a pose-robust response, and the second feature map is sent back to the convolutional network through the fully connected layer, so that the fully connected layer processes it to extract the face features.
In summary, compared with a deep convolutional network running on a high-performance PC or server, the convolutional network structure provided by this embodiment matches the processing resources of the terminal device, and embedding the pose-robust processing into the convolutional network greatly reduces the burden the network carries for pose robustness. A shallower convolutional network structure can therefore achieve the required recognition rate, and the network's demands on computation and storage resources are reduced, so the convolutional network can run quickly and with high performance on resource-limited small terminal devices such as mobile phones, tablet computers, and vehicle-mounted devices.
In the face recognition processing method provided by this embodiment, a face image to be processed is received and the corresponding pose adaptation key points are determined; convolution processing is performed on the face image with a trained convolutional network in which the number of each type of layer matches the processing resources of the terminal device and the parameters of each layer have pose robustness; pose-robust processing is performed on the first feature map output from a first node in the convolutional network according to the pose adaptation key points, generating a second feature map with a pose-robust response, which is sent back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the efficient processing speed of a small system is preserved, improving processing efficiency and user experience.
Further, in face recognition, the appearance (i.e., the pixel values) of a face image is strongly affected by illumination changes: face images collected under different illumination environments differ markedly. A convolutional network generally has no invariance to illumination changes, so when face images under different illumination are input, the network's output changes noticeably; the deeper the convolutional network's structure, the better its illumination-robust effect on face images.
To further relieve the convolutional network of the burden of illumination robustness, building on the above embodiments, the parameters of each layer in the convolutional network structure provided by this embodiment also have illumination robustness, described as follows:
The trained convolutional network provided by this embodiment undergoes illumination-robustness training in advance, so that the parameters of each layer have illumination robustness. During this training, the face image training set and the face class label training set are input at the head and the tail of the network respectively, the illumination-robust processing of the face image is embedded into the convolutional network, and a preset objective function is optimized through forward propagation and backward propagation; the parameters of each layer in the convolutional network are thus determined, and these parameters have illumination robustness. It should be emphasized that the network structure and embedded illumination-robust processing used during training are consistent with those used in the subsequent testing process.
After the convolutional network has undergone illumination-robust training, it can identify the face image to be processed, i.e., carry out the testing process. The specific process is as follows. Building on the above embodiment, after receiving the face image to be processed, the method further includes:
First, performing illumination-robust processing on the face image to be processed to generate a third feature map with an illumination-robust response. There are many ways to perform illumination-robust processing, which can be selected according to the actual application needs and are not limited by this embodiment, for example: performing local normalization on the pixels of the face image with multi-scale windows; or encoding the face image with a Local Binary Pattern (LBP), as in the sketch below.
The third feature map with the illumination-robust response is then sent to the trained convolutional network, which performs convolution processing on it.
In summary, compared with the convolutional network provided in FIG. 1, this embodiment also embeds the illumination-robust processing of the face into the convolutional network, further reducing the burden the network carries for illumination robustness. A shallower convolutional network structure can therefore achieve the required recognition rate, and the network's demands on computation and storage resources are reduced, so the convolutional network can run even more quickly and with high performance on resource-limited small terminal devices such as mobile phones, tablet computers, and vehicle-mounted devices.
In the above embodiments, the network structure used in the training process of the convolutional network, and the pose-robust and illumination-robust processing applied to the face image, are consistent with those of the testing process. To describe the training and testing processes of the convolutional network, and the pose-robust and illumination-robust processing embedded into it, in more detail, the embodiments shown in FIG. 2 through FIG. 8 are described specifically.
It should be noted that, because the convolutional network can be structured in many ways, as shown in step 103 of FIG. 1, the following embodiments take the network structure of case one in step 103 as an example; the implementation of the other network structures is similar and is not repeated below.
FIG. 2 is a flowchart of a convolutional network training process according to an embodiment of the present application, and FIG. 3 is a schematic diagram of the principle of the training process shown in FIG. 2. Referring to FIG. 2 and FIG. 3, this embodiment describes in detail how the pose-robust and illumination-robust parameters of each layer in the convolutional network are determined through pose and illumination robustness training. The training process specifically includes the following steps:
step 201, receiving a face image training set and a face class label training set, and sending the face class label training set to a preset objective function;
step 202, defining key points by adopting three-dimensional face model modeling, mapping the key points to the face image, and determining corresponding pose adaptive key points.
The pose is also an important factor affecting the face recognition performance, and the change of the pose causes the drastic change of the face image and can cause self-occlusion. In order to improve the robustness of the convolutional network to the pose, firstly, pose adaptive key points matched with the face images in the face image training set are determined.
In this embodiment, a three-dimensional face model is used to generate pose adaptive key points corresponding to a face image, fig. 4 is a schematic diagram of positioning the pose adaptive key points by using the three-dimensional face model, as shown in fig. 4, a left side diagram is a three-dimensional face model and key points, and a right side diagram is three pose adaptive key points corresponding to the face image, which is specifically described as follows:
firstly, constructing a three-dimensional face model, and defining a plurality of key points on the model; the model is then used to fit the input face image and the keypoints on the model are mapped onto the image plane. For each face, its corresponding pose adaptive key point can be obtained, the semantic meaning of each key point does not change with the change of pose, for the convenience of subsequent processing, the distribution range of the key points is generally rectangular, and the number of the key points in this embodiment is 32 × 32.
Step 203, performing local normalization on the pixels of the face image with multi-scale windows to generate a third feature map with an illumination-robust response.
To improve the convolutional network's robustness to illumination, this embodiment preprocesses the face image with local normalization. FIG. 5 is a schematic diagram of processing with local normalization; as shown in FIG. 5, the first row shows the same face under four illumination conditions, and the second row shows the face images after local normalization. The specific description is as follows:
From each pixel of the face image, the mean within a local window is subtracted, and the result is divided by the standard deviation within that window, yielding an illumination-robust representation. Referring to FIG. 5, the processed images are more stable than the originals and less susceptible to lighting.
It should be noted that the window size of local normalization affects the result: the smaller the window, the more robust to illumination, but the more information is lost. FIG. 5 shows the effect with a 15x15 window on a 128x128 face image. Based on this characteristic, the window could be set small in regions where illumination changes sharply (such as the nose) and large in regions where it changes gently (such as the cheeks), but such a strategy is hard to control manually, so this embodiment adopts a multi-scale normalization scheme: the face image is normalized with windows of several sizes and the results are combined into a multi-channel image, which is combined automatically after entering the convolutional network, with the combination coefficients learned by the network itself. This scheme avoids manually set parameters and is more flexible and reliable. The normalized face image has the same structure and resolution as the original, so it can easily interface with the convolutional network as its input. A sketch of this preprocessing is given below.
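The following Python sketch implements the multi-scale local normalization just described: per-pixel mean subtraction and division by the local standard deviation, once per window size, with the results stacked into a multi-channel image for the network to combine. The window sizes are illustrative assumptions (only the 15x15 window appears in the text).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_normalize(img, win, eps=1e-6):
    # Local mean and standard deviation via box filtering.
    mean = uniform_filter(img, size=win)
    sq_mean = uniform_filter(img * img, size=win)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return (img - mean) / (std + eps)

def multiscale_normalize(img, windows=(7, 15, 31)):
    # One illumination-robust channel per window size; the network learns
    # the combination coefficients automatically.
    return np.stack([local_normalize(img, w) for w in windows], axis=0)

face = np.random.rand(128, 128)                    # placeholder 128x128 image
third_feature_map = multiscale_normalize(face)     # shape (3, 128, 128)
```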
Step 204, performing convolution processing on the third feature map with the convolutional network being trained, where the number of each type of layer in the network structure matches the processing resources of the terminal device and the parameters of each layer are to acquire pose robustness and illumination robustness.
Convolution processing is performed on the illumination-processed third feature map. Compared with step 102 of FIG. 1, the difference is that the input of the convolutional network in this embodiment is the third feature map that has undergone local normalization, whereas the embodiment of FIG. 1 convolves the unprocessed original image; the convolution processing itself is the same and is not repeated here.
Since pooling is subsequently applied only at the key points on the convolved first feature map, the parts of the face image other than the key points will not be pooled. To further improve processing efficiency, only the key-point positions in the face image may therefore be convolved in this step. In another embodiment, the following approach may be used:
first, the partial face image corresponding to the pose adaptation key points is determined within the face image;
then convolution processing is performed on that partial face image with the convolutional network.
In this way, the complexity of the pooling and subsequent processing is independent of the input data size and depends only on the number of key points, which can greatly reduce the complexity of the network.
Step 205, for the first feature map output from the first node in the convolutional network, generating a local window centered on each pose adaptation key point, and integrating the data within the local window to generate a second feature map with a pose-robust response.
FIG. 6 is a schematic diagram of pooling the convolved feature map according to the pose adaptation key points. As shown in FIG. 6, the first column shows the original images, the second column shows the first feature maps output by the convolution processing, and the third column shows the second feature maps after pooling. The first row shows a second feature map generated by the conventional approach of max/average pooling over a uniform grid of the first feature map.
The second row shows pooling the first feature map within local windows centered on the key points to generate the second feature map, which obtains pose robustness and translation invariance at the same time. The concrete pooling approach is to first generate a local window centered on each key point and then integrate the data within the window to obtain a robust response; the integration can be sampling, maximum (max), mean (average), histogram, etc. Sampling is the fastest and the histogram the slowest, with max and mean in between, so the choice can be made according to speed and performance requirements. For example, when the pose of the portrait changes between a left-facing and a right-facing face, the data of the second feature map generated by the conventional method of the first row varies with the pose change, while the data of the second feature map generated by the key-point-based method of the second row does not vary with the pose change and has translation invariance. A sketch of this pooling is given below.
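The following is a hedged sketch of the key-point-centered pooling just described, choosing mean integration (the text also allows sampling, max, or histogram integration). The window size and function names are assumptions for illustration, and key points are assumed to lie within the feature map.

```python
import numpy as np

def keypoint_pool(feature_map, keypoints, win=5):
    """feature_map: (C, H, W) first feature map from the first node;
    keypoints: (K, 2) pose-adaptive key points as (x, y) pixel coordinates.
    Returns the (C, K) second feature map with pose-robust responses."""
    C, H, W = feature_map.shape
    r = win // 2
    out = np.empty((C, len(keypoints)))
    for i, (x, y) in enumerate(np.round(keypoints).astype(int)):
        # Local window centered on the key point, clamped to the map borders.
        x0, x1 = max(x - r, 0), min(x + r + 1, W)
        y0, y1 = max(y - r, 0), min(y + r + 1, H)
        out[:, i] = feature_map[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out  # sent back into the network at the second node
```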
Step 206, sending the second feature map back to the convolutional network through a second node adjacent to the first node;
and step 207, optimizing the objective function by adopting a forward propagation mode and a backward propagation mode, and determining parameters with attitude robustness and illumination robustness in the various layers.
FIG. 7 is a test flowchart using the convolutional network trained in FIG. 3, and FIG. 8 is a schematic diagram of the test process shown in FIG. 7. Referring to FIG. 7 and FIG. 8, this embodiment details how testing is performed with the trained convolutional network described above. As shown in FIG. 7, the test process specifically includes the following steps:
step 301, receiving a face image to be processed.
Step 302, defining key points with a three-dimensional face model, mapping the key points onto the face image, and determining the corresponding pose adaptation key points.
Step 303, performing local normalization on the pixels of the face image with multi-scale windows to generate a third feature map with an illumination-robust response;
Step 304, performing convolution processing on the third feature map with the trained convolutional network, where the number of each type of layer in the network structure matches the processing resources of the terminal device and the parameters of each layer have pose robustness and illumination robustness.
Step 305, for the first feature map output from the first node in the convolutional network, generating a local window centered on each pose adaptation key point, and integrating the data within the local window to generate a second feature map with a pose-robust response.
These correspond to steps 202 to 205 of the embodiment shown in FIG. 2; the difference lies in the processing object and purpose. The object processed in FIG. 2 is a face image in the training set, and the purpose is to determine the network parameters; the object processed in this embodiment is the face image to be processed, and the purpose is to extract face features. The specific processing is similar and is not repeated here.
Step 306, sending the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map.
In the face recognition processing method provided by this embodiment, a face image to be processed is first received; key points defined on a three-dimensional face model are mapped into the face image to determine the corresponding pose adaptation key points; local normalization with multi-scale windows is then applied to the pixels of the face image to generate a third feature map with an illumination-robust response; convolution processing is performed on the third feature map with the trained convolutional network, in which the number of each type of layer matches the processing resources of the terminal device and the parameters of each layer have pose robustness and illumination robustness; for the first feature map output from a first node in the convolutional network, a local window centered on each pose adaptation key point is generated and the data within it integrated to produce a second feature map with a pose-robust response; finally, the second feature map is sent back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the efficient processing speed of a small system is preserved, improving processing efficiency and user experience.
Fig. 9 is a flowchart of a face recognition processing method according to another embodiment of the present application.
As shown in FIG. 9, the face image to be processed in this embodiment includes: a face image to be recognized and a face image registration set. After step 103 or step 306, the method further includes:
step 401, acquiring a first face feature corresponding to the face image to be recognized and a second face feature corresponding to the face image registration set from an output end of the convolutional network;
and 402, evaluating the similarity of the first face feature and the second face feature by applying a measurement function, and outputting a face recognition result.
Specifically, the face image to be recognized is the face image whose identity is to be determined by the face recognition system; it is generally collected in an uncontrolled environment, and its source varies with the specific application, such as life photos, posters, television, or surveillance footage. The face image registration set is the set of face images in the face recognition system's database; this set generally includes the identity information corresponding to the images, which are generally collected in a controlled environment, and its scale is generally large, for example tens of thousands to millions of images.
In the testing (or use) process, the network can be regarded as a feature extractor and used as a black box: the first face feature corresponding to the face image to be recognized and the second face features corresponding to the face image registration set are acquired from the output of the convolutional network.
The metric function is used to evaluate the similarity between two face samples. Its input is a registered face feature and the face feature to be recognized, and its output is their similarity. The invention can employ various commonly used metric functions, such as Euclidean distance or cosine similarity.
The face recognition result is, specifically:
when the face in the face image registration set is unique, whether the two faces belong to the same person is judged according to the similarity; when the face in the face image registration set is not unique, whether the face image to be recognized is in the face image registration set is judged according to the similarity. A sketch of this metric step is given below.
In the face recognition method provided by this embodiment, a first face feature corresponding to the face image to be recognized and second face features corresponding to the face image registration set are acquired from the output of the convolutional network; the similarity between the first face feature and the second face features is then evaluated with a metric function, and a face recognition result is output. The robustness and accuracy of face recognition are thus improved while the efficient processing speed of a small system is preserved, improving processing efficiency and user experience.
In order to implement the above embodiments, the present application further provides a face recognition processing apparatus.
Fig. 10 is a schematic structural diagram of a face recognition processing device according to an embodiment of the present application.
As shown in fig. 10, the face recognition processing apparatus includes:
the receiving module 11 is configured to receive a face image to be processed;
a first determining module 12, configured to determine a pose adaptation key point corresponding to the face image;
the convolution processing module 13 is configured to perform convolution processing on the face image by using a trained convolution network, where the number of various layers in a network structure is matched with processing resources of a terminal device, and parameters of the various layers have pose robustness;
and the pose robust processing module 14 is configured to perform pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts a facial feature corresponding to the facial image according to the second feature map.
The specific form of the first node and the second node depends on the network structure of the convolutional network, and specifically includes:
in a first aspect, the convolutional network comprises: the multilayer packaging structure comprises at least one convolution layer, at least one local connecting layer and at least one full connecting layer, wherein the convolution layer is positioned at the head part, the local connecting layer is positioned at the middle part, and the full connecting layer is positioned at the tail part;
in case one, when the first node is the convolutional layer, the second node is the local connection layer;
the pose robust processing module 14 is specifically configured to perform pose robust processing on the first feature map output from the convolutional layer according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the local connection layer, so that the local connection layer and the full connection layer extract the facial features according to the second feature map.
In case two, when the first node is the local connection layer, the second node is the full connection layer;
the pose robust processing module 14 is specifically configured to perform pose robust processing on the first feature map output from the local connection layer according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the full connection layer, so that the full connection layer extracts the face features according to the second feature map.
In a second aspect, the convolutional network comprises at least one convolutional layer and at least one full connection layer, where the convolutional layer is located at the head and the full connection layer at the tail;
when the first node is the convolutional layer, the second node is the fully-connected layer;
the pose robust processing module 14 is specifically configured to perform pose robust processing on the first feature map output from the convolutional layer according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the fully-connected layer, so that the fully-connected layer extracts the face features according to the second feature map.
In a third aspect, the convolutional network includes at least one local connection layer and at least one full connection layer; the first node is the local connection layer, and the second node is the full connection layer;
the pose robust processing module 14 is specifically configured to perform pose robust processing on the first feature map output from the local connection layer according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the full connection layer, so that the full connection layer extracts the face features according to the second feature map.
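By way of illustration only, the following is a minimal sketch of the second aspect (convolutional layer at the head as the first node, full connection layer at the tail as the second node) with the pose robust processing hop between them. The class name, all layer sizes, the window size, and the choice of max integration are illustrative assumptions, not the claimed configuration; PyTorch is used for brevity.

```python
import torch
import torch.nn as nn

class PoseRobustFaceNet(nn.Module):
    """Illustrative sketch: conv head (first node) -> pose-robust hop ->
    full connection tail (second node). All sizes are assumptions."""

    def __init__(self, num_keypoints=16, window=7, feat_dim=128):
        super().__init__()
        # First node: a shallow convolutional head, so the layer count can be
        # matched to the processing resources of a small terminal device.
        self.conv_head = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.window = window
        # Second node: full connection layer that consumes the second feature map.
        self.fc_tail = nn.Linear(num_keypoints * 32, feat_dim)

    def pose_robust_pool(self, fmap, keypoints):
        """Generate a local window centered on each pose adaptation key point
        and integrate the data inside it (max is used here; the claims also
        name sampling, mean and histogram as integration manners)."""
        _, H, W = fmap.shape
        r = self.window // 2
        responses = []
        for x, y in keypoints:
            x0, x1 = max(x - r, 0), min(x + r + 1, W)
            y0, y1 = max(y - r, 0), min(y + r + 1, H)
            responses.append(fmap[:, y0:y1, x0:x1].amax(dim=(1, 2)))
        return torch.cat(responses)  # flattened second feature map

    def forward(self, image, keypoints):
        # image: (1, H, W) grayscale face; keypoints: list of (x, y) in
        # feature-map coordinates (same resolution here, padding=1 convs).
        fmap = self.conv_head(image.unsqueeze(0)).squeeze(0)  # first feature map
        second_map = self.pose_robust_pool(fmap, keypoints)   # pose-robust hop
        return self.fc_tail(second_map)                       # face feature
```

The same hop applies unchanged in the other aspects: only the modules on either side of the pose robust processing step change.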
It should be noted that the foregoing explanation on the embodiment of the face recognition processing method is also applicable to the face recognition processing apparatus of this embodiment, and details are not repeated here.
The face recognition processing device receives a face image to be processed through the receiving module, and determines a pose adaptation key point corresponding to the face image through the first determining module. The convolution processing module performs convolution processing on the face image using a trained convolutional network, where the number of each type of layer in the network structure is matched with the processing resources of the terminal device and the parameters of the various layers have pose robustness. The pose robust processing module performs pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key point, generates a second feature map with a pose robust response, and sends the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the high processing speed of a small system is preserved, improving processing efficiency and user experience.
Fig. 11 is a schematic structural diagram of a face recognition processing apparatus according to another embodiment of the present application, in which the parameters of the various layers further have illumination robustness. As shown in fig. 11, based on the embodiment shown in fig. 10, the apparatus further includes:
the illumination robust processing module 15 is configured to perform illumination robust processing on the face image to be processed after receiving the face image to be processed, and generate a third feature map with an illumination robust response;
the illumination robust processing module 15 is specifically configured to:
performing local normalization processing on pixels in the face image by using a multi-scale window;
or,
encoding the face image by using a Local Binary Pattern (LBP). (Both options are sketched below.)
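By way of illustration, both options may look roughly like the following NumPy/SciPy sketch; the window scales and the basic 8-neighbour LBP (radius 1) are assumptions, not the patented parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_normalize(img, scales=(3, 9, 27)):
    """Multi-scale local normalization: at each scale, subtract the local
    mean and divide by the local standard deviation, then average the
    scales. Window sizes are illustrative assumptions."""
    img = img.astype(np.float64)
    out = np.zeros_like(img)
    for w in scales:
        mean = uniform_filter(img, size=w)
        sq_mean = uniform_filter(img ** 2, size=w)
        std = np.sqrt(np.maximum(sq_mean - mean ** 2, 1e-12))
        out += (img - mean) / std
    return out / len(scales)

def lbp_encode(img):
    """Basic 8-neighbour Local Binary Pattern (radius 1): each pixel is
    encoded by thresholding its 8 neighbours against its own value."""
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)
    return code
```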
The convolution processing module 13 is specifically configured to perform convolution processing on the third feature map by using a trained convolution network.
It should be noted that the foregoing explanation on the embodiment of the face recognition processing method is also applicable to the face recognition processing apparatus of this embodiment, and details are not repeated here.
The face recognition processing device of this embodiment embeds the illumination-robust processing of the face into the convolutional network, relieving the network of part of the burden of learning illumination robustness. A shallower convolutional network structure can therefore reach the required recognition rate, which reduces the network's demands on computation and storage resources and lets it run faster and with higher performance on resource-limited small terminal devices.
Fig. 12 is a schematic structural diagram of a face recognition processing apparatus according to another embodiment of the present application, and as shown in fig. 12, based on the embodiment shown in fig. 11, the apparatus further includes: a processing module 16 and a second determining module 17, wherein,
the processing module 16 is configured to receive a face image training set and a face class label training set, and send the face class label training set to a preset objective function;
the first determining module 12 is further configured to determine a pose adaptation key point corresponding to a face image in the face image training set;
the convolution processing module 13 is further configured to perform convolution processing on the face image by using an untrained convolutional network having the network structure;
the pose robust processing module 14 is further configured to perform the pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through a second node adjacent to the first node;
and a second determining module 17, configured to optimize the preset objective function by means of forward propagation and backward propagation and determine the parameters with pose robustness in the various layers.
Further, the illumination robust processing module 15 is also configured to perform the illumination robust processing on the face images in the face training set, and generate a third feature map with an illumination robust response;
the convolution processing module 13 is further configured to perform convolution processing on the third feature map by using an untrained convolutional network having the network structure;
and the second determining module 17 is further configured to determine the parameters with both illumination robustness and pose robustness in the various layers (a schematic training loop is sketched below).
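The training flow above might be sketched as follows, again only as an illustration: the preset objective function is assumed to be softmax cross-entropy over the face class labels (the embodiment does not fix a particular objective), and `PoseRobustFaceNet` refers to the earlier sketch.

```python
import torch
import torch.nn as nn

def train_epoch(net, classifier, loader, optimizer):
    """One pass of forward and backward propagation to fit the pose-robust
    parameters. `loader` is assumed to yield (image, keypoints, label)."""
    objective = nn.CrossEntropyLoss()            # assumed preset objective
    for image, keypoints, label in loader:
        feature = net(image, keypoints)          # forward propagation
        logits = classifier(feature.unsqueeze(0))
        loss = objective(logits, label.view(1))
        optimizer.zero_grad()
        loss.backward()                          # backward propagation
        optimizer.step()                         # update layer parameters

# Sketch of the wiring (dimensions are assumptions):
# net = PoseRobustFaceNet()
# classifier = nn.Linear(128, num_classes)
# optimizer = torch.optim.SGD(
#     [*net.parameters(), *classifier.parameters()], lr=0.01)
```

Applying the illumination robust processing to each training image before calling `net(...)` yields parameters with both illumination and pose robustness, as the module above describes.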
It should be noted that the foregoing explanation on the embodiment of the face recognition processing method is also applicable to the face recognition processing apparatus of this embodiment, and details are not repeated here.
Fig. 13 is a schematic structural diagram of a face recognition processing device according to another embodiment of the present application, and as shown in fig. 13, based on the embodiment shown in fig. 12,
the first determining module 12 specifically includes:
a modeling unit 121, configured to model and define key points by using a three-dimensional face model;
a mapping unit 122, configured to map the key points to the face image and obtain the corresponding pose adaptation key points (a projection sketch follows).
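Concretely, the mapping might look like the following weak-perspective projection; the camera model and the example coordinates are assumptions made only to illustrate the idea of pose adaptation key points.

```python
import numpy as np

def project_keypoints(model_points, R, t, s):
    """Map key points defined on a 3D face model into the image plane with
    a weak-perspective projection: x = s * (R @ X)[:2] + t.
    model_points: (N, 3) 3D key points; R: (3, 3) rotation for the
    estimated head pose; t: (2,) image translation; s: scalar scale."""
    rotated = model_points @ R.T          # apply the head pose
    return s * rotated[:, :2] + t         # drop depth, scale, translate

# Example: nose tip and two eye corners (coordinates are illustrative).
model_pts = np.array([[0.0, 0.0, 1.0],
                      [-0.3, 0.2, 0.8],
                      [0.3, 0.2, 0.8]])
R = np.eye(3)                             # frontal pose
pose_adapted = project_keypoints(
    model_pts, R, t=np.array([64.0, 48.0]), s=40.0)
```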
further, the pose robust processing module 14 includes:
a generating unit 141, configured to generate a local window in the first feature map with the pose adaptation key point as a center;
and an integrating unit 142, configured to integrate the data in the local window to generate a second feature map with an attitude robust response.
Further, the convolution processing module 13 is specifically configured to:
determine the partial face images corresponding to the pose adaptation key points in the face image;
and perform convolution processing on the partial face images by using a trained convolutional network (a sketch of this step follows).
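A rough sketch of the partial-face step: only patches centered on the pose adaptation key points are cut out and passed to the convolutional network, rather than the whole image. The patch size is an illustrative assumption.

```python
import numpy as np

def crop_parts(image, keypoints, size=32):
    """Extract the partial face images centered on each pose adaptation key
    point; only these patches then go through the convolutional network.
    Patch size is an illustrative assumption; out-of-frame areas are
    zero-padded."""
    H, W = image.shape
    r = size // 2
    patches = []
    for x, y in keypoints:
        x0, y0 = int(round(x)) - r, int(round(y)) - r
        patch = np.zeros((size, size), dtype=image.dtype)
        sx0, sy0 = max(x0, 0), max(y0, 0)
        sx1, sy1 = min(x0 + size, W), min(y0 + size, H)
        patch[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
        patches.append(patch)
    return np.stack(patches)  # (num_keypoints, size, size)
```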
It should be noted that the foregoing explanation on the embodiment of the face recognition processing method is also applicable to the face recognition processing apparatus of this embodiment, and details are not repeated here.
The face recognition processing device of this embodiment first receives a face image to be processed, models and defines key points with a three-dimensional face model, maps the key points to the face image, and determines the corresponding pose adaptation key points. It then performs local normalization on pixels in the face image using a multi-scale window to generate a third feature map with an illumination robust response, and performs convolution processing on the third feature map with a trained convolutional network, where the number of each type of layer in the network structure is matched with the processing resources of the terminal device and the parameters of the various layers have pose robustness and illumination robustness. For a first feature map output from a first node in the convolutional network, a local window is generated centered on the pose adaptation key point and the data in the window is integrated to generate a second feature map with a pose robust response; finally, the second feature map is sent back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the high processing speed of a small system is preserved, improving processing efficiency and user experience.
Fig. 14 is a schematic structural diagram of a face recognition processing apparatus according to another embodiment of the present application, where the face image to be processed includes a face image to be recognized and a face image registration set. As shown in fig. 14, based on the embodiment shown in fig. 11, the apparatus further includes:
an obtaining module 18, configured to obtain, from an output end of the convolutional network, a first face feature corresponding to the face image to be recognized and a second face feature corresponding to the face image registration set;
and the recognition module 19 is configured to apply a metric function to evaluate the similarity between the first face feature and the second face feature and output a face recognition result (a sketch of this step follows).
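The metric step might be sketched as follows. The later claims name Euclidean distance and cosine similarity as metric functions and distinguish the unique (1:1) and non-unique (1:N) registration cases; the decision threshold used here is purely an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two face feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Euclidean distance between two face feature vectors."""
    return float(np.linalg.norm(a - b))

def recognize(query_feat, registered_feats, threshold=0.5):
    """1:1 when the registration set holds a single face (same person or
    not); 1:N otherwise (is the query in the set, and which entry).
    The cosine threshold is an illustrative assumption."""
    sims = [cosine_similarity(query_feat, f) for f in registered_feats]
    best = int(np.argmax(sims))
    if len(registered_feats) == 1:
        return {"same_person": sims[0] >= threshold, "similarity": sims[0]}
    return {"in_set": sims[best] >= threshold,
            "match_index": best,
            "similarity": sims[best]}
```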
It should be noted that the foregoing explanation on the embodiment of the face recognition processing method is also applicable to the face recognition processing apparatus of this embodiment, and details are not repeated here.
The face recognition processing device of this embodiment acquires a first face feature corresponding to the face image to be recognized and a second face feature corresponding to the face image registration set from the output end of the convolutional network; a metric function is then applied to evaluate the similarity between the two features, and a face recognition result is output. The robustness and accuracy of face recognition are thus improved while the high processing speed of a small system is preserved, improving processing efficiency and user experience.
In order to implement the above embodiments, the present application further provides a terminal device.
Fig. 15 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 15, the terminal device includes a device body, and the device body includes: a face image acquisition apparatus 1 and a face recognition processing apparatus 2. The face image acquisition apparatus 1 is configured to acquire a face image; its specific form depends on the terminal device, for example a camera on a mobile phone or a camera on a monitoring device. The face recognition processing apparatus 2 may adopt the face recognition processing apparatus provided by the embodiments of the present application, and is configured to process the face image to be processed by the face recognition method provided by the embodiments.
When face recognition is deployed on a small terminal device (such as a mobile phone), the computing power available to the algorithm is greatly limited. To run as fast as possible, the method is preferably implemented with a parallel instruction set, for example the NEON instructions of the ARM platform or the SSE instructions of the x86 platform.
It should be noted that the foregoing explanation on the embodiment of the face recognition processing method is also applicable to the terminal device of the embodiment, and is not repeated here.
In the terminal device of this embodiment, the face recognition processing device receives a face image to be processed and determines the pose adaptation key point corresponding to the face image. It then performs convolution processing on the face image with a trained convolutional network, where the number of each type of layer in the network structure is matched with the processing resources of the terminal device and the parameters of the various layers have pose robustness. Finally, it performs pose robust processing on the first feature map output from a first node in the convolutional network according to the pose adaptation key point, generates a second feature map with a pose robust response, and sends the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map. The robustness and accuracy of face recognition are thus improved while the high processing speed of a small system is preserved, improving processing efficiency and user experience.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application; variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (29)

1. A face recognition processing method is characterized by comprising the following steps:
receiving a face image to be processed, and determining a pose adaptation key point corresponding to the face image;
performing convolution processing on the face image by using a trained convolutional network, wherein the number of each type of layer in the network structure is matched with the processing resources of the terminal device, and the parameters of the various layers have pose robustness;
performing pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key point to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map;
wherein the performing pose robust processing on the first feature map output from the first node in the convolutional network according to the pose adaptation key point includes:
generating a local window centered on the pose adaptation key point in the first feature map;
and integrating the data in the local window to generate a second feature map with a pose robust response.
2. The face recognition processing method according to claim 1, wherein the parameters of the various layers further have illumination robustness, and after receiving the face image to be processed, the method further comprises:
performing illumination robust processing on the face image to be processed to generate a third feature map with illumination robust response;
the convolution processing of the face image by adopting the trained convolution network comprises the following steps:
and carrying out convolution processing on the third feature map by adopting a trained convolution network.
3. The face recognition processing method of claim 2, wherein the illumination robust processing comprises:
performing local normalization processing on pixels in the face image by using a multi-scale window;
or,
encoding the face image by using a Local Binary Pattern (LBP).
4. The face recognition processing method of claim 2, wherein the method further comprises:
receiving a face image training set and a face class label training set, and sending the face class label training set to a preset objective function;
determining pose adaptation key points corresponding to the face images in the face image training set;
performing convolution processing on the face image by using an untrained convolutional network having the network structure;
performing the pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key point, generating a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node;
and optimizing the objective function by means of forward propagation and backward propagation, and determining the parameters with pose robustness in the various layers.
5. The face recognition processing method of claim 4, wherein before the performing convolution processing on the face image, the method further comprises:
performing the illumination robust processing on the face images in the face training set to generate a third feature map with illumination robust response;
the performing convolution processing on the face image by using the untrained convolutional network having the network structure comprises:
performing convolution processing on the third feature map by using an untrained convolutional network having the network structure;
the determining parameters with pose robustness in the various layers comprises:
and determining parameters with illumination robustness and posture robustness in the various layers.
6. The face recognition processing method of claim 1, wherein the face image to be processed comprises: the method comprises the following steps of identifying a face image to be identified and a face image registration set, wherein the method further comprises the following steps:
acquiring a first face feature corresponding to the face image to be recognized and a second face feature corresponding to the face image registration set from an output end of the convolutional network;
and evaluating the similarity between the first face feature and the second face feature by applying a metric function, and outputting a face recognition result.
7. The face recognition processing method of claim 6, wherein the metric function comprises:
euclidean distance and cosine similarity.
8. The face recognition processing method of claim 6, wherein the face recognition result comprises:
when the face in the face image registration set is unique, judging whether the face is the same person according to the similarity;
and when the face in the face image registration set is not unique, judging whether the face image to be identified is in the face image registration set or not according to the similarity.
9. The face recognition processing method of any one of claims 1 to 8, wherein the determining pose adaptation key points corresponding to the face image comprises:
modeling and defining key points by adopting a three-dimensional face model;
and mapping the key points to the face image, and determining corresponding pose adaptation key points.
10. The face recognition processing method of claim 1, wherein the manner of integrating data comprises:
sampling, maximum, mean, and histogram.
11. The face recognition processing method of any one of claims 1 to 8, wherein the performing convolution processing on the face image to be processed by using the trained convolution network comprises:
determining the partial face images corresponding to the pose adaptation key points in the face image;
and performing convolution processing on the partial face images by using a trained convolutional network.
12. The face recognition processing method of any of claims 1-8, wherein the network structure comprises: at least one convolutional layer, at least one local connection layer and at least one full connection layer, wherein the convolutional layer is located at the head, the local connection layer in the middle, and the full connection layer at the tail, and when the first node is the convolutional layer, the second node is the local connection layer;
the performing pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, includes:
and performing pose robust processing on the first feature map output from the convolutional layer according to the pose adaptive key points to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through the local connection layer so that the local connection layer and the full connection layer extract the face features according to the second feature map.
13. The face recognition processing method of claim 12, wherein when the first node is the local connection layer, then the second node is the full connection layer;
the performing pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, includes:
and performing pose robust processing on the first feature map output from the local connection layer according to the pose adaptive key points to generate a second feature map with pose robust response, and sending the second feature map back to the convolutional network through the full connection layer so that the full connection layer extracts the face features according to the second feature map.
14. The face recognition processing method of any of claims 1-8, wherein the convolutional network comprises: at least one convolutional layer and at least one full connection layer, wherein the convolutional layer is located at the head and the full connection layer at the tail; when the first node is the convolutional layer, the second node is the full connection layer;
the performing pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, includes:
and performing pose robust processing on the first feature map output from the convolutional layer according to the pose adaptive key points to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through the fully-connected layer so that the fully-connected layer extracts the face features according to the second feature map.
15. The face recognition processing method of any of claims 1-8, wherein the convolutional network comprises at least one local connection layer and at least one full connection layer; the first node is the local connection layer, and the second node is the full connection layer;
the performing pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point to generate a second feature map with a pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, includes:
and performing pose robust processing on the first feature map output from the local connection layer according to the pose adaptive key points to generate a second feature map with pose robust response, and sending the second feature map back to the convolutional network through the full connection layer so that the full connection layer extracts the face features according to the second feature map.
16. A face recognition processing apparatus, comprising:
the receiving module is used for receiving a face image to be processed;
the first determining module is used for determining a posture adaptation key point corresponding to the face image;
the convolution processing module is used for performing convolution processing on the face image by using a trained convolutional network, wherein the number of each type of layer in the network structure is matched with the processing resources of the terminal device, and the parameters of the various layers have pose robustness;
the pose robust processing module is used for performing pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptive key point, generating a second feature map with pose robust response, and sending the second feature map back to the convolutional network through a second node adjacent to the first node, so that the convolutional network extracts the face features corresponding to the face image according to the second feature map;
wherein the pose robustness processing module comprises:
a generating unit, configured to generate a local window in the first feature map with the pose adaptation key point as a center;
and the integration unit is used for integrating the data in the local window to generate a second feature map with a pose robust response.
17. The face recognition processing apparatus as claimed in claim 16, wherein the parameters of the various layers further have illumination robustness, the apparatus further comprising:
the illumination robust processing module is used for performing illumination robust processing on the face image to be processed after receiving the face image to be processed to generate a third feature map with illumination robust response;
and the convolution processing module is specifically configured to perform convolution processing on the third feature map by using a trained convolution network.
18. The face recognition processing apparatus of claim 17, wherein the illumination robustness processing module is specifically configured to:
performing local normalization processing on pixels in the face image by using a multi-scale window;
or,
encoding the face image by using a Local Binary Pattern (LBP).
19. The face recognition processing apparatus of claim 17, wherein the apparatus further comprises:
the processing module is used for receiving a face image training set and a face class label training set and sending the face class label training set to a preset objective function;
the first determining module is further configured to determine a pose adaptation key point corresponding to a face image in the face image training set;
the convolution processing module is further configured to perform convolution processing on the face image by using an untrained convolutional network having the network structure;
the pose robust processing module is further configured to perform the pose robust processing on a first feature map output from a first node in the convolutional network according to the pose adaptation key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through a second node adjacent to the first node;
and the second determining module is used for optimizing the objective function by means of forward propagation and backward propagation and determining the parameters with pose robustness in the various layers.
20. The face recognition processing apparatus of claim 19,
the illumination robust processing module is further used for performing the illumination robust processing on the face images in the face training set to generate a third feature map with illumination robust response;
the convolution processing module is further configured to perform convolution processing on the third feature map by using an untrained convolutional network having the network structure;
the second determining module is further configured to determine parameters with illumination robustness and pose robustness in the various layers.
21. The face recognition processing apparatus according to claim 16, wherein the face image to be processed includes a face image to be recognized and a face image registration set, and the apparatus further comprises:
the acquisition module is used for acquiring a first face feature corresponding to the face image to be recognized and a second face feature corresponding to the face image registration set from the output end of the convolutional network;
and the recognition module is used for evaluating the similarity between the first face feature and the second face feature by applying a metric function and outputting a face recognition result.
22. The face recognition processing apparatus according to any one of claims 16 to 21, wherein the first determining module specifically includes:
the modeling unit is used for modeling and defining key points by adopting a three-dimensional face model;
and the mapping unit is used for mapping the key points to the face image to obtain corresponding pose adaptive key points.
23. The face recognition processing apparatus of any one of claims 16-21, wherein the convolution processing module is specifically configured to:
determining the partial face images corresponding to the pose adaptation key points in the face image;
and performing convolution processing on the partial face images by using a trained convolutional network.
24. The face recognition processing apparatus as claimed in any one of claims 16 to 21, wherein said convolutional network comprises a multi-layer structure with at least one convolutional layer, at least one local connection layer and at least one full connection layer, where the convolutional layer is located at the head, the local connection layer in the middle, and the full connection layer at the tail; when the first node is the convolutional layer, the second node is the local connection layer;
the pose robust processing module is specifically configured to perform pose robust processing on the first feature map output from the convolutional layer according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the local connection layer, so that the local connection layer and the full connection layer extract the face features according to the second feature map.
25. The face recognition processing apparatus according to claim 24, wherein when the first node is the local connection layer, then the second node is the full connection layer;
the pose robust processing module is specifically configured to perform pose robust processing on the first feature map output from the local connection layer according to the pose adaptation key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the full connection layer, so that the full connection layer extracts the face features according to the second feature map.
26. The face recognition processing apparatus as claimed in any one of claims 16 to 21, wherein said convolutional network comprises: at least one convolutional layer and at least one full connection layer, wherein the convolutional layer is located at the head and the full connection layer at the tail; when the first node is the convolutional layer, the second node is the full connection layer;
the pose robust processing module is specifically configured to perform pose robust processing on the first feature map output from the convolutional layer according to the pose adaptive key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the fully-connected layer, so that the fully-connected layer extracts the face features according to the second feature map.
27. The face recognition processing apparatus as claimed in any one of claims 16 to 21, wherein said convolutional network comprises at least one local connection layer and at least one full connection layer; the first node is the local connection layer, and the second node is the full connection layer;
the pose robust processing module is specifically configured to perform pose robust processing on the first feature map output from the local connection layer according to the pose adaptation key point, generate a second feature map with a pose robust response, and send the second feature map back to the convolutional network through the full connection layer, so that the full connection layer extracts the face features according to the second feature map.
28. A terminal device, comprising a device body, characterized in that the device body includes: a face image acquisition apparatus, and a face recognition processing apparatus according to any one of claims 16-27.
29. The terminal device of claim 28, wherein the terminal device comprises: a mobile phone, a monitoring device, or a vehicle-mounted device.
CN201610040749.8A 2016-01-21 2016-01-21 Face recognition processing method and device and mobile terminal Active CN106991364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610040749.8A CN106991364B (en) 2016-01-21 2016-01-21 Face recognition processing method and device and mobile terminal

Publications (2)

Publication Number Publication Date
CN106991364A CN106991364A (en) 2017-07-28
CN106991364B true CN106991364B (en) 2020-06-12

Family

ID=59413865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610040749.8A Active CN106991364B (en) 2016-01-21 2016-01-21 Face recognition processing method and device and mobile terminal

Country Status (1)

Country Link
CN (1) CN106991364B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228742B (en) * 2017-12-15 2021-10-22 深圳市商汤科技有限公司 Face duplicate checking method and device, electronic equipment, medium and program
CN109902548B (en) * 2018-07-20 2022-05-31 华为技术有限公司 Object attribute identification method and device, computing equipment and system
CN110147721B (en) * 2019-04-11 2023-04-18 创新先进技术有限公司 Three-dimensional face recognition method, model training method and device
CN110059707B (en) * 2019-04-25 2021-05-14 北京小米移动软件有限公司 Method, device and equipment for optimizing image feature points
CN110781765B (en) * 2019-09-30 2024-02-09 腾讯科技(深圳)有限公司 Human body posture recognition method, device, equipment and storage medium
CN111259745B (en) * 2020-01-09 2022-07-12 西安交通大学 3D face decoupling representation learning method based on distribution independence
CN112150615A (en) * 2020-09-24 2020-12-29 四川川大智胜软件股份有限公司 Face image generation method and device based on three-dimensional face model and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014205231A1 (en) * 2013-06-19 2014-12-24 The Regents Of The University Of Michigan Deep learning framework for generic object detection
CN106358444B (en) * 2014-04-11 2019-07-30 北京市商汤科技开发有限公司 Method and system for face verification

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1450493A (en) * 2003-04-25 2003-10-22 北京工业大学 Nerve network system for realizing genetic algorithm
CN101329728A (en) * 2008-07-03 2008-12-24 深圳市康贝尔智能技术有限公司 LBP human face light irradiation preprocess method based on Hamming distance restriction
CN102855468A (en) * 2012-07-31 2013-01-02 东南大学 Single sample face recognition method in photo recognition
CN103632149A (en) * 2013-12-17 2014-03-12 上海电机学院 Face recognition method based on image feature analysis
CN103714351A (en) * 2013-12-18 2014-04-09 五邑大学 Depth self learning-based facial beauty predicting method
CN104463172A (en) * 2014-12-09 2015-03-25 中国科学院重庆绿色智能技术研究院 Face feature extraction method based on face feature point shape drive depth model
CN105205479A (en) * 2015-10-28 2015-12-30 小米科技有限责任公司 Human face value evaluation method, device and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于动量神经网络的英文字符识别";刘慧 等;《四川大学学报(自然科学版)》;20111128;第48卷(第6期);第1326页第4.3节 *

Also Published As

Publication number Publication date
CN106991364A (en) 2017-07-28

Similar Documents

Publication Publication Date Title
CN106991364B (en) Face recognition processing method and device and mobile terminal
KR102591961B1 (en) Model training method and device, and terminal and storage medium for the same
CA2934514C (en) System and method for identifying faces in unconstrained media
CN111738231B (en) Target object detection method and device, computer equipment and storage medium
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN110765882B (en) Video tag determination method, device, server and storage medium
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN111950570B (en) Target image extraction method, neural network training method and device
CN112801057A (en) Image processing method, image processing device, computer equipment and storage medium
EP3799647A1 (en) Fast and robust friction ridge impression minutiae extraction using feed-forward convolutional neural network
CN110866469A (en) Human face facial features recognition method, device, equipment and medium
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
WO2023124278A1 (en) Image processing model training method and apparatus, and image classification method and apparatus
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN105430394A (en) Video data compression processing method, apparatus and equipment
CN111914762A (en) Gait information-based identity recognition method and device
CN114519863A (en) Human body weight recognition method, human body weight recognition apparatus, computer device, and medium
CN113706550A (en) Image scene recognition and model training method and device and computer equipment
CN113298158A (en) Data detection method, device, equipment and storage medium
CN116958626A (en) Image classification model training, image classification method and device and electronic equipment
CN111027616B (en) Line characteristic description system based on end-to-end learning
CN114612979A (en) Living body detection method and device, electronic equipment and storage medium
CN114255321A (en) Method and device for collecting pet nose print, storage medium and electronic equipment
CN112487994A (en) Smoke and fire detection method and system, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201228

Address after: Room 603, 6 / F, Roche Plaza, 788 Cheung Sha Wan Road, Kowloon, China

Patentee after: Zebra smart travel network (Hong Kong) Limited

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, Cayman Islands

Patentee before: Alibaba Group Holding Ltd.
