CN113762118B - Face recognition method, electronic device and storage medium - Google Patents


Publication number
CN113762118B
Authority
CN
China
Prior art keywords
local
recognition
face
network
image
Prior art date
Legal status: Active
Application number
CN202110997476.7A
Other languages
Chinese (zh)
Other versions
CN113762118A (en)
Inventor
胡长胜
付贤强
何武
朱海涛
户磊
Current Assignee
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd filed Critical Hefei Dilusense Technology Co Ltd
Priority to CN202110997476.7A
Publication of CN113762118A
Application granted
Publication of CN113762118B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

Embodiments of the invention relate to the field of image recognition and disclose a face recognition method, an electronic device, and a storage medium. The face recognition method comprises the following steps: acquiring a face image to be recognized as a target face image; and inputting the target face image into a preset target face recognition model to obtain a recognition result of the target face image, wherein the target face recognition model comprises a base face recognition model and a local recognition network, the base face recognition model is used for extracting a feature image of the target face image, and the local recognition network is used for acquiring local features of the target face image. Embodiments of the application can improve the accuracy of face recognition.

Description

Face recognition method, electronic device and storage medium
Technical Field
Embodiments of the invention relate to the field of image recognition, and in particular to a face recognition method, an electronic device, and a storage medium.
Background
Face recognition is a popular branch of image recognition, and a neural network capable of performing face recognition, i.e., a face recognition model, is usually obtained through deep learning. At present, training a face recognition model mainly comprises two steps. First, a pre-trained recognition model is trained. Because face recognition application scenarios differ, the data used to train the model differ as well; the amount of data suited to any particular scenario is small, while training generally requires data on the order of thousands or tens of thousands of identities. Therefore, a generic pre-trained recognition model is usually customized and trained according to the characteristics of the platform's hardware infrastructure, such as its computing power and storage capacity, and with reference to classical CNN architectures, such as the heavyweight models ResNet101 and DenseNet169 or the lightweight models of the MobileNet and ShuffleNet series. The number of face identities used to train the pre-trained recognition model can be on the order of tens of millions, giving the model basic generalization ability, which nevertheless still falls short of the requirements of a product scenario. After the pre-trained recognition model is obtained, the second step is executed: the pre-trained recognition model is fine-tuned on scenario-specific data, for example by adjusting its network parameters with a face recognition loss function such as softmax or triplet loss, to obtain a face recognition model that meets the requirements of the specific scenario.
For a heavyweight face recognition model, the large model capacity and strong generalization ability generally yield good results after fine-tuning or transfer learning on scenario data. However, a lightweight face recognition model, by design, has a small model capacity and weaker generalization ability than a conventional heavyweight model. When a face image with local occlusion appears, for example a mask occluding the mouth, or a hat or bangs occluding the forehead and eyebrows, an electronic device running a lightweight face recognition model has difficulty recognizing the face accurately, so the face recognition accuracy is low.
Disclosure of Invention
Embodiments of the invention aim to provide a face recognition method, an electronic device, and a storage medium that can improve the accuracy of face recognition.
In order to solve the above technical problem, an embodiment of the present invention provides a face recognition method, comprising: acquiring a face image to be recognized as a target face image; and inputting the target face image into a preset target face recognition model to obtain a recognition result of the target face image, wherein the target face recognition model comprises a base face recognition model and a local recognition network, the base face recognition model is used for extracting a feature image of the target face image, and the local recognition network is used for acquiring local features of the target face image.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of face recognition as described above.
An embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above face recognition method.
According to the face recognition method in the embodiments of the application, the pre-trained target face recognition model comprises a base face recognition model and a local recognition network: the base face recognition model obtains the feature image of the target face image, and the local recognition network obtains the local features of the face image. Because the local recognition network is introduced into the target face recognition model, the resulting model can acquire local features of the face, which improves its accuracy on occluded faces; even if the target face recognition model adopts a lightweight network, an occluded face can be recognized accurately.
In addition, each face sample image is input into a base face recognition model trained in advance, and a first feature image of each face sample image is acquired from an initial feature extraction layer of the base face recognition model; an initial local recognition network is trained to convergence according to each first feature image and the base face recognition model, wherein the local recognition network is connected to the initial feature extraction layer and is used for dividing the first feature image into N recognition regions and acquiring N local features of the face, N being an integer greater than 1; and a target face recognition model is generated according to the converged local recognition network and the base face recognition model. Because the base face recognition model has been trained to convergence in advance, it is accurate, and so is the first feature image it produces. The local recognition network is connected to the initial feature extraction layer of the base face recognition model, divides the first feature image into N recognition regions, and acquires N local features of the face, so the N local features obtained are also accurate. The target face recognition model is determined from the converged local recognition network and the base face recognition model: the local recognition network acquires N local features of each first feature image, and the base face recognition model acquires the first feature image itself. Because local features are introduced in the training process, the resulting target face recognition model can acquire not only the overall features of a face but also its local features, which improves its accuracy when recognizing occluded faces.
In addition, before the initial local recognition network is trained to convergence according to each first feature image, the method further comprises: arranging a local average pooling layer between the global pooling layer and the initial feature extraction layer of the base face recognition model, the local average pooling layer being used for dividing the first feature image into N recognition regions; and, after the local average pooling layer, setting a corresponding local feature extraction network for each of the N recognition regions to obtain the network structure of the local recognition network, wherein each local feature extraction network comprises a first fully connected layer. The first feature image is divided into N recognition regions by the local average pooling layer, and a corresponding local feature extraction network is set for each recognition region to extract its local features; the local average pooling layer and the local feature extraction networks together form the local recognition network. Because the local recognition network is newly added on top of the base face recognition model, the recognition ability of the base face recognition model is not affected, which improves the applicability of the training method.
In addition, the pooling kernel size of the local average pooling layer is N x M, where N denotes the height of the pooling kernel, M denotes its width, and both N and M are integers greater than 1; the height of the pooling stride is N-1, the width of the pooling stride is 1, and the height direction runs along the direction in which the nose of the face extends. The local average pooling layer thus divides the first feature image into blocks along the nose direction of the face, and because the stride height is tied to N, nearby face key points fall into the same region; for example, the left and right eyes are placed in the same region. This makes it easier to extract the features of each recognition region and for the trained target face recognition model to accurately recognize an occluded face.
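As a rough sanity check (not part of the patent text), the number of regions produced by the stated kernel height N and stride height N-1 follows standard unpadded-pooling arithmetic; the sketch below assumes a first feature image of height 7, the size used in the embodiments:

```python
def pooled_height(feature_height: int, kernel_height: int, stride_height: int) -> int:
    """Output height of an unpadded pooling layer."""
    return (feature_height - kernel_height) // stride_height + 1

# With kernel height N = 3 and stride height N - 1 = 2, a 7-row first
# feature image yields 3 overlapping recognition regions (rows 0-2, 2-4, 4-6).
assert pooled_height(7, 3, 2) == 3
```

The one-row overlap between adjacent regions is what keeps nearby key points together across region boundaries.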
In addition, training the initial local recognition network to convergence according to each first feature image and the base face recognition model comprises: inputting each first feature image into the local average pooling layer to obtain the N recognition regions of each first feature image; and adjusting the network parameters of the local feature extraction networks corresponding to the N recognition regions according to the N recognition regions of each first feature image and the second fully connected layer of the base face recognition model, until the N local feature extraction networks and the second fully connected layer of the base face recognition model converge, to obtain the converged local recognition network. By training each local feature extraction network until the N local feature extraction networks and the second fully connected layer converge, the local features of each recognition region and the global features of the first feature image can both be extracted accurately, avoiding any reduction in the base face recognition model's ability to recognize global features.
In addition, after the corresponding local feature extraction networks have been set for the N recognition regions following the local average pooling layer and the network structure of the local recognition network has been obtained, the method further comprises: setting a preset proportion for each first fully connected layer, so that the N first fully connected layers output their local features at dimensionalities given by the preset proportions, wherein the sum of the dimensionalities of the local features output by the N first fully connected layers equals the dimensionality of the global feature output by the second fully connected layer. Because this sum equals the global feature dimensionality, the local features do not outweigh the global features of the face during training of the target face recognition model, which ensures that recognition accuracy on unoccluded faces is not reduced.
In addition, generating the target face recognition model according to the converged local recognition network and the base face recognition model comprises: fusing the N local features output by the local recognition network with the global feature output by the base recognition network to serve as the recognition feature of the face; and adjusting the network parameters of the base face recognition model and the local recognition network according to the recognition feature until both converge. Fusing the global feature of the face with its local features, and training the converged local recognition network and the base face recognition model on the resulting recognition feature, further improves the accuracy of the target face recognition network.
In addition, fusing the N local features output by the local recognition network with the global feature output by the base recognition network to serve as the recognition feature of the face comprises: setting a corresponding fusion proportion for each of the N local features and for the global feature; and fusing the N local features and the global feature according to the fusion proportions to generate the recognition feature of the face.
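The fusion step can be sketched as follows. This is an illustrative interpretation only: the patent fixes neither the fusion operator nor the proportions, so weighted concatenation is assumed here, and `fuse_features` is a hypothetical name.

```python
import numpy as np

def fuse_features(local_feats, global_feat, local_weights, global_weight):
    """Scale each local feature and the global feature by its fusion
    proportion, then concatenate them into one recognition feature.
    Concatenation is an assumption, not mandated by the claims."""
    parts = [w * f for w, f in zip(local_weights, local_feats)]
    parts.append(global_weight * global_feat)
    return np.concatenate(parts)
```

Any differentiable fusion with per-feature proportions (e.g., a weighted sum after projection to a common dimensionality) would fit the described step equally well.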
In addition, the pooling kernel size may be 3 x M, i.e., N is 3, so that the face is divided into 3 regions, with a pooling stride height of 2. The face can thus be divided evenly into upper, middle, and lower parts, and this division gives different recognition regions their own characteristics, which facilitates face recognition.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the figures are not drawn to scale unless otherwise specified.
Fig. 1 is a flowchart of a face recognition method according to a first embodiment of the present application;
Fig. 2 is a schematic diagram of training a target face recognition model in the face recognition method according to the first embodiment of the present application;
Fig. 3 is a schematic structural diagram of a base face recognition model according to the first embodiment of the present application;
Fig. 4 is a flowchart of a face recognition method according to a second embodiment of the present application;
Fig. 5 is a schematic diagram of the network structure of a local recognition network according to the second embodiment of the present application;
Fig. 6 is a schematic diagram of a human face divided into 3 recognition regions according to the second embodiment of the present application;
Fig. 7 is a flowchart of a face recognition method according to a third embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to provide a better understanding of the present application; however, the technical solution claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
The following embodiments are divided for convenience of description and should not limit the specific implementation of the present invention; the embodiments may be combined with and refer to each other where there is no contradiction.
A first embodiment of the present invention relates to a method of face recognition. The flow of the face recognition method of the present application is shown in fig. 1:
step 101: and acquiring a face image to be recognized as a target face image.
Specifically, the target face recognition model is deployed on an electronic device and recognizes the faces in acquired images. Its application scenarios are numerous; for example, it may be applied in an intelligent door lock. The target face image can be acquired directly by the electronic device on which the target face recognition model is deployed, or another image acquisition device (such as a camera) can acquire the face image and transmit it to that electronic device.
Step 102: inputting the target face image into a preset target face recognition model to obtain a recognition result of the target face image, wherein the target face recognition model comprises a base face recognition model and a local recognition network, the base face recognition model is used for extracting a feature image of the target face image, and the local recognition network is used for acquiring local features of the target face image.
Specifically, the target face recognition model extracts a feature image of the target face image and also obtains local features of the target face image, and then obtains the recognition result of the face based on both the feature image and the local features; the recognition result may be the similarity between the face and each sample image in the repository.
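The comparison with repository samples is not detailed here; one common choice, shown purely as an assumption, is cosine similarity between the extracted recognition feature and each stored feature (`recognize` and the repository layout are illustrative names, not the patent's):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recognize(query_feature: np.ndarray, repository: dict) -> dict:
    """Score the query feature against every stored sample feature."""
    return {name: cosine_similarity(query_feature, feat)
            for name, feat in repository.items()}
```

The recognition result would then be the per-sample similarity scores, from which a best match or a threshold decision can be derived.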
According to the face recognition method in the embodiments of the application, the pre-trained target face recognition model comprises a base face recognition model and a local recognition network: the base face recognition model obtains the feature image of the target face image, and the local recognition network obtains the local features of the face image. Because the local recognition network is introduced into the target face recognition model, the resulting model can acquire local features of the face, which improves its accuracy on occluded faces; even if the target face recognition model adopts a lightweight network, an occluded face can be recognized accurately.
It should be noted that, before step 102 is executed, the target face recognition model is trained in advance; this training may be executed by an electronic device, and its flow is shown in fig. 2:
step 1021: and inputting each face sample image into a base face recognition model which is trained in advance, and acquiring a first feature image of each face sample image from an initial feature extraction layer of the base face recognition model.
The base face recognition model can adopt either a heavyweight model or a lightweight model. In this embodiment, the lightweight model ShuffleNetV2 is taken as an example. The base face recognition model is trained to convergence in advance, and the training process may be: training the lightweight model on a general face training set to obtain a pre-trained recognition model, and then fine-tuning the pre-trained recognition model on scenario-specific data, for example with a face recognition loss function such as softmax or triplet loss, to obtain a base face recognition model that meets the requirements of the specific scenario. The specific scenario is the scenario in which the face recognition model is to be applied, such as intelligent door lock recognition or an intelligent robot.
The input data of the base face recognition model is a face image, and the output data is a first feature image. For example, if the input face image size is set to 224x224, then by the time the image reaches the global average pooling layer (global_avg_pooling) after five 2x downsampling stages, the feature map size is 7x7 (224/32). Each downsampling can be implemented by a convolutional layer with stride 2 or a pooling layer with stride 2.
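This spatial arithmetic can be verified in a few lines; note that the stated total reduction 224/32 = 7 corresponds to five stride-2 stages:

```python
def feature_map_size(input_size: int, num_downsamples: int, factor: int = 2) -> int:
    """Spatial size after repeated stride-`factor` downsampling stages."""
    size = input_size
    for _ in range(num_downsamples):
        size //= factor
    return size

# A 224x224 input reduced by a total factor of 32 gives a 7x7 feature map.
assert feature_map_size(224, 5) == 7
```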
For ease of understanding, the network structure of the base face recognition model is described below in conjunction with fig. 3. As shown in fig. 3, the base face recognition model includes an initial feature extraction layer (A1 in fig. 3), a global average pooling layer (A2 in fig. 3), and a second fully connected layer (A3 in fig. 3); the base face recognition model outputs the global feature of the input image via the second fully connected layer. The base face recognition model compares the global feature of the input image with stored global features and recognizes the face in the input image according to the comparison result. Note that fig. 3 shows only the global-feature-extraction part of the base face recognition model.
After the basic face recognition model trained to be convergent is obtained, the face sample images in the training set can be input into the basic face recognition model, and the first feature images corresponding to the face sample images are obtained from the initial feature extraction layer of the basic face recognition model.
Step 1022: training an initial local recognition network to convergence according to each first feature image and the base face recognition model, wherein the local recognition network is connected to the initial feature extraction layer and is used for dividing the first feature image into N recognition regions and acquiring N local features of the face, N being an integer greater than 1.
Specifically, a local recognition network is connected to the initial feature extraction layer; it divides the first feature image output by the initial feature extraction layer into N recognition regions and then obtains the local features of those N regions. The local recognition network may comprise a local average pooling layer and first fully connected layers arranged in correspondence with the N recognition regions, with feature extraction for the N regions realized through the first fully connected layers.
It should be noted that, in the process of training the initial local recognition network, a first classifier is set after each first fully connected layer, and the labeling of local features during training is realized through the first classifiers. When the outputs of the N first classifiers converge, the local recognition network has converged; the first classifiers are then deleted, yielding the converged local recognition network.
Step 1023: generating a target face recognition model according to the converged local recognition network and the base face recognition model.
The converged local recognition network and the basic face recognition model form a target face recognition model.
In the face recognition method of the embodiments of the application, because the base face recognition model has been trained to convergence in advance, it is accurate, and so is the first feature image it produces. The local recognition network is connected to the initial feature extraction layer of the base face recognition model, divides the first feature image into N recognition regions, and acquires N local features of the face, so the N local features obtained are also accurate. The target face recognition model is determined from the converged local recognition network and the base face recognition model: the local recognition network acquires N local features of each first feature image, and the base face recognition model acquires the first feature image itself. Because local features are introduced in the training process, the resulting target face recognition model can acquire local features of the face, which improves its accuracy when recognizing occluded faces; even if it adopts a lightweight network, an occluded face can be recognized accurately.
A second embodiment of the present invention relates to a method of face recognition. The second embodiment specifically describes a process of constructing a local identification network. The method of face recognition is shown in fig. 4.
Step 201: inputting each face sample image into a base face recognition model trained in advance, and acquiring a first feature image of each face sample image from the initial feature extraction layer of the base face recognition model.
This step is similar to step 1021 in the first embodiment and is not described again here.
Step 202: arranging a local average pooling layer between the global pooling layer and the initial feature extraction layer of the base face recognition model, the local average pooling layer being used for dividing the first feature image into N recognition regions.
Before the local recognition network is trained, it can be constructed in advance: a local average pooling layer is arranged between the global pooling layer and the initial feature extraction layer of the base face recognition model. The pooling kernel size of the local average pooling layer is N x M, where N denotes the kernel height and M its width, both integers greater than 1; the height of the pooling stride is N-1, its width is 1, and the height direction runs along the direction in which the nose of the face extends.
Specifically, fig. 5 shows the network structure of the local recognition network: a local average pooling layer B1 is arranged between the initial feature extraction layer A1 and the global average pooling layer A2, and B1 is connected to N first fully connected layers B2. Since, among the face key points, the features of the eyes, nose, and mouth play a key role in face recognition, the face can be divided into 3 recognition regions on this basis, so the pooling kernel size of the local average pooling layer is 3 x M, with M an integer greater than 1. In this example the value of M equals the width of the first feature image; for example, if the first feature image is 7x7, M can be set to 7, so that the recognition regions span the full width of the first feature image. The process of dividing the first feature image into N recognition regions is described below.
Assume the first feature image has dimensions BxHxWxC, where B denotes the batch size, H the height of the first feature image, W its width, and C its number of channels; in this embodiment the height and width are 7x7. According to the characteristics of the face structure, the first feature image is partitioned along its height dimension only, i.e., the other three dimensions (B, W, and C) remain unchanged. To reduce the computational load of the local recognition network, the division into overlapping recognition regions can be realized with a fixed-stride average pooling layer. For example, an average pooling layer with a kernel of height 3 and width 7, a stride height of 2, and a stride width of 1 is added to the base face recognition model; this layer divides the face into 3 recognition regions, as shown in fig. 6, where a masked face image is divided into the upper face D1, the middle face D2, and the lower face D3. With the network structure of the original base face recognition model kept unchanged, the output of the newly added local average pooling layer has dimensions Bx3x1xC; the H dimension can then be split evenly with a split operator into three local-feature output branches of dimensions Bx1x1xC each, corresponding to the upper, middle, and lower regions of the original input face sample image.
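A NumPy sketch of this blocking operation, offered as an illustration rather than the patent's implementation (the Bx1x1xC branches are flattened to BxC here for simplicity):

```python
import numpy as np

def local_avg_pool(x: np.ndarray) -> list:
    """Average-pool a BxHxWxC feature map with a kernel of height 3 and
    full width, stride height 2 and stride width 1, returning the
    overlapping upper/middle/lower local-feature branches."""
    b, h, w, c = x.shape
    kernel_h, stride_h = 3, 2
    branches = []
    for start in range(0, h - kernel_h + 1, stride_h):
        region = x[:, start:start + kernel_h, :, :]   # Bx3xWxC block
        branches.append(region.mean(axis=(1, 2)))     # pooled to BxC
    return branches
```

For a 7x7 first feature image this yields exactly three branches, matching the upper face D1, middle face D2, and lower face D3 of fig. 6.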
Step 203: after the local average pooling layer, setting a corresponding local feature extraction network for each of the N recognition regions to obtain the network structure of the local recognition network, wherein each local feature extraction network comprises a first fully connected layer.
Specifically, if there are N identification regions, there are N corresponding first fully-connected layers, and each identification region is connected to one first fully-connected layer. A preset proportion may be set for each first fully-connected layer, so that the N first fully-connected layers output corresponding local features whose dimensions follow the preset proportions, where the sum of the dimensions of the local features output by the N first fully-connected layers is equal to the dimension of the global feature output by the second fully-connected layer. For example, assuming that the output dimension of the second fully-connected layer in the basic face recognition model is E, the output dimensions of the three first fully-connected layers corresponding to the upper, middle and lower face positions can be set to E/4, E/2 and E/4, respectively, so that the sum of the output dimensions of the three first fully-connected layers is E.
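As a sketch of these proportional output dimensions (the values C=64 and E=128, the random weight matrices, and the batch size are illustrative assumptions, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
C, E = 64, 128                      # pooled channel dim and global feature dim (assumed)
dims = [E // 4, E // 2, E // 4]     # preset proportions for upper, middle, lower face

# one first fully-connected layer (here just a weight matrix) per recognition region
weights = [rng.standard_normal((C, d)) * 0.01 for d in dims]
regions = [rng.standard_normal((2, C)) for _ in range(3)]  # B=2 pooled region features

local_feats = [r @ w for r, w in zip(regions, weights)]    # fc1, fc2, fc3 outputs
total_dim = sum(f.shape[1] for f in local_feats)           # equals E
```

The middle-face branch is given the largest dimension because, as the description notes below, the middle of the face influences recognition most strongly.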
After steps 202 and 203 are completed, the network structure of the local identification network is obtained. The values of the network parameters in the local identification network may be preset to form the initial local identification network.
Step 204: and inputting each first characteristic image into a local average pooling layer to obtain N identification areas of each first characteristic image.
The process of training the initial local recognition network to convergence is described in detail in steps 204 and 205. Each first feature image is input into the local average pooling layer to obtain the N recognition regions of that first feature image.
Step 205: and adjusting network parameters in the local feature extraction networks corresponding to the N identification areas respectively according to the N identification areas of the first feature images and the second full-connection layer of the basic face identification model until the N local feature extraction networks and the second full-connection layer of the basic face identification model are converged to obtain the converged local identification network.
The local average pooling layer and the split layer in the local recognition network contain no trainable weight parameters; only the three first fully-connected layers (denoted fc1, fc2 and fc3) need to be trained to update network parameters. In order to keep the recognition capability of the existing basic face recognition model from degrading while reasonably updating the network parameters of the first fully-connected layers, softmax can be used for training and updating, the specific process being as follows:
Denote the first classifiers connected to the three first fully-connected layers as C1, C2 and C3, respectively, and the second classifier connected to the second fully-connected layer as C0; denote the global feature output by the second fully-connected layer as fc0, and the local features output by the 3 first fully-connected layers as fc1, fc2 and fc3, respectively. The number of classes of each classifier is the number of face classes in the training set. During training, only the network parameters in fc1, fc2, fc3, C0, C1, C2 and C3 are updated, and the network parameters of all other layers are frozen, the frozen layers including the second fully-connected layer. It can be understood that, since both the global feature and the local features affect the face recognition result, the sum of the loss functions of the second classifier and the N first classifiers may be used as the local loss function of the local recognition network during training, and the network parameters of the 3 first classifiers are then adjusted according to this local loss function until convergence. The weight of each loss function is usually set to 1, and may in practice be tuned according to the training set. Because the network parameters of the second fully-connected layer are fixed, the convergence speeds of the four classifiers during training are, in order: C0_softmax > C2_softmax > C3_softmax or C1_softmax, where each classifier may adopt a softmax classifier, C0_softmax denotes the second classifier, and C1_softmax to C3_softmax denote the 3 first classifiers; the relative convergence speeds of C3_softmax and C1_softmax depend on the specific scene training set and are roughly the same. C2_softmax is considered converged when, for example, the value of its loss function is < 1.0.
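The local loss (a weight of 1 per classifier) can be sketched with a plain numpy softmax cross-entropy; the class count, batch size and random logits below are illustrative assumptions:

```python
import numpy as np

def softmax_ce(logits, labels):
    """Mean softmax cross-entropy, computed stably in log space."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(1)
num_classes, batch = 10, 4                 # face classes in the training set (assumed)
labels = rng.integers(0, num_classes, batch)

# logits from the second classifier C0 and the three first classifiers C1..C3
logits = {name: rng.standard_normal((batch, num_classes))
          for name in ("C0", "C1", "C2", "C3")}

# local loss = sum of the four classifier losses, each weighted by 1
local_loss = sum(softmax_ce(l, labels) for l in logits.values())
```

In a real training loop the gradient of this summed loss would only be propagated into fc1-fc3 and the classifier weights, since all other layers are frozen.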
When C3_softmax and C1_softmax approach the converged state, for example when the values of their loss functions are < 2.0, the newly added features output by the three first fully-connected layers have independent recognition capability, and training can be stopped at this point. Using a triplet loss for this training process is not recommended, because driving it from the initialization state to the convergence state is difficult and time-consuming.
Step 206: and generating a target face recognition model according to the converged local recognition network and the basic face recognition model.
In this embodiment, by training each local feature extraction network until the N local feature extraction networks and the second fully-connected layer converge, the local features of each recognition region and the global feature in the first feature image can be accurately extracted.
A third embodiment of the present invention relates to a method of face recognition. The third embodiment is a further improvement of the first embodiment, and the main improvement lies in that: in the third embodiment of the present invention, the local features and the global features are fused as the recognition features of the face, and the network parameters of the basic face recognition model and the basic recognition network are adjusted based on the recognition features of the face, so as to obtain the target face recognition model. The flow is shown in fig. 7:
step 301: and inputting each face sample image into a pre-trained basic face recognition model, and acquiring a first feature image of each face sample image from an initial feature extraction layer of the basic face recognition model.
Step 302: and training an initial local recognition network to be convergent according to each first feature image and the basic face recognition model, wherein the local recognition network is connected with the initial feature extraction layer and is used for dividing the first feature image into N recognition areas and acquiring N local features of the face, and N is an integer greater than 1.
Step 3031: and fusing N local features output by the local recognition network and the global feature output by the basic recognition network as the recognition feature of the face.
Specifically, the global feature and the local features, for example fc0, fc1, fc2 and fc3, can be directly fused to obtain the recognition feature fc of the face. Fusion can be performed by concatenation (concat); since the sum of the dimensions of the local features output by the N first fully-connected layers equals the dimension of the global feature output by the second fully-connected layer, the resulting recognition feature of the face has twice the dimension of the global feature.
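Concat fusion can be sketched as follows (E=128, the batch size and the zero-valued placeholder features are illustrative assumptions):

```python
import numpy as np

E = 128
fc0 = np.zeros((2, E))       # global feature (B=2)
fc1 = np.zeros((2, E // 4))  # upper-face local feature
fc2 = np.zeros((2, E // 2))  # middle-face local feature
fc3 = np.zeros((2, E // 4))  # lower-face local feature

# local dims sum to E, so the concatenated recognition feature has dimension 2E
fc = np.concatenate([fc0, fc1, fc2, fc3], axis=1)
```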
In one example, setting respective corresponding fusion proportions for the N local features and the global features; and fusing the N local features and the global feature according to the fusion proportion to generate the recognition feature of the face.
For example, the dimension of the feature output by the second fully-connected layer and the dimensions of the features output by the 3 first fully-connected layers satisfy, in order: fc0 > fc2 > fc3 = fc1. These sizes are set empirically according to how strongly each face part influences recognition, and the softmax convergence speeds observed above also support the rationality of this setting. Because the dimensions differ, each part contributes differently to the recognition result, which is equivalent to giving each part a default weight. The fusion proportion is consistent with the preset proportion of each first fully-connected layer.
Step 3032: and adjusting each network parameter in the basic face recognition model and the local recognition network according to the recognition features of the face until the basic face recognition model and the local recognition network converge.
Specifically, the learning rate of the model can be reduced, the network parameters in the basic face recognition model unfrozen, and the network parameters in the basic face recognition model and the local recognition network readjusted until both converge. When the learning rate is reduced, the convergence point is searched for again; for example, the new learning rate lr may be set to 1e-5, where le denotes the last learning rate and lr denotes the currently adjusted learning rate.
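A minimal sketch of this two-stage schedule, where the layer names, flag structure and learning-rate values are illustrative assumptions rather than the patent's implementation:

```python
# Stage 1: backbone frozen, only the local branches train.
params = {
    "backbone": {"frozen": True},   # basic face recognition model layers
    "fc1": {"frozen": False},
    "fc2": {"frozen": False},
    "fc3": {"frozen": False},
}

def trainable(params):
    """Names of layers whose parameters will be updated."""
    return sorted(name for name, p in params.items() if not p["frozen"])

stage1_layers, stage1_lr = trainable(params), 1e-4

# Stage 2: unfreeze everything and fine-tune jointly at a reduced rate.
for p in params.values():
    p["frozen"] = False
stage2_layers, stage2_lr = trainable(params), 1e-5
```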
In this embodiment, the local features and the global features are fused to obtain the recognition features of the face, and network parameters in the basic face recognition model and the local recognition network are adjusted according to the recognition features, so that the accuracy of the face recognition of the target face recognition model can be further improved.
The steps of the above methods are divided for clarity of description; in implementation they may be combined into one step, or a single step may be split into several, and all such variants fall within the protection scope of this patent as long as the same logical relationship is included. Adding insignificant modifications to the algorithms or processes, or introducing insignificant designs, without changing the core design of the algorithms and processes, is also within the protection scope of this patent.
A fourth embodiment of the present invention relates to an electronic device having a configuration as shown in fig. 8, and including: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401, so that the at least one processor 401 can execute the above-mentioned face recognition method.
The memory 402 and the processor 401 are connected by a bus, which may include any number of interconnected buses and bridges that link one or more of the various circuits of the processor 401 and the memory 402. The bus may also link various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 401 may be transmitted over a wireless medium via an antenna, which may receive the data and transmit the data to the processor 401.
The processor 401 is responsible for managing the bus and general processing and may provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described face recognition method.
Those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (9)

1. A method of face recognition, comprising:
acquiring a face image to be recognized as a target face image;
inputting the target face image into a preset target face recognition model to obtain a recognition result of the target face image, wherein the target face recognition model comprises a basic face recognition model and a local recognition network, the basic face recognition model is used for extracting a feature image of the target face image, and the local recognition network is used for acquiring local features of the target face image;
before the target face image is input to a preset target face recognition model and a recognition result of the target face image is obtained, the method further comprises:
inputting each face sample image into a base face recognition model which is trained in advance, and acquiring a first feature image of each face sample image from an initial feature extraction layer of the base face recognition model;
training an initial local recognition network to be convergent according to each first feature image and the basic face recognition model, wherein the local recognition network is connected with the initial feature extraction layer and is used for dividing the first feature image into N recognition regions and acquiring N local features of a face, N is an integer greater than 1, and the local recognition network comprises local feature extraction networks corresponding to the N recognition regions;
generating a target face recognition model according to the converged local recognition network and the basic face recognition model;
the training an initial local recognition network to converge according to each first feature image and the basic face recognition model comprises:
and adjusting network parameters in the local feature extraction networks corresponding to the N identification areas until the N local feature extraction networks and the second full-connection layer of the basic face identification model are converged to obtain the converged local identification network.
2. The method of claim 1, wherein before the training an initial local recognition network to converge according to each of the first feature images, the method further comprises:
setting a local average pooling layer between the global pooling layer of the basic face recognition model and the initial feature extraction layer, wherein the local average pooling layer is used for dividing the first feature image into N recognition regions;
and after the local average pooling layer, respectively setting corresponding local feature extraction networks for the N identification areas to obtain the network structure of the local identification network, wherein the local feature extraction network comprises a first full-connection layer.
3. The method of claim 2, wherein the pooling kernel size of the local average pooling layer is N x M, wherein N represents the height of the pooling kernel, wherein M represents the width of the pooling kernel, and wherein both N and M are integers greater than 1; the height of the pooling step length is N-1, the width of the pooling step length is 1, and the height of the pooling step length is in the direction extending along the nose of the face.
4. The method of claim 2 or 3, wherein the training an initial local recognition network to converge according to each of the first feature images and the basic face recognition model comprises:
inputting each first feature image into the local average pooling layer to obtain N identification areas of each first feature image;
and adjusting network parameters in local feature extraction networks corresponding to the N identification areas respectively according to the N identification areas of the first feature images and the second full-connection layer of the basic face identification model until the N local feature extraction networks and the second full-connection layer of the basic face identification model are converged to obtain the converged local identification network.
5. The method of face recognition according to claim 4, wherein after the local average pooling layer, the N recognition regions are respectively provided with corresponding local feature extraction networks, and after the network structure of the local recognition network is obtained, the method further comprises:
and setting a preset proportion for each first fully-connected layer, so that the N first fully-connected layers output corresponding local features according to the dimension of the preset proportion, wherein the dimension sum of the local features output by the N first fully-connected layers is equal to the dimension of the global feature output by the second fully-connected layer.
6. The method according to any one of claims 1 to 3, wherein the generating a target face recognition model according to the converged local recognition network and the base face recognition model comprises:
fusing N local features output by the local recognition network and global features output by the basic recognition network to serve as recognition features of the human face;
and adjusting each network parameter in the basic face recognition model and the local recognition network according to the recognition characteristics of the face until the basic face recognition model and the local recognition network are converged.
7. The method of claim 6, wherein fusing the N local features output by the local recognition network and the global feature output by the basic recognition network as the recognition features of the human face comprises:
setting respective corresponding fusion proportions for the N local features and the global feature;
and fusing the N local features and the global feature according to the fusion proportion to generate the recognition feature of the face.
8. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of face recognition according to any one of claims 1-7.
9. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the method of face recognition according to any one of claims 1 to 7.
CN202110997476.7A 2021-08-27 2021-08-27 Face recognition method, electronic device and storage medium Active CN113762118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110997476.7A CN113762118B (en) 2021-08-27 2021-08-27 Face recognition method, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN113762118A CN113762118A (en) 2021-12-07
CN113762118B true CN113762118B (en) 2022-08-26

Family

ID=78791671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110997476.7A Active CN113762118B (en) 2021-08-27 2021-08-27 Face recognition method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113762118B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052789A (en) * 2020-09-03 2020-12-08 腾讯科技(深圳)有限公司 Face recognition method and device, electronic equipment and storage medium
CN113255617A (en) * 2021-07-07 2021-08-13 腾讯科技(深圳)有限公司 Image recognition method and device, electronic equipment and computer-readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514432B (en) * 2012-06-25 2017-09-01 诺基亚技术有限公司 Face feature extraction method, equipment and computer program product
EP3074918B1 (en) * 2013-11-30 2019-04-03 Beijing Sensetime Technology Development Co., Ltd. Method and system for face image recognition
WO2015192263A1 (en) * 2014-06-16 2015-12-23 Xiaoou Tang A method and a system for face verification
CN109886223B (en) * 2019-02-26 2022-05-17 北京旷视科技有限公司 Face recognition method, bottom library input method and device and electronic equipment
CN112364827B (en) * 2020-11-30 2023-11-10 腾讯科技(深圳)有限公司 Face recognition method, device, computer equipment and storage medium



Similar Documents

Publication Publication Date Title
WO2020192736A1 (en) Object recognition method and device
US20210326656A1 (en) Panoptic segmentation
Liu et al. Multi-objective convolutional learning for face labeling
CN110751185A (en) Training method and device of target detection model
CN113420731B (en) Model training method, electronic device and computer-readable storage medium
US20210342696A1 (en) Deep Learning Model Training Method and System
Lee et al. Learnable dynamic temporal pooling for time series classification
US11823490B2 (en) Non-linear latent to latent model for multi-attribute face editing
WO2020113412A1 (en) Target detection method and system
WO2022267036A1 (en) Neural network model training method and apparatus and data processing method and apparatus
CN108268603A (en) A kind of community discovery method based on core member&#39;s identification
CN111401193B (en) Method and device for acquiring expression recognition model, and expression recognition method and device
WO2023125628A1 (en) Neural network model optimization method and apparatus, and computing device
CN113537066A (en) Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment
CN111833372A (en) Foreground target extraction method and device
US20230162041A1 (en) Neural network model, method, electronic device, and readable medium
CN113762118B (en) Face recognition method, electronic device and storage medium
CN113837376A (en) Neural network pruning method based on dynamic coding convolution kernel fusion
CN111191065A (en) Homologous image determining method and device
CN115795355A (en) Classification model training method, device and equipment
CN110059742A (en) Safety protector wearing recognition methods and equipment based on deep learning
CN113378704B (en) Multi-target detection method, equipment and storage medium
CN114898184A (en) Model training method, data processing method and device and electronic equipment
CN114595814A (en) Model training method, processing chip and edge terminal
CN111179338B (en) Lightweight target positioning method for mobile power supply receiving end

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220516

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co.,Ltd.

GR01 Patent grant