CN116129499A - Occlusion face recognition method and device, electronic equipment and storage medium - Google Patents

Occlusion face recognition method and device, electronic equipment and storage medium

Info

Publication number
CN116129499A
CN116129499A
Authority
CN
China
Prior art keywords
face
feature map
layer
network
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310028458.7A
Other languages
Chinese (zh)
Inventor
蒋召
黄泽元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Longzhi Digital Technology Service Co Ltd
Original Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Longzhi Digital Technology Service Co Ltd
Priority to CN202310028458.7A
Publication of CN116129499A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/166 - Detection; Localisation; Normalisation using acquisition arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the field of artificial intelligence and provides an occlusion face recognition method, apparatus, electronic device, and storage medium. The method comprises the following steps: acquiring a face picture to be recognized, wherein the face picture to be recognized is an occluded face picture or an unoccluded face picture; inputting the face picture to be recognized into a convolutional neural network for feature extraction to obtain a face feature map; inputting the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map; and inputting the face feature map and the attention weight matrix into a recognition network, and outputting a face recognition result of the face picture to be recognized. The disclosed method and apparatus can greatly reduce the degree of manual intervention, lower labor costs, fundamentally solve the face recognition problem in occlusion scenarios, and significantly improve the face recognition effect.

Description

Occlusion face recognition method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of artificial intelligence, and in particular to an occlusion face recognition method and device, an electronic device, and a storage medium.
Background
With the continuous development of deep learning, face recognition performs well in general scenarios, but in occlusion scenarios, such as faces wearing masks or sunglasses, the recognition effect is often poor.
Existing face recognition methods for occlusion scenarios generally rely on manually generating large-scale occluded face data and then training on that data to improve the recognition effect. However, this approach not only requires a large amount of manual intervention and incurs high labor costs, but also fails to fundamentally solve the face recognition problem under occlusion, so the recognition effect remains poor.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide an occlusion face recognition method and device, an electronic device, and a storage medium, so as to solve the problems in the prior art that face recognition in occlusion scenarios requires a large amount of manual intervention, has high labor costs, and yields a poor recognition effect.
In a first aspect of an embodiment of the present disclosure, there is provided an occlusion face recognition method, including:
acquiring a face picture to be recognized, wherein the face picture to be recognized is an occluded face picture or an unoccluded face picture;
inputting the face picture to be recognized into a convolutional neural network for feature extraction to obtain a face feature map;
inputting the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map;
and inputting the face feature map and the attention weight matrix into a recognition network, and outputting a face recognition result of the face picture to be recognized.
In a second aspect of the embodiments of the present disclosure, there is provided an occlusion face recognition device, including:
the acquisition module is configured to acquire a face picture to be recognized, wherein the face picture to be recognized is an occluded face picture or an unoccluded face picture;
the extraction module is configured to input the face picture to be recognized into a convolutional neural network for feature extraction to obtain a face feature map;
the processing module is configured to input the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map;
the recognition module is configured to input the face feature map and the attention weight matrix into a recognition network and output a face recognition result of the face picture to be recognized.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: a face picture to be recognized, which is either an occluded or an unoccluded face picture, is acquired; the face picture to be recognized is input into a convolutional neural network for feature extraction to obtain a face feature map; the face feature map is input into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map; and the face feature map and the attention weight matrix are input into a recognition network, which outputs the face recognition result of the face picture to be recognized. In this way, the degree of manual intervention is greatly reduced, labor costs are lowered, the face recognition problem in occlusion scenarios can be fundamentally solved, and the face recognition effect is significantly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present disclosure, and that other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow chart of an occlusion face recognition method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of the network structure of an occlusion face recognition model in the occlusion face recognition method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of the network structure of the recognition network 203 in the occlusion face recognition method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an occlusion face recognition device according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
An occlusion face recognition method and apparatus according to embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an occlusion face recognition method according to an embodiment of the present disclosure. As shown in fig. 1, the occlusion face recognition method includes:
step S101, obtaining a face picture to be recognized, wherein the face picture to be recognized is an occlusion face picture or a non-occlusion face picture.
The face picture to be recognized may be an image containing a human face captured by a camera device (e.g., a monocular camera, a binocular camera, or a surveillance camera), or an image containing a human face extracted from a video stream acquired by such a device. The face picture to be recognized may be either an unoccluded face picture or an occluded face picture. An unoccluded face picture is generally one that shows a frontal face with no objects covering it. An occluded face picture is typically one in which the face is covered by a mask, glasses, sunglasses, or a hat, or in which only a side view of the face (not the frontal face) is visible.
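By way of illustration only (not part of the claimed method), one common way to obtain such a picture is sketched below in Python with OpenCV. The camera index, the Haar-cascade detector, and the 224x224 crop size are assumptions for illustration; a simple frontal-face cascade may miss heavily occluded faces, and the choice of detector is outside the scope of this disclosure.

import cv2

# Illustrative sketch (assumptions: default camera, bundled Haar cascade, 224x224 crop).
cap = cv2.VideoCapture(0)            # 0 = default camera; a video file path also works
ok, frame = cap.read()               # one frame from the video stream
cap.release()

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]
    # crop and resize the detected region as the face picture to be recognized
    face_picture = cv2.resize(frame[y:y + h, x:x + w], (224, 224))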
Step S102, inputting the face picture to be recognized into a convolutional neural network for feature extraction, and obtaining a face feature map.
The convolutional neural network may be a deep residual network, such as a ResNet50 convolutional neural network. The ResNet50 convolutional neural network comprises 49 convolutional layers and one fully connected layer.
Using the ResNet50 convolutional neural network to extract features from the face picture to be recognized reduces the number of parameters and the amount of computation to a certain extent, thereby improving face recognition efficiency.
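For illustration, a minimal sketch of such a feature extractor is given below in Python with PyTorch/torchvision. It truncates a ResNet50 before its global pooling and fully connected head and also returns the intermediate feature maps consumed by the multi-scale attention network described later. The class name and input size are assumptions; the channel counts (256/512/1024/2048) are those of a standard ResNet50, whereas the 512-channel example given later in this description assumes a lighter backbone or an additional channel-reducing convolution.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class FaceFeatureExtractor(nn.Module):
    """Sketch: ResNet50 backbone with the classification head removed."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        # first network layer: convolution + pooling blocks, no residual blocks
        self.layer0 = nn.Sequential(backbone.conv1, backbone.bn1,
                                    backbone.relu, backbone.maxpool)
        # second to fifth network layers: residual stages
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4

    def forward(self, x):
        a1 = self.layer0(x)
        a2 = self.layer1(a1)          # second feature map
        a3 = self.layer2(a2)          # third feature map
        a4 = self.layer3(a3)          # fourth feature map
        a5 = self.layer4(a4)          # face feature map
        return a2, a3, a4, a5

a2, a3, a4, a5 = FaceFeatureExtractor()(torch.randn(1, 3, 224, 224))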
Step S103, inputting the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map.
A face picture often contains rich feature information, including low-level features such as color, brightness, texture, and orientation, as well as high-level features such as pose, expression, age, and ethnicity. In this embodiment, the face feature map is input into the multi-scale attention network designed by the present disclosure for processing, so that the rich feature information in the face picture can be extracted more effectively, quickly, and comprehensively, which improves the subsequent face recognition effect.
The attention weight matrix consists of the attention value (a value between 0 and 1) of each channel of the face feature map, as produced by the multi-scale attention network; for a face feature map with 512 channels it is 512-dimensional.
Step S104, inputting the face feature map and the attention weight matrix into a recognition network, and outputting a face recognition result of the face picture to be recognized.
According to the technical solution provided by the embodiments of the present disclosure, a face picture to be recognized, which is either an occluded or an unoccluded face picture, is acquired; the face picture to be recognized is input into a convolutional neural network for feature extraction to obtain a face feature map; the face feature map is input into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map; and the face feature map and the attention weight matrix are input into a recognition network, which outputs the face recognition result of the face picture to be recognized. In this way, the degree of manual intervention is greatly reduced, labor costs are lowered, the face recognition problem in occlusion scenarios can be fundamentally solved, and the face recognition effect is significantly improved.
Fig. 2 is a schematic diagram of the network structure of an occlusion face recognition model in the occlusion face recognition method according to an embodiment of the present disclosure.
As shown in fig. 2, the occlusion face recognition model includes a convolutional neural network 201, a multi-scale attention network 202, and a recognition network 203. The convolutional neural network 201 includes a first network layer 2011, a second network layer 2012, a third network layer 2013, a fourth network layer 2014, and a fifth network layer 2015 connected in sequence. The first network layer 2011 contains no residual blocks and mainly consists of convolution blocks and pooling blocks, while the second network layer 2012, the third network layer 2013, the fourth network layer 2014, and the fifth network layer 2015 all contain residual blocks. The multi-scale attention network 202 comprises a first input layer 2021, a second input layer 2022, and a third input layer 2023; a feature fusion layer 2024 connected to the first input layer 2021, the second input layer 2022, and the third input layer 2023; a global average pooling layer 2025 connected to the feature fusion layer 2024; a first full connection layer 2026 connected to the global average pooling layer 2025; and an activation function layer 2027 connected to the first full connection layer 2026. The second network layer 2012 is connected to the first input layer 2021, the third network layer 2013 is connected to the second input layer 2022, the fourth network layer 2014 is connected to the third input layer 2023, and the fifth network layer 2015 is connected to the recognition network 203.
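A minimal wiring sketch of this three-part structure is shown below in Python. It is an assumption-laden illustration: FaceFeatureExtractor is taken from the earlier sketch, and MultiScaleAttention and RecognitionNetwork are hypothetical module names whose bodies are sketched in the following sections.

import torch.nn as nn

class OcclusionFaceRecognitionModel(nn.Module):
    """Sketch: backbone 201 -> multi-scale attention 202 -> recognition network 203."""
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = FaceFeatureExtractor()             # convolutional neural network 201
        self.attention = MultiScaleAttention()             # multi-scale attention network 202
        self.recognizer = RecognitionNetwork(num_classes)  # recognition network 203

    def forward(self, x):
        a2, a3, a4, a5 = self.backbone(x)
        weights = self.attention(a2, a3, a4, a5)   # attention weight matrix
        return self.recognizer(a5, weights)        # face recognition result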
In some embodiments, the step S102 specifically includes the following steps:
inputting the face picture to be recognized into the first network layer, and outputting a first feature map;
inputting the first characteristic diagram into the second network layer, and outputting a second characteristic diagram;
inputting the second characteristic diagram into the third network layer, and outputting a third characteristic diagram;
inputting the third feature map into the fourth network layer, and outputting a fourth feature map;
and inputting the fourth feature map into the fifth network layer, and outputting a face feature map.
With reference to fig. 2, assume that the face picture to be recognized is a face picture A of a person wearing a mask. The face picture A is input into the first network layer 2011 of the convolutional neural network 201 in the occlusion face recognition model, and a first feature map A1 is output after convolution, regularization, activation-function, and max-pooling computations. The first feature map A1 is then input into the second network layer 2012 of the convolutional neural network 201, which performs convolution operations and outputs a second feature map A2. The second feature map A2 is then input into the third network layer 2013, which performs convolution operations and outputs a third feature map A3. The third feature map A3 is then input into the fourth network layer 2014, which performs convolution operations and outputs a fourth feature map A4. Finally, the fourth feature map A4 is input into the fifth network layer 2015, which performs convolution operations and outputs the face feature map A5.
In some embodiments, the step S103 specifically includes the following steps:
inputting the face feature map and the fourth feature map into the third input layer, and outputting a first fusion feature map;
inputting the first fusion feature map and the third feature map into the second input layer, and outputting a second fusion feature map;
inputting the second fusion feature map and the second feature map into the first input layer, and outputting a third fusion feature map;
inputting the first fusion feature map, the second fusion feature map and the third fusion feature map into the feature fusion layer, and outputting a total fusion feature map;
and sequentially inputting the total fusion feature map into the global average pooling layer, the first full-connection layer and the activation function layer to obtain the attention weight matrix of the face feature map.
In an embodiment, the inputting the face feature map and the fourth feature map into the third input layer, and outputting a first fused feature map specifically includes:
upsampling the face feature map to obtain an upsampled feature map;
and inputting the up-sampling feature map and the fourth feature map into the third input layer to add the up-sampling feature map and the fourth feature map to obtain a first fusion feature map.
In combination with the above example, the face feature map A5 output by the fifth network layer 2015 is up-sampled by a factor of 2 to obtain an up-sampled feature map P1; the up-sampled feature map P1 and the fourth feature map A4 output by the fourth network layer 2014 are then input into the third input layer 2023, which adds them to obtain the first fused feature map Q1.
Similarly, the first fused feature map Q1 is up-sampled by a factor of 2 (equivalent to up-sampling the face feature map A5 by a factor of 4) to obtain an up-sampled feature map P2; the up-sampled feature map P2 and the third feature map A3 output by the third network layer 2013 are then input into the second input layer 2022, which adds them to obtain the second fused feature map Q2.
The second fused feature map Q2 is up-sampled by a factor of 2 (equivalent to up-sampling the face feature map A5 by a factor of 8) to obtain an up-sampled feature map P3; the up-sampled feature map P3 and the second feature map A2 output by the second network layer 2012 are then input into the first input layer 2021, which adds them to obtain the third fused feature map Q3.
Then, the first fused feature map Q1 output by the third input layer 2023, the second fused feature map Q2 output by the second input layer 2022, and the third fused feature map Q3 output by the first input layer 2021 are all input into the feature fusion layer 2024 of the multi-scale attention network 202, so as to perform addition fusion on the first fused feature map Q1, the second fused feature map Q2, and the third fused feature map Q3, thereby obtaining a total fused feature map Q.
Next, the total fused feature map Q output by the feature fusion layer 2024 is sequentially input into the global average pooling layer 2025, the first full-connection layer 2026 and the activation function layer 2027 (sigmoid activation function layer) in the multi-scale attention network 202 for processing, and the attention weight matrix of the face feature map A5 is output, so as to obtain the attention value (a numerical value of 0-1) of each channel of the face feature map A5. Assuming that the face feature map A5 has 512 channels, the attention weight matrix is a 512-dimensional vector.
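A minimal Python sketch of this multi-scale attention network is given below. The description does not specify how the channel counts and spatial sizes of A2-A5 are aligned before the element-wise additions, so the 1x1 lateral convolutions, the common intermediate channel width, and the resizing before the final fusion are assumptions added to make those additions well-defined; the default channel counts match a standard ResNet50 backbone.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttention(nn.Module):
    """Sketch: top-down fusion of A2..A5, then GAP -> FC -> sigmoid channel weights."""
    def __init__(self, c2=256, c3=512, c4=1024, c5=2048, mid=256, out_channels=2048):
        super().__init__()
        self.lat2 = nn.Conv2d(c2, mid, 1)        # first input layer 2021 (assumed projection)
        self.lat3 = nn.Conv2d(c3, mid, 1)        # second input layer 2022 (assumed projection)
        self.lat4 = nn.Conv2d(c4, mid, 1)        # third input layer 2023 (assumed projection)
        self.lat5 = nn.Conv2d(c5, mid, 1)        # projection of the face feature map A5
        self.fc = nn.Linear(mid, out_channels)   # first full connection layer 2026

    def forward(self, a2, a3, a4, a5):
        # third input layer: 2x up-sampled A5 added to A4 -> first fused feature map Q1
        q1 = self.lat4(a4) + F.interpolate(self.lat5(a5), scale_factor=2, mode="nearest")
        # second input layer: 2x up-sampled Q1 added to A3 -> second fused feature map Q2
        q2 = self.lat3(a3) + F.interpolate(q1, scale_factor=2, mode="nearest")
        # first input layer: 2x up-sampled Q2 added to A2 -> third fused feature map Q3
        q3 = self.lat2(a2) + F.interpolate(q2, scale_factor=2, mode="nearest")
        # feature fusion layer 2024: bring Q1, Q2, Q3 to one size and add -> total fused map Q
        size = q3.shape[-2:]
        q = F.interpolate(q1, size=size) + F.interpolate(q2, size=size) + q3
        # global average pooling 2025 -> first full connection layer 2026 -> sigmoid 2027
        pooled = F.adaptive_avg_pool2d(q, 1).flatten(1)
        return torch.sigmoid(self.fc(pooled))    # per-channel attention values in (0, 1)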
In general, in an occlusion scenario the occluded part of the face causes a loss of facial feature information. Consequently, when face recognition is performed and one of the pre-stored face picture and the face picture to be recognized is occluded while the other is not (for example, the pre-stored face picture is unoccluded while the face picture to be recognized is occluded), a large discrepancy arises in the information being compared. How to effectively learn an effective feature representation of an occluded face therefore becomes the key to the success or failure of face recognition under occlusion. According to the embodiments of the present disclosure, the face feature map A5 output by the fifth network layer 2015 is successively up-sampled by factors of 2, 4, and 8, so that a feature extraction pyramid at different scales is constructed and the high-level semantic information extracted by the fifth network layer 2015 is passed down to the lower layers, layer by layer. The lower layers thus carry both high-level semantic information and detailed texture information, richer and more comprehensive facial feature information is extracted, and the weight given to the occluded part of the face can be reduced at the same time. The local fused feature information extracted by the first input layer 2021, the second input layer 2022, and the third input layer 2023 is then combined by the feature fusion layer 2024 into global fused feature information, which significantly improves the effectiveness and efficiency of face feature recognition in occlusion scenarios.
Fig. 3 is a schematic diagram of the network structure of the recognition network 203 in the occlusion face recognition method according to an embodiment of the present disclosure.
As shown in fig. 3, the recognition network 203 includes a feature map generation network layer 2031, a second full connection layer 2032, and a classification layer 2033.
In some embodiments, the face feature map and the attention weight matrix are input into a recognition network, and a face recognition result of the face picture to be recognized is output, which specifically includes the following steps:
inputting the face feature map and the attention weight matrix into the feature map generation network layer to generate an attention feature map of the face feature map;
and sequentially inputting the attention feature map into the second full-connection layer and the classification layer, and outputting the face recognition result of the face picture to be recognized.
In combination with the above example, the face feature map A5 output by the fifth network layer 2015 and the attention weight matrix output by the multi-scale attention network 202 are input into the feature map generation network layer 2031 of the recognition network 203 to multiply the face feature map A5 and the attention weight matrix to generate the attention feature map M of the face feature map A5.
Finally, the attention feature map M is sequentially input into the second full connection layer 2032 and the classification layer 2033 (such as a softmax classification layer) in the recognition network 203 for processing, and the face recognition result of the face picture to be recognized is output. Assuming there are 100 face categories, the face recognition result of the face picture to be recognized consists of the face recognition probability values corresponding to these 100 face categories.
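The recognition network can be sketched as follows in Python. The channel-wise multiplication of A5 by the attention weights follows the description above; pooling the re-weighted map to a vector before the second full connection layer, the embedding size, and the default channel count are assumptions needed to make the fully connected layers applicable.

import torch
import torch.nn as nn

class RecognitionNetwork(nn.Module):
    """Sketch: feature map generation layer 2031 -> FC 2032 -> classification layer 2033."""
    def __init__(self, num_classes, in_channels=2048, embed_dim=512):
        super().__init__()
        self.fc = nn.Linear(in_channels, embed_dim)        # second full connection layer
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, a5, weights):
        # feature map generation layer: multiply each channel of A5 by its attention value
        m = a5 * weights[:, :, None, None]                 # attention feature map M
        m = m.mean(dim=(2, 3))                             # spatial pooling (assumption)
        logits = self.classifier(self.fc(m))
        return torch.softmax(logits, dim=1)                # probabilities over face categories

With 100 face categories, the output of this sketch is a 100-dimensional probability vector per input picture, matching the example above.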
In some embodiments, the above occlusion face recognition method further includes the following steps:
acquiring a classification label of a face picture in a face database;
and calculating the face recognition loss value according to the face recognition result and the classification label.
In practical applications, face pictures are usually collected in advance and stored in a face database, and each face picture is annotated manually or by machine; the annotated content includes the face ID, the coordinates of key facial feature points, and so on. The classification label mainly refers to the face ID, and each face picture corresponds to one classification label.
After the face picture to be recognized is acquired, it is input into the occlusion face recognition model shown in fig. 2 for recognition, which outputs, for each face picture pre-stored in the face database, the probability that it corresponds to the face picture to be recognized. Assuming the face database contains 100 face pictures, the face recognition result output by the occlusion face recognition model provided by the disclosure includes 100 probability values, i.e., the similarity between the face picture to be recognized and each of the 100 face pictures in the face database.
The face recognition loss value is calculated from the face recognition result output by the occlusion face recognition model and the classification labels of the face pictures in the face database; the loss value is then used to guide the network optimization training of the occlusion face recognition model, which further improves the model's face recognition effect and efficiency in occlusion scenarios.
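For illustration only, the loss computation can be sketched as a cross-entropy between the model output and the classification labels. The disclosure does not name a specific loss function, so cross-entropy and the batch size below are assumptions.

import torch
import torch.nn.functional as F

num_classes = 100                                           # 100 face categories, as in the example
probs = torch.softmax(torch.randn(8, num_classes), dim=1)   # stand-in for the model's output
labels = torch.randint(0, num_classes, (8,))                # classification labels (face IDs)

# negative log-likelihood over the softmax probabilities == cross-entropy over logits
face_recognition_loss = F.nll_loss(torch.log(probs), labels)
# in training, this value would be back-propagated to optimize the recognition model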
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, and details are not repeated here.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 4 is a schematic structural diagram of an occlusion face recognition device according to an embodiment of the present disclosure. As shown in fig. 4, the occlusion face recognition device includes:
an acquisition module 401, configured to acquire a face picture to be recognized, where the face picture to be recognized is an occluded face picture or an unoccluded face picture;
an extraction module 402, configured to input the face picture to be recognized into a convolutional neural network for feature extraction, so as to obtain a face feature map;
the processing module 403 is configured to input the face feature map into a multi-scale attention network for processing, so as to obtain an attention weight matrix of the face feature map;
the recognition module 404 is configured to input the face feature map and the attention weight matrix into a recognition network, and output a face recognition result of the face picture to be recognized.
In some embodiments, the convolutional neural network comprises a first network layer, a second network layer, a third network layer, a fourth network layer, and a fifth network layer connected in sequence; the multi-scale attention network comprises a first input layer, a second input layer and a third input layer; the feature fusion layer is connected with the first input layer, the second input layer and the third input layer, the global average pooling layer is connected with the feature fusion layer, the first full-connection layer is connected with the global average pooling layer, and the activation function layer is connected with the first full-connection layer; the second network layer is connected with the first input layer, the third network layer is connected with the second input layer, the fourth network layer is connected with the third input layer, and the fifth network layer is connected with the identification network.
In some embodiments, the extracting module 402 may specifically include:
the first extraction unit is configured to input the face picture to be recognized into the first network layer and output a first feature map;
a second extraction unit configured to input the first feature map into the second network layer and output a second feature map;
a third extraction unit configured to input the second feature map into the third network layer and output a third feature map;
a fourth extraction unit configured to input the third feature map into the fourth network layer and output a fourth feature map;
and a fifth extraction unit configured to input the fourth feature map into the fifth network layer and output a face feature map.
In some embodiments, the processing module 403 may specifically include:
the first fusion unit is configured to input the face feature map and the fourth feature map into the third input layer and output a first fusion feature map;
a second fusion unit configured to input the first fusion feature map and the third feature map into the second input layer and output a second fusion feature map;
a third fusion unit configured to input the second fusion feature map and the second feature map into the first input layer and output a third fusion feature map;
the fourth fusion unit is configured to input the first fusion feature map, the second fusion feature map and the third fusion feature map into the feature fusion layer and output a total fusion feature map;
and the attention calculating unit is configured to sequentially input the total fusion feature map into the global average pooling layer, the first full-connection layer and the activation function layer to obtain an attention weight matrix of the face feature map.
In some embodiments, the first fusing unit may be specifically configured to:
upsampling the face feature map to obtain an upsampled feature map;
and inputting the up-sampling feature map and the fourth feature map into the third input layer to add the up-sampling feature map and the fourth feature map to obtain a first fusion feature map.
In some embodiments, the recognition network includes a feature map generation network layer, a second full connection layer, and a classification layer. The recognition module 404 may specifically include:
the generating unit is configured to input the face feature map and the attention weight matrix into the feature map generating network layer to generate an attention feature map of the face feature map;
the classifying unit is configured to sequentially input the attention feature map into the second full-connection layer and the classifying layer and output the face recognition result of the face picture to be recognized.
In some embodiments, the above-mentioned occlusion face recognition device may further include:
the label acquisition module is configured to acquire classification labels of face pictures in the face database;
and the loss calculation module is configured to calculate a face recognition loss value according to the face recognition result and the classification label.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 5 is a schematic diagram of an electronic device 5 provided by an embodiment of the present disclosure. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: a processor 501, a memory 502 and a computer program 503 stored in the memory 502 and executable on the processor 501. The steps of the various method embodiments described above are implemented by processor 501 when executing computer program 503. Alternatively, the processor 501, when executing the computer program 503, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 5 may be a desktop computer, a notebook computer, a handheld computer, a cloud server, or the like. The electronic device 5 may include, but is not limited to, a processor 501 and a memory 502. Those skilled in the art will appreciate that fig. 5 is merely an example of the electronic device 5 and does not limit it; the electronic device 5 may include more or fewer components than shown, or different components.
The processor 501 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 502 may be an internal storage unit of the electronic device 5, for example, a hard disk or a memory of the electronic device 5. The memory 502 may also be an external storage device of the electronic device 5, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 5. Memory 502 may also include both internal storage units and external storage devices of electronic device 5. The memory 502 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above-described embodiments by instructing the related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (10)

1. An occlusion face recognition method, characterized by comprising the following steps:
acquiring a face picture to be recognized, wherein the face picture to be recognized is an occluded face picture or an unoccluded face picture;
inputting the face picture to be recognized into a convolutional neural network for feature extraction to obtain a face feature map;
inputting the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map;
and inputting the face feature map and the attention weight matrix into a recognition network, and outputting a face recognition result of the face picture to be recognized.
2. The method of claim 1, wherein the convolutional neural network comprises a first network layer, a second network layer, a third network layer, a fourth network layer, and a fifth network layer connected in sequence;
the multi-scale attention network comprises a first input layer, a second input layer and a third input layer; the feature fusion layer is connected with the first input layer, the second input layer and the third input layer, the global average pooling layer is connected with the feature fusion layer, the first full-connection layer is connected with the global average pooling layer, and the activation function layer is connected with the first full-connection layer;
the second network layer is connected with the first input layer, the third network layer is connected with the second input layer, the fourth network layer is connected with the third input layer, and the fifth network layer is connected with the identification network.
3. The method of claim 2, wherein inputting the face picture to be recognized into a convolutional neural network for feature extraction to obtain a face feature map comprises:
inputting the face picture to be recognized into the first network layer, and outputting a first feature map;
inputting the first characteristic diagram into the second network layer, and outputting a second characteristic diagram;
inputting the second characteristic diagram into the third network layer, and outputting a third characteristic diagram;
inputting the third feature map into the fourth network layer, and outputting a fourth feature map;
and inputting the fourth feature map into the fifth network layer, and outputting a face feature map.
4. The method according to claim 3, wherein inputting the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map comprises:
inputting the face feature map and the fourth feature map into the third input layer, and outputting a first fusion feature map;
inputting the first fusion feature map and the third feature map into the second input layer, and outputting a second fusion feature map;
inputting the second fusion feature map and the second feature map into the first input layer, and outputting a third fusion feature map;
inputting the first fusion feature map, the second fusion feature map and the third fusion feature map into the feature fusion layer, and outputting a total fusion feature map;
and sequentially inputting the total fusion feature map into the global average pooling layer, the first full-connection layer and the activation function layer to obtain the attention weight matrix of the face feature map.
5. The method of claim 4, wherein inputting the face feature map and the fourth feature map into the third input layer outputs a first fused feature map, comprising:
upsampling the face feature map to obtain an upsampled feature map;
and inputting the up-sampling feature map and the fourth feature map into the third input layer to add the up-sampling feature map and the fourth feature map to obtain a first fusion feature map.
6. The method of claim 1, wherein the recognition network comprises a feature map generation network layer, a second full connection layer, and a classification layer;
inputting the face feature map and the attention weight matrix into a recognition network, and outputting a face recognition result of the face picture to be recognized, wherein the face recognition result comprises the following steps:
inputting the face feature map and the attention weight matrix into the feature map generation network layer to generate an attention feature map of the face feature map;
and sequentially inputting the attention feature map into the second full-connection layer and the classification layer, and outputting the face recognition result of the face picture to be recognized.
7. The method according to claim 1, wherein the method further comprises:
acquiring a classification label of a face picture in a face database;
and calculating the face recognition loss value according to the face recognition result and the classification label.
8. An occlusion face recognition device, comprising:
the acquisition module is configured to acquire a face picture to be recognized, wherein the face picture to be recognized is an occluded face picture or an unoccluded face picture;
the extraction module is configured to input the face picture to be recognized into a convolutional neural network for feature extraction to obtain a face feature map;
the processing module is configured to input the face feature map into a multi-scale attention network for processing to obtain an attention weight matrix of the face feature map;
the recognition module is configured to input the face feature map and the attention weight matrix into a recognition network and output a face recognition result of the face picture to be recognized.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202310028458.7A 2023-01-09 2023-01-09 Occlusion face recognition method and device, electronic equipment and storage medium Pending CN116129499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310028458.7A CN116129499A (en) 2023-01-09 2023-01-09 Occlusion face recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310028458.7A CN116129499A (en) 2023-01-09 2023-01-09 Occlusion face recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116129499A (en) 2023-05-16

Family

ID=86296856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310028458.7A Pending CN116129499A (en) 2023-01-09 2023-01-09 Occlusion face recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116129499A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination