CN117671751A - Face recognition method, system and device

Face recognition method, system and device

Info

Publication number: CN117671751A
Application number: CN202210977193.0A
Authority: CN (China)
Prior art keywords: identified, face, face image, sample, area
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 颜聪泉, 杨彭举, 谢迪
Assignee (current and original): Hangzhou Hikvision Digital Technology Co Ltd
Application filed by: Hangzhou Hikvision Digital Technology Co Ltd
Priority: CN202210977193.0A

Landscapes

  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a face recognition method, system, and device, relating to the technical field of image processing. The face recognition method includes: acquiring a face image to be identified; determining, according to a preset area determination mode, the image area to which the gaze point of the target user belongs in the face image to be identified, as the gaze area to be identified; performing feature extraction based on the face image to be identified and the gaze area to be identified to obtain the face features to be identified; and comparing the face features to be identified with preset face features to obtain the recognition result of the face image to be identified. By applying the embodiments of the present application, the reliability of face recognition can be improved.

Description

Face recognition method, system and device
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a face recognition method, a face recognition system and a face recognition device.
Background
With the rapid development of image processing technology, the identity of a user can be authenticated based on face images in scenarios such as access control and building entry.
In the related art, the electronic device may acquire a face image (may be referred to as a registered face image) when a user (may be referred to as a registered user) registers, and subsequently, may calculate a matching degree between the face image of the authenticated user and the registered face image, so as to perform identity authentication on the authenticated user based on the matching degree.
Therefore, in the related art, if a malicious attacker acquires a face image of a registered user in a malicious manner, identity authentication can be performed based on the acquired face image, so that the reliability of face recognition is not high.
Disclosure of Invention
The embodiment of the application aims to provide a face recognition method, a face recognition system and a face recognition device so as to improve the reliability of face recognition. The specific technical scheme is as follows:
according to a first aspect of embodiments of the present application, there is provided a face recognition method, the method including:
acquiring a face image to be identified; when the face image to be recognized is displayed, the point of regard of the target user in the face image to be recognized is located in a face area in the face image to be recognized;
according to a preset area determining mode, determining an image area of a gaze point of a target user from a face image to be recognized as a gaze area to be recognized;
extracting features based on the face image to be identified and the gaze area to be identified to obtain face features to be identified;
comparing the face features to be identified with preset face features to obtain an identification result of the face image to be identified;
the preset face features are as follows: extracting features of a registered face image of a registered user and a registered gazing area of the registered face image, to which a gazing point of the registered user belongs; the registered gazing area is determined according to a preset area determination mode; when the registered face image is displayed to the registered user, the gaze point of the registered user is located in a specified face area in the registered face image.
Optionally, the determining, according to a preset area determining manner, an image area to which a gaze point of the target user belongs from the face image to be identified, as the gaze area to be identified includes:
determining a designated image area taking a gazing point of a target user as a center point from a face image to be recognized as a gazing area to be recognized;
or,
and determining a face segmentation image area containing the gaze point of the target user in the face image to be recognized as a gaze area to be recognized based on the image segmentation.
Optionally, the feature extraction is performed based on the face image to be identified and the gaze area to be identified, to obtain the face feature to be identified, including:
determining the weight of each pixel point in the face image to be recognized based on the gazing area to be recognized;
and extracting features based on the determined weight and the face image to be identified to obtain the face features to be identified.
Optionally, the determining the weight of each pixel point in the face image to be identified based on the gaze area to be identified includes:
acquiring a thermodynamic diagram to be identified corresponding to a gazing area to be identified; wherein, each point in the thermodynamic diagram to be identified corresponds to each pixel point in the face image to be identified one by one; each point in the thermodynamic diagram to be identified represents: probability that the corresponding pixel point of the point in the face image to be recognized is the fixation point of the target user;
Aiming at each pixel point in the face image to be identified, taking the probability of representing the corresponding point of the pixel point in the thermodynamic diagram to be identified as the weight of the pixel point;
the feature extraction is performed based on the determined weight and the face image to be identified to obtain the face feature to be identified, and the feature extraction method comprises the following steps:
inputting the face image to be recognized and the thermodynamic diagram to be recognized into a convolutional neural network trained in advance to obtain face features to be recognized;
the classification network model of the convolutional neural network is obtained by training based on the first sample face image, the first sample thermodynamic diagram corresponding to the first sample face image and the first sample label.
Optionally, the training process of the convolutional neural network includes the following steps:
acquiring a first sample face image, a first sample thermodynamic diagram and a first sample label;
inputting the first sample face image and the first sample thermodynamic diagram into a convolutional neural network in a first classification network model of an initial structure to obtain first sample face characteristics;
inputting the first sample face features to a classification layer in a first classification network model of an initial structure to obtain a first prediction tag;
calculating a first loss value based on the first prediction tag and the first sample tag;
And adjusting model parameters of the first classified network model of the initial structure based on the first loss value, and continuing training until the first classified network model of the initial structure reaches convergence.
Optionally, the determining the weight of each pixel point in the face image to be identified based on the gaze area to be identified includes:
dividing the face image to be identified to obtain a plurality of sub-areas to be identified;
for each sub-area to be identified, calculating the ratio of the part of the gaze area to be identified that belongs to the sub-area to be identified to the size of the sub-area to be identified, as the weight of each pixel point of the sub-area to be identified;
the feature extraction is performed based on the determined weight and the face image to be identified to obtain the face feature to be identified, and the feature extraction method comprises the following steps:
splicing each sub-region to be identified with the corresponding weight, and inputting each splicing result into a pre-trained self-attention network to obtain the face characteristics to be identified;
the classification network model to which the self-attention network belongs is obtained by training based on the second sample face image, a second sample gazing area corresponding to the second sample face image and a second sample label.
Optionally, the training process of the self-attention network includes the following steps:
Acquiring a second sample face image, a second sample fixation area and a second sample label;
dividing the second sample face image to obtain a plurality of sample subregions;
for each sample sub-region, calculating the ratio of the part of the second sample gaze area that belongs to the sample sub-region to the size of the sample sub-region, as the weight of each pixel point of the sample sub-region;
splicing each sample sub-area with the corresponding weight, and inputting each splicing result to a self-attention network in a second classification network model of the initial structure to obtain a second sample face feature;
inputting the face features of the second sample into a classification layer in a second classification network model of the initial structure to obtain a second prediction label;
calculating a second loss value based on the second prediction tag and the second sample tag;
and adjusting model parameters of the second classification network model of the initial structure based on the second loss value, and continuing training until the second classification network model of the initial structure reaches convergence.
According to a second aspect of embodiments of the present application, there is provided a face recognition apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a face image to be identified; when the face image to be recognized is displayed, the point of regard of the target user in the face image to be recognized is located in a face area in the face image to be recognized;
The first determining module is used for determining an image area to which the gazing point of the target user belongs from the face image to be recognized as a gazing area to be recognized according to a preset area determining mode;
the feature extraction module is used for extracting features based on the face image to be identified and the gaze area to be identified to obtain face features to be identified;
the comparison module is used for comparing the face features to be identified with preset face features to obtain the identification result of the face image to be identified;
optionally, the first determining module includes:
the first to-be-identified gazing area determining submodule is used for determining a designated image area taking the gazing point of the target user as a center point from the to-be-identified face image to serve as the to-be-identified gazing area;
or,
the second to-be-identified gazing area determining submodule is used for determining a face segmentation image area containing the gazing point of the target user in the to-be-identified face image to serve as the to-be-identified gazing area.
Optionally, the feature extraction module includes:
the weight determining submodule is used for determining the weight of each pixel point in the face image to be recognized based on the gaze area to be recognized;
and the characteristic obtaining sub-module is used for carrying out characteristic extraction based on the determined weight and the face image to be identified to obtain the face characteristic to be identified.
Optionally, the weight determining submodule includes:
the thermodynamic diagram acquisition unit is used for acquiring thermodynamic diagrams to be identified corresponding to the gazing areas to be identified; wherein, each point in the thermodynamic diagram to be identified corresponds to each pixel point in the face image to be identified one by one; each point in the thermodynamic diagram to be identified represents: probability that the corresponding pixel point of the point in the face image to be recognized is the fixation point of the target user;
the first weight acquisition unit is used for taking, for each pixel point in the face image to be identified, the probability represented by the corresponding point of the pixel point in the thermodynamic diagram to be identified as the weight of the pixel point;
the characteristic obtaining submodule is specifically used for inputting a face image to be recognized and a thermodynamic diagram to be recognized into a convolutional neural network trained in advance to obtain face characteristics to be recognized;
the classification network model of the convolutional neural network is obtained by training based on the first sample face image, the first sample thermodynamic diagram corresponding to the first sample face image and the first sample label.
Optionally, the apparatus further includes:
the first training module is used for acquiring a first sample face image, a first sample thermodynamic diagram and a first sample label;
Inputting the first sample face image and the first sample thermodynamic diagram into a convolutional neural network in a first classification network model of an initial structure to obtain first sample face characteristics;
inputting the first sample face features to a classification layer in a first classification network model of an initial structure to obtain a first prediction tag;
calculating a first loss value based on the first prediction tag and the first sample tag;
and adjusting model parameters of the first classified network model of the initial structure based on the first loss value, and continuing training until the first classified network model of the initial structure reaches convergence.
Optionally, the weight determining submodule includes:
the image dividing unit is used for dividing the face image to be identified to obtain a plurality of sub-areas to be identified;
the second weight acquisition unit is used for calculating the ratio of the part belonging to the subarea to be identified in the gazing area to be identified to the size of the subarea to be identified as the weight of each pixel point of the subarea to be identified;
the characteristic obtaining sub-module is specifically used for splicing each sub-area to be identified with the corresponding weight, and inputting each splicing result into a pre-trained self-attention network to obtain the characteristic of the face to be identified;
The classification network model to which the self-attention network belongs is obtained by training based on the second sample face image, a second sample gazing area corresponding to the second sample face image and a second sample label.
Optionally, the apparatus further includes:
the second training module is used for acquiring a second sample face image, a second sample gazing area and a second sample label;
dividing the second sample face image to obtain a plurality of sample subregions;
for each sample sub-region, calculating the ratio of the part of the second sample gaze area that belongs to the sample sub-region to the size of the sample sub-region, as the weight of each pixel point of the sample sub-region;
splicing each sample sub-area with the corresponding weight, and inputting each splicing result to a self-attention network in a second classification network model of the initial structure to obtain a second sample face feature;
inputting the face features of the second sample into a classification layer in a second classification network model of the initial structure to obtain a second prediction label;
calculating a second loss value based on the second prediction tag and the second sample tag;
and adjusting model parameters of the second classification network model of the initial structure based on the second loss value, and continuing training until the second classification network model of the initial structure reaches convergence.
According to a third aspect of embodiments of the present application, there is provided a face recognition system, the system comprising an image acquisition component and a processing component;
the image acquisition component is used for acquiring a face image to be used as a face image to be identified;
the processing component is configured to execute the face recognition method described in any one of the above.
Optionally, the system further includes a display component, configured to display the face image to be identified, and display reminding information when the gaze point of the target user is not located in a face area in the currently displayed face image to be identified, so that the target user adjusts the gaze point of the target user.
According to a fourth aspect of embodiments of the present application, there is provided an electronic device, including:
the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of the above first aspects when executing a program stored on a memory.
According to a fifth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the method steps of any of the first aspects described above.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the first aspects described above.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
the embodiment of the application provides a face recognition method, which can acquire a face image to be recognized; when the face image to be recognized is displayed, the point of regard of the target user in the face image to be recognized is located in a face area in the face image to be recognized; according to a preset area determining mode, determining an image area of a gaze point of a target user from a face image to be recognized as a gaze area to be recognized; extracting features based on the face image to be identified and the gaze area to be identified to obtain face features to be identified; and comparing the face features to be identified with preset face features to obtain the identification result of the face image to be identified.
Based on the processing, the face image and the face features reflected by the gazing area of the user in the face image can be combined for recognition. Therefore, even if a malicious attacker acquires a face image of a registered user, the gaze area corresponding to the face image acquired by the malicious attacker is different from the gaze area corresponding to the face image of the registered user during registration, and accordingly, identity authentication cannot be passed, and further, the reliability of face recognition can be improved.
In addition, aiming at different face recognition scenes, the same user can watch different face areas during registration, and correspondingly, even if a malicious attacker acquires a face image of the user during registration in one scene, the malicious attacker cannot pass the identity authentication of the other scene through the face image, so that the reliability of face recognition can be further improved.
Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art may obtain other embodiments according to these drawings.
Fig. 1 is a flowchart of an embodiment of a face recognition method provided in an embodiment of the present application;
fig. 2 is a flowchart of a second embodiment of a face recognition method according to an embodiment of the present application;
fig. 3 is a flowchart of a third embodiment of a face recognition method according to an embodiment of the present application;
Fig. 4 is a flowchart of a fourth embodiment of a face recognition method provided in the embodiments of the present application;
fig. 5 is a flowchart of a fifth embodiment of a face recognition method according to the embodiment of the present application;
fig. 6 is a schematic diagram of acquiring face features to be identified based on a convolutional neural network according to an embodiment of the present application;
fig. 7 is a flowchart of a training method of a convolutional neural network according to an embodiment of the present application;
fig. 8 is a flowchart of a sixth embodiment of a face recognition method provided in the embodiment of the present application;
fig. 9 is a schematic diagram of acquiring a feature of a face to be identified based on a self-attention network according to an embodiment of the present application;
FIG. 10 is a flowchart of a self-attention network training method according to an embodiment of the present application;
fig. 11 is an exemplary diagram of a face recognition procedure provided in an embodiment of the present application;
fig. 12 is a block diagram of a face recognition device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art fall within the scope of protection of the present application.
In order to improve the reliability of face recognition, the embodiments of the present application provide a face recognition method, a face recognition device, an electronic device, and a storage medium, which are described in detail below.
the embodiment of the application provides a face recognition method which can be applied to electronic equipment. For example, the electronic device may be a device for authenticating a user in an access control system, and accordingly, the electronic device may acquire a face image of the user to perform face recognition, and determine whether to allow the user to pass based on the recognition result.
Referring to fig. 1, fig. 1 is a flowchart of a face recognition method provided in an embodiment of the present application. As shown in fig. 1, the method comprises the following steps:
step S101, a face image to be recognized is acquired.
When the face image to be recognized is displayed, the fixation point of the target user in the face image to be recognized is located in the face area in the face image to be recognized.
Step S102, determining an image area to which a gaze point of a target user belongs from the face image to be recognized as a gaze area to be recognized according to a preset area determination mode.
And step S103, extracting features based on the face image to be recognized and the gaze area to be recognized to obtain face features to be recognized.
Step S104, comparing the face features to be identified with preset face features to obtain the identification result of the face image to be identified.
The preset face features are as follows: extracting features of a registered face image of a registered user and a registered gazing area of the registered face image, to which a gazing point of the registered user belongs; the registered gazing area is determined according to a preset area determination mode; when the registered face image is displayed to the registered user, the gaze point of the registered user is located in a specified face area in the registered face image.
Based on the processing, the face image and the face features reflected by the gazing area of the user in the face image can be combined for recognition. Therefore, even if a malicious attacker acquires a face image of a registered user, the gaze area corresponding to the face image acquired by the malicious attacker is different from the gaze area corresponding to the face image of the registered user during registration, and accordingly, identity authentication cannot be passed, and further, the reliability of face recognition can be improved.
In addition, aiming at different face recognition scenes, the same user can watch different face areas during registration, and correspondingly, even if a malicious attacker acquires a face image of the user during registration in one scene, the malicious attacker cannot pass the identity authentication of the other scene through the face image, so that the reliability of face recognition can be further improved.
For step S101, when the face image to be recognized is displayed, the gaze point of the target user in the face image to be recognized is located in the face region in the displayed face image to be recognized.
In one implementation, an image collector for collecting images and an image display for displaying images may be provided. Furthermore, when authentication is required, the image collector can collect an image of an object currently required to be authenticated, and the collected image is displayed through the image display. The image collector and the image display described above may be integrated within an electronic device. Alternatively, the image collector and the image display may be other devices independent of the electronic device, and accordingly, the electronic device may be in data communication with the image collector to obtain the collected image.
For example, when the object to be authenticated is a live user: when the user faces the image collector, the image collector may collect a face image of the user (i.e., the face image to be identified), and the collected face image to be identified is then displayed to the user through the image display. Further, it may be determined whether the gaze point of the user is located in the face region of the displayed face image to be identified. In this case, the target user is the user currently to be authenticated.
For another example, when the object to be authenticated is a non-living object (e.g., a face image): when a user places a face image in front of the image collector, the image collector may collect an image corresponding to that face image, and the collected image is identical to it. The collected face image to be identified is then displayed through the image display, and it may be determined whether the gaze point of the user depicted in the face image placed in front of the image collector is located in the face region of the displayed face image to be identified. In this case, the target user is the user depicted in the face image provided for recognition.
The above situation may indicate that a malicious attacker has obtained a face image of a registered user and is using it for identity authentication. Because the gaze area corresponding to the face image obtained by the attacker differs from the gaze area corresponding to the registered user's face image at registration, the attacker cannot pass identity authentication, and thus the reliability of face recognition can be improved.
That is, if the registered user needs to pass identity authentication, the registered user can adjust his or her gaze point so that it is located in a face area of his or her face image displayed on the image display, and that face area coincides with the face area gazed at during registration. For example, if the user's gaze point was located in the nose area at registration, then during face recognition the user can pass identity authentication if the gaze point is again located in the nose area; if the gaze point is instead located in another area, such as the chin area or the eye area, the user cannot pass identity authentication.
In the above process, it may be determined whether the gaze point of the user is located in a face region in the displayed face image to be recognized based on a preset gaze point determination algorithm. The preset gaze point determination algorithm adopted in the embodiment of the present application may be based on a machine learning algorithm, or may also be implemented based on a deep learning algorithm.
In one implementation, the preset gaze point determination algorithm is a gaze point estimation algorithm, for example a two-dimensional gaze point estimation algorithm, a three-dimensional gaze point estimation algorithm, or the like. In another implementation, the preset gaze point determination algorithm is a line-of-sight estimation algorithm: the gaze direction of the target user in the image to be identified is first estimated, and the corresponding gaze point is then calculated through spatial geometric conversion. The line-of-sight estimation algorithm may be a monocular/binocular gaze estimation algorithm, gaze estimation based on semantic information, full-face gaze estimation, cross-frame gaze target estimation, or the like.
In one implementation manner, if the gaze point of the user is not located in the face region in the currently displayed face image to be recognized, a reminder may be displayed, so that the user adjusts his gaze point after obtaining the reminder. Correspondingly, the image collector can collect the new face image to be recognized corresponding to the user again, and display the collected new face image to be recognized through the image display, and so on until the point of regard of the user is determined to be located in the face area in the currently displayed face image to be recognized.
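As an illustration only, the acquire-display-check loop described above could be organized as in the following Python sketch. The capture, display, gaze estimation, face detection and reminder callables are placeholders for whatever components a concrete system uses; they are not interfaces defined by this application.

```python
def acquire_image_with_valid_gaze(capture, display, estimate_gaze, detect_face_box,
                                  show_reminder, max_attempts=10):
    """Re-acquire the face image until the user's gaze point falls inside the
    face region of the currently displayed face image to be identified."""
    for _ in range(max_attempts):
        image = capture()                      # image collector acquires a frame
        display(image)                         # image display shows it to the user
        gx, gy = estimate_gaze(image)          # preset gaze point determination algorithm
        x0, y0, x1, y1 = detect_face_box(image)
        if x0 <= gx <= x1 and y0 <= gy <= y1:  # gaze point lies in the face region
            return image, (gx, gy)
        show_reminder("Please look at the face shown on the screen")
    return None, None
```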
For step S102, at least the following two ways may be adopted to obtain the gaze area to be identified:
mode one: in one embodiment, referring to fig. 2, fig. 2 is a flowchart of a second embodiment of the face recognition method provided in the embodiment of the present application. On the basis of fig. 1, the step S102 may include the following steps:
s1021: and determining a designated image area taking the gazing point of the target user as a center point from the face image to be recognized as the gazing area to be recognized.
Wherein the shape and size of the designated image area can be set according to the need. For example, the designated image area may be square, or may be elliptical, or may be circular.
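A minimal Python sketch of mode one, assuming the gaze point is given as integer pixel coordinates; the square shape and the default size are illustrative choices, since the shape and size of the designated image area can be set as needed.

```python
import numpy as np

def gaze_area_centered(face_img: np.ndarray, gaze_xy, half_size=32):
    """Mode one: designated image area centered on the gaze point.
    The 64x64 default size is an illustrative assumption."""
    h, w = face_img.shape[:2]
    gx, gy = gaze_xy
    x0, x1 = max(0, gx - half_size), min(w, gx + half_size)
    y0, y1 = max(0, gy - half_size), min(h, gy + half_size)
    return face_img[y0:y1, x0:x1]              # gaze area to be identified
```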
In a second manner, in another embodiment, referring to fig. 3, fig. 3 is a flowchart of a third embodiment of the face recognition method provided in the embodiment of the present application. On the basis of fig. 1, the step S102 may include the following steps:
s1022: and determining a face segmentation image area containing the gaze point of the target user in the face image to be recognized as a gaze area to be recognized based on the image segmentation.
In this embodiment of the present application, the electronic device may perform image segmentation on the face image to be identified, and determine each segmented image area (i.e., a face segmented image area) in the face image to be identified. For example, the determined face segmentation image regions may respectively represent image regions corresponding to facial organs, such as an image region corresponding to eyes, an image region corresponding to nose, an image region corresponding to mouth, an image region corresponding to ears, and an image region corresponding to forehead.
After determining each face segmentation image region, the electronic device may determine, based on the image coordinates of the gaze point, a face segmentation image region including the gaze point, that is, a gaze region to be identified.
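A minimal sketch of mode two, assuming a face-parsing result is available as a per-pixel label map; the segmentation input and its label convention are assumptions for illustration, as the application does not prescribe a particular segmentation method.

```python
import numpy as np

def gaze_area_from_segmentation(seg_map: np.ndarray, gaze_xy):
    """Mode two: select the face segmentation image region (eyes, nose, mouth,
    ears, forehead, ...) that contains the gaze point. seg_map is an HxW array
    of part labels produced by any face-parsing model (assumed input)."""
    gx, gy = gaze_xy
    part_label = seg_map[gy, gx]       # label of the segment the gaze point falls in
    return seg_map == part_label       # boolean mask of the gaze area to be identified
```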
For step S103, the obtained face features to be identified fuse the image features of the face image to be identified with the image features of the gaze area to be identified. Accordingly, performing recognition based on the face features to be identified avoids the limitation of recognizing based only on the face image to be identified, which improves the reliability of recognition.
In one implementation manner, feature extraction can be performed on a face image to be identified and a gaze region to be identified respectively to obtain respective corresponding image features, and then feature fusion is performed on the obtained image features to obtain the face features to be identified.
In one implementation manner, the face image to be identified and the gaze area to be identified can be fused, and then feature extraction is performed on the fusion result to obtain the face feature to be identified.
In the above processing, the feature extraction may be performed based on machine learning, deep learning, or the like, but is not limited thereto.
In one embodiment, referring to fig. 4, fig. 4 is a flowchart of a fourth embodiment of the face recognition method provided in the embodiment of the present application. On the basis of fig. 1, S103 includes the following steps:
S1031: and determining the weight of each pixel point in the face image to be recognized based on the gazing area to be recognized.
S1032: and extracting features based on the determined weight and the face image to be identified to obtain the face features to be identified.
In the embodiment of the present application, the determined weight of a pixel point in the face image to be identified reflects the degree of association between that pixel point and the gaze point of the target user: the closer a pixel point is to the gaze point of the target user, the stronger the association and, correspondingly, the greater the weight of the pixel point; conversely, the farther a pixel point is from the gaze point of the target user, the weaker the association and, correspondingly, the smaller the weight of the pixel point.
Based on this processing, the extracted face features to be identified not only contain the image features of the face image to be identified but also reflect the information of the gaze point of the target user. Performing recognition based on the extracted face features can therefore improve the reliability of recognition.
In one embodiment, referring to fig. 5, fig. 5 is a flowchart of a fifth embodiment of the face recognition method provided in the embodiment of the present application. On the basis of fig. 4, the step S1031 may include the steps of:
S10311: and acquiring a thermodynamic diagram to be identified corresponding to the gazing area to be identified.
Wherein, each point in the thermodynamic diagram to be identified corresponds to each pixel point in the face image to be identified one by one; each point in the thermodynamic diagram to be identified represents: the probability that the corresponding pixel point of the point in the face image to be recognized is the fixation point of the target user.
S10312: and aiming at each pixel point in the face image to be identified, taking the probability of representing the corresponding point of the pixel point in the thermodynamic diagram to be identified as the weight of the pixel point.
Accordingly, the step S1032 may include the steps of:
s10321: and inputting the face image to be recognized and the thermodynamic diagram to be recognized into a convolutional neural network trained in advance to obtain the face characteristics to be recognized.
The classification network model of the convolutional neural network is obtained by training based on the first sample face image, the first sample thermodynamic diagram corresponding to the first sample face image and the first sample label.
In the embodiment of the present application, after determining the gaze area to be identified, the electronic device may generate the corresponding thermodynamic diagram (i.e., the thermodynamic diagram to be identified). Each point in the thermodynamic diagram to be identified corresponds one-to-one to a pixel point in the face image to be identified, i.e., the thermodynamic diagram to be identified has the same size as the face image to be identified.
In one implementation manner, in the thermodynamic diagram to be identified, the value of each point in the area corresponding to the gaze area to be identified (may be referred to as the target thermodynamic diagram area) is greater than 0 and less than or equal to 1. Accordingly, the points in the thermodynamic diagram to be identified other than the target thermodynamic diagram area may have a value of 0.
Illustratively, the values of the points in the target thermodynamic diagram region follow a Gaussian distribution, and the value of the target point (i.e., the point in the thermodynamic diagram to be identified corresponding to the target user's gaze point) is 1. Within the target thermodynamic diagram region, the farther a point is from the target point, the smaller its value.
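One possible way to generate such a thermodynamic diagram is sketched below: the map has the same size as the face image, takes the value 1 at the gaze point, falls off in a Gaussian manner around it, and is 0 outside the gaze area. The sigma value and the cut-off used here are illustrative assumptions.

```python
import numpy as np

def build_gaze_heatmap(img_h, img_w, gaze_xy, sigma=15.0):
    """Thermodynamic diagram to be identified: each value is the probability
    (weight) that the matching pixel is the target user's gaze point."""
    gx, gy = gaze_xy
    ys, xs = np.mgrid[0:img_h, 0:img_w]
    heat = np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2.0 * sigma ** 2))
    heat[heat < 1e-3] = 0.0            # points outside the target region get value 0
    return heat
```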
Each first sample face image may correspond to a first sample thermodynamic diagram determined based on the gaze point of the user in the first sample face image.
If the identities of the users in the two first sample face images are the same and the semantic information of the gaze point of the users is the same, the first sample labels corresponding to the two first sample face images are the same.
If the identities of the users in the two first sample face images are different and the semantic information of the gaze point of the users is the same, the first sample tags corresponding to the two first sample face images are different.
If the identities of the users in the two first sample face images are different and the semantic information of the gaze point of the users is different, the first sample tags corresponding to the two first sample face images are different.
If the identities of the users in the two first sample face images are the same and the semantic information of the gaze point of the users is different, the first sample tags corresponding to the two first sample face images are different.
Wherein the semantic information of the gaze point represents a face region of the gaze point in the image. For example, if both gaze points are located in the region corresponding to the mouth, semantic information indicating that the two gaze points are consistent; if one gaze point is located in the region corresponding to the mouth and the other gaze point is located in the region corresponding to the nose, the semantic information indicating that the two gaze points are inconsistent.
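A minimal sketch of this labelling rule: two first sample face images share a label only when both the user identity and the semantic face region of the gaze point match. Encoding the label as an identity/region pair is an illustrative choice, not something specified by the application.

```python
def first_sample_label(user_id: str, gaze_region: str) -> str:
    """Sample label rule: same identity AND same gaze point semantics (e.g. 'nose',
    'mouth') give the same label; any other combination gives a different label."""
    return f"{user_id}#{gaze_region}"   # e.g. 'user_0042#nose'
```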
Based on this, the features of the same user with the same gaze point semantic information are pulled closer together, while the features of different users, and of the same user with different gaze point semantic information, are pushed farther apart. As a result, the face features extracted by the trained convolutional neural network reflect not only the identity of the user but also the semantic information of the user's gaze point. Performing recognition based on the extracted face features can therefore improve the reliability of recognition.
In one embodiment, referring to fig. 6, fig. 6 is a schematic diagram of acquiring a feature of a face to be identified based on a convolutional neural network according to an embodiment of the present application.
In fig. 6, the upper-left image represents the face image to be identified, and the upper-right image represents the thermodynamic diagram to be identified corresponding to the gaze area to be identified. In fig. 6, the gaze point of the user is located in the region corresponding to the eyes, and the portion in the upper right of the thermodynamic diagram to be identified corresponds to the elliptical portion of the face image to be identified, which is the gaze area to be identified. Accordingly, the face image to be identified and the thermodynamic diagram to be identified are input into the face feature extraction model (i.e., the convolutional neural network in the embodiment of the present application) to obtain the face features to be identified. Because the face features to be identified incorporate the weights of the pixel points, they may be referred to as face features carrying face-region weighting information.
In one embodiment, referring to fig. 7, fig. 7 is a flowchart of a training method of a convolutional neural network according to an embodiment of the present application, including the following steps:
S701, acquiring a first sample face image, a first sample thermodynamic diagram and a first sample label.
S702, inputting the first sample face image and the first sample thermodynamic diagram into a convolutional neural network to obtain the first sample face characteristics.
S703, inputting the first sample face features to a first classification layer in the classification network model to obtain a first prediction label.
S704, calculating a first loss value based on the first prediction tag and the first sample tag.
And S705, adjusting the classification network model of the convolutional neural network based on the first loss value, and continuing training until the classification network model achieves convergence.
The first classification layer may be a fully connected layer.
In one implementation, the difference between the first prediction tag and the first sample tag may be calculated based on a cross-entropy loss function to obtain the first loss value, and the classification network model may be adjusted in a gradient-descent manner based on the first loss value.
In one implementation, for each two first sample face images, a similarity between first prediction tags corresponding to the two first sample face images (may be referred to as a first prediction similarity) may be calculated, and a similarity between first sample tags corresponding to the two first sample face images (may be referred to as a first sample similarity) may be calculated, and further, a first loss value may be calculated based on a difference between the first prediction similarity and the first sample similarity.
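The training procedure above might look like the following PyTorch sketch; the backbone architecture, feature dimension, number of sample labels and optimizer settings are illustrative assumptions rather than details given by the application.

```python
import torch
import torch.nn as nn

# Backbone takes a 4-channel input: RGB face image concatenated with the heatmap.
backbone = nn.Sequential(
    nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 128),                          # 128-d sample face feature (assumed size)
)
classifier = nn.Linear(128, 1000)                # classification layer; 1000 labels assumed
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=0.01)

def train_step(face_batch, heatmap_batch, label_batch):
    """face_batch: Nx3xHxW, heatmap_batch: Nx1xHxW, label_batch: N (class indices)."""
    features = backbone(torch.cat([face_batch, heatmap_batch], dim=1))
    loss = criterion(classifier(features), label_batch)   # first loss value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                       # adjust model parameters
    return loss.item()
```

Training stops once the first classification network model converges; afterwards only the backbone is used as the feature extractor, and the classification layer is discarded.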
In one embodiment, referring to fig. 8, fig. 8 is a flowchart of a sixth embodiment of the face recognition method provided in the embodiment of the present application. On the basis of fig. 4, S1031 includes the steps of:
s10313: dividing the face image to be identified to obtain a plurality of sub-areas to be identified;
S10314: for each sub-area to be identified, calculating the ratio of the part of the gaze area to be identified that belongs to the sub-area to be identified to the size of the sub-area to be identified, as the weight of each pixel point of the sub-area to be identified.
Accordingly, the step S1032 may include the steps of:
s10322: and splicing each sub-region to be identified with the corresponding weight, and inputting each splicing result into a pre-trained self-attention network to obtain the face characteristics to be identified.
The classification network model to which the self-attention network belongs is obtained by training based on the second sample face image, a second sample gazing area corresponding to the second sample face image and a second sample label.
In the embodiment of the present application, the manner of determining the second sample face image and its corresponding second sample gaze area is similar to the manner of determining the first sample face image and its corresponding first sample thermodynamic diagram described above, and reference may be made to the related detailed description.
After determining the gaze area to be identified, the electronic device may divide the face image to be identified to obtain a plurality of sub-areas (i.e., the sub-areas to be identified in the embodiment of the present application). For example, the electronic device may divide the face image to be identified on average, so as to obtain a plurality of sub-regions to be identified. The number of the subareas to be identified obtained by dividing can be determined according to requirements, for example, the number can be determined according to the size of the gazing area to be identified; the size of the gaze area to be identified is inversely related to the number.
Then, for each sub-region to be identified, determining the part belonging to the sub-region to be identified in the fixation region to be identified, and calculating the ratio of the size of the part to the size of the sub-region to be identified as the weight of each pixel point of the sub-region to be identified. That is, the weights of the pixel points in the same sub-area to be identified are the same.
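A minimal sketch of this weight computation, assuming the gaze area to be identified is given as a boolean mask over the face image and the image is divided into an even grid (the 4x4 grid is an illustrative choice).

```python
import numpy as np

def subregion_weights(gaze_mask: np.ndarray, grid=(4, 4)):
    """Weight of every pixel in a sub-area = (gaze-area pixels inside the
    sub-area) / (sub-area size); pixels in the same sub-area share one weight."""
    h, w = gaze_mask.shape
    rows, cols = grid
    rh, cw = h // rows, w // cols
    weights = np.zeros(grid, dtype=float)
    for r in range(rows):
        for c in range(cols):
            cell = gaze_mask[r * rh:(r + 1) * rh, c * cw:(c + 1) * cw]
            weights[r, c] = cell.sum() / cell.size
    return weights
```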
In the embodiment of the present application, the self-attention network may be a Transformer network, for example a Vision Transformer.
Referring to fig. 9, fig. 9 is a schematic diagram of acquiring a feature of a face to be identified based on a self-attention network according to an embodiment of the present application.
In fig. 9, the left image represents the face image to be identified; the grid over it represents dividing the face image to be identified, with each grid cell being a sub-area to be identified. The elliptical portion in the face image to be identified represents the gaze area to be identified. Based on the above division, the weight corresponding to the pixel points of each sub-area to be identified can be obtained, and each sub-area to be identified is then spliced with its corresponding weight, that is, each sub-area to be identified and its corresponding weight are weighted and encoded together. In fig. 9, the rectangles filled with oblique lines represent the sub-areas to be identified, and the numerical value before each such rectangle represents the weight of the pixel points in the corresponding sub-area to be identified.
Then, each splicing result may be input to a pre-trained self-attention network (comprising a Transformer Encoder and an MLP Head (multilayer perceptron head)), and the output of the MLP Head is the face feature to be identified. Because the face feature to be identified incorporates the weights of the pixel points, it may be referred to as a face feature carrying face-region weighting information.
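A simplified PyTorch sketch of this weighted-patch self-attention extractor; the embedding dimension, number of layers and the way the MLP Head is modelled are illustrative simplifications, not the exact structure used by the application.

```python
import torch
import torch.nn as nn

class GazeWeightedViT(nn.Module):
    """Each sub-area is flattened, spliced with its scalar weight, embedded,
    and passed through a Transformer encoder; a linear head stands in for
    the MLP Head and outputs the face feature to be identified."""
    def __init__(self, patch_dim, embed_dim=128, feat_dim=128):
        super().__init__()
        self.embed = nn.Linear(patch_dim + 1, embed_dim)   # patch pixels + weight
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, feat_dim)

    def forward(self, patches, weights):
        # patches: N x P x patch_dim, weights: N x P (one weight per sub-area)
        tokens = torch.cat([patches, weights.unsqueeze(-1)], dim=-1)
        encoded = self.encoder(self.embed(tokens))
        return self.head(encoded.mean(dim=1))              # face feature to be identified
```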
Based on the processing, the extracted face features to be recognized not only contain the image features of the face images to be recognized, but also can embody the information of the gaze point of the target user. Furthermore, the recognition is performed based on the extracted face features to be recognized, so that the reliability of the recognition can be improved.
In one embodiment, referring to fig. 10, fig. 10 is a flowchart of a self-attention network training method provided in an embodiment of the present application, including the following steps:
s1001, a second sample face image, a second sample fixation area and a second sample label are acquired.
S1002, dividing the second sample face image to obtain a plurality of sample subregions.
S1003, for each sample sub-region, calculating the ratio of the part of the second sample gaze area that belongs to the sample sub-region to the size of the sample sub-region, as the weight of each pixel point of the sample sub-region.
And S1004, splicing each sample sub-area with the corresponding weight, and inputting each splicing result to a self-attention network in a second classification network model of the initial structure to obtain a second sample face feature.
S1005, inputting the second sample face features to a classification layer in a second classification network model of the initial structure to obtain a second prediction label.
S1006, calculating a second loss value based on the second prediction tag and the second sample tag.
And S1007, adjusting model parameters of the second classification network model of the initial structure based on the second loss value, and continuing training until the second classification network model of the initial structure reaches convergence.
The second classification layer may be a fully connected layer.
Each second sample face image may correspond to a second sample gaze region determined based on the gaze point of the user in the second sample face image.
If the identities of the users in the two second sample face images are the same and the semantic information of the gaze point of the users is the same, the second sample labels corresponding to the two second sample face images are the same.
If the identities of the users in the two second sample face images are different and the semantic information of the gaze point of the users is the same, the second sample labels corresponding to the two second sample face images are different.
If the identities of the users in the two second sample face images are different and the semantic information of the gaze point of the users is different, the second sample labels corresponding to the two second sample face images are different.
If the identities of the users in the two second sample face images are the same and the semantic information of the gaze point of the users is different, the second sample labels corresponding to the two second sample face images are different.
The self-attention network is trained in a manner similar to that described above for convolutional neural networks, and reference is made to the detailed description thereof.
In the embodiment of the present application, the manner of dividing the second sample face image is similar to the manner of dividing the face image to be identified described above, and reference may be made to the related description.
In one implementation, the difference between the second prediction tag and the second sample tag may be calculated based on a cross-entropy loss function to obtain the second loss value, and the classification network model may be adjusted in a gradient-descent manner based on the second loss value.
Aiming at step S104, in one implementation manner, a similarity between the face feature to be identified and the preset face feature may be calculated, so as to obtain a corresponding identification result. If the similarity is not smaller than the preset threshold, the identity authentication is passed; if the similarity is smaller than the preset threshold, the identity authentication is not passed.
In addition, in different scenes, the subsequent processing may also be performed based on the recognition result. For example, in an access control system, if the identity authentication is passed, the passage can be allowed; otherwise, no pass is allowed. The registered user may represent a user having corresponding authority, for example, in an access control system, the registered user is a user having a passing authority, such as a company employee. The process of acquiring the registered face image is similar to the process of acquiring the face image to be recognized described above, and reference is made to the description thereof.
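A minimal sketch of the comparison in step S104, using cosine similarity; the threshold value is illustrative, as the application only refers to a preset threshold.

```python
import numpy as np

def recognize(feature, registered_feature, threshold=0.6):
    """Compare the face feature to be identified with the preset (registered)
    face feature; identity authentication passes if the similarity is not
    smaller than the preset threshold."""
    a = feature / np.linalg.norm(feature)
    b = registered_feature / np.linalg.norm(registered_feature)
    similarity = float(a @ b)
    return similarity >= threshold
```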
In one embodiment, referring to fig. 11, fig. 11 is an exemplary diagram of a face recognition procedure provided in an embodiment of the present application.
In fig. 11, a face image of a user (i.e., the face image to be identified in the embodiment of the present application) may be acquired and displayed on a screen. It may then be determined, based on gaze point estimation, whether the user's eyes gaze at the face on the screen (i.e., the face area in the displayed face image) while the face image is displayed. If not, the face image is re-acquired until, when the acquired face image is displayed, the user's eyes gaze at the face on the screen.
Furthermore, a gazing face region (i.e., a gazing region to be identified in the embodiment of the present application) may be obtained, and face features may be extracted by combining with the face image to be identified, and feature comparison may be performed. Namely, the similarity between the extracted face features (i.e., the face features to be identified in the embodiment of the present application) and the base features (i.e., the preset face features in the embodiment of the present application) is calculated, and whether the similarity exceeds a threshold value is determined, if the similarity exceeds the threshold value, a corresponding operation (e.g., allowing traffic) is performed. If the similarity does not exceed the threshold, the process may end.
Based on the same inventive concept, the embodiment of the application also provides a face recognition system, which comprises an image acquisition component and a processing component. The image acquisition component is used for acquiring a face image as the face image to be identified. The processing component is configured to perform any one of the face recognition methods described above.
Wherein the processing component may comprise a processor. The image acquisition component may include the image acquisition device in the above embodiment, for example, the image acquisition device may be a camera, through which a face image of a user may be acquired as the face image to be identified.
In some embodiments, the system may further include a display component configured to display a face image to be recognized, and display a reminder when the gaze point of the target user is not located in a face region in the currently displayed face image to be recognized, so that the target user adjusts the gaze point of the target user after obtaining the reminder.
For example, the display component may include the image display in the foregoing embodiment, and accordingly, the face image to be recognized may be displayed in the image display, and further, it may be determined whether the gaze point of the target user is located in the face area in the currently displayed face image to be recognized. In addition, if the gaze point of the target user is not located in the face region in the currently displayed face image to be recognized, the reminding information can be displayed in the image display. After browsing the reminding information, the user can adjust the gaze point of the user, so that the adjusted gaze point is positioned in a face area in the currently displayed face image to be identified.
Based on the same inventive concept as the face recognition method, the embodiment of the present application further provides a face recognition device, referring to fig. 12, fig. 12 is a structural diagram of the face recognition device provided in the embodiment of the present application, where the device may include:
a first obtaining module 1201, configured to obtain a face image to be identified; when the face image to be identified is displayed, the gaze point of the target user in the face image to be identified is located in a face area in the face image to be identified;
a first determining module 1202, configured to determine, according to a preset area determining manner, an image area to which a gaze point of a target user belongs from a face image to be identified, as a gaze area to be identified;
the feature extraction module 1203 is configured to perform feature extraction based on the face image to be identified and the gaze area to be identified, so as to obtain a face feature to be identified;
the comparison module 1204 is configured to compare the face feature to be identified with a preset face feature to obtain an identification result of the face image to be identified;
the preset face features are: features obtained by performing feature extraction on a registered face image of a registered user and a registered gazing area in the registered face image to which the gaze point of the registered user belongs; the registered gazing area is determined according to the preset area determination mode; when the registered face image is displayed to the registered user, the gaze point of the registered user is located in a specified face area in the registered face image.
Optionally, the first determining module 1202 includes:
the first to-be-identified gazing area determining submodule is used for determining, from the face image to be identified, a designated image area taking the gaze point of the target user as a center point, as the gazing area to be identified;
or,
the second to-be-identified gazing area determining submodule is used for determining a face segmentation image area containing the gaze point of the target user in the face image to be identified, as the gazing area to be identified.
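The two region-determination modes can be sketched as follows, assuming the gaze point is given in pixel coordinates. The half-size of the designated area and the use of a face-parsing label map are illustrative assumptions; the application leaves both the area size and the segmentation method open.

```python
import numpy as np


def gaze_region_fixed(image, gaze_xy, half_size=32):
    """Mode 1: a designated (fixed-size) image area centred on the gaze point."""
    h, w = image.shape[:2]
    gx, gy = int(gaze_xy[0]), int(gaze_xy[1])
    x1, x2 = max(0, gx - half_size), min(w, gx + half_size)
    y1, y2 = max(0, gy - half_size), min(h, gy + half_size)
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y1:y2, x1:x2] = 1
    return mask  # binary mask marking the gaze area to be identified


def gaze_region_from_parsing(parsing_map, gaze_xy):
    """Mode 2: the face-parsing segment (e.g. eye, nose, cheek) that contains the
    gaze point. parsing_map is assumed to be an HxW label map produced by some
    face segmentation model; the application does not mandate a specific one."""
    gx, gy = int(gaze_xy[0]), int(gaze_xy[1])
    label = parsing_map[gy, gx]
    return (parsing_map == label).astype(np.uint8)
```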
Optionally, the feature extraction module 1203 includes:
the weight determining submodule is used for determining the weight of each pixel point in the face image to be recognized based on the gaze area to be recognized;
and the feature obtaining sub-module is used for performing feature extraction based on the determined weights and the face image to be identified, to obtain the face features to be identified.
Optionally, the weight determination submodule includes:
the thermodynamic diagram (i.e., heatmap) acquisition unit is used for acquiring the thermodynamic diagram to be identified corresponding to the gazing area to be identified; wherein each point in the thermodynamic diagram to be identified corresponds one-to-one to a pixel point in the face image to be identified; each point in the thermodynamic diagram to be identified represents: the probability that the corresponding pixel point of the point in the face image to be identified is the gaze point of the target user;
the first weight acquisition unit is used for, for each pixel point in the face image to be identified, taking the probability represented by the corresponding point of the pixel point in the thermodynamic diagram to be identified as the weight of the pixel point;
the feature obtaining sub-module is specifically used for inputting the face image to be identified and the thermodynamic diagram to be identified into a pre-trained convolutional neural network to obtain the face features to be identified;
the classification network model to which the convolutional neural network belongs is obtained by training based on a first sample face image, a first sample thermodynamic diagram corresponding to the first sample face image, and a first sample label.
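As an illustrative sketch only, the thermodynamic diagram (heatmap) and the weighted feature extraction could look like the following PyTorch fragment. The Gaussian shape of the heatmap, the tiny backbone, and feeding the heatmap as a fourth input channel are all assumptions; the application only requires that each point of the diagram carry a per-pixel gaze probability and that the image and diagram be input to a pre-trained convolutional neural network.

```python
import torch
import torch.nn as nn


def gaze_heatmap(h, w, gaze_xy, sigma=15.0):
    """Per-pixel probability that the pixel is the target user's gaze point.
    An isotropic Gaussian centred on the gaze point is an assumption; the
    application only requires one probability value per pixel."""
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    d2 = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))


class HeatmapFaceNet(nn.Module):
    """Toy backbone consuming the RGB face image with the heatmap appended as a
    fourth input channel (one plausible way to feed both to a CNN)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, image, heatmap):
        # image: B x 3 x H x W, heatmap: B x H x W (per-pixel weights)
        x = torch.cat([image, heatmap.unsqueeze(1)], dim=1)
        return self.backbone(x)  # face features to be identified
```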
Optionally, the apparatus further includes:
the first training module is used for acquiring a first sample face image, a first sample thermodynamic diagram and a first sample label;
inputting the first sample face image and the first sample thermodynamic diagram into the convolutional neural network in a first classification network model of an initial structure to obtain first sample face features;
inputting the first sample face features to a classification layer in a first classification network model of an initial structure to obtain a first prediction tag;
calculating a first loss value based on the first prediction tag and the first sample tag;
And adjusting model parameters of the first classified network model of the initial structure based on the first loss value, and continuing training until the first classified network model of the initial structure reaches convergence.
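A hedged sketch of this training procedure, paired with the toy backbone above (assumed feature dimension 128), might look as follows. The choice of cross-entropy loss, the Adam optimizer, and a fixed epoch count standing in for "until convergence" are assumptions; the application only states that a loss is computed from the predicted label and the sample label and used to adjust the model parameters.

```python
import torch
import torch.nn as nn


def train_first_model(backbone, num_identities, loader,
                      feat_dim=128, epochs=10, lr=1e-3, device="cpu"):
    """Sketch of the described training loop. loader is assumed to yield
    (first_sample_image, first_sample_heatmap, first_sample_label) batches."""
    backbone = backbone.to(device)
    classifier = nn.Linear(feat_dim, num_identities).to(device)  # classification layer
    optimizer = torch.optim.Adam(
        list(backbone.parameters()) + list(classifier.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):                     # stand-in for "until convergence"
        for image, heatmap, label in loader:
            image, heatmap, label = image.to(device), heatmap.to(device), label.to(device)
            feature = backbone(image, heatmap)  # first sample face features
            logits = classifier(feature)        # first prediction label (as logits)
            loss = loss_fn(logits, label)       # first loss value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return backbone  # the classification layer is only needed during training
```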
Optionally, the weight determination submodule includes:
the image dividing unit is used for dividing the face image to be identified to obtain a plurality of sub-areas to be identified;
the second weight acquisition unit is used for, for each sub-area to be identified, calculating the ratio of the part of the gazing area to be identified that belongs to the sub-area to be identified to the size of the sub-area to be identified, as the weight of each pixel point of the sub-area to be identified;
the feature obtaining sub-module is specifically used for splicing each sub-area to be identified with its corresponding weight, and inputting each splicing result into a pre-trained self-attention network to obtain the face features to be identified;
the classification network model to which the self-attention network belongs is obtained by training based on the second sample face image, a second sample gazing area corresponding to the second sample face image and a second sample label.
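For illustration, the sub-region weighting and the weighted self-attention branch might be sketched as below. The 16-pixel patch size, the way the weight is spliced onto each flattened patch, and the use of a Transformer encoder as the self-attention network are assumptions; the application only specifies the overlap-ratio weights and a pre-trained self-attention network.

```python
import torch
import torch.nn as nn


def patch_weights(gaze_mask, patch=16):
    """Weight of each sub-region = (area of the gaze region inside the sub-region)
    / (sub-region size). gaze_mask is an HxW binary mask; H and W are assumed to
    be divisible by the patch size."""
    h, w = gaze_mask.shape
    gh, gw = h // patch, w // patch
    m = gaze_mask[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    return m.float().mean(dim=(1, 3))                      # gh x gw ratios in [0, 1]


class WeightedPatchAttentionNet(nn.Module):
    """Toy self-attention branch: each flattened patch is spliced with its weight,
    projected to an embedding, then fed to a Transformer encoder."""

    def __init__(self, patch=16, dim=128, heads=4, layers=2):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(3 * patch * patch + 1, dim)  # +1 for the spliced weight
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, dim)

    def forward(self, image, weights):
        # image: B x 3 x H x W, weights: B x gh x gw (from patch_weights)
        b, c, h, w = image.shape
        p = self.patch
        patches = image.unfold(2, p, p).unfold(3, p, p)      # B x 3 x gh x gw x p x p
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        tokens = torch.cat([patches, weights.reshape(b, -1, 1)], dim=-1)
        x = self.encoder(self.embed(tokens))
        return self.head(x.mean(dim=1))                      # face feature to be identified
```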
Optionally, the apparatus further includes:
the second training module is used for acquiring a second sample face image, a second sample gazing area and a second sample label;
Dividing the second sample face image to obtain a plurality of sample subregions;
for each sample sub-region, calculating the ratio of the part of the second sample gazing area that belongs to the sample sub-region to the size of the sample sub-region, as the weight of each pixel point of the sample sub-region;
splicing each sample sub-area with the corresponding weight, and inputting each splicing result to a self-attention network in a second classification network model of the initial structure to obtain a second sample face feature;
inputting the face features of the second sample into a classification layer in a second classification network model of the initial structure to obtain a second prediction label;
calculating a second loss value based on the second prediction tag and the second sample tag;
and adjusting model parameters of the second classification network model of the initial structure based on the second loss value, and continuing training until the second classification network model of the initial structure reaches convergence.
The embodiment of the present application further provides an electronic device, as shown in fig. 13, including a processor 1301, a communication interface 1302, a memory 1303 and a communication bus 1304, where the processor 1301, the communication interface 1302 and the memory 1303 communicate with each other through the communication bus 1304;
the memory 1303 is configured to store a computer program;
the processor 1301 is configured to implement the following steps when executing the program stored in the memory 1303:
acquiring a face image to be identified;
according to a preset area determining mode, determining an image area of a gaze point of a target user from a face image to be recognized as a gaze area to be recognized;
extracting features based on the face image to be identified and the gaze area to be identified to obtain face features to be identified;
comparing the face features to be identified with preset face features to obtain an identification result of the face image to be identified;
wherein the preset face features are: features obtained by performing feature extraction on a registered face image of a registered user and a registered gazing area in the registered face image to which the gaze point of the registered user belongs; the registered gazing area is determined according to the preset area determination mode; and when the registered face image is displayed to the registered user, the gaze point of the registered user is located in a designated face area in the registered face image.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a random access memory (Random Access Memory, RAM), or may include a non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided herein, there is also provided a computer readable storage medium having stored therein a computer program which when executed by a processor implements the steps of any of the face recognition methods described above.
In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the face recognition methods of the above embodiments.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), or the like.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner, and for identical or similar parts between the embodiments, reference may be made to one another; each embodiment mainly describes its differences from the other embodiments. In particular, for the apparatus, system, electronic device, computer-readable storage medium, and computer program product embodiments, the description is relatively brief because they are substantially similar to the method embodiments; for relevant points, reference may be made to the partial description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method of face recognition, the method comprising:
acquiring a face image to be identified; when the face image to be identified is displayed, the gaze point of a target user in the face image to be identified is located in a face area in the face image to be identified;
according to a preset area determining mode, determining an image area to which the gaze point of the target user belongs from the face image to be identified, as a gaze area to be identified;
extracting features based on the face image to be identified and the gaze area to be identified to obtain face features to be identified;
comparing the face features to be identified with preset face features to obtain an identification result of the face image to be identified;
wherein the preset face features are: features obtained by performing feature extraction on a registered face image of a registered user and a registered gazing area in the registered face image to which the gaze point of the registered user belongs; the registered gazing area is determined according to the preset area determination mode; and when the registered face image is displayed to the registered user, the gaze point of the registered user is located in a designated face area in the registered face image.
2. The method according to claim 1, wherein the determining, according to a preset area determining mode, an image area to which the gaze point of the target user belongs from the face image to be identified, as a gaze area to be identified, includes:
determining a designated image area taking the gaze point of the target user as a central point from the face image to be identified, as the gaze area to be identified;
or,
determining, based on image segmentation, a face segmentation image area containing the gaze point of the target user in the face image to be identified, as the gaze area to be identified.
3. The method according to claim 1, wherein the feature extraction based on the face image to be identified and the gaze area to be identified, to obtain face features to be identified, includes:
determining the weight of each pixel point in the face image to be identified based on the gaze area to be identified;
and extracting features based on the determined weight and the face image to be identified to obtain the face features to be identified.
4. A method according to claim 3, wherein the determining weights of the pixels in the face image to be identified based on the gaze area to be identified comprises:
acquiring a thermodynamic diagram to be identified corresponding to the gaze area to be identified; wherein each point in the thermodynamic diagram to be identified corresponds one-to-one to a pixel point in the face image to be identified; each point in the thermodynamic diagram to be identified represents: the probability that the corresponding pixel point of the point in the face image to be identified is the gaze point of the target user;
for each pixel point in the face image to be identified, taking the probability represented by the corresponding point of the pixel point in the thermodynamic diagram to be identified as the weight of the pixel point;
the feature extraction is performed based on the determined weight and the face image to be identified to obtain the face feature to be identified, and the feature extraction method comprises the following steps:
inputting the face image to be identified and the thermodynamic diagram to be identified into a pre-trained convolutional neural network to obtain the face features to be identified;
the classification network model to which the convolutional neural network belongs is obtained by training based on a first sample face image, a first sample thermodynamic diagram corresponding to the first sample face image and a first sample label.
5. The method of claim 4, wherein the training process of the convolutional neural network comprises the steps of:
Acquiring the first sample face image, a first sample thermodynamic diagram and a first sample label;
inputting the first sample face image and the first sample thermodynamic diagram to a convolutional neural network in a first classification network model of an initial structure to obtain first sample face characteristics;
inputting the first sample face features to a classification layer in a first classification network model of the initial structure to obtain a first prediction tag;
calculating a first loss value based on the first prediction tag and the first sample tag;
and adjusting model parameters of the first classified network model of the initial structure based on the first loss value, and continuing training until the first classified network model of the initial structure reaches convergence.
6. A method according to claim 3, wherein the determining weights of the pixels in the face image to be identified based on the gaze area to be identified comprises:
dividing the face image to be identified to obtain a plurality of sub-areas to be identified;
for each sub-area to be identified, calculating the ratio of the part of the gaze area to be identified that belongs to the sub-area to be identified to the size of the sub-area to be identified, as the weight of each pixel point of the sub-area to be identified;
The feature extraction is performed based on the determined weight and the face image to be identified to obtain the face feature to be identified, and the feature extraction method comprises the following steps:
splicing each sub-area to be identified with its corresponding weight, and inputting each splicing result into a pre-trained self-attention network to obtain the face features to be identified;
the classification network model to which the self-attention network belongs is obtained by training based on a second sample face image, a second sample gazing area corresponding to the second sample face image and a second sample label.
7. The method of claim 6, wherein the training process of the self-attention network comprises the steps of:
acquiring the second sample face image, a second sample gazing area and a second sample label;
dividing the second sample face image to obtain a plurality of sample subregions;
for each sample sub-region, calculating the ratio of the part of the second sample gazing area that belongs to the sample sub-region to the size of the sample sub-region, as the weight of each pixel point of the sample sub-region;
splicing each sample sub-area with the corresponding weight, and inputting each splicing result to a self-attention network in a second classification network model of the initial structure to obtain a second sample face feature;
Inputting the second sample face features to a classification layer in a second classification network model of the initial structure to obtain a second prediction label;
calculating a second loss value based on the second prediction tag and the second sample tag;
and adjusting model parameters of the second classification network model of the initial structure based on the second loss value, and continuing training until the second classification network model of the initial structure reaches convergence.
8. A face recognition system, the system comprising an image acquisition component and a processing component;
the image acquisition component is used for acquiring a face image to be used as a face image to be identified;
the processing component being adapted to perform the method steps of any of claims 1-7.
9. The system of claim 8, further comprising a display component configured to display the face image to be recognized, and to display a reminder when the gaze point of the target user is not located in a face region in the currently displayed face image to be recognized, such that the target user adjusts his gaze point after obtaining the reminder.
10. A face recognition device, the device comprising:
The first acquisition module is used for acquiring a face image to be identified; when the face image to be identified is displayed, the gaze point of a target user in the face image to be identified is located in a face area in the face image to be identified;
the first determining module is used for determining, according to a preset area determining mode, an image area to which the gaze point of the target user belongs from the face image to be identified, as a gaze area to be identified;
the feature extraction module is used for extracting features based on the face image to be identified and the gazing area to be identified to obtain face features to be identified;
and the comparison module is used for comparing the face features to be identified with preset face features to obtain the identification result of the face image to be identified.
CN202210977193.0A 2022-08-15 2022-08-15 Face recognition method, system and device Pending CN117671751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210977193.0A CN117671751A (en) 2022-08-15 2022-08-15 Face recognition method, system and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210977193.0A CN117671751A (en) 2022-08-15 2022-08-15 Face recognition method, system and device

Publications (1)

Publication Number Publication Date
CN117671751A true CN117671751A (en) 2024-03-08

Family

ID=90069971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210977193.0A Pending CN117671751A (en) 2022-08-15 2022-08-15 Face recognition method, system and device

Country Status (1)

Country Link
CN (1) CN117671751A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination