CN117373091A - Model training method, identity verification method and device


Info

Publication number
CN117373091A
CN117373091A
Authority
CN
China
Prior art keywords
image
face
face image
detection model
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311412535.5A
Other languages
Chinese (zh)
Inventor
林佳滢
付星赫
袁轶珂
李玺
金璐
Current Assignee
Zhejiang University ZJU
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Zhejiang University ZJU
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU, Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311412535.5A
Publication of CN117373091A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Abstract

The specification discloses a model training method, an identity verification method, and corresponding devices. Specifically: a real face image and a fake face image corresponding to the real face image are acquired, and image enhancement is performed on partial image regions of the real face image and of the fake face image to obtain a first image and a second image, respectively. A first sample pair is constructed from the real face image and the first image, and a second sample pair from the fake face image and the second image. A face detection model to be trained extracts the image features of the first and second sample pairs, and the model is trained with the optimization objective of maximizing the similarity between the image features of face images in the same sample pair and minimizing the similarity between the image features of face images in different sample pairs. The trained face detection model is then used to verify identity from face images. This method yields a face detection model that accurately detects face images, safeguarding the security of users' personal information.

Description

Model training method, identity verification method and device
Technical Field
The present disclosure relates to the field of face recognition, and in particular, to a method for model training, and a method and apparatus for identity verification.
Background
With the rapid development of computer technology and artificial intelligence, face recognition technology has gradually matured and is now widely deployed in society; face verification has become an indispensable part of daily life for transfer payments, identity authentication, and the like.
However, face verification based on face recognition technology is often subject to face attacks: for example, a face image forged by special means may be recognized as a user's real face image, allowing others to acquire and exploit the user's personal information at will and threatening the security of important information such as the user's private data and property information.
Therefore, effectively detecting face attacks during face verification is particularly important.
Disclosure of Invention
The present disclosure provides a method for model training and a method for identity verification, so as to partially solve the above problems in the prior art.
The technical scheme adopted in the specification is as follows:
the present specification provides a method of model training, comprising:
acquiring a real face image and a fake face image corresponding to the real face image;
image enhancement is carried out on partial image areas in the real face image to obtain a first image, and image enhancement is carried out on partial image areas in the fake face image to obtain a second image;
constructing a first sample pair according to the real face image and the first image, and constructing a second sample pair according to the fake face image and the second image;
inputting the first sample pair and the second sample pair into a face detection model to be trained, so as to determine the image characteristics corresponding to each face image contained in the first sample pair and the image characteristics corresponding to each face image contained in the second sample pair through the face detection model;
and training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair and minimizing the similarity between the image features corresponding to face images in different sample pairs.
Optionally, performing image enhancement on partial image regions in the real face image to obtain a first image, and performing image enhancement on partial image regions in the fake face image to obtain a second image, specifically includes:
dividing the real face image into image regions, and dividing the fake face image into image regions;
selecting, according to a preset image enhancement ratio, a corresponding number of image regions from those divided from the real face image, and performing image enhancement on them to obtain the first image;
and selecting, according to the preset image enhancement ratio, a corresponding number of image regions from those divided from the fake face image, and performing image enhancement on them to obtain the second image.
Optionally, before training the face detection model with the objective of maximizing the similarity between the corresponding image features of the face images in the same sample pair and minimizing the similarity between the corresponding image features of the face images in different sample pairs, the method further includes:
Inputting the first image into the face detection model to obtain a detection result corresponding to the first image;
training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair and minimizing the similarity between the image features corresponding to face images in different sample pairs specifically includes:
training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair, minimizing the similarity between the image features corresponding to face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image.
Optionally, the face detection model outputs a first value if it determines that the input face image is a real face image and a second value if it determines that the input face image is a fake face image, and the higher the degree of enhancement of the first image relative to the real face image, the closer the label result corresponding to the first image is to the second value.
Optionally, before training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to the face images in the same sample pair, minimizing the similarity between the image features corresponding to the face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image, the method further includes:
inputting the second image into the face detection model to obtain a detection result corresponding to the second image;
training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair, minimizing the similarity between the image features corresponding to face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image specifically includes:
training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair, minimizing the similarity between the image features corresponding to face images in different sample pairs, minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the fake face image.
The specification provides a method of identity verification, comprising:
acquiring a face image of a user;
inputting the face image into a face detection model to obtain a face image detection result corresponding to the face image, wherein the face detection model is obtained by training according to the model training method;
and carrying out identity authentication on the user according to the face image detection result.
The present specification provides an apparatus for model training, comprising:
the acquisition module is used for acquiring the real face image and the fake face image corresponding to the real face image;
the enhancement module is used for carrying out image enhancement on partial image areas in the real face image to obtain a first image, and carrying out image enhancement on partial image areas in the fake face image to obtain a second image;
the construction module is used for constructing a first sample pair according to the real face image and the first image, and constructing a second sample pair according to the fake face image and the second image;
the detection module is used for inputting the first sample pair and the second sample pair into a face detection model to be trained, so as to determine the image characteristics corresponding to each face image contained in the first sample pair and the image characteristics corresponding to each face image contained in the second sample pair through the face detection model;
And a training module, configured to train the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair and minimizing the similarity between the image features corresponding to face images in different sample pairs.
Optionally, the enhancement module is specifically configured to divide the real face image into image regions and divide the fake face image into image regions; select, according to a preset image enhancement ratio, a corresponding number of image regions from those divided from the real face image and perform image enhancement on them to obtain the first image; and select, according to the preset image enhancement ratio, a corresponding number of image regions from those divided from the fake face image and perform image enhancement on them to obtain the second image.
Optionally, the detection module is specifically configured to input the first image into the face detection model to obtain a detection result corresponding to the first image;
The training module is specifically configured to train the face detection model with an optimization objective of maximizing a similarity between corresponding image features of face images in the same sample pair, minimizing a similarity between corresponding image features of face images in different sample pairs, and minimizing a deviation between a detection result corresponding to the first image and a label result corresponding to the first image.
Optionally, the detection module is specifically configured such that the face detection model outputs a first value if it determines that the input face image is a real face image and a second value if it determines that the input face image is a fake face image, and the higher the degree of enhancement of the first image relative to the real face image, the closer the label result corresponding to the first image is to the second value.
Optionally, the detection module is specifically configured to input the second image into the face detection model, to obtain a detection result corresponding to the second image;
the training module is specifically configured to train the face detection model with an optimization objective of maximizing a similarity between corresponding image features of face images in the same sample pair and minimizing a similarity between corresponding image features of face images in different sample pairs, and minimizing a deviation between a detection result corresponding to the second image and a label result corresponding to the counterfeit face image.
The present specification provides an apparatus for authentication, comprising:
an acquisition module, configured to acquire a face image of a user;
a detection module, configured to input the face image into a face detection model to obtain a face image detection result corresponding to the face image, wherein the face detection model is obtained by training according to the above model training method;
and a verification module, configured to verify the identity of the user according to the face image detection result.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above model training method or identity verification method.
The present specification provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the above model training method or identity verification method.
At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:
from the above method, in the model training method and the identity verification method provided in the present specification, a real face image and a fake face image corresponding to the real face image are obtained, and image enhancement is performed on the real face image and a part of image areas in the fake face image corresponding to the real face image, so as to obtain a first image and a second image. Then, according to the real face image and the first image, a first sample pair is constructed, and according to the fake face image and the second image, a second sample pair is constructed, and the face detection model to be trained is utilized to detect and determine the image characteristics of the first sample pair and the second sample pair, so that the similarity between the corresponding image characteristics of the face image in the same sample pair is maximized, the similarity between the corresponding image characteristics of the face image in different sample pairs is minimized as an optimization target, and the face detection model to be trained is trained. And finally, carrying out identity verification on the face image of the user by using the trained face detection model.
From the above, it can be seen that, in the model training method and the identity verification method provided in the present specification, after the image enhancement is performed on the real face image and the fake face image corresponding to the real face image, a corresponding first image and a corresponding second image are obtained, and the training optimization is performed on the face detection model to be trained by using the first sample pair and the second sample pair formed by the images. The face detection model capable of accurately detecting whether the face image is a malicious fake image or not can be obtained through the method, and has stronger robustness when facing to the attack of the faces of various different types through the training process of enhancing the face image, the detection and identification precision of the fake image is higher, the leakage of user information is effectively prevented, and the safety of personal information of the user is ensured.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the specification and, together with their description, serve to explain the specification without unduly limiting it. In the drawings:
FIG. 1 is a flow chart of a method of model training provided in the present specification;
FIGS. 2 (a), 2 (b), and 2 (c) are schematic diagrams of an image enhancement process provided in the present specification;
FIG. 3 is a flow chart of a method of identity verification provided in the present specification;
FIG. 4 is a schematic diagram of a model training apparatus provided herein;
FIG. 5 is a schematic diagram of an apparatus for authentication provided herein;
fig. 6 is a schematic structural view of an electronic device corresponding to fig. 1 and 3 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for model training provided in the present specification, including the following steps:
S101: and acquiring a real face image and a fake face image corresponding to the real face image.
S102: and carrying out image enhancement on the partial image area in the real face image to obtain a first image, and carrying out image enhancement on the partial image area in the fake face image to obtain a second image.
With the continuous development of science and technology, face recognition has gradually become an integral part of daily life, such as face payment in online shopping and transfers and face-scanning entry at stations when traveling. However, the various face attacks that have arisen alongside it pose a serious security threat to the personal data of users, such as private information and property information. Therefore, effectively and accurately identifying and intercepting face attacks, such as forged and counterfeit face data, during face verification is of great importance.
To this end, this specification provides a model training method and an identity verification method. The execution subject of the methods provided in this specification may be a terminal device such as a desktop computer or a notebook computer, or a server; it may also be a subject in software form, such as a client installed in the terminal device. For convenience of explanation, the model training method and the identity verification method are described below with the terminal device as the execution subject.
On this basis, a terminal device applying the model training method and identity verification method provided in this specification can perform face recognition on a face image input to it and thereby determine whether the input face image is a forged face image. If the detected face image is the user's real face image, the user passes identity verification and the relevant permissions corresponding to the user's information are opened, so that the user can proceed with the next business operation; if the detected face image is a forged face image, the user is determined not to pass identity verification.
The face recognition task executed by the terminal device depends on the actual scenario. For example, in a daily shopping scenario, the terminal device can accurately recognize the face image acquired during payment and intercept forged face images used to impersonate others for payment, ensuring the security of user accounts while avoiding malicious loss of user property. As another example, where an access control system in an important location admits entry only by face scan, the terminal device can accurately identify the face image of anyone seeking to enter the target location and bar unauthorized persons presenting forged face images from entering at will, improving the protective capability of the target location and the security of its private information.
The method provided in the specification mainly comprises two stages, namely a model training stage and an actual application stage, wherein in the model training stage, a terminal device can acquire a real face image and a fake face image corresponding to the real face image, further, partial image areas are selected from the real face image and the fake face image for image enhancement, the real face image after image enhancement is used as a first image, and the fake face image after image enhancement is used as a second image.
Selecting partial image regions for image enhancement specifically means dividing the acquired face image in equal proportion into image regions of the same size, randomly selecting a portion of the divided regions according to a preset ratio, and applying a geometric transformation to them; the geometric transformation specifically includes operations such as rotation, translation, and scaling. This can be written as

$$\tilde{x} = \mathcal{A}(x)$$

where $x$ denotes the original face image that has not been enhanced and $\tilde{x}$ denotes the enhanced face image after image enhancement. Since $x$ is divided in equal proportion into image regions of the same size $s$, $x$ is in fact a set of image regions of size $s$:

$$x = \{x_1, x_2, \ldots, x_n\}$$

where $x_i$ is the $i$-th divided image region. In the same way, $\tilde{x} = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n\}$.

The terminal device randomly selects image regions of the divided original face image according to a preset ratio $r$ and applies a geometric transformation function $T(\cdot)$ to each selected region, i.e. $\tilde{x}_i = T(x_i)$ for each selected region and $\tilde{x}_i = x_i$ otherwise, to obtain $\tilde{x}$. To facilitate the description of how the face image is enhanced, a schematic diagram of the image enhancement process is described below, as shown in fig. 2.
Fig. 2 (a), fig. 2 (b), and fig. 2 (c) are schematic diagrams of an image enhancement process provided in the present specification.
In the image enhancement process shown in fig. 2 (a) to 2 (c), after the original face image is input to the terminal device, it is divided in equal proportion into 16 image regions, as shown in fig. 2 (a). According to the preset selection ratio r = 0.5, 8 of the divided image regions are selected, shown as the gray regions in fig. 2 (b). The selected regions are then geometrically transformed to enhance the original face image; the enhanced face image is shown in fig. 2 (c).
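The patch-level augmentation described above can be sketched in a few lines. The following is a minimal illustration, not the patent's implementation: the 4x4 grid, the transform parameter ranges, and the use of OpenCV are all assumptions.

```python
import random
import numpy as np
import cv2


def enhance_face_image(image: np.ndarray, grid: int = 4, r: float = 0.5) -> np.ndarray:
    """Divide `image` into grid x grid equal-size regions, randomly select a
    fraction r of them, and apply a random geometric transformation
    (rotation / scaling / translation) to each selected region in place."""
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    out = image.copy()
    cells = [(row, col) for row in range(grid) for col in range(grid)]
    for row, col in random.sample(cells, int(r * len(cells))):
        y, x = row * ph, col * pw
        patch = out[y:y + ph, x:x + pw]
        angle = random.uniform(-90, 90)         # rotation (assumed range)
        scale = random.uniform(0.8, 1.2)        # scaling (assumed range)
        tx, ty = random.uniform(-2, 2), random.uniform(-2, 2)  # translation
        m = cv2.getRotationMatrix2D((pw / 2, ph / 2), angle, scale)
        m[:, 2] += (tx, ty)
        out[y:y + ph, x:x + pw] = cv2.warpAffine(patch, m, (pw, ph))
    return out
```

With grid = 4 and r = 0.5, this reproduces the fig. 2 example: 16 regions, 8 of which are transformed.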
It should be noted that a complete face image acquired by the terminal device generally contains many different types of information, including useless information unrelated to the face recognition task, such as facial expression information and non-facial information such as accessories. Such useless information is often tied to the structural integrity of the complete face image and generally does not bring a good training effect to the face detection model to be trained. The face image also contains face-related texture information, material information, and the like, which have a higher training benefit for the model to be trained and can effectively improve its face recognition capability. Therefore, enhancing the whole face image by geometrically transforming some of its image regions deliberately breaks the overall structural integrity of the face image while preserving information such as facial texture and material, thereby reducing the influence of useless information on the training of the face detection model to be trained and increasing the training benefit of information such as texture and material.
S103: a first sample pair is constructed from the real face image and the first image, and a second sample pair is constructed from the counterfeit face image and the second image.
In this specification, the terminal device combines the real face image and its image-enhanced counterpart, the first image, into a first sample pair, and combines the fake face image and its image-enhanced counterpart, the second image, into a second sample pair.
The first and second sample pairs correspond to each other: the real face image in a first sample pair is the real face image to which the fake face image in the corresponding second sample pair corresponds. For example, suppose two different real face images A1 and A2 and their corresponding fake face images B1 and B2 are input to the terminal device at the same time. After image enhancement, the first images corresponding to A1 and A2 are A1' and A2', and the second images corresponding to B1 and B2 are B1' and B2'. The terminal device takes (A1, A1') as a first sample pair, whose corresponding second sample pair is (B1, B1'), and takes (A2, A2') as another first sample pair, whose corresponding second sample pair is (B2, B2'). The first sample pair (A1, A1') has no relation to the second sample pair (B2, B2'), and the two do not influence each other in subsequent training.
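As a rough sketch of this pairing step (the names and data layout are illustrative assumptions, reusing the enhance_face_image sketch above):

```python
def build_sample_pairs(real_images, fake_images, r: float = 0.5):
    """real_images[k] and fake_images[k] are assumed to correspond.
    Returns aligned lists: first_pairs[k] is (real, enhanced real) and
    second_pairs[k] is (fake, enhanced fake) for the same identity k."""
    first_pairs = [(x, enhance_face_image(x, r=r)) for x in real_images]
    second_pairs = [(x, enhance_face_image(x, r=r)) for x in fake_images]
    return first_pairs, second_pairs
```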
S104: and inputting the first sample pair and the second sample pair into a face detection model to be trained, so as to determine the image characteristics corresponding to each face image contained in the first sample pair and the image characteristics corresponding to each face image contained in the second sample pair through the face detection model.
S105: and training the face detection model by taking the similarity between the corresponding image features of the face images in the same sample pair as an optimization target for maximizing the similarity between the corresponding image features of the face images in different sample pairs and minimizing the similarity between the corresponding image features of the face images in different sample pairs.
In this specification, the terminal device may extract the image features of the face images contained in the first and second sample pairs through the face detection model, and may train and optimize the model by maximizing the similarity between the image features of the face images contained in a single sample pair (a first sample pair or a second sample pair) while minimizing the similarity between the image features of the face images contained in a first sample pair and those contained in its corresponding second sample pair.
Maximizing the similarity between the image features of the face images contained in a single sample pair specifically means that, for each face image acquired by the terminal device, the face detection model extracts its features, and the distance between its image feature and that of the corresponding face image in the sample pair (first or second) where it is located is reduced, so that the image features of the face images within a single sample pair become more similar.
For example, suppose the face detection model in the terminal device acquires a first sample pair A containing a real face image a1 and its corresponding first image a2, and the corresponding second sample pair B containing the fake face image b1 corresponding to a1 and the second image b2 corresponding to b1. Maximizing the similarity between the image features of the face images contained in a single sample pair here means reducing the distance between the image features of the real face image a1 and the first image a2 in sample pair A, and reducing the distance between the image features of the fake face image b1 and the second image b2 in sample pair B, so that the image features of a1 and a2 become more similar and the image features of b1 and b2 become more similar.
Further, the terminal device may also minimize the similarity between the image features of the face images contained in a first sample pair and those contained in its corresponding second sample pair. For each real face image, the distance between the image features of the first sample pair where it is located and the image features of the second sample pair where its corresponding fake face image is located is increased, so that the image features of the face images in the first sample pair differ more strongly from those in the second sample pair, and the distinction between real and fake face images becomes more pronounced.
The above method can be expressed by the following formulas. Let

$$X = \{x_1, \ldots, x_N, x_{N+1}, \ldots, x_{2N}\}$$

denote the 2N face images, including the enhanced face images, obtained by the face detection model in the terminal device. The first N face images are the original face images that have not been enhanced, namely the real face images and fake face images; the last N are the enhanced face images, namely the first images and second images. $(x_i, x_{i+N})$ denotes the first or second sample pair formed by the $i$-th real or fake face image $x_i$ and its corresponding first or second image $x_{i+N}$. With $h_i$ denoting the image feature of the $i$-th face image, the distance between the image features $(h_i, h_{i+N})$ of the face images in the same sample pair is reduced, so that their image features become more similar.

Further, for each sample pair $(x_i, x_{i+N})$, the terminal device may increase the distance between its image features $(h_i, h_{i+N})$ and the image features of the first or second sample pair corresponding to it, so that the difference in image features between face images in different sample pairs becomes stronger.
It should be noted that the similarity between the image features of face images can be measured in various ways: for example, the distance between two image features may be determined by computing the cosine distance between them, or by computing the Euclidean distance between them. This is not specifically limited in this specification.
In this specification, the terminal device may compute a loss value from the similarity between image features and use it to train the face detection model. The loss function adopted by the terminal device can be written as

$$L_{ssa} = -\frac{1}{2N}\sum_{i=1}^{2N} \log \frac{\exp\!\left(\mathrm{sim}(h_i, h_{i'})/\tau\right)}{\exp\!\left(\mathrm{sim}(h_i, h_{i'})/\tau\right) + \sum_{j:\, y_j \neq y_i} \exp\!\left(\mathrm{sim}(h_i, h_j)/\tau\right)}, \qquad i' = ((i + N - 1) \bmod 2N) + 1$$

where $\tau$ is a first hyperparameter preset in the terminal device, whose specific value may be determined according to the actual situation and is not specifically limited in this specification. $L_{ssa}$ is the loss value computed from the similarity between image features. $h_i$ denotes the image feature of an original image (a real or fake face image), and $h_{i'}$ the image feature of the enhanced image corresponding to that original image (if the original image is a real face image, its corresponding enhanced image is the enhanced real face image; if it is a fake face image, its corresponding enhanced image is the enhanced fake face image). $h_j$ denotes the image feature of an image in another sample pair, and $y_i \neq y_j$ indicates that images $i$ and $j$ come from different sample pairs.
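The loss above can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions, not the patent's code: the batch is assumed to hold the N original images first and their N enhanced copies after (so image i is paired with image i+N), cosine similarity is used, all images from other sample pairs are taken as negatives, and the 1/2N averaging follows the reconstruction above.

```python
import torch
import torch.nn.functional as F


def ssa_loss(h: torch.Tensor, pair_ids: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """h: (2N, d) image features; pair_ids: (2N,) sample-pair index shared by an
    original image and its enhanced copy. Pulls features within a sample pair
    together, pushes features of different sample pairs apart."""
    two_n = h.shape[0]
    n = two_n // 2
    h = F.normalize(h, dim=1)               # cosine similarity as a dot product
    sim = h @ h.t() / tau                   # (2N, 2N) temperature-scaled logits
    idx = torch.arange(two_n)
    partner = (idx + n) % two_n             # 0-based form of i' = ((i+N-1) mod 2N) + 1
    pos = sim[idx, partner]                 # similarity to the paired image
    diff = pair_ids.unsqueeze(0) != pair_ids.unsqueeze(1)  # negatives: other pairs
    neg = torch.where(diff, sim, torch.full_like(sim, float("-inf")))
    denom = torch.logsumexp(torch.cat([pos.unsqueeze(1), neg], dim=1), dim=1)
    return (denom - pos).mean()             # mean of -log(pos / (pos + negatives))
```

For the assumed batch layout, pair_ids would simply be torch.arange(N) repeated twice, so that each original image shares an index with its enhanced copy.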
After the face detection model is trained with this feature-similarity objective, it has a stronger ability to detect local detail information such as material and texture in face images, and it detects features of face images that have undergone special processing such as geometric transformation with higher accuracy and robustness, improving the security of the whole identity verification pipeline.
In this specification, before training the face detection model with the optimization objective of maximizing the similarity between the image features corresponding to face images in the same sample pair and minimizing the similarity between the image features corresponding to face images in different sample pairs, the terminal device may further input the first image and the second image into the face detection model to obtain the detection result corresponding to the first image and the detection result corresponding to the second image.
On this basis, the face detection model can additionally be trained with the optimization objective of minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the fake face image.
In the model training stage, each face image input to the face detection model to be trained carries a preset label result, which reflects whether the face image is a real face image or a fake face image. For the original face images, the preset label result can be expressed as

$$y_i \in \{0, 1\}$$

where $y_i$ denotes the label result corresponding to a real or fake face image among the 2N face images, taking the value 0 or 1: $y_i = 0$ indicates a real face image, and $y_i = 1$ indicates a fake face image.
According to the label results $y_i \in \{0, 1\}$ corresponding to the real and fake face images, when the face detection model detects an input face image, it outputs the first value, $y_i = 0$, if it determines the input to be a real face image, and the second value, $y_i = 1$, if it determines the input to be a fake face image.
The enhanced image of the real face image (i.e., the first image) and the enhanced image of the fake face image (i.e., the second image) also each have a preset label result. The second image uses the same label result as the fake face image, i.e., a label indicating that the second image is a fake face image. The label result of the first image can be represented by a second hyperparameter set in the terminal device:

$$y_i = \gamma, \quad \gamma \in (0, 1)$$

That is, the label result of the first image is a value between 0 and 1, and its specific value is related to the degree of enhancement of the first image. Specifically, the label result of the first image is positively correlated with the preset selection ratio $r$ of enhanced image regions used by the terminal device when enhancing the real face image. As noted above, $\gamma$ lies between 0 and 1. If the preset selection ratio $r$ is larger and closer to 1, i.e., the image enhancement is stronger, $\gamma$ also increases toward 1, i.e., $y_i = \gamma \approx 1$; in this case the face detection model is in effect expected, when performing face detection on the first image, to produce the detection result that the first image is a fake face image. Conversely, if the preset selection ratio $r$ is smaller and approaches 0, i.e., the image enhancement is weaker, $\gamma$ gradually decreases toward 0, i.e., $y_i = \gamma \approx 0$, and the face detection model is expected, when performing face detection on the first image, to produce the detection result that the first image is a real face image.
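A small sketch of this labeling scheme (the particular mapping gamma = r is an assumption; the text only fixes the positive correlation and the range (0, 1)):

```python
def label_result(is_fake: bool, enhanced: bool, r: float = 0.0) -> float:
    """Label for one image: 0 for a real original, 1 for a fake original or an
    enhanced fake (second image), gamma in (0, 1) for an enhanced real image
    (first image), here taken as gamma = r, the region-selection ratio."""
    if is_fake:
        return 1.0
    return r if enhanced else 0.0
```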
In this specification, the terminal device may compute a loss from the deviation between the detection result corresponding to the first image and the label result corresponding to the first image, and the deviation between the detection result corresponding to the second image and the label result corresponding to the fake face image, and train the face detection model with the resulting loss value, which can be written as the binary cross-entropy

$$L_{bce} = -\frac{1}{2N}\sum_{i=1}^{2N}\left[\, y_i \log s_i + (1 - y_i)\log(1 - s_i)\,\right]$$

where $s_i$, taking values between 0 and 1, denotes the probability that the image input to the face detection model is a fake face image, and $L_{bce}$ denotes the loss value obtained by computing the deviation between the detection results of the images and their label results.
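Since the labels include the soft value gamma, the deviation term is just binary cross-entropy with non-binary targets, which PyTorch supports directly; a one-line sketch, reusing the imports from the sketch above:

```python
def bce_loss(s: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """s: (2N,) predicted fake-face probabilities in (0, 1); y: (2N,) label
    results in [0, 1], including gamma for enhanced real images."""
    return F.binary_cross_entropy(s, y)
```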
In practical terms, a real face image is damaged to a greater or lesser extent when it is forged. Enhancing the real face image to different degrees is therefore effectively equivalent to damaging it to different degrees, and giving the first image a label result that lies between the detection result of a real face image (i.e., 0) and that of a fake face image (i.e., 1) lets the face detection model learn the trace features left in face images damaged to different degrees, so that in practical applications it can effectively identify forged face images and safeguard users' information security.
Of course, in practical application, the terminal device may not only consider the similarity between the image features, but also consider the detection result of the face image, and comprehensively train the face detection model, that is, the terminal device may combine the two losses to train the face detection model.
The terminal equipment can train the face detection model in various combination modes. For example, the terminal device may train the face detection model by maximizing the similarity between the image features corresponding to the face images in the same sample pair, minimizing the similarity between the image features corresponding to the face images in different sample pairs, minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the second image.
For another example, the enhanced image of the real face image is considered to more effectively improve the detection capability of the face detection model, so that the terminal device can train the face detection model by only considering the deviation between the detection result of the first image and the label result corresponding to the first image besides considering the similarity between the image features. That is, the terminal device may train the face detection model by maximizing the similarity between the image features corresponding to the face images in the same sample pair, minimizing the similarity between the image features corresponding to the face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image.
Of course, the terminal device may only consider the deviation between the detection result of the second image and the label result corresponding to the second image, in addition to the similarity between the image features, to train the face detection model, that is, train the face detection model by maximizing the similarity between the image features corresponding to the face image in the same sample pair, minimizing the similarity between the image features corresponding to the face image in different sample pairs, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the counterfeit face image.
In summary, in the model training stage of this specification, the total loss function of the overall training process of the face detection model can be expressed as $L = L_{ssa} + L_{bce}$. The terminal device iteratively trains the model with the total loss value obtained from this function to obtain a face detection model capable of accurate identity verification. Training the face detection model with the two objectives combined improves its ability to detect local detail information such as material and texture in face images as well as its ability to recognize and intercept damaged face images, further improving the security and reliability of the identity verification process for users.
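Putting the pieces together, one training step under the same assumptions as the sketches above might look as follows; `model` is assumed to return both the per-image features h and the fake-face probabilities s.

```python
def train_step(model, optimizer, batch, pair_ids, labels, tau=0.1):
    """batch: (2N, C, H, W) images, originals first, enhanced copies after;
    labels: (2N,) label results including gamma for enhanced real images."""
    h, s = model(batch)                     # features and probabilities
    loss = ssa_loss(h, pair_ids, tau) + F.binary_cross_entropy(s, labels)  # L = L_ssa + L_bce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```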
Besides the training mode, the terminal equipment can further train the face detection model by combining a detection result obtained by detecting the real face image and a detection result obtained by detecting the fake face image. That is, in addition to the above-mentioned loss, the terminal device may further add an optimization objective to minimize a deviation between the detection result corresponding to the real face image and the label result corresponding to the real face image, and/or minimize a deviation between the detection result corresponding to the counterfeit face image and the label result corresponding to the counterfeit face image, so as to train the face detection model, so as to further improve the detection capability of the face detection model.
It should be noted that this specification does not specifically limit the actual network structure of the face detection model in the terminal device. The model is highly extensible in its actual composition: a common network structure from the prior art may be used to extract the features of face images, or a more advanced network structure, such as the Transformer-based ViT, may be adopted to form the face detection model.
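For concreteness, a detector with the two heads assumed by the sketches above could be wired up as follows; the backbone, layer sizes, and head names are illustrative, not prescribed by the specification.

```python
import torch.nn as nn


class FaceDetector(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone                # any trunk mapping images to (B, feat_dim)
        self.proj = nn.Linear(feat_dim, proj_dim)   # feature head used by L_ssa
        self.cls = nn.Linear(feat_dim, 1)           # real/fake head used by L_bce

    def forward(self, x):
        f = self.backbone(x)                    # (B, feat_dim) pooled features
        h = self.proj(f)                        # image features h_i
        s = torch.sigmoid(self.cls(f)).squeeze(1)   # fake-face probability s_i
        return h, s
```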
The method provided in the present specification is mainly divided into two phases, a model training phase and an actual application phase. The model training stage is mainly used for acquiring the face detection model with the face detection function after model training, so that the face detection model can be applied to a server or terminal equipment used in the identity verification process in the actual application stage. In order to facilitate introduction of the authentication method, a flow diagram of the authentication method will be described below, as shown in fig. 3.
Fig. 3 is a flow chart of a method of identity verification provided in the present specification, including the following steps:
s301: and acquiring a face image of the user.
To date, the rapid progress of various emerging technologies has brought convenience to people's daily lives, and face recognition is among the most influential. While the wide application of face recognition brings convenience, the face attacks it has given rise to seriously threaten the privacy and security of users' personal data. Accurately filtering out fake face information during identity verification while letting real users pass is therefore crucial to protecting user information.
For this reason, the present specification provides a method for authentication, where the execution subject adopted in the method provided in the present specification may be a server, or may be a terminal device such as a desktop computer, a notebook computer, or the like, and in addition thereto, the execution subject in the present specification may also be a subject in the form of software, such as a client installed in the terminal device, or the like. For convenience of explanation, the method of authentication provided will be explained below with the server as the execution subject.
On this basis, a server applying the identity verification method provided in this specification can perform face detection on the acquired face image of a user to determine whether it is a real face image. If it is, the face image is matched against the face images of users stored in a database to determine the user's identity, and the corresponding operation permissions for the next step are opened for the user.
The identity verification task executed by the server depends on the actual scenario. For example, in a scenario of entering and leaving a train station by face scan, the server can detect the face image of a user acquired at the station gate through the face detection model and, if it is a real face image, match it against the face images of users stored in a database to determine the passenger's information. As another example, in a scenario of fast checkout through face-scan payment at a supermarket, the server can perform face detection on the acquired face image through the face detection model and, if it is confirmed to be a real face image, match it against the stored face images of users to locate the specific debit account and perform the payment operation.
S302: and inputting the face image into a face detection model to obtain a face image detection result corresponding to the face image, wherein the face detection model is obtained by training according to the model training method.
S303: and carrying out identity authentication on the user according to the face image detection result.
In the specification, a trained face detection model is preset in a server, and face detection is performed on an acquired face image of a user by using the trained face detection model to obtain a detection result of the face image. For each acquired face image, the detection result of the face image may be used to indicate whether the face image is a real face image of the user.
If the detection result of the face image shows that the face image is a real face image, performing the next authentication process on the face image, thereby determining the identity information of the user and opening the next operation authority for the user; if the detection result of the face image indicates that the face image is not a real face image, namely, a fake face image, the face image is directly judged to not pass through the authentication link.
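A minimal sketch of this decision step (the 0.5 threshold and the model interface follow the assumptions of the earlier sketches, not the specification):

```python
def verify_face(model: FaceDetector, face_image: torch.Tensor, threshold: float = 0.5) -> bool:
    """Returns True if the image is judged a real face, so that the subsequent
    identity-matching step may proceed; False rejects it as a forgery."""
    model.eval()
    with torch.no_grad():
        _, s = model(face_image.unsqueeze(0))   # add batch dimension
    return bool(s.item() < threshold)
```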
As can be seen from the above, in the model training method and identity verification method provided in this specification, image enhancement of the real face image and its corresponding fake face image yields the corresponding first and second images, and the first and second sample pairs formed from these images are used to train the face detection model to be trained, improving its ability to detect local detail information such as material and texture in face images.
In addition, during model training the first image is given a label result lying between the detection result of a real face image and that of a fake face image. This label result lets the face detection model learn the trace features left in face images damaged to different degrees, so that the model's detection capability on face images is effectively improved: it is more robust, achieves higher detection and recognition accuracy on various forged images, effectively prevents the leakage of user information, and safeguards the security of users' personal information.
The foregoing is a method of one or more implementations of the present specification, and the present specification further provides a corresponding apparatus for model training based on the same concept, as shown in fig. 4.
Fig. 4 is a schematic diagram of a model training apparatus provided in the present specification, including:
an obtaining module 401, configured to obtain a real face image and a fake face image corresponding to the real face image;
an enhancement module 402, configured to perform image enhancement on a part of the image area in the real face image to obtain a first image, and perform image enhancement on a part of the image area in the counterfeit face image to obtain a second image;
A construction module 403, configured to construct a first sample pair according to the real face image and the first image, and construct a second sample pair according to the fake face image and the second image;
a detection module 404, configured to input the first sample pair and the second sample pair into a face detection model to be trained, so as to determine, through the face detection model, an image feature corresponding to each face image included in the first sample pair, and determine an image feature corresponding to each face image included in the second sample pair;
the training module 405 is configured to train the face detection model with the objective of maximizing the similarity between the corresponding image features of the face images in the same sample pair and minimizing the similarity between the corresponding image features of the face images in different sample pairs.
Optionally, the enhancement module 402 is specifically configured to divide the real face image into image regions and divide the fake face image into image regions; select, according to a preset image enhancement ratio, a corresponding number of image regions from those divided from the real face image and perform image enhancement on them to obtain the first image; and select, according to the preset image enhancement ratio, a corresponding number of image regions from those divided from the fake face image and perform image enhancement on them to obtain the second image.
Optionally, the detection module 404 is specifically configured to input the first image into the face detection model to obtain a detection result corresponding to the first image;
the training module 405 is specifically configured to train the face detection model with an optimization objective that maximizes a similarity between corresponding image features of face images in the same sample pair, minimizes a similarity between corresponding image features of face images in different sample pairs, and minimizes a deviation between a detection result corresponding to the first image and a label result corresponding to the first image.
Optionally, the detection module 404 is specifically configured such that the face detection model outputs a first value if it determines that the input face image is a real face image and a second value if it determines that the input face image is a fake face image, and the higher the degree of enhancement of the first image relative to the real face image, the closer the label result corresponding to the first image is to the second value.
Optionally, the detection module 404 is specifically configured to input the second image into the face detection model to obtain a detection result corresponding to the second image;
the training module 405 is specifically configured to train the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair, minimizing the similarity between corresponding image features of face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the fake face image.
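Putting the optional terms together, the overall training objective could look like the following sketch, where the contrastive term comes from the earlier `pairwise_contrastive_loss` sketch, binary cross-entropy measures the deviation between detection results and labels, and the weights `alpha` and `beta` are illustrative assumptions.

```python
import torch.nn.functional as F

def total_training_loss(contrastive_term, pred_first, label_first,
                        pred_second, fake_label, alpha=1.0, beta=1.0):
    """Combine the pairwise contrastive term with the label-deviation
    terms: the first image is supervised by its interpolated label, the
    second image by the label of the fake face image. Predictions and
    labels are float tensors in [0, 1]."""
    dev_first = F.binary_cross_entropy(pred_first, label_first)
    dev_second = F.binary_cross_entropy(pred_second, fake_label)
    return contrastive_term + alpha * dev_first + beta * dev_second
```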
Based on the same concept, the present specification also provides a corresponding identity verification apparatus, as shown in fig. 5.
Fig. 5 is a schematic diagram of an identity verification apparatus provided in the present specification, including:
an acquisition module 501, configured to acquire a face image of a user;
a detection module 502, configured to input the face image into a face detection model to obtain a face image detection result corresponding to the face image, wherein the face detection model is trained according to the above model training method;
a verification module 503, configured to perform identity verification on the user according to the face image detection result.
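For completeness, here is a hypothetical sketch of the verification flow described by the three modules above. The 0/1 output convention and the decision threshold are assumptions carried over from the earlier sketches, not details from the patent.

```python
import torch

@torch.no_grad()
def verify_user(face_image, face_detection_model, threshold=0.5):
    """Run the trained face detection model on a user's (C, H, W) face
    image and gate identity verification on the detection result."""
    face_detection_model.eval()
    score = face_detection_model(face_image.unsqueeze(0)).item()
    # Under the assumed convention, low scores indicate a real face.
    return score < threshold
```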
The present specification also provides a computer readable storage medium storing a computer program, where the computer program is operable to perform the model training method shown in fig. 1 and the identity verification method shown in fig. 3 above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 6. At the hardware level, as illustrated in fig. 6, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course further include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, so as to implement the model training method shown in fig. 1 or the identity verification method shown in fig. 3. Of course, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the above processing flows is not limited to logic units, and may also be hardware or logic devices.
In the 1990s, it was still possible to clearly distinguish whether an improvement to a technology was an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology has developed, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (for example, a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by a user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled is written in a specific programming language called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functionality in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for implementing various functions may also be regarded as structures within the hardware component. Or even the means for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described by dividing its functions into various modules. Of course, when the present specification is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer readable medium.
Computer readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, reference may be made to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims (14)

1. A method of model training, comprising:
acquiring a real face image and a fake face image corresponding to the real face image;
image enhancement is carried out on partial image areas in the real face image to obtain a first image, and image enhancement is carried out on partial image areas in the fake face image to obtain a second image;
constructing a first sample pair according to the real face image and the first image, and constructing a second sample pair according to the fake face image and the second image;
Inputting the first sample pair and the second sample pair into a face detection model to be trained, so as to determine the image characteristics corresponding to each face image contained in the first sample pair and the image characteristics corresponding to each face image contained in the second sample pair through the face detection model;
and training the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair and minimizing the similarity between corresponding image features of face images in different sample pairs.
2. The method of claim 1, wherein the image enhancement is performed on a part of the image area in the real face image to obtain a first image, and the image enhancement is performed on a part of the image area in the fake face image to obtain a second image, specifically including:
dividing the real face image into image areas, and dividing the fake face image into image areas;
according to a preset image enhancement proportion, selecting, from the image areas divided in the real face image, a number of partial image areas corresponding to the proportion and performing image enhancement on them to obtain the first image;
and according to the preset image enhancement proportion, selecting, from the image areas divided in the fake face image, a number of partial image areas corresponding to the proportion and performing image enhancement on them to obtain the second image.
3. The method of claim 1, further comprising, prior to training the face detection model with the objective of maximizing similarity between corresponding image features of face images in the same sample pair and minimizing similarity between corresponding image features of face images in different sample pairs:
inputting the first image into the face detection model to obtain a detection result corresponding to the first image;
the training of the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair and minimizing the similarity between corresponding image features of face images in different sample pairs specifically comprises:
training the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair, minimizing the similarity between corresponding image features of face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image.
4. The method according to claim 3, wherein a first numerical value is output if the face detection model determines that the input face image is a real face image, and a second numerical value is output if the face detection model determines that the input face image is a fake face image; the higher the enhancement degree of the first image relative to the real face image, the closer the label result corresponding to the first image is to the second numerical value.
5. The method according to claim 1 or 3, further comprising, before training the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair, minimizing the similarity between corresponding image features of face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image:
inputting the second image into the face detection model to obtain a detection result corresponding to the second image;
the training of the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair, minimizing the similarity between corresponding image features of face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the first image and the label result corresponding to the first image specifically comprises:
training the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair, minimizing the similarity between corresponding image features of face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the fake face image.
6. A method of identity verification, comprising:
acquiring a face image of a user;
inputting the face image into a face detection model to obtain a face image detection result corresponding to the face image, wherein the face detection model is obtained by training according to the method of any one of claims 1-5;
and carrying out identity authentication on the user according to the face image detection result.
7. An apparatus for model training, comprising:
the acquisition module is used for acquiring the real face image and the fake face image corresponding to the real face image;
the enhancement module is used for carrying out image enhancement on partial image areas in the real face image to obtain a first image, and carrying out image enhancement on partial image areas in the fake face image to obtain a second image;
The construction module is used for constructing a first sample pair according to the real face image and the first image, and constructing a second sample pair according to the fake face image and the second image;
the detection module is used for inputting the first sample pair and the second sample pair into a face detection model to be trained, so as to determine the image characteristics corresponding to each face image contained in the first sample pair and the image characteristics corresponding to each face image contained in the second sample pair through the face detection model;
and a training module, configured to train the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair and minimizing the similarity between corresponding image features of face images in different sample pairs.
8. The apparatus of claim 7, wherein the enhancement module is specifically configured to: divide the real face image into image areas, and divide the fake face image into image areas; according to a preset image enhancement proportion, select, from the image areas divided in the real face image, a number of partial image areas corresponding to the proportion and perform image enhancement on them to obtain the first image; and according to the preset image enhancement proportion, select, from the image areas divided in the fake face image, a number of partial image areas corresponding to the proportion and perform image enhancement on them to obtain the second image.
9. The apparatus of claim 7, wherein the detection module is specifically configured to input the first image into the face detection model to obtain a detection result corresponding to the first image;
the training module is specifically configured to train the face detection model with an optimization objective of maximizing a similarity between corresponding image features of face images in the same sample pair, minimizing a similarity between corresponding image features of face images in different sample pairs, and minimizing a deviation between a detection result corresponding to the first image and a label result corresponding to the first image.
10. The apparatus of claim 9, wherein a first value is output if the face detection model determines that the input face image is a real face image, and a second value is output if the face detection model determines that the input face image is a fake face image; the higher the enhancement degree of the first image relative to the real face image, the closer the label result corresponding to the first image is to the second value.
11. The apparatus of claim 7 or 9, wherein the detection module is specifically configured to input the second image into the face detection model to obtain a detection result corresponding to the second image;
the training module is specifically configured to train the face detection model with the optimization objective of maximizing the similarity between corresponding image features of face images in the same sample pair, minimizing the similarity between corresponding image features of face images in different sample pairs, and minimizing the deviation between the detection result corresponding to the second image and the label result corresponding to the fake face image.
12. An apparatus for authentication, comprising:
an acquisition module, configured to acquire a face image of a user;
a detection module, configured to input the face image into a face detection model to obtain a face image detection result corresponding to the face image, wherein the face detection model is trained according to the method of any one of claims 1-5;
a verification module, configured to perform identity verification on the user according to the face image detection result.
13. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-6.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-6 when the program is executed.
CN202311412535.5A 2023-10-27 2023-10-27 Model training method, identity verification method and device Pending CN117373091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311412535.5A CN117373091A (en) 2023-10-27 2023-10-27 Model training method, identity verification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311412535.5A CN117373091A (en) 2023-10-27 2023-10-27 Model training method, identity verification method and device

Publications (1)

Publication Number Publication Date
CN117373091A true CN117373091A (en) 2024-01-09

Family

ID=89402038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311412535.5A Pending CN117373091A (en) 2023-10-27 2023-10-27 Model training method, identity verification method and device

Country Status (1)

Country Link
CN (1) CN117373091A (en)

Similar Documents

Publication Publication Date Title
EP3780541B1 (en) Identity information identification method and device
CN110570200B (en) Payment method and device
EP3968563B1 (en) Privacy protection-based user recognition methods, apparatuses, and devices
CN111400705B (en) Application program detection method, device and equipment
CN111324874B (en) Certificate authenticity identification method and device
CN112800468B (en) Data processing method, device and equipment based on privacy protection
CN113516480A (en) Payment risk identification method, device and equipment
CN113221717B (en) Model construction method, device and equipment based on privacy protection
CN112837202B (en) Watermark image generation and attack tracing method and device based on privacy protection
CN113343295A (en) Image processing method, device, equipment and storage medium based on privacy protection
CN112819156A (en) Data processing method, device and equipment
CN113239852B (en) Privacy image processing method, device and equipment based on privacy protection
CN117373091A (en) Model training method, identity verification method and device
CN112818400B (en) Biological identification method, device and equipment based on privacy protection
US10902106B2 (en) Authentication and authentication mode determination method, apparatus, and electronic device
CN113673374A (en) Face recognition method, device and equipment
CN113239851B (en) Privacy image processing method, device and equipment based on privacy protection
CN111931148A (en) Image processing method and device and electronic equipment
Chen et al. ISO/IEC standards for on-card biometric comparison
CN115840932B (en) Vulnerability restoration method and device, storage medium and electronic equipment
CN111311076B (en) Account risk management method, device, equipment and medium
CN110321758B (en) Risk management and control method and device for biological feature recognition
CN115828171A (en) Method, device, medium and equipment for cooperatively executing business by end cloud
CN117372836A (en) Tamper detection model training method and device, storage medium and electronic equipment
CN114882551A (en) Face recognition processing method, device and equipment based on machine and tool dimensions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination