CN115731620A - Method for detecting adversarial attacks and method for training an adversarial attack detection model - Google Patents


Info

Publication number: CN115731620A
Application number: CN202211376287.9A
Authority: CN (China)
Prior art keywords: images, face, image, face images, reference image
Other languages: Chinese (zh)
Inventor: 曹佳炯
Applicant/Assignee: Alipay Hangzhou Information Technology Co Ltd
Legal status: Pending

Classifications

  • Image Analysis (AREA)

Abstract

Embodiments of this specification disclose a method for detecting adversarial attacks, a method for training an adversarial attack detection model, an apparatus, a storage medium, and an electronic device. Feature extraction is performed on a plurality of face images of a target object collected under different lighting conditions to obtain the reference image features of each face image. Feature reconstruction is then performed on the reference image features of each face image to obtain the reconstructed image features of each face image. Because adversarial attack images fluctuate strongly under different lighting conditions, whether the face images are adversarial attack images can be determined from the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition, thereby achieving detection of adversarial attacks.

Description

Method for detecting adversarial attacks and method for training an adversarial attack detection model
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a method for detecting adversarial attacks, a method for training an adversarial attack detection model, an apparatus, a storage medium, and an electronic device.
Background
With the development of computer technology, face recognition has been widely adopted in recent years; for example, face recognition systems are widely deployed on payment platforms, allowing users to complete payments quickly.
However, while bringing convenience to people's production and life, face recognition systems are also tested by various attacks. The adversarial attack is one of the more threatening attack modes: an identity-forgery attack carried out by wearing or pasting adversarial patterns (such as stickers) on the face during the face recognition stage. Because adversarial attacks are more covert than ordinary liveness attacks, a solution for detecting them is urgently needed.
Disclosure of Invention
This specification provides a method for detecting adversarial attacks, a method for training an adversarial attack detection model, an apparatus, a storage medium, and an electronic device, which can detect adversarial attacks during face recognition and improve the security of face recognition.
In one aspect, an embodiment of this specification provides a method for detecting adversarial attacks, including:
performing feature extraction on a plurality of face images of a target object to obtain reference image features of each face image, where the plurality of face images are collected under different lighting conditions;
performing feature reconstruction on the reference image features of each face image to obtain reconstructed image features of each face image, where a reference image feature and the reconstructed image feature obtained by reconstructing it correspond to different lighting conditions;
and determining whether the plurality of face images are adversarial attack images based on the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition.
In one aspect, an embodiment of this specification provides a method for training an adversarial attack detection model, including:
inputting a plurality of sample face images of a sample object into an adversarial attack detection model, and performing feature extraction on the plurality of sample face images through the model to obtain reference image features of the plurality of sample face images, where the plurality of sample face images are collected under different lighting conditions;
generating, through the adversarial attack detection model, feature response maps of the plurality of sample face images based on the sample face images and their reference image features, where a feature response map represents the correspondence of the reference image features within the sample face image;
and training the adversarial attack detection model based on first difference information between the reference image features of every two sample face images and second difference information between the feature response maps of every two sample face images, where the adversarial attack detection model is used to determine whether face images are adversarial attack images.
In one aspect, an embodiment of this specification provides an apparatus for detecting adversarial attacks, including:
a feature extraction unit, configured to perform feature extraction on a plurality of face images of a target object to obtain reference image features of each face image, where the plurality of face images are collected under different lighting conditions;
a feature reconstruction unit, configured to perform feature reconstruction on the reference image features of each face image to obtain reconstructed image features of each face image, where a reference image feature and the reconstructed image feature obtained by reconstructing it correspond to different lighting conditions;
and an attack detection unit, configured to determine whether the plurality of face images are adversarial attack images based on the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition.
In a possible implementation, the feature extraction unit is configured to input the plurality of face images into an adversarial attack detection model and perform feature extraction on them through the model to obtain the reference image features of each face image.
In a possible implementation, the feature extraction unit is configured to, for any one of the plurality of face images, perform any one of convolution, full connection, and attention coding on the face image through a feature extraction submodel of the adversarial attack detection model to obtain the reference image features of the face image.
In a possible implementation, the feature reconstruction unit is configured to input the reference image features of each face image into the adversarial attack detection model, perform feature reconstruction on them through the model, and output the reconstructed image features of each face image.
In a possible implementation, the feature reconstruction unit is configured to apply multiple full connections to the reference image features of each face image through the adversarial attack detection model and output the reconstructed image features of each face image.
In a possible implementation, the attack detection unit is configured to fuse the similarities between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition to obtain an adversarial attack score for the plurality of face images, and to determine whether the plurality of face images are adversarial attack images based on the adversarial attack score.
In a possible implementation, the attack detection unit is configured to perform either of the following:
determining that the plurality of face images are adversarial attack images when the adversarial attack score is smaller than a score threshold;
determining that the plurality of face images are not adversarial attack images when the adversarial attack score is greater than or equal to the score threshold.
In a possible implementation, the apparatus further includes:
a filtering unit, configured to acquire a plurality of initial images of the target object collected under different lighting conditions, and to filter the plurality of initial images to obtain the plurality of face images.
In a possible implementation, the filtering unit is configured to perform face detection on the plurality of initial images and determine whether each initial image includes a face; delete the initial images that do not include a face to obtain a plurality of face-filtered images; determine an image quality score for each face-filtered image; and delete the face-filtered images whose image quality scores are smaller than a quality score threshold to obtain the plurality of face images.
In one aspect, an embodiment of this specification provides an apparatus for training an adversarial attack detection model, including:
an input unit, configured to input a plurality of sample face images of a sample object into an adversarial attack detection model and perform feature extraction on them through the model to obtain reference image features of the plurality of sample face images, where the plurality of sample face images are collected under different lighting conditions;
a feature response map generation unit, configured to generate, through the adversarial attack detection model, feature response maps of the plurality of sample face images based on the sample face images and their reference image features, where a feature response map represents the correspondence of the reference image features within the sample face image;
and a training unit, configured to train the adversarial attack detection model based on first difference information between the reference image features of every two sample face images and second difference information between the feature response maps of every two sample face images, where the adversarial attack detection model is used to determine whether face images are adversarial attack images.
In a possible implementation, the feature response map generation unit is configured to, for any sample face image among the plurality of sample face images, determine through the adversarial attack detection model the weights of a plurality of pixels in the sample face image based on its reference image features, and to generate the feature response map of the sample face image from the sample face image and these pixel weights.
In a possible implementation, the training unit is configured to construct a first loss function based on the first difference information and the second difference information, and to train a feature extraction submodel of the adversarial attack detection model based on the first loss function, where the feature extraction submodel is used to extract the reference image features of face images.
In a possible implementation, the training unit is further configured to perform identity recognition based on the reference image features of the plurality of sample face images to obtain a predicted identity for each sample face image, and to train the feature extraction submodel of the adversarial attack detection model based on third difference information between the predicted identities of every two sample face images.
In a possible implementation, the training unit is further configured to perform feature reconstruction on the reference image features of a first sample face image through the adversarial attack detection model to obtain reconstructed image features of the first sample face image, where the reconstructed image features of the first sample face image and the reference image features of a second sample face image correspond to the same lighting condition, and to train the adversarial attack detection model based on fourth difference information between the reconstructed image features of the first sample face image and the reference image features of the second sample face image.
In a possible implementation, the training unit is further configured to construct a second loss function based on the fourth difference information and to train a feature reconstruction submodel of the adversarial attack detection model based on the second loss function, where the feature reconstruction submodel is used to perform feature reconstruction.
In one aspect, embodiments of this specification provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the above method.
In one aspect, an embodiment of this specification provides an electronic device including a processor and a memory, where the memory stores a computer program adapted to be loaded by the processor to perform the above method.
In one aspect, embodiments of this specification provide a computer program product including instructions which, when run on a computer or processor, cause the computer or processor to perform the above method.
Through the technical solutions provided by the embodiments of this specification, feature extraction is performed on a plurality of face images of a target object collected under different lighting conditions to obtain the reference image features of each face image, and feature reconstruction is performed on those reference image features to obtain the reconstructed image features of each face image. Because adversarial attack images fluctuate strongly under different lighting conditions, whether the face images are adversarial attack images can be determined from the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition, thereby achieving detection of adversarial attacks.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of this specification, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a method for detecting adversarial attacks according to an embodiment of this specification;
Fig. 2 is a flowchart of a method for detecting adversarial attacks provided by an embodiment of this specification;
Fig. 3 is a flowchart of another method for detecting adversarial attacks provided by an embodiment of this specification;
Fig. 4 is a schematic diagram of an interface provided by an embodiment of this specification;
Fig. 5 is a flowchart of another method for detecting adversarial attacks provided by an embodiment of this specification;
Fig. 6 is a flowchart of a method for training an adversarial attack detection model provided by an embodiment of this specification;
Fig. 7 is a schematic structural diagram of an adversarial attack detection model provided by an embodiment of this specification;
Fig. 8 is a schematic structural diagram of an apparatus for detecting adversarial attacks provided by an embodiment of this specification;
Fig. 9 is a schematic structural diagram of an apparatus for training an adversarial attack detection model provided by an embodiment of this specification;
Fig. 10 is a schematic structural diagram of an electronic device provided by an embodiment of this specification.
Detailed Description
To make the features and advantages of this specification more apparent and understandable, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings. The described embodiments are obviously only a part of the embodiments of this specification rather than all of them; all other embodiments obtained by those skilled in the art based on the embodiments in this specification without creative effort fall within the protection scope of this specification.
First, the terms involved in one or more embodiments of this specification are explained.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Biometric identification: biometric identification technology closely combines computers with high-tech means such as optics, acoustics, biosensors, and the principles of biostatistics, and identifies individuals using the inherent physiological characteristics of the human body (such as fingerprints, facial images, and irises) and behavioral characteristics (such as handwriting, voice, and gait).
Face recognition: face recognition is a biometric technique for identifying an identity based on facial feature information of a person. A series of related technologies, also commonly called face recognition and face recognition, are used to collect images or video streams containing faces by using a camera or a video camera, automatically detect and track the faces in the images, and then perform face recognition on the detected faces.
Liveness detection: liveness detection is a method for determining the real physiological characteristics of a subject in identity verification scenarios. In face recognition applications, liveness detection can verify whether the user is a real live person operating the device by combining actions such as blinking, opening the mouth, shaking the head, and nodding with technologies such as face key point localization and face tracking. It can effectively resist common attack means such as photos, videos, face swapping, masks, occlusions, 3D animations, and screen re-shooting, thereby helping to discriminate fraudulent behavior and safeguarding users' interests.
Active light: in the embodiments of this specification, active light refers to actively illuminating the face using the screen as a light source during the face recognition stage.
Adversarial attack: in the embodiments of this specification, an identity-forgery attack carried out during the face recognition stage by wearing or pasting adversarial patterns (such as stickers) on the face.
Adversarial attack detection: the techniques and methods for detecting and intercepting the above attacks.
Normalization: mapping arrays with different value ranges into the (0, 1) interval to facilitate data processing. In some cases, the normalized values can be used directly as probabilities.
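As an illustration, a minimal min-max normalization sketch in Python; the specification does not fix a particular formula, so this linear rescaling is one common, assumed choice:

```python
import numpy as np

def min_max_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map an array with an arbitrary value range into the (0, 1) interval."""
    x = x.astype(np.float64)
    return (x - x.min()) / (x.max() - x.min() + eps)

scores = np.array([3.0, 40.0, 17.0, 255.0])
print(min_max_normalize(scores))  # all values now lie within [0, 1]
```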
Dropout (random deactivation): a method for optimizing artificial neural networks with deep structures. It reduces the interdependency among nodes by randomly zeroing part of the weights or outputs of hidden layers during learning, thereby regularizing the neural network and reducing its structural risk. For example, given a vector (1, 2, 3, 4) during model training, the dropout layer may randomly turn one of its entries into 0, e.g. turning 2 into 0, so that the vector becomes (1, 0, 3, 4).
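A minimal sketch of the zeroing behavior described in this entry, assuming element-wise masking with a fixed drop probability; the rescaling used by most practical implementations is noted in the comment but omitted to match the (1, 2, 3, 4) → (1, 0, 3, 4) example:

```python
import numpy as np

def dropout(x: np.ndarray, p: float = 0.25, rng=None) -> np.ndarray:
    """Randomly zero elements of x with probability p, as in the
    (1, 2, 3, 4) -> (1, 0, 3, 4) example above. Practical implementations
    usually also rescale the survivors by 1/(1-p); omitted here for clarity."""
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask

print(dropout(np.array([1.0, 2.0, 3.0, 4.0])))
```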
Learning rate: the learning rate guides how the model adjusts its network weights using the gradient of the loss function in gradient descent. If the learning rate is too large, the update may step directly across the global optimum, and the loss stays large; if it is too small, the loss function changes slowly, which greatly increases the convergence time of the network and makes it easy to get trapped in a local minimum or saddle point.
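A toy gradient-descent step illustrating the effect of the learning rate; the quadratic loss and the two example rates are illustrative assumptions, not from the specification:

```python
def sgd_step(w: float, grad: float, lr: float) -> float:
    """One gradient-descent update: the learning rate lr scales how far
    the weight moves along the negative gradient of the loss."""
    return w - lr * grad

w = 5.0                       # current weight; loss(w) = w**2 has gradient 2*w
for lr in (1.5, 0.1):         # too-large vs. moderate learning rate
    # lr=1.5 jumps past the optimum at 0 (to -10); lr=0.1 moves steadily toward it
    print(lr, sgd_step(w, 2 * w, lr))
```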
Embedded coding (embedding): an embedding mathematically represents a correspondence, i.e., data in a space X is mapped into a space Y through a function F. The function F is injective (each mapped value corresponds uniquely to its pre-image) and structure-preserving (the ordering of the data before mapping is preserved after mapping). For example, if data X1 and X2 exist before mapping and are mapped to Y1 and Y2 respectively, then X1 > X2 before mapping implies Y1 > Y2 after mapping. For words, this means mapping the words into another space to facilitate subsequent machine learning and processing.
Attention weight: an attention weight can represent the importance of certain data in training or prediction, where importance denotes how strongly the input data influences the output data. Data of high importance has a high attention weight value, and data of low importance has a low one. The importance of data differs across scenarios, and training a model's attention weights is the process of determining the importance of the data.
It should be noted that the information (including but not limited to user equipment information and user personal information), data (including but not limited to data for analysis, stored data, and displayed data), and signals involved in the embodiments of this specification are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the face images involved in the embodiments of this specification are acquired with sufficient authorization.
Next, an environment for implementing the technical solution provided in the embodiments of the present specification will be described.
Fig. 1 is a schematic diagram of an implementation environment of the method for detecting adversarial attacks according to an embodiment of this specification. Referring to fig. 1, the implementation environment includes a terminal 110 and a server 120.
The terminal 110 is connected to the server 120 through a wireless or wired network. Optionally, the terminal 110 is a smartphone, tablet computer, notebook computer, desktop computer, smart watch, or the like, but is not limited thereto. An application supporting face recognition is installed and runs on the terminal 110.
The server 120 is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms. The server 120 provides background services for applications running on the terminal 110; in the embodiments of this specification, the server 120 provides background services for the application supporting face recognition that runs on the terminal.
Those skilled in the art will appreciate that the numbers of terminals 110 and servers 120 may be greater or smaller. For example, there may be only one terminal 110 and one server 120, or dozens, hundreds, or more of each, in which case the implementation environment also includes other terminals and servers. The embodiments of this specification do not limit the number of terminals and servers or the types of devices.
After describing the implementation environment, the application scenarios of the embodiments of this specification are described below with reference to it; in the following description, the terminal is the terminal 110 and the server is the server 120 in the above implementation environment. The technical solutions provided by the embodiments of this specification can be applied to any scenario that uses a face recognition system, for example, payment applications that provide a face-scanning payment function, payment devices that provide a face-scanning payment function, vending machines with a face-scanning payment function, or access control devices with face recognition, which the embodiments of this specification do not limit.
Taking the application of the technical solution to payment applications providing a face-scanning payment function as an example: when the face-scanning payment function is used, the terminal collects a plurality of face images of a target object under different lighting conditions, where the terminal is the one running the payment application and the target object is the user using the terminal. The terminal sends the plurality of face images to the server. The server performs feature extraction on the face images to obtain the reference image features of each face image; since the face images are collected under different lighting conditions, these reference image features represent the facial features of the target object under different lighting conditions. The server then performs feature reconstruction on the reference image features of each face image to obtain the reconstructed image features of each face image, where a reference image feature and the reconstructed image feature obtained from it correspond to different lighting conditions. Finally, the server determines whether the images are adversarial attack images based on the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition, thereby detecting adversarial attacks.
The above takes payment applications providing a face-scanning payment function as an example; in the other application scenarios mentioned above, adversarial attacks can be detected in the same way, and the specific process is not repeated here.
After introducing the implementation environment and application scenarios, the technical solutions provided by the embodiments of this specification are described below with reference to fig. 2. The execution subject is a server, and the method includes the following steps.
202. The server performs feature extraction on a plurality of face images of a target object to obtain reference image features of each face image, where the plurality of face images are collected under different lighting conditions.
Here the target object is a user using a face recognition service, and a lighting condition includes at least one of the color and the intensity of the illumination. That the plurality of face images are collected under different lighting conditions means that any two of them correspond to different lighting conditions. In some embodiments, the face images are collected by the terminal, and the different lighting conditions are generated by the terminal while collecting them.
204. The server performs feature reconstruction on the reference image features of each face image to obtain reconstructed image features of each face image, where a reference image feature and the reconstructed image feature obtained by reconstructing it correspond to different lighting conditions.
The purpose of feature reconstruction is to generate a reconstructed image feature under another lighting condition from a reference image feature under one lighting condition.
206. The server determines whether the plurality of face images are adversarial attack images based on the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition.
Whether the face images are adversarial attack images refers to whether they contain adversarial attack patterns: if the face images contain adversarial attack patterns, they are adversarial attack images; if not, they are not. Determining whether the plurality of face images are adversarial attack images is exactly the adversarial attack detection.
With the technical solution provided by the embodiments of this specification, feature extraction is performed on a plurality of face images of a target object collected under different lighting conditions to obtain the reference image features of each face image, and feature reconstruction is performed on those reference image features to obtain the reconstructed image features of each face image. Because adversarial attack images fluctuate strongly under different lighting conditions, whether the face images are adversarial attack images can be determined from the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same lighting condition, thereby detecting adversarial attacks.
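For concreteness, a minimal end-to-end sketch of steps 202-206 in Python/NumPy, using random linear maps as stand-ins for the trained feature extraction and feature reconstruction submodels; the real submodels, feature dimensions, and score threshold are assumptions, since the specification does not fix them at this level of detail:

```python
import numpy as np

rng = np.random.default_rng(0)
W_extract = rng.normal(size=(128, 64))      # stand-in feature extraction submodel
W_reconstruct = rng.normal(size=(64, 64))   # stand-in feature reconstruction submodel

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def detect(images, threshold=0.5):
    """Sketch of steps 202-206: extract a reference feature per lighting
    condition, reconstruct each feature into the *other* lighting condition,
    then compare reference and reconstructed features that share a condition."""
    feats = [img @ W_extract for img in images]        # reference image features
    recons = [f @ W_reconstruct for f in feats]        # reconstructed image features
    # With two images A (condition T1) and B (condition T2), recon(A) corresponds
    # to T2 and is compared against the reference feature of B, and vice versa.
    score = 0.5 * (cosine(recons[0], feats[1]) + cosine(recons[1], feats[0]))
    return score, score < threshold    # low similarity -> adversarial attack images

# Two flattened stand-in face images, one per lighting condition.
images = [rng.normal(size=128), rng.normal(size=128)]
score, is_attack = detect(images)
print(f"attack score={score:.3f}, adversarial={is_attack}")
```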
Steps 202 to 206 above are a brief description of the technical solution provided by the embodiments of this specification. To describe it more clearly, the technical solution is explained below with some examples; referring to fig. 3, the method includes the following steps.
302. The server obtains a plurality of initial images of the target object, where the plurality of initial images are collected under different lighting conditions.
Here the target object is a user using a face recognition service, and a lighting condition includes at least one of the color and the intensity of the illumination. That the plurality of initial images are collected under different lighting conditions means that any two of them correspond to different lighting conditions. In some embodiments, the initial images are collected by the terminal during face recognition, and the different lighting conditions are generated by the terminal during collection. Introducing different lighting conditions into the initial images introduces active light information into them.
In one possible implementation, in response to a face recognition operation, the terminal collects a plurality of initial images of the target object under different lighting conditions, where the different lighting conditions are generated by the terminal controlling its screen. The terminal sends the initial images to the server, and the server obtains them.
The terminal is a terminal running a payment application, a payment device providing a face-scanning payment function, a vending machine with a face-scanning payment function, an access control device with face recognition, or the like; the embodiments of this specification do not limit the terminal.
In this implementation, in response to the face recognition operation, the terminal controls the screen to generate different lighting conditions, collects a plurality of initial images of the target object under them, and sends the initial images to the server, so that the server can subsequently detect adversarial attacks based on these images.
For example, in response to a face recognition operation, the terminal displays a face recognition interface that prompts the target object that face recognition is about to start. After a target duration, the terminal controls the screen to generate a plurality of lighting conditions and controls the capture device to shoot initial images, at least one under each lighting condition. The terminal sends the captured initial images of the target object to the server, and the server obtains them. The target duration is set by a technician according to the actual situation and is not limited by the embodiments of this specification.
For example, taking the application of the technical solution in the payment field, referring to fig. 4, the terminal displays a payment method selection interface 400 on which multiple payment methods are shown. When face-scanning payment 401 is selected among them, the terminal displays a face recognition interface 402 that includes an image preview area 403 for previewing the images collected by the terminal. The face recognition interface 402 further includes a prompt display area 404 for displaying prompt information, which includes the remaining time before face recognition starts. After the target duration, the terminal controls the screen to emit light of multiple colors, or to alternate between bright and dark, where the multiple colors and the bright/dark alternation correspond to multiple lighting conditions. The terminal controls the camera to shoot at least one initial image under each lighting condition, sends the captured initial images to the server, and the server receives the plurality of initial images of the target object.
In another possible implementation, the server obtains the plurality of initial images of the target object from an object image database, which stores initial images of the target object collected under different lighting conditions. In some embodiments, for any one of multiple objects, after the terminal used by that object collects its initial images under different lighting conditions, it uploads them to the object image database; the server obtains them from the database and performs the subsequent face-recognition-related operations based on them. Of course, once the initial images in the object image database have been processed, they are deleted from the database, which protects the privacy of the objects and prevents misuse of the initial images.
In this implementation, using the object image database as a relay for the initial images avoids data loss when a large number of concurrent face recognition and adversarial attack detection tasks arrive faster than the server can process them, which improves the success rate of face recognition and adversarial attack detection.
304. The server filters the plurality of initial images to obtain a plurality of face images of the target object.
Filtering the initial images is a form of preprocessing whose purpose is to remove the initial images that do not meet certain conditions; the conditions are set by a technician according to the actual situation and are not limited by the embodiments of this specification.
In one possible implementation, the server performs face detection on the plurality of initial images and determines whether each initial image includes a face. The server deletes the initial images that do not include a face, obtaining a plurality of face-filtered images. The server then determines the image quality score of each face-filtered image and deletes the face-filtered images whose image quality scores are smaller than a quality score threshold, obtaining the plurality of face images.
The purpose of face detection is to determine whether an image includes a face; since the technical solution provided by the embodiments of this specification is applied in the face recognition field, the images processed by the server should include faces, and performing face detection on the initial images determines whether they do. The image quality score represents the quality of an initial image: the higher the score, the better the quality; the lower the score, the worse the quality. The quality score threshold is set by a technician according to the actual situation and is not limited by the embodiments of this specification.
In this implementation, before performing adversarial attack detection, the server filters the initial images through face detection and image quality scoring to obtain a plurality of qualified face images, and adversarial attack detection based on these face images is more accurate.
For example, the server inputs the plurality of initial images of the target object into a face detection model, performs face detection on them through the model, and outputs prediction labels for the initial images, where a prediction label indicates whether the corresponding initial image includes a face and the face detection model is a binary classification model. The server deletes the initial images whose prediction labels indicate that no face is included, obtaining a plurality of face-filtered images of the target object, all of which include faces. The server determines the image quality score of each face-filtered image through subjective image quality assessment (S-IQA) or objective image quality assessment (O-IQA), and deletes the face-filtered images whose image quality scores are smaller than the quality score threshold, obtaining the plurality of face images of the target object.
For example, the server inputs the plurality of initial images of the target object into a face detection model and performs feature extraction on them through the model to obtain the initial image features of the initial images. The server maps these initial image features through the face detection model to obtain the probability that each initial image includes a face. The server sets the prediction label of an initial image whose face probability is greater than a preset probability threshold to a first label, and that of an initial image whose face probability is smaller than or equal to the threshold to a second label, where the first label indicates that the initial image includes a face, the second label indicates that it does not, and the preset probability threshold is set by a technician according to the actual situation, which the embodiments of this specification do not limit. The server deletes the initial images with the second label, obtaining a plurality of face-filtered images. The server determines the image quality score of each face-filtered image through mean opinion score (MOS), full-reference (FR-IQA), reduced-reference (RR-IQA), or no-reference (NR-IQA) assessment, and deletes the face-filtered images whose image quality scores are smaller than the quality score threshold, obtaining the plurality of face images of the target object.
In another possible implementation, the server determines the image quality scores of the plurality of initial images of the target object, deletes the initial images whose scores are smaller than the quality score threshold to obtain a plurality of quality-filtered images, performs face detection on the quality-filtered images to determine whether each includes a face, and deletes the quality-filtered images that do not include a face, obtaining the plurality of face images of the target object.
In this implementation, before performing adversarial attack detection, the server filters the initial images through image quality scoring followed by face detection to obtain a plurality of qualified face images, and adversarial attack detection based on these face images is more accurate.
For example, the server determines the image quality scores of the plurality of initial images through subjective or objective image quality assessment and deletes the initial images whose scores are smaller than the quality score threshold, obtaining a plurality of quality-filtered images of the target object. The server inputs the quality-filtered images into a face detection model, performs face detection on them through the model, and outputs prediction labels indicating whether each quality-filtered image includes a face, where the face detection model is a binary classification model. The server deletes the quality-filtered images whose prediction labels indicate that no face is included, obtaining a plurality of face images of the target object, all of which include faces.
For example, the server determines the image quality score of each initial image through mean opinion score, full-reference, reduced-reference, or no-reference assessment, and deletes the initial images whose scores are smaller than the quality score threshold, obtaining a plurality of quality-filtered images of the target object. The server inputs the quality-filtered images into a face detection model and performs feature extraction on them to obtain their quality-filtered image features, which the model then maps to the probability that each quality-filtered image includes a face. The server sets the prediction label of a quality-filtered image whose face probability is greater than the preset probability threshold to the first label, and that of one whose probability is smaller than or equal to the threshold to the second label, where the first label indicates that a face is included and the second that it is not, the preset probability threshold being set by a technician according to the actual situation. The server deletes the quality-filtered images with the second label, obtaining the plurality of face images.
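A minimal sketch of the filtering flow of step 304, assuming hypothetical detect_face and quality_score callables in place of the face detection model and the IQA methods named above:

```python
import numpy as np

def filter_images(initial_images, detect_face, quality_score, q_threshold=0.6):
    """Sketch of step 304: drop images without a detected face, then drop
    face-filtered images whose quality score falls below the threshold.
    detect_face and quality_score are assumed callables standing in for the
    face detection model and the IQA method (MOS / FR-IQA / RR-IQA / NR-IQA)."""
    face_filtered = [img for img in initial_images if detect_face(img)]
    return [img for img in face_filtered if quality_score(img) >= q_threshold]

# Toy stand-ins: "a face is present" when mean brightness is positive, and
# quality is the (capped) standard deviation of the pixel values.
rng = np.random.default_rng(0)
imgs = [rng.normal(loc=m, size=(64, 64)) for m in (-1.0, 0.5, 2.0)]
kept = filter_images(imgs, lambda im: im.mean() > 0, lambda im: min(im.std(), 1.0))
print(len(kept), "of", len(imgs), "images kept")
```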
It should be noted that the above describes filtering the initial images through face detection and image quality scoring to obtain the face images of the target object; in other possible embodiments, the server may add further filtering conditions on top of face detection and image quality scoring, which the embodiments of this specification do not limit.
306. The server performs feature extraction on the plurality of face images of the target object to obtain the reference image features of each face image.
The purpose of extracting features from a face image is to abstract the face image to facilitate the subsequent adversarial attack detection.
In one possible implementation, the server inputs the plurality of face images into an adversarial attack detection model, performs feature extraction on them through the model, and outputs the reference image features of each face image.
The method for training the adversarial attack detection model is described in steps 502-510 below.
In this implementation, the server can perform feature extraction on the face images through the adversarial attack detection model with high efficiency and accuracy.
For example, for any one of the plurality of face images, the server performs any one of convolution, full connection, and attention coding on the face image through the feature extraction submodel of the adversarial attack detection model and outputs the reference image features of the face image.
In some embodiments, the feature extraction submodel of the adversarial attack detection model takes the stability of image features under different lighting conditions into account during training and can therefore extract the features of face images more accurately; it is accordingly also referred to as a robust contrast model.
To illustrate the above more clearly, the feature extraction approaches in the above example are further described through several examples.
Example 1: for any one of the plurality of face images, the server convolves the face image through the feature extraction submodel of the adversarial attack detection model and outputs the reference image features of the face image.
In one possible implementation, for any one of the plurality of face images, the server slides at least one convolution kernel over the face image through the convolution layer of the feature extraction submodel, convolving the kernel with the covered region at each position, to obtain the reference image features of the face image. For example, the server slides a plurality of convolution kernels over the face image through the convolution layer and convolves them with the covered regions, obtaining one convolution feature per kernel; the server then fuses the convolution features into the reference image features of the face image. In some embodiments, the number of convolution kernels is an integer multiple of the number of color channels of the face image.
In the above implementation, the server extracts the reference image features of the face image through convolution; since convolution is fast, the features can be extracted quickly.
It should be noted that the above describes feature extraction on one face image through the feature extraction submodel to obtain its reference image features; the way the server extracts features from the other face images belongs to the same inventive concept and is not repeated here. Moreover, while the above describes feature extraction on a single face image, in other possible embodiments the server may extract features from multiple face images at the same time, which the embodiments of this specification do not limit.
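A minimal NumPy sketch of the convolution-based extraction of Example 1; the kernel sizes, the averaging used as the fusion step, and the single-channel input are illustrative assumptions:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one convolution kernel over the image and take the dot product
    with the covered region at every position ('valid' padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def extract_reference_features(image, kernels):
    """Per Example 1: one convolution feature per kernel, then fuse them
    (here by averaging; the fusion operator is not fixed by the text)."""
    conv_feats = [conv2d_valid(image, k) for k in kernels]
    return np.mean(np.stack(conv_feats), axis=0).ravel()

rng = np.random.default_rng(0)
face = rng.normal(size=(32, 32))                   # stand-in single-channel face image
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]
print(extract_reference_features(face, kernels).shape)
```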
Example 2: for any one of the plurality of face images, the server applies full connection to the face image through the feature extraction submodel of the adversarial attack detection model and outputs the reference image features of the face image.
In one possible implementation, for any one of the plurality of face images, the server multiplies the face image by at least one fully-connected matrix through the fully-connected layer of the feature extraction submodel to obtain the reference image features of the face image. For example, the server multiplies the face image by a plurality of fully-connected matrices through the fully-connected layer and then applies pooling to obtain the reference image features of the face image. Pooling reduces the dimensionality of the features, which improves the efficiency of the subsequent adversarial attack detection.
In the above implementation, the server extracts the reference image features of the face image through full connection; since full connection is fast, the features can be extracted quickly.
As with Example 1, the way the server extracts features from the other face images belongs to the same inventive concept and is not repeated here, and the server may also extract features from multiple face images at the same time, which the embodiments of this specification do not limit.
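A corresponding sketch of the full-connection extraction of Example 2; the matrix shapes and the use of average pooling are illustrative assumptions (the text only says "pooling"):

```python
import numpy as np

def fully_connected_features(image, matrices, pool_size=4):
    """Per Example 2: multiply the flattened face image by several
    fully-connected matrices, then pool to reduce the feature dimension."""
    x = image.ravel()
    feats = [x @ W for W in matrices]                  # one feature vector per matrix
    fused = np.concatenate(feats)
    return fused.reshape(-1, pool_size).mean(axis=1)   # average pooling

rng = np.random.default_rng(0)
face = rng.normal(size=(32, 32))                       # flattened length: 1024
mats = [rng.normal(size=(1024, 64)) for _ in range(2)]
print(fully_connected_features(face, mats).shape)      # (32,)
```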
Example 3: for any one of the plurality of face images, the server performs attention coding on the face image through the feature extraction submodel of the adversarial attack detection model and outputs the reference image features of the face image.
In one possible implementation, for any one of the plurality of face images, the server divides the face image into a plurality of parts. Through the attention coding layer of the feature extraction submodel, the server embeds the parts to obtain one embedded feature per part, determines the attention weights between every two parts based on the embedded features, determines one attention feature per part based on those attention weights and the embedded features, and finally fuses the attention features of the parts into the reference image features of the face image.
In this implementation, the server encodes the face image based on the attention mechanism to obtain its reference image features; since the attention mechanism makes full use of the associations among the parts of the face image, the resulting reference image features reflect the characteristics of the face image more accurately.
For example, for any one of the face images, the server divides the face image into a plurality of parts of equal size, each being an image region of the face image, with no overlap between parts. Through the attention coding layer of the feature extraction submodel, the server embeds the parts to obtain one embedded feature per part. It then multiplies each embedded feature by the query, key, and value parameter matrices to obtain the query, key, and value matrices of each part; determines the attention weight between every two parts from their query and key matrices; fuses the value matrices of every two parts according to the attention weight between them to obtain the attention features; and finally fuses the attention features of the parts into the reference image features of the face image.
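A minimal sketch of the attention coding of Example 3, with random matrices standing in for the learned embedding and query/key/value parameter matrices; the patch size, feature dimension, and mean-fusion step are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_encode(image, patch=8, dim=16, rng=np.random.default_rng(0)):
    """Per Example 3: split the face image into equal non-overlapping parts,
    embed each part, form query/key/value matrices, weight the values by the
    attention between every two parts, and fuse into one reference feature."""
    h, w = image.shape
    parts = [image[i:i + patch, j:j + patch].ravel()
             for i in range(0, h, patch) for j in range(0, w, patch)]
    X = np.stack(parts)                              # (num_parts, patch*patch)
    E = X @ rng.normal(size=(X.shape[1], dim))       # embedded features per part
    Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    A = softmax(Q @ K.T / np.sqrt(dim))              # attention weights between parts
    return (A @ V).mean(axis=0)                      # fuse attention features

face = np.random.default_rng(1).normal(size=(32, 32))
print(attention_encode(face).shape)                  # (16,)
```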
The server may extract the reference image feature of the face image by any of the above methods, which is not limited in the embodiment of the present specification.
308. The server performs feature reconstruction on the reference image feature of each face image to obtain the reconstructed image feature of each face image, wherein a reference image feature and the reconstructed image feature obtained from it by feature reconstruction correspond to different illumination conditions.
The purpose of feature reconstruction is to generate a reconstructed image feature under another illumination condition based on a reference image feature under one illumination condition. For example, the server obtains two face images of the target object: face image A is collected under illumination condition T1, and face image B is collected under illumination condition T2. After feature reconstruction is performed on the reference image feature of face image A, the reconstructed image feature of face image A is obtained, and this reconstructed image feature corresponds to illumination condition T2, where T1 and T2 are different illumination conditions.
In a possible implementation manner, the server inputs the reference image features of each face image into the counter attack detection model, performs feature reconstruction on the reference image features of each face image through the counter attack detection model, and outputs the reconstructed image features of each face image.
In this embodiment, the server can perform feature reconstruction through the counter attack detection model, and the feature reconstruction efficiency is high.
For example, the server performs multiple full connections on the reference image features of each face image through the anti-attack detection model, and outputs the reconstructed image features of each face image. In some embodiments, the anti-attack detection model includes a feature reconstruction sub-model, and after the server inputs the reference image features of each face image into the anti-attack detection model, the server performs multiple full connections on the reference image features of each face image through the feature reconstruction sub-model of the anti-attack detection model, and outputs the reconstructed image features of each face image. In some embodiments, the feature reconstruction submodel is a model that includes a 5-layer MLP (Multilayer Perceptron).
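As a hedged illustration of this step, the sketch below builds such a 5-layer MLP in PyTorch; the hidden width and ReLU activations are assumptions, since the specification only states the number of layers.

import torch.nn as nn

# Sketch of the feature reconstruction submodel: a 5-layer MLP that maps a
# reference image feature under one illumination condition to a reconstructed
# image feature under another. Hidden width and activations are assumptions.
def build_feature_reconstruction_submodel(feat_dim=128, hidden_dim=256):
    return nn.Sequential(
        nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, feat_dim),
    )

Applying this module to the reference image feature of face image A would yield the reconstructed image feature of A under the other illumination condition.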
310. And the server determines whether the plurality of face images are anti-attack images or not based on the similarity between the reference image features of the face images and the reconstructed image features corresponding to the same illumination condition.
Whether the face images are counter attack images refers to whether the face images contain counter attack patterns: if the plurality of face images contain a counter attack pattern, they are counter attack images; if they do not, they are not counter attack images. The process of determining whether the plurality of face images are counter attack images is the counter attack detection. A reference image feature and a reconstructed image feature "corresponding to the same illumination condition" means that both correspond to the same illumination condition, where the reference image feature is extracted from one face image and the reconstructed image feature is reconstructed from the reference image feature of another face image. In some embodiments, the similarity between features is expressed by parameters such as cosine similarity or cosine distance, which is not limited in the embodiments of the present specification.
In a possible implementation manner, the server fuses the similarity between the reference image features of each facial image and the reconstructed image features corresponding to the same illumination condition to obtain the anti-attack scores of the multiple facial images. The server determines whether the plurality of face images are anti-attack images based on the anti-attack scores.
In this embodiment, the server can determine whether the acquired face image is the anti-attack image or not through the anti-attack score, and the anti-attack detection efficiency is high.
In order to more clearly explain the above embodiment, the above embodiment will be explained in two parts.
The first part is that the server fuses the similarity between the reference image characteristics of each face image and the reconstructed image characteristics corresponding to the same illumination condition to obtain the anti-attack scores of the face images.
In a possible implementation manner, the server adds the similarity between the reference image features of each facial image and the reconstructed image features corresponding to the same illumination condition, and then divides the sum by the number of the multiple facial images to obtain an average similarity, which is the anti-attack score of the multiple facial images.
And a second part, the server determines whether the face images are anti-attack images or not based on the anti-attack scores.
In one possible implementation, in a case where the counter attack score is less than a score threshold, the server determines the plurality of face images as counter attack images. In a case where the counter attack score is greater than or equal to the score threshold, the server determines that the plurality of face images are not counter attack images.
The score threshold is set by a technician according to actual conditions, and the embodiment of the present specification does not limit this.
For example, in the case where the average similarity s is smaller than the score threshold T, the server determines that the plurality of face images are counter attack images. In the case where the average similarity s is greater than or equal to the score threshold T, the server determines that the plurality of face images correspond to normal recognition. In some embodiments, the above implementation is also referred to as counter attack determination.
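The counter attack determination can be sketched as follows; cosine similarity and the particular threshold value are assumptions consistent with, but not mandated by, the description above.

import torch.nn.functional as F

def counter_attack_determination(ref_feats, recon_feats, score_threshold=0.8):
    # ref_feats and recon_feats are (N, D) tensors; row i of each corresponds
    # to the same illumination condition. The threshold 0.8 is illustrative.
    sims = F.cosine_similarity(ref_feats, recon_feats, dim=-1)  # one similarity per image
    score = sims.mean().item()             # average similarity s = counter attack score
    is_attack = score < score_threshold    # low similarity: large fluctuation, attack
    return score, is_attack

A low score indicates that the features fluctuate strongly across illumination conditions, which is the signature of a counter attack image described above.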
It should be noted that steps 302 to 310 are described by taking the server as the execution subject as an example; in other possible embodiments, steps 302 to 310 may be executed by a terminal, which is not limited in the embodiments of the present specification.
The above steps 302-310 are described below with reference to fig. 5. The server obtains a plurality of initial images of the target object, which are acquired under different lighting conditions, and filters the plurality of initial images to obtain a plurality of face images of the target object; these two steps are called data acquisition and preprocessing. The server performs feature extraction on the plurality of face images of the target object to obtain the reference image feature of each face image; this step is also called face feature extraction. The server performs feature reconstruction on the reference image features of each face image to obtain the reconstructed image features of each face image, and determines whether the plurality of face images are counter attack images based on the similarity between the reference image features of each face image and the reconstructed image features corresponding to the same illumination condition; these two steps are also called feature correlation fluctuation analysis and counter attack determination.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present specification, and are not described in detail herein.
According to the technical scheme provided by the embodiment of the specification, the characteristics of a plurality of face images acquired by a target object under different illumination conditions are extracted, and the reference image characteristics of each face image are obtained. And performing feature reconstruction on the reference image features of each face image to obtain the reconstructed image features of each face image. Because the anti-attack images have larger fluctuation under different illumination conditions, whether the plurality of face images are anti-attack images can be determined based on the similarity between the reference image characteristics of each face image and the reconstructed image characteristics corresponding to the same illumination condition, and therefore the detection of the anti-attack is realized.
In order to more clearly illustrate the technical solution provided by the embodiments of the present specification, the method for training the counter attack detection model is described below with the server as the execution subject, with reference to fig. 6. The method includes the following steps. Training the model includes a plurality of iterative processes; one iterative process is described below as an example, and the other iterative processes belong to the same inventive concept.
602. And the server inputs a plurality of sample face images of the sample object into the anti-attack detection model, and performs feature extraction on the plurality of sample face images through the anti-attack detection model to obtain the reference image features of the plurality of sample face images.
The sample face images are used for model training, and the collection and the use of the plurality of sample face images are fully authorized by corresponding objects of the sample face images. Referring to fig. 7, the counter attack detection model 700 includes a feature extraction sub-model 701, a feature reconstruction sub-model 702, and a counter attack determination sub-model 703. The feature extraction submodel 701 is configured to extract features of a face image, the feature reconstruction submodel 702 is configured to determine fluctuation of features of the face image acquired under different illumination conditions, and the counterattack determination submodel 703 is configured to determine whether the acquired face image is a counterattack image.
In a possible implementation manner, the server inputs the plurality of sample face images into the counter attack detection model, and performs feature extraction on the plurality of sample face images through a feature extraction sub-model of the counter attack detection model to obtain reference image features of the plurality of sample face images.
The way in which the server performs feature extraction on the sample face image through the feature extraction submodel is the same as the way described in step 306, and the implementation process is not described in detail.
604. And the server generates a characteristic response graph of the plurality of sample face images based on the plurality of sample face images and the reference image characteristics of the plurality of sample face images through the anti-attack detection model, wherein the characteristic response graph is used for representing the corresponding relation of the reference image characteristics in the sample face images.
The correspondence of the reference image features in the sample face image refers to a corresponding region of the reference image features in the sample face image. This characteristic response map is also referred to as a response map.
In a possible implementation manner, for any sample face image in the plurality of sample face images, the server determines, through the counter attack detection model, the weights of a plurality of pixel points in the sample face image based on the reference image feature of the sample face image. And the server generates a characteristic response graph of the sample face image based on the sample face image and the weights of a plurality of pixel points in the sample face image.
In this embodiment, the server can generate the feature response map of the sample face image through the counter attack detection model and can subsequently train the model based on the feature response map, so the efficiency is high.
For example, for any sample face image in the plurality of sample face images, the server inputs the sample face image into the feature extraction submodel of the anti-attack detection model, and determines the weights of a plurality of pixel points in the sample face image based on the reference image feature of the sample face image through the response image generation module of the feature extraction submodel. And the server generates a characteristic response graph of the sample face image based on the sample face image and the weights of a plurality of pixel points in the sample face image through the response graph generation module.
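The specification does not fix how the pixel weights are computed from the reference image feature; the following sketch assumes a simple learned projection followed by element-wise weighting of the sample face image, as one possible response map generation module.

import torch
import torch.nn as nn

class ResponseMapGenerator(nn.Module):
    # Illustrative response map generation module: projects the reference image
    # feature to per-pixel weights, then weights the sample face image with them.
    # The linear projection head is an assumption; the specification only
    # requires that pixel weights be derived from the reference image feature.
    def __init__(self, feat_dim=128, image_size=112):
        super().__init__()
        self.image_size = image_size
        self.to_weights = nn.Linear(feat_dim, image_size * image_size)

    def forward(self, image, ref_feat):       # image: (B, 3, H, W), ref_feat: (B, D)
        B = image.shape[0]
        w = torch.sigmoid(self.to_weights(ref_feat))         # pixel weights in [0, 1]
        w = w.view(B, 1, self.image_size, self.image_size)
        return image * w                                     # feature response map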
606. The server trains the counter attack detection model based on first difference information between the reference image features of every two sample face images in the plurality of sample face images and second difference information between the feature response images of every two sample face images, and the counter attack detection model is used for determining whether the face images are counter attack images or not.
In one possible embodiment, the server constructs a first loss function based on the first difference information and the second difference information. And the server trains a feature extraction sub-model of the anti-attack detection model based on the first loss function, wherein the feature extraction sub-model is used for extracting the reference image features of the face image.
For the same sample object, the reference image features of sample face images acquired under different lighting conditions should be consistent, and correspondingly, the feature response maps under different lighting conditions should also be consistent. The purpose of training the feature extraction submodel is therefore to make the first difference information and the second difference information as small as possible. For a counter attack, the difference between the reference image features of the sample face images acquired under different illumination conditions is large, and the difference between the feature response maps under different illumination conditions is also large.
In this embodiment, the feature extraction submodel is trained based on the first difference information between reference image features and the second difference information between feature response maps, which makes full use of the reference image features and the sample face images, so the training effect of the feature extraction submodel is good.
For example, the server constructs a first loss function based on the first difference information and the second difference information. And the server trains the feature extraction submodel of the anti-attack detection model by adopting a gradient descent method based on the first loss function.
Alternatively, on the basis of the above embodiment, the server may train the feature extraction submodel in the following manner, in addition to training the feature extraction submodel based on the first difference information and the second difference information.
In a possible implementation manner, the server performs identity recognition based on the reference image features of the multiple sample face images to obtain the predicted identity corresponding to each sample face image. And training the feature extraction submodel by the server based on third difference information between the predicted identities corresponding to every two sample face images.
For different sample face images of the same sample object, the result of identity recognition on different face images should be the same, and the purpose of this training mode is to make the third difference information as small as possible.
For example, the server inputs the reference image features of the sample face images into an identity recognition model, maps the reference image features of the sample face images through the identity recognition model, and outputs the prediction identity corresponding to each sample face image. And the server trains the feature extraction submodel based on third difference information between the predicted identities corresponding to every two sample face images.
The first difference information, the second difference information, and the third difference information in the above two embodiments may constitute a joint loss function (the first loss function), and the server may train the feature extraction submodel by using a gradient descent method based on the joint loss function. For example, the joint loss function takes the form shown in the following formula (1):

Loss_total = Loss_cls + Loss_feat + Loss_map    (1)

where Loss_total is the joint loss function, Loss_feat is the feature similarity loss corresponding to the first difference information, Loss_map is the response map consistency loss corresponding to the second difference information, and Loss_cls is the identity loss corresponding to the third difference information. In this training mode, the stability of the image features of the same object under different illumination conditions is considered during training (stability is analyzed in terms of feature similarity and feature response map consistency).
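A sketch of formula (1) for one pair of sample face images of the same object follows; the concrete distance measures (cosine distance for features, mean squared error for response maps, cross-entropy for identity) are assumptions, as the specification only names the three loss terms.

import torch.nn.functional as F

def joint_loss(ref_feat_a, ref_feat_b, map_a, map_b, logits_a, logits_b, identity):
    # Loss_feat: feature similarity loss (first difference information);
    # cosine distance between the two reference image features is assumed.
    loss_feat = 1.0 - F.cosine_similarity(ref_feat_a, ref_feat_b, dim=-1).mean()
    # Loss_map: response map consistency loss (second difference information);
    # mean squared error between the two feature response maps is assumed.
    loss_map = F.mse_loss(map_a, map_b)
    # Loss_cls: identity loss (third difference information); identity holds the
    # ground-truth class index of the sample object (an assumed classification head
    # produces logits_a and logits_b from the reference image features).
    loss_cls = F.cross_entropy(logits_a, identity) + F.cross_entropy(logits_b, identity)
    return loss_cls + loss_feat + loss_map   # Loss_total of formula (1)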
608. And the server carries out characteristic reconstruction on the reference image characteristics of a first sample face image in the plurality of sample face images through the anti-attack detection model to obtain the reconstructed image characteristics of the first sample face image, wherein the reconstructed image characteristics of the first sample face image and the reference image characteristics of a second sample face image in the plurality of sample face images correspond to the same illumination condition.
In one possible implementation manner, the server performs feature reconstruction on the reference image feature of the first sample face image in the plurality of sample face images through the feature reconstruction submodel of the anti-attack detection model to obtain the reconstructed image feature of the first sample face image. The feature reconstruction submodel is also referred to as a feature correlation fluctuation analysis submodel.
The method for reconstructing the characteristics of the sample face image by the server through the characteristic reconstruction submodel is the same as the method described in the step 308, and the implementation process is not repeated.
610. The server trains the counterattack detection model based on fourth difference information between the reconstructed image feature of the first sample face image and the reference image feature of the second sample face image.
In one possible embodiment, the server constructs the second loss function based on the fourth difference information. And the server trains a characteristic reconstruction sub-model of the counterattack detection model based on the second loss function.
In this manner, for the same real sample object, the reference image feature and the reconstructed image feature under the same illumination condition should be consistent, whereas in the case of a counter attack, the difference between the reference image feature and the reconstructed image feature is large, that is, the features fluctuate greatly.
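A minimal sketch of the second loss function, assuming mean squared error as the measure of the fourth difference information:

import torch.nn.functional as F

def second_loss(recon_feat_first, ref_feat_second):
    # Fourth difference information between the reconstructed image feature of
    # the first sample face image and the reference image feature of the second
    # sample face image, which correspond to the same illumination condition.
    # MSE is an assumed distance measure, not one fixed by the specification.
    return F.mse_loss(recon_feat_first, ref_feat_second)

Minimizing this loss drives the feature reconstruction submodel to produce features consistent with real acquisitions under the target illumination condition, so that counter attack samples stand out through their large residual difference.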
All the above optional technical solutions may be combined arbitrarily to form an optional embodiment of the present specification, and are not described herein again.
Through the technical scheme provided by the embodiments of the present specification, the server can perform feature extraction on the plurality of sample face images through the counter attack detection model to obtain the reference image features of the plurality of sample face images, and generate the feature response maps of the plurality of sample face images based on the plurality of sample face images and their reference image features. The counter attack detection model is trained based on the first difference information between the reference image features of the sample face images and the second difference information between the feature response maps, so the sample face images can be fully utilized, and the effect of training the counter attack detection model is improved.
Fig. 8 is a schematic structural diagram of an apparatus for detecting counterattack according to an embodiment of the present specification, and referring to fig. 8, the apparatus includes: a feature extraction unit 801, a feature reconstruction unit 802, and an attack detection unit 803.
The feature extraction unit 801 is configured to perform feature extraction on a plurality of face images of a target object to obtain reference image features of the face images, where the face images are acquired under different illumination conditions.
The feature reconstruction unit 802 is configured to perform feature reconstruction on the reference image features of each face image to obtain reconstructed image features of each face image, where the reference image features and the reconstructed image features obtained after performing feature reconstruction on the reference image correspond to different illumination conditions.
An attack detection unit 803, configured to determine whether the face images are anti-attack images based on the similarity between the reference image features of the face images and the reconstructed image features corresponding to the same illumination condition.
In a possible implementation manner, the feature extraction unit 801 is configured to input the face images into an anti-attack detection model, and perform feature extraction on the face images through the anti-attack detection model to obtain reference image features of the face images.
In a possible implementation manner, the feature extraction unit 801 is configured to, for any face image in the plurality of face images, perform any one of convolution, full connection, and attention coding on the face image through a feature extraction submodel of the counter attack detection model to obtain the reference image feature of the face image.
In a possible implementation manner, the feature reconstruction unit 802 is configured to input the reference image features of each face image into the counter attack detection model, perform feature reconstruction on the reference image features of each face image through the counter attack detection model, and output the reconstructed image features of each face image.
In a possible implementation manner, the feature reconstruction unit 802 is configured to perform full connection on the reference image features of each face image multiple times through the counter attack detection model, and output the reconstructed image features of each face image.
In a possible implementation manner, the attack detection unit 803 is configured to fuse the similarity between the reference image feature of each face image and the reconstructed image feature corresponding to the same illumination condition, so as to obtain the anti-attack scores of the plurality of face images. And determining whether the plurality of face images are anti-attack images or not based on the anti-attack scores.
In a possible implementation, the attack detection unit 803 is configured to perform any one of the following:
and determining the plurality of face images as anti-attack images under the condition that the anti-attack scores are smaller than a score threshold value.
Determining that the plurality of face images are not counter attack images in a case that the counter attack score is greater than or equal to the score threshold.
In one possible embodiment, the apparatus further comprises:
a filtering unit for obtaining a plurality of initial images of the target object, the plurality of initial images being acquired under different lighting conditions. And filtering the plurality of initial images to obtain a plurality of face images.
In a possible implementation, the filtering unit is configured to perform face detection on the plurality of initial images, and determine whether each of the initial images includes a face. And deleting the initial images which do not comprise the human faces in the plurality of initial images to obtain a plurality of human face filtering images. And determining the image quality score of each face filtering image. And deleting the face filtering images with the image quality scores smaller than the quality score threshold value in the plurality of face filtering images to obtain the plurality of face images.
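As an illustration of this filtering unit, the sketch below uses OpenCV; the Haar cascade face detector and the variance-of-Laplacian quality score are assumptions, since the specification does not fix a particular detector or quality metric.

import cv2

def filter_initial_images(initial_images, quality_threshold=100.0):
    # Drop initial images without a detectable face, then drop face filtering
    # images whose image quality score is below the quality score threshold.
    # The detector and the blur-based quality score are illustrative choices.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    face_images = []
    for img in initial_images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        if len(detector.detectMultiScale(gray)) == 0:
            continue                                   # no face: delete this image
        quality = cv2.Laplacian(gray, cv2.CV_64F).var()  # image quality score
        if quality >= quality_threshold:
            face_images.append(img)                    # keep high-quality face image
    return face_images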
It should be noted that: in the apparatus for detecting an anti-attack provided in the foregoing embodiment, when detecting an anti-attack, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for detecting an anti-attack and the method for detecting an anti-attack provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
According to the technical scheme provided by the embodiment of the specification, the characteristics of a plurality of face images acquired by a target object under different illumination conditions are extracted, and the reference image characteristics of each face image are obtained. And performing characteristic reconstruction on the reference image characteristics of each face image to obtain the reconstructed image characteristics of each face image. Because the anti-attack images have larger fluctuation under different illumination conditions, whether the plurality of face images are anti-attack images can be determined based on the similarity between the reference image characteristics of each face image and the reconstructed image characteristics corresponding to the same illumination condition, and therefore the detection of the anti-attack is realized.
Fig. 9 is a schematic structural diagram of an apparatus for training an anti-attack detection model according to an embodiment of the present disclosure, and referring to fig. 9, the apparatus includes: input section 901, feature response map generation section 902, and training section 903.
An input unit 901, configured to input a plurality of sample face images of a sample object into an anti-attack detection model, and perform feature extraction on the plurality of sample face images through the anti-attack detection model to obtain reference image features of the plurality of sample face images, where the plurality of sample face images are acquired under different lighting conditions.
A feature response map generating unit 902, configured to generate, by the counter attack detection model, feature response maps of the plurality of sample face images based on the plurality of sample face images and the reference image features of the plurality of sample face images, where the feature response maps are used to represent correspondence relationships of the reference image features in the sample face images.
A training unit 903, configured to train the counter attack detection model based on first difference information between reference image features of every two sample face images in the multiple sample face images and second difference information between feature response maps of every two sample face images, where the counter attack detection model is used to determine whether a face image is a counter attack image.
In a possible implementation manner, the feature response map generating unit 902 is configured to determine, by the counter attack detection model, weights of a plurality of pixel points in the sample face image based on a reference image feature of the sample face image for any sample face image in the plurality of sample face images. And generating a characteristic response image of the sample face image based on the sample face image and the weights of a plurality of pixel points in the sample face image.
In a possible embodiment, the training unit 903 is configured to construct a first loss function based on the first difference information and the second difference information. And training a feature extraction sub-model of the counterattack detection model based on the first loss function, wherein the feature extraction sub-model is used for extracting the reference image features of the face image.
In a possible implementation manner, the training unit 903 is further configured to perform identity recognition based on the reference image features of the multiple sample face images, so as to obtain a predicted identity corresponding to each sample face image.
And training the feature extraction submodel of the counterattack detection model based on third difference information between the corresponding predicted identities of every two sample face images.
In a possible implementation manner, the training unit 903 is further configured to perform feature reconstruction on the reference image features of a first sample face image in the plurality of sample face images through the anti-attack detection model to obtain reconstructed image features of the first sample face image, where the reconstructed image features of the first sample face image and the reference image features of a second sample face image in the plurality of sample face images correspond to the same illumination condition. And training the counterattack detection model based on fourth difference information between the reconstructed image feature of the first sample face image and the reference image feature of the second sample face image.
In a possible embodiment, the training unit 903 is further configured to construct a second loss function based on the fourth difference information. And training a characteristic reconstruction submodel of the counterattack detection model based on the second loss function, wherein the characteristic reconstruction submodel is used for carrying out characteristic reconstruction.
It should be noted that: when the apparatus for training the counter attack detection model provided in the above embodiment trains the counter attack detection model, only the division of the above functional modules is used for illustration; in practical applications, the functions may be distributed to different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for training the counter attack detection model and the method for training the counter attack detection model provided by the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Through the technical scheme provided by the embodiment of the specification, the server can perform feature extraction on the plurality of sample face images through the anti-attack detection model to obtain the reference image features of the plurality of sample face images. Generating a feature response map of the plurality of sample face images based on the plurality of sample face images and reference image features of the plurality of sample face images. The attack detection model is subjected to the first difference information between the reference image features of the sample face images and the second difference information between the feature response images, so that the sample face images can be fully utilized, and the effect of training the anti-attack detection model is improved.
The embodiments of the present disclosure also provide a computer storage medium, where multiple program instructions may be stored in the computer storage medium, and the program instructions are suitable for being loaded by a processor and executing the scheme described in the foregoing method embodiments, and are not described herein again.
An embodiment of the present specification further provides a computer program product, where the computer program product stores at least one instruction, and the at least one instruction is loaded by the processor and executes the scheme described in the foregoing method embodiment, which is not described herein again.
Referring to fig. 10, a schematic structural diagram of an electronic device provided in an exemplary embodiment of the present disclosure is shown, where the electronic device may be provided as a server or a terminal. The electronic device in this specification may include one or more of the following components: a processor 1010, a memory 1020, an input device 1030, an output device 1040, and a bus 1060. The processor 1010, memory 1020, input device 1030, and output device 1040 may be connected by a bus 1060.
Processor 1010 may include one or more processing cores. The processor 1010 interfaces with various components throughout the electronic device using various interfaces and circuitry, and performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1020 and by invoking data stored in the memory 1020. Alternatively, the processor 1010 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1010 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, wherein the CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 1010 and may instead be implemented by a separate communication chip.
The Memory 1020 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1020 includes a Non-transitory Computer-readable Storage Medium. The memory 1020 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1020 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The operating system may be an Android system (including systems deeply developed on the basis of the Android system), an iOS system developed by Apple (including systems deeply developed on the basis of the iOS system), or another system.
In order to enable the operating system to distinguish a specific application scenario of the third-party application program, data communication between the third-party application program and the operating system needs to be opened, so that the operating system can acquire current scenario information of the third-party application program at any time, and further perform targeted system resource adaptation based on the current scenario.
The input device 1030 is used for receiving input instructions or data, and the input device 1030 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 1040 is used for outputting instructions or data, and the output device 1040 includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 1030 and the output device 1040 may be co-located, the input device 1030 and the output device 1040 being touch screens.
In addition, those skilled in the art will appreciate that the configurations of the electronic devices illustrated in the above-described figures do not constitute limitations on the electronic devices, which may include more or fewer components than illustrated, or some components may be combined, or a different arrangement of components. For example, the electronic device further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a Wireless Fidelity (WiFi) module, a power supply, a bluetooth module, and other components, which are not described herein again.
In the electronic device shown in fig. 10, the processor 1010 may be configured to call an application program stored in the memory 1020 for detecting an attack countermeasure, so as to execute the method described in the above method embodiment.
The foregoing is a schematic view of an electronic device according to an embodiment of the present specification. It should be noted that the technical solution of the electronic device and the technical solution of the method for detecting the counter attack and the method for training the counter attack detection model belong to the same concept, and details that are not described in detail in the technical solution of the electronic device can be referred to the description of the technical solution of the method for detecting the counter attack.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program; the program may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the computer program may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The above description is only an example of the alternative embodiments of the present disclosure, and not intended to limit the present disclosure, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (20)

1. A method of detecting a counter attack, comprising:
performing feature extraction on a plurality of face images of a target object to obtain reference image features of the face images, wherein the face images are acquired under different illumination conditions;
performing feature reconstruction on the reference image features of each face image to obtain reconstructed image features of each face image, wherein the reference image features and the reconstructed image features obtained after the reference image features are subjected to feature reconstruction correspond to different illumination conditions;
and determining whether the plurality of face images are anti-attack images or not based on the similarity between the reference image features of the face images and the reconstructed image features corresponding to the same illumination condition.
2. The method of claim 1, wherein the extracting features of a plurality of face images of a target object to obtain reference image features of each of the face images comprises:
inputting the face images into an anti-attack detection model, and extracting the features of the face images through the anti-attack detection model to obtain the reference image features of the face images.
3. The method of claim 2, wherein the extracting the features of the face images by the anti-attack detection model to obtain the reference image features of the face images comprises:
and for any face image in the face images, performing any one of convolution, full connection and attention coding on the face image through the feature extraction submodel of the anti-attack detection model to obtain the reference image feature of the face image.
4. The method according to claim 1, wherein the performing feature reconstruction on the reference image feature of each face image to obtain the reconstructed image feature of each face image comprises:
inputting the reference image characteristics of each face image into a counterattack detection model, performing characteristic reconstruction on the reference image characteristics of each face image through the counterattack detection model, and outputting the reconstructed image characteristics of each face image.
5. The method of claim 4, wherein the feature reconstructing the reference image feature of each face image by the counter attack detection model, and outputting the reconstructed image feature of each face image comprises:
and performing multiple full connection on the reference image characteristics of each face image through the anti-attack detection model, and outputting the reconstructed image characteristics of each face image.
6. The method of claim 1, wherein the determining whether the plurality of facial images are attack-resisting images based on the similarity between the reference image features of the facial images and the reconstructed image features corresponding to the same illumination condition comprises:
fusing the similarity between the reference image characteristics of each face image and the reconstructed image characteristics corresponding to the same illumination condition to obtain the anti-attack scores of the face images;
determining whether the plurality of face images are counter attack images based on the counter attack scores.
7. The method of claim 6, wherein the determining whether the plurality of face images are counter-attack images based on the counter-attack scores comprises any one of:
determining the plurality of face images as anti-attack images if the anti-attack scores are less than a score threshold;
determining that the plurality of face images are not counter-attack images if the counter-attack score is greater than or equal to the score threshold.
8. The method of claim 1, wherein before the feature extraction of the plurality of face images of the target object to obtain the reference image features of each of the face images, the method further comprises:
acquiring a plurality of initial images of the target object, the plurality of initial images being acquired under different lighting conditions;
and filtering the plurality of initial images to obtain the plurality of face images.
9. The method of claim 8, wherein filtering the plurality of initial images to obtain the plurality of facial images comprises:
performing face detection on the plurality of initial images to determine whether each initial image comprises a face;
deleting the initial images which do not comprise the human faces in the plurality of initial images to obtain a plurality of human face filtering images;
determining an image quality score of each face filtering image;
and deleting the face filtering images with the image quality scores smaller than the quality score threshold value in the plurality of face filtering images to obtain the plurality of face images.
10. A method of training an anti-attack detection model, comprising:
inputting a plurality of sample face images of a sample object into an anti-attack detection model, and performing feature extraction on the plurality of sample face images through the anti-attack detection model to obtain reference image features of the plurality of sample face images, wherein the plurality of sample face images are acquired under different illumination conditions;
generating feature response graphs of the plurality of sample face images based on the plurality of sample face images and the reference image features of the plurality of sample face images through the anti-attack detection model, wherein the feature response graphs are used for representing the corresponding relation of the reference image features in the sample face images;
training the counter attack detection model based on first difference information between the reference image features of every two sample face images in the plurality of sample face images and second difference information between the feature response images of every two sample face images, wherein the counter attack detection model is used for determining whether the face images are counter attack images or not.
11. The method of claim 10, wherein generating, by the counter attack detection model, feature response maps for the plurality of sample face images based on the plurality of sample face images and reference image features of the plurality of sample face images comprises:
for any sample face image in the plurality of sample face images, determining the weights of a plurality of pixel points in the sample face image based on the reference image characteristics of the sample face image through the anti-attack detection model; and generating a characteristic response image of the sample face image based on the sample face image and the weights of a plurality of pixel points in the sample face image.
12. The method of claim 10, wherein the training the counter attack detection model based on first difference information between the reference image features of every two sample face images of the plurality of sample face images and second difference information between the feature response maps of every two sample face images comprises:
constructing a first loss function based on the first difference information and the second difference information;
and training a feature extraction sub-model of the anti-attack detection model based on the first loss function, wherein the feature extraction sub-model is used for extracting the reference image features of the face image.
13. The method according to claim 12, after the feature extraction of the sample face images by the anti-attack detection model to obtain the reference image features of the sample face images, the method further comprising:
performing identity recognition based on the reference image characteristics of the plurality of sample face images to obtain the corresponding predicted identity of each sample face image;
and training the feature extraction submodel of the anti-attack detection model based on third difference information between the corresponding predicted identities of every two sample face images.
14. The method of claim 10, further comprising:
performing feature reconstruction on the reference image features of a first sample face image in the plurality of sample face images through the anti-attack detection model to obtain the reconstructed image features of the first sample face image, wherein the reconstructed image features of the first sample face image and the reference image features of a second sample face image in the plurality of sample face images correspond to the same illumination condition;
and training the anti-attack detection model based on fourth difference information between the reconstructed image feature of the first sample face image and the reference image feature of the second sample face image.
15. The method of claim 14, wherein the training the counter attack detection model based on fourth difference information between the reconstructed image features of the first sample face image and the reference image features of the second sample face image comprises:
constructing a second loss function based on the fourth difference information;
and training a characteristic reconstruction submodel of the anti-attack detection model based on the second loss function, wherein the characteristic reconstruction submodel is used for carrying out characteristic reconstruction.
16. An apparatus for detecting a counter attack, comprising:
the characteristic extraction unit is used for extracting the characteristics of a plurality of face images of a target object to obtain the reference image characteristics of each face image, and the face images are acquired under different illumination conditions;
the characteristic reconstruction unit is used for performing characteristic reconstruction on the reference image characteristics of each face image to obtain the reconstructed image characteristics of each face image, and the reference image characteristics and the reconstructed image characteristics obtained after the reference image is subjected to the characteristic reconstruction correspond to different illumination conditions;
and the attack detection unit is used for determining whether the face images are anti-attack images or not based on the similarity between the reference image characteristics of the face images and the reconstructed image characteristics corresponding to the same illumination condition.
17. An apparatus for training an anti-attack detection model, comprising:
the system comprises an input unit, a detection unit and a comparison unit, wherein the input unit is used for inputting a plurality of sample face images of a sample object into an anti-attack detection model, and extracting the characteristics of the sample face images through the anti-attack detection model to obtain the reference image characteristics of the sample face images, and the sample face images are acquired under different illumination conditions;
a feature response graph generating unit, configured to generate, by the counter attack detection model, feature response graphs of the plurality of sample face images based on the plurality of sample face images and reference image features of the plurality of sample face images, where the feature response graphs are used to represent correspondence relationships of the reference image features in the sample face images;
the training unit is used for training the counter attack detection model based on first difference information between the reference image features of every two sample face images in the plurality of sample face images and second difference information between the feature response images of every two sample face images, and the counter attack detection model is used for determining whether the face images are counter attack images or not.
18. A computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the method according to any one of claims 1 to 15.
19. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method according to any of claims 1-15.
20. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to carry out the method according to any one of claims 1 to 15.
CN202211376287.9A 2022-11-04 2022-11-04 Method for detecting counter attack and method for training counter attack detection model Pending CN115731620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211376287.9A CN115731620A (en) 2022-11-04 2022-11-04 Method for detecting counter attack and method for training counter attack detection model


Publications (1)

Publication Number Publication Date
CN115731620A true CN115731620A (en) 2023-03-03

Family

ID=85294538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211376287.9A Pending CN115731620A (en) 2022-11-04 2022-11-04 Method for detecting counter attack and method for training counter attack detection model

Country Status (1)

Country Link
CN (1) CN115731620A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116844198A (en) * 2023-05-24 2023-10-03 北京优创新港科技股份有限公司 Method and system for detecting face attack
CN116844198B (en) * 2023-05-24 2024-03-19 北京优创新港科技股份有限公司 Method and system for detecting face attack

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination