CN115937584A - Detection method and system for resisting attack - Google Patents

Detection method and system for resisting attack

Info

Publication number
CN115937584A
Authority
CN
China
Prior art keywords
attack
sample
pixel
target
network
Prior art date
Legal status
Pending
Application number
CN202211537038.3A
Other languages
Chinese (zh)
Inventor
曹佳炯
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211537038.3A
Publication of CN115937584A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 20/00: Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B 20/40: Control techniques providing energy savings, e.g. smart controller or presence detection

Abstract

In the detection method and system for resisting attacks provided herein, after a facial video of a target user under illumination of multiple colors is obtained, an anti-attack detection model performs material classification on the pixels of each facial image frame of the facial video to obtain a pixel material category for each pixel; the anti-attack detection result of the target user is then determined based on the pixel material categories and output. The scheme can improve the detection accuracy of adversarial attacks.

Description

Detection method and system for resisting attack
Technical Field
The present disclosure relates to the field of image recognition, and in particular to a method and system for detecting adversarial attacks.
Background
In recent years, with the rapid development of Internet technology, face recognition has been applied more and more widely. In the face recognition process, adversarial attacks pose one of the highest security risks because of their concealment. Taking face recognition as an example, an adversarial attack is an attack in which an adversarial sticker is attached to the face region, adversarial glasses are worn, or the like, so that the face recognition system makes a misjudgment (for example, user A is misidentified as user B after attaching an adversarial sticker). Existing detection methods for adversarial attacks generally follow either a technical route of adversarial-attack discovery and detection or a technical route of adversarial-attack invalidation.
In the research and practice of the prior art, the inventor of the present application found that methods based on adversarial-attack discovery and detection have difficulty accurately detecting attacks in which the attack element covers only a small part of the face, while methods based on adversarial-attack invalidation preprocess normal samples as well when invalidating the adversarial elements, which lowers the success rate of normal samples in subsequent face recognition. The detection accuracy of adversarial attacks is therefore low.
Disclosure of Invention
This specification provides a detection method and system for adversarial attacks with higher detection accuracy.
In a first aspect, the present specification provides a method of detecting adversarial attacks, comprising: acquiring a facial video of a target user under illumination of multiple colors, wherein the facial video comprises multiple facial image frames; performing material classification on the pixels of each of the facial image frames using an anti-attack detection model to obtain a pixel material category for each pixel; and determining the anti-attack detection result of the target user based on the pixel material categories, and outputting the anti-attack detection result.
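For illustration only, the following sketch shows how the three claimed steps could be wired together in code; the Python names used here (classify_pixels, decide_attack) and the 5% threshold are hypothetical placeholders, not taken from the present specification.

```python
import numpy as np

def classify_pixels(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the pixel material classification step: 0 = living, 1 = non-living."""
    return np.zeros(frame.shape[:2], dtype=np.int64)

def decide_attack(label_maps: list) -> str:
    """Stand-in for the decision step based on the pixel material categories."""
    non_living_fraction = float(np.mean([(m == 1).mean() for m in label_maps]))
    return "adversarial attack" if non_living_fraction > 0.05 else "normal"

def detect(frames: list) -> str:
    label_maps = [classify_pixels(f) for f in frames]   # pixel material category per pixel, per frame
    return decide_attack(label_maps)                    # anti-attack detection result to be output

# frames would be captured under illumination of multiple colors (step 1)
frames = [np.zeros((112, 112, 3), dtype=np.uint8) for _ in range(4)]
print(detect(frames))
```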
In some embodiments, the obtaining the facial video of the target user under illumination of a plurality of colors comprises: selecting a plurality of colors from a preset color set, and determining a target light-emitting sequence corresponding to the plurality of colors; displaying color light corresponding to the multiple colors based on the target light-emitting sequence and preset display time; and acquiring a face video of the face of the target user under the color light reflection.
In some embodiments, the plurality of colors includes at least two of red, blue, yellow, or green.
In some embodiments, the light of the color corresponding to each color of the plurality of colors is emitted separately, and the duration of the emission exceeds a preset time threshold.
In some embodiments, said capturing a facial video of the target user's face under the color light reflection comprises: carrying out face detection on a preset number of continuous video frames in the collected real-time face video of the target user; when the face of the target user is not detected in the continuous video frames, stopping displaying the color light corresponding to the multiple colors, and displaying prompt information so that the target user can adjust the acquisition position based on the prompt information; and returning to the step of selecting the multiple colors from the preset color set until the face of the target user is detected in the continuous video frames, and obtaining a face video of the face of the target user under the reflection of the color light.
In some embodiments, the anti-attack detection model comprises a pixel material classification network; and the performing material classification on the pixels of each facial image frame in the multiple facial image frames using the anti-attack detection model to obtain the pixel material category of each pixel comprises: performing feature extraction on each facial image frame in the multiple facial image frames using the pixel material classification network to obtain a feature map corresponding to each facial image frame, performing material classification on each pixel of the corresponding facial image frame based on the feature map to obtain a pixel material classification probability of each pixel, and determining the pixel material category of each pixel based on the pixel material classification probability.
In some embodiments, the training process of the pixel material classification network comprises the following steps: acquiring a facial image sample, wherein the facial image sample comprises a facial image with labeled material categories; predicting, with a preset pixel material classification network, a predicted region boundary and a predicted material category of each pixel in the facial image sample, wherein the predicted region boundary is the boundary between the living body region and the attack region in the facial image sample; and determining target classification loss information of the facial image sample based on the predicted region boundary, the predicted material categories and the labeled material categories, and converging the preset pixel material classification network based on the target classification loss information to obtain the pixel material classification network.
In some embodiments, the preset pixel material classification network comprises a feature map coding sub-network, a pixel material classification sub-network, and a boundary awareness sub-network; and the predicting the predicted region boundary and the predicted material category of each pixel in the facial image sample with the preset pixel material classification network comprises: performing feature extraction on the facial image sample using the feature map coding sub-network to obtain a sample feature map, performing material classification on each pixel of the facial image sample using the pixel material classification sub-network based on the sample feature map to obtain the predicted material category of each pixel of the facial image sample, and identifying the boundary between the living body region and the attack region in the facial image sample using the boundary awareness sub-network based on the sample feature map to obtain the predicted region boundary.
In some embodiments, the determining target classification loss information of the facial image sample based on the predicted region boundary, the predicted material categories and the labeled material categories comprises: comparing the predicted material categories with the labeled material categories to obtain pixel classification loss information of the pixels in the facial image sample; selecting, from the predicted material categories, the adjacent predicted material categories of a preset number of adjacent pixels of each pixel in the facial image sample, and comparing the adjacent predicted material categories with the predicted material category of the corresponding pixel to obtain continuity classification loss information; acquiring the target region boundary labeled in the facial image sample, and comparing the target region boundary with the predicted region boundary to obtain boundary classification loss information; and accumulating the pixel classification loss information, the continuity classification loss information and the boundary classification loss information to obtain the target classification loss information.
In some embodiments, the pixel material class comprises one of a living class or a non-living class; and determining the anti-attack detection result of the target user based on the pixel material category, wherein the determination comprises the following steps: selecting at least one target pixel corresponding to the non-living body type from the facial image frame, identifying a target attack area corresponding to the at least one target pixel in the facial image frame, and determining the anti-attack detection result of the target user based on the target attack area.
In some embodiments, the determining, based on the target attack area, the detection result of the counter attack of the target user includes: determining the area ratio of the target attack area in the corresponding facial image frame to obtain a target area ratio; and when the target area ratio is larger than a preset area ratio threshold value, determining that the target user is a counterattack user, and using the counterattack user as the counterattack detection result.
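A minimal sketch of the area-ratio decision described above is given below; the use of connected-component labelling to delineate the target attack area and the 5% ratio threshold are illustrative assumptions rather than values taken from this specification.

```python
import numpy as np
from scipy import ndimage

def attack_decision(pixel_labels: np.ndarray, ratio_threshold: float = 0.05) -> bool:
    """pixel_labels: H x W map with 1 for non-living (target) pixels and 0 for living pixels.
    Returns True when the target user is judged to be an adversarial-attack user."""
    attack_mask = pixel_labels == 1
    regions, num_regions = ndimage.label(attack_mask)        # group target pixels into attack areas
    if num_regions == 0:
        return False
    largest_area = max((regions == r).sum() for r in range(1, num_regions + 1))
    target_area_ratio = largest_area / pixel_labels.size     # area ratio within the facial image frame
    return target_area_ratio > ratio_threshold
```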
In some embodiments, the anti-attack detection model further comprises a light sequence verification network; and after the obtaining of the facial video of the target user under the illumination of the multiple colors, the method further comprises: inputting the multiple facial image frames into the light sequence verification network to obtain predicted image features and the current light-emitting sequence of the color lights corresponding to the multiple colors, wherein the predicted image features comprise the predicted image feature of the next facial image frame of each of the multiple facial image frames, determining the anti-attack detection result of the target user based on the predicted image features and the current light-emitting sequence, and outputting the anti-attack detection result.
In some embodiments, the light sequence verification network comprises a light sequence feature coding sub-network, a light sequence prediction sub-network, and a feature prediction sub-network; and the inputting the multiple facial image frames into the light sequence verification network to obtain the predicted image features and the current light-emitting sequence of the color lights corresponding to the multiple colors comprises: performing feature extraction on the multiple facial image frames using the light sequence feature coding sub-network to obtain the image feature of each facial image frame, predicting the current light-emitting sequence of the color lights corresponding to the multiple colors based on the image features using the light sequence prediction sub-network, and predicting the predicted image feature of the next facial image frame of each facial image frame based on the image features using the feature prediction sub-network.
In some embodiments, the determining, based on the current light-emitting sequence and the predicted image feature, a result of the detection of the attack countermeasure by the target user includes: acquiring a feature difference value between the image feature of each facial image frame and the corresponding predicted image feature; and when the current light-emitting sequence is wrongly predicted or the characteristic difference value is larger than a preset characteristic difference value threshold value, determining the target user as an anti-attack user, and taking the anti-attack user as the anti-attack detection result.
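The following is a hedged sketch of the decision rule just described; the L2 distance used as the feature difference and the threshold value are assumptions made for illustration.

```python
import numpy as np

def light_sequence_decision(image_feats: np.ndarray,      # (T, D) image features of frames 1..T
                            predicted_feats: np.ndarray,  # (T-1, D) predicted features of frames 2..T
                            predicted_order: list,
                            true_order: list,
                            diff_threshold: float = 1.0) -> bool:
    """Returns True when the target user is judged to be an adversarial-attack user."""
    if predicted_order != true_order:                     # light-emitting sequence predicted wrongly
        return True
    diffs = np.linalg.norm(image_feats[1:] - predicted_feats, axis=1)
    return bool((diffs > diff_threshold).any())           # predicted next-frame feature deviates too much
```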
In some embodiments, the training process of the light sequence verification network comprises the following steps: acquiring a facial video sample of a user sample under illumination of multiple colors, wherein the facial video sample comprises multiple facial image frame samples; performing feature extraction on the multiple facial image frame samples using a preset light sequence verification network to obtain the sample image feature of each facial image frame sample; determining, based on the sample image features, a predicted light-emitting sequence of the illumination of the multiple colors and predicted sample image features, the predicted sample image features comprising the predicted sample image feature of the next facial image frame sample of each of the multiple facial image frame samples; and determining target verification loss information of the user sample based on the predicted light-emitting sequence and the predicted sample image features, and converging the preset light sequence verification network based on the target verification loss information to obtain the trained light sequence verification network.
In some embodiments, the user sample is a live user sample.
In some embodiments, the determining target verification loss information of the user sample based on the predicted light-emitting sequence and the predicted sample image features comprises: acquiring the labeled light-emitting sequence corresponding to the facial video sample, and comparing the labeled light-emitting sequence with the predicted light-emitting sequence to obtain light sequence loss information; determining the sample feature difference between the sample image feature of each facial image frame sample and the corresponding predicted sample image feature to obtain feature loss information; and fusing the light sequence loss information and the feature loss information to obtain the target verification loss information of the user sample.
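As a sketch of the loss fusion described above, the snippet below combines a light-sequence term and a feature term; the use of cross-entropy, mean squared error and equal weighting are assumptions, since the specification does not fix the exact loss functions here.

```python
import torch
import torch.nn.functional as F

def verification_loss(order_logits: torch.Tensor,    # (T, C) predicted color logits per frame
                      order_labels: torch.Tensor,    # (T,)  labeled light-emitting sequence
                      sample_feats: torch.Tensor,    # (T, D) sample image features of frames 1..T
                      predicted_feats: torch.Tensor  # (T-1, D) predicted features of frames 2..T
                      ) -> torch.Tensor:
    light_sequence_loss = F.cross_entropy(order_logits, order_labels)
    feature_loss = F.mse_loss(predicted_feats, sample_feats[1:])
    return light_sequence_loss + feature_loss        # fuse the two terms; weights could differ
```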
In a second aspect, the present specification also provides a detection system for combating attacks, comprising: at least one storage medium storing at least one set of instructions for performing detection of a counter attack; and at least one processor communicatively connected to the at least one storage medium, wherein when the detection system for countering the attacks is operated, the at least one processor reads the at least one instruction set and executes the detection method for countering the attacks according to the instruction of the at least one instruction set.
According to the technical scheme, after the facial video of the target user under illumination of multiple colors is obtained, the anti-attack detection model performs material classification on the pixels of each facial image frame of the facial video to obtain the pixel material category of each pixel, the anti-attack detection result of the target user is determined based on the pixel material categories, and the anti-attack detection result is output. By introducing random colorful lighting (illumination of multiple colors) during the interaction stage of anti-attack detection, more facial information is introduced; the pixel material category of each pixel in each facial image frame of the collected facial video is then predicted, and the anti-attack detection result is determined from these categories. The scheme can therefore handle attacks whose attack elements occupy only a small area, and it does not affect the success rate of normal samples in subsequent face recognition, so the detection accuracy of adversarial attacks can be improved.
Other functions of the detection method and system against attacks provided by the present specification will be set forth in part in the description that follows. The following figures and examples will be readily apparent to those of ordinary skill in the art in view of the description. The inventive aspects of the detection methods and systems for countering attacks presented in this specification can be fully explained by the practice or use of the methods, apparatus and combinations described in the detailed examples below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating an application scenario of a detection system for resisting attacks according to an embodiment of the present specification;
FIG. 2 illustrates a hardware block diagram of a computing device provided in accordance with an embodiment of the present description;
FIG. 3 is a flow chart of a method for detecting attacks according to an embodiment of the present disclosure; and
fig. 4 is a schematic overall flow chart illustrating anti-attack detection in a face recognition scenario according to an embodiment of the present disclosure.
Detailed Description
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present description. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are intended to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features of the present specification, as well as the operation and function of the elements of the structure related thereto, and the combination of parts and economies of manufacture, may be particularly improved upon in view of the following description. Reference is made to the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the specification. It should also be understood that the drawings are not drawn to scale.
The flow diagrams used in this specification illustrate the operation of system implementations according to some embodiments of the specification. It should be clearly understood that the operations of the flow diagrams may be performed out of order. Rather, the operations may be performed in reverse order or simultaneously. In addition, one or more other operations may be added to the flowchart. One or more operations may be removed from the flowchart.
For convenience of description, the present specification will explain terms that will appear from the following description as follows:
Adversarial attack: taking a face recognition scene as an example, an adversarial attack is an attack in which an adversarial sticker is pasted on the face region or adversarial glasses are worn (the attack element is small, generally covering less than 50% of the face) so that the face recognition system makes a misjudgment. For example, user A is misidentified as user B after pasting an adversarial sticker or wearing adversarial glasses, and so on.
Random colorful lighting: taking a face recognition scene as an example, during the face recognition stage the interactive screen emits light in a random order (for example, the colors change in the order red, yellow, blue). Light of different colors exhibits different reflection characteristics on different materials, and these characteristics can be used for adversarial-attack detection.
Material classification: taking a face recognition scene as an example, the material of different regions of the face is judged, and adversarial-attack detection is realized through this material classification. Adversarial stickers (or adversarial glasses and the like) are usually made of materials such as paper, which differ markedly from a normal human face, so the attack can be detected through this difference.
Before describing the specific embodiments of the present specification, the following description will be made for the application scenarios of the present specification:
The detection method for adversarial attacks provided by this specification can be applied to any adversarial-attack detection scenario. For example, in scenarios such as face payment or face recognition, the method can perform adversarial-attack detection on the facial video of the target user to be paid or recognized, collected under random colorful lighting; in an identity authentication scenario, the collected facial video can likewise be checked for adversarial attacks by the detection method of this specification. The method can also be applied to any other adversarial-attack detection scenario, which is not repeated here.
It should be understood by those skilled in the art that the adversarial-attack detection method and system described herein may be applied to other usage scenarios, which also fall within the scope of the present disclosure.
Fig. 1 is a schematic diagram illustrating an application scenario of a detection system 001 for resisting attacks according to an embodiment of the present specification. The anti-attack detection system 001 (hereinafter referred to as the system 001) may be applied to the detection of adversarial attacks in any scenario, for example a face payment scenario, a face recognition scenario, or an identity verification scenario. As shown in fig. 1, the system 001 may include a user 100, a client 200, a server 300, and a network 400.
The user 100 may be a user who triggers detection of an attack countermeasure on a facial video of a target user, and the user 100 may perform detection operation of the attack countermeasure at the client 200.
The client 200 may be a device that collects a face video of a target user under illumination of a plurality of colors for a detection operation against an attack in response to the user 100 and performs detection against an attack. In some embodiments, the detection method against attacks may be performed on the client 200. At this time, the client 200 may store data or instructions for executing the detection method against attacks described in this specification, and may execute or be used to execute the data or instructions. In some embodiments, the client 200 may include a hardware device having a data information processing function and a program necessary for driving the hardware device to operate. As shown in fig. 1, client 200 may be communicatively coupled to server 300. In some embodiments, the server 300 may be communicatively coupled to a plurality of clients 200. In some embodiments, the client 200 may interact with the server 300 over the network 400 to receive or send messages or the like, such as receiving or sending facial videos. In some embodiments, the client 200 may include a mobile device, a tablet, a laptop, a built-in device of a motor vehicle, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart television, a desktop computer, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant, a gaming device, a navigation device, etc., or any combination thereof. In some embodiments, the virtual reality device or augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality handle, an augmented reality helmet, augmented reality glasses, an augmented reality handle, or the like, or any combination thereof. For example, the virtual reality device or the augmented reality device may include google glasses, head mounted displays, VRs, and the like. In some embodiments, the built-in devices in the motor vehicle may include an on-board computer, an on-board television, and the like. In some embodiments, the client 200 may include an image capture device for capturing facial video of a target user under multiple colors of light. In some embodiments, the image capture device may be a two-dimensional image capture device (such as an RGB camera), and a depth image capture device (such as a 3D structured light camera, a laser detector, etc.). In some embodiments, the client 200 may be a device with location technology for locating the location of the client 200.
In some embodiments, the client 200 may have one or more Applications (APPs) installed. The APP can provide the user 100 with the ability to interact with the outside world and an interface over the network 400. The APP includes but is not limited to: the system comprises a webpage browser type APP program, a search type APP program, a chat type APP program, a shopping type APP program, a video type APP program, a financing type APP program, an instant messaging tool, a mailbox client, social platform software and the like. In some embodiments, a target APP may be installed on the client 200. The target APP can collect facial videos of a target user under illumination of multiple colors for the client 200. In some embodiments, the user 100 may also trigger a detection request against an attack through the target APP. The target APP may execute the detection method for resisting the attack described in this specification in response to the detection request for resisting the attack. The detection method against attacks will be described in detail later.
The server 300 may be a server that provides various services, such as a background server that supports anti-attack detection of facial videos captured on the client 200. In some embodiments, the detection method against attacks may be performed on the server 300. At this time, the server 300 may store data or instructions to perform the detection method against the attack described in the present specification, and may execute or be used to execute the data or instructions. In some embodiments, the server 300 may include a hardware device having a data information processing function and a program necessary for driving the hardware device to operate. The server 300 may be communicatively coupled to a plurality of clients 200 and receive data transmitted by the clients 200.
Network 400 is the medium used to provide communication links between clients 200 and server 300. The network 400 may facilitate the exchange of information or data. As shown in fig. 1, the client 200 and the server 300 may be connected to a network 400 and transmit information or data to each other through the network 400. In some embodiments, the network 400 may be any type of wired or wireless network, as well as combinations thereof. For example, network 400 may include a cable network, a wireline network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), the Public Switched Telephone Network (PSTN), a Bluetooth™ network, a ZigBee™ network, a Near Field Communication (NFC) network, or the like. In some embodiments, network 400 may include one or more network access points. For example, network 400 may include a wired or wireless network access point, such as a base station or an internet exchange point, through which one or more components of client 200 and server 300 may connect to network 400 to exchange data or information.
It should be understood that the number of clients 200, servers 300, and networks 400 in fig. 1 is merely illustrative. There may be any number of clients 200, servers 300, and networks 400, as desired for an implementation.
It should be noted that the detection method for resisting attacks may be completely executed on the client 200, may also be completely executed on the server 300, may also be partially executed on the client 200, and may also be partially executed on the server 300.
Fig. 2 illustrates a hardware block diagram of a computing device 600 provided in accordance with an embodiment of the present description. The computing device 600 may perform the detection method described herein against attacks. The detection method against attacks is described elsewhere in this specification. When the detection method against attacks is performed on the client 200, the computing device 600 may be the client 200. When the detection method against attacks is performed on the server 300, the computing device 600 may be the server 300. When the detection method against attacks can be executed partly on the client 200 and partly on the server 300, the computing device 600 can be the client 200 and the server 300.
As shown in fig. 2, computing device 600 may include at least one storage medium 630 and at least one processor 620. In some embodiments, computing device 600 may also include a communication port 650 and an internal communication bus 610. Meanwhile, computing device 600 may also include I/O components 660.
Internal communication bus 610 may connect various system components including storage medium 630, processor 620 and communication port 650.
I/O components 660 support input/output between computing device 600 and other components.
Communication port 650 provides for data communication between computing device 600 and the outside world, for example, communication port 650 may provide for data communication between computing device 600 and network 400. The communication port 650 may be a wired communication port or a wireless communication port.
Storage medium 630 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a disk 632, a read only memory medium (ROM) 634, or a random access memory medium (RAM) 636. The storage medium 630 also includes at least one set of instructions stored in the data storage device. The instructions are computer program code that may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the detection methods for combating attacks provided by the present specification.
The at least one processor 620 may be communicatively coupled to at least one storage medium 630 and a communication port 650 via an internal communication bus 610. The at least one processor 620 is configured to execute the at least one instruction set. When the computing device 600 is running, the at least one processor 620 reads the at least one instruction set and, based on instructions of the at least one instruction set, performs the detection method for countering attacks provided by the present description. The processor 620 may perform all the steps involved in the detection method against attacks. The processor 620 may be in the form of one or more processors, and in some embodiments, the processor 620 may include one or more hardware processors, such as microcontrollers, microprocessors, reduced instruction set computers (RISC), application-specific integrated circuits (ASICs), application-specific instruction-set processors (ASIPs), central processing units (CPUs), graphics processing units (GPUs), physical processing units (PPUs), microcontroller units, digital signal processors (DSPs), field-programmable gate arrays (FPGAs), advanced RISC machines (ARM), programmable logic devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 620 is depicted in the computing device 600 in this description. However, it should be noted that the computing device 600 may also include multiple processors, and thus, the operations and/or method steps disclosed in this specification may be performed by one processor as described in this specification, or may be performed by a combination of multiple processors. For example, if in this description the processor 620 of the computing device 600 performs steps A and B, it should be understood that steps A and B may also be performed jointly or separately by two different processors 620 (e.g., a first processor performing step A, a second processor performing step B, or both a first and a second processor performing steps A and B).
Fig. 3 shows a flowchart of a detection method P100 for countering an attack according to an embodiment of the present specification. As before, the computing device 600 may perform the detection method against attacks P100 of the present description. Specifically, the processor 620 may read an instruction set stored in its local storage medium and then execute the detection method against attacks P100 of the present specification according to the specification of the instruction set. As shown in fig. 3, method P100 may include:
s110: a face video of a target user under illumination of multiple colors is acquired.
The facial video may be video information of the face of the target user captured under illumination of light of multiple colors. The facial video includes multiple facial image frames; the number of facial image frames in the facial video may be two or more. A facial image frame is a video frame of the facial video that contains the face of the target user.
The multiple colors may include at least two of red, blue, yellow, or green; any other colors may also be included. The color light corresponding to each of the multiple colors is emitted separately, and its continuous emission time exceeds a preset time threshold. For example, when the multiple colors are red, yellow and blue, only one color of light is emitted at a time; for instance, while red light is being emitted, yellow and blue light are no longer emitted. In addition, each time a color is emitted, its duration needs to exceed the preset time threshold; for example, if the preset time threshold is 1 second and the multiple colors are red, yellow and blue, then the red light needs to last longer than 1 second. The continuous emission time of each color light exceeds the preset time threshold mainly so that a facial video containing enough information under that color's illumination can be collected, which improves the accuracy of the facial pixel material classification.
The manner of acquiring the face video of the target user under the illumination of multiple colors may be multiple, and specifically may be as follows:
for example, the processor 620 may directly acquire a facial video of the target user under the illumination of multiple colors, which is sent by the user 100 through the client 200 or the terminal, or may also acquire a facial video of the target user under the illumination of multiple colors through an image acquisition device, and so on.
The user 100 and the target user may be the same user or different users. In addition, when the face video of the target user under illumination of multiple colors, which is sent by the user 100 through the client 200 or the terminal, is obtained, the light emitting sequence of the multiple colors in the face video may also be obtained, and the target light emitting sequence is obtained.
For example, the processor 620 may select multiple colors from a preset color set, determine a target light emitting sequence corresponding to the multiple colors, display color light corresponding to the multiple colors based on the target light emitting sequence and preset display time, and collect a face video of the face of the target user under reflection of the color light.
For example, the processor 620 may randomly select multiple colors from the preset color set, or may further obtain a collecting distance between the target user and the image collecting device, select multiple colors corresponding to the collecting distance from the preset color set, and the like.
The collection distance may be the distance between the target user and the image collection device. Light of different colors has different penetration ability and reflection performance; therefore, the longer the collection distance, the stronger the penetration ability or the better the reflection performance the selected colors need in order to illuminate the face of the target user.
After selecting the multiple colors, the processor 620 may determine the target light-emitting sequence corresponding to the multiple colors. The target light-emitting sequence may be timing information specifying when each of the multiple colors emits light individually. For example, the processor 620 may randomly determine the order in which each of the multiple colors emits light to obtain the target light-emitting sequence, or may obtain a historical light-emitting sequence corresponding to the multiple colors and randomly determine the target light-emitting sequence based on it; the target light-emitting sequence may be entirely different from, or partially the same as, the historical light-emitting sequence. In addition, it should be noted that the target light-emitting sequence may list each of the multiple colors once; for example, if the target light-emitting sequence is yellow, red, green, the color lights may be emitted by first emitting yellow light for a preset time, then red light for a preset time, and then green light for a preset time. The target light-emitting sequence may also be a complete sequence in which colors repeat; for example, if the target light-emitting sequence is yellow, red, yellow, green, red, the whole lighting process consists of the yellow light duration, the red light duration, the yellow light duration, the green light duration, and the red light duration, after which the lighting stops. The duration of each color light may be the same or different, but must exceed the preset time threshold.
After the plurality of colors are selected and the target light emitting sequence corresponding to the plurality of colors is determined, the processor 620 may display the color lights corresponding to the plurality of colors based on the target light emitting sequence and the preset display time. The preset display time may be a total light emitting time of the color lights of the plurality of colors, or may be a light emitting time (corresponding to the preset time for the light emission duration) of each color light of the plurality of colors. The display here is understood to be the emission of color light corresponding to that color. For example, taking the preset display time as the total light emitting time of the multiple colors as an example, the processor 620 may display the corresponding color light in the display area based on the target light emitting sequence until the preset display time is reached, or taking the preset display time as the continuous light emitting time of the color light of each color as an example, the processor 620 may display the corresponding color light in the display area based on the target light emitting sequence, and when the continuous light emitting time of the color light reaches the preset display time, switch to the color light of another color until the color lights of the multiple colors are all displayed. In addition, it should be noted that, when the light of the corresponding color of the plurality of colors is displayed, timing may be performed, so that the light emission may be stopped or switched to the light emission of another color after the preset display time is obtained.
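A possible realization of the random color selection and display logic is sketched below; the preset color set, the per-color display time and the show_color callback are hypothetical and only illustrate the flow.

```python
import random
import time

PRESET_COLORS = ["red", "blue", "yellow", "green"]   # preset color set

def show_color(color: str) -> None:
    """Hypothetical display callback: emit only this color's light on the screen."""
    print(f"displaying {color} light")

def random_light_sequence(num_colors: int = 3, per_color_seconds: float = 1.0) -> list:
    colors = random.sample(PRESET_COLORS, num_colors)  # select multiple colors at random
    for color in colors:                               # target light-emitting sequence
        show_color(color)
        time.sleep(per_color_seconds)                  # duration above the preset time threshold
    return colors
```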
The processor 620 may capture a video of the face of the target user when the color lights corresponding to the plurality of colors are displayed. For example, the processor 620 performs face detection on a preset number of consecutive video frames in the captured real-time face video of the target user, stops displaying color lights corresponding to the multiple colors when the face of the target user is not detected in the consecutive video frames, displays a prompt message so that the target user adjusts the capture position based on the prompt message, and returns to perform the step of selecting the multiple colors from the preset color set until the face of the target user is detected in the consecutive video frames, so as to obtain the face video of the face of the target user under the color light emission.
The real-time face video may be a video of the face of the target user collected in real time when a plurality of color lights are displayed. The consecutive video frames may be a preset number of video frames adjacent in time sequence to the real-time facial video frames. For example, the processor 620 may detect whether the target user's face is included in a preset number of consecutive video frames in the real-time face video of the target user, for example, if the preset number is 3, it may detect whether the target user's face (human face) is included in consecutive 3 frames in the real-time face video, and so on.
The processor 620 may, upon detecting the target user's face in successive video frames, take the captured real-time face video as a face video of the target user's face under color light emission. The processor 620 stops displaying the color lights corresponding to the plurality of colors and displays the prompt information when the face of the target user is not detected in the continuous video frames, so that the target user adjusts the acquisition position based on the prompt information. The stop of displaying the color lights corresponding to the plurality of colors here is understood as the stop of light emission of the color lights. The prompt message may prompt the user to keep the face (human face) in a preset acquisition area (for example, the middle or a preset area of the light-emitting screen), and so on. The target user can adjust the position of the target user according to the prompt message, so that the image acquisition device or the light-emitting device can acquire a real-time face video containing the face of the target user.
After stopping the display of the color lights corresponding to the multiple colors and showing the prompt information, the processor 620 may return to the step of selecting the multiple colors from the preset color set until the face of the target user is detected in the consecutive video frames, so that the captured real-time facial video can be used as the facial video of the target user's face under the reflection of the color light.
It should be noted that after the display of the color lights corresponding to the multiple colors is stopped and the prompt information is shown, the random colorful lighting may be restarted (multiple colors are reselected and lit in a new random order). Timing continues while the reselected color lights are lit, but it restarts from 0; that is, each time the color lights are reselected and lit, the timer starts again from 0, so that a facial video covering the full preset display time can be obtained.
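The capture loop described above can be sketched as follows; all callbacks (frame capture, face detection, lighting control and the prompt) are hypothetical parameters supplied by the caller, and the value of 3 consecutive frames is only an example.

```python
from typing import Any, Callable, List

def capture_face_video(capture_frame: Callable[[], Any],
                       detect_face: Callable[[Any], bool],
                       restart_lighting: Callable[[], None],
                       stop_lighting: Callable[[], None],
                       show_prompt: Callable[[str], None],
                       required_frames: int,
                       consecutive: int = 3) -> List[Any]:
    frames, missing = [], 0
    restart_lighting()                                  # timing starts from 0
    while len(frames) < required_frames:
        frame = capture_frame()
        if detect_face(frame):
            frames.append(frame)
            missing = 0
        else:
            missing += 1
            if missing >= consecutive:                  # face absent in N consecutive video frames
                stop_lighting()
                show_prompt("keep your face inside the capture area")
                frames, missing = [], 0
                restart_lighting()                      # reselect colors, timing restarts from 0
    return frames
```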
S120: and performing material classification on the pixels of each facial image frame in the multi-frame facial image frames by adopting an anti-attack detection model to obtain the pixel material category of each pixel.
The pixel material category may be the material category corresponding to a pixel of the facial image frame, and may include one of a living body category or a non-living body category. The living body category corresponds to the material of a normal face. The non-living body category corresponds to materials used for adversarial attacks, such as paper, lens material, or a screen.
The anti-attack detection model can be a model for performing anti-attack detection on the face video of the target user. The counter attack detection model may include a pixel material classification network. The pixel material classification network may be a network that classifies the material of pixels in the facial image frame.
The method for obtaining the pixel material category of each pixel by using the anti-attack detection model to classify the material of the pixel of each face image frame in the multiple frame face image frames may be various, and specifically may be as follows:
for example, the processor 620 may perform feature extraction on each face image frame in a plurality of face image frames by using a pixel material classification network, obtain a feature map corresponding to each face image frame, perform material classification on each pixel of the corresponding face image frame based on the feature map, obtain a pixel material classification probability of each pixel, and determine a pixel material category of each pixel based on the pixel material classification probability.
The pixel material classification network may further include a material classification sub-network that operates at the pixel level. The pixel material classification probability may be the probability that a pixel belongs to each candidate material category. For example, the processor 620 may extract a material classification feature of each pixel of the corresponding facial image frame from the feature map using the material classification sub-network and determine the pixel material classification probability of the corresponding pixel based on that feature, or the processor 620 may extract a pixel image feature of each pixel of the corresponding facial image frame from the feature map and classify the pixel image feature using the material classification sub-network, thereby obtaining the pixel material classification probability of the corresponding pixel.
After obtaining the pixel material classification probability of each pixel, the processor 620 determines the pixel material category of each pixel based on the pixel material classification probability. The pixel material classification probability may include a probability corresponding to each candidate pixel material category, or a probability corresponding to a particular pixel material category. For example, the processor 620 may extract the classification probabilities corresponding to the candidate pixel material categories from the pixel material classification probability and select the candidate pixel material category with the highest classification probability as the pixel material category of the corresponding pixel, or may select a candidate pixel material category whose classification probability exceeds a preset probability threshold as the pixel material category of the corresponding pixel.
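For illustration, the sketch below shows one way such per-pixel classification could be implemented with a small convolutional network; the layer sizes, the two classes (0 = living, 1 = non-living) and the argmax decision are assumptions, not the network actually used in this specification.

```python
import torch
import torch.nn as nn

class PixelMaterialClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(                        # produces the feature map
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(32, num_classes, 1)      # per-pixel material classification

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        feature_map = self.encoder(frame)                    # feature map of the facial image frame
        probs = self.classifier(feature_map).softmax(dim=1)  # pixel material classification probability
        return probs.argmax(dim=1)                           # pixel material category per pixel

frame = torch.rand(1, 3, 112, 112)                           # one facial image frame (illustrative size)
labels = PixelMaterialClassifier()(frame)                    # (1, 112, 112) label map
```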
The pixel material classification network may be obtained directly, or a preset pixel material classification network may be trained to obtain it. The training process of the pixel material classification network may therefore include the following steps: the processor 620 may obtain a facial image sample comprising a facial image with labeled material categories, predict a predicted region boundary and a predicted material category of each pixel in the facial image sample using a preset pixel material classification network, determine target classification loss information of the facial image sample based on the predicted region boundary, the predicted material categories, and the labeled material categories, and converge the preset pixel material classification network based on the target classification loss information to obtain the pixel material classification network.
The predicted region boundary may be the predicted boundary between the living body region and the attack region in the facial image sample. The living body region is the region of the face whose material is a living body. The attack region is the region whose material is not a living body; taking an adversarial sticker as an example, the attack region is the region where the sticker is located, and it may therefore also be called the adversarial region. The preset pixel material classification network may include a feature map coding sub-network, a pixel material classification sub-network, and a boundary awareness sub-network. The feature map coding sub-network encodes the facial image sample into a feature map, the pixel material classification sub-network classifies the material of each pixel in the facial image sample, and the boundary awareness sub-network identifies the region boundary between the living body region and the attack region in the facial image sample. For example, the processor 620 may perform feature extraction on the facial image sample using the feature map coding sub-network to obtain a sample feature map, perform material classification on each pixel of the facial image sample using the pixel material classification sub-network based on the sample feature map to obtain the predicted material category of each pixel, and identify the boundary between the living body region and the attack region in the facial image sample using the boundary awareness sub-network based on the sample feature map to obtain the predicted region boundary.
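A sketch of this training-time structure, with a shared feature-map coding sub-network feeding both a pixel material classification head and a boundary awareness head, is given below; the channel sizes and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BoundaryAwarePixelClassifier(nn.Module):
    def __init__(self, num_material_classes: int = 2):
        super().__init__()
        self.feature_encoder = nn.Sequential(                        # feature map coding sub-network
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        )
        self.material_head = nn.Conv2d(32, num_material_classes, 1)  # pixel material classification sub-network
        self.boundary_head = nn.Conv2d(32, 1, 1)                     # boundary awareness sub-network

    def forward(self, sample: torch.Tensor):
        feats = self.feature_encoder(sample)                  # sample feature map
        material_logits = self.material_head(feats)           # predicted material category per pixel
        boundary_logits = self.boundary_head(feats)           # per-pixel score for the predicted region boundary
        return material_logits, boundary_logits
```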
The manner in which the processor 620 uses the pixel material classification sub-network to predict, based on the sample feature map, the predicted material category of each pixel in the facial image sample is similar to the prediction of the pixel material category of each pixel in a facial video frame described above, and is not repeated here.
For example, the processor 620 may identify a region boundary classification probability corresponding to each pixel in the face image sample by using the boundary sensing subnetwork, determine a target pixel corresponding to a region boundary in the face image sample based on the region boundary classification probability, determine a boundary between the living body region and the attack region of the face image sample based on a pixel position of the target pixel, and obtain a predicted region boundary, or may also identify the living body region and the attack region in the face image sample by using the boundary sensing subnetwork, thereby obtaining a region boundary between the living body region and the attack region, and obtaining a predicted region boundary.
After obtaining the predicted region boundary and the predicted material category of each pixel of the facial image sample, the processor 620 may determine the target classification loss information of the facial image sample based on the predicted region boundary, the predicted material categories, and the labeled material categories. The target classification loss information may be the loss information generated by the pixel material classification of the facial image sample. For example, the processor 620 may compare the predicted material categories with the labeled material categories to obtain the pixel classification loss information of the pixels in the facial image sample; select, from the predicted material categories, the adjacent predicted material categories of a preset number of adjacent pixels of each pixel in the facial image sample, and compare the adjacent predicted material categories with the predicted material category of the corresponding pixel to obtain the continuity classification loss information; obtain the target region boundary labeled in the facial image sample, and compare the target region boundary with the predicted region boundary to obtain the boundary classification loss information; and accumulate the pixel classification loss information, the continuity classification loss information, and the boundary classification loss information to obtain the target classification loss information.
The pixel classification loss information may be loss information generated by a difference between a predicted material type of each pixel in the face image sample and a material type labeled by the pixel in the face image sample. For example, the processor 620 may compare the predicted pixel material category with the labeled material category by using a first classification loss function to obtain pixel classification loss information of each pixel in the face image sample, or may compare the predicted pixel material category with the labeled material category by using another loss function to obtain pixel classification loss information of each pixel in the face image sample, or the like.
The first classification loss function may be of various types, and may include, for example, a cross-entropy loss function or other types of loss functions, and so on.
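For illustration only, the per-pixel comparison with the cross-entropy option just mentioned might be sketched as follows; the function name and tensor shapes below are assumptions and not part of the present disclosure.

```python
import torch.nn.functional as F

def pixel_classification_loss(logits, labels):
    """Per-pixel material classification loss (one possible first classification loss function).

    logits: (N, C, H, W) raw class scores from the pixel material classification sub-network
    labels: (N, H, W) annotated material category index of every pixel
    """
    # cross_entropy averages the per-pixel losses over all pixels in the batch
    return F.cross_entropy(logits, labels)
```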
The continuity classification loss information may be the loss information generated by the difference between the predicted material category of each pixel in the face image sample and the predicted material categories of its adjacent pixels. The constraint expressed by the continuity classification loss information is that the pixel material classification results of adjacent pixels should be kept as consistent as possible. The adjacent pixels may be pixels neighbouring a given pixel in the face image sample; for example, when the preset number is 4, the adjacent pixels may be the pixels directly above, below, to the left of, and to the right of the pixel, and when the preset number is 8, the adjacent pixels may be the pixels in the eight surrounding directions (the four direct neighbours plus the four diagonal neighbours). For example, the processor 620 selects the preset number of adjacent pixels corresponding to each pixel from the face image sample, and extracts the predicted material categories corresponding to those adjacent pixels from the predicted material categories to obtain the adjacent predicted material categories.
After selecting the adjacent predicted material categories of the adjacent pixels corresponding to each pixel, the processor 620 may compare the adjacent predicted material categories with the predicted material category of the corresponding pixel to obtain the continuity classification loss information. For example, the processor 620 may compare the adjacent predicted material categories with the predicted material category of the corresponding pixel, select from the adjacent pixels the target adjacent pixels whose material category is the same, and determine the continuity classification loss information based on the number of such target adjacent pixels; alternatively, the processor 620 may compare the adjacent predicted material categories with the predicted material category of the corresponding pixel using a second classification loss function to obtain the continuity classification loss information, and so on.
Wherein the second classification loss function may be a cross-entropy loss function or other classification loss function, and so on. The first classification loss function and the second classification loss function may be the same or different.
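As a minimal sketch (not the patented implementation), one simple way to penalize neighbouring pixels whose predicted material categories disagree is to compare each pixel's predicted class with shifted copies of the prediction map (4-neighbourhood); the function name and the use of hard class indices rather than probabilities are assumptions.

```python
import torch

def continuity_classification_loss(pred_classes):
    """Fraction of 4-neighbour pairs whose predicted material categories differ.

    pred_classes: (N, H, W) integer material category predicted for every pixel
    """
    disagreements = []
    for shift, dim in [(1, 1), (-1, 1), (1, 2), (-1, 2)]:  # up, down, left, right
        neighbour = torch.roll(pred_classes, shifts=shift, dims=dim)
        disagreements.append((pred_classes != neighbour).float().mean())
    # torch.roll wraps around the image border; a real implementation would mask border
    # pixels, and a differentiable variant would compare class probabilities instead
    return torch.stack(disagreements).mean()
```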
The boundary classification loss information may be the loss information generated by the difference between the region boundary predicted by the boundary perception sub-network and the target area boundary. The target area boundary may be compared with the predicted region boundary in various ways; for example, the processor 620 may obtain a position error between the target area boundary and the predicted region boundary and determine the boundary classification loss information based on the position error, or may compare the target area boundary with the predicted region boundary using a third classification loss function to obtain the boundary classification loss information, and so on.
The type of the third classification loss function may be various, and for example, the third classification loss function may include a cross-entropy loss function or other types of classification loss functions. The first classification loss function, the second classification loss function, and the third classification loss function may be the same or different.
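For illustration, the comparison between the annotated target area boundary and the predicted region boundary could be implemented as a per-pixel binary cross-entropy over boundary maps; representing both boundaries as per-pixel maps is an assumption of this sketch.

```python
import torch.nn.functional as F

def boundary_classification_loss(pred_boundary_logits, target_boundary_mask):
    """One possible boundary comparison, assuming boundaries are per-pixel maps.

    pred_boundary_logits: (N, H, W) raw boundary scores from the boundary perception sub-network
    target_boundary_mask: (N, H, W) annotated target area boundary, values in {0, 1}
    """
    return F.binary_cross_entropy_with_logits(pred_boundary_logits,
                                              target_boundary_mask.float())
```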
After determining the pixel classification loss information, the continuity classification loss information, and the boundary classification loss information, the processor 620 may accumulate the pixel classification loss information, the continuity classification loss information, and the boundary classification loss information to obtain target classification loss information, which may be specifically represented by formula (1):
Loss_total1 = Loss_pixel-cls + Loss_cls-consistency + Loss_edge    (1)

where Loss_total1 is the target classification loss information, Loss_pixel-cls is the pixel classification loss information, Loss_cls-consistency is the continuity classification loss information, and Loss_edge is the boundary classification loss information.
In some embodiments, the pixel classification loss information, the continuity classification loss information, and the boundary classification loss information may be weighted before being accumulated. For example, the processor 620 may obtain classification loss weights, weight the pixel classification loss information, the continuity classification loss information, and the boundary classification loss information based on the classification loss weights, and then accumulate the weighted pixel classification loss information, the weighted continuity classification loss information, and the weighted boundary classification loss information to obtain the target classification loss information.
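A hedged sketch of the weighted accumulation of formula (1); the default weight values are arbitrary placeholders, not values taken from the present disclosure.

```python
def target_classification_loss(loss_pixel_cls, loss_cls_consistency, loss_edge,
                               w_pixel=1.0, w_consistency=1.0, w_edge=1.0):
    """Weighted sum of the three classification losses (formula (1) with optional weights)."""
    return (w_pixel * loss_pixel_cls
            + w_consistency * loss_cls_consistency
            + w_edge * loss_edge)
```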
After determining the target classification loss information, the processor 620 may converge the preset pixel material classification network based on the target classification loss information to obtain the pixel material classification network. For example, the processor 620 may update the network parameters of the preset pixel material classification network using a gradient descent algorithm (or another network parameter update algorithm) based on the target classification loss information, and return to the step of obtaining a face image sample until the preset pixel material classification network converges, thereby obtaining the trained pixel material classification network.
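A minimal training-loop sketch for converging the preset pixel material classification network, assuming a PyTorch-style model and data loader; every identifier here is hypothetical and only the pixel classification term of formula (1) is shown.

```python
import torch
import torch.nn.functional as F

def train_pixel_material_classifier(model, data_loader, epochs=10, lr=1e-3):
    """Update network parameters with a gradient-descent-style optimizer until convergence."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, pixel_labels in data_loader:
            logits = model(images)              # (N, C, H, W) per-pixel class scores
            # only the pixel classification term is shown; the full target classification
            # loss of formula (1) would also add the continuity and boundary terms
            loss = F.cross_entropy(logits, pixel_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```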
S130: determining the anti-attack detection result of the target user based on the pixel material category, and outputting the anti-attack detection result.
For example, the processor 620 may select at least one target pixel corresponding to the non-living body category in the face image frame, identify the target attack region corresponding to the at least one target pixel in the face image frame, and determine the counter attack detection result of the target user based on the target attack region.
The target attack area may be an area occupied by a target pixel corresponding to a non-living body type in the facial image frame. For example, the processor 620 may determine an area ratio of the target attack region in the corresponding facial image frame to obtain a target area ratio, and when the target area ratio is greater than a preset area ratio threshold, determine the target user as the counter attack user, and use the counter attack user as the counter attack detection result.
The target area ratio may be the ratio between the area of the target attack region and the area of the face image frame. When the target area ratio is greater than the preset area ratio threshold, it can be understood that the area of the predicted attack region (countermeasure region) in the face of the target user exceeds the preset threshold; in this case it can be determined that an attack region, which may be the area occupied by a countermeasure sticker or countermeasure glasses, exists on the face of the target user, and therefore the target user can be determined to be a counter attack user. Conversely, when the target area ratio is smaller than the preset area ratio threshold, it can be understood that the area of the predicted attack region (countermeasure region) in the face of the target user does not exceed the preset threshold; in this case it can be considered that no attack region exists on the face of the target user, and therefore the target user can be determined to be a living user, and the living user is taken as the counter attack detection result of the target user.
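A small sketch of this area-ratio decision rule, assuming the per-pixel material categories are available as an integer array; the label convention (0 for living material) and the threshold value are assumptions.

```python
import numpy as np

LIVING = 0  # assumed label index for the living body material category

def is_counter_attack(pixel_classes, area_ratio_threshold=0.05):
    """Flag the frame as a counter attack when the non-living (attack) area is too large.

    pixel_classes: (H, W) integer material category per pixel
    """
    attack_pixels = np.sum(pixel_classes != LIVING)
    target_area_ratio = attack_pixels / pixel_classes.size
    return target_area_ratio > area_ratio_threshold
```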
The processor 620 may output the counter attack detection result after determining the counter attack detection result of the target user. For example, the processor 620 may directly send the counter attack detection result to the client 200, the terminal, or the server corresponding to the user 100, so that the client 200, the terminal, or the server responds to the target user or the service request corresponding to the target user based on the counter attack detection result, or may directly visually display the counter attack detection result, or the like.
For example, the processor 620 may directly display the counter attack detection result, may present it acoustically or optically (for example, broadcasting the counter attack detection result by voice, indicating different types of detection results with lights of different colors, or combining sound and light), or may present only a specific type of counter attack detection result (for example, only displaying the detection results that correspond to counter attack users, or only those that correspond to normal users), and so on.
In some embodiments, the processor 620 may further determine an anti-attack detection result of the target user or output the anti-attack detection result, and then respond to the target user or the service request corresponding to the target user based on the anti-attack detection result, and the responding manner may be multiple, for example, the processor 620 may directly intercept the service request corresponding to the target user or the target user, or the processor 620 may also directly perform secondary verification on the target user, and perform final response on the service request corresponding to the target user or the target user based on the secondary verification result, and the like.
In some embodiments, the counter attack detection model may further include a light sequence check network, where the light sequence check network checks light emitting timings of the color lights corresponding to the plurality of colors, so as to implement counter attack detection on the target user. For example, after the processor 620 obtains a facial video of the target user under illumination of multiple colors, the processor 620 may further input the multiple frames of facial image frames into the optical sequence verification network to obtain a predicted image feature and a current light emitting sequence of color light corresponding to the multiple colors, determine a detection result of the target user against the attack based on the predicted image feature and the current light emitting sequence, and output the detection result of the target user against the attack.
The predicted image features may include the predicted image feature of the next face image frame after each of the multiple face image frames. An image feature may be a feature vector representing the image information in a face image frame, and the feature vector may be multidimensional, for example 256-dimensional or of another dimensionality. The light sequence verification network may include a light sequence feature coding sub-network, a light sequence prediction sub-network, and a feature prediction sub-network. The light sequence feature coding sub-network is a network for performing feature coding on the face image frames. The light sequence prediction sub-network is a network for predicting the light-emitting order of the color lights corresponding to the multiple colors. The feature prediction sub-network is a network for predicting the image feature of the next face image frame. For example, the processor 620 may perform feature extraction on the multiple face image frames using the light sequence feature coding sub-network to obtain the image feature of each face image frame, predict the current light-emitting order of the color lights corresponding to the multiple colors using the light sequence prediction sub-network based on the image features, and predict the image feature of the face image frame following each face image frame using the feature prediction sub-network based on the image features.
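The following is a speculative PyTorch-style sketch of the three sub-networks just described (per-frame feature coding, light-emitting order prediction, next-frame feature prediction); the layer sizes, the 256-dimensional feature, and the treatment of the light-emitting order as a 24-way classification (permutations of three colors chosen from four) are assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class LightSequenceVerifier(nn.Module):
    def __init__(self, feat_dim=256, num_orders=24):
        super().__init__()
        # light sequence feature coding sub-network: per-frame image encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        # light sequence prediction sub-network: classify the light-emitting order
        self.order_head = nn.Linear(feat_dim, num_orders)
        # feature prediction sub-network: predict the next frame's image feature
        self.next_feat_head = nn.Linear(feat_dim, feat_dim)

    def forward(self, frames):
        """frames: (N, T, 3, H, W) face image frames of one video."""
        n, t, c, h, w = frames.shape
        feats = self.encoder(frames.reshape(n * t, c, h, w)).reshape(n, t, -1)
        order_logits = self.order_head(feats.mean(dim=1))   # one order prediction per video
        predicted_next = self.next_feat_head(feats)         # predicted feature of the next frame
        return feats, order_logits, predicted_next
```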
After determining the predicted image features and the current light-emitting order of the color lights corresponding to the multiple colors, the processor 620 may determine the counter attack detection result of the target user based on them. For example, the processor 620 may obtain the feature difference value between the image feature of each face image frame and the corresponding predicted image feature, determine the target user to be a counter attack user when the current light-emitting order is predicted incorrectly or the feature difference value is greater than a preset feature difference threshold, and take the counter attack user as the counter attack detection result.
For example, the image feature of the first face image frame may be used to predict the image feature of the second face image frame, and so on. The processor 620 may therefore compare the actual image feature of the second face image frame with the predicted image feature of the second face image frame obtained from the first face image frame, thereby obtaining the feature difference value between the actual image feature and the corresponding predicted image feature of the second face image frame, and so on, so as to obtain the feature difference value corresponding to each face image frame (excluding the first face image frame).
The processor 620 may obtain the target light-emitting order of the color lights corresponding to the multiple colors and compare it with the predicted current light-emitting order. If the target light-emitting order is the same as the current light-emitting order, the current light-emitting order can be determined to have been predicted correctly; if they are different, the current light-emitting order can be determined to have been predicted incorrectly.
The processor 620 may determine the target user to be a counter attack user when the current light-emitting order is predicted incorrectly or the feature difference value is greater than the preset feature difference threshold, and take the counter attack user as the counter attack detection result of the target user. An incorrect prediction of the current light-emitting order means that the current light-emitting order differs from the target light-emitting order in which the color lights corresponding to the multiple colors actually emit light. The condition that the feature difference value is greater than the preset feature difference threshold can take various forms: for example, the feature difference value of every face image frame is greater than the preset feature difference threshold, an extreme value (maximum and/or minimum) of the feature difference values of the face image frames is greater than the preset feature difference threshold, or the mean of the feature difference values of the face image frames is greater than the preset feature difference threshold, and so on.
The processor 620 may further determine the target user to be a living user when the current light-emitting order is predicted correctly or the feature difference value is smaller than the preset feature difference threshold, and take the living user as the counter attack detection result of the target user. A correct prediction of the current light-emitting order means that the current light-emitting order is the same as the target light-emitting order in which the color lights corresponding to the multiple colors actually emit light. The condition that the feature difference value is smaller than the preset feature difference threshold can likewise take various forms: for example, the feature difference value of every face image frame is smaller than the preset feature difference threshold, an extreme value (maximum and/or minimum) of the feature difference values of the face image frames is smaller than the preset feature difference threshold, or the mean of the feature difference values of the face image frames is smaller than the preset feature difference threshold, and so on.
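As an illustrative sketch of this decision rule (thresholding the mean feature difference is just one of the options described above, and the threshold value is a placeholder):

```python
import torch

def light_sequence_attack_check(feats, predicted_next, order_logits,
                                target_order_idx, diff_threshold=1.0):
    """Return True if the video is judged a counter attack by the light sequence branch.

    feats:          (T, D) actual image feature of every frame
    predicted_next: (T, D) predicted feature of the following frame, aligned with feats
    """
    # feature difference between each actual frame feature (from frame 2 on)
    # and the prediction made from the previous frame
    diffs = torch.norm(feats[1:] - predicted_next[:-1], dim=1)
    order_wrong = order_logits.argmax().item() != target_order_idx
    return order_wrong or diffs.mean().item() > diff_threshold
```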
The processor 620 may output the anti-attack detection result after determining the anti-attack detection result of the target user, and the process of outputting the anti-attack detection result is described above, and is not described herein again.
In some embodiments, before the processor 620 uses the optical sequence check network to perform attack-countermeasure detection on the target user, the optical sequence check network may also be directly obtained or a preset optical sequence check network may be trained, so as to obtain a trained optical sequence check network.
The training process of the optical sequence verification network may include the following steps. The processor 620 may obtain a face video sample of a user sample under illumination of multiple colors, where the face video sample includes multiple face image frame samples. The processor 620 may perform feature extraction on the multiple face image frame samples using a preset optical sequence verification network to obtain the sample image feature of each face image frame sample, and determine, based on the sample image features, a predicted light-emitting order of the illumination of the multiple colors and predicted sample image features, where the predicted sample image features include the predicted sample image feature of the face image frame sample following each face image frame sample. The processor 620 may then determine target verification loss information of the user sample based on the predicted light-emitting order and the predicted sample image features, and converge the preset optical sequence verification network based on the target verification loss information to obtain the trained optical sequence verification network.
The user sample is a living body user sample, and the living body user sample can be understood as that the training sample is a living body user, the face of the living body user does not include a confrontation area or an attack area, or can be understood as that all pixel materials of the face image of the user sample are living bodies. When the preset optical sequence check network is trained through the face video sample of the living body user sample, the preset optical sequence check network can learn the light-emitting sequence (optical sequence information) and the image characteristics of the prediction sample in the face video sample corresponding to the living body user sample (a normal sample or a non-counterattack sample), and further the detection precision of counterattack is improved. In addition, the way of acquiring the face video sample by the processor 620 is similar to the way of acquiring the face video, which is described above in detail, and is not described in detail here. In addition, the manner of extracting the sample image features and predicting the predicted light-emitting sequence and the predicted sample image features by the processor 620 is similar to the manner of extracting the image features, predicting the current light-emitting sequence and predicting the image features, which is described above in detail and is not repeated herein.
After predicting the predicted light-emitting order and the predicted sample image features, the processor 620 may determine the target verification loss information of the user sample based on them. The target verification loss information may be the loss information generated when the preset optical sequence verification network predicts the light-emitting order and the sample image features corresponding to the face video sample of the user sample. For example, the processor 620 may obtain the annotated light-emitting order corresponding to the face video sample, compare the annotated light-emitting order with the predicted light-emitting order to obtain light sequence loss information, determine the sample feature difference value between the sample image feature of each face image frame sample and the corresponding predicted sample image feature to obtain feature loss information, and fuse the light sequence loss information and the feature loss information to obtain the target verification loss information of the user sample.
The light sequence loss information may be loss information generated by a difference between a light emitting sequence predicted by the light sequence verification network and a real light emitting sequence corresponding to the face video sample. There are various ways to compare the annotated lighting order with the predicted lighting order, for example, the processor 620 may compare the annotated lighting order with the predicted lighting order by using a comparison loss function to obtain the light sequence loss information, or may compare the annotated lighting order with the predicted lighting order and determine the light sequence loss information based on the comparison result, and so on.
The characteristic loss information may be loss information generated by a difference between a predicted sample image characteristic of the facial image frame sample predicted by the optical sequence verification network and an actually extracted sample image characteristic. The manner of determining the sample feature difference between the sample image features of each facial image frame sample and the corresponding predicted sample image features is similar to the manner of determining the feature difference, which is described above in detail, and is not repeated here. The processor 620 may determine the feature loss information based on the sample feature differences after determining the sample feature differences, for example, the processor 620 may use the sample feature differences as the feature loss information directly, or may obtain a mean or an extreme (maximum and/or minimum) of the sample feature differences for each facial image frame sample and use the mean or extreme as the feature loss information, and so on.
After determining the optical sequence loss information and the characteristic loss information, the processor 620 may fuse the optical sequence loss information and the characteristic loss information to obtain target verification loss information. There are various ways to fuse the optical sequence loss information and the characteristic loss information, for example, the processor 620 may directly add the optical sequence loss information and the characteristic loss information to obtain the target verification loss information, which may be specifically shown in formula (2):
Loss_total2 = Loss_light + Loss_pred    (2)

where Loss_total2 is the target verification loss information, Loss_light is the light sequence loss information, and Loss_pred is the feature loss information.
In some embodiments, the manner in which the processor 620 fuses the light sequence loss information and the feature loss information may further include: the processor 620 obtains preset verification loss weights, weights the light sequence loss information and the feature loss information based on the preset verification loss weights, and adds the weighted light sequence loss information and the weighted feature loss information to obtain the target verification loss information.
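A hedged sketch of formula (2) with the optional weighting; the cross-entropy on the order logits and the mean-squared error on the features are assumed concrete choices for the light sequence loss and the feature loss, not choices stated in the present disclosure.

```python
import torch.nn.functional as F

def target_verification_loss(order_logits, target_order_idx, feats, predicted_next,
                             w_light=1.0, w_pred=1.0):
    """Loss_total2 = w_light * Loss_light + w_pred * Loss_pred (formula (2) with weights).

    order_logits:     (num_orders,) predicted light-emitting order scores for the video
    target_order_idx: LongTensor of shape (1,) with the annotated light-emitting order index
    feats:            (T, D) actual sample image features
    predicted_next:   (T, D) predicted sample image features of the following frames
    """
    loss_light = F.cross_entropy(order_logits.unsqueeze(0), target_order_idx)
    loss_pred = F.mse_loss(predicted_next[:-1], feats[1:])  # predicted vs. actual next-frame features
    return w_light * loss_light + w_pred * loss_pred
```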
After determining the target verification loss information, the processor 620 may converge the predetermined optical sequence verification network based on the target verification loss information. The manner of converging the predetermined optical sequence check network is similar to the manner of converging the predetermined pixel material classification network, which is described above in detail and is not described herein again.
It should be noted that the counter attack detection model may include at least one of the pixel material classification network and the optical sequence verification network. That is to say, in the counter attack detection method, the processor 620 may use the pixel material classification network and the optical sequence verification network simultaneously to perform counter attack detection on the target user; if either detection result indicates a counter attack user, the final counter attack detection result of the target user can be determined to be a counter attack user. Alternatively, either one of the pixel material classification network and the optical sequence verification network may be used on its own, with its detection result taken as the final counter attack detection result of the target user. For example, when the processor 620 performs counter attack detection using only the pixel material classification network, if the detection result is a counter attack user, the counter attack user is taken as the final counter attack detection result of the target user, and if the detection result is a living user, the living user is taken as the final counter attack detection result of the target user.
The processor 620 may output the counter attack detection result after determining the counter attack detection result of the target user. For example, the processor 620 may directly send the counter attack detection result to the client 200, the terminal, or the server corresponding to the user 100, so that the client 200, the terminal, or the server responds to the target user or the service request corresponding to the target user based on the counter attack detection result, or may directly visually display the counter attack detection result, or the like.
For example, the processor 620 may directly display the anti-attack detection result, or may display the anti-attack detection result in a sound and light manner (for example, the anti-attack detection result is broadcasted through voice, or different types of anti-attack detection results are displayed through different colors of light, or the anti-attack detection result is displayed in a sound and light linkage manner), or may display the anti-attack detection result for a specific type of anti-attack detection result (for example, only the type of anti-attack detection result for an anti-attack user is displayed, or only the type of anti-attack detection result for a normal user is displayed, etc.), and so on.
In some embodiments, the processor 620 may further determine an anti-attack detection result of the target user or output the anti-attack detection result, and then respond to the target user or the service request corresponding to the target user based on the anti-attack detection result, where the responding manner may be multiple, for example, the processor 620 may directly intercept the service request corresponding to the target user or the target user, or the processor 620 may also directly perform secondary verification on the target user and perform final response on the service request corresponding to the target user or the target user based on the secondary verification result, and so on.
In a face recognition scenario, the present solution may use a counter attack detection model to perform counter attack detection on a target user. Taking as an example a counter attack detection model that includes both a pixel material classification network and an optical sequence verification network, the overall flow of counter attack detection may be as shown in fig. 4 and may include four parts: data acquisition and preprocessing, pixel-level material classification, optical sequence verification, and counter attack detection, specifically as follows:
(1) Data acquisition and preprocessing: conventional counter attack detection methods generally adopt silent acquisition, that is, no active information is introduced during acquisition in order to improve the user experience; however, because no additional interactive information is introduced, the ability to detect certain counter attacks is clearly reduced. Therefore, random lighting information is introduced at the data acquisition stage, and counter attack detection is then performed by exploiting the different reflection characteristics of different materials under light of different colors. The data acquisition process may include randomly selecting three colors from {red, blue, yellow, green}, lighting them in a random order for 3 seconds in total with each selected color lasting 1 second, and continuously acquiring the face data of the user during these 3 seconds of random lighting. The data preprocessing process may include performing face detection on the collected data; if no face is detected in 3 consecutive frames, the lighting is stopped, the user is prompted to keep the face in the center of the screen, and the random colored lighting and its timing are restarted (the timing restarts from 0), so that the face video of the target user under the illumination (light sequence) of multiple colors is obtained.
(2) Pixel-level material classification: material classification has been applied in living body detection algorithms, for example classifying the material of a face image as living body, screen, paper, and so on. However, conventional methods classify the whole face area as a single material, which makes them difficult to apply to counter attacks (a countermeasure sticker occupies only part of the face area, so in practice part of the face is living body and part is non-living body such as paper). Therefore, in this solution a pixel-level pixel material classification network is trained (the specific training process is described above), and the trained network is used to classify the material of the pixels in the face image frames (the specific classification process is described above), so as to cope with counter attacks that occupy only part of the face area. In addition, because light information of multiple colors is introduced, the material classification result is more accurate than under ordinary lighting conditions (it amounts to a multispectral material judgment), which improves the detection accuracy of counter attacks.
(3) Optical sequence verification: in order to further mine the optical information that is beneficial to counter attack detection, this stage mines the temporal information and uses light sequence prediction as a proxy task to train the optical sequence verification network, obtaining a trained optical sequence verification network (the specific training process is described above). The optical sequence verification network and the pixel material classification network together serve as the counter attack detection model.
(4) Counter attack detection: this can be divided into counter attack detection based on material classification and counter attack detection based on optical sequence verification. The material classification branch may include inputting the collected illuminated face images into the pixel-level material classification model to obtain the corresponding material classification probabilities; if the area ratio of the regions classified as non-living material (attack regions or countermeasure regions) is higher than a preset threshold T1, the sample is judged to be a counter attack, otherwise it is judged to be a living body. The optical sequence verification branch may include inputting the collected illuminated face images into the optical sequence verification model to obtain a light sequence prediction result (predicted light-emitting order) and the corresponding feature prediction results, and calculating the difference between the predicted features and the real features; if the light sequence prediction is wrong or the feature difference is greater than a preset threshold T2, the sample is judged to be a counter attack, otherwise it is judged to be a living body. For the two branches, as long as either one judges the sample to be a counter attack, the final judgment is a counter attack; otherwise it is judged to be a living body.
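For illustration, the fusion rule of part (4) above can be sketched as a simple logical OR over the two branch decisions; the threshold values T1 and T2 below are placeholders, not values from the present disclosure.

```python
def detect_counter_attack(attack_area_ratio, order_prediction_correct,
                          feature_diff, t1=0.05, t2=1.0):
    """Combine the two branches: counter attack if the non-living area ratio exceeds T1,
    or the light sequence prediction is wrong, or the feature difference exceeds T2."""
    material_branch = attack_area_ratio > t1
    light_branch = (not order_prediction_correct) or feature_diff > t2
    return material_branch or light_branch
```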
In this solution, random colored light (light of multiple colors) is first introduced at the interaction stage, together with the corresponding acquisition and preprocessing, so that more information is introduced; a pixel-level material classification model is then trained to cope with counter attacks that occupy only part of the face area. In addition, predicting the color sequence of the acquired frames can assist counter attack detection. Finally, the final counter attack detection is performed based on the pixel-level material classification result and the color sequence analysis result. The random interaction introduced in this solution greatly improves the detection accuracy of counter attacks while ensuring that normal samples are not affected.
To sum up, in the counter attack detection method P100 and system 001 provided in this specification, after the face video of the target user under illumination of multiple colors is obtained, the counter attack detection model is used to perform material classification on the pixels of each face image frame of the face video to obtain the pixel material category of each pixel, the counter attack detection result of the target user is determined based on the pixel material categories, and the counter attack detection result is output. By introducing random colored lighting (illumination of multiple colors) at the interaction stage of counter attack detection, more facial information is introduced; the pixel material category of each face image frame in the collected face video is then predicted, and the counter attack detection result is determined based on the pixel material categories. The solution can therefore cope with attacks whose attack elements occupy only a small area, without affecting the subsequent face recognition success rate of normal samples, and thus the detection accuracy of counter attacks can be improved.
Another aspect of the present description provides a non-transitory storage medium storing at least one set of executable instructions for performing counter attack detection. When executed by a processor, the executable instructions direct the processor to perform the steps of the counter attack detection method P100 described herein. In some possible implementations, various aspects of the present description may also be implemented in the form of a program product including program code. When the program product is run on the computing device 600, the program code is adapted to cause the computing device 600 to perform the steps of the counter attack detection method P100 described in this specification. A program product for implementing the above-described methods may employ a portable compact disc read-only memory (CD-ROM) including the program code and may be run on the computing device 600. However, the program product of this description is not limited in this respect; a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing. Program code for carrying out operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the computing device 600, partly on the computing device 600, as a stand-alone software package, partly on the computing device 600 and partly on a remote computing device, or entirely on the remote computing device.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In conclusion, upon reading the present detailed disclosure, those skilled in the art will appreciate that the foregoing detailed disclosure can be presented by way of example only, and not limitation. Those skilled in the art will appreciate that the present specification contemplates various reasonable variations, enhancements and modifications to the embodiments, even though not explicitly described herein. Such alterations, improvements, and modifications are intended to be suggested by this specification, and are within the spirit and scope of the exemplary embodiments of this specification.
Furthermore, certain terminology has been used in this specification to describe embodiments of the specification. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the specification.
It should be appreciated that in the foregoing description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various embodiments. This is not to be taken as requiring that the claimed subject matter needs more features than are expressly recited in each claim; rather, a person skilled in the art, on reading this description, may recognize some of the described embodiments as standing on their own. That is, the embodiments in the present specification may also be understood as an integration of a plurality of sub-embodiments, and each sub-embodiment may be valid with less than all the features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, documents, and the like, cited herein is hereby incorporated by reference, except for any prosecution history associated with it, any of it that is inconsistent with or conflicts with this document, and any of it that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, should there be any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated material and that associated with this document, the term in this document shall prevail.
Finally, it should be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the present specification. Other modified embodiments are also within the scope of this description. Accordingly, the embodiments disclosed herein are to be considered in all respects as illustrative and not restrictive. Those skilled in the art may implement the applications in this specification in alternative configurations according to the embodiments in this specification. Therefore, the embodiments of the present description are not limited to the embodiments described precisely in the application.

Claims (18)

1. A detection method against attacks, comprising:
acquiring a face video of a target user under illumination of multiple colors, wherein the face video comprises multiple frames of face image frames;
adopting an anti-attack detection model to classify the material of the pixels of each facial image frame in the multi-frame facial image frames to obtain the pixel material category of each pixel; and
and determining the anti-attack detection result of the target user based on the pixel material category, and outputting the anti-attack detection result.
2. The attack-countering detection method of claim 1, wherein the acquiring of the facial video of the target user under illumination of a plurality of colors comprises:
selecting a plurality of colors from a preset color set, and determining a target light-emitting sequence corresponding to the plurality of colors;
displaying color light corresponding to the plurality of colors based on the target light emitting sequence and preset display time; and
and acquiring a face video of the face of the target user under the reflection of the color light.
3. The detection method against attack as claimed in claim 2, wherein the plurality of colors includes at least two of red, blue, yellow or green.
4. The method for detecting attack immunity according to claim 2, wherein the color light corresponding to each color of the plurality of colors is emitted separately and the duration of the emission exceeds a preset time threshold.
5. The attack-countering detection method of claim 2, wherein the capturing of the facial video of the target user's face under the color light reflection comprises:
carrying out face detection on a preset number of continuous video frames in the collected real-time face video of the target user;
when the face of the target user is not detected in the continuous video frames, stopping displaying the color lights corresponding to the multiple colors, and displaying prompt information so that the target user can adjust the acquisition position based on the prompt information; and
and returning to the step of selecting multiple colors from the preset color set until the face of the target user is detected in the continuous video frames, and obtaining a face video of the face of the target user under the reflection of the color light.
6. The attack-countering detection method of claim 1, wherein the attack-countering detection model comprises a pixel material classification network; and
the method for classifying the material of the pixels of each facial image frame in the multi-frame facial image frames by adopting the anti-attack detection model to obtain the pixel material category of each pixel comprises the following steps:
extracting the features of each facial image frame in the multiple facial image frames by adopting the pixel material classification network to obtain a feature map corresponding to each facial image frame,
performing material classification on each pixel of the corresponding facial image frame based on the feature map to obtain a pixel material classification probability of each pixel, and
and determining the pixel material category of each pixel based on the pixel material classification probability.
7. The attack-resistant detection method according to claim 6, wherein the training process of the pixel material classification network comprises the following steps:
acquiring a facial image sample, wherein the facial image sample comprises a facial image of an annotation material category;
predicting a prediction region boundary and a prediction material category of each pixel in the facial image sample by adopting a preset pixel material classification network, wherein the prediction region boundary is a boundary between a living body region and an attack region in the facial image sample; and
and determining target classification loss information of the facial image sample based on the prediction region boundary, the prediction material category and the labeling material category, and converging the preset pixel material classification network based on the target classification loss information to obtain the pixel material classification network.
8. The attack-resistant detection method according to claim 7, wherein the preset pixel texture classification network comprises a feature map coding sub-network, a pixel texture classification sub-network and a boundary perception sub-network; and
the predicting the prediction region boundary and the prediction material category of each pixel in the face image sample by adopting a preset pixel material classification network comprises the following steps:
adopting the feature map coding sub-network to perform feature extraction on the facial image sample to obtain a sample feature map,
based on the sample feature map, performing material classification on each pixel of the facial image sample by using the pixel material classification sub-network to obtain a predicted material category of each pixel of the facial image sample, and
and identifying the boundary between the living body region and the attack region in the face image sample by adopting the boundary perception subnetwork based on the sample feature map to obtain the boundary of the prediction region.
9. The attack-countering detection method of claim 7, wherein the determining target classification loss information for the facial image sample based on the predicted region boundary, predicted material class, and annotated material class comprises:
comparing the prediction material category with the labeling material category to obtain pixel classification loss information of pixels in the face image sample;
selecting the adjacent prediction material classes of the preset number of adjacent pixels of the pixels in the face image sample from the prediction material classes, and comparing the adjacent prediction material classes with the prediction material classes of the corresponding pixels to obtain continuity classification loss information;
acquiring a target area boundary labeled in the face image sample, and comparing the target area boundary with the prediction area boundary to obtain boundary classification loss information; and
and accumulating the pixel classification loss information, the continuity classification loss information and the boundary classification loss information to obtain the target classification loss information.
10. The method of detecting countering attacks according to claim 1, wherein the pixel material category includes one of a living category or a non-living category; and
the determining the anti-attack detection result of the target user based on the pixel material category comprises:
selecting at least one target pixel corresponding to the non-living body category in the facial image frame,
identifying a target attack area corresponding to the at least one target pixel in the facial image frame, and
determining the anti-attack detection result of the target user based on the target attack area.
11. The method for detecting counterattack according to claim 10, wherein the determining the counterattack detection result of the target user based on the target attack area comprises:
determining the area ratio of the target attack area in the corresponding facial image frame to obtain a target area ratio; and
and when the target area ratio is larger than a preset area ratio threshold value, determining that the target user is a counterattack user, and taking the counterattack user as the counterattack detection result.
12. The attack-countering detection method of claim 1, wherein the attack-countering detection model further comprises an optical sequence check network; and
after the obtaining of the facial video of the target user under the illumination of the plurality of colors, the method further comprises:
inputting the plurality of facial image frames to the light sequence verification network to obtain a predicted image feature and a current light emission sequence of color light corresponding to the plurality of colors, the predicted image feature including a predicted image feature of a facial image frame next to each of the plurality of facial image frames, and
and determining the anti-attack detection result of the target user based on the predicted image characteristic and the current light-emitting sequence, and outputting the anti-attack detection result.
13. The attack-resistant detection method of claim 12, wherein the optical sequence verification network comprises an optical sequence feature coding sub-network, an optical sequence prediction sub-network and a feature prediction sub-network; and
the inputting the plurality of facial image frames into the optical sequence verification network to obtain a predicted image feature and a current light emitting sequence of the color light corresponding to the plurality of colors includes:
adopting the light sequence feature coding sub-network to perform feature extraction on the plurality of facial image frames to obtain the image features of each facial image frame,
predicting a current light emission sequence of the color lights corresponding to the plurality of colors using the light sequence prediction subnetwork based on the image features, and
predicting, with the feature prediction subnetwork, predicted image features of a facial image frame next to each facial image frame based on the image features.
14. The method for detecting counterattack according to claim 13, wherein the determining the target user's counterattack detection result based on the current lighting sequence and the predicted image feature comprises:
acquiring a feature difference value between the image feature of each facial image frame and the corresponding predicted image feature; and
and when the current light-emitting sequence is wrongly predicted or the characteristic difference value is larger than a preset characteristic difference value threshold value, determining the target user as an anti-attack user, and taking the anti-attack user as the anti-attack detection result.
15. The method of detecting attack immunity according to claim 12, wherein the training process of the optical sequence verification network comprises the steps of:
acquiring a face video sample of a user sample under illumination of multiple colors, wherein the face video sample comprises multiple frames of face image frame samples;
performing feature extraction on the multiple frames of facial image frame samples by adopting a preset optical sequence check network to obtain sample image features of each facial image frame sample in the multiple frames of facial image frame samples;
determining a predicted lighting order of the illumination of the plurality of colors and a predicted sample image feature based on the sample image features, the predicted sample image feature comprising a predicted sample image feature of a next frame image frame sample of each of the plurality of frames of facial image frame samples; and
and determining target verification loss information of the user sample based on the predicted light emitting sequence and the predicted sample image characteristics, and converging the preset optical sequence verification network based on the target verification loss information to obtain the trained optical sequence verification network.
16. The method of detecting a counter attack according to claim 15, wherein the user sample is a live user sample.
17. The attack-resistant detection method according to claim 15, wherein the determining target verification loss information of the user sample based on the predicted lighting sequence and the predicted sample image feature comprises:
acquiring a labeling light-emitting sequence corresponding to the face video sample, and comparing the labeling light-emitting sequence with the predicted light-emitting sequence to obtain the light sequence loss information;
determining a sample feature difference value between a sample image feature of each facial image frame sample and the corresponding predicted sample image feature to obtain the feature loss information; and
and fusing the optical sequence loss information and the characteristic loss information to obtain target verification loss information of the user sample.
18. A detection system for countering an attack, comprising:
at least one storage medium storing at least one set of instructions for counter attack detection; and
at least one processor communicatively coupled to the at least one storage medium,
wherein, when the counter attack detection system is running, the at least one processor reads the at least one instruction set and performs the counter attack detection method of any one of claims 1-17 according to the indication of the at least one instruction set.
CN202211537038.3A 2022-12-02 2022-12-02 Detection method and system for resisting attack Pending CN115937584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211537038.3A CN115937584A (en) 2022-12-02 2022-12-02 Detection method and system for resisting attack


Publications (1)

Publication Number Publication Date
CN115937584A true CN115937584A (en) 2023-04-07

Family

ID=86555353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211537038.3A Pending CN115937584A (en) 2022-12-02 2022-12-02 Detection method and system for resisting attack

Country Status (1)

Country Link
CN (1) CN115937584A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination