CN116246356A - Living body detection method and system


Info

Publication number
CN116246356A
Authority
CN
China
Prior art keywords
target
image
palm vein
feature
living body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310182082.5A
Other languages
Chinese (zh)
Inventor
曹佳炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310182082.5A
Publication of CN116246356A
Legal status: Pending (current)

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/40 Spoof detection, e.g. liveness detection; G06V40/45 Detection of the body part being alive
    • G06V10/00 Arrangements for image or video recognition or understanding; G06V10/70 using pattern recognition or machine learning; G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level; G06V10/806 Fusion of extracted features
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands; G06V40/16 Human faces, e.g. facial parts, sketches or expressions; G06V40/168 Feature extraction; Face representation
    • G06V40/70 Multimodal biometrics, e.g. combining information from different biometric modalities

Abstract

According to the living body detection method and system, a target face image of a target user is obtained; based on the face image, a one-stage living body detection model outputs a first target attack probability corresponding to the face image; and a target scheme is determined and executed based on the first target attack probability, the target scheme being one of a plurality of schemes. The plurality of schemes includes a two-stage scheme: obtaining a palm vein image of the target user, and inputting at least the palm vein image into a two-stage living body detection model to obtain a target detection result of the target user. That is, whether a living body/attack judgment can already be made is determined based on the first target attack probability obtained by performing living body detection on the target face image; when the judgment cannot be made, the target palm vein image of the target user is obtained and two-stage detection is further performed based on the target palm vein image, thereby improving the security of living body detection.

Description

Living body detection method and system
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a living body detection method and system.
Background
Face recognition systems have been widely deployed in recent years and have achieved success in scenarios such as face-scan payment, face-scan entry, and face-scan attendance. Living body detection is an important part of a face recognition system. The commonly used approach is silent living body detection, which is completed naturally during the face recognition process without requiring any extra verification action from the user, so the user experience is good. However, precisely because it lacks additional verification, its security capability is limited, making it difficult to apply in scenarios with high security requirements. Therefore, a living body detection method with high security is needed.
Disclosure of Invention
The living body detection method and system provided in this specification obtain a target palm vein image of a target user based on the first target attack probability from living body detection of the target user, and then perform two-stage detection based on the target palm vein image. In other words, living body detection is assisted by an additional verification action, namely the palm vein image of the second stage, thereby improving the security of living body detection.
In a first aspect, the present specification provides a living body detection method, comprising: obtaining a target face image of a target user;
outputting a first target attack probability corresponding to the target face image through a one-stage living body detection model based on the target face image; and determining and executing a target scheme based on the first target attack probability, the target scheme being one of a plurality of schemes, the plurality of schemes including a two-stage scheme that comprises: obtaining a target palm vein image of the target user, and inputting at least the target palm vein image into a two-stage living body detection model to obtain a target detection result of the target user.
In some embodiments, outputting, based on the target face image, a first target attack probability corresponding to the target face image through a one-stage living body detection model includes: performing an unstructuring operation on the target face image to obtain a target unstructured image, wherein the unstructuring operation comprises removing structure information of the target face image, and the structure information of the target face image comprises facial feature distribution information and/or face contour information; and inputting the target unstructured image into the one-stage living body detection model for living body detection, and outputting the first target attack probability.
In some embodiments, the performing an unstructuring operation on the target face image to obtain a target unstructured image includes: dividing the target face image into a plurality of target image blocks; and rearranging the positions of the plurality of target image blocks in the target face image based on a preset rule to obtain the target unstructured image.
In some embodiments, the preset rule includes at least one of: randomly arranging the plurality of image blocks, adjusting the row positions of the plurality of image blocks, adjusting the column positions of the plurality of image blocks, and adjusting both the row positions and the column positions of the plurality of image blocks.
In some embodiments, the one-stage living body detection model includes: a block feature extraction module configured to perform feature extraction on a plurality of image blocks contained in the unstructured image obtained through the unstructuring operation and output a plurality of block features corresponding to the plurality of image blocks; and a feature fusion module configured to fuse the plurality of block features and output a first fusion feature and a first attack probability.
In some embodiments, the constraint objective of the one-stage living body detection model during training includes that a first loss is less than a first preset loss value, the first loss including: a fusion classification loss configured to constrain the difference between the predicted value and the true value corresponding to the first attack probability.
In some embodiments, the block feature extraction module further outputs a plurality of block attack probabilities corresponding to the plurality of block features, and the first loss further includes at least one of the following: a local classification loss, obtained by weighted summation of a plurality of block classification losses corresponding to the plurality of block attack probabilities, each block classification loss being configured to constrain the difference between the predicted value and the true value corresponding to its block attack probability; a local and global consistency loss configured to constrain consistency between the plurality of block features and the first fusion feature; and a block feature consistency loss configured to constrain consistency between the plurality of block features.
In some embodiments, before the determining and executing a target scheme based on the first target attack probability, the method further comprises: generating a target judgment threshold corresponding to the target face image through a threshold generation model based on the target face image.
In some embodiments, the threshold generation model comprises: a first face feature encoding module configured to perform feature extraction on the face image and output a first face feature; a palm vein feature generation module configured to generate a corresponding cross-modal palm vein feature based on the first face feature and output the cross-modal palm vein feature; and a threshold regression module configured to generate a judgment threshold corresponding to the face image based on the first face feature and the cross-modal palm vein feature.
In some embodiments, the constraint objective of the threshold generation model during training includes that a second loss is less than a second preset loss value, the second loss comprising: a threshold regression loss configured to constrain the difference between the judgment threshold and a real threshold corresponding to the face image.
In some embodiments, the second loss further comprises: a palm vein regression loss configured to constrain the difference between the cross-modal palm vein feature and a real palm vein feature, the difference comprising the distance between the cross-modal palm vein feature and the real palm vein feature.
In some embodiments, the target judgment threshold includes a first target threshold and a second target threshold, the first target threshold being less than the second target threshold.
In some embodiments, the determining and executing a target scheme based on the first target attack probability comprises: determining that the first target attack probability satisfies a two-stage threshold condition, and determining the target scheme to be the two-stage scheme and executing it, wherein the two-stage threshold condition comprises the first target attack probability lying between the first target threshold and the second target threshold.
In some embodiments, the plurality of schemes further comprises: a first scheme, outputting that the target user is a living body; and a second scheme, outputting that the target user is an attack.
In some embodiments, the determining and executing a target scheme based on the first target attack probability comprises: determining that the first target attack probability is less than the first target threshold, and determining the target scheme to be the first scheme and executing it; or determining that the first target attack probability is greater than the second target threshold, and determining the target scheme to be the second scheme and executing it.
In some embodiments, the target judgment threshold further includes a third target threshold, and the obtaining the target detection result of the target user includes: determining a second target attack probability corresponding to the target palm vein image output by the two-stage living body detection model; and determining the target detection result based on the second target attack probability, including: determining that the second target attack probability is less than the third target threshold and determining the target detection result to be a living body, or determining that the second target attack probability is greater than the third target threshold and determining the target detection result to be an attack.
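For ease of understanding only, the threshold-based branching described in the above embodiments can be sketched in Python as follows. This is an illustrative sketch rather than the claimed implementation; the names one_stage_model, two_stage_model and get_palm_vein_image are hypothetical placeholders, and the probabilities and thresholds are assumed to lie in [0, 1].

def detect(face_image, one_stage_model, two_stage_model, thresholds, get_palm_vein_image):
    t1, t2, t3 = thresholds            # first, second and third target thresholds, with t1 < t2
    p1 = one_stage_model(face_image)   # first target attack probability

    if p1 < t1:                        # first scheme: output that the user is a living body
        return "living"
    if p1 > t2:                        # second scheme: output that the user is an attack
        return "attack"

    # Two-stage scheme: p1 lies between t1 and t2, so the target palm vein image
    # is acquired and the two-stage model outputs the second target attack probability.
    palm_vein_image = get_palm_vein_image()
    p2 = two_stage_model(face_image, palm_vein_image)
    return "living" if p2 < t3 else "attack"

In this sketch the first and second schemes correspond to an early decision from the one-stage model alone, and the two-stage scheme is entered only when the first target attack probability falls between the first and second target thresholds generated for the target face image.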
In some embodiments, the inputting at least the target palm vein image into a two-stage living body detection model comprises: inputting the target face image and the target palm vein image into the two-stage living body detection model, and outputting a second target fusion attack probability, wherein the second target attack probability comprises the second target fusion attack probability.
In some embodiments, the two-stage living body detection model includes: a second face feature encoding module configured to perform feature extraction on the face image and output a second face feature; a first palm vein feature encoding module configured to perform feature extraction on the palm vein image and output a first palm vein feature; and a fusion decision module configured to fuse the second face feature and the first palm vein feature and output a second fusion attack probability.
In some embodiments, the fusion decision module is further configured to: output a consistency judgment result, wherein the consistency judgment result represents the degree to which the face image and the palm vein image belong to the same user, and the consistency judgment result comprises the distance between the second face feature and the first palm vein feature.
In some embodiments, the outputting the second fusion attack probability comprises: outputting the second fusion attack probability with reference to the consistency judgment result, wherein the consistency judgment result is in direct proportion to the second fusion attack probability.
In some embodiments, the constraint objective of the two-stage living body detection model during training includes that a third loss is less than a third preset loss value, the third loss including: a living body detection loss configured to constrain the difference between the predicted value and the true value of the second fusion attack probability.
In some embodiments, the third loss further comprises: a face-palm vein consistency loss configured to constrain the difference between the second face feature and the first palm vein feature, the difference comprising the distance between the second face feature and the first palm vein feature.
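For illustration only, a minimal sketch of such a two-stage fusion model and its third loss is given below in Python (PyTorch). The ResNet18 backbones, feature dimensions, and the use of binary cross entropy and Euclidean distance are assumptions made for the sketch and are not limiting; the consistency term is shown as computed on same-user (genuine) training pairs.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class TwoStageLivenessModel(nn.Module):
    """Second face feature encoder + first palm vein feature encoder + fusion decision head."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.face_encoder = resnet18(num_classes=feat_dim)   # second face feature
        self.palm_encoder = resnet18(num_classes=feat_dim)   # first palm vein feature
        self.fusion_head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, face_img, palm_img):
        f_face = self.face_encoder(face_img)
        f_palm = self.palm_encoder(palm_img)
        p_attack = torch.sigmoid(self.fusion_head(torch.cat([f_face, f_palm], dim=1))).squeeze(1)
        consistency = F.pairwise_distance(f_face, f_palm)    # consistency judgment result (distance)
        return p_attack, consistency, f_face, f_palm

def third_loss(p_attack, label, f_face, f_palm):
    # Living body detection loss (predicted vs. true second fusion attack probability)
    detection_loss = F.binary_cross_entropy(p_attack, label.float())
    # Face-palm vein consistency loss for genuine pairs (an assumption of the sketch)
    consistency_loss = F.pairwise_distance(f_face, f_palm).mean()
    return detection_loss + consistency_loss

The palm-vein-only variant described in the following embodiments would simply drop the face branch and make the living body decision from the second palm vein feature alone.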
In some embodiments, the inputting at least the target palm vein image into a two-stage living body detection model comprises: inputting the target palm vein image into the two-stage living body detection model and outputting a second target decision attack probability, wherein the second target attack probability comprises the second target decision attack probability.
In some embodiments, the two-stage living body detection model includes: a second palm vein feature encoding module configured to perform feature extraction on the palm vein image and output a second palm vein feature; and a decision module configured to make a living body decision based on the second palm vein feature and output a second decision attack probability.
In some embodiments, the obtaining the target palm vein image of the target user comprises: sending an acquisition instruction to a palm vein acquisition device; and obtaining the target palm vein image from the palm vein acquisition device.
In a second aspect, the present specification further provides a living body detection system including a client, the client including: an image acquisition device configured to acquire a target face image of a target user; a palm vein acquisition device configured to acquire a target palm vein image of the target user; at least one storage medium storing at least one instruction set for performing living body detection; and at least one processor in communication with the image acquisition device, the palm vein acquisition device, and the at least one storage medium, wherein when the living body detection system is in operation, the at least one processor reads the at least one instruction set and implements the living body detection method of the first aspect.
In a third aspect, the present specification also provides a living body detection system including a server including: at least one storage medium storing at least one set of instructions for performing in vivo detection; and at least one processor communicatively coupled to the at least one storage medium, wherein the at least one processor reads the at least one instruction set and implements the in-vivo detection method of the first aspect when the in-vivo detection system is operating.
According to the technical solution above, the living body detection method and system provided in this specification can determine, based on the first target attack probability obtained by performing living body detection on the target face image, whether a living body/attack judgment can already be made; when the judgment cannot be made, the target palm vein image of the target user is obtained and two-stage detection is further performed based on the target palm vein image. Therefore, the living body detection method and system can use the palm vein image as an additional verification action beyond the face image and rely on this additional verification action to assist the living body judgment, thereby improving the security of living body detection. Meanwhile, compared with auxiliary verification through interactive actions or SMS verification codes, palm vein based auxiliary judgment is more efficient and natural, and the user experience is better.
Additional functionality of the living body detection method and system provided in this specification will be set forth in part in the description that follows. The remainder will be apparent to those of ordinary skill in the art from the description and the examples presented, or may be learned from them. The inventive aspects of the living body detection method and system provided herein may be fully explained by practicing or using the methods, devices, and combinations described in the detailed examples below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present description, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 illustrates an application scenario diagram of a living body detection system provided according to some embodiments of the present specification;
FIG. 2 illustrates a hardware architecture diagram of a computing device provided in accordance with some embodiments of the present description;
FIG. 3 illustrates a flow chart of a method of in-vivo detection provided in accordance with some embodiments of the present description; and
Fig. 4 illustrates a schematic diagram of an unstructuring operation on a target face image provided in accordance with some embodiments of the present specification.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosure. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are taken to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features of the present specification, as well as the operation and function of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description, with reference to the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the specification. It should also be understood that the drawings are not drawn to scale.
The flowcharts used in this specification illustrate operations implemented by systems according to some embodiments in this specification. It should be clearly understood that the operations of the flow diagrams may be implemented out of order. Rather, operations may be performed in reverse order or concurrently. Further, one or more other operations may be added to the flowchart. One or more operations may be removed from the flowchart.
Before describing the specific embodiments of the present specification, the application scenario of the present specification will be described as follows:
the living body detection method and system provided in this specification can be applied to various scenarios requiring living body detection, such as face-scan payment, face-scan attendance, face-scan station entry, face-scan transfer, and the like. For example, in an unmanned vending store, when a consumer uses a face-scan device to make a face-scan payment, the device may detect whether the consumer is a living body. The face-scan device may be connected to a separate palm vein acquisition device; when it is determined that a palm vein image of the consumer needs to be acquired, the palm vein acquisition device is activated, so that the face image and the palm vein image of the consumer are combined for living body detection. For another example, a palm vein acquisition device may be deployed in workplace attendance and subway station entry/exit scenarios, and living body detection may be implemented using the method of this specification. For another example, a malicious user using another person's photo to make a malicious transfer may cause losses, so a smart device such as a mobile phone may execute the method of this specification to perform living body detection; in this process, in addition to acquiring face images, the mobile phone may also have a palm vein image acquisition function. It should be understood that the living body detection method and system provided in this specification can also be applied to other scenarios and are not limited to the above scenarios.
For convenience of description, the present specification explains terms that will appear in the following description:
Attack / living body attack: a non-living attack means presented to the recognition system, including photos displayed on a mobile phone screen, printed paper photos, high-precision masks, molds, prostheses, and the like.
Living body detection: in a face recognition system, the algorithmic technique used to detect and intercept attacks, generally by judging whether the current user is a living body or an attack.
Palm vein image: an image of the veins of the user's palm acquired by a palm vein acquisition device. The palm veins are the vein system within the palm, which is not visible through the skin of the palm.
Fig. 1 illustrates an application scenario diagram of a living body detection system 001 provided according to some embodiments of the present specification. As shown in fig. 1, system 001 may include target user 100, client 200, server 300, and network 400.
The target user 100 may be any user who performs living body detection using the client 200.
The client 200 may include an image capture device 210. The image capture device 210 is configured to capture a target portion of the target user 100, where the target portion carries physiological characteristics of the target user and may be, for example, a face, a fingerprint, an iris, a retina, or a hand shape, so as to capture a biological image of the target portion, such as a face image, a fingerprint image, an iris image, a retina image, or a hand shape image. The image capture device 210 may also capture behavior characteristics of the target user 100, such as gait characteristics, so as to obtain behavior information. The image capture device may be a two-dimensional image capture device (such as an RGB camera), or a combination of a two-dimensional image capture device (such as an RGB camera) and a depth image capture device (such as a 3D structured light camera, a laser detector, etc.).
The client 200 may include a palm vein collection device 220. The palm vein capture device 220 is configured to capture a target palm vein image of a target user. For example, the palm vein collection unit 220 may emit near infrared rays to illuminate the palm of the human body and sense light reflected by the palm, so as to obtain a clear image of vein lines, i.e., a target palm vein image. In some embodiments, as shown in fig. 1, the palm vein capture device 220 and the image capture device 210 may be integrated together on the body structure of the client 200. In some embodiments, the image capturing device 210 and the palm vein capturing device 220 may also be separately disposed with respect to the main structure of the client 200, and disposed on different devices, that is, be external to the main structure of the client 200, and be in communication connection with the main structure of the client 200 through a wired or wireless manner, so as to send the captured target face image and the target palm vein image to the main structure of the client 200 through a wired or wireless manner. For example, the palm vein collection device 220 is connected to the main structure of the client 200 through a data line, and the palm vein collection device 220 sends the collected target palm vein image to the main structure of the client 200 through the data line. The palm vein capture device 220 may be replaced by a palm print capture device (not shown in the figure), that is, the present specification may implement in vivo detection through a face image and a palm print image. The client 200 may also include a palmar vein collection device 220 and a palmar print collection device at the same time, that is, the present specification may also implement living body detection through a face image, a palmar vein image, and a palmar print image. For convenience of description, the present specification will hereinafter describe a living body detection method by taking a face image and a palm vein image as examples.
In some embodiments, the in-vivo detection method may be performed on the client 200. At this time, the client 200 may store data or instructions to perform the living body detection method described in the present specification, and may execute or be used to execute the data or instructions. In some embodiments, the client 200 may include a hardware device having a data information processing function and a program necessary to drive the hardware device to operate. As shown in fig. 1, a client 200 may be communicatively connected to a server 300. In some embodiments, the server 300 may be communicatively coupled to a plurality of clients 200. In some embodiments, the client 200 may interact with the server 300 over the network 400 to receive or send messages or the like, such as facial images, palm vein images, or various feature information or the like.
In some embodiments, the client 200 may include a mobile device, a tablet, a notebook computer, a built-in device of a motor vehicle, a Dragonfly face-scan payment device, a vending machine, a sales counter, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart television, a desktop computer, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smart phone, a personal digital assistant, a gaming device, a navigation device, or the like, or any combination thereof. In some embodiments, the virtual reality device or augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device or the augmented reality device may include Google Glass, a head-mounted display, VR devices, or the like. In some embodiments, the built-in devices in the motor vehicle may include an on-board computer, an on-board television, and the like. In some embodiments, the client 200 may be a device with positioning technology for locating the position of the client 200. In some embodiments, the client 200 may be installed with one or more applications (APPs). An APP can provide the target user 100 with the ability to interact with the outside world via the network 400, as well as an interface. The APPs include, but are not limited to: web browser APPs, search APPs, chat APPs, shopping APPs, video APPs, financial APPs, instant messaging tools, mailbox clients, social platform software, and the like. In some embodiments, the client 200 may have a target APP installed thereon. The target APP is able to acquire a biological image for the client 200. In some embodiments, the target APP is also capable of recognizing biological images. The target user 100 may trigger a living body detection request through the target APP. The target APP may perform the living body detection method in response to the living body detection request.
The server 300 may be a server providing various services, such as a background server providing support for pages displayed on the client 200. In some embodiments, the in-vivo detection method may be performed on the server 300. For example, the server 300 acquires a target face image and a target palm vein image of a target user from the client 200, thereby performing the living body detection method. At this time, the server 300 may store data or instructions to perform the living body detection method described in the present specification, and may execute or be used to execute the data or instructions. In some embodiments, the server 300 may include a hardware device having a data information processing function and a program necessary to drive the hardware device to operate. The server 300 may be communicatively connected to a plurality of clients 200 and receive data transmitted from the clients 200.
The network 400 is a medium used to provide communication connections between the client 200 and the server 300. The network 400 may facilitate the exchange of information or data. As shown in fig. 1, the client 200 and the server 300 may be connected to the network 400 and transmit information or data to each other through the network 400. In some embodiments, the network 400 may be any type of wired or wireless network, or a combination thereof. For example, network 400 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, or the like. In some embodiments, network 400 may include one or more network access points. For example, the network 400 may include a wired or wireless network access point, such as a base station or an internet switching point, through which one or more components of the client 200 and server 300 may connect to the network 400 to exchange data or information.
It should be understood that the number of clients 200, servers 300, and networks 400 in fig. 1 are merely illustrative. There may be any number of clients 200, servers 300, and networks 400, as desired for implementation.
It should be noted that the living body detection method may be performed entirely on the client 200, entirely on the server 300, or partially on the client 200 and partially on the server 300.
Fig. 2 illustrates a hardware architecture diagram of a computing device 600 provided in accordance with some embodiments of the present description. The computing device 600 may perform the living body detection method described herein. The living body detection method is described in other parts of the specification. When the living body detection method is performed on the client 200, the computing device 600 may be the client 200 or a part of the client 200. When the living body detection method is performed on the server 300, the computing device 600 may be the server 300 or a part of the server 300. When the living body detection method is partially performed on the client 200 and partially performed on the server 300, the computing device 600 may be the client 200 and the server 300, or parts of the client 200 and the server 300.
As shown in fig. 2, computing device 600 may include at least one storage medium 630 and at least one processor 620. In some embodiments, computing device 600 may also include a communication port 650 and an internal communication bus 610. Meanwhile, computing device 600 may also include I/O component 660.
Internal communication bus 610 may connect the various system components including storage medium 630, processor 620, and communication ports 650.
I/O component 660 supports input/output between computing device 600 and other components. The other components may include the image acquisition device 210 and the palm vein acquisition device 220.
The communication port 650 is used for data communication between the computing device 600 and the outside world, for example, the communication port 650 may be used for data communication between the computing device 600 and the network 400. The communication port 650 may be a wired communication port or a wireless communication port.
The storage medium 630 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a magnetic disk 632, a Read Only Memory (ROM) 634, or a Random Access Memory (RAM) 636. The storage medium 630 may store at least one instruction set for performing living body detection. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the living body detection method provided herein. The storage medium 630 may also store the models used to implement the living body detection method, such as the one-stage living body detection model, the two-stage living body detection model, and the like. In this case, a model may be one or more instruction sets stored in the storage medium 630, which are read and executed by the processor 620 in the computing device 600. Of course, a model may also be part of a circuit, hardware device, or module in the computing device 600. For example, the one-stage living body detection model may be a hardware device/module in the computing device 600 that implements one-stage living body detection, the two-stage living body detection model may be a hardware device/module in the computing device 600 that implements two-stage living body detection, and so on. In this case, the processor 620 may store at least one instruction or instruction set for controlling the models.
The at least one processor 620 may be communicatively coupled with the at least one storage medium 630 and the communication port 650 via the internal communication bus 610. The at least one processor 620 is configured to execute the at least one instruction set. When the computing device 600 is running, the at least one processor 620 may read the at least one instruction set and the stored models (such as the one-stage living body detection model, the threshold generation model, and the two-stage living body detection model), and perform the living body detection method provided herein according to the instructions of the at least one instruction set. The processor 620 may perform all the steps involved in the living body detection method. The processor 620 may be in the form of one or more processors. In some embodiments, the processor 620 may include one or more hardware processors, such as microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application Specific Integrated Circuits (ASICs), Application Specific Instruction set Processors (ASIPs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Physical Processing Units (PPUs), microcontroller units, Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), Advanced RISC Machines (ARM), Programmable Logic Devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 620 is depicted for the computing device 600 in this specification. It should be noted, however, that the computing device 600 may also include multiple processors, and thus operations and/or method steps disclosed in this specification may be performed by one processor as described herein, or may be performed jointly by multiple processors. For example, if the processor 620 of the computing device 600 performs steps A and B in this specification, it should be understood that steps A and B may also be performed by two different processors 620 jointly or separately (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
Fig. 3 shows a flowchart of a living body detection method P100 provided according to some embodiments of the present specification. As previously described, the computing device 600 may perform the in-vivo detection method P100 described herein. Specifically, the processor 620 may read an instruction set stored in its local storage medium and then execute the living body detection method P100 described in the present specification according to the specification of the instruction set. As shown in fig. 3, the method P100 may include:
s110: a target face image of the target user 100 is obtained.
When the target user 100 is located in front of the client 200, the image capturing device 210 on the client 200 may capture the target face image of the target user 100, and send the captured face image to the processor 620. The target user 100 may be a real user, or may be a paper photo, a mobile phone screen, a mask, or the like. The target face image may include a face region of the target user 100, and may further include a background region.
S130: outputting a first target attack probability corresponding to the target face image through a one-stage living body detection model based on the target face image.
In some embodiments, the processor 620 may pre-process the target face image to extract a target face region and treat the target face region as the target face image for subsequent image processing. In some embodiments, the processor 620 may perform an unstructuring operation on the target face image to obtain a target unstructured image, and input the target unstructured image into the one-stage living body detection model for living body detection, thereby outputting the first target attack probability corresponding to the target face image. In some embodiments, the processor 620 may skip the unstructuring operation and directly input the target face image into the one-stage living body detection model for living body detection, thereby outputting the first target attack probability.
The unstructuring operation comprises removing structure information of the target face image. The structure information of the target face image comprises facial feature distribution information and/or face contour information. The facial feature distribution information refers to the relative positions of facial features such as the eyes, nose, and mouth. The structure information of a face generally belongs to information shared by living bodies and attacks; for example, for the same user, the facial feature distribution in a live image is the same as that in an attack image. Therefore, for living body detection, i.e., for discriminating between living bodies and attacks, the structure information is useless or carries little information. If the unstructuring operation is not performed, the living body detection model will spend capacity learning the structure information, causing a waste of resources. Therefore, in order to prevent the living body detection model from focusing excessively on face structure information, this specification disturbs the facial feature distribution information and the face contour information so that they no longer follow a unified pattern; in other words, the structure information in the target face image is erased through the unstructuring operation. This reduces the burden of model execution, saves the time the processor 620 spends on data processing, and improves the efficiency of living body detection.
In living body detection, more attention is paid to texture information and/or color information that can effectively distinguish living bodies from attacks. Texture information includes, for example, skin texture, moire patterns, fiber texture, and overly smooth texture. If the attack is a photo on an electronic screen, moire will appear in the face image obtained when the image capture device 210 shoots the electronic screen, whereas no moire appears in a face image captured from a real person. If the attack is a paper photo, fiber texture will appear in the face image obtained when the image capture device 210 shoots the paper photo, whereas no fiber texture appears in a face image captured from a real person. If the attack is a silicone mask, the texture in the face image obtained when the image capture device 210 shoots the silicone mask is smoother and lacks the texture of human skin, whereas normal skin texture appears in a face image captured from a real person. Color information includes, for example, skin color, highlight spots, and black-and-white tones. If the attack is a photo on an electronic screen, abnormal reflection areas exist on the glossy electronic screen when the image capture device 210 shoots it, so highlight spots appear in the obtained face image, whereas no such highlight spots appear in a face image captured from a real person. If the attack is a paper photo, the face image obtained when the image capture device 210 shoots the paper photo is often black and white, whereas a face image captured from a real person generally shows normal skin color.
For the unstructuring operation, the processor 620 may divide the target face image into a plurality of target image blocks and rearrange the positions of the plurality of image blocks in the target face image based on a preset rule, thereby obtaining the target unstructured image. The division may be an even division or a random division. The preset rule may be random arrangement, adjusting the row positions of the image blocks, adjusting the column positions of the image blocks, adjusting both the row positions and the column positions of the image blocks, and the like. For example, the image blocks in the same row are shifted down or up as a whole (such as shifting the whole row down by one or two rows), or the row positions are randomly shuffled. For example, the image blocks in the same column are shifted left or right as a whole (such as shifting the whole column left or right by one or two columns), or the column positions are randomly shuffled. Fig. 4 illustrates a schematic diagram of an unstructuring operation on a target face image provided in accordance with some embodiments of the present specification. As shown in fig. 4, the processor 620 evenly divides a 300×300 target face image into 6×6=36 target image blocks (patches) of 50×50, then adjusts the row positions of the patches to scramble their positions, and reconstructs the image, i.e., the target unstructured image. The processor 620 may also perform the unstructuring operation by adding markers to the target face image that cause the model to ignore structure information, adding noise to the target face image to blur the structure information, and so on. The embodiments of this specification are not limited thereto.
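For illustration only, the block division and row-position scrambling described above can be sketched in Python as follows, under the 300×300 / 50×50 example; the function and parameter names are hypothetical and the exact shuffling rule is only one of the preset rules mentioned above.

import numpy as np

def unstructure(face_img, patch=50, seed=None):
    """Split an (H, W, C) face image into patch x patch blocks and shuffle the
    row positions of the blocks, returning the target unstructured image."""
    h, w, c = face_img.shape
    rows, cols = h // patch, w // patch
    # Reshape into a (rows, cols, patch, patch, C) grid of blocks.
    blocks = face_img[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    rng = np.random.default_rng(seed)
    blocks = blocks[rng.permutation(rows)]      # scramble the row positions of the blocks
    # Reassemble the blocks into the target unstructured image.
    return blocks.transpose(0, 2, 1, 3, 4).reshape(rows * patch, cols * patch, c)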
If the unstructuring operation is performed, the one-stage living body detection model may include a block feature extraction module and a feature fusion module. The block feature extraction module is used for extracting features of the plurality of image blocks contained in the unstructured image and outputting a plurality of block features corresponding to the image blocks. The feature fusion module is used for fusing the plurality of block features and outputting a first fusion feature and a first attack probability. The model structure of the block feature extraction module is, for example, ResNet18, DenseNet, or the like. The model structure of the feature fusion module is, for example, a Transformer.
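For illustration only, a minimal sketch of such a one-stage model is given below in Python (PyTorch). The ResNet18 patch encoder and Transformer fusion follow the module examples above, while the feature dimension, number of attention heads, and mean pooling are assumptions of the sketch.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class OneStageLivenessModel(nn.Module):
    """Block feature extraction (ResNet18 per patch) + Transformer feature fusion."""
    def __init__(self, feat_dim=128, num_heads=4):
        super().__init__()
        self.patch_encoder = resnet18(num_classes=feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.patch_head = nn.Linear(feat_dim, 1)   # per-block attack probability (used during training)
        self.fusion_head = nn.Linear(feat_dim, 1)  # first attack probability

    def forward(self, patches):                    # patches: (B, N, C, H, W)
        b, n = patches.shape[:2]
        feats = self.patch_encoder(patches.flatten(0, 1)).view(b, n, -1)  # block features
        fused = self.fusion(feats).mean(dim=1)                            # first fusion feature
        p_patch = torch.sigmoid(self.patch_head(feats)).squeeze(-1)       # (B, N) block attack probabilities
        p_fused = torch.sigmoid(self.fusion_head(fused)).squeeze(-1)      # (B,) first attack probability
        return p_fused, p_patch, feats, fused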
The training samples used to train the one-stage living body detection model may be a plurality of unstructured image samples obtained through the unstructuring operation. The training samples need to be labeled with their true values, i.e., whether each training sample is a living body or a non-living body (attack). When the processor 620 trains the one-stage living body detection model, the unstructured image includes the unstructured image samples. The processor 620 may input an unstructured image sample as a training sample into the one-stage living body detection model and output a predicted value of the first attack probability corresponding to the unstructured image sample.
The constraint target of the one-stage living body detection model during training is that the first loss is smaller than a first preset loss value. In some embodiments, the first penalty may include a fusion classification penalty. In some embodiments, the first penalty may further include at least one of the following: local classification loss (patch classification loss), local and global consistency loss, and block feature consistency loss (patch feature consistency loss).
The fusion classification loss can constrain the difference between the predicted value and the true value corresponding to the first attack probability. For example, if the predicted value obtained from the first attack probability is a living body while the true value is an attack, the processor 620 calculates the difference between the predicted value and the true value through a loss function, which may be a cross entropy loss function, a center loss function, or the like. Training the one-stage living body detection model with the fusion classification loss makes its predictions gradually approach the true values, thereby improving the prediction accuracy of the one-stage living body detection model.
For the local classification loss, the block feature extraction module may further output, during training, a plurality of block attack probabilities corresponding to the plurality of block features. The processor 620 may calculate the difference between the predicted value and the true value corresponding to each block attack probability to obtain a block classification loss, thereby obtaining a plurality of block classification losses corresponding to the plurality of block attack probabilities. The processor 620 may then perform a weighted summation of the plurality of block classification losses to obtain the local classification loss. In some embodiments, the weighted summation may be an averaging of the plurality of block classification losses. The local classification loss is used to constrain the difference between the predicted value and the true value corresponding to each block attack probability. Training the one-stage living body detection model with the local classification loss makes the predictions of the block feature extraction module gradually approach the true values, so that the extracted block features become increasingly accurate and the prediction accuracy of the one-stage living body detection model is further improved.
The block feature consistency loss is used to constrain consistency between the plurality of block features, and thereby the consistency of their prediction results. That is, by constraining the block feature consistency loss, the plurality of block features can be drawn close to one another (for example, all close to their average feature) and learn to consistently represent whether the user is a living body or an attack, avoiding conflicts among the prediction results corresponding to the plurality of block features. The local and global consistency loss is used to constrain consistency between each of the plurality of block features and the first fusion feature, so that each block feature is consistent with the first fusion feature and conflicts between the prediction result corresponding to a block feature and the prediction result corresponding to the first fusion feature are avoided. Through the block feature consistency loss and the local and global consistency loss, the one-stage living body detection model can be trained to give consistent results when predicting from different features.
Taking the example that the first loss includes the four losses described above, the processor 620 may calculate the first loss by a loss function of equation one as follows:
Equation one: Loss_total1 = Loss_cls + Loss_patch + Loss_local-global
where Loss_total1 is the first loss, Loss_cls is the classification loss (comprising the fusion classification loss and the local classification loss), Loss_patch is the block feature consistency loss, and Loss_local-global is the local and global consistency loss.
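For illustration only, equation one can be sketched in Python (PyTorch) as follows; the use of binary cross entropy for the classification losses and mean squared distances for the two consistency losses are assumptions of the sketch, since the specification allows other loss functions.

import torch
import torch.nn.functional as F

def first_loss(p_fused, p_patch, feats, fused, label):
    """Loss_total1 = Loss_cls + Loss_patch + Loss_local-global (equal weights assumed)."""
    y = label.float()                                         # 1 = attack, 0 = living body
    fusion_cls = F.binary_cross_entropy(p_fused, y)           # fusion classification loss
    local_cls = F.binary_cross_entropy(p_patch, y[:, None].expand_as(p_patch))  # averaged block classification losses
    loss_cls = fusion_cls + local_cls
    mean_feat = feats.mean(dim=1, keepdim=True)
    loss_patch = (feats - mean_feat).pow(2).mean()            # block feature consistency loss
    loss_local_global = (feats - fused[:, None, :]).pow(2).mean()  # local and global consistency loss
    return loss_cls + loss_patch + loss_local_global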
The processor 620 may iteratively train the one-stage living body detection model based on the model structure and constraint target described above until the one-stage living body detection model converges, resulting in a trained one-stage living body detection model.
The staff can deploy the trained one-stage living body detection model into an actual application scenario. In the practical application of living body detection, the processor 620 may input the target unstructured image corresponding to the target user 100 into the trained one-stage living body detection model to obtain the first target attack probability. Specifically, the processor 620 may input the target unstructured image into the block feature extraction module for feature extraction and output a plurality of target block features corresponding to the target unstructured image, and then input the plurality of target block features into the feature fusion module for fusion and output the first target fusion feature and the first target attack probability (P1) corresponding to the plurality of target block features. It should be noted that, when the processor 620 uses the one-stage living body detection model to perform living body detection on the target face image of the target user 100, the unstructured image includes the target unstructured image, and the first attack probability includes the first target attack probability.
The probability that the one-stage living body detection model outputs in the training stage is referred to as a first attack probability, and the probability that the one-stage living body detection model outputs to the target user 100 in the application stage is referred to as a first target attack probability.
S150: generating a target judgment threshold corresponding to the target face image through a threshold generation model based on the target face image.
The threshold generation model comprises a first face feature encoding module, a palm vein feature generation module, and a threshold regression module. The first face feature encoding module is used for extracting features of the face image and outputting a first face feature. The palm vein feature generation module is used for generating a corresponding cross-modal palm vein feature based on the first face feature and outputting the cross-modal palm vein feature. The threshold regression module is used for generating a judgment threshold corresponding to the face image based on the first face feature and the cross-modal palm vein feature. The model structure of the first face feature encoding module is, for example, ResNet18, DenseNet, or the like. The model structure of the palm vein feature generation module is, for example, a perceptron, such as an MLP (multi-layer perceptron). The threshold regression module may be a threshold regression model (TRM), a Bayesian threshold autoregressive model, or the like. The judgment threshold may include at least one of the following: a first threshold, a second threshold, and a third threshold. In some embodiments, the threshold generation model may also adopt other module combinations, for example only the first face feature encoding module and the threshold regression module, which is not limited in this specification.
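For illustration only, a minimal sketch of such a threshold generation model is given below in Python (PyTorch). The ResNet18 face encoder and MLP generator follow the module examples above, while the feature dimension and the choice of producing the three thresholds (T1, T2, T3) from a single sigmoid-activated regression head are assumptions of the sketch.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class ThresholdGenerationModel(nn.Module):
    """First face feature encoder + cross-modal palm vein feature generator + threshold regression head."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.face_encoder = resnet18(num_classes=feat_dim)          # first face feature
        self.palm_generator = nn.Sequential(                        # cross-modal palm vein feature (MLP)
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
        self.threshold_head = nn.Sequential(                        # judgment thresholds T1, T2, T3
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 3), nn.Sigmoid())

    def forward(self, face_img):
        f_face = self.face_encoder(face_img)
        f_palm = self.palm_generator(f_face)
        thresholds = self.threshold_head(torch.cat([f_face, f_palm], dim=1))
        return thresholds, f_palm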
The training samples used to train the threshold generation model may be a plurality of face image samples. The training samples need to be labeled with their true values; the label of a training sample may be the most appropriate judgment threshold, computed manually. When the processor 620 trains the threshold generation model, the face image includes the face image samples. The processor 620 may input a face image sample as a training sample into the threshold generation model and output a predicted value of the judgment threshold corresponding to the face image sample.
The constraint target of the threshold generation model during training includes that a second loss is smaller than a second preset loss value. In some embodiments, the second loss is a threshold regression loss. In some embodiments, the second loss includes a threshold regression loss and a palm vein regression loss. The threshold regression loss is used to constrain the difference between the judgment threshold and the real threshold corresponding to the face image, where the real threshold is the manually computed most suitable threshold, and the living body detection accuracy under the most suitable threshold is higher than that under other thresholds. The real thresholds corresponding to different face image samples may differ, so the most suitable thresholds generated by the trained threshold generation model for different target users may also differ. Training the threshold generation model with the threshold regression loss makes its prediction of the judgment threshold gradually approach the real threshold, thereby improving the prediction accuracy of the threshold generation model. The palm vein regression loss is used to constrain the difference between the cross-modal palm vein feature and the real palm vein feature, which can be measured by a distance (for example, an L1 distance or an L2 distance) or by a similarity (for example, a Pearson correlation coefficient or a Jaccard similarity coefficient). Training the threshold generation model with the palm vein regression loss makes the cross-modal palm vein feature generated by the palm vein feature generation module gradually approach the real palm vein feature, thereby improving the prediction accuracy of the threshold generation model.
Taking the case where the second loss includes a threshold regression loss and a palm vein regression loss as an example, the processor 620 may calculate the second loss through the loss function of Formula II below:
Formula II: Loss_total2 = Loss_feat + Loss_thre
where Loss_total2 is the second loss, Loss_feat is the palm vein regression loss, and Loss_thre is the threshold regression loss.
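Purely as an illustration of Formula II, the second loss could be computed as below; the use of a smooth L1 term for the threshold regression loss and an L2 distance for the palm vein regression loss is an assumption, since this specification only requires some difference measure.

```python
import torch.nn.functional as F

def second_loss(pred_thresholds, true_thresholds, cross_modal_palm_feat, real_palm_feat):
    """Sketch of Formula II: Loss_total2 = Loss_feat + Loss_thre."""
    # Threshold regression loss: pull the predicted judgment thresholds toward the
    # manually computed "most suitable" thresholds.
    loss_thre = F.smooth_l1_loss(pred_thresholds, true_thresholds)
    # Palm vein regression loss: pull the generated cross-modal palm vein feature
    # toward the real palm vein feature (measured here with an L2 distance).
    loss_feat = F.mse_loss(cross_modal_palm_feat, real_palm_feat)
    return loss_feat + loss_thre
```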
The processor 620 may iteratively train the threshold generation model based on the above model structure and constraint target until the threshold generation model converges, resulting in a trained threshold generation model.
The trained threshold generation model can be deployed into a practical application scenario. In practical living body detection, the processor 620 may input the target face image of the target user 100 into the trained threshold generation model and output the target judgment threshold generated for the target face image, that is, a threshold customized for the target face image. The target judgment threshold may include at least one of the following: a first target threshold (T1), a second target threshold (T2), and a third target threshold (T3). It should be noted that, when the processor 620 uses the threshold generation model to perform living body detection on the target face image of the target user 100, the face image includes the target face image, and the judgment threshold includes the target judgment threshold.
If two-stage palm vein living body detection were performed for all users, the user experience could be poor. To achieve a better efficiency-experience tradeoff, this specification controls the proportion of users entering two-stage palm vein living body detection by training a threshold generation model to generate the most suitable threshold for the target face image. The judgment threshold generated in this specification is a threshold customized for the target face image, and the accuracy of living body detection on the target face image under this customized threshold is higher than that under other, non-customized thresholds. Of course, in some embodiments, the threshold generation model may generate a general judgment threshold (shared threshold) and use the shared threshold as the threshold by which all users undergoing living body detection enter two-stage living body detection, thereby improving the efficiency of living body detection.
In other embodiments, the target judgment threshold may not be generated by the threshold generation model; instead, the target judgment threshold may be a preset threshold.
The order of executing S130 and S150 is not limited in this specification; that is, the processor 620 may execute S130 first and then S150, execute S150 first and then S130, or execute S130 and S150 simultaneously.
S170: a target scheme is determined and executed based on the first target attack probability.
The processor 620 may set, based on the target judgment threshold, a two-stage threshold condition for controlling the proportion of samples entering two-stage living body detection. The processor 620 may determine and execute the target scheme by comparing the first target attack probability (P1) with the two-stage threshold condition. In some embodiments, the two-stage threshold condition is that P1 lies between a first target threshold (T1) and a second target threshold (T2), where the target judgment threshold includes T1 and T2, and T1 is less than T2. Of course, the two-stage threshold condition may also be another condition, for example, being greater than the first target threshold (T1) or being less than the second target threshold (T2).
This specification includes a plurality of schemes. The plurality of schemes includes a two-stage scheme, which is implemented through steps S171 and S173. The plurality of schemes may further include a first scheme (implemented through step S175) and a second scheme (implemented through step S177). The target scheme is the one of the plurality of schemes that the processor 620 executes.
The processor 620 may determine and execute a target scheme among the plurality of schemes by comparing P1 with the two-stage threshold condition. Specifically, the processor 620 may determine whether P1 satisfies the two-stage threshold condition. If so, the target scheme is determined to be the two-stage scheme and is executed; for example, when P1 lies within [T1, T2], the two-stage scheme is executed. If not, the target scheme is determined to be a scheme other than the two-stage scheme and is executed; for example, if P1 is smaller than T1, the first scheme, including step S175, is executed, and if P1 is greater than T2, the second scheme, including step S177, is executed.
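The routing logic described above can be summarized in a short illustrative sketch (the function name and return labels are assumptions of the sketch):

```python
def choose_scheme(p1: float, t1: float, t2: float) -> str:
    """Route a sample based on the first target attack probability P1 (illustrative)."""
    if t1 <= p1 <= t2:
        return "two_stage"    # uncertain: collect a palm vein image and run S171/S173
    if p1 < t1:
        return "living_body"  # first scheme (S175): output that the user is a living body
    return "attack"           # second scheme (S177): output that the user is an attack
```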
S171: a target palm vein image of the target user 100 is obtained.
In some embodiments, the processor 620 may send an acquisition instruction to the palm vein acquisition device 220, instructing the palm vein acquisition device 220 to acquire a target palm vein image of the target user 100 and send the target palm vein image to the processor 620. In some embodiments, the palm vein acquisition device 220 may monitor the one-stage living body detection result of the target face image in real time, for example, detect whether P1 lies within [T1, T2], and automatically acquire the target palm vein image and send it to the processor 620 when it detects that P1 lies within [T1, T2].
S173: inputting at least the target palm vein image into a two-stage living body detection model to obtain a target detection result of the target user 100.
In some embodiments, the processor 620 may obtain the target detection result from the target face image and the target palm vein image of the target user 100. In this case, the two-stage living body detection model may include a second face feature encoding module, a first palm vein feature encoding module, and a fusion decision module. The second face feature encoding module is used to perform feature extraction on the face image and output a second face feature. The first palm vein feature encoding module is used to perform feature extraction on the palm vein image and output a first palm vein feature. The fusion decision module is used to fuse the second face feature and the first palm vein feature and output a second fusion attack probability. The model structure of the second face feature encoding module is, for example, ResNet18, DenseNet, or the like. The first palm vein feature encoding module is, for example, BPFNet or the like. The fusion decision module is, for example, a Transformer.
The fusion decision module may also be used to output a consistency judgment result. The consistency judgment result may be, for example, a distance d between the second face feature and the first palm vein feature, or a correlation coefficient between the second face feature and the first palm vein feature. The consistency judgment result represents the degree to which the face image and the palm vein image belong to the same user. For example, if the face image and the palm vein image belong to the same user, their features are close, d is small, and the degree of consistency is high; if they do not belong to the same user, their features are far apart, d is large, and the degree of consistency is low.
When the fusion decision module outputs both the second fusion attack probability and the consistency judgment result, two results are generated from the same feature (the fusion feature obtained by fusing the second face feature and the first palm vein feature), and the two results depend on and reinforce each other. For example, the consistency judgment result is positively correlated with the second fusion attack probability: when d is larger, the face image and the palm vein image are more likely to belong to different users, the possibility of attack is higher, and the second fusion attack probability is larger; when d is smaller, the face image and the palm vein image are more likely to belong to the same user, the possibility of attack is lower, and the second fusion attack probability is smaller. Therefore, the fusion decision module may output the second fusion attack probability with reference to the consistency judgment result, for example, by weighting the consistency judgment result into the feature vector corresponding to the fusion feature and calculating the second fusion attack probability based on the weighted feature vector.
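For illustration only, the following PyTorch sketch shows one possible way to build such a fusion decision that also uses the consistency judgment result; ResNet18 is used here as a stand-in for both encoders (the specification mentions, for example, BPFNet for the palm vein side), and all layer sizes and names are assumptions of the sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoStageFusionModel(nn.Module):
    """Sketch of the face + palm vein two-stage model; layer sizes are assumptions."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.face_encoder = resnet18(weights=None)       # second face feature encoding module
        self.face_encoder.fc = nn.Identity()
        self.palm_encoder = resnet18(weights=None)       # stand-in for a palm vein encoder (e.g. BPFNet)
        self.palm_encoder.fc = nn.Identity()
        self.decision_head = nn.Sequential(              # fusion decision module
            nn.Linear(2 * feat_dim + 1, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, face_image, palm_image):
        face_feat = self.face_encoder(face_image)
        palm_feat = self.palm_encoder(palm_image)
        # Consistency judgment result: distance d between the two features; a large d
        # suggests the face and palm vein may not belong to the same user.
        d = torch.norm(face_feat - palm_feat, dim=-1, keepdim=True)
        # Fuse the two features together with the consistency result, then predict
        # the second fusion attack probability.
        fused = torch.cat([face_feat, palm_feat, d], dim=-1)
        p2_fusion = torch.sigmoid(self.decision_head(fused)).squeeze(-1)
        return p2_fusion, d.squeeze(-1)
```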
The training samples used when training the two-stage living body detection model may be pairs of a face image sample and a palm vein image sample. The training samples need to be labeled so that their true values are annotated; the label of a training sample may indicate whether the sample is a living body or a non-living body (attack). When the processor 620 trains the two-stage living body detection model, the face image includes the face image sample and the palm vein image includes the palm vein image sample. The processor 620 may input the face image sample and the palm vein image sample into the two-stage living body detection model as training samples and output a predicted value of the second fusion attack probability corresponding to the face image sample and the palm vein image sample.
The constraint target of the two-stage living body detection model during training includes that a third loss is smaller than a third preset loss value, where the third loss includes a living body detection loss and/or a face-palm vein consistency loss. The living body detection loss is used to constrain the difference between the predicted value and the true value of the second fusion attack probability. For example, if the predicted value derived from the second fusion attack probability indicates a living body while the true value is an attack, the processor 620 calculates the difference between the predicted value and the true value through a loss function, which may be a cross-entropy loss function, a center loss function, or the like. Training the two-stage living body detection model with the living body detection loss makes its predictions gradually approach the true values, thereby improving the prediction accuracy of the two-stage living body detection model. The face-palm vein consistency loss is used to constrain the difference between the second face feature and the first palm vein feature, which can be measured by a distance (for example, an L1 distance or an L2 distance) or by a similarity (for example, a Pearson correlation coefficient or a Jaccard similarity coefficient). When the third loss includes both the living body detection loss and the face-palm vein consistency loss, their sum may be taken as the third loss. Training the two-stage living body detection model with the face-palm vein consistency loss performs verification mainly from the perspective of user identity, preventing other users from using their palm veins to spoof living body detection.
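As a hedged example, the third loss could be instantiated as below, with binary cross entropy as the living body detection loss and an L2 distance as the face-palm vein consistency loss; both choices, and the weighting factor, are assumptions of the sketch, since this specification also allows other loss functions and difference measures.

```python
import torch.nn.functional as F

def third_loss(p2_fusion, is_attack, face_feat, palm_feat, consistency_weight=1.0):
    """Sketch: living body detection loss + face-palm vein consistency loss."""
    # Living body detection loss: binary cross entropy between the predicted second
    # fusion attack probability and the living body / attack label.
    liveness_loss = F.binary_cross_entropy(p2_fusion, is_attack.float())
    # Face-palm vein consistency loss: constrain the distance between the second
    # face feature and the first palm vein feature (L2 distance here).
    consistency_loss = F.mse_loss(face_feat, palm_feat)
    return liveness_loss + consistency_weight * consistency_loss
```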
The processor 620 may train the two-stage living body detection model with the above model structure and third loss until the two-stage living body detection model converges, at which point the two-stage living body detection model completes training.
The trained two-stage living body detection model can be deployed into a practical application scenario. In practical living body detection, the processor 620 may determine the second target attack probability P2 corresponding to the target palm vein image output by the two-stage living body detection model and determine the target detection result based on P2. As previously described, the target judgment threshold may further include a third target threshold T3, and the processor 620 may compare P2 with T3 to determine the target detection result. For example, when P2 is less than T3, the target detection result is determined to be a living body; when P2 is greater than T3, the target detection result is determined to be an attack; when P2 is equal to T3, the target detection result cannot be determined as either a living body or an attack, that is, it is undetermined whether the target user 100 is a living body or an attack, and at this time the target face image and the target palm vein image of the target user 100 may be collected again.
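The comparison of P2 against T3 can be sketched as follows (illustrative only; the function name and return labels are assumptions):

```python
def two_stage_decision(p2: float, t3: float) -> str:
    """Map the second target attack probability P2 to a target detection result (illustrative)."""
    if p2 < t3:
        return "living_body"
    if p2 > t3:
        return "attack"
    return "undetermined"  # P2 == T3: re-collect the target face and palm vein images
```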
In the case where the target detection result is obtained from the target face image and the target palm vein image, the processor 620 may input the target face image and the target palm vein image corresponding to the target user 100 into the two-stage living body detection model and output a second target fusion attack probability, where the second target attack probability includes the second target fusion attack probability. Specifically, the processor 620 may input the target face image into the second face feature encoding module and the target palm vein image into the first palm vein feature encoding module, so that the fusion decision module outputs the second target fusion attack probability corresponding to the target user 100. It should be noted that, when the processor 620 uses the two-stage living body detection model to perform living body detection on the target face image and the target palm vein image of the target user 100, the face image includes the target face image, the palm vein image includes the target palm vein image, and the second fusion attack probability includes the second target fusion attack probability.
In some embodiments, the processor 620 may obtain the target detection result from the target palm vein image of the target user 100 alone. In this case, the two-stage living body detection model may include a second palm vein feature encoding module and a decision module. The second palm vein feature encoding module is used to perform feature extraction on the palm vein image and output a second palm vein feature. The decision module is used to make a living body decision based on the second palm vein feature and output a second decision attack probability.
In this case, the training sample used when training the two-stage living body detection model may be a palm vein image sample. When the processor 620 trains the two-stage living body detection model, the palm vein image includes the palm vein image sample. The processor 620 may input the palm vein image sample into the two-stage living body detection model as a training sample and output a predicted value of the second decision attack probability corresponding to the palm vein image sample.
In the case where the target detection result is obtained from the target palm vein image, the processor 620 may input the target palm vein image corresponding to the target user 100 into the two-stage living body detection model and output a second target decision attack probability, where the second target attack probability includes the second target decision attack probability. Specifically, the processor 620 may input the target palm vein image into the second palm vein feature encoding module, and the decision module outputs the second target decision attack probability corresponding to the target user 100. It should be noted that, when the processor 620 uses the two-stage living body detection model to perform living body detection on the target palm vein image of the target user 100, the palm vein image includes the target palm vein image, and the second decision attack probability includes the second target decision attack probability.
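For illustration only, a minimal sketch of the palm-vein-only variant of the two-stage model might look as follows; the backbone choice and layer sizes are assumptions of the sketch.

```python
import torch.nn as nn
from torchvision.models import resnet18

class PalmVeinOnlyModel(nn.Module):
    """Sketch of the palm-vein-only two-stage model; the backbone is an assumption."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.palm_encoder = resnet18(weights=None)        # second palm vein feature encoding module
        self.palm_encoder.fc = nn.Identity()
        self.decision = nn.Sequential(                    # decision module
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, palm_image):
        palm_feat = self.palm_encoder(palm_image)         # second palm vein feature
        return self.decision(palm_feat).squeeze(-1)       # second decision attack probability
```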
S175: the target user 100 is output as a living body.
In some embodiments, if P1 is less than T1, then the output target user 100 is a living body.
S177: outputting the target user 100 as an attack.
In some embodiments, if P1 is greater than T2, the output target user is an attack.
In summary, the living body detection method and system provided in this specification collect a palm vein image of the user by means of an additional acquisition device and, only when the face living body detection result is uncertain, use two-stage palm vein living body detection to judge living body/attack, instead of always performing living body detection with the face and the palm vein simultaneously for every user. Palm vein collection is more efficient and natural than action-based interaction or entering an SMS verification code. Meanwhile, the face living body detection model is trained based on the unstructured operation, preventing the living body detection model from paying excessive attention to facial structural information. In addition, the two-stage palm vein living body detection model is trained with the face-palm vein consistency loss, performing verification from the perspective of user identity and preventing other users from using their palm veins to spoof living body detection. Finally, an intelligent threshold judgment model is used to determine when two-stage judgment is needed, achieving a better efficiency-security tradeoff.
Another aspect of this specification provides a non-transitory storage medium storing at least one set of executable instructions for performing living body detection. When executed by a processor, the executable instructions direct the processor to perform the steps of the living body detection method P100 described in this specification. In some possible implementations, aspects of this specification can also be implemented in the form of a program product including program code. When the program product runs on the living body detection system 001, the program code is used to cause the living body detection system 001 to execute the steps of the living body detection method P100 described in this specification. The program product implementing the above method may employ a portable compact disc read-only memory (CD-ROM) including program code and may run on the living body detection system 001. However, the program product of this specification is not limited thereto; in this specification, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing. Program code for carrying out operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the living body detection system 001, partly on the living body detection system 001 as a stand-alone software package, partly on the living body detection system 001 and partly on a remote computing device, or entirely on the remote computing device.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In view of the foregoing, it will be evident to a person skilled in the art that the foregoing detailed disclosure is presented by way of example only and is not limiting. Although not explicitly stated here, those skilled in the art will appreciate that this specification is intended to encompass various reasonable adaptations, improvements, and modifications of the embodiments. Such adaptations, improvements, and modifications are intended to be suggested by this specification and are within the spirit and scope of the exemplary embodiments of this specification.
Furthermore, certain terms in this specification have been used to describe embodiments of this specification. For example, "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this specification. Thus, it is emphasized and should be appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of this specification.
It should be appreciated that in the foregoing description of embodiments of this specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one feature. However, this does not mean that such a combination of features is required; it is entirely possible for a person skilled in the art, upon reading this specification, to regard some of these features as separate embodiments. That is, embodiments in this specification may also be understood as an integration of multiple secondary embodiments, where each secondary embodiment is valid with fewer than all of the features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, and documents, is incorporated herein by reference for all purposes now or later associated with this document, except for any prosecution history associated therewith, any material that is inconsistent with or conflicts with this document, and any material that may have a limiting effect on the broadest scope of the claims now or later associated with this document. Furthermore, in the event of any inconsistency or conflict between the description, definition, and/or use of a term in this document and that in any incorporated material, the description, definition, and/or use of the term in this document shall prevail.
Finally, it is to be understood that the embodiments disclosed herein are illustrative of the principles of the embodiments of this specification. Other modified embodiments are also within the scope of this specification. Accordingly, the embodiments disclosed in this specification are by way of example only and not limitation. Those skilled in the art can adopt alternative configurations to implement the solutions of this specification in light of these embodiments. Therefore, the embodiments of this specification are not limited to the embodiments precisely described herein.

Claims (27)

1. A living body detection method, comprising:
obtaining a target face image of a target user;
outputting a first target attack probability corresponding to the target face image through a one-stage living body detection model based on the target face image; and
determining and executing a target scheme based on the first target attack probability, wherein the target scheme is one of a plurality of schemes, the plurality of schemes comprising a two-stage scheme, and the two-stage scheme comprises:
obtaining a target palm vein image of the target user; and
inputting at least the target palm vein image into a two-stage living body detection model to obtain a target detection result of the target user.
2. The method of claim 1, wherein the outputting, based on the target face image, a first target attack probability corresponding to the target face image by a one-stage living body detection model comprises:
performing an unstructured operation on the target face image to obtain a target unstructured image, wherein the unstructured operation comprises removing structural information of the target face image, and the structural information of the target face image comprises facial feature distribution information and/or face contour information; and
inputting the target unstructured image into the one-stage living body detection model for living body detection, and outputting the first target attack probability.
3. The method of claim 2, wherein the performing the unstructured operation on the target face image to obtain a target unstructured image comprises:
dividing the target face image into a plurality of target image blocks; and
rearranging the positions of the plurality of target image blocks in the target face image based on a preset rule to obtain the target unstructured image.
4. The method of claim 3, wherein the preset rule comprises at least one of the following: random arrangement, adjusting row positions of the plurality of target image blocks, adjusting column positions of the plurality of target image blocks, and adjusting both the row positions and the column positions of the plurality of target image blocks.
5. The method of claim 2, wherein the one-stage living body detection model comprises:
the block feature extraction module, configured to perform feature extraction on a plurality of image blocks contained in an unstructured image obtained through the unstructured operation, and output a plurality of block features corresponding to the plurality of image blocks; and
the feature fusion module, configured to fuse the plurality of block features and output a first fusion feature and a first attack probability.
6. The method of claim 2, wherein the constraint objective of the one-stage living body detection model when trained comprises a first loss being less than a first preset loss value, the first loss comprising:
a fusion classification loss configured to constrain a difference between a predicted value and a true value corresponding to the first attack probability.
7. The method of claim 6, wherein the block feature extraction module further outputs a plurality of block attack probabilities corresponding to the plurality of block features, and the first loss further comprises at least one of the following:
a local classification loss, obtained by weighted summation of a plurality of block classification losses corresponding to the plurality of block attack probabilities, each of the plurality of block classification losses being configured to constrain a difference between a predicted value and a true value corresponding to its corresponding block attack probability;
a local and global consistency loss configured to constrain consistency between the plurality of block features and the first fusion feature; and
a block feature consistency loss configured to constrain consistency between the plurality of block features.
8. The method of claim 1, wherein prior to the determining and executing a target solution based on the first target attack probability, the method further comprises:
generating a target judgment threshold corresponding to the target face image through a threshold generation model based on the target face image.
9. The method of claim 8, wherein the threshold generation model comprises:
the first face feature encoding module, configured to perform feature extraction on the face image and output a first face feature;
the palm vein feature generation module, configured to generate a corresponding cross-modal palm vein feature based on the first face feature and output the cross-modal palm vein feature; and
the threshold regression module, configured to generate a judgment threshold corresponding to the face image based on the first face feature and the cross-modal palm vein feature.
10. The method of claim 9, wherein the constraint objective of the threshold generation model when trained comprises a second loss being less than a second preset loss value, the second loss comprising:
a threshold regression loss configured to constrain a difference between the judgment threshold and a real threshold corresponding to the face image.
11. The method of claim 10, wherein the second loss further comprises:
a palm vein regression loss configured to constrain a difference between the cross-modal palm vein feature and a real palm vein feature, the difference between the cross-modal palm vein feature and the real palm vein feature comprising a distance between the cross-modal palm vein feature and the real palm vein feature.
12. The method of claim 8, wherein the target judgment threshold comprises a first target threshold and a second target threshold, the first target threshold being less than the second target threshold.
13. The method of claim 12, wherein the determining and executing a target solution based on the first target attack probability comprises:
determining that the first target attack probability satisfies a two-stage threshold condition, and determining the target scheme to be the two-stage scheme and executing the two-stage scheme, wherein the two-stage threshold condition comprises the first target attack probability being between the first target threshold and the second target threshold.
14. The method of claim 12, wherein the plurality of schemes further comprises:
a first scheme of outputting that the target user is a living body; and
a second scheme of outputting that the target user is an attack.
15. The method of claim 14, wherein the determining and executing a target solution based on the first target attack probability comprises:
determining that the first target attack probability is smaller than the first target threshold, and determining the target scheme to be the first scheme and executing the first scheme; or
determining that the first target attack probability is greater than the second target threshold, and determining the target scheme to be the second scheme and executing the second scheme.
16. The method of claim 8, wherein the target judgment threshold further comprises a third target threshold, and the obtaining the target detection result of the target user comprises:
determining a second target attack probability corresponding to the target palm vein image output by the two-stage living body detection model; and
determining the target detection result based on the second target attack probability includes:
determining that the second target attack probability is smaller than the third target threshold, and determining that the target detection result is a living body; or
determining that the second target attack probability is greater than the third target threshold, and determining that the target detection result is an attack.
17. The method of claim 16, wherein the inputting at least the target palm vein image into a two-stage living body detection model comprises:
inputting the target face image and the target palm vein image into the two-stage living body detection model, and outputting a second target fusion attack probability, wherein the second target attack probability comprises the second target fusion attack probability.
18. The method of claim 17, wherein the two-stage living body detection model comprises:
the second face feature encoding module, configured to perform feature extraction on the face image and output a second face feature;
the first palm vein feature encoding module is configured to perform feature extraction on the palm vein image and output a first palm vein feature; and
the fusion decision module, configured to fuse the second face feature and the first palm vein feature and output a second fusion attack probability.
19. The method of claim 18, wherein the fusion decision module is further configured to:
outputting a consistency judgment result, wherein the consistency judgment result represents a degree of consistency that the face image and the palm vein image belong to the same user, and the consistency judgment result comprises a distance between the second face feature and the first palm vein feature.
20. The method of claim 19, wherein the outputting the second fusion attack probability comprises:
outputting the second fusion attack probability with reference to the consistency judgment result,
wherein the consistency judgment result is positively correlated with the second fusion attack probability.
21. The method of claim 18, wherein the constraint objective of the two-stage living body detection model when trained comprises a third loss being less than a third preset loss value, the third loss comprising:
the living body detection loss is configured to constrain a difference between a predicted value and a true value of the second fusion attack probability.
22. The method of claim 21, wherein the third loss further comprises:
a face-palm vein consistency loss configured to constrain a difference between the second face feature and the first palm vein feature, the difference between the second face feature and the first palm vein feature comprising a distance between the second face feature and the first palm vein feature.
23. The method of claim 16, wherein the inputting at least the target palm vein image into a two-stage living body detection model comprises:
inputting the target palm vein image into the two-stage living body detection model, and outputting a second target decision attack probability, wherein the second target attack probability comprises the second target decision attack probability.
24. The method of claim 23, wherein the two-stage living body detection model comprises:
the second palm vein feature encoding module, configured to perform feature extraction on the palm vein image and output a second palm vein feature; and
the decision module, configured to make a living body decision according to the second palm vein feature and output a second decision attack probability.
25. The method of claim 1, wherein the obtaining a target palm vein image of the target user comprises:
sending an acquisition instruction to a palm vein acquisition device; and
obtaining the target palm vein image from the palm vein acquisition device.
26. A living body detection system, comprising a client, the client comprising:
the image acquisition device is configured to acquire a target face image of a target user;
the palm vein acquisition device, configured to acquire a target palm vein image of the target user;
at least one storage medium storing at least one set of instructions for living body detection; and
At least one processor in communication with the image acquisition device, the palm vein acquisition device, and the at least one storage medium, respectively,
wherein the at least one processor reads the at least one instruction set and implements the living body detection method of any one of claims 1-25 when the living body detection system is running.
27. A living body detection system comprising a server, the server comprising:
at least one storage medium storing at least one set of instructions for living body detection; and
at least one processor communicatively coupled to the at least one storage medium,
wherein the at least one processor reads the at least one instruction set and implements the living body detection method of any one of claims 1-25 when the living body detection system is running.
CN202310182082.5A 2023-02-16 2023-02-16 Living body detection method and system Pending CN116246356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310182082.5A CN116246356A (en) 2023-02-16 2023-02-16 Living body detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310182082.5A CN116246356A (en) 2023-02-16 2023-02-16 Living body detection method and system

Publications (1)

Publication Number Publication Date
CN116246356A true CN116246356A (en) 2023-06-09

Family

ID=86627442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310182082.5A Pending CN116246356A (en) 2023-02-16 2023-02-16 Living body detection method and system

Country Status (1)

Country Link
CN (1) CN116246356A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination