CN116469179A - Living body identification method and system

Info

Publication number: CN116469179A
Authority: CN (China)
Prior art keywords: image, images, living body, modal, modality
Legal status: Pending
Application number: CN202310444958.9A
Original language: Chinese (zh)
Inventor: 曹佳炯
Assignee: Alipay Hangzhou Information Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40: Spoof detection, e.g. liveness detection

Abstract

According to the living body identification method provided by this specification, a plurality of modality images of a target object are acquired, and a masked modality image in which an anomaly exists is determined from the plurality of modality images; that is, the masked modality image and the unmasked modality images are distinguished among the plurality of modality images. Modality images that do not meet the living body identification requirements are thereby masked out, living body identification is performed based on the unmasked modality images that do meet the requirements, and the living body identification result of the target object is determined, improving the accuracy of living body identification.

Description

Living body identification method and system
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and system for living body identification.
Background
Biometric systems, such as face recognition systems, have been widely used in recent years, with success in scenarios such as face payment, face access, and face attendance. However, face recognition systems also face security challenges, and liveness attacks, such as photographs displayed on electronic screens, printed paper photographs, and 3D masks, are among the major security threats.
Therefore, in order to accurately detect liveness attacks during biometric recognition, a new living body identification method is needed.
Disclosure of Invention
The living body identification method and system provided by this specification can accurately detect liveness attacks during biometric recognition.
In a first aspect, the present specification provides a living body identification method, comprising: acquiring a plurality of modality images of a target object; determining, from the plurality of modality images, a masked modality image in which an anomaly exists; and determining a living body identification result of the target object based on unmasked modality images, the unmasked modality images being the modality images of the plurality of modality images other than the masked modality image.
In some embodiments, the determining, from the plurality of modality images, a masked modality image in which an anomaly exists comprises: determining an index result of an evaluation index corresponding to each of the plurality of modality images; and taking a modality image whose index result does not satisfy a preset condition as the masked modality image, wherein at least one index result of the masked modality image does not satisfy the preset condition.
In some embodiments, the plurality of modality images includes at least two of a visible light image, a near infrared image, a depth image, and a thermal imaging image.
In some embodiments, the evaluation index corresponding to the visible light image includes at least one of image integrity, image quality, and image exposure; the evaluation index corresponding to the near infrared image includes at least one of image integrity, image quality, and image exposure; the evaluation index corresponding to the depth image includes at least one of the occupancy ratio of point cloud data and whether depth reference position data is missing; and the evaluation index corresponding to the thermal imaging image includes a thermal image temperature difference.
In some embodiments, the image integrity includes the degree of completeness with which the visible light image or the near infrared image contains a target part of the target object. The image quality includes at least one of image sharpness and image angle, wherein the image sharpness is determined based on the gray values of the pixels of the visible light image or the near infrared image. The image exposure includes the brightness of a first object region corresponding to the target object in the visible light image or the near infrared image. The occupancy ratio of the point cloud data is the ratio of the amount of point cloud data in the depth image to a preset amount of point cloud data. Whether the depth reference position data is missing refers to whether the depth image contains the depth reference position data. The thermal image temperature difference is the difference between the temperature of a second object region and the temperature of a background region in the thermal imaging image, the second object region being the region corresponding to the target object in the thermal imaging image, and the background region being the region of the thermal imaging image other than the second object region.
In some embodiments, the determining the index result of the evaluation index corresponding to each of the plurality of modality images includes: inputting the plurality of modality images into a structural perception model, and outputting the index result of the evaluation index corresponding to each modality image.
In some embodiments, the inputting the plurality of modality images into the structural perception model and outputting the index result corresponding to each modality image includes: inputting the plurality of modality images into a basic feature encoder of the structural perception model, and outputting multi-modal feature maps corresponding to the plurality of modality images; and inputting the multi-modal feature maps into a multi-element perception module of the structural perception model, and outputting the index result of the evaluation index corresponding to each modality image.
In some embodiments, the preset condition includes at least one of the following: the image integrity is greater than a preset integrity threshold, the image quality is greater than a preset quality threshold, the image exposure is within a preset exposure range, the occupancy ratio of the point cloud data is higher than a preset proportion, the depth reference position data is not missing, and the thermal image temperature difference is within a preset temperature difference range.
In some embodiments, the determining the living body identification result of the target object based on the unmasked modality images includes: performing weighted fusion on the unmasked modality images and the masked modality image, and performing living body identification on the fused modality image obtained by the weighted fusion to obtain the living body identification result, wherein the weight of the unmasked modality images is greater than the weight of the masked modality image.
In some embodiments, the performing weighted fusion on the unmasked modality images and the masked modality image and performing living body identification on the fused modality image obtained by the weighted fusion includes: inputting the unmasked modality images and the masked modality image into a structural consistency model for the weighted fusion, and performing living body identification on the fused modality image through the structural consistency model, wherein, during training, the structural consistency model takes as a training target that a first fusion weight of the masked modality image approaches 0 and a second fusion weight of the unmasked modality images approaches 1.
In some embodiments, the inputting the unmasked modality images and the masked modality image into the structural consistency model for the weighted fusion and performing living body identification on the fused modality image through the structural consistency model comprises: inputting the unmasked modality images and the masked modality image into a basic feature extraction module of the structural consistency model for feature extraction, and outputting unmasked feature maps of the unmasked modality images and a masked feature map of the masked modality image; inputting the unmasked feature maps and the masked feature map into a masked-modality perception module of the structural consistency model, and outputting the first fusion weight and the second fusion weight; performing weighted fusion on the unmasked feature maps and the masked feature map based on the first fusion weight and the second fusion weight to obtain a fused feature map; and inputting the fused feature map into a living body identification module of the structural consistency model for living body identification.
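Purely as an illustration of this embodiment, the sketch below shows one way such a structural consistency model could be organized in PyTorch. The module names, layer sizes, the 3-channel input assumption for every modality, and the gating design (a per-modality scalar weight predicted from pooled features) are assumptions made for this sketch, not details disclosed by the patent.

```python
# Hypothetical sketch of a structural-consistency-style fusion model (PyTorch).
# All names, shapes, and the weight-gating design are illustrative assumptions.
import torch
import torch.nn as nn

class StructuralConsistencyModel(nn.Module):
    def __init__(self, num_modalities=4, feat_dim=64):
        super().__init__()
        # Basic feature extraction module: one small CNN per modality.
        self.extractors = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            ) for _ in range(num_modalities)
        ])
        # Masked-modality perception module: predicts one fusion weight
        # per modality from its pooled feature map.
        self.weight_head = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()
        )
        # Living body identification module: binary live/attack classifier.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 2)
        )

    def forward(self, images):  # images: list of (B, 3, H, W) tensors
        feats = [ext(img) for ext, img in zip(self.extractors, images)]
        # One fusion weight per modality, predicted from pooled features.
        weights = [self.weight_head(f.mean(dim=(2, 3))) for f in feats]
        fused = sum(w[:, :, None, None] * f for w, f in zip(weights, feats))
        return self.classifier(fused)  # live/attack logits: (B, 2)
```

During training, an additional loss term could push the predicted weight of a modality labeled as masked toward 0 and that of an unmasked modality toward 1, matching the training target described above.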
In a second aspect, the present specification also provides a living body identification system, comprising: at least one storage medium storing at least one instruction set for living body identification; and at least one processor communicatively connected to the at least one storage medium, wherein, when the living body identification system runs, the at least one processor reads the at least one instruction set and implements the living body identification method of the first aspect.
According to the living body identification method and system provided by the above technical solution, a plurality of modality images of a target object are acquired, and a masked modality image in which an anomaly exists is determined from the plurality of modality images; that is, the masked modality image and the unmasked modality images are distinguished among the plurality of modality images. Modality images that do not meet the living body identification requirements are thereby masked out, living body identification is performed based on the unmasked modality images that do meet the requirements, and the living body identification result of the target object is determined, improving the accuracy of living body identification.
Additional functions of the living body identification methods and systems provided in this specification will be set forth in part in the description that follows, and will in part become apparent to those of ordinary skill in the art upon examination of that description, or may be learned by the practice or use of the methods, devices, and combinations described in the detailed examples below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present specification, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present specification, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 illustrates an application scenario diagram of a living body identification system provided according to some embodiments of the present specification;
Fig. 2 illustrates a hardware architecture diagram of a computing device provided according to some embodiments of the present specification;
Fig. 3 illustrates a flowchart of a living body identification method provided according to some embodiments of the present specification; and
Fig. 4 illustrates a flowchart of a method of determining a masked modality image, provided according to some embodiments of the present specification.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosure. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are taken to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features of the present specification, as well as the operation and functions of the related structural elements and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description, with reference to the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the specification. It should also be understood that the drawings are not drawn to scale.
The flowcharts used in this specification illustrate operations implemented by systems according to some embodiments of this specification. It should be understood that the operations in the flowcharts need not be implemented in order; they may instead be performed in reverse order or concurrently. Moreover, one or more other operations may be added to a flowchart, and one or more operations may be removed from it.
Before describing the specific embodiments of the present specification, the application scenario of the present specification will be described as follows:
the living body identification method and system provided in this specification can be applied to any scene in which living body identification is required. Living body identification may be based on a face, an iris, a fingerprint, a palm print, a palm vein, and the like. Taking face-based living body identification as an example, applicable scenarios include face-scan payment, face-scan attendance, face-scan station entry, face-scan transfer, and the like. For example, in an unmanned vending store, the payment device may detect whether a consumer is a living body when the consumer pays by scanning their face. In face-scan attendance, when an employee clocks in with an attendance device, the device may detect whether the employee is a living body. It should be understood that the living body identification method and system provided in this specification can also be applied to other scenes and are not limited to the above. For convenience of description, this specification takes face-based living body identification as an example.
The living body identification method and system provided in this specification introduce multiple modalities: a plurality of modality images under multiple modalities are acquired, increasing the available image information and thereby improving the accuracy of living body identification. In some living body identification methods, once a plurality of modality images are acquired, living body identification is performed directly on all of them, increasing the modality information used and thereby improving efficiency. However, because different modality images suit different conditions and scenes, using the same set of multi-modal images for living body identification under all conditions and scenes may reduce accuracy. In other living body identification methods, an image sequence of the target object may be acquired; that is, the plurality of modality images may be multi-modal videos, each containing multiple frames of modality images, which adds temporal information and improves accuracy. However, capturing modality videos may take longer, increasing the time cost and degrading the user experience.
In the living body identification method and system of this specification, a plurality of modality images are acquired and anomaly detection is performed on them. Modality images that are unsuitable for the current living body identification scene or the current identification object are masked out or down-weighted, and living body identification is performed only, or mainly, on the modality images that are suitable. The multi-modal information of the multi-modal images is thus retained while the modality information that does not meet the living body identification requirements is removed, improving the accuracy of living body identification while preserving its security.
For convenience of description, the present specification explains the terms that appear herein:
Attack/liveness attack: a non-living body; an attack means presented to a living body identification system, including photographs displayed on a mobile phone screen, printed paper photographs, high-precision masks, molds, prostheses, and the like;
Living body identification/liveness anti-attack: an algorithmic technique that uses an artificial intelligence model to determine whether a user is a living body or an attack, so as to detect and intercept such attacks;
Face structure perception: in this specification, the analysis of the structural features of a face, such as its integrity and the presence of the facial features;
Multi-modal living body identification: a living body identification method in which multi-modal data is first acquired by multi-modal modules (or multi-modal sensors) and then used for living body identification. The multi-modal data includes, for example, visible light images, near infrared images, far infrared images, depth images, and thermal imaging images. In this specification, the visible light image mainly refers to an RGB image; the near infrared image may be referred to as an NIR (Near Infrared) image; the far infrared image may be referred to as an FIR (Far Infrared) image; and the depth image may be referred to as a Depth image.
Fig. 1 illustrates an application scenario of a system 001 for living body identification according to some embodiments of the present specification. As shown in fig. 1, the system 001 may include a target object 100, a client 200, a server 300, and a network 400.
The target object 100 may be any user on whom living body identification is performed using the client 200.
The client 200 may include image acquisition devices 210 of multiple modalities. The image acquisition device 210 is configured to acquire images of the target object 100, for example a face image, fingerprint image, iris image, retina image, or hand image of the target object 100. The image acquisition device 210 may also capture behavior information of the target object 100, such as sound, behavior, and gait.
Different modalities may refer to different imaging modes. Imaging modes may be classified into imaging in different vision fields, imaging in different dimensions, thermal imaging, and the like, such as an RGB modality, an NIR modality, a depth modality, and a thermal imaging modality. Accordingly, the image acquisition device 210 may include one or more of cameras for different vision fields, cameras of different dimensions, and thermal imaging cameras.
A vision field may refer to the spectral range in which an image is captured, such as the ultraviolet, visible, near infrared, mid-infrared, and far-infrared fields. Cameras for different vision fields include, for example, ultraviolet cameras, visible light cameras, near infrared cameras, mid-infrared cameras, and far infrared cameras, and their working principles differ. For example, a visible light camera mainly includes a lens, an image sensor, and an image processor. The lens projects the photographed object onto the image sensor; the image processor computes suitable parameters through photometry and ranging and instructs the lens to focus; and when a shooting instruction is detected (for example, when the face of the target object 100 is detected to be fully within the viewfinder), the image sensor completes one exposure, which the image processor turns into an image. A near infrared camera mainly includes an infrared emitting device and an infrared receiving device: the emitting device emits infrared rays that illuminate the photographed object, and the receiving device receives the infrared rays reflected by the object, forming a near infrared image. Cameras for different vision fields capture different modality images, such as ultraviolet images, visible light images, near infrared images, mid-infrared images, and far-infrared images.
Cameras of different dimensions can acquire images of different dimensions, such as a 2D camera and a 3D camera (or depth camera). The 2D camera may acquire a planar image of the target object 100. The 3D camera may acquire a depth image of the target object 100, where the depth image contains depth information of the target object 100, such as the distance between the target object 100 and the 3D camera; examples include structured light cameras, TOF cameras, binocular stereo cameras, and laser detectors. The thermal imaging camera, also called a thermal imager, passively receives the infrared radiation energy (heat) emitted by the measured object and converts the heat into a visual image carrying temperature data, namely a thermal imaging image, which displays the temperature distribution of the object's surface.
As shown in fig. 1, the image acquisition device 210 may include two or more of a visible light camera 211, a near infrared camera 213, a 3D camera 215, and a thermal imaging camera 217; the embodiments of the present specification do not limit which cameras are included. In some embodiments, the visible light camera 211, the near infrared camera 213, the 3D camera 215, and the thermal imaging camera 217 may be integrated on the body structure of the client 200. In some embodiments, these cameras may instead be arranged independently of the main structure of the client 200, on different devices outside it, and communicatively connected with the main structure of the client 200 in a wired or wireless manner so as to send the acquired modality images to it. For example, the 3D camera 215 and the thermal imaging camera 217 may be connected with the main structure of the client 200 through data lines, with the 3D camera 215 sending the collected depth images, and the thermal imaging camera 217 the collected thermal imaging images, over those lines.
In some embodiments, the image capturing device 210 may also be a combination of cameras with a plurality of different modalities, which is not limited in this specification.
In some embodiments, the living body identification method may be performed on the client 200. In that case, the client 200 may store data or instructions for performing the living body identification method described in this specification, and may execute or be used to execute those data or instructions. In some embodiments, the client 200 may include a hardware device having a data information processing function and the programs necessary to drive it. As shown in fig. 1, the client 200 may be communicatively connected to the server 300. In some embodiments, the server 300 may be communicatively connected to a plurality of clients 200. In some embodiments, the client 200 may interact with the server 300 over the network 400 to receive or send messages, such as RGB images, NIR images, depth images, and thermal imaging images.
In some embodiments, the client 200 may include a mobile device, a tablet, a notebook computer, a built-in device of a motor vehicle, a face-payment terminal (such as an Alipay Dragonfly device), a vending machine, a sales counter, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart television, a desktop computer, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smart phone, a personal digital assistant, a gaming device, a navigation device, or the like, or any combination thereof. In some embodiments, the virtual reality device or augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof; for example, Google Glass, a head-mounted display, or a VR device. In some embodiments, the built-in devices of the motor vehicle may include an on-board computer, an on-board television, and the like. In some embodiments, the client 200 may be a device with positioning technology for locating its position. In some embodiments, the client 200 may have one or more of the following capabilities: NFC (Near Field Communication), WIFI (Wireless Fidelity), 3G/4G/5G, POS (Point of Sale) card swiping, two-dimensional code scanning, bar code scanning, Bluetooth, infrared, SMS (Short Message Service), and MMS (Multimedia Message Service). In some embodiments, one or more applications (APPs) may be installed on the client 200. An APP can provide the target object 100 with the ability to interact with the outside world via the network 400, as well as an interface, including but not limited to: web browser APPs, search APPs, chat APPs, shopping APPs, video APPs, financial APPs, instant messaging tools, mailbox clients, social platform software, and the like. In some embodiments, a target APP may be installed on the client 200. The target APP is able to acquire modality images for the client 200. In some embodiments, the target APP is also capable of performing living body identification on the modality images. The target object 100 may trigger a living body identification request through the target APP, and the target APP may perform the living body identification method in response to that request.
The server 300 may be a server providing various services, such as a background server supporting pages displayed on the client 200. In some embodiments, the living body identification method may be performed on the server 300; for example, the server 300 acquires the plurality of modality images of the target object 100 from the client 200 and then performs the method. In that case, the server 300 may store data or instructions for performing the living body identification method described in this specification, and may execute or be used to execute those data or instructions. In some embodiments, the server 300 may include a hardware device having a data information processing function and the programs necessary to drive it. The server 300 may be communicatively connected with a plurality of clients 200 and receive the data they send.
The network 400 is the medium that provides communication connections between the client 200 and the server 300, facilitating the exchange of information or data. As shown in fig. 1, the client 200 and the server 300 may connect to the network 400 and transmit information or data to each other through it. In some embodiments, the network 400 may be any type of wired or wireless network, or a combination thereof. For example, the network 400 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like. In some embodiments, the network 400 may include one or more network access points, for example wired or wireless access points such as base stations or internet exchange points, through which one or more components of the client 200 and the server 300 may connect to the network 400 to exchange data or information.
It should be understood that the number of clients 200, servers 300, and networks 400 in fig. 1 is merely illustrative. There may be any number of clients 200, servers 300, and networks 400, as desired for the implementation.
The living body identification method may be performed entirely on the client 200, entirely on the server 300, or partially on the client 200 and partially on the server 300.
Fig. 2 illustrates a hardware architecture diagram of a computing device 600 provided according to some embodiments of the present specification. The computing device 600 may perform the living body identification method described herein; the method itself is described in other parts of this specification. When the living body identification method is performed on the client 200, the computing device 600 may be the client 200. When it is performed on the server 300, the computing device 600 may be the server 300. When the living body identification method is performed partially on the client 200 and partially on the server 300, the computing device 600 may be both the client 200 and the server 300.
As shown in fig. 2, computing device 600 may include at least one storage medium 630 and at least one processor 620. In some embodiments, computing device 600 may also include a communication port 650 and an internal communication bus 610. Meanwhile, computing device 600 may also include I/O component 660.
Internal communication bus 610 may connect the various system components including storage medium 630, processor 620, and communication ports 650.
I/O component 660 supports input/output between computing device 600 and other components.
The communication port 650 is used for data communication between the computing device 600 and the outside world, for example, the communication port 650 may be used for data communication between the computing device 600 and the network 400. The communication port 650 may be a wired communication port or a wireless communication port.
The storage medium 630 may include a data storage device, which may be a non-transitory or transitory storage medium. For example, the data storage device may include one or more of a magnetic disk 632, a read-only memory (ROM) 634, and a random access memory (RAM) 636. The storage medium 630 may store at least one instruction set for performing living body identification. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the living body identification method provided herein. The storage medium 630 may also store models for implementing the living body identification method, such as the structural perception model and the structural consistency model. In that case, a model may be one or more sets of instructions stored in the storage medium 630 and executed by the processor 620 in the computing device 600. A model may also be part of a circuit, hardware device, or module in the computing device 600; for example, the structural perception model may be a hardware device/module in the computing device 600 that determines the index results of the evaluation indexes corresponding to the modality images, and the structural consistency model may be a hardware device/module that performs the living body identification. In that case, the processor 620 may store at least one instruction set for controlling the models.
The at least one processor 620 may be communicatively connected with the at least one storage medium 630 and the communication port 650 via the internal communication bus 610. The at least one processor 620 is configured to execute the at least one instruction set. When the computing device 600 runs, the at least one processor 620 reads the at least one instruction set and, as the instruction set indicates, performs the living body identification method provided herein; the processor 620 may perform all the steps involved in the method. The processor 620 may be in the form of one or more processors. In some embodiments, the processor 620 may include one or more hardware processors, such as microcontrollers, microprocessors, reduced instruction set computers (RISC), application-specific integrated circuits (ASICs), application-specific instruction set processors (ASIPs), central processing units (CPUs), graphics processing units (GPUs), physics processing units (PPUs), microcontroller units, digital signal processors (DSPs), field programmable gate arrays (FPGAs), advanced RISC machines (ARM), programmable logic devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, a single processor 620 is depicted in the computing device 600 in this specification. It should be noted, however, that the computing device 600 may also include multiple processors; thus, the operations and/or method steps disclosed in this specification may be performed by one processor, as described herein, or jointly by multiple processors. For example, if the processor 620 of the computing device 600 performs steps A and B in this specification, it should be understood that steps A and B may also be performed by two different processors 620 jointly or separately (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
Fig. 3 illustrates a flowchart of a living body identification method P100 provided according to some embodiments of the present specification. As described previously, the computing device 600 may perform the living body identification method P100 described in this specification. Specifically, the processor 620 may read an instruction set stored in its local storage medium and then execute, as specified by that instruction set, the living body identification method P100 described in this specification. As shown in fig. 3, the method P100 may include:
s120: a multi-modality image of the target object 100 is obtained.
When the target object 100 is in front of the client 200, the image acquisition device 210 on the client 200 may acquire modality images of the target object 100. The target object 100 may be a real user, or may be a paper photograph, a mobile phone screen, a mask, or the like. The client 200 may turn on the image acquisition devices 210 of multiple modalities, such as the visible light camera 211, the near infrared camera 213, the 3D camera 215, and the thermal imaging camera 217, so as to acquire a plurality of modality images of the target object 100. For example, the visible light camera 211 captures a visible light image (e.g., an RGB image) of the target object 100, the near infrared camera 213 captures a near infrared image, the 3D camera 215 captures a depth image, and the thermal imaging camera 217 captures a thermal imaging image. The processor 620 may then obtain the plurality of modality images, such as the RGB image, the near infrared image, the depth image, and the thermal imaging image, from the image acquisition devices 210 of the multiple modalities. The number of images per modality may be one or more; for example, one RGB image, two near infrared images, three depth images, and four thermal imaging images may be acquired.
It should be noted that the image acquisition devices 210 of the multiple modalities may acquire the modality images of the target object 100 simultaneously, or sequentially, for example in the order of the visible light camera 211, the near infrared camera 213, the 3D camera 215, and the thermal imaging camera 217; the embodiments of this specification do not limit the acquisition order of the modality images.
As shown in fig. 3, the method P100 further includes:
s140: a screening modality image in which an anomaly exists is determined from the plurality of modality images.
Different modality images may differ in their applicable conditions, applicable scenes, or applicable objects. Therefore, in order to improve the accuracy of living body identification, anomaly detection or anomaly perception may be performed on the plurality of modality images to determine whether an anomaly exists in each of them. Whether an anomaly exists may mean whether each modality image is suitable for the current living body identification scene, suitable for the current target object 100, or satisfies the image requirements of living body identification, and so on. The masked modality image is a modality image of the plurality of modality images in which an anomaly exists, for example a modality image unsuitable for the current living body identification scene, unsuitable for the current target object 100, or not satisfying the living body identification requirements. For convenience of description, the images of the plurality of modality images other than the masked modality image are defined as unmasked modality images; an unmasked modality image is a modality image in which no anomaly exists.
Fig. 4 illustrates a flowchart of the method S140 of determining a masked modality image, provided according to some embodiments of the present specification.
As shown in fig. 4, the method S140 includes:
s141: and determining an index result of the evaluation index corresponding to each of the plurality of modal images.
The evaluation indexes of a modality image may include image integrity, image quality, image exposure, the occupancy ratio of point cloud data, whether depth reference position data is missing, thermal image temperature difference, image color, image resolution, image contrast, image size, and so on. The evaluation indexes corresponding to different modality images may be the same or different. In some embodiments, the evaluation index corresponding to the visible light image may include at least one of image integrity, image quality, and image exposure. The evaluation index corresponding to the near infrared image may include at least one of image integrity, image quality, and image exposure. The evaluation index corresponding to the depth image may include at least one of the occupancy ratio of the point cloud data and whether the depth reference position data is missing. The evaluation index corresponding to the thermal imaging image may include the thermal image temperature difference.
The modality images may also correspond to other evaluation indexes and are not limited to those above. For example, the evaluation index corresponding to the visible light image may further include at least one of image color, image resolution, image contrast, and image size, and likewise for the near infrared image. The evaluation index corresponding to the depth image may further include at least one of image integrity, image quality, and image exposure, and so may the evaluation index corresponding to the thermal imaging image.
The image integrity is the degree of completeness with which a modality image, such as the visible light image or the near infrared image, contains the target part of the target object 100. For example, in face-based living body identification, the image integrity is the face integrity: the degree to which the modality image contains the complete face region, including the facial features (eyebrows, eyes, nose, mouth, ears), cheeks, and forehead. Face integrity may also refer to the completeness of the facial features contained in the modality image. In this case, the target part is the face region or the facial features. In fingerprint-based living body identification, the image integrity is the fingerprint integrity, namely the degree to which the modality image contains the complete finger ridge pattern; the target part is then the fingerprint. In iris-based living body identification, the image integrity is the iris integrity, namely the degree to which the modality image contains the complete iris region; the target part is then the iris.
The image quality may include at least one of the image sharpness and the image angle of a modality image such as the visible light image or the near infrared image. The image sharpness may be determined based on the gray values of the pixels of the modality image. For example, the processor 620 may obtain the image sharpness through the Brenner gradient function, which sums the squared differences between the gray values of pixel pairs in the modality image. The processor 620 may also compute over the pixel gray values of the modality image with a gray variance function, a gray variance product function, an entropy function, and the like, which is not limited in the embodiments of this specification.
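As an illustration, a Brenner-style sharpness score can be computed in a few lines; the horizontal step of two pixels is the conventional form of this function, and the NumPy implementation below is a sketch, not code from the patent.

```python
# Minimal Brenner-gradient sharpness sketch (illustrative, not the patent's code).
import numpy as np

def brenner_sharpness(gray: np.ndarray) -> float:
    """gray: 2D array of pixel gray values. Higher return value = sharper image."""
    # Difference between each pixel and the pixel two columns to its right,
    # squared and summed over the whole image.
    diff = gray[:, 2:].astype(np.float64) - gray[:, :-2].astype(np.float64)
    return float(np.sum(diff ** 2))
```

The index result could then be this raw value, or a classification obtained by comparing it with a preset sharpness threshold.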
In some embodiments, the image angle may be the tilt angle of the target object 100 in a third object region of the modality image, where the third object region is, for example, the face region of the target object 100 in the modality image. Tilt is the inclination of the target object 100, within a plane parallel to the lens plane of the image acquisition device 210, relative to a tilt axis perpendicular to the ground. The head midline is perpendicular to the line connecting the two eyes, and the tilt angle is the angle between the head midline and the tilt axis. When the target object 100 does not tilt its head, the tilt angle is 0 degrees; when the head tilts to the left relative to the tilt axis, the tilt angle may be negative; when the head tilts to the right, the tilt angle may be positive.
In some embodiments, the image angle may be the rotation angle of the target object 100 in the third object region of the modality image. Rotation is the turning of the head of the target object 100 about a rotation axis perpendicular to the ground. When the target object 100 does not rotate its head, the image acquisition device 210 captures the frontal face region, and the rotation angle is 0 degrees; when the head rotates 90 degrees to the left, the image acquisition device 210 captures the right side of the face; when the head rotates 90 degrees to the right, it captures the left side of the face.
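As one concrete possibility, the tilt angle could be estimated from the pixel coordinates of the two eyes, since the head midline is perpendicular to the eye line. The sketch below is purely illustrative; the landmark input format and the sign convention are assumptions, not details from the patent.

```python
# Illustrative tilt-angle estimate from two eye landmarks (assumed input format).
import math

def tilt_angle_degrees(left_eye: tuple, right_eye: tuple) -> float:
    """Eyes given as (x, y) pixel coordinates; y grows downward as in images.
    Returns the angle between the head midline and the vertical tilt axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    # The eye line is perpendicular to the head midline, so the midline's
    # deviation from vertical equals the eye line's deviation from horizontal.
    return math.degrees(math.atan2(dy, dx))

# Example: right eye slightly lower than left eye -> head tilted.
print(tilt_angle_degrees((100, 120), (160, 130)))  # about 9.5 degrees
```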
The image quality may further include at least one of the image color, image resolution, image contrast, image size, image tone, image shadow, and image distortion of the modality image, which is not limited in the embodiments of this specification.
The image exposure is the brightness of the first object region corresponding to the target object 100 in a modality image such as the visible light image or the near infrared image. The first object region may be the face region, fingerprint region, iris region, etc. of the target object 100 in the modality image. The brightness may be an average brightness.
The occupancy ratio of the point cloud data is the ratio of the amount of point cloud data in the depth image to a preset amount of point cloud data. Point cloud data is a set of points under a preset coordinate system, each point carrying rich information such as three-dimensional coordinates, color, depth value, intensity value, and time. The amount of point cloud data may be the number of points. The preset amount of point cloud data may be the average point cloud amount of a high-quality depth image (e.g., a high-quality face image), where a high-quality depth image may be a depth image whose quality score is greater than a preset quality threshold, and the preset amount may be the number of points in such an image.
Whether the depth reference position data is missing refers to whether the depth image contains data at a depth reference position, such as the position of the nose tip or the position of the eye center in a face depth image. The thermal image temperature difference is the difference between the temperature of the second object region and the temperature of the background region in the thermal imaging image. The second object region is the region corresponding to the target object 100 in the thermal imaging image, such as a face region, fingerprint region, or iris region; the background region is the region of the thermal imaging image other than the second object region.
In some embodiments, the processor 620 may perform analysis and calculation on the plurality of modality images of the target object 100 to obtain the index results of the evaluation indexes. An index result may be a specific value of an evaluation index, or a classification result obtained based on the specific value.
For example, for the index result of image integrity, the processor 620 may detect the pixels required for the object region (such as the face region or the facial features) in the visible light image or near infrared image, calculate the ratio between the detected required pixels and a preset complete set of pixels, and take that ratio as the index result of the image integrity (such as the face integrity). The preset complete set of pixels may be the pixels corresponding to an image whose integrity is 100%. Alternatively, the processor 620 may compare the ratio with a preset integrity threshold; the comparison result is that the image is complete or incomplete, and the processor 620 may take that as the index result of the image integrity.
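As a small illustration of this ratio-then-threshold pattern, the sketch below turns pixel counts into an integrity index result; the 0.9 threshold and the label strings are placeholders, not values from the patent.

```python
# Illustrative face-integrity index result from pixel counts (placeholder threshold).
def integrity_result(detected_pixels: int, complete_pixels: int,
                     threshold: float = 0.9):
    ratio = detected_pixels / complete_pixels
    # Either the raw ratio or its thresholded classification can serve
    # as the index result.
    return ratio, ("complete" if ratio > threshold else "incomplete")

print(integrity_result(5400, 6000))  # (0.9, 'incomplete')
```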
For the index result of image quality, the processor 620 may calculate the gray values of the pixels in the visible light image or near infrared image and from them a sharpness value; it may also calculate the image angle of the image, determine a combined result of the sharpness value and the image angle, and take that combined result as the index result of the image quality. Alternatively, the processor 620 may compare the combined result with a preset quality threshold; the comparison result is, for example, that the image quality is good or poor, and the processor 620 may take that as the index result of the image quality.
For the index result of image exposure, the processor 620 may obtain the average brightness of the first object region corresponding to the target object 100 in the visible light image or near infrared image and take the average brightness as the index result. Alternatively, the processor 620 may compare the average brightness with a preset exposure range: when the average brightness is within the range, the image exposure is normal; otherwise it is abnormal. The processor 620 may take normal or abnormal image exposure as the index result of the image exposure.
For the index result of the occupancy ratio of the point cloud data, the processor 620 may obtain the amount of point cloud data in the depth image, calculate the ratio of that amount to the preset amount of point cloud data, and take the ratio as the index result. Alternatively, the processor 620 may compare the ratio with a preset proportion: when the ratio is greater than the preset proportion, the occupancy ratio of the point cloud data is high; when it is smaller, the occupancy ratio is low. The processor 620 may take a high or low occupancy ratio as the index result of the occupancy ratio of the point cloud data.
For the index result of whether the depth reference position data is missing, the processor 620 may detect the data at the depth reference position in the depth image: the index result is 0 when the data is not detected and 1 when it is detected. Alternatively, the index result is "missing" when no data is detected at the depth reference position and "not missing" when it is.
For the index result of the thermal image temperature difference, the processor 620 may calculate the difference between the temperature of the second object region and the temperature of the background region in the thermal imaging image and take it as the index result. Alternatively, the processor 620 may compare the difference with a preset temperature difference range: the thermal image temperature difference is normal when the difference is within the range and abnormal otherwise. The processor 620 may take a normal or abnormal thermal image temperature difference as the index result.
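Tying the last few paragraphs together, the sketch below computes classification-style index results for three of the indexes; every threshold and range in it is a placeholder chosen for illustration, not a value from the patent.

```python
# Illustrative rule-based index results (all thresholds are placeholder values).
import numpy as np

def exposure_result(region: np.ndarray, lo=50.0, hi=200.0) -> str:
    # region: gray values of the first object region (e.g., the face area).
    return "normal" if lo <= float(region.mean()) <= hi else "abnormal"

def point_cloud_ratio_result(num_points: int, preset_points: int, min_ratio=0.8) -> str:
    return "high" if num_points / preset_points > min_ratio else "low"

def thermal_diff_result(object_temp: float, background_temp: float,
                        diff_range=(2.0, 15.0)) -> str:
    diff = object_temp - background_temp
    return "normal" if diff_range[0] <= diff <= diff_range[1] else "abnormal"

# Example usage with synthetic values:
face = np.full((64, 64), 120.0)
print(exposure_result(face))                  # normal
print(point_cloud_ratio_result(7000, 10000))  # low
print(thermal_diff_result(34.5, 26.0))        # normal
```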
It should be noted that the types of index results corresponding to different modality images may be the same or different. For example, the index result of the face integrity corresponding to an RGB face image may be a specific value such as 60%, while the index result of whether the depth reference position data is missing in a depth image may be a classification result such as missing or not missing.
In some embodiments, instead of analyzing and calculating the multi-modal images of the target object 100 directly, the processor 620 may output the index result of each evaluation index using a trained structural perception model. Specifically, the processor 620 may input the plurality of modality images into the trained structural perception model and obtain the index result of the evaluation index corresponding to each modality image. In some embodiments, the structural perception model may include a basic feature encoder and a multi-element perception module. The processor 620 may input the plurality of modality images into the basic feature encoder to obtain the multi-modal feature maps corresponding to the plurality of modality images, then input the multi-modal feature maps into the multi-element perception module to obtain the index result of the evaluation index corresponding to each modality image.
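For illustration only, the sketch below shows one plausible shape for such a structural perception model: a shared basic feature encoder followed by a multi-element perception module with one head per evaluation index. All layer sizes, the shared-encoder choice, and the particular set of heads are assumptions, not the patent's architecture.

```python
# Hypothetical structural-perception-model sketch (PyTorch); all sizes and
# head definitions are illustrative assumptions.
import torch
import torch.nn as nn

class StructuralPerceptionModel(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Basic feature encoder shared by all modality images.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Multi-element perception module: one head per evaluation index.
        self.heads = nn.ModuleDict({
            "integrity": nn.Linear(feat_dim, 1),  # regressed completeness ratio
            "quality": nn.Linear(feat_dim, 1),    # regressed quality score
            "exposure": nn.Linear(feat_dim, 2),   # normal / abnormal logits
        })

    def forward(self, image):  # image: (B, 3, H, W) modality image
        feat = self.encoder(image)
        return {name: head(feat) for name, head in self.heads.items()}
```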
During training of the structural perception model, images of the multiple modalities may serve as training samples, one training sample comprising one training image under each modality. The training samples may be manually annotated with the true values of their corresponding evaluation indexes. For example, at least one of a true image integrity, a true image quality, and a true image exposure is annotated on the RGB training image, and likewise on the near infrared training image. At least one of the true occupancy ratio of the point cloud data and the true value of whether the depth reference position data is missing is annotated on the depth training image, and the true thermal image temperature difference is annotated on the thermal imaging training image. The processor 620 may input each training sample into the structural perception model for training and output the predicted value of the corresponding evaluation index for each training sample.
The loss function of the structural perception model during training may be a multi-task loss comprising one loss per evaluation index. During training, the multi-task loss constrains the difference between the predicted value of each evaluation index and the corresponding true value to be smaller than a first preset difference threshold. Each task's loss may be, for example, a cross entropy loss function or a center loss function. Training the structural perception model with the multi-task loss gradually brings its predictions close to the true values, thereby improving the model's prediction accuracy on the evaluation indexes.
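A minimal sketch of such a multi-task loss follows. It assumes the regressed indexes (integrity, quality) use mean squared error and the classified index (exposure) uses cross entropy, with equal task weights; these choices are illustrative, since the specification only names cross entropy and center loss as options.

```python
# Illustrative multi-task loss over per-index predictions (assumed weighting).
import torch
import torch.nn.functional as F

def multi_task_loss(preds: dict, targets: dict) -> torch.Tensor:
    # Regressed indexes (e.g., integrity ratio, quality score): MSE terms.
    loss = F.mse_loss(preds["integrity"], targets["integrity"])
    loss = loss + F.mse_loss(preds["quality"], targets["quality"])
    # Classified index (e.g., exposure normal/abnormal): cross entropy term.
    loss = loss + F.cross_entropy(preds["exposure"], targets["exposure"])
    return loss

# Example with the StructuralPerceptionModel sketch above:
# preds = model(rgb_batch)
# targets = {"integrity": t1, "quality": t2, "exposure": labels}
# multi_task_loss(preds, targets).backward()
```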
The predicted value output by the structural perception model may be a specific value of an evaluation index, for example, a face integrity of 60%, a face quality score of 70, a face exposure value of 128, a point cloud data occupancy ratio of 80%, a depth-reference-position missing flag of 0 (not missing), a thermal image temperature difference of 5 degrees Celsius, and the like. The predicted value may also be a classification result derived from the specific value, such as face complete, face quality good, face exposure abnormal, point cloud data occupancy ratio too low, nose tip point missing, thermal image temperature difference abnormal, and the like.
Step S140 may further include:
S143: take a modality image whose index result does not satisfy a preset condition as a shielding modality image, where at least one index result of the shielding modality image does not satisfy the preset condition.
As described above, the index result may be a specific value of the evaluation index, or a classification result derived from the specific value. Correspondingly, the preset condition may concern the specific value or the classification result. For example, the preset condition may include at least one of: the image integrity is greater than a preset integrity threshold, the image quality is greater than a preset quality threshold, the image exposure is within a preset exposure range, the occupancy ratio of the point cloud data is higher than a preset proportion, the depth reference position data is not missing, and the thermal image temperature difference is within a preset temperature difference range. As another example, the preset condition may include at least one of: the image is complete, the image quality is good, the image exposure is normal, the occupancy ratio of the point cloud data is not too low, the depth reference position data is not missing, and the thermal image temperature difference is normal.
The processor 620 may compare the index result of the evaluation index corresponding to each modality image with the preset condition, and determine a modality image as a shielding modality image when its index result does not satisfy the preset condition. Each modality image may correspond to one or more evaluation indexes. If a modality image corresponds to a plurality of evaluation indexes, in some embodiments the processor 620 may determine the modality image as a shielding modality image when the index result of any one of those evaluation indexes fails the preset condition; in other embodiments, only when the index result of every one of those evaluation indexes fails the preset condition. The modality images other than the shielding modality images among the plurality of modality images are non-shielding modality images.
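The comparison logic might be sketched as follows; the threshold values and the `require_all_fail` switch (covering the "any index fails" and "every index fails" variants described above) are illustrative:

```python
def is_masked(index_results, preset_conditions, require_all_fail=False):
    """Decide whether one modality image is a shielding modality image.
    index_results:     dict evaluation-index name -> value
    preset_conditions: dict evaluation-index name -> predicate the value
                       must satisfy (placeholder thresholds below)
    """
    failures = [name for name, ok in preset_conditions.items()
                if not ok(index_results[name])]
    if require_all_fail:                 # mask only if every index fails
        return len(failures) == len(preset_conditions)
    return len(failures) > 0             # mask if any index fails

rgb_conditions = {
    "integrity": lambda v: v > 0.9,          # preset integrity threshold
    "quality":   lambda v: v > 0.7,          # preset quality threshold
    "exposure":  lambda v: 64 <= v <= 192,   # preset exposure range
}
print(is_masked({"integrity": 0.6, "quality": 0.8, "exposure": 128},
                rgb_conditions))             # True: integrity fails
```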
There may be a mapping relationship between index results and modality images. For example, when the image is incomplete, the visible light image and the near infrared image are shielding modality images. When the image quality is poor, the visible light image and the near infrared image are shielding modality images. When the image exposure is abnormal, the visible light image and the near infrared image are shielding modality images. When the occupancy ratio of the point cloud data is too low, the depth image is a shielding modality image. When the depth reference position data is missing, the depth image is a shielding modality image. When the thermal image temperature difference is abnormal, the thermal imaging image is a shielding modality image.
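Such a mapping might be kept as a simple lookup table; the key names below are illustrative:

```python
# Which modalities become shielding modality images for each abnormal
# index result (illustrative key names).
INDEX_TO_MODALITIES = {
    "image_incomplete":     ["rgb", "nir"],
    "image_quality_poor":   ["rgb", "nir"],
    "exposure_abnormal":    ["rgb", "nir"],
    "occupancy_low":        ["depth"],
    "ref_position_missing": ["depth"],
    "temp_diff_abnormal":   ["thermal"],
}
```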
In some embodiments, the processor 620 may not compute index results for the plurality of modality images at all, but instead determine the shielding modality image based on factors such as whether a modality image carries an anomaly marker and/or the time at which living body identification is performed. For example, if the infrared emitting device of a near infrared camera has weakened after long service so that it can no longer capture qualified near infrared images, and replacement cost is a concern, a worker may place an anomaly marker on that camera in advance so that the near infrared images it captures carry the anomaly marker. The processor 620 then determines that a near infrared image is a shielding modality image whenever the anomaly marker is detected. As another example, if the processor 620 detects that living body identification is being performed during a night period, the probability of abnormality in the visible light image captured by the visible light camera is high, so the visible light image may be automatically classified as a shielding modality image.
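A sketch of this rule-based determination, assuming a night window of 22:00-06:00 (the window itself is a placeholder, not specified here):

```python
from datetime import datetime

def rule_based_mask(modality, has_anomaly_marker, now=None):
    """Mask a modality image without computing index results: an anomaly
    marker always masks it, and a visible-light ("rgb") image captured
    during a night period is masked as well."""
    if has_anomaly_marker:               # e.g. a flagged NIR camera
        return True
    now = now or datetime.now()
    night = now.hour >= 22 or now.hour < 6
    return modality == "rgb" and night   # visible light unreliable at night

print(rule_based_mask("nir", has_anomaly_marker=True))            # True
print(rule_based_mask("rgb", False, datetime(2023, 4, 20, 23)))   # True
```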
It should be noted that determining the shielding modality images with abnormality from the plurality of modality images can yield three situations. First, no shielding modality image exists among the plurality of modality images, that is, all of them are non-shielding modality images. Second, both shielding and non-shielding modality images exist among the plurality of modality images. Third, all of the plurality of modality images are shielding modality images.
In the living body identification method and system of the present specification, the structure of each acquired modality image is analyzed to obtain the index result of its corresponding evaluation index, and the shielding modality images are then determined. That is, the requirements on the image in each modality are taken into account, and shielding modality images are distinguished from non-shielding modality images, so that the processor 620 can determine which of the plurality of modality images are suitable for performing living body identification on the current target object 100, improving the accuracy of living body identification.
As shown in fig. 3, the method P100 may further include:
S160: determine a living body identification result of the target object based on the non-shielding modality images.
Here, the non-shielding modality images include the modality images among the plurality of modality images other than the shielding modality images.
In some embodiments, if both shielding and non-shielding modality images exist among the plurality of modality images, the processor 620 may perform weighted fusion on the non-shielding modality images and the shielding modality images and perform living body recognition on the fused modality image obtained by the weighted fusion to obtain a living body recognition result, which may be living body or attack. The weight of a non-shielding modality image is larger than that of a shielding modality image, and the two weights sum to 1. In some embodiments, the processor 620 may set a first preset weight for the non-shielding modality image and a second preset weight for the shielding modality image, fuse the non-shielding modality image given the first preset weight with the shielding modality image given the second preset weight to obtain a fused modality image, and obtain the living body recognition result from the fused modality image. Since the shielding modality image has an abnormality, its weight is lower, such as 0.2, 0.1, 0.05, or 0.01; the non-shielding modality image has no abnormality, so its weight is higher, such as 0.8, 0.9, 0.95, or 0.99.
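With preset weights, the fusion reduces to a convex combination; a minimal sketch, assuming the images (or their feature maps) are tensors of the same shape, with 0.9/0.1 being one of the example weight pairs mentioned above:

```python
import torch

def fuse_with_preset_weights(unmasked_feat, masked_feat, w_unmasked=0.9):
    """Weighted fusion with preset weights summing to 1; the
    non-shielding modality gets the larger weight."""
    w_masked = 1.0 - w_unmasked
    return w_unmasked * unmasked_feat + w_masked * masked_feat

fused = fuse_with_preset_weights(torch.randn(1, 32), torch.randn(1, 32))
```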
In some embodiments, the processor 620 may input the non-shielding modality images and the shielding modality images into a trained structural consistency model for weighted fusion and perform living body identification on the fused modality image through the trained structural consistency model. During training, the structural consistency model takes driving the first fusion weight of the shielding modality image toward 0 and the second fusion weight of the non-shielding modality image toward 1 as its training target. Specifically, the structural consistency model may include a basic feature extraction module, a shielding modality perception module, and a living body identification module. The processor 620 may input the non-shielding modality image and the shielding modality image into the basic feature extraction module for feature extraction, output a non-shielding feature map of the non-shielding modality image and a shielding feature map of the shielding modality image, then input the non-shielding feature map and the shielding feature map into the shielding modality perception module and output the first fusion weight and the second fusion weight. The processor 620 may perform weighted fusion on the shielding feature map and the non-shielding feature map based on the first fusion weight and the second fusion weight to obtain a fusion feature map; for example, the first fusion weight is multiplied with the shielding feature map, the second fusion weight is multiplied with the non-shielding feature map, and the two products are added. Further, the processor 620 may input the fusion feature map into the living body identification module for living body identification. The living body identification module may output an attack probability P: if P is greater than a threshold T, an identification result of attack is output; if P is smaller than the threshold T, an identification result of living body is output; if P equals the threshold T, either result may be output, or the processor 620 may instruct the client 200 to re-acquire the multi-modal images of the target object 100 and use the re-acquired images for living body identification.
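The following PyTorch sketch mirrors this pipeline under stated assumptions: one shielding and one non-shielding modality image, a shared extractor, a softmax so the two fusion weights sum to 1, and a sigmoid head producing the attack probability P. The layer sizes and the threshold value T are illustrative:

```python
import torch
import torch.nn as nn

class StructuralConsistencyModel(nn.Module):
    """Sketch of the structural consistency model: feature extraction,
    shielding modality perception (fusion weights), weighted fusion,
    and living body identification."""

    def __init__(self, feat_dim=32):
        super().__init__()
        self.extractor = nn.Sequential(        # basic feature extraction module
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mask_perception = nn.Sequential(  # shielding modality perception module
            nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1),
        )
        self.liveness_head = nn.Linear(feat_dim, 1)  # living body identification module

    def forward(self, unmasked_img, masked_img):
        f_un = self.extractor(unmasked_img)    # non-shielding feature map
        f_ma = self.extractor(masked_img)      # shielding feature map
        w = self.mask_perception(torch.cat([f_un, f_ma], dim=-1))
        w1, w2 = w[:, 0:1], w[:, 1:2]          # first/second fusion weights
        fused = w1 * f_ma + w2 * f_un          # weighted fusion
        attack_prob = torch.sigmoid(self.liveness_head(fused))
        return attack_prob, w1, w2

model = StructuralConsistencyModel()
p, w1, w2 = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
T = 0.5                                        # decision threshold (placeholder)
print("attack" if p.item() > T else "living body")
```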
The basic feature extraction module of the structural consistency model and the base feature encoder of the structural perception model may be the same or different. In some embodiments, the structural consistency model may omit the basic feature extraction module and instead obtain the feature map of each modality from the output of the base feature encoder of the structural perception model, thereby reducing the cost of living body identification.
The structural consistency model may likewise be trained with multi-modal training images as training samples, where one training sample may include one training image in each modality. The training samples may be manually annotated: each training sample may be annotated with a true value of living body identification (living body or attack), and the shielding modality training images may be annotated with shielding marks to distinguish them from non-shielding modality training images. A shielding mark may indicate the type of the shielded modality, such as RGB, NIR, depth, or thermal imaging. The processor 620 may input each training sample into the structural consistency model for training and output a predicted value of living body identification for each training sample. The shielding mark and the feature map of the shielding modality training image may be input into the shielding modality perception module for training.
The loss function used to train the structural consistency model may include a living body identification loss and a structural consistency loss. During training, the living body identification loss constrains the difference between the predicted value of living body identification for each training sample and the corresponding true value to be smaller than a second preset difference threshold, and the structural consistency loss constrains the first fusion weight of the shielding modality training image to approach 0 and the second fusion weight of the non-shielding modality training image to approach 1. By training the structural consistency model with the living body identification loss and the structural consistency loss, its predictions gradually approach the true values, improving the model's accuracy for living body identification.
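A hedged sketch of this combined loss, assuming binary cross-entropy for the living body identification term and squared-error penalties pulling the fusion weights toward 0 and 1; the exact penalty forms and the balance factor `lam` are assumptions:

```python
import torch
import torch.nn as nn

def consistency_training_loss(attack_prob, label, w1, w2, lam=1.0):
    """Living body identification loss plus structural consistency loss:
    w1 (shielding modality weight) is pushed toward 0, w2 (non-shielding
    modality weight) toward 1."""
    liveness = nn.functional.binary_cross_entropy(attack_prob, label)
    consistency = w1.pow(2).mean() + (w2 - 1.0).pow(2).mean()
    return liveness + lam * consistency

# toy usage: one attack sample (label 1.0) with predicted probability 0.7
loss = consistency_training_loss(torch.tensor([[0.7]]), torch.tensor([[1.0]]),
                                 w1=torch.tensor([[0.2]]), w2=torch.tensor([[0.8]]))
```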
Although a shielding modality image has an abnormality, it may still carry useful information, so the processor 620 may perform living body identification by combining the shielding and non-shielding modality images, allowing living body identification to use more information and thereby improving performance. However, to prevent the shielding modality images from unduly affecting overall performance, they are given smaller weights. In this way, living body identification uses the information of the shielding modality images while preventing their abnormality from affecting the global result.
In some embodiments, if both shielding and non-shielding modality images exist among the plurality of modality images, the processor 620 may forgo weighted fusion and use only the non-shielding modality images for living body identification, thereby reducing the cost of living body identification. In this case, the structural consistency model may include the basic feature extraction module, which extracts features from the non-shielding modality images, and the living body identification module, which performs living body identification on them.
It should be noted that if the plurality of modality images are all non-shielding modality images or all shielding modality images, the processor 620 may use the non-shielding modality images or the shielding modality images, respectively, for living body identification. When living body identification is performed using only shielding modality images, its accuracy may be low; in that case the processor 620 may instruct the client 200 to re-acquire the multi-modal images of the target object 100 and perform living body identification on the re-acquired images. Alternatively, the processor 620 may, based on the current living body identification result, further perform a second stage of living body identification with a model of higher recognition accuracy, which is not limited in the embodiments of the present specification.
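The three situations might be dispatched as follows; `processor` and its methods (`fuse_and_identify`, `identify`, `reacquire_or_second_stage`) are hypothetical names standing in for the pipeline described above:

```python
def identify(modality_images, masked_flags, processor):
    """Dispatch the three cases: mixed, all unmasked, all masked."""
    unmasked = [m for m, f in zip(modality_images, masked_flags) if not f]
    masked = [m for m, f in zip(modality_images, masked_flags) if f]
    if unmasked and masked:              # mixed: weighted fusion path
        return processor.fuse_and_identify(unmasked, masked)
    if unmasked:                         # no shielding modality image at all
        return processor.identify(unmasked)
    # every modality is masked: identify, then fall back to
    # re-acquisition or a second-stage, higher-accuracy model
    result = processor.identify(masked)
    return processor.reacquire_or_second_stage(result)
```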
When performing living body identification, different target objects 100 may correspond to different combinations of shielding and/or non-shielding modality images. For example, in the combination corresponding to user A, the RGB image and the NIR image are shielding modality images while the depth image and the thermal imaging image are non-shielding modality images; in the combination corresponding to user B, the depth image and the thermal imaging image are shielding modality images while the RGB image and the NIR image are non-shielding modality images. For these different combinations, the embodiments of the present specification can perform multi-modal living body identification with a single unified structural consistency model, so that one model is compatible with multiple shielding-modality combinations; there is no need to train a different model for each combination, which reduces the cost of living body identification.
In summary, the method and system for living body identification provided in the present specification take into account the requirements on the image in each modality and distinguish shielding modality images from non-shielding modality images among the plurality of modality images, so that the processor 620 can determine which modality images are unsuitable for performing living body identification on the current target object 100 and mask the unsuitable ones (the shielding modality images that do not meet the living body identification requirement). Living body identification is then performed based on the non-shielding modality images that do meet the requirement, improving its accuracy. When a modality image does not reach the standard, information in that modality is lost, and introducing the substandard image into living body identification without treatment would reduce overall performance. The method and system therefore perceive abnormality in the modality images and reduce the participation proportion (weight) of the substandard shielding modality images, improving living body identification performance. By using a unified model compatible with multiple shielding-modality combinations, they also reduce the cost of living body identification.
Another aspect of the present specification provides a non-transitory storage medium storing at least one set of executable instructions for performing living body identification. When executed by a processor, the executable instructions direct the processor to perform the steps of the method P100 of living body identification described in the present specification.

In some possible implementations, aspects of the present specification can also be implemented in the form of a program product including program code. When the program product runs on the system 001 for living body identification, the program code causes the system 001 to perform the steps of the method P100 for living body identification described in the present specification. The program product for implementing the above method may employ a portable compact disc read-only memory (CD-ROM) including program code and may be run on the system 001 for living body identification. However, the program product of the present specification is not limited thereto: a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.

Program code for carrying out operations of the present specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the system 001 for living body identification, partly on the system 001, as a stand-alone software package, partly on the system 001 and partly on a remote computing device, or entirely on the remote computing device.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In view of the foregoing, it will be evident to a person skilled in the art that the foregoing detailed disclosure is presented by way of example only and is not limiting. Although not explicitly stated herein, those skilled in the art will appreciate that the present specification is intended to encompass various adaptations, improvements, and modifications of the embodiments. Such alterations, improvements, and modifications are suggested by this specification and are within the spirit and scope of its exemplary embodiments.
Furthermore, certain terms in the present specification have been used to describe embodiments of the present specification. For example, "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present specification. Thus, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the present specification.
It should be appreciated that in the foregoing description of embodiments of the present specification, various features are sometimes combined in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one feature. This does not mean, however, that the combination of these features is necessary; a person skilled in the art, upon reading this specification, may well extract some of those features as separate embodiments. That is, embodiments in this specification may also be understood as an integration of multiple secondary embodiments, where each secondary embodiment is defined by less than all the features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, and documents, referred to herein is hereby incorporated by reference for all purposes, except for any prosecution history associated therewith, any material inconsistent with or conflicting with this document, and any material that may have a limiting effect on the broadest scope of the claims now or later associated with this document. In the event of any inconsistency or conflict between the description, definition, and/or use of a term in this document and in the incorporated material, the usage in this document prevails.
Finally, it is to be understood that the embodiments disclosed herein are illustrative of the principles of the embodiments of the present specification. Other modified embodiments are also within the scope of this specification. Accordingly, the embodiments disclosed herein are by way of example only and not limitation. Those skilled in the art can adopt alternative configurations to implement the application in this specification based on the embodiments herein. Therefore, the embodiments of the present specification are not limited to those precisely described in the application.

Claims (12)

1. A method of living body identification, comprising:
acquiring a plurality of modality images of a target object;
determining a shielding modality image with an abnormality from the plurality of modality images; and
determining a living body recognition result of the target object based on non-shielding modality images, the non-shielding modality images including the modality images of the plurality of modality images other than the shielding modality image.
2. The method of claim 1, wherein the determining a shielding modality image with an abnormality from the plurality of modality images comprises:
determining an index result of an evaluation index corresponding to each of the plurality of modality images; and
taking a modality image whose index result does not satisfy a preset condition as the shielding modality image, wherein at least one index result of the shielding modality image does not satisfy the preset condition.
3. The method of claim 2, wherein the plurality of modality images includes at least two of a visible light image, a near infrared image, a depth image, and a thermal imaging image.
4. The method of claim 3, wherein the evaluation index corresponding to the visible light image comprises at least one of image integrity, image quality, and image exposure,
the evaluation index corresponding to the near infrared image comprises at least one of image integrity, image quality, and image exposure,
the evaluation index corresponding to the depth image comprises at least one of an occupancy ratio of point cloud data and whether depth reference position data is missing, and
the evaluation index corresponding to the thermal imaging image comprises a thermal image temperature difference.
5. The method of claim 4, wherein the image integrity comprises an integrity of a target part of the target object contained in the visible light image or the near infrared image,
the image quality comprises at least one of image sharpness and image angle, wherein the image sharpness is determined based on gray values of pixels of the visible light image or the near infrared image,
the image exposure comprises a brightness of a first object region corresponding to the target object in the visible light image or the near infrared image,
the occupancy ratio of the point cloud data comprises a ratio of the number of point cloud data in the depth image to a preset number of point cloud data,
whether the depth reference position data is missing indicates whether the depth image contains data at the depth reference position, and
the thermal image temperature difference is a difference between a temperature of a second object region in the thermal imaging image and a temperature of a background region, the second object region is a region corresponding to the target object in the thermal imaging image, and the background region is a region except the second object region in the thermal imaging image.
6. The method of claim 2, wherein the determining the index result of the evaluation index corresponding to each of the plurality of modality images comprises:
inputting the plurality of modality images into a structural perception model, and outputting an index result of an evaluation index corresponding to each modality image.
7. The method of claim 6, wherein the inputting the plurality of modality images into a structural perception model and outputting an index result of the evaluation index corresponding to each modality image comprises:
inputting the plurality of modality images into a base feature encoder of the structural perception model, and outputting a plurality of modality feature maps corresponding to the plurality of modality images; and
inputting the plurality of modality feature maps into a multi-component perception module of the structural perception model, and outputting the index result of the evaluation index corresponding to each modality image.
8. The method of claim 2, wherein the preset condition includes at least one of: the image integrity is greater than a preset integrity threshold, the image quality is greater than a preset quality threshold, the image exposure is within a preset exposure range, the occupancy ratio of the point cloud data is higher than a preset proportion, the depth reference position data is not missing, and the thermal image temperature difference is within a preset temperature difference range.
9. The method of claim 1, wherein the determining the living body recognition result of the target object based on the non-shielding modality images comprises:
performing weighted fusion on the non-shielding modality image and the shielding modality image, and performing living body recognition on the fused modality image obtained by the weighted fusion to obtain the living body recognition result, wherein the weight of the non-shielding modality image is greater than that of the shielding modality image.
10. The method of claim 9, wherein the performing weighted fusion on the non-shielding modality image and the shielding modality image and performing living body recognition on the fused modality image comprises:
inputting the non-shielding modality image and the shielding modality image into a structural consistency model for the weighted fusion, and performing living body identification on the fused modality image through the structural consistency model, wherein the structural consistency model takes the first fusion weight of the shielding modality image approaching 0 and the second fusion weight of the non-shielding modality image approaching 1 as a training target in the training process.
11. The method of claim 10, wherein the inputting the non-shielding modality image and the shielding modality image into a structural consistency model for the weighted fusion and performing living body identification on the fused modality image through the structural consistency model comprises:
inputting the non-shielding modality image and the shielding modality image into a basic feature extraction module of the structural consistency model for feature extraction, and outputting a non-shielding feature map of the non-shielding modality image and a shielding feature map of the shielding modality image;
inputting the non-shielding feature map and the shielding feature map into a shielding modality perception module of the structural consistency model, and outputting the first fusion weight and the second fusion weight;
performing the weighted fusion on the shielding feature map and the non-shielding feature map based on the first fusion weight and the second fusion weight to obtain a fusion feature map; and
inputting the fusion feature map into a living body identification module of the structural consistency model for living body identification.
12. A system of living body identification, comprising:
at least one storage medium storing at least one instruction set for performing living body identification; and
at least one processor communicatively coupled to the at least one storage medium,
wherein the at least one processor reads the at least one instruction set and implements the method of living body identification of any one of claims 1-11 when the living body identification system is running.