CN116343350A - Living body detection method and device, storage medium and electronic equipment


Info

Publication number
CN116343350A
Authority
CN
China
Prior art keywords
mode
image
model
sample
modality
Prior art date
Legal status
Pending
Application number
CN202310180698.9A
Other languages
Chinese (zh)
Inventor
曹佳炯
丁菁汀
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310180698.9A
Publication of CN116343350A

Classifications

    • G06V40/70 — Multimodal biometrics, e.g. combining information from different biometric modalities
    • G06V10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The specification discloses a living body detection method and device, a storage medium and an electronic device. The living body detection method includes: determining a recommended image mode for a target environment based on at least one type of first mode image acquired of a target object in the target environment; generating a second mode image corresponding to the recommended image mode based on the first mode image; and generating a multi-mode image combination based on the first mode image and the second mode image, so as to perform living body attack detection processing and obtain a target detection type for the target object.

Description

Living body detection method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a living body detection method, a living body detection device, a storage medium, and an electronic device.
Background
In recent years, biometric recognition technology has been widely used in daily production and life. Technologies such as face-scanning payment, face-based access control, face attendance and face-based check-in all rely on biometric recognition, and the requirements on image living body detection in such scenes are increasingly high. Image living body detection verifies whether the operation is performed by a real living user, so as to effectively resist common attack means such as photos, face swapping, masks, occlusion and screen replay, screen out fraudulent behavior, and protect user rights and interests.
Disclosure of Invention
The specification provides a living body detection method, a living body detection device, a storage medium and electronic equipment, wherein the technical scheme is as follows:
in a first aspect, the present specification provides a living body detection method, the method comprising:
collecting at least one type of first mode image of a target object in a target environment, and determining a recommended image mode aiming at the target environment based on the at least one type of first mode image;
generating a second mode image corresponding to the recommended image mode based on the at least one type of first mode image;
generating a multi-mode image combination based on the first mode image and the second mode image, and performing living body attack detection processing based on the multi-mode image combination to obtain a target detection type aiming at the target object.
In a second aspect, the present specification provides a living body detection apparatus, the apparatus comprising:
the recommendation mode module is used for collecting at least one type of first mode image of a target object in a target environment, and determining a recommendation image mode aiming at the target environment based on the at least one type of first mode image;
the image generation module is used for generating a second mode image corresponding to the recommended image mode based on the at least one type of first mode image;
The detection processing module is used for generating a multi-mode image combination based on the first mode image and the second mode image, and performing living body attack detection processing based on the multi-mode image combination to obtain a target detection type aiming at the target object.
In a third aspect, the present description provides a computer storage medium storing at least one instruction adapted to be loaded by a processor and to perform the method steps of one or more embodiments of the present description.
In a fourth aspect, the present description provides a computer program product storing at least one instruction adapted to be loaded by a processor and to perform the method steps of one or more embodiments of the present description.
In a fifth aspect, the present description provides an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of one or more embodiments of the present description.
The technical scheme provided by some embodiments of the present specification has the following beneficial effects:
in one or more embodiments of the present disclosure, an electronic device determines a recommended image mode for the target environment based on at least one type of first mode image of the target object acquired in the target environment, generates a second mode image corresponding to the new recommended image mode based on the first mode image, and generates a multi-mode image combination based on the first mode image and the second mode image. In this way, the efficiency of image living body detection in complex environments and lower-performance hardware environments is improved without relying on a high-cost multi-mode acquisition system. The multi-mode image combination has higher living-body/attack separability, assists accurate subsequent living body detection classification, achieves a better living body attack detection effect, and improves the generalization capability of image living body detection. Meanwhile, because a recommended image mode matched with the device environment is determined when the second mode image of the new mode is generated from the first mode image of the original mode, a better trade-off between time consumption and performance is achieved and detection efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present specification or the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present specification, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic view of a living body detection system provided in the present specification;
FIG. 2 is a schematic flow chart of a living body detection method provided in the present specification;
FIG. 3 is a model training schematic of an environmental modality recommendation model provided herein;
FIG. 4 is a model training schematic of a modal image generation model provided herein;
FIG. 5 is a schematic flow chart of living body attack detection based on a multi-modal living body detection model provided in the present specification;
FIG. 6 is a schematic diagram of a model training process for a multi-modal living body detection model provided herein;
fig. 7 is a schematic structural view of a living body detection apparatus provided in the present specification;
fig. 8 is a schematic structural view of an electronic device provided in the present specification;
FIG. 9 is a schematic diagram of the architecture of the operating system and user space provided herein;
FIG. 10 is an architecture diagram of the android operating system of FIG. 9;
FIG. 11 is an architecture diagram of the iOS operating system of FIG. 9.
Detailed Description
The technical solutions in the embodiments of the present specification are described below clearly and completely with reference to the accompanying drawings; the described embodiments are only some, rather than all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without inventive effort shall fall within the scope of the present disclosure.
In the description of the present specification, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not listed or inherent to such process, method, article or apparatus. The specific meaning of the terms in this specification can be understood by those of ordinary skill in the art in light of the specific circumstances. In addition, in the description of the present specification, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In the related art, in image living body detection scenes such as image living body detection and interactive identification detection, accurate image living body detection is often achieved by combining multi-mode image data, that is, by adding more modes to the camera, such as an NIR mode and a 3D mode on the basis of an RGB mode, or even a thermal imaging mode. After multiple modes are added, the performance of the whole living body detection system is obviously enhanced, and various different types of attacks can be better prevented. However, this approach depends on high-cost multi-mode acquisition equipment (the more modes, the higher the cost, even though the security capability is stronger) and cannot be applied to scenes with low cost and low equipment requirements; moreover, the more modes the multi-mode imaging system has, the higher the probability that one of the modes fails, which leads to poor mode stability and affects living body detection. Based on this, living body detection in the related art has large limitations.
the present specification is described in detail below with reference to specific examples.
Please refer to fig. 1, which is a schematic diagram of a living body detection system provided in the present specification. As shown in fig. 1, the in-vivo detection system may include at least a client cluster and a service platform 100.
The client cluster may include at least one client, as shown in fig. 1, specifically including a client 1 corresponding to a user 1, a client 2 corresponding to a user 2, …, and a client n corresponding to a user n, where n is an integer greater than 0.
Each client in the client cluster may be a communication-enabled electronic device including, but not limited to: wearable devices, handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem, etc. Electronic devices in different networks may be called different names, for example: a user equipment, an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent or user equipment, a cellular telephone, a cordless telephone, a personal digital assistant (personal digital assistant, PDA), an electronic device in a 5G network or future evolution network, and the like.
The service platform 100 may be a separate server device, such as a rack-mounted, blade, tower or cabinet server, or hardware with stronger computing capability such as a workstation or mainframe computer; it may also be a server cluster formed by a plurality of servers. The servers in the server cluster may be arranged symmetrically, where each server is functionally equivalent in the transaction link and can independently provide services to the outside, the independent provision of services being understood as requiring no assistance from another server.
In one or more embodiments of the present disclosure, the service platform 100 may establish a communication connection with at least one client in the client cluster and complete data interaction in the living body detection process based on the communication connection, such as online transaction data interaction. For example, the service platform 100 may deploy, to the client, the relevant models obtained by the living body detection method of the present disclosure, and the client may perform living body attack detection processing according to the living body detection method of the present disclosure; as another example, the service platform 100 may obtain, from the client, a first mode image of a target object in the target environment where the client is located, and the service platform 100 executes the living body detection method of the present specification to perform living body attack detection processing.
It should be noted that, the service platform 100 establishes a communication connection with at least one client in the client cluster through a network for interactive communication, where the network may be a wireless network, or may be a wired network, where the wireless network includes, but is not limited to, a cellular network, a wireless local area network, an infrared network, or a bluetooth network, and the wired network includes, but is not limited to, an ethernet network, a universal serial bus (universal serial bus, USB), or a controller area network. In one or more embodiments of the specification, techniques and/or formats including HyperText Mark-up Language (HTML), extensible markup Language (Extensible Markup Language, XML), and the like are used to represent data exchanged over a network (e.g., target compression packages). All or some of the links may also be encrypted using conventional encryption techniques such as secure socket layer (Secure Socket Layer, SSL), transport layer security (Transport Layer Security, TLS), virtual private network (Virtual Private Network, VPN), internet protocol security (Internet Protocol Security, IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The embodiment of the living body detection system provided in the present specification and the living body detection method in one or more embodiments belong to the same concept, and an execution subject corresponding to the living body detection method related in one or more embodiments in the present specification may be an electronic device, and the electronic device may be the service platform 100 described above; the execution subject corresponding to the living body detection method in one or more embodiments of the specification may also be a client, and specifically determined based on an actual application environment. The implementation process of the embodiment of the living body detection system may be described in detail in the following method embodiments, which are not described herein.
Based on the schematic view of the scenario shown in fig. 1, the living body detection method provided in one or more embodiments of the present specification is described in detail below.
Referring to fig. 2, a flow diagram of a living body detection method is provided for one or more embodiments of the present description; the method may be implemented by means of a computer program and may run on a living body detection device based on a von Neumann architecture. The computer program may be integrated in an application or may run as a stand-alone tool application. The living body detection device may be a service platform.
Specifically, the living body detection method comprises the following steps:
it will be appreciated that, in some authentication scenarios in which the real physiological characteristics of an object are verified, image living body detection requires verifying, based on the acquired target detection image, whether the operation is performed by a real living object. Image living body detection needs to effectively resist common living body attack means such as photos, face swapping, masks, occlusion and screen replay, thereby helping to screen out fraudulent behavior and protect user rights and interests;
s102: collecting at least one type of first mode image of a target object in a target environment, and determining a recommended image mode aiming at the target environment based on the at least one type of first mode image;
the first mode image can be understood as object image data corresponding to a certain mode type acquired for a target object (such as a user, an animal and the like) in an image living body detection scene;
when there are a plurality of types of first mode images, the image mode types of the respective first mode images are different from one another;
it can be appreciated that, in practical applications, the several types of first mode images acquired for the target object may have different image mode types, and an image mode type may be a combination of one or more of image mode types such as a video mode type carrying object information, a color picture (RGB) mode type, a short video mode type, an animation mode type, a depth image (Depth) mode type, an infrared image (IR) mode type, a near infrared (NIR) mode type, and the like.
Illustratively, the at least one type of first modality image may be a first modality image of the NIR type and a first modality image of the rgb type.
Illustratively, in an actual image application scene, one or more types of first-modality images of a target object to be identified or detected currently can be acquired through a camera such as an RGB camera, a monocular camera, an infrared camera, and the like based on a corresponding living body detection task.
The recommended image mode can be understood as a better new mode estimated for the environment in which the device is located, determined from the at least one type of first mode image; that is, intelligent mode estimation is performed for the environment based on the acquired first mode images, so that image data of the new mode can be generated from the first mode images of the existing original modes. For example, a second mode image of the Depth mode or a second mode image of the thermal sensing mode may be generated from the first mode images of the RGB mode and the NIR mode.
Illustratively, the recommended image modality is different from the image modality corresponding to the first modality image.
Illustratively, a recommended image modality that matches the environment in which the device is located may be estimated based on image features of at least one type of first modality image.
Illustratively, the recommended image mode matched to the environment in which the device is located may also be estimated based on the image parameters (such as resolution, image quality and brightness) of the at least one type of first mode image; for example, a mode mapping relation between a plurality of reference image parameters and corresponding reference image modes is established, and the matched recommended image mode is looked up from this mapping relation based on the image parameters, as sketched below.
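As a rough illustration of such a parameter-based lookup (not the model-based approach described later), the following minimal Python sketch assumes hypothetical brightness/resolution thresholds, mode names and measurement fields that do not appear in the specification:

```python
# Minimal sketch of a parameter-based mode lookup; thresholds, mode names and
# the parameter fields are illustrative assumptions, not part of the specification.
from dataclasses import dataclass

@dataclass
class ImageParams:
    brightness: float   # mean pixel intensity in [0, 255]
    resolution: int     # short-side resolution in pixels

# Hypothetical mapping between reference image parameters and reference image modes.
REFERENCE_MODE_MAP = [
    # (predicate over measured parameters, recommended new mode)
    (lambda p: p.brightness < 40.0, "thermal"),  # very dark scene -> thermal map helps
    (lambda p: p.resolution < 480,  "depth"),    # low resolution -> depth map helps
    (lambda p: True,                "depth"),    # default recommendation
]

def recommend_mode(params: ImageParams) -> str:
    """Return the first reference mode whose condition matches the measured parameters."""
    for predicate, mode in REFERENCE_MODE_MAP:
        if predicate(params):
            return mode
    return "depth"

# Example: a dim frame would be mapped to the "thermal" mode under these assumed thresholds.
print(recommend_mode(ImageParams(brightness=35.0, resolution=480)))
```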
S104: generating a second mode image corresponding to the recommended image mode based on the at least one type of first mode image;
it can be understood that determining the recommended image mode saves the time that would otherwise be spent generating multiple uncertain new mode images from the original modes: because a new mode is generated from the original modes of the first mode images, the more new mode types of second mode images need to be generated, the more time is spent. In order to achieve a better balance between time consumption and performance, the recommended image mode suited to the device environment is predicted from the several first mode images, and the second mode image corresponding to the specified recommended image mode can then be generated directly from the several types of first mode images through intelligent prediction.
In one or more embodiments of the present specification, a mode image generation model for new mode generation is trained based on a machine learning model; the mode image generation model takes the first mode image as input, or takes the first mode image and the recommended image mode as input, and outputs the second mode image corresponding to the recommended image mode.
Optionally, a plurality of mode image generation models can be trained in advance, one for each new mode, with each mode image generation model dedicated to generating the mode image of its designated new mode; in this case, the mode image generation model corresponding to the recommended image mode is obtained, and the first mode image is taken as input to output the second mode image corresponding to the recommended image mode;
optionally, only one mode image generation model can be trained in advance for all new modes; this mode image generation model takes the first mode image and the recommended image mode as input and outputs the second mode image corresponding to the recommended image mode. A sketch of both options is given below.
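A minimal sketch of the two options above, assuming the generation models are callables that map first mode images to a new mode image; all names and stub bodies are illustrative assumptions:

```python
from typing import Any, Callable, Dict, List

# Stand-in type for an image; in practice this would be a tensor or numpy array.
Image = Any

def depth_generator(first_images: List[Image]) -> Image:      # assumed per-mode model
    ...  # would reconstruct a Depth-mode image from the RGB/NIR inputs

def thermal_generator(first_images: List[Image]) -> Image:    # assumed per-mode model
    ...

def shared_generator(first_images: List[Image], target_mode: str) -> Image:
    ...  # single model that takes the recommended mode as an extra input

# Option 1: one dedicated mode image generation model per new mode.
generators_per_mode: Dict[str, Callable[[List[Image]], Image]] = {
    "depth": depth_generator,
    "thermal": thermal_generator,
}

def generate_second_image(first_images: List[Image], recommended_mode: str) -> Image:
    return generators_per_mode[recommended_mode](first_images)

# Option 2: a single shared model conditioned on the recommended mode.
def generate_second_image_shared(first_images: List[Image], recommended_mode: str) -> Image:
    return shared_generator(first_images, target_mode=recommended_mode)
```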
S106: generating a multi-mode image combination based on the first mode image and the second mode image, and performing living body attack detection processing based on the multi-mode image combination to obtain a target detection type aiming at the target object.
Wherein the target detection type is one of a living body type and an attack type.
It can be understood that the multi-mode image combination is formed by combining the first mode images of one or more original modes acquired by the image acquisition device with the second mode image of the new mode. The multi-mode image combination can make full use of the imaging characteristics of each mode, both original and new, and exploit the living-body/attack separability brought by the several different modes; indexes such as the fine granularity and image quality of the mode image features are obviously improved, which is beneficial to the living body/attack classification and identification of image living body detection.
In a possible implementation manner, a multi-mode living body detection model can be constructed and trained in advance based on a machine learning model, and living body attack detection processing is performed on the multi-mode image combination through the multi-mode living body detection model to obtain the target detection type for the target object;
optionally, the multi-mode living body detection model can be a model that is compatible with any multi-mode combination as input; that is, in the model training stage, the model training samples can be multi-mode combination data of any combination, and after deployment in practical application, the influence of the different multi-mode data types contained in the multi-mode image combination on the model's processing capability can be ignored;
alternatively, the multi-mode living body detection model may be a model that supports only a specified multi-mode combination as input; multi-mode living body detection models supporting specified multi-mode combination types may be trained for different multi-mode combinations, for example one multi-mode living body detection model per multi-mode combination type. After deployment in practical application, each model only supports its corresponding multi-mode combination; based on this, the multi-mode living body detection model corresponding to the multi-mode image combination needs to be acquired first, and living body attack detection processing is then performed with that model.
It should be noted that the machine learning models according to one or more embodiments of the present disclosure include, but are not limited to, a combination of one or more of a convolutional neural network (Convolutional Neural Network, CNN) model, a deep neural network (Deep Neural Network, DNN) model, a recurrent neural network (Recurrent Neural Networks, RNN) model, an embedding model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, a logistic regression (Logistic Regression, LR) model, and the like.
In one or more embodiments of the present disclosure, an electronic device determines a recommended image mode for the target environment based on at least one type of first mode image of the target object acquired in the target environment, generates a second mode image corresponding to the new recommended image mode based on the first mode image, and generates a multi-mode image combination based on the first mode image and the second mode image to perform living body attack detection processing and obtain the target detection type for the target object. The multi-mode image combination has higher living-body/attack separability and makes full use of the image characteristics of both the original image modes and the recommended image mode, so that image living body detection has a good feature characterization effect, accurate subsequent living body detection classification is assisted, a better living body attack detection effect is achieved, and the generalization capability of image living body detection is improved. Meanwhile, because a recommended image mode matched with the device environment is determined when the second mode image of the new mode is generated from the first mode image of the original mode, a better trade-off between time consumption and performance is achieved and detection efficiency is improved.
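A compact end-to-end sketch of the flow just summarized (collect first mode images, recommend a mode, generate the second mode image, combine, detect); every function name, argument and the threshold value here is a placeholder assumption rather than the patented implementation:

```python
def liveness_detect(first_mode_images: dict, recommender, generator, detector,
                    threshold: float = 0.5) -> str:
    """Hypothetical pipeline following steps S102-S106; all callables are assumed models."""
    # S102: predict the recommended image mode for the current environment.
    recommended_mode = recommender(first_mode_images)           # e.g. "depth"
    # S104: generate the second mode image of the recommended mode from the first mode images.
    second_image = generator(first_mode_images, recommended_mode)
    # S106: combine original and generated modes and run living body attack detection.
    multi_mode_combination = {**first_mode_images, recommended_mode: second_image}
    attack_probability = detector(multi_mode_combination)       # scalar in [0, 1]
    return "attack" if attack_probability > threshold else "living"
```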
In a possible implementation, the electronic device may determine the recommended image mode for the target environment based on the at least one type of first mode image as follows:
and carrying out environmental mode prediction processing by adopting an environmental mode recommendation model based on the at least one type of first mode image to obtain a recommended image mode aiming at the target environment.
Schematically, as shown in fig. 3, fig. 3 is a model training schematic diagram of an environmental modality recommendation model, specifically:
s2002: creating an initial environmental modality recommendation model;
in one or more embodiments of the present disclosure, an initial environmental modality recommendation model may be created based on a machine learning model in response to a living body detection task to perform model training to obtain an environmental modality recommendation model, and the environmental modality recommendation model is used to perform intelligent modality prediction of an environment in which a device is located;
illustratively, since a new mode is generated from the original modes of the first mode images, a corresponding amount of time is consumed, and the more new mode types are generated, the more time is required. In order to achieve a better time-performance trade-off, an initial environmental mode recommendation model is trained at this stage to obtain a trained environmental mode recommendation model, which judges which new mode better suits the device environment, thereby reducing the time spent generating modes;
S2004: acquiring a plurality of groups of first sample mode images, wherein each group of first sample mode images consists of different types of sample mode images;
the first sample modality image may generally be a sample modality image acquired for a sample object in a sample environment based on the respective image acquisition part. For example, a first sample modality image of RGB type, a first sample modality image of NIR type, etc. may be acquired based on the raw acquisition system.
S2006: and inputting the first sample mode image into an initial environmental mode recommended model for training until the initial environmental mode recommended model finishes training, and obtaining a trained environmental mode recommended model.
The initial environmental mode recommendation model can be created based on a machine learning model, model training is carried out by acquiring a plurality of groups of first sample mode images, and after the initial environmental mode recommendation model meets the model finishing training conditions, the trained environmental mode recommendation model can be obtained.
In one or more embodiments herein, the model ending training condition may include, for example, a loss function having a value less than or equal to a preset loss function threshold, a number of iterations reaching a preset number of times threshold, and so on. The specific model end training conditions may be determined based on actual conditions and are not specifically limited herein.
Illustratively, the inputting the first sample mode image into the initial environmental mode recommendation model for training may be:
a2: inputting the first sample mode image into an initial environment mode recommendation model to respectively determine single mode image characteristics and single mode recommendation classification results through at least one single mode characteristic encoder of the initial environment mode recommendation model, and determining fusion mode characteristics and fusion mode recommendation classification results corresponding to all the single mode image characteristics through a fusion characteristic encoder of the initial environment mode recommendation model;
the fusion mode recommendation classification result can be understood as the recommended image mode matched to the device environment characterized by the fusion mode features;
the single-mode recommendation classification result can be understood as the recommended image mode matched to the device environment characterized by the single-mode image features;
illustratively, the internal model structure of the initial environmental mode recommendation model may be constructed based on machine learning models/networks; for example, the initial environmental mode recommendation model may include at least two parts, the first part being the single-mode feature encoders and the second part being the fusion feature encoder. It should be noted that there may be a plurality of single-mode feature encoders, different image modes correspond to different single-mode feature encoders, and one image mode may correspond to a plurality of single-mode feature encoders.
For example, taking an image mode corresponding to the first sample mode image as an RGB type image mode and an NIR type image mode as an example, the first portion may be composed of an RGB feature encoder and an NIR feature encoder.
Illustratively, the input of the RGB feature encoder is an RGB image in the first sample mode image, and the output of the RGB feature encoder is an RGB feature and RGB mode recommendation classification result; the input of the NIR feature encoder is an NIR image in the first sample modality image, and the output is an NIR feature and NIR modality recommended classification result; the RGB mode recommendation classification result and the NIR mode recommendation classification result are both single mode recommendation classification results. The input of the fusion feature encoder is RGB features and NIR features, and the output is fusion features and corresponding fusion mode recommendation classification results;
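A minimal PyTorch sketch of the structure just described (one feature encoder per input mode, each with a recommendation classification head, plus a fusion encoder); every layer size and the number of candidate modes are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class SingleModeEncoder(nn.Module):
    """Encodes one input mode and also predicts a per-mode recommendation classification."""
    def __init__(self, in_channels: int, feat_dim: int = 128, num_modes: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_modes)   # single-mode recommendation result

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.classifier(feat)

class EnvModeRecommender(nn.Module):
    """RGB encoder + NIR encoder + fusion encoder, as in the description above."""
    def __init__(self, feat_dim: int = 128, num_modes: int = 4):
        super().__init__()
        self.rgb_encoder = SingleModeEncoder(3, feat_dim, num_modes)
        self.nir_encoder = SingleModeEncoder(1, feat_dim, num_modes)
        self.fusion_encoder = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.fusion_classifier = nn.Linear(feat_dim, num_modes)  # fusion recommendation result

    def forward(self, rgb, nir):
        rgb_feat, rgb_pred = self.rgb_encoder(rgb)
        nir_feat, nir_pred = self.nir_encoder(nir)
        fused = self.fusion_encoder(torch.cat([rgb_feat, nir_feat], dim=1))
        return (rgb_pred, nir_pred), self.fusion_classifier(fused)
```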
a4: and calculating a first model loss based on at least one single-mode recommended classification result and the fused mode recommended classification result, and performing model parameter adjustment on the initial environmental mode recommended model based on the first model loss.
Illustratively, the calculating the first model loss based on at least one of the single-mode recommended classification result and the fused-mode recommended classification result may be:
Determining a modality type classification penalty based on at least one of the single-modality recommendation classification result and the fusion modality recommendation classification result; determining a first classification prediction consistency loss based on at least one of the single-modality recommendation classification results and the fusion modality recommendation classification results; a first model loss is derived based on the modal type classification loss and the first classification predicted compliance loss.
It will be appreciated that the first model loss consists of at least two parts, as follows:
the first part is the mode type classification loss, determined based on the single-mode recommendation classification results and the fusion mode recommendation classification result. If there are i single-mode recommendation classification results (i being a positive integer) and j fusion mode recommendation classification results, there are (i+j) estimated classification losses, and the mode type classification loss is the sum of these (i+j) estimated classification losses. Specifically, each estimated classification loss is calculated with a classification loss function: the mode recommendation type label annotated at the sample data stage, that is, the theoretical mode recommendation type of the sample data, is obtained; the single-mode recommendation type classification losses (including the original-mode and new-mode recommendation type classification losses) are obtained by taking each single-mode recommendation classification result and the mode recommendation type label as inputs of the classification loss function; the fusion-mode recommendation type classification loss is obtained by taking the fusion mode recommendation classification result and the mode recommendation type label as inputs of the classification loss function; and together the single-mode and fusion-mode recommendation type classification losses constitute the mode type classification loss Loss_cls.
The classification loss function may be, for example, a cross entropy loss function (Cross Entropy Loss), a hinge loss function (Hinge Loss), or the like.
The second part is the prediction consistency loss. The prediction consistency loss is used as a supervisory signal that constrains the original mode type classification result corresponding to the first mode image, the new mode type classification result corresponding to the second mode image and the fusion mode type classification result to be as consistent as possible, and constrains the type classification results of the original mode features corresponding to the first mode image and of the fusion features to be as consistent as possible; the first classification prediction consistency loss is calculated accordingly.
Illustratively, the first classification prediction consistency loss may be calculated with the following function:
Loss_consistency = Σ_i ||P_i − P_t||²
where Loss_consistency is the first classification prediction consistency loss, P_i is the single-mode recommendation classification result of the i-th single mode (i being a positive integer), and P_t is the fusion mode recommendation classification result;
illustratively, the prediction consistency loss serves as a supervisory signal so that the loss value approaches the target value (e.g., 0) as closely as possible in each round of model training.
Illustratively, the first model loss may be characterized by the following function:
Loss_total = Loss_cls + Loss_consistency
where Loss_total is the first model loss, Loss_cls is the mode type classification loss, and Loss_consistency is the first classification prediction consistency loss.
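Under the same assumptions as the model sketch above, the first model loss could be computed roughly as follows, using cross entropy for the classification terms and squared differences of softmax outputs for the consistency term (both choices are assumptions consistent with, but not mandated by, the description):

```python
import torch
import torch.nn.functional as F

def first_model_loss(single_mode_logits, fusion_logits, mode_label):
    """single_mode_logits: list of per-mode logits; fusion_logits: fused logits;
    mode_label: annotated mode recommendation type label (tensor of class indices)."""
    # Mode type classification loss: sum of the (i + j) estimated classification losses.
    loss_cls = F.cross_entropy(fusion_logits, mode_label)
    for logits in single_mode_logits:
        loss_cls = loss_cls + F.cross_entropy(logits, mode_label)

    # Prediction consistency loss: Loss_consistency = sum_i ||P_i - P_t||^2.
    p_t = F.softmax(fusion_logits, dim=1)
    loss_consistency = sum(((F.softmax(logits, dim=1) - p_t) ** 2).sum(dim=1).mean()
                           for logits in single_mode_logits)

    return loss_cls + loss_consistency   # Loss_total = Loss_cls + Loss_consistency
```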
It can be understood that, in each round of training of the initial environmental mode recommendation model, the electronic device inputs the first sample mode images into the model, determines the single-mode image features and single-mode recommendation classification results through the several single-mode feature encoders, determines the fusion mode features and fusion mode recommendation classification result corresponding to all the single-mode image features through the fusion feature encoder, computes the first model loss based on the output mode recommendation classification results (at least one of the fusion mode recommendation classification result and the single-mode recommendation classification results, that is, the actual sample recommendation mode), and back-propagates the first model loss to adjust the model parameters of the initial environmental mode recommendation model until the training end condition is met, so that the trained environmental mode recommendation model is obtained.
In one or more embodiments of the present description, it is considered that, since a new mode is generated from the original modes, corresponding time is consumed, and the more new mode types are generated, the more time is required. To avoid generating every possible new mode one by one and to achieve a better time-performance trade-off, this stage trains an environmental mode recommendation model that can be used for intelligent mode estimation of the environment in which the device is located and that judges the better new mode, namely the recommended image mode, thereby reducing the time spent generating modes;
Optionally, the electronic device may generate the second mode image corresponding to the recommended image mode based on the at least one type of first mode image as follows:
b2: acquiring a mode image generation model corresponding to the recommended image mode;
b4: performing image generation processing by adopting the mode image generation model based on the at least one type of first mode image to obtain a second mode image corresponding to the recommended image mode;
schematically, a plurality of mode image generation models can be trained in advance, one for each new mode, with each mode image generation model dedicated to generating the mode image of its designated new mode; the mode image generation model corresponding to the recommended image mode is obtained, and the first mode images are taken as input so that this model outputs the second mode image corresponding to the recommended image mode. For example, suppose the first mode images are an RGB mode image and an NIR mode image and the recommended image mode corresponds to a Depth mode image generation model: the RGB mode image and the NIR mode image are taken as inputs of the Depth mode image generation model, which extracts the mode image features of the RGB mode image and the NIR mode image and performs Depth mode reconstruction to output a second mode image of the Depth mode.
Schematically, as shown in fig. 4, fig. 4 is a model training schematic diagram of a model for generating a modal image, specifically:
s3002: determining at least one reference mode type, and creating an initial mode image generation model corresponding to the reference mode type;
it can be understood that an image mode which the electronic device can acquire through its image acquisition device, for example the RGB original mode or the NIR original mode of the first mode image, has relatively low acquisition cost. The existing original modes of the image acquisition system are used to generate the new mode data of a high-cost acquisition system: a set of new image modes can be preset, that is, one or more reference mode types are determined. Images of the reference mode types are generally costly to acquire, but give the device better defensive and discriminative capability against high-precision living body attacks (such as silicone masks and high-definition screens).
Illustratively, reference mode types include, but are not limited to, a Depth map mode, a thermal sensing map mode, an rPPG (remote photoplethysmography, i.e., non-contact heart rate measurement such as reading heart rate from the face) signal mode, and the like; these image modes can be understood as new modes distinguished from the original image modes that the electronic device can directly acquire. In one or more embodiments of the present disclosure, generating image mode data of a new mode by means of the data of the original image modes is implemented through the relevant machine learning models, and the new mode data can enrich the original first mode data, so that a low-cost acquisition system can generate data of a high-cost acquisition system to assist living body attack detection.
Schematically, an initial mode image generation model is created in advance on the basis of a machine learning model for each reference mode type (new mode), and each reference mode type corresponds to one initial mode image generation model;
s3004: acquiring a plurality of groups of second sample mode images and reference mode label images corresponding to the second sample mode images, wherein each group of second sample mode images consists of different types of sample mode images, and the reference mode types corresponding to the reference mode label images are different from the mode types of the second sample mode images;
the second sample mode image is a sample mode image which is acquired by the initial mode image aiming at the sample object in the model training stage, and the second sample mode image is a mode image which is easy to acquire under the actual use scene of the electronic equipment.
Furthermore, in order to achieve a better model training effect, before model training, the mode data of various reference mode types for the same sample object corresponding to the second sample mode images are acquired by means of a high-cost acquisition system and serve as the reference mode label images corresponding to the second sample mode images, such as a Depth image, a thermal sensing image and an rPPG image;
Illustratively, for the initial mode image generation models of the different reference mode types, the corresponding sets of second sample mode images and reference mode label images of the respective reference mode type are used for training.
S3006: inputting the second sample mode image into an initial mode image generation model to output a target sample mode image, so as to train based on the target sample mode image and the reference mode label image until the initial mode image generation model finishes training, and obtaining a trained mode image generation model.
The target sample mode image is a sample mode image of a reference mode type reconstructed by the initial mode image generation model based on the second sample mode image in a model training stage.
Illustratively, the internal model structure of the initial mode image generation model of the corresponding reference mode type may be constructed based on machine learning models/networks; for example, the initial mode image generation model may include at least two parts, the first part being the (single-mode) feature map extraction modules and the second part being a feature fusion mode generation module for single-mode feature fusion and generation of the mode map of the reference mode type. It should be noted that there may be a plurality of (single-mode) feature map extraction modules, different image modes correspond to different (single-mode) feature map extraction modules, and one image mode may correspond to a plurality of (single-mode) feature map extraction modules.
For example, taking second sample mode images of the RGB mode and the NIR mode as an example, the first part can consist of an RGB feature map extraction module and an NIR feature map extraction module, and the second part can be the feature fusion mode generation module;
illustratively, the inputting the second sample mode image into the initial mode image generating model outputs a target sample mode image to train based on the target sample mode image and the reference mode label image may be:
c2: inputting the second sample mode image into an initial mode image generation model to respectively determine single mode feature images through at least one feature image extraction module of the initial mode image generation model, and generating a target sample mode image through a feature fusion mode generation module of the initial mode image generation model based on all the single mode feature images;
illustratively, the processing object of each feature map extraction module is a second sample mode image of the corresponding image mode, and second sample mode images of different image modes can be processed by different feature map extraction modules to extract single mode feature maps of the corresponding modes;
for example, the processing object of the RGB feature map extracting module is an RGB image in the second sample mode image, and the processing result is an RGB feature map; the processing object of the NIR characteristic map extraction module is an NIR image in the second sample mode image, and the processing result is an NIR characteristic map;
Schematically, the processing objects of the feature fusion mode generation module are all the single-mode feature maps, such as the RGB feature map and the NIR feature map; the feature fusion mode generation module performs feature fusion to generate a target sample mode image of the reference mode type, where the target sample mode image belongs to a new mode different from the modes of the second sample mode images, for example an image of the Depth mode as distinct from the RGB mode and the NIR mode.
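A minimal PyTorch sketch of such a generation model (per-mode feature map extraction followed by fusion and reconstruction of the new mode); channel counts and layer choices are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class FeatureMapExtractor(nn.Module):
    """Extracts a single-mode feature map from one input mode (e.g. RGB or NIR)."""
    def __init__(self, in_channels: int, feat_channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class ModeImageGenerator(nn.Module):
    """Fuses the single-mode feature maps and reconstructs an image of the new mode."""
    def __init__(self, out_channels: int = 1, feat_channels: int = 32):
        super().__init__()
        self.rgb_extractor = FeatureMapExtractor(3, feat_channels)
        self.nir_extractor = FeatureMapExtractor(1, feat_channels)
        self.fusion_generator = nn.Sequential(
            nn.Conv2d(2 * feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, out_channels, 3, padding=1),  # e.g. a Depth map
        )

    def forward(self, rgb, nir):
        fused = torch.cat([self.rgb_extractor(rgb), self.nir_extractor(nir)], dim=1)
        return self.fusion_generator(fused)
```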
And C4: and calculating a second model loss based on the target sample mode image and the reference mode label image, and performing model parameter adjustment on the initial mode image generation model based on the second model loss.
Illustratively, the calculating a second model loss based on the target sample modality image and the reference modality label image may be:
d2: calculating overall modal reconstruction loss by adopting a first function based on the target sample modal image and the reference modal tag image;
the overall modal reconstruction loss is used as a reconstruction monitoring signal monitoring model training effect in a new modal image reconstruction process, and is calculated by a target sample modal image and a reference modal label image.
The first function satisfies the following formula:
Loss_global = ||img_pred − img_ori||²
where Loss_global is the overall mode reconstruction loss, img_pred is the target sample mode image, and img_ori is the reference mode label image.
D4: performing random image region clipping processing on the target sample mode image and the reference mode label image to obtain a first local sample mode image and a second local label mode image, and calculating the local mode reconstruction loss by adopting a second function based on the first local sample mode image and the second local label mode image;
the local mode reconstruction loss is the second part of the second model loss and is a random local mode reconstruction loss: a plurality of regions are randomly selected or clipped from the corresponding reference mode label image and target sample mode image, and the loss is calculated based on the first local sample mode image and the second local label mode image generated by this selection or clipping;
the first local sample mode image is an image obtained by performing random image region clipping processing on the target sample mode image;
the second local tag mode image is an image obtained by performing random image region clipping processing on the reference mode tag image;
The second function satisfies the following formula:
Loss_local = Σ ||randCrop(img_pred) − randCrop(img_ori)||²
where Loss_local is the local mode reconstruction loss, randCrop() denotes the random image region clipping process, randCrop(img_pred) is the first local sample mode image obtained by performing random image region clipping on the target sample mode image, and randCrop(img_ori) is the second local label mode image obtained by performing random image region clipping on the reference mode label image.
D6: obtaining a second model loss based on the global modal reconstruction loss and the local modal reconstruction loss;
the second model loss may be characterized by the following calculation:
Loss_total = Loss_global + Loss_local
where Loss_total is the second model loss;
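A rough sketch of the second model loss under these definitions, with a simple random-crop helper; the crop size, number of crops and the use of mean squared error are assumptions, and the same region is cropped from both images so that the two crops are comparable:

```python
import random
import torch
import torch.nn.functional as F

def rand_crop_pair(img_pred, img_ori, crop: int = 64):
    """Crop the same random region from the generated image and the label image."""
    _, _, h, w = img_pred.shape
    top, left = random.randint(0, h - crop), random.randint(0, w - crop)
    return (img_pred[:, :, top:top + crop, left:left + crop],
            img_ori[:, :, top:top + crop, left:left + crop])

def second_model_loss(img_pred, img_ori, num_crops: int = 4):
    # Overall mode reconstruction loss over the whole image (Loss_global).
    loss_global = F.mse_loss(img_pred, img_ori)
    # Local mode reconstruction loss over several randomly clipped regions (Loss_local).
    loss_local = 0.0
    for _ in range(num_crops):
        crop_pred, crop_ori = rand_crop_pair(img_pred, img_ori)
        loss_local = loss_local + F.mse_loss(crop_pred, crop_ori)
    return loss_global + loss_local   # Loss_total = Loss_global + Loss_local
```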
it can be understood that, for each new mode, that is, for each reference mode type, a mode image generation model corresponding to that reference mode type is trained; network training is performed on each mode image generation model based on the model structure and the second model loss to adjust the model parameters until the training end condition is satisfied, so as to obtain the trained mode image generation model.
In one or more embodiments of the present disclosure, mode image generation models corresponding to a plurality of reference mode types are trained, and the new modes, that is, images of the reference mode types, are generated using the existing original modes of the original acquisition system. For example, the original acquisition system may acquire RGB and NIR (near infrared) images, and new modes such as depth maps, thermal sensing maps and rPPG (remote photoplethysmography, i.e., non-contact heart rate measurement such as reading heart rate from the face) signals are generated based on the RGB and NIR images. The new mode data can enrich the original mode data, so that a low-cost acquisition system can produce data of a high-cost acquisition system, which is combined into multi-mode image data for subsequent living body attack detection.
Optionally, as shown in fig. 5, fig. 5 is a schematic flow chart of living body attack detection based on a multi-mode living body detection model, specifically, as shown in fig. 5, the electronic device executes the living body attack detection processing based on the multi-mode image combination to obtain a target detection type for the target object, which may be:
s4002: determining a target modal combination type corresponding to the multi-modal image combination, and acquiring a multi-modal living body detection model corresponding to the target modal combination type;
the target mode combination type is formed by combining image modes corresponding to each mode image in the multi-mode image combination.
Schematically, multi-mode living body detection models supporting specified multi-mode combination types can be trained separately for different multi-mode image combinations, for example one multi-mode living body detection model per multi-mode combination type. After deployment in practical application, each model only supports its corresponding multi-mode combination; based on this, the multi-mode living body detection model corresponding to the multi-mode image combination needs to be acquired first, and living body attack detection processing is then performed with that model.
S4004: inputting the multi-mode image combination into a multi-mode living body detection model to perform living body attack detection processing and output a target detection result;
The target detection result is usually a living body attack probability P, from which the target detection type is judged: for a preset threshold value T, if the living body attack probability P is greater than the threshold value T, the attack type is determined; otherwise the living body type is determined;
s4006: and determining a target detection type for the target object based on the target detection result.
Wherein the target detection type is one of a living body type and an attack type.
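A minimal sketch of steps S4002-S4006 (select the model matching the mode combination, run it to obtain the attack probability P, and compare P with the threshold T); the registry keys, placeholder detectors and threshold value are assumptions made for illustration:

```python
import torch

# Placeholder detectors; in practice each would be a trained multi-mode living body detection model.
depth_detector = lambda images: torch.tensor(0.1)
thermal_detector = lambda images: torch.tensor(0.1)

# Hypothetical registry: one model per mode combination type.
models_by_combination = {
    ("depth", "nir", "rgb"): depth_detector,
    ("nir", "rgb", "thermal"): thermal_detector,
}

def detect(multi_mode_images: dict, threshold: float = 0.5) -> str:
    """S4002: pick the model for the combination; S4004: run it; S4006: threshold P against T."""
    combination_type = tuple(sorted(multi_mode_images.keys()))
    model = models_by_combination[combination_type]
    with torch.no_grad():
        p_attack = float(model(multi_mode_images))      # living body attack probability P
    return "attack" if p_attack > threshold else "living"
```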
In one possible embodiment, as shown in fig. 6, fig. 6 is a schematic diagram of a model training process of a multi-modal living body detection model;
S5002: determining at least one modal combination type, and creating an initial multi-modal living body detection model corresponding to each modal combination type;
The modal combination type can be, for example, RGB+NIR+depth, RGB+NIR+thermal imaging, RGB+NIR+rPPG, and the like. For each modal combination consisting of the generated new modes and the acquired original modes, an initial multi-modal living body detection model is created based on a machine learning model, and model training is performed on the initial multi-modal living body detection model.
Illustratively, an initial multi-modal living body detection model is created in advance for each modal combination type based on the machine learning model.
S5004: acquiring a plurality of groups of third sample multi-mode image combinations and sample detection label results, wherein the third sample multi-mode image combinations consist of second sample mode images and target sample mode images;
Illustratively, in the model training stage, the plurality of groups of third sample multi-modal image combinations serve as training sample data of the initial multi-modal living body detection model;
The images in the third sample multi-modal image combination are at least associated with the input and output of the modal image generation model; the third sample multi-modal image combination can be composed of the second sample modal image (the input of the modal image generation model training stage) and the target sample modal image (the output of the modal image generation model training stage);
the sample detection label result can be understood as a detection label result of labeling the third sample multi-mode image combination in advance, namely labeling the living body attack probability or the living body attack category (living body category or attack category) of the third sample multi-mode image combination.
S5006: and inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to output a sample detection actual result, and training the initial multi-mode living body detection model based on the sample detection label result until the initial multi-mode living body detection model finishes training, so as to obtain a trained multi-mode living body detection model.
Illustratively, the inputting the third sample multi-mode image combination into the initial multi-mode living body detection model to output a sample detection actual result, and training the initial multi-mode living body detection model based on the sample detection label result may be:
e2: inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to determine an original image mode characteristic and an original mode detection result of the second sample mode image through an original mode characteristic coding module of the initial multi-mode living body detection model, determine a new mode characteristic and a new mode detection result of a target sample mode image through a new mode characteristic coding module of the initial multi-mode living body detection model, and determine a fusion mode characteristic and a fusion mode detection result which are corresponding to the original mode characteristic and the new mode characteristic together through a mode fusion module of the initial multi-mode living body detection model;
Illustratively, the internal model structure of the initial multi-modal living body detection model for each modal combination type may be constructed based on a machine learning model/network; for example, the initial multi-modal living body detection model may comprise at least three parts, as follows:
The first part is an original mode feature encoding module, whose processing object is the second sample mode image. The original mode feature encoding module performs feature extraction encoding on the second sample mode image and performs living body attack detection to obtain the original image mode feature and the original mode detection result; that is, the processing result of the original mode feature encoding module is the original image mode feature and the original mode detection result (the latter, for example, can be a living body attack probability);
The second part is a new mode feature encoding module, whose processing object is the newly generated target sample mode image. The new mode feature encoding module performs feature extraction encoding on the target sample mode image and performs living body attack detection to obtain the new image mode feature and the new mode detection result; that is, the processing result of the new mode feature encoding module is the new image mode feature and the new mode detection result (the latter, for example, can be a living body attack probability);
The third part is a mode fusion module, whose processing objects are the original image mode feature and the new image mode feature. The mode fusion module performs feature fusion on the original image mode feature and the new image mode feature to obtain a fusion mode feature, and living body attack detection can be performed on the basis of the fusion mode feature to obtain a fusion mode detection result; that is, the processing result of the mode fusion module is the fusion mode feature and the fusion mode detection result (the latter, for example, can be a living body attack probability);
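For illustration only, the three-part structure described above (original mode feature encoding module, new mode feature encoding module and mode fusion module) might be organized roughly as in the following sketch; the backbone layers, channel counts and feature dimension are assumptions for this example and are not details given in the present specification:

```python
import torch
import torch.nn as nn

class MultiModalLivenessModel(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Original mode feature encoding module (processes the original-modality image, assumed 3-channel).
        self.orig_encoder = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(32, feat_dim))
        self.orig_head = nn.Linear(feat_dim, 1)    # original mode detection result (attack logit)
        # New mode feature encoding module (processes the generated new-modality image, assumed 1-channel).
        self.new_encoder = nn.Sequential(nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
                                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                         nn.Linear(32, feat_dim))
        self.new_head = nn.Linear(feat_dim, 1)     # new mode detection result
        # Mode fusion module (fuses the two features and detects on the fused feature).
        self.fusion = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.fusion_head = nn.Linear(feat_dim, 1)  # fusion mode detection result

    def forward(self, orig_img: torch.Tensor, new_img: torch.Tensor):
        f_orig = self.orig_encoder(orig_img)
        f_new = self.new_encoder(new_img)
        f_fused = self.fusion(torch.cat([f_orig, f_new], dim=1))
        # Each head outputs a living body attack probability.
        return (torch.sigmoid(self.orig_head(f_orig)),
                torch.sigmoid(self.new_head(f_new)),
                torch.sigmoid(self.fusion_head(f_fused)))
```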
Illustratively, the model training process of the initial multi-mode living body detection model in each round outputs a sample detection actual result (such as a living body attack probability) based on the third sample multi-mode image combination;
e4: and calculating a third model loss based on the original mode detection result, the new mode detection result and the fusion mode detection result, and performing model parameter adjustment on the initial multi-mode living body detection model based on the third model loss.
In a possible implementation, the calculating a third model loss based on the original mode detection result, the new mode detection result and the fusion mode detection result includes:
F2, determining a modal living body classification loss based on the original mode detection classification loss, the new mode detection classification loss and the fusion mode detection classification loss, which are obtained by taking the original mode detection result, the new mode detection result and the fusion mode detection result, respectively, together with the sample detection label result as inputs of a classification loss function;
F4, determining second classification prediction consistency loss based on the original mode detection result, the new mode detection result and the fusion mode detection result;
F6, obtaining a third model loss based on the modal living body classification loss and the second classification prediction consistency loss.
Illustratively, the loss function of the third model loss is composed of at least two parts, as follows:
The first part is the modal living body classification loss Loss_c1 (three living body detection classification losses in total: original mode, new mode and fusion feature). These classification losses can be calculated by adopting a related classification loss function: the sample detection label result labeled in the sample data stage, i.e. a living body type result or an attack type result, is acquired as the classification label, and the original mode detection classification loss / new mode detection classification loss / fusion mode detection classification loss are obtained by taking the original mode detection result / new mode detection result / fusion mode detection result, respectively, together with the sample detection label result as inputs of the classification loss function;
The related classification loss function may be, for example, a cross entropy loss function (Cross Entropy Loss), a hinge loss function (Hinge Loss), etc.
The second part is the prediction consistency loss (the detection classification result of the original mode, the detection classification result of the new mode and the detection classification result of the fusion feature should be as consistent as possible). The prediction consistency loss is used as a supervision signal so that the original mode classification result corresponding to the first mode image, the new mode classification result corresponding to the second mode image and the fusion feature classification result are kept as consistent as possible, and the prediction consistency loss is calculated on this basis.
Illustratively, the prediction consistency loss may be calculated using the following formula:
Loss_c2 = ||Pa - Pt||^2 + ||Pb - Pt||^2
wherein Loss_c2 is the second classification prediction consistency loss, Pa is the original mode detection result, Pb is the new mode detection result, and Pt is the fusion mode detection result;
Illustratively, the second classification prediction consistency loss serves as a supervision signal, and during each round of model training its loss value is driven as close as possible to the target value (e.g., 0).
The third model loss satisfies the following equation:
Loss_c = Loss_c1 + Loss_c2
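Purely by way of illustration, the third model loss Loss_c = Loss_c1 + Loss_c2 could be computed along the following lines; binary cross entropy is assumed here as the related classification loss function, and the names and equal weighting of the terms are assumptions for this sketch:

```python
import torch
import torch.nn.functional as F

def third_model_loss(p_orig: torch.Tensor, p_new: torch.Tensor, p_fused: torch.Tensor,
                     label: torch.Tensor) -> torch.Tensor:
    """p_orig/p_new/p_fused are attack probabilities Pa/Pb/Pt; label is 1.0 for attack, 0.0 for living body."""
    # Loss_c1: modal living body classification loss (original, new and fused predictions vs. the sample label).
    loss_c1 = (F.binary_cross_entropy(p_orig, label)
               + F.binary_cross_entropy(p_new, label)
               + F.binary_cross_entropy(p_fused, label))
    # Loss_c2 = ||Pa - Pt||^2 + ||Pb - Pt||^2: second classification prediction consistency loss.
    loss_c2 = torch.mean((p_orig - p_fused) ** 2) + torch.mean((p_new - p_fused) ** 2)
    return loss_c1 + loss_c2
```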
It can be appreciated that, for each modal combination type (e.g., RGB+NIR+depth, RGB+NIR+thermal imaging, RGB+NIR+rPPG, etc.), an initial multi-modal living body detection model corresponding to that modal combination type is trained; network training is performed on each initial multi-modal living body detection model based on the model structure and the third model loss to adjust the model parameters until the training-completion condition is satisfied, thereby obtaining a trained multi-modal living body detection model.
In one or more embodiments of the present disclosure, a multi-modal image combination may be generated based on the first modality image and the second modality image, a plurality of multi-modal living body detection models may be pre-trained to adapt to different new modality combination conditions, and deployment and the corresponding living body detection decisions may be performed with the trained models. Performing living body attack detection processing on the multi-modal image combination with the corresponding multi-modal living body detection model provides higher living body/attack separability, fully utilizes the image characteristics between the original image mode and the recommended image mode, gives the image living body detection a good feature characterization effect, can assist the subsequent accurate living body detection classification of the multi-modal living body detection model, achieves a better living body attack detection effect, and improves the generalization capability of image living body detection;
the living body detection apparatus provided in the present specification will be described in detail with reference to fig. 7. The living body detection apparatus shown in fig. 7 is used to perform the method according to the embodiment shown in fig. 1 to 6 of the present specification, and for convenience of explanation, only the portion relevant to the present specification is shown, and specific technical details are not disclosed, and reference is made to the embodiment shown in fig. 1 to 6 of the present specification.
Referring to fig. 7, a schematic structural diagram of the living body detection apparatus of the present specification is shown. The living body detection apparatus 1 may be implemented as all or a part of the user terminal by software, hardware, or a combination of both. According to some embodiments, the living body detection apparatus 1 includes a recommended modality module 11, an image generation module 12, and a detection processing module 13, specifically for:
the recommended mode module 11 is used for collecting at least one type of first mode image of a target object in a target environment, and determining a recommended image mode aiming at the target environment based on the at least one type of first mode image;
an image generation module 12, configured to generate a second modality image corresponding to the recommended image modality based on the at least one type of first modality image;
a detection processing module 13 for generating a multi-modal image combination based on the first and second modal images, and performing living body attack detection processing based on the multi-modal image combination to obtain a target detection type for the target object.
Optionally, the recommendation modality module 11 is configured to:
and carrying out environmental mode prediction processing by adopting an environmental mode recommendation model based on the at least one type of first mode image to obtain a recommended image mode aiming at the target environment.
Optionally, the recommendation modality module 11 is configured to:
creating an initial environmental modality recommendation model;
acquiring a plurality of groups of first sample mode images, wherein each group of first sample mode images consists of different types of sample mode images;
and inputting the first sample mode image into an initial environmental mode recommended model for training until the initial environmental mode recommended model finishes training, and obtaining a trained environmental mode recommended model.
Optionally, the recommendation modality module 11 is configured to:
inputting the first sample mode image into an initial environment mode recommendation model to respectively determine single mode image characteristics and single mode recommendation classification results through at least one single mode characteristic encoder of the initial environment mode recommendation model, and determining fusion mode characteristics and fusion mode recommendation classification results corresponding to all the single mode image characteristics through a fusion characteristic encoder of the initial environment mode recommendation model;
and calculating a first model loss based on at least one single-mode recommended classification result and the fused mode recommended classification result, and performing model parameter adjustment on the initial environmental mode recommended model based on the first model loss.
Optionally, the recommendation modality module 11 is configured to:
determining a modality type classification penalty based on at least one of the single-modality recommendation classification result and the fusion modality recommendation classification result;
determining a first classification prediction consistency loss based on at least one of the single-modality recommendation classification results and the fusion modality recommendation classification results;
a first model loss is derived based on the modal type classification loss and the first classification prediction consistency loss.
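Illustratively, and only as a hedged sketch (multi-class cross entropy is assumed for the modality type classification loss, squared differences between class probability vectors for the consistency term, and the function name is introduced for this example), the first model loss handled by this module might be computed as follows:

```python
import torch
import torch.nn.functional as F

def first_model_loss(single_logits, fused_logits, modality_type_label):
    """single_logits: list of class logits from the single-mode encoders; fused_logits: logits from the fusion encoder."""
    # Modality type classification loss for each single-mode result and for the fused result.
    cls_loss = F.cross_entropy(fused_logits, modality_type_label)
    cls_loss = cls_loss + sum(F.cross_entropy(logits, modality_type_label) for logits in single_logits)
    # First classification prediction consistency loss: keep every single-mode prediction close to the fused one.
    fused_prob = F.softmax(fused_logits, dim=1)
    cons_loss = sum(torch.mean((F.softmax(logits, dim=1) - fused_prob) ** 2) for logits in single_logits)
    return cls_loss + cons_loss
```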
Optionally, the image generating module 12 is configured to:
acquiring a mode image generation model corresponding to the recommended image mode;
and carrying out image generation processing by adopting the mode image generation model based on the at least one type of first mode image to obtain a second mode image corresponding to the recommended image mode.
Optionally, the image generating module 12 is configured to:
determining at least one reference mode type, and creating an initial mode image generation model corresponding to the reference mode type;
acquiring a plurality of groups of second sample mode images and reference mode label images corresponding to the second sample mode images, wherein each group of second sample mode images consists of different types of sample mode images, and the reference mode types corresponding to the reference mode label images are different from the mode types of the second sample mode images;
Inputting the second sample mode image into an initial mode image generation model to output a target sample mode image, so as to train based on the target sample mode image and the reference mode label image until the initial mode image generation model finishes training, and obtaining a trained mode image generation model.
Optionally, the image generating module 12 is configured to:
inputting the second sample mode image into an initial mode image generation model to respectively determine single mode feature images through at least one feature image extraction module of the initial mode image generation model, and generating a target sample mode image through a feature fusion mode generation module of the initial mode image generation model based on all the single mode feature images;
and calculating a second model loss based on the target sample mode image and the reference mode label image, and performing model parameter adjustment on the initial mode image generation model based on the second model loss.
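Purely as a non-limiting sketch (layer choices, channel counts and the single-channel output modality are assumptions for this example), the modal image generation model handled by this module could be organized roughly as follows:

```python
import torch
import torch.nn as nn

class ModalImageGenerator(nn.Module):
    """One feature map extraction module per input modality, followed by a feature fusion generation module."""
    def __init__(self, in_channels=(3, 1)):   # e.g. an RGB image and an NIR image as inputs
        super().__init__()
        self.extractors = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, 32, 3, 1, 1), nn.ReLU(),
                          nn.Conv2d(32, 32, 3, 1, 1), nn.ReLU())
            for c in in_channels)
        # Feature fusion modality generation module: fuses the single-mode feature maps and
        # generates the target sample modality image (assumed single-channel, e.g. a depth map).
        self.generator = nn.Sequential(nn.Conv2d(32 * len(in_channels), 64, 3, 1, 1), nn.ReLU(),
                                       nn.Conv2d(64, 1, 3, 1, 1), nn.Sigmoid())

    def forward(self, images):
        # images: sequence of same-resolution tensors, one per input modality.
        feats = [extract(img) for extract, img in zip(self.extractors, images)]
        return self.generator(torch.cat(feats, dim=1))
```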
Optionally, the image generating module 12 is configured to:
calculating overall modal reconstruction loss by adopting a first function based on the target sample modal image and the reference modal tag image;
Performing random image region clipping processing on the target sample mode image and the reference mode label image to obtain a first local sample mode image and a second local label mode image, and calculating local mode reconstruction loss by adopting a second function based on the target sample mode image and the reference mode label image;
obtaining a second model loss based on the global modal reconstruction loss and the local modal reconstruction loss;
the first function satisfies the following formula:
Loss_global = ||img_pred - img_ori||^2
wherein Loss_global is the overall modal reconstruction loss, img_pred is the target sample modal image, and img_ori is the reference modal label image;
the second function satisfies the following formula:
Loss_local = ||randCrop(img_pred) - randCrop(img_ori)||^2
wherein Loss_local is the local modal reconstruction loss, randCrop() represents the random image region clipping process, randCrop(img_pred) is the first local sample modal image obtained by performing random image region clipping processing on the target sample modal image, and randCrop(img_ori) is the second local label modal image obtained by performing random image region clipping processing on the reference modal label image.
Optionally, the detection processing module 13 is configured to:
Determining a target modal combination type corresponding to the multi-modal image combination, and acquiring a multi-modal living body detection model corresponding to the target modal combination type;
inputting the multi-mode image combination into a multi-mode living body detection model to perform living body attack detection processing and output a target detection result;
and determining a target detection type for the target object based on the target detection result.
Optionally, the detection processing module 13 is configured to:
determining at least one modal combination type, and creating an initial multi-modal living body detection model corresponding to the modal image combination;
acquiring a plurality of groups of third sample multi-mode image combinations and sample detection label results, wherein the third sample multi-mode image combinations consist of second sample mode images and target sample mode images;
and inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to output a sample detection actual result, and training the initial multi-mode living body detection model based on the sample detection label result until the initial multi-mode living body detection model finishes training, so as to obtain a trained multi-mode living body detection model.
Optionally, the detection processing module 13 is configured to:
inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to determine an original image mode characteristic and an original mode detection result of the second sample mode image through an original mode characteristic coding module of the initial multi-mode living body detection model, determine a new mode characteristic and a new mode detection result of a target sample mode image through a new mode characteristic coding module of the initial multi-mode living body detection model, and determine a fusion mode characteristic and a fusion mode detection result which are corresponding to the original mode characteristic and the new mode characteristic together through a mode fusion module of the initial multi-mode living body detection model;
and calculating a third model loss based on the original mode detection result, the new mode detection result and the fusion mode detection result, and performing model parameter adjustment on the initial multi-mode living body detection model based on the third model loss.
Optionally, the detection processing module 13 is configured to:
determining a modal living body classification loss based on the primary modal detection result, the new modal detection result, and the fusion modal detection result, and the primary modal detection classification loss, the new modal detection classification loss, and the fusion modal detection classification loss of the sample detection tag result, respectively;
Determining a second classification prediction consistency loss based on the original mode detection result, the new mode detection result and the fusion mode detection result;
a third model loss is derived based on the modal living body classification loss and the second classification prediction consistency loss.
It should be noted that, when the living body detection apparatus provided in the foregoing embodiment performs the living body detection method, only the division of the foregoing functional modules is exemplified; in practical application, the foregoing functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules, so as to complete all or part of the functions described above. In addition, the living body detection device and the living body detection method provided in the foregoing embodiments belong to the same concept; the detailed implementation process is described in the method embodiment and is not repeated herein.
The foregoing description is provided for the purpose of illustration only and does not represent the advantages or disadvantages of the embodiments.
The present disclosure further provides a computer storage medium, where a plurality of instructions may be stored, the instructions being adapted to be loaded by a processor to execute the living body detection method according to the embodiment shown in fig. 1 to 6; the specific execution process may refer to the specific description of the embodiment shown in fig. 1 to 6, which is not repeated herein.
The present disclosure further provides a computer program product storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the living body detection method according to the embodiment shown in fig. 1 to 6; the specific execution process may refer to the specific description of the embodiment shown in fig. 1 to 6, which is not repeated herein.
Referring to fig. 8, a block diagram of an electronic device according to an exemplary embodiment of the present disclosure is shown. The electronic device in this specification may include one or more of the following: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 utilizes various interfaces and lines to connect various portions of the overall electronic device, performs various functions of the electronic device 100, and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120, and invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form of digital signal processing (digital signal processing, DSP), field-programmable gate array (field-programmable gate array, FPGA), programmable logic array (programmable logic array, PLA). The processor 110 may integrate one or a combination of several of a central processor (central processing unit, CPU), an image processor (graphics processing unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is responsible for rendering and drawing of display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 110 and may be implemented solely by a single communication chip.
The memory 120 may include a random access memory (random Access Memory, RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 120 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, which may be an Android (Android) system, including an Android system-based deep development system, an IOS system developed by apple corporation, including an IOS system-based deep development system, or other systems, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing various method embodiments described below, and the like. The storage data area may also store data created by the electronic device in use, such as phonebooks, audiovisual data, chat log data, and the like.
Referring to FIG. 9, the memory 120 may be divided into an operating system space in which the operating system is running and a user space in which native and third party applications are running. In order to ensure that different third party application programs can achieve better operation effects, the operating system allocates corresponding system resources for the different third party application programs. However, the requirements of different application scenarios in the same third party application program on system resources are different, for example, under the local resource loading scenario, the third party application program has higher requirement on the disk reading speed; in the animation rendering scene, the third party application program has higher requirements on the GPU performance. The operating system and the third party application program are mutually independent, and the operating system often cannot timely sense the current application scene of the third party application program, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third party application program.
In order to enable the operating system to distinguish specific application scenes of the third-party application program, data communication between the third-party application program and the operating system needs to be communicated, so that the operating system can acquire current scene information of the third-party application program at any time, and targeted system resource adaptation is performed based on the current scene.
Taking an operating system as an Android system as an example, as shown in fig. 10, the programs and data stored in the memory 120 may be divided into a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360 and an application layer 380, where the Linux kernel layer 320, the system runtime library layer 340 and the application framework layer 360 belong to an operating system space, and the application layer 380 belongs to a user space. The Linux kernel layer 320 provides the underlying drivers for various hardware of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, power management, and the like. The system runtime library layer 340 provides the main feature support for the Android system through some C/C++ libraries. For example, the SQLite library provides support for databases, the OpenGL/ES library provides support for 3D graphics, the Webkit library provides support for browser kernels, and the like. Also provided in the system runtime library layer 340 is an Android runtime library (Android Runtime), which mainly provides some core libraries that allow developers to write Android applications using the Java language. The application framework layer 360 provides various APIs that may be used in building applications, and developers can also build their own applications by using these APIs, for example activity management, window management, view management, notification management, content provider, package management, call management, resource management, location management. At least one application program is running in the application layer 380, and these application programs may be native application programs of the operating system, such as a contact program, a short message program, a clock program, a camera application, etc.; and may also be third party applications developed by third party developers, such as game-like applications, instant messaging programs, photo beautification programs, etc.
Taking an operating system as an IOS system as an example, the program and data stored in the memory 120 are shown in fig. 9, the IOS system includes: core operating system layer 420 (Core OS layer), core service layer 440 (Core Services layer), media layer 460 (Media layer), and touchable layer 480 (Cocoa Touch Layer). The core operating system layer 420 includes an operating system kernel, drivers, and underlying program frameworks that provide more hardware-like functionality for use by the program frameworks at the core services layer 440. The core services layer 440 provides system services and/or program frameworks required by the application, such as a Foundation (Foundation) framework, an account framework, an advertisement framework, a data storage framework, a network connection framework, a geographic location framework, a sports framework, and the like. The media layer 460 provides an interface for applications related to audiovisual aspects, such as a graphics-image related interface, an audio technology related interface, a video technology related interface, an audio video transmission technology wireless play (AirPlay) interface, and so forth. The touchable layer 480 provides various commonly used interface-related frameworks for application development, with the touchable layer 480 being responsible for user touch interactions on the electronic device. Such as a local notification service, a remote push service, an advertisement framework, a game tool framework, a message User Interface (UI) framework, a User Interface UIKit framework, a map framework, and so forth.
Among the frameworks illustrated in fig. 11, frameworks related to most applications include, but are not limited to: the infrastructure in core services layer 440 and the UIKit framework in touchable layer 480. The infrastructure provides many basic object classes and data types, providing the most basic system services for all applications, independent of the UI. While the class provided by the UIKit framework is a basic UI class library for creating touch-based user interfaces, iOS applications can provide UIs based on the UIKit framework, so it provides the infrastructure for applications to build user interfaces, draw, process and user interaction events, respond to gestures, and so on.
The manner and principle of implementing data communication between the third party application program and the operating system in the IOS system may refer to the Android system, and this description is not repeated here.
The input device 130 is configured to receive input instructions or data, and the input device 130 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data, and the output device 140 includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined, and the input device 130 and the output device 140 are a touch display screen for receiving a touch operation thereon or thereabout by a user using a finger, a touch pen, or any other suitable object, and displaying a user interface of each application program. Touch display screens are typically provided on the front panel of an electronic device. The touch display screen may be designed as a full screen, a curved screen, or a contoured screen. The touch display screen can also be designed to be a combination of a full screen and a curved screen, and a combination of a special-shaped screen and a curved screen is not limited in this specification.
In addition, those skilled in the art will appreciate that the configuration of the electronic device shown in the above-described figures does not constitute a limitation of the electronic device, and the electronic device may include more or less components than illustrated, or may combine certain components, or may have a different arrangement of components. For example, the electronic device further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (wireless fidelity, wiFi) module, a power supply, and a bluetooth module, which are not described herein.
In this specification, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is an operating system of the electronic device. The operating system may be an android system, an IOS system, or other operating systems, which is not limited in this specification.
The electronic device of the present specification may further have a display device mounted thereon, and the display device may be any of various devices capable of realizing a display function, for example: cathode ray tube displays (cathode ray tube display, CRT), light-emitting diode displays (light-emitting diode display, LED), electronic ink screens, liquid crystal displays (liquid crystal display, LCD), plasma display panels (plasma display panel, PDP), and the like. A user may utilize a display device on electronic device 101 to view displayed text, images, video, etc. The electronic device may be a smart phone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook, a desktop computing device, a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, an electronic article of clothing, etc.
In the electronic device shown in fig. 8, where the electronic device may be a terminal, the processor 110 may be configured to invoke an application program stored in the memory 120 and specifically perform the following operations:
collecting at least one type of first mode image of a target object in a target environment, and determining a recommended image mode aiming at the target environment based on the at least one type of first mode image;
generating a second mode image corresponding to the recommended image mode based on the at least one type of first mode image;
generating a multi-mode image combination based on the first mode image and the second mode image, and performing living body attack detection processing based on the multi-mode image combination to obtain a target detection type aiming at the target object.
In one embodiment, the processor 110, in performing the determining a recommended image modality for the target environment based on the at least one type of first modality image, performs the following:
and carrying out environmental mode prediction processing by adopting an environmental mode recommendation model based on the at least one type of first mode image to obtain a recommended image mode aiming at the target environment.
In one embodiment, the processor 110, when executing the living body detection method, further performs the following operations:
Creating an initial environmental modality recommendation model;
acquiring a plurality of groups of first sample mode images, wherein each group of first sample mode images consists of different types of sample mode images;
and inputting the first sample mode image into an initial environmental mode recommended model for training until the initial environmental mode recommended model finishes training, and obtaining a trained environmental mode recommended model.
In one embodiment, the processor 110, in performing the inputting of the first sample modality image into an initial environmental modality recommendation model for training, performs the following steps:
inputting the first sample mode image into an initial environment mode recommendation model to respectively determine single mode image characteristics and single mode recommendation classification results through at least one single mode characteristic encoder of the initial environment mode recommendation model, and determining fusion mode characteristics and fusion mode recommendation classification results corresponding to all the single mode image characteristics through a fusion characteristic encoder of the initial environment mode recommendation model;
and calculating a first model loss based on at least one single-mode recommended classification result and the fused mode recommended classification result, and performing model parameter adjustment on the initial environmental mode recommended model based on the first model loss.
In one embodiment, the processor 110, when executing the calculation of the first model penalty based on at least one of the single-modality recommendation classification result and the fusion modality recommendation classification result, performs the steps of:
determining a modality type classification penalty based on at least one of the single-modality recommendation classification result and the fusion modality recommendation classification result;
determining a first classification prediction consistency loss based on at least one of the single-modality recommendation classification results and the fusion modality recommendation classification results;
a first model loss is derived based on the modal type classification loss and the first classification prediction consistency loss.
In one embodiment, the processor 110, when executing the generating the second modality image corresponding to the recommended image modality based on the at least one type of first modality image, executes the following steps:
acquiring a mode image generation model corresponding to the recommended image mode;
and carrying out image generation processing by adopting the mode image generation model based on the at least one type of first mode image to obtain a second mode image corresponding to the recommended image mode.
In one embodiment, the processor 110, when executing the living body detection method, further performs the steps of:
Determining at least one reference mode type, and creating an initial mode image generation model corresponding to the reference mode type;
acquiring a plurality of groups of second sample mode images and reference mode label images corresponding to the second sample mode images, wherein each group of second sample mode images consists of different types of sample mode images, and the reference mode types corresponding to the reference mode label images are different from the mode types of the second sample mode images;
inputting the second sample mode image into an initial mode image generation model to output a target sample mode image, so as to train based on the target sample mode image and the reference mode label image until the initial mode image generation model finishes training, and obtaining a trained mode image generation model.
In one embodiment, the processor 110, when executing the inputting the second sample modality image into the initial modality image generation model, outputs a target sample modality image to train based on the target sample modality image and the reference modality label image, performs the steps of:
inputting the second sample mode image into an initial mode image generation model to respectively determine single mode feature images through at least one feature image extraction module of the initial mode image generation model, and generating a target sample mode image through a feature fusion mode generation module of the initial mode image generation model based on all the single mode feature images;
And calculating a second model loss based on the target sample mode image and the reference mode label image, and performing model parameter adjustment on the initial mode image generation model based on the second model loss.
In one embodiment, the processor 110, in performing the calculating a second model loss based on the target sample modality image and the reference modality label image, performs the steps of:
calculating overall modal reconstruction loss by adopting a first function based on the target sample modal image and the reference modal tag image;
performing random image region clipping processing on the target sample mode image and the reference mode label image to obtain a first local sample mode image and a second local label mode image, and calculating local mode reconstruction loss by adopting a second function based on the target sample mode image and the reference mode label image;
obtaining a second model loss based on the global modal reconstruction loss and the local modal reconstruction loss;
the first function satisfies the following formula:
Loss_global = ||img_pred - img_ori||^2
wherein Loss_global is the overall modal reconstruction loss, img_pred is the target sample modal image, and img_ori is the reference modal label image;
the second function satisfies the following formula:
Loss_local = ||randCrop(img_pred) - randCrop(img_ori)||^2
wherein Loss_local is the local modal reconstruction loss, randCrop() represents the random image region clipping process, randCrop(img_pred) is the first local sample modal image obtained by performing random image region clipping processing on the target sample modal image, and randCrop(img_ori) is the second local label modal image obtained by performing random image region clipping processing on the reference modal label image.
In one embodiment, the processor 110 performs the following steps when performing the in-vivo attack detection process based on the multi-mode image combination to obtain a target detection type for the target object:
determining a target modal combination type corresponding to the multi-modal image combination, and acquiring a multi-modal living body detection model corresponding to the target modal combination type;
inputting the multi-mode image combination into a multi-mode living body detection model to perform living body attack detection processing and output a target detection result;
and determining a target detection type for the target object based on the target detection result.
In one embodiment, the processor 110, when executing the living body detection method, further performs the steps of:
Determining at least one modal combination type, and creating an initial multi-modal living body detection model corresponding to the modal image combination;
acquiring a plurality of groups of third sample multi-mode image combinations and sample detection label results, wherein the third sample multi-mode image combinations consist of second sample mode images and target sample mode images;
and inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to output a sample detection actual result, and training the initial multi-mode living body detection model based on the sample detection label result until the initial multi-mode living body detection model finishes training, so as to obtain a trained multi-mode living body detection model.
In one embodiment, the processor 110, when executing the input of the third sample multi-modal image combination to the initial multi-modal living body detection model, outputs a sample detection actual result, trains the initial multi-modal living body detection model based on the sample detection label result, and executes the following steps:
inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to determine an original image mode characteristic and an original mode detection result of the second sample mode image through an original mode characteristic coding module of the initial multi-mode living body detection model, determine a new mode characteristic and a new mode detection result of a target sample mode image through a new mode characteristic coding module of the initial multi-mode living body detection model, and determine a fusion mode characteristic and a fusion mode detection result which are corresponding to the original mode characteristic and the new mode characteristic together through a mode fusion module of the initial multi-mode living body detection model;
And calculating a third model loss based on the original mode detection result, the new mode detection result and the fusion mode detection result, and performing model parameter adjustment on the initial multi-mode living body detection model based on the third model loss.
In one embodiment, the processor 110, when executing the calculation of the third model loss based on the primary mode detection result, the new mode detection result, and the fusion mode detection result, executes the following steps:
determining a modal living body classification loss based on the primary modal detection result, the new modal detection result, and the fusion modal detection result, and the primary modal detection classification loss, the new modal detection classification loss, and the fusion modal detection classification loss of the sample detection tag result, respectively;
determining a second classification prediction consistency loss based on the original mode detection result, the new mode detection result and the fusion mode detection result;
a third model loss is derived based on the modal living body classification loss and the second classification prediction consistency loss.
In one or more embodiments of the present disclosure, the electronic device determines a recommended image mode for the target environment based on the acquired at least one type of first mode image of a target object in the target environment, generates a second mode image corresponding to the new recommended image mode based on the first mode image, and can generate a multi-mode image combination based on the first mode image and the second mode image, so that the living body detection efficiency of image living body detection in complex environments and lower-performance hardware environments is improved without relying on a high-cost multi-mode acquisition system; the multi-mode image combination has higher living body attack separability, can assist subsequent accurate living body detection classification, achieves a better living body attack detection effect, and improves the generalization capability of image living body detection; meanwhile, when the second mode image of a new mode is generated based on the first mode image of an original mode, a recommended image mode adapted to the device environment is determined, so that a better balance between time consumption and performance can be achieved and the detection efficiency is improved.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, the first modality image and the like referred to in the present specification are all acquired with sufficient authorization.
The foregoing disclosure is only illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the claims of the present invention; equivalent variations made according to the claims of the present invention still fall within the scope of the present invention.

Claims (17)

1. A method of in vivo detection, the method comprising:
collecting at least one type of first mode image of a target object in a target environment, and determining a recommended image mode aiming at the target environment based on the at least one type of first mode image;
generating a second mode image corresponding to the recommended image mode based on the at least one type of first mode image;
Generating a multi-mode image combination based on the first mode image and the second mode image, and performing living body attack detection processing based on the multi-mode image combination to obtain a target detection type aiming at the target object.
2. The method of claim 1, the determining a recommended image modality for the target environment based on the at least one type of first modality image, comprising:
and carrying out environmental mode prediction processing by adopting an environmental mode recommendation model based on the at least one type of first mode image to obtain a recommended image mode aiming at the target environment.
3. The method of claim 2, the method further comprising:
creating an initial environmental modality recommendation model;
acquiring a plurality of groups of first sample mode images, wherein each group of first sample mode images consists of different types of sample mode images;
and inputting the first sample mode image into an initial environmental mode recommended model for training until the initial environmental mode recommended model finishes training, and obtaining a trained environmental mode recommended model.
4. A method according to claim 3, said inputting the first sample modality image into an initial environmental modality recommendation model for training, comprising:
Inputting the first sample mode image into an initial environment mode recommendation model to respectively determine single mode image characteristics and single mode recommendation classification results through at least one single mode characteristic encoder of the initial environment mode recommendation model, and determining fusion mode characteristics and fusion mode recommendation classification results corresponding to all the single mode image characteristics through a fusion characteristic encoder of the initial environment mode recommendation model;
and calculating a first model loss based on at least one single-mode recommended classification result and the fused mode recommended classification result, and performing model parameter adjustment on the initial environmental mode recommended model based on the first model loss.
5. The method of claim 4, the calculating a first model loss based on at least one of the single-modality recommendation classification result and the fused-modality recommendation classification result, comprising:
determining a modality type classification penalty based on at least one of the single-modality recommendation classification result and the fusion modality recommendation classification result;
determining a first classification prediction consistency loss based on at least one of the single-modality recommendation classification results and the fusion modality recommendation classification results;
A first model loss is derived based on the modal type classification loss and the first classification prediction consistency loss.
6. The method of claim 1, the generating a second modality image corresponding to the recommended image modality based on the at least one type of first modality image, comprising:
acquiring a mode image generation model corresponding to the recommended image mode;
and carrying out image generation processing by adopting the mode image generation model based on the at least one type of first mode image to obtain a second mode image corresponding to the recommended image mode.
7. The method of claim 6, the method further comprising:
determining at least one reference mode type, and creating an initial mode image generation model corresponding to the reference mode type;
acquiring a plurality of groups of second sample mode images and reference mode label images corresponding to the second sample mode images, wherein each group of second sample mode images consists of different types of sample mode images, and the reference mode types corresponding to the reference mode label images are different from the mode types of the second sample mode images;
inputting the second sample mode image into an initial mode image generation model to output a target sample mode image, so as to train based on the target sample mode image and the reference mode label image until the initial mode image generation model finishes training, and obtaining a trained mode image generation model.
8. The method of claim 7, the inputting the second sample modality image into an initial modality image generation model outputting a target sample modality image to train based on the target sample modality image and the reference modality label image, comprising:
inputting the second sample mode image into an initial mode image generation model to respectively determine single mode feature images through at least one feature image extraction module of the initial mode image generation model, and generating a target sample mode image through a feature fusion mode generation module of the initial mode image generation model based on all the single mode feature images;
and calculating a second model loss based on the target sample mode image and the reference mode label image, and performing model parameter adjustment on the initial mode image generation model based on the second model loss.
9. The method of claim 8, the calculating a second model loss based on the target sample modality image and the reference modality label image, comprising:
calculating overall modal reconstruction loss by adopting a first function based on the target sample modal image and the reference modal tag image;
Performing random image region clipping processing on the target sample mode image and the reference mode label image to obtain a first local sample mode image and a second local label mode image, and calculating local mode reconstruction loss by adopting a second function based on the target sample mode image and the reference mode label image;
obtaining a second model loss based on the global modal reconstruction loss and the local modal reconstruction loss;
the first function satisfies the following formula:
Loss_global = ||img_pred - img_ori||^2
wherein Loss_global is the overall modal reconstruction loss, img_pred is the target sample modal image, and img_ori is the reference modal label image;
the second function satisfies the following formula:
Loss_local = ||randCrop(img_pred) - randCrop(img_ori)||^2
wherein Loss_local is the local modal reconstruction loss, randCrop() represents the random image region clipping process, randCrop(img_pred) is the first local sample modal image obtained by performing random image region clipping processing on the target sample modal image, and randCrop(img_ori) is the second local label modal image obtained by performing random image region clipping processing on the reference modal label image.
10. The method according to claim 1, wherein the performing the living body attack detection processing based on the multi-mode image combination to obtain a target detection type for the target object includes:
determining a target mode combination type corresponding to the multi-mode image combination, and acquiring a multi-mode living body detection model corresponding to the target mode combination type;
inputting the multi-mode image combination into the multi-mode living body detection model to perform living body attack detection processing and output a target detection result;
and determining a target detection type for the target object based on the target detection result.
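Purely as an illustrative, non-limiting sketch (not part of the claims): selecting a multi-mode living body detection model by target mode combination type and running detection, assuming PyTorch; the registry keys, the stub detector, and the 0.5 decision threshold are assumptions.

```python
import torch
import torch.nn as nn

class StubDetector(nn.Module):
    """Stand-in for a trained multi-mode living body detection model."""
    def forward(self, image_combination):
        # image_combination: dict of mode name -> image tensor
        stacked = torch.cat([x.flatten(1) for x in image_combination.values()], dim=1)
        return torch.sigmoid(stacked.mean(dim=1, keepdim=True))  # pseudo liveness score

# Hypothetical registry: one model per mode combination type (keys are sorted mode names).
detectors = {"depth+ir+rgb": StubDetector(), "depth+rgb": StubDetector()}

def detect(image_combination):
    # Target mode combination type derived from the modes actually present.
    combo_type = "+".join(sorted(image_combination))
    model = detectors[combo_type]
    with torch.no_grad():
        score = model(image_combination)
    return "live" if score.item() > 0.5 else "attack"

result = detect({"rgb": torch.rand(1, 3, 64, 64),
                 "ir": torch.rand(1, 1, 64, 64),
                 "depth": torch.rand(1, 1, 64, 64)})
```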
11. The method of claim 10, the method further comprising:
determining at least one mode combination type, and creating an initial multi-mode living body detection model corresponding to the mode combination type;
acquiring a plurality of groups of third sample multi-mode image combinations and sample detection label results, wherein the third sample multi-mode image combinations consist of second sample mode images and target sample mode images;
and inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to output a sample detection actual result, and training the initial multi-mode living body detection model based on the sample detection label result until the initial multi-mode living body detection model finishes training, so as to obtain a trained multi-mode living body detection model.
12. The method of claim 11, wherein the inputting the third sample multi-mode image combination into an initial multi-mode living body detection model to output a sample detection actual result, and the training the initial multi-mode living body detection model based on the sample detection label result, comprise:
inputting the third sample multi-mode image combination into an initial multi-mode living body detection model, so as to determine an original mode feature and an original mode detection result of the second sample mode image through an original mode feature coding module of the initial multi-mode living body detection model, determine a new mode feature and a new mode detection result of the target sample mode image through a new mode feature coding module of the initial multi-mode living body detection model, and determine a fusion mode feature and a fusion mode detection result jointly corresponding to the original mode feature and the new mode feature through a mode fusion module of the initial multi-mode living body detection model;
and calculating a third model loss based on the original mode detection result, the new mode detection result and the fusion mode detection result, and performing model parameter adjustment on the initial multi-mode living body detection model based on the third model loss.
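Purely as an illustrative, non-limiting sketch (not part of the claims): a minimal PyTorch-style layout of an original mode feature coding module, a new mode feature coding module, and a mode fusion module, each followed by its own live/attack classification head; all names and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiModeLivenessModel(nn.Module):
    def __init__(self, orig_channels=3, new_channels=1, feat_dim=32):
        super().__init__()
        def encoder(c):
            # Tiny convolutional encoder followed by global pooling to a feature vector.
            return nn.Sequential(nn.Conv2d(c, feat_dim, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.orig_encoder = encoder(orig_channels)   # original mode feature coding module
        self.new_encoder = encoder(new_channels)     # new mode feature coding module
        self.fusion = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())  # mode fusion module
        # One live/attack classification head per branch.
        self.orig_head = nn.Linear(feat_dim, 2)
        self.new_head = nn.Linear(feat_dim, 2)
        self.fusion_head = nn.Linear(feat_dim, 2)

    def forward(self, orig_image, new_image):
        orig_feat = self.orig_encoder(orig_image)
        new_feat = self.new_encoder(new_image)
        fused_feat = self.fusion(torch.cat([orig_feat, new_feat], dim=1))
        return (self.orig_head(orig_feat),
                self.new_head(new_feat),
                self.fusion_head(fused_feat))

model = MultiModeLivenessModel()
orig_logits, new_logits, fused_logits = model(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
```

The three heads in this sketch stand for the original mode, new mode, and fusion mode detection results that claim 13 combines into the third model loss.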
13. The method of claim 12, wherein the calculating a third model loss based on the original mode detection result, the new mode detection result, and the fusion mode detection result comprises:
respectively determining an original mode detection classification loss, a new mode detection classification loss, and a fusion mode detection classification loss based on the original mode detection result, the new mode detection result, and the fusion mode detection result and on the sample detection label result, the three classification losses together constituting a mode living body classification loss;
determining a second classification prediction consistency loss based on the original mode detection result, the new mode detection result and the fusion mode detection result;
and deriving the third model loss based on the mode living body classification loss and the second classification prediction consistency loss.
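Purely as an illustrative, non-limiting sketch (not part of the claims): one way to combine per-branch classification losses with a prediction consistency term; cross-entropy for classification, mean-squared error between softmax outputs for consistency, and the 0.1 weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def third_model_loss(orig_logits, new_logits, fused_logits, labels, consistency_weight=0.1):
    # Mode living body classification loss: one cross-entropy term per branch.
    classification = (F.cross_entropy(orig_logits, labels)
                      + F.cross_entropy(new_logits, labels)
                      + F.cross_entropy(fused_logits, labels))
    # Second classification prediction consistency loss: encourage the three
    # branches to produce similar class probability distributions.
    p_orig, p_new, p_fused = (F.softmax(x, dim=1) for x in (orig_logits, new_logits, fused_logits))
    consistency = (F.mse_loss(p_orig, p_fused)
                   + F.mse_loss(p_new, p_fused)
                   + F.mse_loss(p_orig, p_new))
    return classification + consistency_weight * consistency

labels = torch.tensor([1, 0])  # assumed convention: 1 = live, 0 = attack
loss = third_model_loss(torch.rand(2, 2), torch.rand(2, 2), torch.rand(2, 2), labels)
```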
14. A living body detection apparatus, the apparatus comprising:
the recommendation mode module is used for collecting at least one type of first mode image of a target object in a target environment, and determining a recommended image mode for the target environment based on the at least one type of first mode image;
the image generation module is used for generating a second mode image corresponding to the recommended image mode based on the at least one type of first mode image;
the detection processing module is used for generating a multi-mode image combination based on the first mode image and the second mode image, and performing living body attack detection processing based on the multi-mode image combination to obtain a target detection type for the target object.
15. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 13.
16. A computer program product storing at least one instruction for loading by a processor and performing the method steps of any one of claims 1 to 13.
17. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-13.
CN202310180698.9A 2023-02-16 2023-02-16 Living body detection method and device, storage medium and electronic equipment Pending CN116343350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310180698.9A CN116343350A (en) 2023-02-16 2023-02-16 Living body detection method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310180698.9A CN116343350A (en) 2023-02-16 2023-02-16 Living body detection method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116343350A 2023-06-27

Family

ID=86890706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310180698.9A Pending CN116343350A (en) 2023-02-16 2023-02-16 Living body detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116343350A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination