CN113642639B - Living body detection method, living body detection device, living body detection equipment and storage medium - Google Patents

Living body detection method, living body detection device, living body detection equipment and storage medium

Info

Publication number
CN113642639B
CN113642639B
Authority
CN
China
Prior art keywords
image
sample
living body
prediction probability
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110927106.6A
Other languages
Chinese (zh)
Other versions
CN113642639A (en)
Inventor
胡炳然
刘青松
宁学成
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd
Priority to CN202110927106.6A
Publication of CN113642639A
Application granted
Publication of CN113642639B
Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Abstract

The invention relates to a living body detection method and apparatus, an electronic device, and a storage medium, applied in the technical field of living body detection. The method comprises the following steps: acquiring a first image and a second image of a target object, wherein the first image and the second image are images of different modalities; extracting a first object feature of the first image and a second object feature of the second image, performing feature fusion to obtain a fusion feature, and determining whether the target object is a living body according to the fusion feature to obtain a determination result; and taking the determination result as the detection and recognition result of the target object.

Description

Living body detection method, living body detection device, living body detection equipment and storage medium
Technical Field
The present invention relates to the field of living body detection technologies, and in particular, to a living body detection method, apparatus, device, and storage medium.
Background
With the rapid development and wide application of artificial intelligence technology, security issues have drawn increasing public attention. Face recognition is a biometric technology that performs identity recognition based on facial feature information, and has advantages such as being non-mandatory and contactless. With the improvement of face recognition algorithms and the development of massively parallel computing, face recognition is used in various deployed identity-authentication scenarios (scenarios requiring identity verification such as security, finance, and e-commerce, e.g., remote bank account opening, access control systems, and remote transaction verification). Face anti-spoofing is an indispensable part of face recognition technology.
Face anti-spoofing, also called living body detection, is a technique for distinguishing whether the face in front of a camera comes from a living person or from a forgery such as a paper photo, a screen photo, or a mask. It determines whether the detected object is a living individual rather than an inanimate object such as a photo or a video, thereby preventing malicious attackers from attacking the system with recorded videos, captured photos, 3D face models, forged masks, and the like.
Disclosure of Invention
The present invention provides a living body detection method, a living body detection apparatus, an electronic device, and a storage medium, to solve the problem of the low safety and reliability of face-recognition-based identity verification systems in the prior art.
The technical solution of the present invention for solving the above technical problem is as follows:
the invention provides a living body detection method, which comprises the following steps:
acquiring a first image and a second image of a target object, wherein the first image and the second image are images of different modalities;
extracting a first object feature of the first image and a second object feature of the second image, performing feature fusion to obtain a fusion feature, and determining whether the target object is a living body according to the fusion feature to obtain a determination result;
and taking the determination result as the detection and recognition result of the target object.
Further, in the living body detection method, the extracting of the first object feature of the first image and the second object feature of the second image, the feature fusion to obtain a fusion feature, and the determining of whether the target object is a living body according to the fusion feature to obtain a determination result include:
inputting the first image and the second image into a dual-stream convolutional network model;
extracting the first object feature and the second object feature through the dual-stream convolutional network model, and performing feature fusion on the first object feature and the second object feature to obtain the fusion feature;
and determining whether the target object is a living body according to the fusion feature to obtain the determination result.
Further, in the living body detection method, the training process of the dual-stream convolutional network model includes:
obtaining a sample set, wherein the sample set includes at least one group of sample data, each group of sample data includes a first sample image, a second sample image, a modality category identifier and a living body category identifier, the living body category identifier indicates whether the target sample corresponding to the first sample image and the second sample image is a living body, and the modality category identifier indicates whether the target samples corresponding to the first sample image and the second sample image are consistent;
performing the following training process on each group of sample data in the sample set in turn:
inputting the sample data into an initial dual-stream convolutional network model;
extracting first sample features from the first sample image and second sample features from the second sample image;
performing feature fusion on the first sample features and the second sample features to obtain sample fusion features;
obtaining a first modal prediction probability according to the first sample features, a second modal prediction probability according to the second sample features, and a fusion prediction probability according to the sample fusion features;
calculating a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier;
and back-propagating the gradient to each layer of the initial dual-stream convolutional network model according to the loss function value, optimizing the parameters of the initial dual-stream convolutional network model, acquiring the next group of sample data from the sample set, and repeating the training process; when the loss function falls below a preset value, the initial dual-stream convolutional network model is taken as the final dual-stream convolutional network model.
Further, in the living body detection method, the calculating of a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier includes:
determining a first intermediate value according to the living body category identifier and the first modal prediction probability;
determining a second intermediate value according to the living body category identifier and the second modal prediction probability;
determining a third intermediate value according to the living body category identifier and the fusion prediction probability;
determining a first adjustment factor and a second adjustment factor according to the first intermediate value and the second intermediate value;
and calculating the loss function value according to the first intermediate value, the second intermediate value, the third intermediate value, the first adjustment factor, the second adjustment factor and the living body category identifier.
Further, in the living body detection method, the calculating of a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier includes:
substituting the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier into the following loss function formula to obtain the loss function value;
the loss function formula L is built on the cross-entropy term
L_CE = -log(m_t)
wherein m_t denotes the third intermediate value determined from y and m; m represents the fusion prediction probability, p represents the first modal prediction probability, q represents the second modal prediction probability, η represents the modality category identifier, y represents the living body category identifier, and λ, α, γ are preset parameters, wherein λ is greater than 0.5.
Further, the living body detection method described above further includes:
when the sample data includes only one of the first sample image and the second sample image, acquiring the first living body category of the target sample corresponding to that sample image;
acquiring a sample image whose living body category is consistent with the first living body category to serve as the missing second sample image or first sample image;
and setting the modality category identifier in the sample data formed by the first sample image and the second sample image to indicate that the target samples are inconsistent.
Further, in the living body detection method described above, the acquiring of the first image and the second image of the target object includes:
shooting the target object by using a binocular camera to obtain a first original image and a second original image;
carrying out channel transformation on the first original image to obtain a first transformed image;
carrying out channel transformation on the second original image to obtain a second transformed image;
performing image scaling on the first transformation image to obtain the first image;
and performing image scaling on the second transformation image to obtain the second image.
The present invention also provides a living body detection apparatus including:
the acquisition module is used for acquiring a first image and a second image of a target object, wherein the first image and the second image are images of different modalities;
the detection module is used for extracting a first object feature of the first image and a second object feature of the second image, performing feature fusion to obtain a fusion feature, and determining whether the target object is a living body according to the fusion feature to obtain a determination result;
and the determining module is used for taking the determination result as the detection and recognition result of the target object.
The present invention also provides a living body detecting apparatus including: a processor and a memory;
the processor is configured to execute a living body detection program stored in the memory to implement the living body detection method according to the first aspect.
The present invention also provides a storage medium storing one or more programs which when executed implement the living body detection method of the first aspect.
The beneficial effects of the invention are as follows:
a first image and a second image of a target object are acquired, the two images being of different modalities; a first object feature of the first image and a second object feature of the second image are extracted and fused to obtain a fusion feature, and whether the target object is a living body is determined according to the fusion feature to obtain a determination result; the determination result is taken as the detection and recognition result of the target object. Living body detection can thus be performed based on images of different modalities of the target object, which reduces the cost of living body detection; and because the image features of the different modalities are fused before deciding whether the target object is a living body, the correlation between the modalities is taken into account, which improves the safety and reliability of face-recognition identity verification systems.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment scenario of the living body detection method of the present invention;
FIG. 2 is a flowchart of an embodiment of the living body detection method of the present invention;
FIG. 3 is a structural diagram of an embodiment of the living body detection apparatus of the present invention;
fig. 4 is a structural view of an embodiment of the living body detecting device of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings, the examples are illustrated for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, the method of the embodiment of the present invention may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present invention, and the devices interact with each other to complete the method.
FIG. 1 is a schematic diagram of a hardware environment scenario of the living body detection method of the present invention. In the embodiment of the present invention, the living body detection method described above may be applied to a hardware environment constituted by the terminal 101 and the server 102 shown in fig. 1. As shown in fig. 1, the server 102 is connected to the terminal 101 through a network and may be used to provide services (such as video services and application services) to the terminal or to clients installed on the terminal. A database may be provided on the server, or independently of the server, to provide data storage services to the server 102. The network includes, but is not limited to, a wired or wireless network, and the terminal 101 is not limited to a PC, a mobile phone, a tablet computer, or the like.
The living body detection method according to the embodiment of the present invention may be executed by the server 102, may be executed by the terminal 101, or may be executed by both the server 102 and the terminal 101. The terminal 101 may perform the living body detection method according to the embodiment of the present invention, or may be performed by a client installed thereon.
Taking execution of the living body detection method of the embodiment of the present invention by the terminal as an example, the method can be applied to the terminal. FIG. 2 is a flowchart of an embodiment of the living body detection method of the present invention; as shown in fig. 2, the flow of the method may include the following steps:
201. Acquire a first image and a second image of the target object, where the first image and the second image are images of different modalities.
In some embodiments, the target object may be any type of organism, such as a person or an animal. The first image and the second image can be captured by a binocular camera module arranged in the detection area, which can collect an infrared image and an RGB image of the target object at the same time. The present invention refers to images obtained by different imaging principles as images of different "modalities".
Illustratively, taking a person as the target object, the first image and the second image may be, but are not limited to, face images of the photographed person.
The first image and the second image are usually images of the same specification. A neural network model generally expects a uniform input image size, so supplying the first image and the second image at the size the model requires ensures that the classification task can be performed accurately. In addition, selecting a good ROI (region of interest) can improve the subsequent classification effect; in other words, the specific form of this "correction" operation matters. The correction is generally an affine transformation or a custom transformation based on the face keypoints obtained in the face detection stage, but inputs of the same size must ultimately be produced. A sketch of such an alignment step follows.
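As an illustration of that alignment step, the sketch below (Python with OpenCV) warps a face crop onto a fixed-size canvas from three detected keypoints. The canonical template coordinates, the choice of three points, and the 112-pixel output size are illustrative assumptions, not values specified by the patent.

```python
import cv2
import numpy as np

# Hypothetical canonical positions for (left eye, right eye, nose tip) on a
# 112x112 canvas; real systems tune these template coordinates.
TEMPLATE = np.float32([[38.0, 46.0], [74.0, 46.0], [56.0, 72.0]])

def align_face(img: np.ndarray, keypoints, size: int = 112) -> np.ndarray:
    """Affine-align a face so every sample yields an input of identical size."""
    src = np.float32(keypoints[:3])              # three detected landmarks
    dst = TEMPLATE * (size / 112.0)              # rescale template to output size
    M = cv2.getAffineTransform(src, dst)         # 2x3 affine from 3 point pairs
    return cv2.warpAffine(img, M, (size, size))  # same-size input for the network
```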
In an alternative embodiment, a binocular camera is used for shooting a target object to obtain a first original image and a second original image; channel transformation is carried out on the first original image to obtain a first transformed image; channel transformation is carried out on the second original image to obtain a second transformed image; performing image scaling on the first transformation image to obtain a first image; and performing image scaling on the second transformation image to obtain a second image.
In some embodiments, the channel transformation is a process of linearly transforming the image. Take the case where the first original image is an RGB image and the second original image is an infrared image: the RGB image is channel-transformed into an HSV image, and the infrared image is channel-transformed into a grayscale image.
Specifically, the process of converting an RGB image into an HSV image is:
max = max(R, G, B); min = min(R, G, B)
if R = max, H = (G - B) / (max - min)
if G = max, H = 2 + (B - R) / (max - min)
if B = max, H = 4 + (R - G) / (max - min)
H = H × 60; if H < 0, H = H + 360
V = max(R, G, B)
S = (max - min) / max
The HSV image is then obtained from the H, S and V values given by these formulas.
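A minimal vectorised sketch of the formulas above in Python/NumPy, assuming RGB values normalised to [0, 1]; hue is returned in degrees, and the divisions are guarded for the max = min and max = 0 corner cases that the formulas leave implicit.

```python
import numpy as np

def rgb_to_hsv(img: np.ndarray) -> np.ndarray:
    """img: float array in [0, 1] of shape (H, W, 3); returns stacked H, S, V."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    diff = np.where(mx > mn, mx - mn, 1.0)       # avoid division by zero
    h = np.select(                               # the three per-channel cases
        [mx == r, mx == g, mx == b],
        [(g - b) / diff, 2.0 + (b - r) / diff, 4.0 + (r - g) / diff],
    )
    h = h * 60.0
    h = np.where(h < 0, h + 360.0, h)
    s = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    v = mx                                       # V = max(R, G, B)
    return np.stack([h, s, v], axis=-1)
```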
Unlike an RGB image, which has three channels, an infrared image actually carries only single-channel information. The raw data of the infrared image (e.g., in YUYV format) is converted into three identical channels, analogous to RGB, and one of those channels is then taken as the grayscale image of the infrared image.
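A small sketch of that conversion, assuming raw YUYV (YUY2) frames, where every second byte is a luminance (Y) sample; the function name and array layout are illustrative.

```python
import numpy as np

def yuyv_to_gray(raw: bytes, width: int, height: int) -> np.ndarray:
    """Extract the Y plane of a YUYV frame and replicate it to three channels."""
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 2)
    gray = frame[:, :, 0]                        # Y bytes carry the luminance
    return np.stack([gray] * 3, axis=-1)         # three identical channels, as described
```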
There are various ways to scale an image, for example scaling based on equally spaced extraction of image pixels, or scaling based on regional sub-blocks.
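A sketch of the first variant, scaling by equally spaced extraction of image pixels; the regional sub-block variant would average each sub-block instead of picking single pixels.

```python
import numpy as np

def scale_by_sampling(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour style scaling by picking evenly spaced rows and columns."""
    rows = np.linspace(0, img.shape[0] - 1, out_h).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, out_w).round().astype(int)
    return img[np.ix_(rows, cols)]               # works for gray or multi-channel images
```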
202. Extract a first object feature of the first image and a second object feature of the second image, fuse the features to obtain a fusion feature, and determine whether the target object is a living body according to the fusion feature to obtain a determination result.
In some embodiments, the object features of the first image and the second image may be extracted in a plurality of ways, for example, by a neural network model.
In an alternative embodiment, the extracting of a first object feature of the first image and a second object feature of the second image, the feature fusion to obtain a fusion feature, and the determining of whether the target object is a living body according to the fusion feature to obtain a determination result include:
inputting the first image and the second image into a dual-stream convolutional network model; extracting the first object feature and the second object feature through the dual-stream convolutional network model, and performing feature fusion on the first object feature and the second object feature to obtain the fusion feature; and determining whether the target object is a living body according to the fusion feature to obtain the determination result.
In some embodiments, the first image and the second image are input into the dual-stream convolutional network model, and after the first object feature and the second object feature are extracted by the model, the two features are fused, which reduces the amount of computation in subsequent processing. A minimal sketch of such a model follows.
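The sketch below shows one way to realise such a dual-stream model in PyTorch: two independent convolutional branches, concatenation as the feature fusion, and three heads producing the two modal prediction probabilities and the fusion prediction probability used during training. The backbone and layer sizes are illustrative assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class DualStreamNet(nn.Module):
    def __init__(self):
        super().__init__()
        def branch():                              # one small CNN per modality
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.ir_branch, self.rgb_branch = branch(), branch()
        self.ir_head = nn.Linear(64, 1)            # first modal prediction (p)
        self.rgb_head = nn.Linear(64, 1)           # second modal prediction (q)
        self.fusion_head = nn.Linear(128, 1)       # fusion prediction (m)

    def forward(self, ir, rgb):
        f1, f2 = self.ir_branch(ir), self.rgb_branch(rgb)
        fused = torch.cat([f1, f2], dim=1)         # feature-level fusion
        p = torch.sigmoid(self.ir_head(f1)).squeeze(1)
        q = torch.sigmoid(self.rgb_head(f2)).squeeze(1)
        m = torch.sigmoid(self.fusion_head(fused)).squeeze(1)
        return p, q, m
```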
In an alternative embodiment, the training process of the dual-stream convolutional network model includes:
acquiring a sample set, wherein the sample set includes at least one group of sample data, each group of sample data includes a first sample image, a second sample image, a modality category identifier and a living body category identifier, the living body category identifier indicates whether the target sample corresponding to the first sample image and the second sample image is a living body, and the modality category identifier indicates whether the target samples corresponding to the first sample image and the second sample image are consistent;
performing the following training process on each group of sample data in the sample set in turn:
inputting the sample data into an initial dual-stream convolutional network model;
extracting first sample features from the first sample image and second sample features from the second sample image;
performing feature fusion on the first sample features and the second sample features to obtain sample fusion features;
obtaining a first modal prediction probability according to the first sample features, a second modal prediction probability according to the second sample features, and a fusion prediction probability according to the sample fusion features;
calculating a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier;
and back-propagating the gradient to each layer of the initial dual-stream convolutional network model according to the loss function value, optimizing the parameters of the initial dual-stream convolutional network model, acquiring the next group of sample data from the sample set, and repeating the training process until the loss function is smaller than a preset value, at which point the initial dual-stream convolutional network model is taken as the final dual-stream convolutional network model. One training step of this loop is sketched below.
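A sketch of one pass over the sample set, following the steps above; `loader`, the stopping `threshold` (the "preset value"), and `multimodal_loss` (sketched after the loss formula below) are assumed names rather than elements taken from the patent.

```python
import torch

model = DualStreamNet()                          # sketched above
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
threshold = 0.05                                 # assumed "preset value"

for ir, rgb, y, eta in loader:                   # one group of sample data per step
    p, q, m = model(ir, rgb)                     # two modal probabilities + fusion one
    loss = multimodal_loss(p, q, m, y, eta)
    opt.zero_grad()
    loss.backward()                              # back-propagate to every layer
    opt.step()                                   # optimise the model parameters
    if loss.item() < threshold:                  # loss below the preset value: done
        break
```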
In some embodiments, in the obtained sample data, the living body category identifier of the sample object may be set manually: it is labeled 1 when the sample object is a living body and 0 when it is a non-living body (a paper photo, screen photo, mask, or the like).
In an alternative embodiment, the calculating of the loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier includes:
determining a first intermediate value according to the living body category identifier and the first modal prediction probability;
determining a second intermediate value according to the living body category identifier and the second modal prediction probability;
determining a third intermediate value according to the living body category identifier and the fusion prediction probability;
determining a first adjustment factor and a second adjustment factor according to the first intermediate value and the second intermediate value;
and calculating the loss function value according to the first intermediate value, the second intermediate value, the third intermediate value, the first adjustment factor, the second adjustment factor and the living body category identifier.
Specifically, the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier can be substituted into the following loss function formula to obtain the loss function value;
the loss function formula L is built on the cross-entropy term
L_CE = -log(m_t)
where m_t denotes the third intermediate value determined from y and m; m represents the fusion prediction probability, p represents the first modal prediction probability, q represents the second modal prediction probability, η represents the modality category identifier, y represents the living body category identifier, and λ, α, γ are preset parameters with λ greater than 0.5.
Here α may be, but is not limited to, 0.5; γ may be, but is not limited to, 3; and λ may be, but is not limited to, 0.8.
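The full expression for L did not survive in this text, so the sketch below is only one plausible reconstruction with the stated properties: a cross-entropy term -log(m_t) on the fusion branch gated by η, plus per-branch terms whose adjustment factors grow with the other branch's confidence and shrink with its own. The functional form of the factors and the λ/(1 − λ) weighting are assumptions, not the patent's exact formula.

```python
import torch

def multimodal_loss(p, q, m, y, eta, lam=0.8, alpha=0.5, gamma=3.0):
    # Intermediate values: probability assigned to the true class per branch.
    p_t = torch.where(y == 1, p, 1 - p)          # first intermediate value
    q_t = torch.where(y == 1, q, 1 - q)          # second intermediate value
    m_t = torch.where(y == 1, m, 1 - m)          # third intermediate value
    # Assumed adjustment factors: rise with the *other* branch's confidence,
    # fall as the *same* branch's confidence rises.
    w_p = alpha * q_t.detach() ** gamma * (1 - p_t.detach())
    w_q = alpha * p_t.detach() ** gamma * (1 - q_t.detach())
    eps = 1e-7
    ce_m = -torch.log(m_t.clamp_min(eps))        # L_CE = -log(m_t)
    ce_p = -torch.log(p_t.clamp_min(eps))
    ce_q = -torch.log(q_t.clamp_min(eps))
    # eta == 0 (temporally unpaired pair): the fusion branch contributes no loss.
    return (lam * eta * ce_m + (1 - lam) * (w_p * ce_p + w_q * ce_q)).mean()
```

Under this form, a branch that is already confident contributes little extra gradient, while a branch lagging behind a confident sibling is pushed harder, which matches the overfitting argument made below.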
In an alternative embodiment, the method further includes: when the sample data includes only one of the first sample image and the second sample image, acquiring the first living body category of the target sample corresponding to that sample image; acquiring a sample image whose living body category is consistent with the first living body category to serve as the missing second sample image or first sample image; and setting the modality category identifier in the sample data formed by the first sample image and the second sample image to indicate that the target samples are inconsistent.
In this embodiment, when the dual-stream convolutional network model is trained, an infrared image and an RGB image captured at the same time are directly formed into an (IR, RGB) image pair, and the modality category identifier is labeled 1; when only single-modality data is captured at a given time (i.e., only an infrared image or only an RGB image), it is randomly matched with data of the other modality that carries the same living body category identifier to form an (IR, RGB) image pair, and the modality category identifier is labeled 0.
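A sketch of that pairing scheme; the container layout is an assumption, and each returned tuple carries the liveness label y and the modality category identifier η.

```python
import random

def build_sample_set(paired, ir_only, rgb_only):
    # paired: [(ir, rgb, y)] captured at the same instant -> modality label 1.
    samples = [(ir, rgb, y, 1) for ir, rgb, y in paired]
    # Single-modality captures are randomly matched with the other modality
    # under the same living body category -> modality label 0.
    for ir, y in ir_only:
        rgb, _ = random.choice([s for s in rgb_only if s[1] == y])
        samples.append((ir, rgb, y, 0))
    for rgb, y in rgb_only:
        ir, _ = random.choice([s for s in ir_only if s[1] == y])
        samples.append((ir, rgb, y, 0))
    random.shuffle(samples)
    return samples
```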
203. Take the determination result as the detection and recognition result of the target object.
In some embodiments, after the determination result is obtained, it is taken as the detection and recognition result of the target object, thereby completing the living body detection of the target object.
In the present invention, by additionally labeling the "modality category" label, when the training data are unpaired on the time axis, the loss of the fusion-feature branch is set to 0; that is, the fusion features are not updated, and the features of each modality are trained independently. Conversely, when the training data are consistent on the time axis, fusion-feature training is resumed. This removes the restriction that training data must be input as matched multi-modality pairs: normal multi-modality data can be input while single-modality data remain compatible and are used effectively.
In addition, an important aspect of the loss function is its adjustment factors, which take the prediction probabilities of both modalities (the infrared branch and the RGB branch in this embodiment) into account. Each factor increases as the prediction probability of the other modality increases, and decreases as the prediction probability of its own modality increases. Compared with a plain cross-entropy loss or a single-modality loss, this adjustment better copes with the overfitting problem in multi-modality modeling.
Based on the same concept, an embodiment of the present invention provides a living body detection apparatus; fig. 3 is a structural diagram of an embodiment of the living body detection apparatus of the present invention. For the specific implementation of the apparatus, reference may be made to the description of the method embodiments; repeated details are omitted. As shown in fig. 3, the apparatus mainly includes:
the acquiring module 31 is configured to acquire a first image and a second image of a target object, where the first image and the second image are images of different modalities;
the detection module 32 is configured to extract a first object feature of the first image and a second object feature of the second image, perform feature fusion to obtain a fusion feature, and determine whether the target object is a living body according to the fusion feature to obtain a determination result;
the determining module 33, configured to take the determination result as the detection and recognition result of the target object.
Further, in the above embodiment, the detection module 32 is specifically configured to:
the extracting of the first object feature of the first image and the second object feature of the second image, the feature fusion to obtain a fusion feature, and the determining of whether the target object is a living body according to the fusion feature to obtain a determination result include:
inputting the first image and the second image into a dual-stream convolutional network model;
extracting the first object feature and the second object feature through the dual-stream convolutional network model, and performing feature fusion on the first object feature and the second object feature to obtain the fusion feature;
and determining whether the target object is a living body according to the fusion feature to obtain the determination result.
Further, in the foregoing embodiment, the training process of the dual-stream convolutional network model includes:
obtaining a sample set, wherein the sample set includes at least one group of sample data, each group of sample data includes a first sample image, a second sample image, a modality category identifier and a living body category identifier, the living body category identifier indicates whether the target sample corresponding to the first sample image and the second sample image is a living body, and the modality category identifier indicates whether the target samples corresponding to the first sample image and the second sample image are consistent;
performing the following training process on each group of sample data in the sample set in turn:
inputting the sample data into an initial dual-stream convolutional network model;
extracting first sample features from the first sample image and second sample features from the second sample image;
performing feature fusion on the first sample features and the second sample features to obtain sample fusion features;
obtaining a first modal prediction probability according to the first sample features, a second modal prediction probability according to the second sample features, and a fusion prediction probability according to the sample fusion features;
calculating a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier;
and back-propagating the gradient to each layer of the initial dual-stream convolutional network model according to the loss function value, optimizing the parameters of the initial dual-stream convolutional network model, acquiring the next group of sample data from the sample set, and repeating the training process; when the loss function falls below a preset value, the initial dual-stream convolutional network model is taken as the final dual-stream convolutional network model.
Further, in the above embodiment, the detection module 32 is further configured to:
determining a first intermediate value according to the living body category identifier and the first modal prediction probability;
determining a second intermediate value according to the living body category identifier and the second modal prediction probability;
determining a third intermediate value according to the living body category identifier and the fusion prediction probability;
determining a first adjustment factor and a second adjustment factor according to the first intermediate value and the second intermediate value;
and calculating the loss function value according to the first intermediate value, the second intermediate value, the third intermediate value, the first adjustment factor, the second adjustment factor and the living body category identifier.
Further, in the above embodiment, the detection module 32 is further configured to:
substituting the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier into the following loss function formula to obtain the loss function value;
the loss function formula L is built on the cross-entropy term
L_CE = -log(m_t)
where m_t denotes the third intermediate value determined from y and m; m represents the fusion prediction probability, p represents the first modal prediction probability, q represents the second modal prediction probability, η represents the modality category identifier, y represents the living body category identifier, and λ, α, γ are preset parameters with λ greater than 0.5.
Further, in the above embodiment, the detection module 32 is further configured to:
when the sample data includes only one of the first sample image and the second sample image, acquiring the first living body category of the target sample corresponding to that sample image;
acquiring a sample image whose living body category is consistent with the first living body category to serve as the missing second sample image or first sample image;
and setting the modality category identifier in the sample data formed by the first sample image and the second sample image to indicate that the target samples are inconsistent.
Further, in the above embodiment, the obtaining module 31 is further configured to:
shooting the target object by using a binocular camera to obtain a first original image and a second original image;
carrying out channel transformation on the first original image to obtain a first transformed image;
carrying out channel transformation on the second original image to obtain a second transformed image;
performing image scaling on the first transformation image to obtain the first image;
and performing image scaling on the second transformation image to obtain the second image.
The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment, and specific implementation schemes thereof may refer to the method described in the foregoing embodiment and related descriptions in the method embodiment, and have beneficial effects of the corresponding method embodiment, which are not described herein.
Fig. 4 is a schematic structural view of an embodiment of the living body detection device of the present invention. As shown in fig. 4, the device of this embodiment may include: a processor 1010 and a memory 1020. As will be appreciated by those skilled in the art, the device may also include an input/output interface 1030, a communication interface 1040, and a bus 1050, where the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present invention also provides a storage medium storing one or more programs which when executed implement the living body detection method of the above-described embodiments.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the invention. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present invention is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and these modifications and substitutions are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (6)

1. A living body detecting method, characterized by comprising:
acquiring a first image and a second image of a target object, wherein the first image and the second image are images of different modalities;
inputting the first image and the second image into a dual-stream convolutional network model;
extracting a first object feature of the first image and a second object feature of the second image through the dual-stream convolutional network model, and performing feature fusion on the first object feature and the second object feature to obtain a fusion feature;
determining whether the target object is a living body according to the fusion feature to obtain a determination result;
determining the determination result as the detection and recognition result of the target object;
the training process of the dual-stream convolutional network model comprises:
obtaining a sample set, wherein the sample set comprises at least one group of sample data, the sample data comprises a first sample image, a second sample image, a modality category identifier and a living body category identifier, the living body category identifier is used for indicating whether a target sample corresponding to the first sample image and the second sample image is a living body, and the modality category identifier is used for indicating whether the target samples corresponding to the first sample image and the second sample image have consistency on a time axis;
performing the following training process on each group of sample data in the sample set in turn:
inputting the sample data into an initial dual-stream convolutional network model;
extracting first sample features from the first sample image and second sample features from the second sample image;
performing feature fusion on the first sample features and the second sample features to obtain sample fusion features;
obtaining a first modal prediction probability according to the first sample features, a second modal prediction probability according to the second sample features, and a fusion prediction probability according to the sample fusion features;
calculating a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier;
back-propagating a gradient to each layer of the initial dual-stream convolutional network model according to the loss function value, optimizing the parameters of the initial dual-stream convolutional network model, acquiring the next group of sample data from the sample set, and repeating the training process until the loss function is smaller than a preset value, whereupon the initial dual-stream convolutional network model is taken as the final dual-stream convolutional network model;
when the target samples corresponding to the first sample image and the second sample image do not have consistency on the time axis, setting the loss of the fusion-feature branch to 0; and when the target samples corresponding to the first sample image and the second sample image have consistency on the time axis, resuming fusion-feature training;
the calculating of a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier comprises:
determining a first intermediate value according to the living body category identifier and the first modal prediction probability;
determining a second intermediate value according to the living body category identifier and the second modal prediction probability;
determining a third intermediate value according to the living body category identifier and the fusion prediction probability;
determining a first adjustment factor and a second adjustment factor according to the first intermediate value and the second intermediate value;
substituting the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier into the following loss function formula to obtain the loss function value;
the loss function formula L is built on the cross-entropy term
L_CE = -log(m_t)
wherein m_t denotes the third intermediate value determined from y and m; m represents the fusion prediction probability, p represents the first modal prediction probability, q represents the second modal prediction probability, η represents the modality category identifier, y represents the living body category identifier, and λ, α, γ are preset parameters, wherein λ is greater than 0.5.
2. The living body detection method according to claim 1, characterized by further comprising:
when the sample data comprises only one of the first sample image and the second sample image, acquiring a first living body category of the target sample corresponding to that sample image;
acquiring a sample image whose living body category is consistent with the first living body category to serve as the missing second sample image or first sample image;
and setting the modality category identifier in the sample data formed by the first sample image and the second sample image to indicate that the target samples are inconsistent.
3. The living body detection method according to claim 1, wherein the acquiring the first image and the second image of the target object includes:
shooting the target object by using a binocular camera to obtain a first original image and a second original image;
carrying out channel transformation on the first original image to obtain a first transformed image;
carrying out channel transformation on the second original image to obtain a second transformed image;
performing image scaling on the first transformation image to obtain the first image;
and performing image scaling on the second transformation image to obtain the second image.
4. A living body detecting device, characterized by comprising:
the acquisition module is used for acquiring a first image and a second image of a target object, wherein the first image and the second image are images of different modalities;
the detection module is used for inputting the first image and the second image into a dual-stream convolutional network model, extracting a first object feature of the first image and a second object feature of the second image through the dual-stream convolutional network model, performing feature fusion on the first object feature and the second object feature to obtain a fusion feature, and determining whether the target object is a living body according to the fusion feature to obtain a determination result;
the determining module is used for taking the determination result as the detection and recognition result of the target object;
the training process of the dual-stream convolutional network model comprises:
obtaining a sample set, wherein the sample set comprises at least one group of sample data, the sample data comprises a first sample image, a second sample image, a modality category identifier and a living body category identifier, the living body category identifier is used for indicating whether a target sample corresponding to the first sample image and the second sample image is a living body, and the modality category identifier is used for indicating whether the target samples corresponding to the first sample image and the second sample image have consistency on a time axis;
performing the following training process on each group of sample data in the sample set in turn:
inputting the sample data into an initial dual-stream convolutional network model;
extracting first sample features from the first sample image and second sample features from the second sample image;
performing feature fusion on the first sample features and the second sample features to obtain sample fusion features;
obtaining a first modal prediction probability according to the first sample features, a second modal prediction probability according to the second sample features, and a fusion prediction probability according to the sample fusion features;
calculating a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier;
back-propagating a gradient to each layer of the initial dual-stream convolutional network model according to the loss function value, optimizing the parameters of the initial dual-stream convolutional network model, acquiring the next group of sample data from the sample set, and repeating the training process until the loss function is smaller than a preset value, whereupon the initial dual-stream convolutional network model is taken as the final dual-stream convolutional network model;
when the target samples corresponding to the first sample image and the second sample image do not have consistency on the time axis, setting the loss of the fusion-feature branch to 0; and when the target samples corresponding to the first sample image and the second sample image have consistency on the time axis, resuming fusion-feature training;
the calculating of a loss function value based on the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier comprises:
determining a first intermediate value according to the living body category identifier and the first modal prediction probability;
determining a second intermediate value according to the living body category identifier and the second modal prediction probability;
determining a third intermediate value according to the living body category identifier and the fusion prediction probability;
determining a first adjustment factor and a second adjustment factor according to the first intermediate value and the second intermediate value;
substituting the first modal prediction probability, the second modal prediction probability, the fusion prediction probability, the modality category identifier and the living body category identifier into the following loss function formula to obtain the loss function value;
the loss function formula L is built on the cross-entropy term
L_CE = -log(m_t)
wherein m_t denotes the third intermediate value determined from y and m; m represents the fusion prediction probability, p represents the first modal prediction probability, q represents the second modal prediction probability, η represents the modality category identifier, y represents the living body category identifier, and λ, α, γ are preset parameters, wherein λ is greater than 0.5.
5. A living body detecting apparatus, characterized by comprising: a processor and a memory; the processor is configured to execute a living body detection program stored in the memory to implement the living body detection method according to any one of claims 1 to 3.
6. A storage medium storing one or more programs which when executed implement the living body detection method of any of claims 1-3.
CN202110927106.6A 2021-08-12 2021-08-12 Living body detection method, living body detection device, living body detection equipment and storage medium Active CN113642639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110927106.6A CN113642639B (en) 2021-08-12 2021-08-12 Living body detection method, living body detection device, living body detection equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110927106.6A CN113642639B (en) 2021-08-12 2021-08-12 Living body detection method, living body detection device, living body detection equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113642639A CN113642639A (en) 2021-11-12
CN113642639B (en) 2024-03-01

Family

ID=78421254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110927106.6A Active CN113642639B (en) 2021-08-12 2021-08-12 Living body detection method, living body detection device, living body detection equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113642639B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581838B (en) * 2022-04-26 2022-08-26 阿里巴巴达摩院(杭州)科技有限公司 Image processing method and device and cloud equipment
CN114677573B (en) * 2022-05-30 2022-08-26 上海捷勃特机器人有限公司 Visual classification method, system, device and computer readable medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034102A (en) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 Human face in-vivo detection method, device, equipment and storage medium
CN110070030A (en) * 2019-04-18 2019-07-30 北京迈格威科技有限公司 Image recognition and the training method of neural network model, device and system
CN110348319A (en) * 2019-06-18 2019-10-18 武汉大学 A kind of face method for anti-counterfeit merged based on face depth information and edge image
CN110889312A (en) * 2018-09-07 2020-03-17 北京市商汤科技开发有限公司 Living body detection method and apparatus, electronic device, computer-readable storage medium
CN111368601A (en) * 2018-12-26 2020-07-03 北京市商汤科技开发有限公司 Living body detection method and apparatus, electronic device, and computer-readable storage medium
CN111597918A (en) * 2020-04-26 2020-08-28 北京金山云网络技术有限公司 Training and detecting method and device of human face living body detection model and electronic equipment
CN112489092A (en) * 2020-12-09 2021-03-12 浙江中控技术股份有限公司 Fine-grained industrial motion mode classification method, storage medium, equipment and device
WO2021114633A1 (en) * 2020-05-20 2021-06-17 平安科技(深圳)有限公司 Image confidence determination method, apparatus, electronic device, and storage medium


Also Published As

Publication number Publication date
CN113642639A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN109948408B (en) Activity test method and apparatus
US10650261B2 (en) System and method for identifying re-photographed images
WO2020207189A1 (en) Method and device for identity authentication, storage medium, and computer device
CN106599772B (en) Living body verification method and device and identity authentication method and device
US20210287026A1 (en) Method and apparatus with liveness verification
US10817705B2 (en) Method, apparatus, and system for resource transfer
TWI766201B (en) Methods and devices for biological testing and storage medium thereof
WO2022206319A1 (en) Image processing method and apparatus, and device, storage medium and computer program product
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
JP6969663B2 Devices and methods for identifying the user's imaging device
CN109886223B (en) Face recognition method, bottom library input method and device and electronic equipment
CN111339897B (en) Living body identification method, living body identification device, computer device, and storage medium
CN111104833A (en) Method and apparatus for in vivo examination, storage medium, and electronic device
CN111767879A (en) Living body detection method
CN111582155A (en) Living body detection method, living body detection device, computer equipment and storage medium
CN114387548A (en) Video and liveness detection method, system, device, storage medium and program product
WO2021166289A1 (en) Data registration device, biometric authentication device, and recording medium
CN108875472B (en) Image acquisition device and face identity verification method based on image acquisition device
Chugh et al. Fingerprint spoof detection: Temporal analysis of image sequence
CN112651333B (en) Silence living body detection method, silence living body detection device, terminal equipment and storage medium
US11087121B2 (en) High accuracy and volume facial recognition on mobile platforms
CN113033305A (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN112883831A (en) Living body detection method and device, electronic equipment and storage medium
CN112597810A (en) Identity document authentication method and system
CN115147705B (en) Face copying detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant