WO2022126914A1

WO2022126914A1 - Living body detection method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022126914A1
Application number: PCT/CN2021/083721
Authority: WO
Inventors: Zhao Yalin (赵娅琳); Lu Jin (陆进); Chen Bin (陈斌); Liu Yuyu (刘玉宇)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)
Priority date: 2020-12-18
Filing date: 2021-03-30
Publication date: 2022-06-23
Also published as: CN112528908B; CN112528908A

Abstract

The present application relates to face recognition technology; disclosed is a living body detection method, comprising: acquiring an image set to be detected, using a face classification network to perform classification processing on the image set to be detected, and filtering to obtain a face image set; using a face positioning network to execute a face positioning operation on the face image set to obtain a facial area image set; using a living body detection network to perform living body detection processing on the facial area image set to obtain a plurality of detection results; and executing a weighted average of the plurality of detection results to obtain a living body detection result of the image set to be detected. The present application also relates to blockchain technology, as the image set to be detected can be stored in a blockchain node. Also disclosed in the present application are a living body detection apparatus, an electronic device, and a storage medium. The present application can reduce the computing resources consumed during living body detection and increase the accuracy of living body detection.

Description

Living body detection method, device, electronic device and storage medium

This application claims priority to Chinese patent application No. CN202011508401.X, entitled "Method, Device, Electronic Device and Storage Medium for Living Body Detection" and filed with the China Patent Office on December 18, 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of financial technology, and in particular, to a method, apparatus, electronic device, and computer-readable storage medium for detecting a living body.

Background Art

As face recognition, unlocking and similar technologies are widely used in finance, access control, mobile devices and other scenarios, face forgery techniques have received increasing attention in recent years. In addition to identification, a properly working face recognition system must also provide liveness detection.

The inventor found that general living body detection methods mainly rely on traditional image recognition algorithms: after locating the face in the image, they apply a variety of judgment models to discriminate living bodies, which consumes a lot of computing resources and yields low accuracy.

SUMMARY OF THE INVENTION

The living body detection method provided by this application includes:

Obtaining a set of images to be detected, using a face classification network to classify the set of images to be detected, and screening to obtain a set of face images;

Use a face localization network to perform a face localization operation on the face image set to obtain a face region image set;

Using a living body detection network to perform living body detection processing on the face region image set to obtain multiple detection results;

A weighted average is performed on the multiple detection results to obtain a living body detection result of the image set to be detected.
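The four steps form a simple filter-then-score pipeline. The sketch below wires them together in Python with the three networks passed in as plain callables. `detect_liveness` and its parameters are hypothetical names, and the sketch assumes (the summary does not say) that the "multiple detection results" are one liveness probability per face region, averaged with one preset weight each and normalised by the weight sum.

```python
from typing import Callable, List, Sequence, TypeVar

Img = TypeVar("Img")  # stands for whatever image type the networks accept


def detect_liveness(
    images: Sequence[Img],
    is_face: Callable[[Img], bool],         # face classification network (hypothetical wrapper)
    locate_face: Callable[[Img], Img],      # face localization network (hypothetical wrapper)
    liveness_prob: Callable[[Img], float],  # living body detection network (hypothetical wrapper)
    weights: Sequence[float],               # preset weights, one per detection result
) -> float:
    # S10: keep only the images the classifier judges to contain a face.
    face_images: List[Img] = [img for img in images if is_face(img)]
    # S20: crop each face image down to its face region.
    face_regions = [locate_face(img) for img in face_images]
    # S30: one liveness probability per face region (assumed meaning of "multiple results").
    results = [liveness_prob(region) for region in face_regions]
    if not results or len(results) != len(weights):
        raise ValueError("expected one preset weight per detection result")
    # S40: weighted average, normalised by the weight sum so it stays in [0, 1].
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)
```

For example, with four input images of which one contains no face, only three probabilities reach the weighted average, so three weights are needed.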

The present application also provides a device for detecting a living body, the device comprising:

a face classification module, used to obtain a set of images to be detected, and to classify and process the set of images to be detected by using a face classification network, and to obtain a set of face images after screening;

a face localization module, used for using a face localization network to perform a face localization operation on the face image set to obtain a face region image set;

a living body detection module, configured to perform living body detection processing on the face region image set by using a living body detection network to obtain multiple detection results;

a result generating module, configured to perform a weighted average of the multiple detection results to obtain the living body detection result of the image set to be detected.

The present application also provides an electronic device, the electronic device comprising:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

The memory stores computer program instructions executable by the at least one processor, and the computer program instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the living body detection method described above.

The present application also provides a computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the living body detection method described above are implemented.

Description of Drawings

FIG. 1 is a schematic flowchart of the model training method in the living body detection method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for performing living body detection on an image set to be detected using the trained model, provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the modules of a living body detection device provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of an internal structure of an electronic device for implementing a method for detecting a living body provided by an embodiment of the present application.

The objectives, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description of Embodiments

It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

Embodiments of the present application provide a living body detection method. The executing subject of the method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the living body detection method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a model training method in a living body detection method provided by an embodiment of the present application. In this embodiment, the model training method includes:

S1. Build a face living body judgment model including the face classification network, the face positioning network, and the living body detection network.

In the embodiment of this application, the face classification network may be a MobileNet network (a lightweight network for mobile terminals), the face localization network is a coarse-to-fine CNN (coarse-to-fine convolutional neural network), and the living body detection network is an SVM (Support Vector Machine) classifier with a linear kernel.
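The application specifies only that the living body detection network is a linear-kernel SVM, while step S30 later treats each detection result as a probability. A linear SVM natively outputs a signed decision value $f(x) = w \cdot x + b$; one common way to turn that into a probability is Platt scaling. The sketch shows both steps. Platt scaling is a swapped-in technique, not something the application states, and `w`, `b`, `a`, `c` stand for parameters that would come from training; the function names are hypothetical.

```python
import math
from typing import Sequence


def svm_decision(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear-kernel SVM decision value f(x) = w . x + b (w and b assumed already trained)."""
    if len(x) != len(w):
        raise ValueError("feature and weight vectors differ in length")
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def platt_probability(f: float, a: float = -1.0, c: float = 0.0) -> float:
    """Platt scaling P(live | f) = 1 / (1 + exp(a*f + c)); a and c are normally fitted on held-out data."""
    z = a * f + c
    if z >= 0:
        e = math.exp(-z)  # exp(-z) <= 1 here, so no overflow
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))  # z < 0 here, so exp(z) < 1
```

With the default `a = -1`, larger decision values map monotonically to higher liveness probabilities.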

S2, using the training image set to train the face classification network.

In the embodiment of the present application, the training image set includes a plurality of photos containing human faces.

In detail, the present application uses the face classification network to classify the training image set to obtain a face training set, and uses the following first loss function to calculate the first loss value $L_{cls}$ between the face training set and the preset real face label:

where α and β are the hyperparameters of the first loss function, $Y_{x,y}$ represents the gray value at coordinates (x, y) in the true label, its predicted counterpart represents the gray value at coordinates (x, y) in the face training set, and N is the number of samples in the face training set.
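The first loss function itself is not reproduced in this text. The variables it names (hyperparameters α and β, a per-pixel label $Y_{x,y}$, its prediction, and a normaliser N) match the penalty-reduced focal loss used by CenterNet, so the sketch below assumes that form; treat it as an illustrative guess, not the application's formula. The defaults α = 2 and β = 4 are CenterNet's, not values given here, and `focal_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def focal_loss(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]],
               num_samples: int, alpha: float = 2.0, beta: float = 4.0,
               eps: float = 1e-7) -> float:
    """CenterNet-style penalty-reduced focal loss over a 2-D map (assumed form of L_cls)."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clip so both log terms stay finite
            if y == 1.0:
                # positive location: down-weight easy, already-confident predictions
                total += (1.0 - p) ** alpha * math.log(p)
            else:
                # negative location: (1 - Y)^beta softens the penalty near a true peak
                total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return -total / num_samples  # N from the text normalises the sum
```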

When the first loss value $L_{cls}$ is greater than the first standard value, the embodiment of the present application adjusts the parameters of the face classification network and re-executes the training of the face classification network using the training image set, until the number of training iterations of the face classification network reaches the first preset number of times.

S3. When the number of training times of the face classification model reaches a first preset number of times, use the training image set to jointly train the face classification network and the face location network.

The embodiment of the present application uses the face classification network to classify the training image set to generate a face training set, uses the face localization network to locate the face regions in the face training set to obtain a face region image set, and calculates the face scale set and face position offset set of the face region image set. Further, the embodiment of the present application uses the following joint loss function to calculate the joint loss value $L_{det}$ between the face region image set and the preset face region label:

$L_{det} = L_{cls} + \lambda_{size} L_{size} + \lambda_{off} L_{off}$

where $L_{size}$ is the face scale loss value, $|A \cap B|$ is the area of intersection between a picture in the face scale set and the corresponding picture in the real scale set, $|A \cup B|$ is the area of their union, $|A_c|$ is the area of the smallest closure (enclosing region) containing both pictures, $L_{off}$ is the face position offset loss value, and x is the difference between the k-th real position offset and the predicted face position offset. In addition, $\lambda_{size}$ and $\lambda_{off}$ are preset weights; in the embodiment of the present application, $\lambda_{size} = 1$ and $\lambda_{off} = 0.1$.

When the joint loss value $L_{det}$ is greater than the second standard value, the embodiment of the present application adjusts the parameters of the face classification network and the face location network, and re-executes the joint training of the face classification network and the face location network using the training image set, until the number of training iterations reaches a second preset number of times.
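As a purely illustrative worked example of the joint loss (the component loss values below are hypothetical, not taken from the application), using the embodiment's weights $\lambda_{size} = 1$ and $\lambda_{off} = 0.1$:

```latex
\begin{aligned}
L_{det} &= L_{cls} + \lambda_{size} L_{size} + \lambda_{off} L_{off} \\
        &= 0.5 + 1 \times 0.2 + 0.1 \times 1.0 = 0.8
\end{aligned}
```

If the second standard value were, say, 0.5 (also hypothetical), this $L_{det}$ would exceed it, so the two networks would be adjusted and trained again. With $\lambda_{off} = 0.1$, an offset loss of 1.0 contributes only 0.1 to the total.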

S4. When the number of joint training iterations of the face classification network and the face location network reaches the second preset number of times, use the training image set to jointly train the face classification network, the face location network and the living body detection network, obtaining a trained face living body judgment model that includes the face classification network, the face localization network and the living body detection network.
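Steps S2 to S4 amount to a three-stage schedule: the classifier alone, then classifier plus localizer, then all three networks. A minimal sketch, assuming each stage is supplied as a hypothetical callable that performs one parameter update and returns the current loss. Our reading is that stages S2 and S3 stop early once the loss is at or below their standard value and otherwise after their preset number of iterations; S4 stops at the loss threshold, with a safety cap (`max_joint_steps`) that the application does not mention. All names are hypothetical.

```python
from typing import Callable, Dict

LossStep = Callable[[], float]  # performs one parameter update and returns the resulting loss


def run_stage(step: LossStep, standard: float, max_steps: int) -> int:
    """Repeat `step` while its loss exceeds `standard`, at most `max_steps` times; return the count."""
    for count in range(1, max_steps + 1):
        if step() <= standard:
            return count
    return max_steps


def staged_training(stage1: LossStep, stage2: LossStep, stage3: LossStep,
                    first_standard: float, first_count: int,
                    second_standard: float, second_count: int,
                    threshold: float, max_joint_steps: int = 10_000) -> Dict[str, int]:
    """Run S2, S3 and S4 in order and report how many updates each stage used."""
    steps = {"S2": run_stage(stage1, first_standard, first_count)}  # face classification only
    steps["S3"] = run_stage(stage2, second_standard, second_count)  # classification + localization
    steps["S4"] = run_stage(stage3, threshold, max_joint_steps)     # all three networks
    return steps
```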

In detail, S4 includes:

Use the face classification network to classify the training image set to obtain a face training set, and calculate the first loss value between the face training set and a preset real label;

Use the face localization network to locate the face training set to obtain a face region training map, calculate the face scale set and face position offset set of the face region training map, and calculate the second loss value between the face scale set and the preset real scale set and the third loss value between the face position offset set and the preset real position offset set;

Use the living body detection network to detect the training image set to obtain a predicted living body detection set, and calculate a fourth loss value between the predicted living body detection set and a preset real living body detection set;

Process the first loss value, the second loss value, the third loss value and the fourth loss value in series using preset weights to obtain a joint loss value. If the joint loss value is greater than the preset threshold, adjust and update the face classification network, the face location network and the living body detection network until the joint loss value is less than or equal to the preset threshold, thereby obtaining the trained face living body judgment model including the face classification network, the face positioning network and the living body detection network.

As described above, calculating the first loss value between the face training set and the preset real label includes:

The first loss value is calculated using the following first loss function:

Further, as described above, calculating the second loss value between the face scale set and the preset real scale set includes:

The second loss value $L_{size}$ is calculated using the following second loss function:

where $|A \cap B|$ is the area of intersection between the picture in the face scale set and the picture in the real scale set, $|A \cup B|$ is the area of their union, and $|A_c|$ is the area of the smallest closure (enclosing region) containing the picture in the face scale set and the picture in the real scale set.
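The second loss function is not reproduced in this text. Its three named quantities (intersection area, union area, and the smallest closure $A_c$) are exactly those of the Generalized IoU (GIoU) loss, $1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|A_c| - |A \cup B|}{|A_c|}\right)$, so the sketch assumes that form and represents each "picture" in the scale sets as an axis-aligned box (x1, y1, x2, y2) with non-zero area. Both choices are assumptions, and `giou_loss` and `area` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def area(b: Box) -> float:
    """Area of an axis-aligned box; an empty (inverted) box has area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou_loss(pred: Box, true: Box) -> float:
    # |A ∩ B|: the overlap rectangle (zero area when the boxes do not overlap)
    inter = area((max(pred[0], true[0]), max(pred[1], true[1]),
                  min(pred[2], true[2]), min(pred[3], true[3])))
    union = area(pred) + area(true) - inter  # |A ∪ B|
    # |A_c|: the smallest axis-aligned box enclosing both boxes
    enclose = area((min(pred[0], true[0]), min(pred[1], true[1]),
                    max(pred[2], true[2]), max(pred[3], true[3])))
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou
```

Unlike a plain IoU loss, the enclosure term still gives a useful signal when the predicted and real boxes do not overlap at all.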

Further, as described above, the calculation of the third loss value between the face position offset set and the preset real position offset set includes:

The third loss value $L_{off}$ is calculated using the following third loss function:

where x is the difference between the k-th real position offset and the predicted face position offset, and M is the number of samples in the real center offset map.
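Because the third loss is written in terms of a per-sample difference x and a sample count M, one natural reading is an averaged smooth L1 loss, $L_{off} = \frac{1}{M}\sum_k \mathrm{smoothL1}(x_k)$. That form is an assumption (CenterNet, for instance, uses a plain L1 loss for offsets), and the sketch treats each offset as a single scalar component; `smooth_l1` and `offset_loss` are hypothetical names.

```python
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Smooth L1: quadratic for |x| < 1, linear beyond, so large errors are not squared."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def offset_loss(true_offsets: Sequence[float], pred_offsets: Sequence[float]) -> float:
    """L_off = (1/M) * sum_k smoothL1(x_k), with x_k = real offset k minus predicted offset k."""
    if len(true_offsets) != len(pred_offsets) or not true_offsets:
        raise ValueError("need M >= 1 matching offset pairs")
    diffs = [t - p for t, p in zip(true_offsets, pred_offsets)]
    return sum(smooth_l1(x) for x in diffs) / len(diffs)
```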

Further, calculating the fourth loss value between the predicted live detection set and the preset real live detection set includes:

The fourth loss value is calculated using the following fourth loss function:

where the predicted term denotes the predicted live detection set, Y is the real live detection set, Q is the number of samples in the predicted live detection set, and λ represents an error factor.

Specifically, processing the first loss value, the second loss value, the third loss value and the fourth loss value in series using preset weights to obtain a joint loss value includes:

where L is the joint loss value, and $\lambda_{size}$ and $\lambda_{off}$ are preset weights, which may be 1 and 0.1 respectively.

Compare the joint loss value with the preset threshold. If the joint loss value is greater than the preset threshold, adjust and update the face classification network, the face location network and the living body detection network until the joint loss value is less than or equal to the preset threshold, thereby obtaining a trained face living body judgment model including the face classification network, the face localization network and the living body detection network.
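The combination formula for the four losses is also missing from this text. Since they are "processed in series" with preset weights and only $\lambda_{size}$ and $\lambda_{off}$ are named, the sketch below assumes the fourth (liveness) loss is added with unit weight to the three detection terms, and then applies the threshold comparison described above. The unit weight and the names `joint_loss` and `needs_update` are assumptions.

```python
def joint_loss(l_cls: float, l_size: float, l_off: float, l_live: float,
               lambda_size: float = 1.0, lambda_off: float = 0.1) -> float:
    """Assumed form: L = L_cls + lambda_size*L_size + lambda_off*L_off + L_live (unit weight on L_live)."""
    return l_cls + lambda_size * l_size + lambda_off * l_off + l_live


def needs_update(l_joint: float, threshold: float) -> bool:
    """Keep adjusting all three networks while the joint loss exceeds the preset threshold."""
    return l_joint > threshold
```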

Referring to FIG. 2, it is a schematic flowchart of a method for performing living body detection on an image set to be detected using the trained model according to an embodiment of the present application. In the embodiment of the present application, the living body detection method includes:

S10. Acquire a set of images to be detected, use the face classification network to classify the set of images to be detected, and filter to obtain a set of face images.

In this embodiment of the present application, the image set to be detected may include video frames in a face video captured by a camera. In one of the embodiments of the present application, the set of images to be detected and the like may be stored in a blockchain node.

S20, using the face localization network to perform a face localization operation on the face image set to obtain a face region image set.

In the embodiment of the present application, a face localization operation is performed on the face image set by using the face localization model to obtain the face scale set and the face position offset set. The face scale set is used to determine the approximate position of the face region in the face image set, the face position offset set is used to fine-tune the face scale set, and a face region image set is finally obtained.
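One way to read this coarse-to-fine step is CenterNet-style box decoding: a coarse face centre is taken from a cell of a downsampled map, the predicted offset refines that centre to sub-cell precision, and the predicted scale supplies the box width and height. The sketch assumes exactly that, plus a stride of 4 (CenterNet's default); none of these specifics, nor the name `decode_face_box`, come from the application.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def decode_face_box(center: Tuple[int, int], scale: Tuple[float, float],
                    offset: Tuple[float, float], stride: int = 4) -> Box:
    """Turn a coarse centre cell, a predicted face scale and a sub-cell offset into a box."""
    # Coarse position: a cell on a map downsampled by `stride`, refined by the offset.
    cx = (center[0] + offset[0]) * stride
    cy = (center[1] + offset[1]) * stride
    # Face scale gives the box width and height directly (assumed to be in input pixels).
    w, h = scale
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

The resulting box would then be used to crop the face region from the original frame.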

S30. Use the living body detection network to perform living body detection processing on the face region image set to obtain multiple detection results.

In the embodiment of the present application, the face region image set is input into the living body detection network for living body detection processing to obtain multiple detection results, where each detection result is the probability value that the input is determined to be a living body.

S40. Perform a weighted average on the multiple detection results to obtain a living body detection result of the image set to be detected.

In the embodiment of the present application, the weighted average of the multiple detection results is computed using the following preset weighting formula:

$P(cls) = a \cdot RA_{cls} + b \cdot Re_{cls} + c \cdot Rd_{cls}$

where P(cls) is the detection probability value; $RA_{cls}$, $Re_{cls}$ and $Rd_{cls}$ are the probability values of being determined as a living body after detection processing by the living body detection network; and a, b and c are preset weights.

Specifically, the detection probability value is compared with a preset detection threshold using a preset determination formula to obtain the living body detection result of the image set to be detected, as follows:

The determination formula is:

Wherein, y is the determination result, and N is the preset detection threshold.

Preferably, in the embodiment of the present application, N is 0.65.
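A minimal sketch of S40, assuming the determination formula (not reproduced in this text) labels the image set a living body when P(cls) reaches the threshold N. Whether the comparison is ≥ or > is an assumption, and the weights a, b, c below are placeholders chosen to sum to 1 so that P(cls) stays a probability; the application only says they are preset. The function names are hypothetical.

```python
def fuse_probabilities(ra_cls: float, re_cls: float, rd_cls: float,
                       a: float = 0.4, b: float = 0.3, c: float = 0.3) -> float:
    """P(cls) = a*RA_cls + b*Re_cls + c*Rd_cls; these a, b, c values are placeholders."""
    return a * ra_cls + b * re_cls + c * rd_cls


def determine(p_cls: float, threshold: float = 0.65) -> int:
    """y = 1 (living body) when P(cls) >= N, else 0; N = 0.65 follows the preferred embodiment."""
    return 1 if p_cls >= threshold else 0
```

For example, detection results of 0.9, 0.8 and 0.5 fuse to about 0.75 under these placeholder weights, which clears N = 0.65.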

In this embodiment of the present application, before living body detection is performed, a face image set containing faces is first screened from the image set to be detected by the face classification network, and a face region image set is then located from the face image set by the face localization network. Performing living body detection with the living body detection network only on the face region image set reduces the computing resources consumed during living body detection and improves its accuracy. Therefore, the living body detection method, apparatus and computer-readable storage medium proposed in the present application can improve the efficiency of living body detection and solve the problems that traditional image recognition algorithms consume a lot of computing resources and have low accuracy when performing living body detection.

As shown in FIG. 3, it is a schematic diagram of the modules of a living body detection device provided by an embodiment of the present application.

The living body detection apparatus 100 described in the present application may be installed in an electronic device. According to the implemented functions, the living body detection apparatus 100 may include a face classification module 101, a face positioning module 102, a living body detection module 103, and a result generation module 104. The modules described in this application may also be referred to as units, which refer to a series of computer program segments that can be executed by the processor of an electronic device and can perform fixed functions, and are stored in the memory of the electronic device.

In this embodiment, the functions of each module/unit are as follows:

The face classification module 101 is used to obtain a set of images to be detected, and to perform classification processing on the set of images to be detected by using a face classification network, and to obtain a set of face images after screening;

The face location module 102 is configured to use a face location network to perform a face location operation on the face image set to obtain a face area image set;

The living body detection module 103 is configured to use a living body detection network to perform a living body detection process on the face region image set to obtain multiple detection results;

The result generation module 104 is configured to perform a weighted average of the multiple detection results to obtain a living body detection result of the image set to be detected.

In the embodiment of the present application, when each module in the living body detection apparatus 100 is used, the living body detection method shown in FIG. 2 can be implemented, and the same beneficial effects can be produced, which will not be repeated here.

As shown in FIG. 4, it is a schematic structural diagram of an electronic device implementing the living body detection method of the present application.

The electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a living body detection program 12.

The memory 11 includes at least one type of readable storage medium, such as flash memory, a mobile hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the living body detection program 12, but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the living body detection program) and calling the data stored in the memory 11.

The bus may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (Extended industry standard architecture, EISA for short) bus or the like. The bus can be divided into address bus, data bus, control bus and so on. The bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.

FIG. 4 shows only an electronic device with some of its components. Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.

For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management and power consumption management. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like. The display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.

The living body detection program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run in the processor 10, can implement the steps of the living body detection method described above.

Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile; for example, the computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, a CD-ROM, computer memory, or read-only memory (ROM).

The present application also provides a computer-readable storage medium, which may be volatile or non-volatile and stores a computer program; when the computer program is executed by the processor of the electronic device, it can implement the steps of the living body detection method described above.

Further, the computer-usable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required by at least one function, and the like, and the stored data area may store data created during use, and the like.

In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; the division of the modules is only a logical functional division, and other divisions are possible in actual implementation.

The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the above-described exemplary embodiments, but that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.

Accordingly, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the application is defined by the appended claims rather than by the foregoing description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced in this application. Any reference signs in the claims should not be construed as limiting the claims involved.

Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Several units or means recited in a system claim may also be implemented by a single unit or means through software or hardware. Terms such as "first" and "second" are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from their spirit and scope.

Claims

1. A method for detecting a living body, wherein the method comprises:

Obtaining a set of images to be detected, using a face classification network to classify the set of images to be detected, and screening to obtain a set of face images;

Use a face localization network to perform a face localization operation on the face image set to obtain a face region image set;

Using a living body detection network to perform living body detection processing on the face region image set to obtain multiple detection results;

A weighted average is performed on the multiple detection results to obtain a living body detection result of the image set to be detected.

2. The living body detection method according to claim 1, wherein, before the image set to be detected is acquired, the method further comprises:

constructing a face living body judgment model including the face classification network, the face positioning network and the living body detection network;

using the training image set to train the face classification network;

When the number of training times of the face classification model reaches a first preset number of times, use the training image set to jointly train the face classification network and the face location network;

When the number of joint training times of the face classification network and the face location network reaches a second preset number of times, jointly train the face classification network, the face location network and the living body detection network using the training image set to obtain a trained face living body judgment model including the face classification network, the face localization network and the living body detection network.

3. The living body detection method according to claim 2, wherein jointly training the face classification network, the face localization network and the living body detection network using the training image set to obtain the trained face living body judgment model including the face classification network, the face positioning network and the living body detection network comprises:

Use the face classification network to classify the training image set to obtain a face training set, and calculate the first loss value between the face training set and a preset real label;

Use the face localization network to locate the face training set to obtain a face region training map, and calculate the face scale set and face position offset set of the face region training map, and calculate the the second loss value between the face scale set and the preset real scale set and calculating the third loss value between the face position offset set and the preset real position offset set;

Use the living body detection network to detect the face region training map to obtain a predicted live body detection set, and calculate a fourth loss value between the predicted live body detection set and a preset real live body detection set;

The first loss value, the second loss value, the third loss value and the fourth loss value are processed in series by using a preset weight to obtain a joint loss value;

If the joint loss value is greater than a preset threshold, adjust and update the face classification network, the face location network and the living body detection network until the joint loss value is less than or equal to the preset threshold When the training is completed, the face living body judgment model including the face classification network, the face positioning network and the living body detection network is obtained.
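The joint-loss step of claim 3 can be sketched as follows. The claim says only that the four loss values are combined using preset weights; this sketch assumes that means a weighted sum. `compute_losses` and `update` are hypothetical callbacks standing in for a forward pass and a parameter update of the three networks, and the `max_iters` cap is an added safeguard so the loop cannot run forever if the threshold is never reached.

```python
from typing import Callable, Sequence, Tuple


def joint_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the four loss values (assumed combination rule)."""
    if len(losses) != 4 or len(weights) != 4:
        raise ValueError("expected four loss values and four preset weights")
    return sum(w * loss for w, loss in zip(weights, losses))


def train_until_threshold(
    compute_losses: Callable[[], Sequence[float]],  # hypothetical: returns the four current loss values
    update: Callable[[], None],                     # hypothetical: one update of all three networks
    weights: Sequence[float],
    threshold: float,
    max_iters: int = 10_000,
) -> Tuple[int, float]:
    """Update while the joint loss exceeds `threshold`; return (iterations, final loss)."""
    for iteration in range(max_iters):
        loss = joint_loss(compute_losses(), weights)
        if loss <= threshold:
            return iteration, loss  # training is complete
        update()
    raise RuntimeError("joint loss did not reach the threshold within max_iters")
```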
The living body detection method according to claim 3, wherein calculating the first loss value between the face training set and the preset real label comprises:

calculating the first loss value $L_c$ using the following first loss function:

where α and β are hyperparameters of the first loss function, $Y_{x,y}$ represents the gray value at coordinates (x, y) in the real label, $\hat{Y}_{x,y}$ represents the gray value at coordinates (x, y) in the face training set, and N is the number of samples in the face training set.
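The formula image for $L_c$ is not reproduced in this text. Its variables (hyperparameters α and β, a real-label gray value $Y_{x,y}$, a predicted gray value at the same coordinates, and a normalizer N) match the penalty-reduced pixel-wise focal loss used for center heatmaps in CenterNet, so the sketch below assumes that form: $L_c = -\frac{1}{N}\sum_{x,y}\begin{cases}(1-\hat{Y}_{x,y})^{\alpha}\log\hat{Y}_{x,y} & Y_{x,y}=1\\(1-Y_{x,y})^{\beta}\hat{Y}_{x,y}^{\alpha}\log(1-\hat{Y}_{x,y}) & \text{otherwise}\end{cases}$. The defaults α = 2 and β = 4 are CenterNet's choices, not values stated in this document, and `eps` is an added guard against log(0).

```python
import math
from typing import Sequence


def center_focal_loss(
    pred: Sequence[Sequence[float]],    # predicted heatmap values in [0, 1]
    target: Sequence[Sequence[float]],  # real-label heatmap values in [0, 1]; 1 marks a face center
    num_samples: int,                   # N in the claim
    alpha: float = 2.0,                 # assumed default (CenterNet)
    beta: float = 4.0,                  # assumed default (CenterNet)
    eps: float = 1e-12,
) -> float:
    """Penalty-reduced focal loss, assumed to be the form of L_c."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    total = 0.0
    for pred_row, true_row in zip(pred, target):
        for p, y in zip(pred_row, true_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if y == 1.0:
                # Positive location: down-weight confident correct predictions.
                total += (1.0 - p) ** alpha * math.log(p)
            else:
                # Negative location: (1 - y)^beta reduces the penalty near a center.
                total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return -total / num_samples
```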
The living body detection method according to claim 3, wherein calculating the second loss value between the face scale set and the preset real scale set comprises:

calculating the second loss value $L_{size}$ using the following second loss function:

where $|A \cap B|$ is the intersection area between the picture in the face scale set and the picture in the real scale set, $|A \cup B|$ is the union area of the two pictures, and $|A_c|$ is the area of the smallest enclosing region (closure) containing both pictures.
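The variables in the $L_{size}$ definition (intersection area, union area, and the area $|A_c|$ of the smallest enclosing region) are those of the generalized IoU (GIoU) loss, so the sketch below assumes $L_{size} = 1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|A_c| - |A \cup B|}{|A_c|}\right)$ and represents each picture as an axis-aligned box `(x1, y1, x2, y2)`. Both the formula and the box representation are assumptions, since the formula image is not reproduced here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) with x1 <= x2, y1 <= y2


def box_area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def giou_loss(pred: Box, real: Box) -> float:
    """1 - GIoU between a predicted face box and a real box (assumed form of L_size)."""
    # |A n B|: overlap of the two boxes (zero if they do not overlap).
    inter_w = max(0.0, min(pred[2], real[2]) - max(pred[0], real[0]))
    inter_h = max(0.0, min(pred[3], real[3]) - max(pred[1], real[1]))
    inter = inter_w * inter_h
    # |A u B| = |A| + |B| - |A n B|.
    union = box_area(pred) + box_area(real) - inter
    # |A_c|: smallest axis-aligned box enclosing both boxes.
    enclose = (max(pred[2], real[2]) - min(pred[0], real[0])) * (
        max(pred[3], real[3]) - min(pred[1], real[1])
    )
    if union <= 0.0 or enclose <= 0.0:
        raise ValueError("boxes must have positive area")
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou
```

Unlike 1 - IoU, which stays at 1 for every pair of non-overlapping boxes, this loss keeps growing as disjoint boxes move apart; two unit squares separated by a one-unit gap give 4/3.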
The living body detection method according to claim 3, wherein calculating the third loss value between the face position offset set and the preset real position offset set comprises:

calculating the third loss value $L_{off}$ using the following third loss function:

where x is the difference between the K-th real position offset and the corresponding face position offset, and M is the number of samples in the real center offset map.
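The $L_{off}$ definition names only a per-sample difference x and a sample count M. A common loss written in exactly those terms is the Smooth L1 loss, so the sketch assumes $L_{off} = \frac{1}{M}\sum_{k=1}^{M}\mathrm{smooth}_{L1}(x_k)$ with $\mathrm{smooth}_{L1}(x) = 0.5x^2$ for $|x| < 1$ and $|x| - 0.5$ otherwise. Treat this as an assumption, not the patented formula; offsets are also flattened to scalar components for simplicity.

```python
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def offset_loss(pred_offsets: Sequence[float], real_offsets: Sequence[float]) -> float:
    """Mean Smooth L1 over M offset components (assumed form of L_off)."""
    if len(pred_offsets) != len(real_offsets) or not real_offsets:
        raise ValueError("need equally long, non-empty offset sequences")
    m = len(real_offsets)
    # x_k = k-th real offset minus the corresponding predicted face offset.
    return sum(smooth_l1(r - p) for p, r in zip(pred_offsets, real_offsets)) / m
```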
The living body detection method according to claim 3, wherein calculating the fourth loss value between the predicted living body detection set and the preset real living body detection set comprises:

calculating the fourth loss value using the following fourth loss function:

where $\hat{Y}$ is the predicted living body detection set, Y is the real living body detection set, Q is the number of samples in the predicted living body detection set, and λ represents an error factor.
A living body detection device, wherein the device comprises:

a face classification module, configured to obtain a set of images to be detected, classify the set of images to be detected using a face classification network, and filter the result to obtain a set of face images;

a face localization module, configured to perform a face localization operation on the face image set using a face localization network to obtain a face region image set;

a living body detection module, configured to perform living body detection processing on the face region image set using a living body detection network to obtain multiple detection results; and

a result generating module, configured to perform a weighted average of the multiple detection results to obtain the living body detection result for the image set to be detected.
An electronic device, wherein the electronic device comprises:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

The memory stores computer program instructions executable by the at least one processor, the computer program instructions being executed by the at least one processor to enable the at least one processor to perform the steps of:

obtaining an image set to be detected, classifying the image set to be detected using a face classification network, and filtering the result to obtain a face image set;

performing a face localization operation on the face image set using a face localization network to obtain a face region image set;

performing living body detection processing on the face region image set using a living body detection network to obtain multiple detection results; and

performing a weighted average of the multiple detection results to obtain a living body detection result for the image set to be detected.
The electronic device according to claim 9, wherein, before acquiring the image set to be detected, the steps further comprise:

constructing a face living body judgment model including the face classification network, the face localization network and the living body detection network;

training the face classification network using a training image set;

when the number of training iterations of the face classification network reaches a first preset number, jointly training the face classification network and the face localization network using the training image set;

when the number of joint training iterations of the face classification network and the face localization network reaches a second preset number, jointly training the face classification network, the face localization network and the living body detection network using the training image set, to obtain a trained face living body judgment model including the face classification network, the face localization network and the living body detection network.
The electronic device according to claim 10, wherein jointly training the face classification network, the face localization network and the living body detection network using the training image set to obtain the trained face living body judgment model including the face classification network, the face localization network and the living body detection network comprises:

classifying the training image set using the face classification network to obtain a face training set, and calculating a first loss value between the face training set and a preset real label;

locating faces in the face training set using the face localization network to obtain a face region training map, calculating a face scale set and a face position offset set of the face region training map, calculating a second loss value between the face scale set and a preset real scale set, and calculating a third loss value between the face position offset set and a preset real position offset set;

detecting the face region training map using the living body detection network to obtain a predicted living body detection set, and calculating a fourth loss value between the predicted living body detection set and a preset real living body detection set;

combining the first loss value, the second loss value, the third loss value and the fourth loss value using preset weights to obtain a joint loss value;

if the joint loss value is greater than a preset threshold, adjusting and updating the face classification network, the face localization network and the living body detection network until the joint loss value is less than or equal to the preset threshold, at which point training is complete and the face living body judgment model including the face classification network, the face localization network and the living body detection network is obtained.
The electronic device according to claim 11, wherein calculating the first loss value between the face training set and the preset real label comprises:

calculating the first loss value $L_c$ using the following first loss function:

where α and β are hyperparameters of the first loss function, $Y_{x,y}$ represents the gray value at coordinates (x, y) in the real label, $\hat{Y}_{x,y}$ represents the gray value at coordinates (x, y) in the face training set, and N is the number of samples in the face training set.
The electronic device according to claim 11, wherein calculating the second loss value between the face scale set and the preset real scale set comprises:

calculating the second loss value $L_{size}$ using the following second loss function:

where $|A \cap B|$ is the intersection area between the picture in the face scale set and the picture in the real scale set, $|A \cup B|$ is the union area of the two pictures, and $|A_c|$ is the area of the smallest enclosing region (closure) containing both pictures.
The electronic device according to claim 11, wherein calculating the third loss value between the face position offset set and the preset real position offset set comprises:

calculating the third loss value $L_{off}$ using the following third loss function:

where x is the difference between the K-th real position offset and the corresponding face position offset, and M is the number of samples in the real center offset map.
The electronic device according to claim 11, wherein calculating the fourth loss value between the predicted living body detection set and the preset real living body detection set comprises:

calculating the fourth loss value using the following fourth loss function:

where $\hat{Y}$ is the predicted living body detection set, Y is the real living body detection set, Q is the number of samples in the predicted living body detection set, and λ represents an error factor.
A computer-readable storage medium storing a computer program, wherein the computer program implements the following steps when executed by a processor:

obtaining a set of images to be detected, classifying the set of images to be detected using a face classification network, and filtering the result to obtain a set of face images;

performing a face localization operation on the face image set using a face localization network to obtain a face region image set;

performing living body detection processing on the face region image set using a living body detection network to obtain multiple detection results; and

performing a weighted average of the multiple detection results to obtain a living body detection result for the image set to be detected.
The computer-readable storage medium according to claim 16, wherein, before acquiring the image set to be detected, the steps further comprise:

constructing a face living body judgment model including the face classification network, the face localization network and the living body detection network;

training the face classification network using a training image set;

when the number of training iterations of the face classification network reaches a first preset number, jointly training the face classification network and the face localization network using the training image set;

when the number of joint training iterations of the face classification network and the face localization network reaches a second preset number, jointly training the face classification network, the face localization network and the living body detection network using the training image set, to obtain a trained face living body judgment model including the face classification network, the face localization network and the living body detection network.
The computer-readable storage medium according to claim 17, wherein jointly training the face classification network, the face localization network and the living body detection network using the training image set to obtain the trained face living body judgment model including the face classification network, the face localization network and the living body detection network comprises:

classifying the training image set using the face classification network to obtain a face training set, and calculating a first loss value between the face training set and a preset real label;

locating faces in the face training set using the face localization network to obtain a face region training map, calculating a face scale set and a face position offset set of the face region training map, calculating a second loss value between the face scale set and a preset real scale set, and calculating a third loss value between the face position offset set and a preset real position offset set;

detecting the face region training map using the living body detection network to obtain a predicted living body detection set, and calculating a fourth loss value between the predicted living body detection set and a preset real living body detection set;

combining the first loss value, the second loss value, the third loss value and the fourth loss value using preset weights to obtain a joint loss value;

if the joint loss value is greater than a preset threshold, adjusting and updating the face classification network, the face localization network and the living body detection network until the joint loss value is less than or equal to the preset threshold, at which point training is complete and the face living body judgment model including the face classification network, the face localization network and the living body detection network is obtained.
The computer-readable storage medium according to claim 18, wherein calculating the first loss value between the face training set and the preset real label comprises:

calculating the first loss value $L_c$ using the following first loss function:

where α and β are hyperparameters of the first loss function, $Y_{x,y}$ represents the gray value at coordinates (x, y) in the real label, $\hat{Y}_{x,y}$ represents the gray value at coordinates (x, y) in the face training set, and N is the number of samples in the face training set.
The computer-readable storage medium according to claim 18, wherein calculating the second loss value between the face scale set and the preset real scale set comprises:

calculating the second loss value $L_{size}$ using the following second loss function:

where $|A \cap B|$ is the intersection area between the picture in the face scale set and the picture in the real scale set, $|A \cup B|$ is the union area of the two pictures, and $|A_c|$ is the area of the smallest enclosing region (closure) containing both pictures.
The computer-readable storage medium according to claim 18, wherein calculating the third loss value between the face position offset set and the preset real position offset set comprises:

calculating the third loss value $L_{off}$ using the following third loss function:

where x is the difference between the K-th real position offset and the corresponding face position offset, and M is the number of samples in the real center offset map.
The computer-readable storage medium according to claim 18, wherein calculating the fourth loss value between the predicted living body detection set and the preset real living body detection set comprises:

calculating the fourth loss value using the following fourth loss function:

where $\hat{Y}$ is the predicted living body detection set, Y is the real living body detection set, Q is the number of samples in the predicted living body detection set, and λ represents an error factor.