CN116704578A - Face image joint detection method, device, computer equipment and storage medium - Google Patents

Face image joint detection method, device, computer equipment and storage medium

Info

Publication number
CN116704578A
CN116704578A (application CN202310679219.8A)
Authority
CN
China
Prior art keywords
quality
face
living body
detection
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310679219.8A
Other languages
Chinese (zh)
Inventor
陈俊逸
汤继敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Xiaogu Technology Co ltd
Original Assignee
Changsha Xiaogu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Xiaogu Technology Co ltd filed Critical Changsha Xiaogu Technology Co ltd
Priority to CN202310679219.8A
Publication of CN116704578A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to the technical field of computer vision, and provides a face image joint detection method, apparatus, computer device, and storage medium. The method comprises the following steps: inputting a face image to be detected into a quality living body joint detection model, where the quality living body joint detection model is obtained by training based on model distillation and a trained face quality prediction model; and, after the backbone network of the quality living body joint detection model processes the face image to be detected, inputting the result into a quality detection branch and a living body detection branch to obtain a quality detection result and a living body detection result of the face image. The method improves both running speed and detection accuracy.

Description

Face image joint detection method, device, computer equipment and storage medium
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a face image joint detection method, a face image joint detection device, computer equipment and a storage medium.
Background
With the development of computer vision technology, face recognition has been widely applied in many scenarios, such as smart locks, access control, and face payment. To ensure sufficient security in such scenarios with high safety requirements, it is common to perform not only face quality detection but also face living body detection.
Face quality detection analyzes a face image to judge whether it meets quality requirements. Face living body detection judges whether a face image shows a real person. However, existing face recognition systems mainly perform face quality detection and face living body detection as two separate tasks, which reduces running speed and detection accuracy.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a face image joint detection method, apparatus, computer device, and storage medium capable of improving running speed and detection accuracy.
The application provides a face image joint detection method, which comprises the following steps:
inputting a face image to be detected into a quality living body joint detection model, wherein the quality living body joint detection model comprises a backbone network, a quality detection branch and a living body detection branch, and is obtained by training based on model distillation and a trained face quality prediction model;
and, after the backbone network of the quality living body joint detection model processes the face image to be detected, inputting the result into the quality detection branch and the living body detection branch to obtain a quality detection result and a living body detection result of the face image.
In one embodiment, the step of inputting, after the backbone network of the quality living body joint detection model processes the face image to be detected, the result into the quality detection branch and the living body detection branch to obtain the quality detection result and the living body detection result of the face image includes:
inputting the face image features output by the backbone network into the fully connected layers of the quality detection branch and the living body detection branch, respectively, to obtain face quality features and face living body features;
inputting the face quality features into the output layer of the quality detection branch and outputting a face quality score;
and performing feature fusion on the face quality features and the face living body features, inputting the fused features into the output layer of the living body detection branch, and outputting a face living body score.
In one embodiment, the output layer of the quality detection branch includes at least one fully connected layer and Softmax layer, each fully connected layer and Softmax layer corresponding to a different face quality parameter, the face quality parameters including at least one of angle, occlusion, blur, and expression;
the output layer of the living body detection branch includes one fully connected layer and one Softmax layer.
In one embodiment, inputting the face quality features and the face living body features into the output layer of the living body detection branch after feature fusion includes:
performing one vector outer-product operation and one vector flattening operation on the face quality features and the face living body features to obtain a fusion feature;
and inputting the fusion feature into the output layer of the living body detection branch.
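The fusion step above can be sketched as follows; the feature dimensions are illustrative, not taken from the patent:

```python
import numpy as np

def fuse_features(live_feat, quality_feat):
    """Fuse the living body and quality feature vectors with one vector
    outer-product operation followed by one flattening operation."""
    fusion = np.outer(live_feat, quality_feat)  # (d_live, d_quality) matrix
    return fusion.flatten()                     # expand back into a 1-D vector

# Illustrative dimensions only
live = np.random.rand(8)
quality = np.random.rand(8)
fused = fuse_features(live, quality)
print(fused.shape)  # (64,)
```

The fused vector is then fed to the fully connected layer and Softmax layer of the living body detection branch.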
In one embodiment, the training process of the quality living body joint detection model comprises the following steps:
constructing a training data set, the training data set comprising a real face data set and a fake face data set; the real face data set is annotated with quality labels and living body labels; the fake face data set is annotated with living body labels;
training a face quality prediction model with the real face data set, and inputting the training data set into the trained face quality prediction model to obtain face quality prediction outputs;
iteratively training the quality living body joint detection model with the training data set; the quality detection branch uses a model distillation loss function, which computes the quality loss from the face quality prediction output and the face quality training output of the quality detection branch; the total training loss of the quality living body joint detection model is the sum of the quality loss and the living body loss, and the weight coefficients of the quality loss and the living body loss are dynamically adjusted during training.
In one embodiment, dynamically adjusting the weight coefficients of the quality loss and the living body loss includes: after a preset number of epoch iterations, reducing the weight coefficient by a preset value until it reaches a target value.
In one embodiment, before inputting the face image to be detected into the quality living body joint detection model, the method further includes: expanding, cropping, and scaling the face frame in the face image to obtain the face image to be detected.
A face image joint detection apparatus comprising:
an input module for inputting a face image to be detected into a quality living body joint detection model, the quality living body joint detection model being obtained by training based on model distillation and a trained face quality prediction model;
and a detection module for inputting, after the backbone network of the quality living body joint detection model processes the face image to be detected, the result into a quality detection branch and a living body detection branch to obtain a quality detection result and a living body detection result of the face image.
The application also provides a computer device comprising a processor and a memory, the memory storing a computer program, the processor implementing the steps of the face image joint detection method of any one of the above when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the face image joint detection method of any one of the above.
According to the face image joint detection method, apparatus, computer device, and storage medium, face detection is performed on the face image through a multi-task quality living body joint detection model, so quality detection and living body detection are carried out simultaneously, which improves detection running speed. Moreover, since the quality living body joint detection model is obtained by training based on model distillation and a trained face quality prediction model, the correlation between the two tasks improves the expressive capacity of the neural network, and thereby the detection accuracy.
Drawings
Fig. 1 is a flow chart of a face image joint detection method in an embodiment.
FIG. 2 is a flow diagram of a method for obtaining fusion features in one embodiment.
FIG. 3 is a schematic diagram of the structure of a quality living body joint detection model in one embodiment.
FIG. 4 is a flow chart of a method for training a quality living body joint detection model in one embodiment.
Fig. 5 is a schematic structural diagram of a face quality prediction model in an embodiment.
FIG. 6 is a training schematic of a quality living body joint detection model in one embodiment.
Fig. 7 is a block diagram of a face image joint detection apparatus according to an embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The application provides a face image joint detection method, as shown in fig. 1, which comprises steps S101-S102.
S101, inputting a face image to be detected into a quality living body joint detection model; the quality living body joint detection model comprises a backbone network, a quality detection branch and a living body detection branch, and is obtained by training based on model distillation and a trained face quality prediction model;
S102, after the backbone network of the quality living body joint detection model processes the face image to be detected, the result is input into a quality detection branch and a living body detection branch to obtain a quality detection result and a living body detection result of the face image.
The face image to be detected is a face image which needs to be subjected to face detection at present. The quality living body joint detection model is a neural network model for face detection, and comprises a backbone network, a quality detection branch and a living body detection branch, and is obtained by training based on model distillation and a trained face quality prediction model.
Specifically, the quality living body joint detection model is deployed on a device after training. After the device acquires a face image to be detected, it calls the trained quality living body joint detection model and inputs the face image into it. The backbone network at the front end of the model processes the face image to be detected and feeds the result to the quality detection branch and the living body detection branch respectively. The quality detection branch then outputs the face quality detection result, and the living body detection branch outputs the face living body detection result.
In some embodiments, S102 comprises: inputting the face image features output by the backbone network into the fully connected layers of the quality detection branch and the living body detection branch, respectively, to obtain face quality features and face living body features; inputting the face quality features into the output layer of the quality detection branch and outputting a face quality score; and, after feature fusion of the face quality features and the face living body features, inputting the fused features into the output layer of the living body detection branch and outputting a face living body score.
Specifically, the quality detection branch and the living body detection branch each comprise a fully connected layer and an output layer. After the backbone network at the front end of the quality living body joint detection model processes the input face image to be detected to obtain the face image features, these features are input to the fully connected layers of the quality detection branch and the living body detection branch, respectively. The fully connected layer of the quality detection branch then outputs the face quality features, while the fully connected layer of the living body detection branch outputs the face living body features. The face quality features are input to the output layer of the quality detection branch, which outputs a face quality score as the quality detection result. The input to the output layer of the living body detection branch is the fusion feature: the face quality features and the face living body features are fused and then fed to this output layer, which outputs a face living body score as the living body detection result based on the fused living body and quality features.
In some embodiments, inputting the face quality features and the face living body features into the output layer of the living body detection branch after feature fusion, as shown in fig. 2, includes: performing one vector outer-product operation and one vector flattening operation on the face quality features and the face living body features to obtain a fusion feature; and inputting the fusion feature into the output layer of the living body detection branch. In the embodiment of the application, feature fusion uses one vector outer-product operation and one vector flattening operation to strengthen the features of the living body part, thereby improving the accuracy of living body detection.
In some embodiments, the face quality parameters may include at least one of angle, occlusion, blur, and expression, according to actual business requirements. The embodiment of the application therefore designs a corresponding output for each face quality parameter: the output layer of the quality detection branch comprises at least one fully connected layer and Softmax layer, each fully connected layer and Softmax layer corresponding to a different face quality parameter. For example, angle, occlusion, blur, and expression each have their own fully connected layer and Softmax layer.
In addition, since face living body detection only determines whether the face image shows a real person, the output layer of the living body detection branch comprises one fully connected layer and one Softmax layer.
In some embodiments, the backbone network of the quality living body joint detection model is preferably a lightweight network, such as ResNet, MobileNet, or ShuffleNet.
In some embodiments, to improve the image quality or to adapt the image size to the model, the method further includes, before inputting the face image to be detected into the quality living body joint detection model: expanding, cropping, and scaling the face frame in the face image to obtain the face image to be detected.
Specifically, the expansion factor, crop size, and scaled size of the face frame can be set according to actual business requirements, but must be consistent with the image preprocessing used during model training. For example, the face frame may be enlarged 1.5 times and then cropped, and the cropped image scaled to 96 × 96.
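The face-frame expansion step can be sketched as follows, using assumed (x1, y1, x2, y2) box coordinates; the subsequent crop and 96 × 96 resize would be done with any image library:

```python
def expand_face_box(box, img_w, img_h, scale=1.5):
    """Expand a face detection box about its center by `scale`,
    clamped to the image bounds. box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2          # box center
    w, h = (x2 - x1) * scale, (y2 - y1) * scale    # enlarged width/height
    nx1 = max(0, cx - w / 2)
    ny1 = max(0, cy - h / 2)
    nx2 = min(img_w, cx + w / 2)
    ny2 = min(img_h, cy + h / 2)
    return nx1, ny1, nx2, ny2

# A 100x100 box centered at (150, 150) in a 640x480 image
print(expand_face_box((100, 100, 200, 200), 640, 480))  # (75.0, 75.0, 225.0, 225.0)
```

The region returned here would then be cropped from the image and scaled to the model's input size.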
According to the face image joint detection method, the face image is subjected to face detection through the multitasking quality living body joint detection model, so that the quality detection and living body detection can be simultaneously carried out, and the detection running speed is improved. And the quality living body joint detection model is obtained by training based on model distillation and a trained face quality prediction model, so that the expression capacity of a neural network can be improved by utilizing the correlation between two tasks to improve the accuracy of the model, and the detection accuracy can be improved.
As shown in fig. 3, an embodiment of the present application provides a schematic structural diagram of the quality living body joint detection model; the face image joint detection method provided by the present application is described below with reference to fig. 3.
Referring to fig. 3, the backbone network in the quality living body joint detection model provided by the embodiment of the present application uses ResNet18(0.5), i.e., a smaller 18-layer ResNet in which 0.5 means the number of convolution channels in each layer is 0.5 times that of the standard ResNet. The quality detection branch comprises a fully connected layer FC, and its output layer comprises 4 fully connected layer FC and Softmax layer pairs, corresponding respectively to large angle, occlusion, blur, and large expression. The living body detection branch comprises a fully connected layer FC, a feature fusion layer, and a fully connected layer FC and Softmax layer in its output layer.
Specifically, the face image to be detected is scaled to 96 × 96 after image preprocessing and then input to the backbone network ResNet18(0.5) to obtain the face image features. The face image features are then input to the fully connected layers FC of the quality detection branch and the living body detection branch, respectively. In the quality detection branch, the face quality features output by the fully connected layer FC are input both to the four fully connected layer FC and Softmax layer pairs in the output layer and to the feature fusion layer of the living body detection branch. The four fully connected layer FC and Softmax layer pairs in the quality detection branch output layer respectively output a large-angle score, an occlusion score, a blur score, and a large-expression score, i.e., the quality detection result.
In the living body detection branch, the face living body features output by the fully connected layer FC are passed to the feature fusion layer, which fuses the face living body features and the face quality features and passes the result to the fully connected layer FC and Softmax layer in the output layer to obtain the living body score, i.e., the living body detection result.
In one embodiment, as shown in fig. 4, an embodiment of the present application provides a flow chart of a method for training the quality living body joint detection model, comprising S301-S303.
S301, constructing a training data set.
The training data set comprises a real face data set and a fake face data set. The real face data set is annotated with quality labels and living body labels; the fake face data set is annotated with living body labels.
Specifically, real-person face images are first collected as the real face data set. During collection, the quality parameters must be considered so that diverse real face images are obtained; for example, large-angle real face images, occluded real face images, clear real face images, and real face images with various expressions (mouth open, eyes closed, etc.) are collected separately. Then, the quality labels and the living body label are annotated on each real-person face image, with the living body label set to 1. The quality labels are annotated based on the quality parameters considered; for example, with 4 quality labels there are a large-angle label, an occlusion label, a blur label, and a large-expression label. The large-angle label is 1 if the face angle is large, otherwise 0. The occlusion label is 1 if the face is occluded, otherwise 0. The blur label is 1 if the face region is blurred, otherwise 0. The large-expression label is 1 if the eyes are closed, the mouth is open, and so on, otherwise 0.
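As a sketch, the labeling rules above can be encoded as a small helper; the field names are illustrative, not taken from the patent:

```python
def annotate_real_face(angle_large, occluded, blurred, big_expression):
    """Binary quality labels for a real-person face image; the living
    body label of every real-person image is set to 1 (fake images get 0)."""
    return {
        "angle": int(angle_large),          # 1 if the face angle is large
        "occlusion": int(occluded),         # 1 if the face is occluded
        "blur": int(blurred),               # 1 if the face region is blurred
        "expression": int(big_expression),  # 1 if eyes closed, mouth open, etc.
        "live": 1,                          # real-person images are always live
    }
```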
Then, fake face images are collected as the fake face data set. For example, a camera similar to that used for collecting real persons can be used to photograph printed face paper and played-back face videos as fake face images. These images are annotated with only the living body label, which is set to 0.
Finally, image preprocessing can be performed on all face images after data collection. For example, the face images in the training data set are detected with an existing face detection tool, and the resulting face detection frame is expanded 1.5 times, then cropped and stored for later use.
S302, training a face quality prediction model by using the real face data set, and inputting the training data set into the trained face quality prediction model to obtain face quality prediction output.
Specifically, after the training data set is constructed, the face quality prediction model is trained with the real face data set to obtain a trained face quality prediction model. The entire training data set is then input into the trained face quality prediction model to obtain face quality prediction outputs, which are used for training the quality living body joint detection model.
In some embodiments, as shown in fig. 5, a schematic structure diagram of a face quality prediction model is provided, and a training process of the face quality prediction model is described based on the structure of the face quality prediction model.
Specifically, after the training data set is built, the quality label attributes of the real face data set are used to train the face quality prediction model. First, the real face images in the real face data set are scaled to 224 × 224 and input into a 100-layer ResNet followed by a fully connected layer FC. Then, for each quality label, one fully connected layer and one Softmax layer are used for output. The loss function of the face quality prediction model training is:

L_quality_all = (1/m) · Σ_{i=1}^{m} [ L_cls(t_angle^{x_i}, o_angle^{x_i}) + L_cls(t_occlusion^{x_i}, o_occlusion^{x_i}) + L_cls(t_blur^{x_i}, o_blur^{x_i}) + L_cls(t_expression^{x_i}, o_expression^{x_i}) ]

where t denotes an annotated quality label and o the corresponding model output, m is the mini-batch size, x_i denotes the i-th image, L_cls denotes a classification loss, and L_quality_all is the overall loss of quality training. t_angle, t_occlusion, t_blur, and t_expression are the individual quality labels of face image x_i, and o_angle, o_occlusion, o_blur, and o_expression are the 4 outputs of the face quality prediction model.
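The quality training loss can be sketched in NumPy, with cross-entropy assumed as the per-head classification loss L_cls (the patent only names it a classification loss):

```python
import numpy as np

HEADS = ("angle", "occlusion", "blur", "expression")

def cross_entropy(t, o, eps=1e-12):
    """Classification loss for one head: t is a one-hot label vector,
    o is the Softmax output of that head."""
    return float(-np.sum(np.asarray(t) * np.log(np.asarray(o) + eps)))

def quality_training_loss(labels, outputs):
    """Sum the four per-head losses per image, averaged over the mini-batch.
    `labels` and `outputs` are lists of dicts keyed by HEADS."""
    m = len(labels)
    return sum(
        sum(cross_entropy(t[h], o[h]) for h in HEADS)
        for t, o in zip(labels, outputs)
    ) / m
```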
S303, iteratively training a quality living body joint detection model by using the training data set.
The entire training data set is used to train the quality living body joint detection model, and the quality detection branch is trained with a model distillation loss function. The model distillation loss function computes the quality loss from the face quality prediction output and the face quality training output of the quality detection branch. The living body detection branch uses a binary classification loss function. The total training loss of the quality living body joint detection model is the sum of the quality loss and the living body loss, and the weight coefficients of the two losses are dynamically adjusted during iterative training.
As shown in FIG. 6, a training schematic of the quality living body joint detection model is provided. The training process of the quality living body joint detection model is described below with reference to fig. 6.
Specifically, the quality living body joint detection model adopts a multi-task architecture and outputs a face quality score and a face living body score respectively. The face images in the training data set are first scaled to 224 × 224 and 96 × 96 respectively, the smaller size reducing the computation of the network. The 224 × 224 images are input into the trained face quality prediction model to obtain the face quality prediction outputs, while the 96 × 96 images are input into the backbone network ResNet18(0.5) of the quality living body joint detection model. The backbone network's output is then fed to the quality detection branch and the living body detection branch respectively.
In the quality detection branch, the quality loss uses model distillation, and the trained face quality prediction model is required to provide the quality targets. The quality loss L_quality is expressed as follows:

L_quality = (1/m) · Σ_{i=1}^{m} [ KL(q_angle^{x_i} ∥ p_angle^{x_i}) + KL(q_occlusion^{x_i} ∥ p_occlusion^{x_i}) + KL(q_blur^{x_i} ∥ p_blur^{x_i}) + KL(q_expression^{x_i} ∥ p_expression^{x_i}) ]

where m is the mini-batch size, x_i denotes the i-th image, q_angle, q_occlusion, q_blur, and q_expression are the outputs of the trained face quality prediction model, and p_angle, p_occlusion, p_blur, and p_expression are the outputs corresponding to the four quality parameters of the quality detection branch in the multi-task quality living body joint detection model, i.e., the face quality training outputs. KL denotes the Kullback-Leibler divergence, a loss function commonly used in model distillation; with the KL loss, a small model can learn the outputs of a large model.
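A minimal NumPy sketch of this distillation loss, assuming each head's teacher and student outputs are Softmax probability vectors:

```python
import numpy as np

HEADS = ("angle", "occlusion", "blur", "expression")

def kl_div(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two probability vectors."""
    p = np.asarray(p) + eps
    q = np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def distillation_quality_loss(teacher, student):
    """Per-image sum of the four per-head KL terms, averaged over the
    mini-batch; `teacher` and `student` are lists of dicts keyed by HEADS."""
    m = len(teacher)
    return sum(
        sum(kl_div(t[h], s[h]) for h in HEADS)
        for t, s in zip(teacher, student)
    ) / m
```

The loss is zero when the quality detection branch exactly reproduces the teacher's Softmax outputs, and positive otherwise.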
In the living body detection branch, since the living body task is difficult to train, the features of the living body part are strengthened by the feature fusion operation. After feature fusion, one fully connected layer FC and one Softmax layer are used for output. The living body loss requires the living body labels and can adopt a binary classification (cross-entropy) loss; the living body loss L_live is expressed as follows:

L_live = -(1/m) · Σ_{i=1}^{m} [ y^{x_i} · log(s^{x_i}) + (1 - y^{x_i}) · log(1 - s^{x_i}) ]

where y^{x_i} is the living body label of the i-th image and s^{x_i} is the living body score output by the living body detection branch.
total loss when training mass living body joint detection modelIs the sum of mass loss and living body loss:
wherein, the liquid crystal display device comprises a liquid crystal display device,is the weight coefficient of two losses. In the training process, every time m face images are input into a quality living body joint detection model, the total loss can be calculated only by using living body labels of the images>. And then, performing iterative optimization of the weight of the quality living body joint detection model by using a back propagation algorithm.
In some embodiments, since training two tasks simultaneously may lead to convergence difficulties, the value of the weight coefficient λ may be dynamically changed during training to increase convergence speed. The weight adjustment can first emphasize training of the quality detection part and then gradually increase the weight of the living body part. Specifically, after a preset number of epoch iterations, the weight coefficient is reduced by a preset value until it reaches a target value. For example, at the beginning λ is set to 1 and is then decreased by 0.1 after every 2 epoch iterations until it reaches the target value of 0.5.
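The example schedule above can be sketched as a function of the epoch index (0-based here, an assumption), with the weight on the living body part taken as the complement:

```python
def quality_weight(epoch, start=1.0, step=0.1, every=2, floor=0.5):
    """Weight coefficient on the quality loss: start at 1.0 and reduce
    it by 0.1 after every 2 epoch iterations until it reaches 0.5."""
    return max(floor, start - step * (epoch // every))

# The weight on the living body loss grows as 1 - quality_weight(epoch).
for e in range(0, 12, 2):
    print(e, quality_weight(e))
```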
In the embodiments of the present application, model distillation is used to assist the training of the quality task. The large quality model is trained first and is then used to guide the training of the small model, i.e. the quality living body joint detection model, which improves the accuracy of the small model.
It should be understood that although the steps in the flowcharts of fig. 1 and fig. 3 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 and fig. 3 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least part of the other steps or stages.
In one embodiment, as shown in fig. 7, there is provided a face image joint detection apparatus, including:
the input module 601 is configured to input a face image to be detected into a quality living body joint detection model, where the quality living body joint detection model is obtained by training based on model distillation and a trained face quality prediction model.
The detection module 602 is configured to process the face image to be detected through the backbone network of the quality living body joint detection model and input the result to the quality detection branch and the living body detection branch, to obtain a quality detection result and a living body detection result of the face image.
In one embodiment, the detection module 602 is further configured to input the face image features output by the backbone network to the full connection layers of the quality detection branch and the living body detection branch, respectively, to obtain a face quality feature and a face living body feature; input the face quality feature into the output layer of the quality detection branch and output a face quality score; and, after performing feature fusion on the face quality feature and the face living body feature, input the fused feature into the output layer of the living body detection branch and output a face living body score.
In one embodiment, the detection module 602 is further configured to perform feature fusion on the face quality feature and the face living feature to obtain a fused feature; the fusion features are input to the output layer of the living detection branch.
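Claim 4 of this application specifies the feature fusion as a vector outer product operation followed by a vector expansion (flattening) operation, which can be sketched as follows; the function name and feature dimensions are assumptions for illustration:

```python
import numpy as np

def fuse_features(quality_feat, live_feat):
    """Fuse a face quality feature and a face living body feature.

    Computes the vector outer product (d1, d2) and expands (flattens) it
    into a single fusion vector of length d1 * d2, a bilinear-style fusion.
    """
    return np.outer(quality_feat, live_feat).ravel()
```

Every pairwise interaction between the two feature vectors is preserved in the fused vector, which is then fed to the output layer of the living body detection branch.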
In one embodiment, the apparatus further comprises a training module configured to construct a training data set comprising a real face data set and a fake face data set, where the real face data set is labeled with quality labels and living body labels and the fake face data set is labeled with living body labels only; train a face quality prediction model with the real face data set, and input the training data set into the trained face quality prediction model to obtain the face quality prediction output; and iteratively train the quality living body joint detection model with the training data set. The quality detection branch adopts a model distillation loss function, which computes the quality loss from the face quality prediction output and the face quality training output of the quality detection branch. The total loss of the quality living body joint detection model training is the sum of the quality loss and the living body loss, and the weight coefficients of the quality loss and the living body loss are dynamically adjusted during training.
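A minimal sketch of the training-set construction just described: real faces carry both quality labels and living body labels, while fake faces carry only a living body label. The record layout and the label encoding (1 = live, 0 = fake) are assumptions for illustration:

```python
def build_training_set(real_faces, fake_faces):
    """Merge real faces (quality + living body labels) and fake faces
    (living body label only) into one training set.

    real_faces: iterable of (image_path, quality_labels) pairs.
    fake_faces: iterable of image paths.
    """
    records = []
    for img, quality in real_faces:
        records.append({"image": img, "quality": quality, "live": 1})
    for img in fake_faces:
        # fake faces have no quality annotation
        records.append({"image": img, "quality": None, "live": 0})
    return records
```

During joint training only the `live` field is needed (the quality targets come from the teacher model), while the quality labels of the real faces are used to train the teacher beforehand.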
In one embodiment, the training module is further configured to decrease the weight coefficient by a preset value after the preset number of epoch iterations until the weight coefficient is the target value.
In one embodiment, the apparatus further comprises a preprocessing module configured to enlarge the face frame in the face image and then crop and scale it to obtain the face image to be detected.
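The enlarge-crop-scale preprocessing could be sketched as below. The expansion ratio, output size, and nearest-neighbour scaling are all assumptions; the patent does not specify these values:

```python
import numpy as np

def preprocess_face(image, box, expand=0.2, out_size=(112, 112)):
    """Enlarge the detected face box, crop, then scale to the model input size.

    image: HxW (or HxWxC) array; box: (x1, y1, x2, y2) face frame.
    Uses nearest-neighbour scaling for simplicity.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    # enlarge the face frame by `expand` on each side, clipped to the image
    x1 = max(0, int(x1 - expand * bw)); y1 = max(0, int(y1 - expand * bh))
    x2 = min(w, int(x2 + expand * bw)); y2 = min(h, int(y2 + expand * bh))
    crop = image[y1:y2, x1:x2]
    # nearest-neighbour index maps for scaling to the target size
    ys = np.arange(out_size[0]) * crop.shape[0] // out_size[0]
    xs = np.arange(out_size[1]) * crop.shape[1] // out_size[1]
    return crop[np.ix_(ys, xs)]
```

In practice a library resize (e.g. bilinear) would normally replace the nearest-neighbour step; the point here is only the enlarge-then-crop-then-scale order.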
For the specific limitations of the face image joint detection apparatus, reference may be made to the limitations of the face image joint detection method above, which are not repeated here. The modules in the face image joint detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules. Based on this understanding, the present application may implement all or part of the procedures of the above embodiment methods by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above embodiments of the face image joint detection method. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.
In one embodiment, a computer device is provided, which may be a server, comprising a processor, a memory, and a network interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a face image joint detection method. For example, the computer program may be split into one or more modules, stored in the memory and executed by the processor to carry out the present application. The one or more modules may be a series of computer program instruction segments capable of performing particular functions, which describe the execution of the computer program in the computer device. The processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the computer device and connects the various parts of the whole computer device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created according to use, and the like. In addition, the memory may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
In another embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the face image joint detection method described in any of the foregoing embodiments when executing the computer program.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the face image joint detection method described in any one of the above embodiments.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A face image joint detection method, characterized by comprising the following steps:
inputting a face image to be detected into a quality living body joint detection model, wherein the quality living body joint detection model comprises a backbone network, a quality detection branch and a living body detection branch, and is obtained by training based on model distillation and a trained face quality prediction model;
and processing the face image to be detected by the backbone network of the quality living body joint detection model and then inputting the result into the quality detection branch and the living body detection branch, to obtain a quality detection result and a living body detection result of the face image.
2. The method according to claim 1, wherein the step of inputting the quality detection branch and the living body detection branch after the processing of the face image to be detected by the backbone network of the quality living body joint detection model to obtain the quality detection result and the living body detection result of the face image includes:
the face image features output by the backbone network model are respectively input to the quality detection branch and the full-connection layer of the living body detection branch to obtain face quality features and face living body features;
inputting the face quality characteristics into an output layer of the quality detection branch, and outputting face quality scores;
and after carrying out feature fusion on the face quality features and the face living body features, inputting the face quality features and the face living body features into an output layer of the living body detection branch, and outputting the face living body score.
3. The method of claim 2, wherein the output layer of the quality detection branch comprises at least one full connection layer and a Softmax layer, the full connection layer and the Softmax layer corresponding to different face quality parameters, the face quality parameters comprising at least one of angle, occlusion, blur, and expression;
the output layer of the living body detection branch comprises a full-connection layer and a Softmax layer.
4. The method according to claim 2, wherein the inputting the face quality feature and the face living body feature into the output layer of the living body detection branch after feature fusion includes:
performing a vector outer product operation and a vector expansion operation on the face quality feature and the face living body feature to obtain a fusion feature;
the fusion feature is input to an output layer of the living body detection branch.
5. The method of any one of claims 1-4, wherein the training process of the quality in vivo joint detection model comprises:
constructing a training data set, wherein the training data set comprises a real face data set and a fake face data set; the real face data set is labeled with quality labels and living body labels; the fake face data set is labeled with living body labels;
training a face quality prediction model by using the real face data set, and inputting the training data set into the trained face quality prediction model to obtain face quality prediction output;
iteratively training the quality living body joint detection model by using the training data set; the quality detection branch adopts a model distillation loss function, and the model distillation loss function calculates the quality loss based on the face quality prediction output and the face quality training output of the quality detection branch; the total loss of the quality living body joint detection model training is the sum of the quality loss and the living body loss, and the weight coefficients of the quality loss and the living body loss are dynamically adjusted in the training process.
6. The method of claim 5, wherein the dynamically adjusting the weight coefficients of the mass loss and the living loss comprises: and after the preset number of epoch iterations, reducing the weight coefficient by a preset value until the weight coefficient is a target value.
7. The method according to claim 1, further comprising, before inputting the face image to be detected into the quality living body joint detection model: and enlarging, cutting and scaling the face frame in the face image to obtain the face image to be detected.
8. A face image joint detection apparatus, comprising:
the input module is used for inputting the face image to be detected into a quality living body joint detection model, and the quality living body joint detection model is obtained by training based on model distillation and a trained face quality prediction model;
and a detection module, wherein the backbone network of the quality living body joint detection model processes the face image to be detected and then inputs the result into a quality detection branch and a living body detection branch, to obtain a quality detection result and a living body detection result of the face image.
9. A computer device comprising a processor and a memory, the memory storing a computer program, characterized in that the processor is configured to implement the face image joint detection method of any one of claims 1-7 when executing the computer program.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the face image joint detection method of any of claims 1-7.
CN202310679219.8A 2023-06-09 2023-06-09 Face image joint detection method, device, computer equipment and storage medium Pending CN116704578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310679219.8A CN116704578A (en) 2023-06-09 2023-06-09 Face image joint detection method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310679219.8A CN116704578A (en) 2023-06-09 2023-06-09 Face image joint detection method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116704578A true CN116704578A (en) 2023-09-05

Family

ID=87840576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310679219.8A Pending CN116704578A (en) 2023-06-09 2023-06-09 Face image joint detection method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116704578A (en)

Similar Documents

Publication Publication Date Title
US20210158533A1 (en) Image processing method and apparatus, and storage medium
WO2020186942A1 (en) Target detection method, system and device, storage medium and computer device
US10198624B2 (en) Segmentation-guided real-time facial performance capture
EP3933693B1 (en) Object recognition method and device
JP7305869B2 (en) Pedestrian detection method and device, computer readable storage medium and chip
CN113343826B (en) Training method of human face living body detection model, human face living body detection method and human face living body detection device
Rangesh et al. Driver gaze estimation in the real world: Overcoming the eyeglass challenge
WO2019201042A1 (en) Image object recognition method and device, storage medium, and electronic device
KR20200118076A (en) Biometric detection method and device, electronic device and storage medium
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
US11328418B2 (en) Method for vein recognition, and apparatus, device and storage medium thereof
CN112132847A (en) Model training method, image segmentation method, device, electronic device and medium
EP4006777A1 (en) Image classification method and device
CN111914748B (en) Face recognition method, device, electronic equipment and computer readable storage medium
EP4024270A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
CN112836653A (en) Face privacy method, device and apparatus and computer storage medium
CN112541394A (en) Black eye and rhinitis identification method, system and computer medium
WO2021217919A1 (en) Facial action unit recognition method and apparatus, and electronic device, and storage medium
EP4064215A2 (en) Method and apparatus for face anti-spoofing
US20210042510A1 (en) Adaptive hand tracking and gesture recognition using face-shoulder feature coordinate transforms
US20230093827A1 (en) Image processing framework for performing object depth estimation
CN116704578A (en) Face image joint detection method, device, computer equipment and storage medium
CN116363561A (en) Time sequence action positioning method, device, equipment and storage medium
CN115862095A (en) Adaptive sight line estimation method, system, electronic equipment and storage medium
CN111652051B (en) Face detection model generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination