WO2021068322A1 - Training method and apparatus for living body detection model, computer device, and storage medium - Google Patents

Training method and apparatus for living body detection model, computer device, and storage medium Download PDF

Info

Publication number
WO2021068322A1
Authority
WO
WIPO (PCT)
Prior art keywords
living body
target
candidate region
detected
position information
Prior art date
Application number
PCT/CN2019/116269
Other languages
French (fr)
Chinese (zh)
Inventor
赵娅琳
陆进
陈斌
宋晨
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021068322A1 publication Critical patent/WO2021068322A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40: Spoof detection, e.g. liveness detection
    • G06V40/45: Detection of the body part being alive
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/70: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in livestock or poultry

Definitions

  • This application relates to a training method and apparatus for a living body detection model, a computer device, and a storage medium.
  • Near-infrared living body detection uses infrared light in a spectral band different from that of visible light. It can be performed on near-infrared images without the user's cooperation (a "blind" measurement), which makes the living body detection algorithm less cumbersome and more accurate; while reducing production costs, it can also better protect the interests of the relevant users and enterprises.
  • The traditional near-infrared living body detection method is usually divided into two steps. First, a face detector detects the human face in the color picture formed under visible light; then the LBP features of the face are extracted at the corresponding position of the near-infrared image and input into a living body discriminator for living body judgment.
  • The inventors realized that, in this approach, each step is an independent task: the face detector and the living body discriminator must be trained separately, the fit between the models is low, and the accuracy of the living body discriminator is easily affected by the face detector, resulting in low accuracy of the trained model.
  • a training method, device, computer device, and storage medium of a living body detection model are provided.
  • a method for training a living body detection model comprising:
  • the initial living body detection model includes an initial candidate region generation network and an initial living body classification network;
  • the training samples corresponding to the second training sample set include a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
  • the color image is input into the first candidate region generation network to obtain current face candidate region position information, and the current face candidate region position information and the near-infrared image are input into the first living body classification network to obtain current living body position information;
  • a training device for a living body detection model comprising:
  • An initial model acquisition module for acquiring an initial living body detection model, the initial living body detection model including an initial candidate region generation network and an initial living body classification network;
  • the training sample acquisition module is used to acquire a first training sample set and a second training sample set; the training samples corresponding to the second training sample set include a color image, a near-infrared image corresponding to the color image, and a corresponding target Living body position information;
  • the first training module is configured to train the initial candidate region generation network according to the first training sample set until it converges to obtain the first candidate region generation network;
  • the second training module is configured to train the initial living body classification network according to the first candidate region generation network and the second training sample set until it converges to obtain a first living body classification network;
  • the input module is used to input the color image into the first candidate region generation network to obtain the current face candidate region position information, and to input the current face candidate region position information and the near-infrared image into the first living body classification network to obtain the current living body position information;
  • the parameter adjustment module is configured to adjust the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and to return to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network;
  • the living body detection model obtaining module trains the first living body classification network according to the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and obtains a trained target living body detection model according to the target candidate region generation network and the target living body classification network.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps:
  • the initial living body detection model includes an initial candidate region generation network and an initial living body classification network;
  • the training samples corresponding to the second training sample set include a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
  • the color image is input into the first candidate region generation network to obtain current face candidate region position information, and the current face candidate region position information and the near-infrared image are input into the first living body classification network to obtain current living body position information;
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the initial living body detection model includes an initial candidate region generation network and an initial living body classification network;
  • the training samples corresponding to the second training sample set include a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
  • the color image is input into the first candidate region generation network to obtain current face candidate region position information, and the current face candidate region position information and the near-infrared image are input into the first living body classification network to obtain current living body position information;
  • Fig. 1 is an application scenario diagram of a method for training a living body detection model according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a method for training a living body detection model according to one or more embodiments.
  • Fig. 3 is a schematic flowchart of steps for obtaining location information of a target face candidate area according to one or more embodiments.
  • Fig. 4 is a block diagram of a training device for a living body detection model according to one or more embodiments.
  • Fig. 5 is a block diagram of a computer device according to one or more embodiments.
  • the training method of the living body detection model provided in this application can be applied to the application environment as shown in FIG. 1.
  • The computer device 102 first obtains the initial living body detection model, which includes the initial candidate region generation network and the initial living body classification network, and trains the initial candidate region generation network according to the first training sample set until convergence to obtain the first candidate region generation network. It then trains the initial living body classification network according to the first candidate region generation network and the second training sample set to obtain the first living body classification network, and adjusts the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information to obtain the target candidate region generation network.
  • Next, the target candidate region generation network and the second training sample set are used to train the first living body classification network until convergence to obtain the target living body classification network, and finally the trained target living body detection model is obtained according to the target candidate region generation network and the target living body classification network. Further, after the computer device 102 obtains the target living body detection model through training, the model can be stored locally or sent to the computer device 104.
  • the computer device 102 and the computer device 104 may be, but are not limited to, various personal computers and notebook computers.
  • a training method of a living body detection model is provided. Taking the method applied to the above-mentioned computer device 102 as an example for description, the method includes the following steps:
  • Step 202 Obtain an initial living body detection model.
  • the initial living body detection model includes an initial candidate region generation network and an initial living body classification network.
  • The initial living body detection model may be a predetermined model used for living body detection that is to be trained; it may be a living body detection model that has not been trained at all or one whose training has not yet been completed.
  • the initial living body detection model includes the initial candidate region generation network and the initial living body classification network.
  • The initial candidate region generation network is used for training to obtain the target candidate region generation network, which extracts candidate regions from the input image; the initial living body classification network is used for training to obtain the target living body classification network, which performs living body classification according to the input image to obtain the living body detection result.
  • In some embodiments, step 202 further includes the following steps:
  • the network structure information of the initial live detection model can be determined. Specifically, since the initial living body detection model includes the initial candidate region generating network and the initial living body classification network, the network structure information of the initial candidate region generating network and the network structure information of the initial living body classification network can be determined respectively.
  • the initial candidate region generation network and the initial living body classification network can be various neural networks.
  • Specifically, it can be determined which kind of neural network each of the initial candidate region generation network and the initial living body classification network is, how many layers of neurons each includes, how many neurons are in each layer, the connection sequence relationships between the layers of neurons, what parameters each layer of neurons includes, the type of activation function corresponding to each layer of neurons, and so on. It is understandable that different types of neural network require different network structure information to be determined.
  • each network parameter of the initial candidate region generation network and the initial living body classification network in the initial living body detection model can be initialized.
  • Specifically, each network parameter of the initial candidate region generation network and the initial living body classification network may be initialized with different small random numbers. "Small" ensures that the network will not enter a saturated state due to excessive weights, which would cause training to fail, and "different" ensures that the network can learn normally.
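  • As a concrete illustration of this initialization strategy, the following minimal sketch uses PyTorch; this is an assumption, since the application does not name a framework, and the layer shapes here are hypothetical stand-ins rather than the networks described in this application.

```python
import torch.nn as nn

def init_small_random(module: nn.Module, std: float = 0.01) -> None:
    """Initialize weights with small, mutually different random numbers.

    Small values keep the network out of activation saturation at the start
    of training (avoiding training failure from excessive weights); drawing
    them randomly makes them differ so the network can learn normally.
    """
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Hypothetical stand-ins for the two sub-networks of the initial model.
initial_candidate_region_net = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 2, 1))
initial_liveness_net = nn.Sequential(
    nn.Conv2d(1, 16, 3), nn.ReLU(), nn.Conv2d(16, 2, 1))

initial_candidate_region_net.apply(init_small_random)
initial_liveness_net.apply(init_small_random)
```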
  • Step 204 Obtain a first training sample set and a second training sample set; the training samples corresponding to the second training sample set include a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information.
  • the first training sample set and the second training sample set are both labeled image sample sets containing human faces.
  • the training samples in the first training sample set include color images, target face images and corresponding target face candidate area location information.
  • the color images refer to the RGB images collected by the camera under natural light.
  • the target face image refers to the image corresponding to the face area in the color image
  • The target face candidate area location information refers to the position coordinates corresponding to the face area in the color image. It can be understood that the color image is the input data corresponding to the first training sample, and the target face image and the corresponding target face candidate area location information are the training labels corresponding to the first training sample.
  • The training sample corresponding to the second training sample set (hereinafter referred to as the second training sample) includes the color image, the near-infrared image corresponding to the color image, the target living body detection result, and the corresponding target living body position information. It can be understood that the target living body detection result and the corresponding target living body position information are the training labels corresponding to the second training sample.
  • The target living body detection result is used to characterize whether the face in the face image to be detected is a living face; the target living body position information refers to the position coordinates of the face image corresponding to the target living body detection result.
  • In some embodiments, the living body detection result may be a detection result identifier used to indicate that the face in the face image is a living face (for example, the number 1 or the vector (1,0)), or a detection result identifier used to indicate that the face in the face image is not a living face (for example, the number 0 or the vector (0,1)). In other embodiments, the living body detection result may also include the probability that the face in the face image is a living face and/or the probability that it is a non-living face.
  • For example, the living body detection result may be a vector including a first probability and a second probability, where the first probability characterizes the probability that the face in the face image is a living face, and the second probability characterizes the probability that it is a non-living face.
  • Step 206 Train the initial candidate region generation network according to the first training sample set until convergence, and obtain the first candidate region generation network.
  • Specifically, the color image in the first training sample is input into the initial candidate region generation network, and the target face image corresponding to the color image and the corresponding target face candidate region position information are used as the desired output to train the initial candidate region generation network. During training, the parameters of the initial candidate region generation network are continuously adjusted until the convergence condition is met; training then stops, and the currently trained candidate region generation network, namely the first candidate region generation network, is obtained.
  • The convergence condition may be that the training time exceeds a preset duration, that the number of training iterations exceeds a preset number, or that the difference between the actual output and the expected output is less than a difference threshold. During training, the error can be propagated with the back-propagation (BP) algorithm and the parameters updated with stochastic gradient descent (SGD).
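  • A minimal sketch of this training stage, assuming PyTorch and a data loader that yields (color image, target box coordinates) pairs from the first training sample set; the loss choice, learning rate, and convergence thresholds are illustrative assumptions, not values given in the application.

```python
import torch
import torch.nn as nn

def train_until_convergence(net, loader, max_epochs=50, loss_threshold=1e-3):
    """Step 206 sketch: fit the initial candidate region generation network.

    `net` maps a color image to predicted face-box coordinates; `loader`
    yields (color_image, target_box) pairs. Both are hypothetical stand-ins.
    """
    criterion = nn.MSELoss()  # difference between actual and expected output
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
    for epoch in range(max_epochs):           # convergence: iteration budget
        epoch_loss = 0.0
        for color_image, target_box in loader:
            optimizer.zero_grad()
            pred_box = net(color_image)
            loss = criterion(pred_box, target_box)
            loss.backward()                   # back-propagation (BP)
            optimizer.step()                  # SGD parameter update
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < loss_threshold:  # convergence: small error
            break
    return net
```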
  • Step 208 Train the initial living body classification network according to the first candidate region generation network and the second training sample set until it converges to obtain the first living body classification network.
  • Specifically, the parameters of the currently trained candidate region generation network are fixed; that is, the color image in the second training sample is first input into the first candidate region generation network to obtain the first target face image and its corresponding first face candidate region location information, and the initial living body classification network is then trained on them until the convergence condition is met. Training stops, and the currently trained living body classification network, namely the first living body classification network, is obtained.
  • Specifically, according to the first face candidate region location information, the image at the corresponding position is cropped from the near-infrared image to obtain the region-of-interest image; the region-of-interest image is input into the initial living body classification network, and the target living body detection result and the corresponding target living body position information are used as the expected output to adjust the parameters of the initial living body classification network until the convergence condition is met and training ends.
  • Step 210 Input the color image into the first candidate area generation network to obtain the current position information of the face candidate area, and input the current position information of the face candidate area and the near-infrared image into the first living body classification network to obtain the current living body position information .
  • Specifically, after the color image in the second training sample is input into the first candidate region generation network, the current face image corresponding to the color image and the current face candidate region location information corresponding to the current face image can be obtained. Further, the current face candidate region location information and the near-infrared image corresponding to the color image are input into the first living body classification network.
  • The first living body classification network first crops, from the near-infrared image, the image region corresponding to the current face candidate region location information to obtain the region-of-interest image, and then performs living body classification on the region-of-interest image to obtain the current living body detection result and the corresponding current living body position information.
  • The current living body position information is the position coordinates obtained by performing position regression on the region-of-interest image.
  • Step 212 Adjust the parameters of the first candidate region generating network according to the difference between the current living body position information and the target living body position information, and return to the step of inputting the color image into the first candidate region generating network until convergence, to obtain the target candidate region generating network .
  • the difference can be an error, and the error can be a mean absolute error (MAE), a mean squared error (MSE), or a root mean squared error (RMSE), etc.
  • Specifically, a cost function, usually also called a loss function, can be constructed from the error between the current living body position information and the target living body position information. It should be understood that the cost function reflects the difference between the current living body position information and the target living body position information, and may include a regularization term for preventing overfitting.
  • Because the candidate region generation network and the living body classification network share the same cost function and gradients can be back-propagated between them, the parameters of the candidate region generation network can be adjusted by minimizing the cost function of the living body classification network.
  • In some embodiments, the parameters of the first candidate region generation network can be adjusted by the gradient descent method. Specifically, the gradient determined from the error between the current living body position information and the target living body position information (for example, the partial derivative of the cost function with respect to the model parameters) is propagated back to the first candidate region generation network to adjust its parameters.
  • Steps 210 to 212 are repeated to train the first candidate region generation network multiple times until the convergence condition is met; training then stops, and the trained target candidate region generation network is obtained.
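  • The fine-tuning loop of steps 210 to 212 might look like the following sketch (PyTorch assumed; all names are hypothetical). The point the passage makes is that the position-regression loss can be back-propagated through the fixed first living body classification network into the candidate region generation network, so only the latter's parameters are updated here.

```python
import torch
import torch.nn as nn

def finetune_candidate_net(candidate_net, liveness_net, loader, epochs=10):
    """Steps 210-212 sketch: adjust only the candidate region generation
    network using the living body position loss, back-propagated end to end."""
    criterion = nn.MSELoss()  # the 'difference' could also be MAE or RMSE
    for p in liveness_net.parameters():   # freeze the first classification net
        p.requires_grad = False
    optimizer = torch.optim.SGD(candidate_net.parameters(), lr=0.001)
    for _ in range(epochs):
        for color_img, nir_img, target_pos in loader:  # second sample set
            optimizer.zero_grad()
            region_pos = candidate_net(color_img)      # face candidate region
            # Hypothetical two-input forward: classify the NIR region of
            # interest and regress the current living body position.
            current_pos = liveness_net(nir_img, region_pos)
            loss = criterion(current_pos, target_pos)
            loss.backward()   # gradient flows back into candidate_net
            optimizer.step()
    return candidate_net
```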
  • Step 214 Train the first living body classification network according to the target candidate region generation network and the second training sample set until convergence to obtain the target living body classification network, and obtain the trained target living body detection model according to the target candidate region generation network and the target living body classification network.
  • the parameters of the target candidate region generation network are fixed, and the first living body classification network is trained through the second training sample set.
  • Specifically, the color image in the second training sample is input into the target candidate region generation network to obtain the second target face image and its corresponding second face candidate region location information; then, according to the second face candidate region location information, the near-infrared image corresponding to the color image, the target living body detection result, and the corresponding target living body position information, the first living body classification network is trained until the convergence condition is met. Training stops, and the currently trained living body classification network, namely the target living body classification network, is obtained.
  • Specifically, according to the second face candidate region location information, the image at the corresponding position is cropped from the near-infrared image to obtain the region-of-interest image; the region-of-interest image is input into the first living body classification network, and the target living body detection result and the corresponding target living body position information are used as the desired output to adjust the parameters of the first living body classification network until the convergence condition is met and training ends.
  • the output terminal of the target candidate region generation network is connected with the input terminal of the target living body classification network to obtain a trained target living body detection model.
  • In the above embodiment, the initial candidate region generation network is first trained to obtain the first candidate region generation network; then the parameters of the first candidate region generation network are fixed, the initial living body classification network is trained, and the first living body classification network is obtained.
  • Next, the current living body position information is obtained, and the parameters of the first candidate region generation network are adjusted according to the difference between the current living body position information and the target living body position information to obtain the target candidate region generation network.
  • Then, with the target candidate region generation network fixed, the first living body classification network continues to be trained to obtain the target living body classification network; finally, the trained target living body detection model is obtained according to the target candidate region generation network and the target living body classification network.
  • In this way, face detection and living body classification are integrated into one model, and an end-to-end model training method is adopted.
  • As a result, the loss of the living body classification network can be back-propagated to the candidate region generation network, and the fit between the networks is higher; compared with the two separately trained models of the traditional technique, the accuracy of the obtained living body detection model is significantly improved.
  • In some embodiments, the above method further includes: obtaining the target living body detection model; obtaining a to-be-detected color image and a to-be-detected near-infrared image corresponding to the face to be detected; inputting the to-be-detected color image into the target candidate region generation network corresponding to the target living body detection model to obtain target face candidate region position information; and inputting the target face candidate region position information and the to-be-detected near-infrared image into the target living body classification network corresponding to the target living body detection model to obtain the living body detection result.
  • The color image to be detected refers to the color image used in living body detection to determine whether the face to be detected is a living face; the near-infrared image to be detected refers to the near-infrared image used for the same purpose.
  • Specifically, after the to-be-detected color image is input into the target candidate region generation network, the target face image and the corresponding target face candidate region position information can be obtained, and the target face candidate region position information and the to-be-detected near-infrared image are input into the target living body classification network.
  • The target living body classification network can first crop the image at the corresponding position from the to-be-detected near-infrared image according to the target face candidate region position information to obtain the region-of-interest image, and then perform living body classification on the region-of-interest image to obtain the living body detection result corresponding to the face to be detected.
  • In some embodiments, the target candidate region generation network includes a first convolutional layer, a second convolutional layer, and a first pooling layer, and inputting the to-be-detected color image into the target candidate region generation network corresponding to the target living body detection model to obtain the target face candidate region position information includes:
  • Step 302 Input the color image to be detected into a first convolution layer, and perform a convolution operation on the color image to be detected through the first convolution layer to obtain a first feature matrix.
  • the target candidate region generation network includes at least one convolution layer, and the convolution layer performs a convolution operation on the color image to be detected to obtain the first feature matrix.
  • Convolution refers to the multiply-accumulate operation performed with a convolution kernel. Convolution with a kernel can reduce the feature dimension and express local features of the image; different convolution windows have different expressive capabilities.
  • In some embodiments, the size of the convolution window is determined according to the dimension of the feature vector corresponding to the image (the embedding size) and the filter width. The filter width is tuned experimentally; in some embodiments, filter widths of 3 and 4 are selected.
  • For example, the convolution window can be selected as 128*3, 128*4, 128*5, 128*6, 128*7, or 128*8, respectively.
  • One convolution kernel corresponds to one output. For example, if there are 10 convolution kernels in the convolution layer, 10 outputs are obtained after the 10 kernels are applied; that is, a 10-dimensional first feature matrix is obtained.
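  • To make the kernel-to-output correspondence concrete, here is a small shape check (PyTorch assumed; the input size is illustrative):

```python
import torch
import torch.nn as nn

# A convolutional layer with 10 kernels: each kernel produces one output
# channel, so one image yields a 10-channel first feature matrix.
first_conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)

color_image = torch.randn(1, 3, 128, 128)   # hypothetical RGB input
first_feature_matrix = first_conv(color_image)
print(first_feature_matrix.shape)           # torch.Size([1, 10, 126, 126])
```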
  • Step 304 Input the first feature matrix into the first pooling layer, and project the largest weight in each vector in the first feature matrix through the first pooling layer to obtain a normalized second feature matrix.
  • the target candidate region generation network includes at least one pooling layer.
  • The pooling layer adopts max-pooling, which projects the element with the largest energy in each vector produced by the convolution layer (i.e., the element with the largest weight) to the input of the next layer. The purpose is to ensure that the outputs of different feature vectors and different convolution kernels are normalized without losing the most salient information.
  • the first feature matrix is composed of multiple vectors, and the largest weight in each vector is projected to obtain a normalized second feature matrix.
  • Step 306 Input the second feature matrix into the second convolutional layer, and perform convolution calculation on the second feature matrix through the second convolutional layer to obtain position information of the target face candidate region.
  • The candidate region generation network in this embodiment adopts a fully convolutional network. After the image passes through the pooling layer, it is directly input into the second convolutional layer, which is used in place of a fully connected layer: the convolution calculation is performed on the second feature matrix to obtain the target face image corresponding to the to-be-detected color image and the corresponding target face candidate region position information.
  • By using a convolutional layer instead of a fully connected layer, the kernel computations are parallel and the whole feature map does not need to be read into memory at once, which saves storage overhead and improves the efficiency of face classification and position regression in the candidate region generation network.
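  • Taken together, steps 302 to 306 can be sketched as the fully convolutional head below (PyTorch assumed; the channel counts and the 1x1 kernel standing in for the fully connected layer are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CandidateRegionHead(nn.Module):
    """Sketch of steps 302-306: convolution, max pooling, then a second
    convolution in place of a fully connected layer (fully convolutional)."""

    def __init__(self):
        super().__init__()
        self.first_conv = nn.Conv2d(3, 10, kernel_size=3)   # step 302
        self.first_pool = nn.MaxPool2d(kernel_size=2)       # step 304
        # Step 306: convolution replaces the fully connected layer; the 5
        # output channels could encode e.g. a face score plus box coordinates.
        self.second_conv = nn.Conv2d(10, 5, kernel_size=1)

    def forward(self, color_image: torch.Tensor) -> torch.Tensor:
        x = self.first_conv(color_image)   # first feature matrix
        x = self.first_pool(x)             # normalized second feature matrix
        return self.second_conv(x)         # candidate region position map

out = CandidateRegionHead()(torch.randn(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 5, 63, 63])
```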
  • In some embodiments, inputting the target face candidate region position information and the to-be-detected near-infrared image into the target living body classification network corresponding to the target living body detection model to obtain the living body detection result includes: cropping the corresponding region-of-interest image from the to-be-detected near-infrared image according to the target face candidate region position information; inputting the region-of-interest image into the third convolutional layer and performing a convolution operation on it through the third convolutional layer to obtain a third feature matrix; inputting the third feature matrix into the second pooling layer and projecting the largest weight in each vector of the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix; and inputting the fourth feature matrix into the fourth convolutional layer and performing convolution calculation on it through the fourth convolutional layer to obtain the living body detection result.
  • the living body classification network adopts a full convolutional network, which includes at least one third convolutional layer, at least one fourth convolutional layer, and at least one second pooling layer.
  • Specifically, after the corresponding region-of-interest image is cropped from the to-be-detected near-infrared image according to the target face candidate region position information, the region-of-interest image is first input into the third convolutional layer, and a convolution operation is performed through the third convolutional layer to obtain the third feature matrix.
  • The third feature matrix is then input into the second pooling layer connected to the third convolutional layer: the largest weight in each vector of the feature matrix is projected, which significantly reduces the number of parameters and thus the feature dimension, and yields the fourth feature matrix.
  • The fourth feature matrix is input into the fourth convolutional layer connected to the second pooling layer, and convolution calculation is performed through the fourth convolutional layer to obtain the living body detection result and the corresponding living body position information.
  • the living body position information here refers to the position information obtained by performing position regression on the image of the region of interest, and may be the position information corresponding to the living body face or the position information corresponding to the non-living body face.
  • Because a fully convolutional network is adopted, not only is storage overhead saved, but living body detection efficiency is also improved.
  • In some embodiments, cropping the corresponding region-of-interest image from the to-be-detected near-infrared image according to the target face candidate region position information includes: mapping the target face candidate region position information onto the to-be-detected near-infrared image according to a pre-calibrated camera parameter matrix, locating the face position in the to-be-detected near-infrared image, and cropping the corresponding region-of-interest image according to the located face position.
  • Specifically, dual camera modules are used to collect the color image and the near-infrared image, and the camera parameter matrix between the camera module corresponding to the color image and the camera module corresponding to the near-infrared image is calibrated in advance.
  • After the target candidate region generation network performs position regression to obtain the target face candidate region position information corresponding to the face to be detected, the position information can be matrix-transformed according to the camera parameter matrix to obtain the corresponding position information in the near-infrared image.
  • The face position is thereby located in the near-infrared image, and the image region corresponding to the face position is cropped to obtain the region-of-interest image.
  • In this way, the region-of-interest image can be accurately cropped from the near-infrared image, thereby improving the efficiency and accuracy of living body detection.
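  • A sketch of this coordinate mapping, under the simplifying assumption that the pre-calibrated camera parameter matrix can be applied as a 3x3 planar homography H (the application does not specify the exact form of the matrix):

```python
import numpy as np

def map_box_to_nir(box_rgb, H):
    """Map a face box (x1, y1, x2, y2) from the color image onto the
    near-infrared image with an assumed 3x3 homography H."""
    x1, y1, x2, y2 = box_rgb
    corners = np.array([[x1, y1, 1.0], [x2, y2, 1.0]]).T  # homogeneous coords
    mapped = H @ corners
    mapped /= mapped[2]                                    # dehomogenize
    return np.array([mapped[0, 0], mapped[1, 0], mapped[0, 1], mapped[1, 1]])

def crop_roi(nir_image, box_nir):
    """Cut the region-of-interest image out of the near-infrared frame."""
    x1, y1, x2, y2 = np.round(box_nir).astype(int)
    return nir_image[max(y1, 0):y2, max(x1, 0):x2]
```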
  • In some embodiments, before the to-be-detected color image and the to-be-detected near-infrared image corresponding to the face to be detected are acquired, the above method further includes: using dual camera modules to collect the color image and the near-infrared image corresponding to the face to be detected, and performing face detection on the collected color image; when it is determined from the face detection result that a face is detected, determining the collected color image and near-infrared image as the to-be-detected color image and the to-be-detected near-infrared image, respectively; and when it is determined from the face detection result that no face is detected, returning to the step of using the dual camera modules to collect the color image and the near-infrared image corresponding to the face to be detected.
  • Specifically, after the color image and the near-infrared image are collected by the dual camera modules, face detection is performed on the color image.
  • If a face is detected in the color image, the color image and near-infrared image collected at this time can be determined as the to-be-detected color image and the to-be-detected near-infrared image. Conversely, if no face is detected in the color image, the near-infrared image cannot contain a face region either; in that case, the color image and the near-infrared image corresponding to the face to be detected must continue to be collected until a face image usable for living body detection is obtained.
  • In the above embodiment, because face detection is performed on the collected color image, it can be accurately determined whether the collected images can be used for living body detection; this improves the efficiency of image collection and thereby the efficiency of living body detection.
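  • As a sketch of this capture-and-gate loop, the snippet below uses OpenCV's Haar cascade face detector on the color stream; this is an assumption, since the application specifies neither the face detector nor the camera API, and the device indices are hypothetical.

```python
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
rgb_cam = cv2.VideoCapture(0)   # hypothetical color camera index
nir_cam = cv2.VideoCapture(1)   # hypothetical near-infrared camera index

def capture_pair_for_liveness():
    """Keep capturing until the color frame contains a face."""
    while True:
        ok_rgb, color_image = rgb_cam.read()
        ok_nir, nir_image = nir_cam.read()
        if not (ok_rgb and ok_nir):
            continue
        gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:  # face detected: use this pair for detection
            return color_image, nir_image
        # no face: return to the capture step, as in the embodiment above
```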
  • a training device 400 for a living body detection model including:
  • the initial model acquisition module 402 is used to acquire an initial living body detection model, and the initial living body detection model includes an initial candidate region generation network and an initial living body classification network;
  • the training sample acquisition module 404 is used to acquire the first training sample set and the second training sample set; the training samples corresponding to the second training sample set include color images, near-infrared images corresponding to the color images, and corresponding target living body position information ;
  • the first training module 406 is configured to train the initial candidate region generation network according to the first training sample set until it converges to obtain the first candidate region generation network;
  • the second training module 408 is configured to train the initial living body classification network according to the first candidate region generation network and the second training sample set until it converges to obtain the first living body classification network;
  • the input module 410 is used to input the color image into the first candidate region generation network to obtain the current face candidate region position information, and to input the current face candidate region position information and the near-infrared image into the first living body classification network to obtain the current living body position information;
  • the parameter adjustment module 412 is used to adjust the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and to return to the step of inputting the color image into the first candidate region generation network until convergence, to obtain the target candidate region generation network;
  • the living body detection model obtaining module 414 trains the first living body classification network according to the target candidate region generation network and the second training sample set until convergence to obtain the target living body classification network, and obtains the trained target living body detection model according to the target candidate region generation network and the target living body classification network.
  • In some embodiments, the above-mentioned device further includes a living body detection module for: obtaining the target living body detection model; obtaining the color image to be detected and the near-infrared image to be detected corresponding to the face to be detected; inputting the color image to be detected into the target candidate region generation network corresponding to the target living body detection model to obtain the target face candidate region position information; and inputting the target face candidate region position information and the near-infrared image to be detected into the target living body classification network corresponding to the target living body detection model to obtain the living body detection result.
  • the target candidate region generation network includes a first convolutional layer, a second convolutional layer, and a first pooling layer.
  • The living body detection module is also used to: input the color image to be detected into the first convolutional layer and perform a convolution operation on it through the first convolutional layer to obtain the first feature matrix; input the first feature matrix into the first pooling layer and project the largest weight in each vector of the first feature matrix through the first pooling layer to obtain the normalized second feature matrix; and input the second feature matrix into the second convolutional layer and convolve it through the second convolutional layer to obtain the target face candidate region location information.
  • the target living body classification network includes a third convolutional layer, a fourth convolutional layer, and a second pooling layer.
  • The living body detection module is also used to: crop the corresponding region-of-interest image from the near-infrared image to be detected based on the target face candidate region position information, input the region-of-interest image into the third convolutional layer, and perform a convolution operation on it through the third convolutional layer to obtain the third feature matrix;
  • input the third feature matrix into the second pooling layer and project the largest weight in each vector of the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix;
  • and input the fourth feature matrix into the fourth convolutional layer and convolve it through the fourth convolutional layer to obtain the living body detection result.
  • In some embodiments, the living body detection module is also used to map the target face candidate region position information onto the near-infrared image to be detected according to the pre-calibrated camera parameter matrix, locate the face position in the near-infrared image to be detected, and crop the corresponding region-of-interest image according to the located face position.
  • In some embodiments, the above-mentioned device further includes an image acquisition module, which is used to collect the color image and the near-infrared image corresponding to the face to be detected using dual camera modules and to perform face detection on the collected color image; when it is determined from the face detection result that a face is detected, the collected color image and near-infrared image are determined as the to-be-detected color image and the to-be-detected near-infrared image, respectively; when it is determined from the face detection result that no face is detected, the module returns to the step of using the dual camera modules to collect the color image and the near-infrared image corresponding to the face to be detected.
  • the various modules in the training device for the above-mentioned living body detection model can be implemented in whole or in part by software, hardware, and a combination thereof.
  • Each of the above-mentioned modules may be embedded in the processor of the computer device in the form of hardware, may be independent of the processor, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to each module.
  • a computer device is provided, and its internal structure diagram may be as shown in FIG. 5.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store training sample data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instruction is executed by the processor to realize a training method of a living body detection model.
  • FIG. 5 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • A computer device includes a memory and one or more processors, the memory storing computer-readable instructions; when the computer-readable instructions are executed by the one or more processors, the steps of the training method for a living body detection model provided in any one of the embodiments of this application are implemented.
  • One or more non-volatile computer-readable storage media store computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the training method for a living body detection model provided in any one of the embodiments of this application.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A training method for a living body detection model comprises: acquiring an initial living body detection model comprising an initial candidate region generation network and an initial living body classification network; training the initial candidate region generation network according to a first training sample set, to obtain a first candidate region generation network; training the initial living body classification network according to the first candidate region generation network and a second training sample set, to obtain a first living body classification network; according to the first candidate region generation network, the first living body classification network and the second training sample set, obtaining current living body position information; according to a difference between the current living body position information and target living body position information, adjusting parameters of the first candidate region generation network and continuing training same, to obtain a target candidate region generation network; and training the first living body classification network according to the target candidate region generation network and the second training sample set, to obtain a target living body classification network.

Description

活体检测模型的训练方法、装置、计算机设备和存储介质Training method, device, computer equipment and storage medium of living body detection model
相关申请的交叉引用Cross-references to related applications
本申请要求于2019年10月10日提交中国专利局,申请号为2019109581915,申请名称为“活体检测模型的训练方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on October 10, 2019, the application number is 2019109581915, and the application title is "Training method, device, computer equipment and storage medium of living body detection model", and its entire content Incorporated in this application by reference.
技术领域Technical field
本申请涉及一种活体检测模型的训练方法、装置、计算机设备和存储介质。This application relates to a training method, device, computer equipment and storage medium of a living body detection model.
背景技术Background technique
随着人工智能技术的发展,出现了近红外活体检测技术。近红外活体检测,作为一种身份见证方法,利用红外光的光谱波段禹可见光不同,无需用户配合,可在近红外图像上进行盲测,降低了活体检测算法的繁琐度与提高其精度,并且降低生产成本的同时,可以更好地保证相关用户与企业的利益。With the development of artificial intelligence technology, near-infrared live detection technology has emerged. Near-infrared live detection, as an identity witness method, uses infrared light with different spectral bands than visible light. It can be blindly measured on near-infrared images without the user’s cooperation, which reduces the cumbersomeness of the live detection algorithm and improves its accuracy, and While reducing production costs, it can better guarantee the interests of related users and enterprises.
传统的近红外活体检测方法,大多分两步。首先,利用检脸器在可见光所成的彩色图片上检测人脸;然后在近红外图像对应位置提取人脸的LBP特征输入至活体判别器进行活体判断。然而,发明人意识到,这种方式下,每一步骤都是一个独立的任务,所使用到的检脸器和活体判别器都需要单独分开训练,模型之间的契合度不高,活体判别器的准确性容易受到检脸器的影响,导致训练得到的模型的准确性低。。The traditional near-infrared living body detection method is mostly divided into two steps. First, the face detector is used to detect the human face on the color picture formed by visible light; then the LBP feature of the human face is extracted at the corresponding position of the near-infrared image and input to the living body discriminator for living body judgment. However, the inventor realizes that in this way, each step is an independent task, and the face detector and the living body discriminator used need to be trained separately. The fit between the models is not high, and the living body discrimination The accuracy of the detector is easily affected by the face detector, resulting in low accuracy of the trained model. .
发明内容Summary of the invention
根据本申请公开的各种实施例,提供一种活体检测模型的训练方法、装置、计算机设备和存储介质。According to various embodiments disclosed in the present application, a training method, device, computer device, and storage medium of a living body detection model are provided.
一种活体检测模型的训练方法,所述方法包括:A method for training a living body detection model, the method comprising:
获取初始活体检测模型,所述初始活体检测模型包括初始候选区域生成网络及初始活体分类网络;Acquiring an initial living body detection model, where the initial living body detection model includes an initial candidate region generation network and an initial living body classification network;
获取第一训练样本集及第二训练样本集;所述第二训练样本集对应的训练样本中包括彩色图像、与所述彩色图像对应的近红外图像及对应的目标活体位置信息;Acquiring a first training sample set and a second training sample set; the training samples corresponding to the second training sample set include a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
根据所述第一训练样本集训练所述初始候选区域生成网络直至收敛,得到第一候选区域生成网络;Training the initial candidate region generation network according to the first training sample set until convergence, to obtain a first candidate region generation network;
根据所述第一候选区域生成网络及所述第二训练样本集训练所述初始活体分类网络直至收敛,得到第一活体分类网络;Training the initial living body classification network according to the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
将所述彩色图像输入到所述第一候选区域生成网络中,得到当前人脸候选区域位置信 息,将所述当前人脸候选区域位置信息及所述近红外图像输入所述第一活体分类网络中,得到当前活体位置信息;The color image is input into the first candidate region generation network to obtain current face candidate region position information, and the current face candidate region position information and the near-infrared image are input into the first living body classification network In, get the current living body position information;
根据所述当前活体位置信息及所述目标活体位置信息的差异调整所述第一候选区域生成网络的参数,并返回将所述彩色图像输入到所述第一候选区域生成网络中的步骤直至收敛,得到目标候选区域生成网络;及Adjust the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and return to the step of inputting the color image into the first candidate region generation network until convergence , Get the target candidate region generation network; and
根据所述目标候选区域生成网络及所述第二训练样本集训练所述第一活体分类网络直至收敛,得到目标活体分类网络,根据所述目标候选区域生成网络及所述目标活体分类网络得到训练好的目标活体检测模型。Train the first living body classification network according to the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and obtain training based on the target candidate region generating network and the target living body classification network Good target live detection model.
A training apparatus for a living body detection model, the apparatus comprising:
an initial model acquisition module, configured to acquire an initial living body detection model, the initial living body detection model including an initial candidate region generation network and an initial living body classification network;
a training sample acquisition module, configured to acquire a first training sample set and a second training sample set, the training samples in the second training sample set including a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
a first training module, configured to train the initial candidate region generation network with the first training sample set until convergence, to obtain a first candidate region generation network;
a second training module, configured to train the initial living body classification network with the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
an input module, configured to input the color image into the first candidate region generation network to obtain current face candidate region position information, and to input the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
a parameter adjustment module, configured to adjust the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and to return to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
a living body detection model obtaining module, configured to train the first living body classification network with the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and to obtain a trained target living body detection model from the target candidate region generation network and the target living body classification network.
A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
Acquiring an initial living body detection model, the initial living body detection model including an initial candidate region generation network and an initial living body classification network;
Acquiring a first training sample set and a second training sample set, the training samples in the second training sample set including a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
Training the initial candidate region generation network with the first training sample set until convergence, to obtain a first candidate region generation network;
Training the initial living body classification network with the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
Inputting the color image into the first candidate region generation network to obtain current face candidate region position information, and inputting the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
Adjusting the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and returning to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
Training the first living body classification network with the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and obtaining a trained target living body detection model from the target candidate region generation network and the target living body classification network.
One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
Acquiring an initial living body detection model, the initial living body detection model including an initial candidate region generation network and an initial living body classification network;
Acquiring a first training sample set and a second training sample set, the training samples in the second training sample set including a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
Training the initial candidate region generation network with the first training sample set until convergence, to obtain a first candidate region generation network;
Training the initial living body classification network with the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
Inputting the color image into the first candidate region generation network to obtain current face candidate region position information, and inputting the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
Adjusting the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and returning to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
Training the first living body classification network with the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and obtaining a trained target living body detection model from the target candidate region generation network and the target living body classification network.
The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required by the embodiments are briefly introduced below. Evidently, the drawings described below are merely some embodiments of the present application; for a person of ordinary skill in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is an application scenario diagram of a training method for a living body detection model according to one or more embodiments.
Fig. 2 is a schematic flowchart of a training method for a living body detection model according to one or more embodiments.
Fig. 3 is a schematic flowchart of the steps of obtaining position information of a target face candidate region according to one or more embodiments.
Fig. 4 is a block diagram of a training apparatus for a living body detection model according to one or more embodiments.
Fig. 5 is a block diagram of a computer device according to one or more embodiments.
Detailed Description of the Embodiments
To make the technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present application, not to limit it.
The training method for a living body detection model provided by the present application can be applied in the application environment shown in Fig. 1. In this environment, the computer device 102 first acquires an initial living body detection model including an initial candidate region generation network and an initial living body classification network, and trains the initial candidate region generation network with the first training sample set until convergence to obtain a first candidate region generation network. It then trains the initial living body classification network with the first candidate region generation network and the second training sample set until convergence to obtain a first living body classification network; inputs the color image into the first candidate region generation network to obtain current face candidate region position information; and inputs the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information. It further adjusts the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, returning to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; trains the first living body classification network with the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network; and finally obtains a trained target living body detection model from the target candidate region generation network and the target living body classification network. After obtaining the target living body detection model through training, the computer device 102 can store it locally or send it to the computer device 104.
The computer device 102 and the computer device 104 may be, but are not limited to, various personal computers and notebook computers.
In some embodiments, as shown in Fig. 2, a training method for a living body detection model is provided. Taking the application of the method to the above computer device 102 as an example, the method includes the following steps:
Step 202: acquire an initial living body detection model, the initial living body detection model including an initial candidate region generation network and an initial living body classification network.
The initial living body detection model may be a model predetermined for living body detection in order to train the living body detection model; it may be an untrained living body detection model or one whose training is unfinished. The initial living body detection model includes an initial candidate region generation network and an initial living body classification network. The initial candidate region generation network is trained to obtain the target candidate region generation network, which extracts candidate regions from an input image; the initial living body classification network is trained to obtain the target living body classification network, which performs living body classification on an input image to obtain a living body detection result.
In some embodiments, the following steps are further included before step 202:
First, the network structure information of the initial living body detection model may be determined. Specifically, since the initial living body detection model includes the initial candidate region generation network and the initial living body classification network, the network structure information of the initial candidate region generation network and that of the initial living body classification network may be determined separately.
It can be understood that the initial candidate region generation network and the initial living body classification network may be various kinds of neural networks. One may therefore determine, for each of them, which kind of neural network it is, including how many layers of neurons it has, how many neurons each layer contains, the connection order between the layers of neurons, which parameters each layer of neurons includes, the type of activation function of each layer of neurons, and so on. Understandably, different types of neural networks require different network structure information to be determined.
Then, the parameter values of the network parameters of the initial candidate region generation network and the initial living body classification network in the initial living body detection model may be initialized. In some embodiments, each network parameter of the two networks may be initialized with different small random numbers. "Small random numbers" ensure that the network does not enter saturation because of excessively large weights, which would make training fail; "different" ensures that the network can learn normally.
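As one possible illustration in PyTorch (a minimal sketch; the normal distribution and the scale 0.01 are assumptions of this example, not values given by the present application):

```python
import torch.nn as nn

def init_small_random(module: nn.Module) -> None:
    """Initialize weights with different small random numbers, biases with zero."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)  # small values avoid saturation
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage: model.apply(init_small_random) applies this to every layer of a model
```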
Step 204: acquire a first training sample set and a second training sample set; the training samples in the second training sample set include a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information.
Both the first training sample set and the second training sample set are labeled sets of image samples containing human faces. A training sample in the first training sample set (hereinafter, a first training sample) includes a color image, a target face image, and corresponding target face candidate region position information. The color image is an RGB image captured by a camera under natural light; the target face image is the image corresponding to the face region in the color image; and the target face candidate region position information is the position coordinates of the face region in the color image. It can be understood that the color image is the input data of the first training sample, while the target face image and the corresponding target face candidate region position information are the training labels of that sample.
A training sample in the second training sample set (hereinafter, a second training sample) includes a color image, a near-infrared image corresponding to the color image, a target living body detection result, and corresponding target living body position information. It can be understood that the target living body detection result and the corresponding target living body position information are the training labels of the second training sample; the target living body detection result indicates whether the face in the face image to be detected is a live face, and the target living body position information is the position coordinates of the face image corresponding to the target living body detection result.
In some embodiments, the living body detection result may be a positive identifier indicating that the face in the face image is a live face (for example, the number 1 or the vector (1, 0)) or a negative identifier indicating that the face is not a live face (for example, the number 0 or the vector (0, 1)). In other embodiments, the living body detection result may further include the probability that the face in the face image is a live face and/or the probability that it is not. For example, the living body detection result may be a vector containing a first probability and a second probability, where the first probability represents the probability that the face in the face image is a live face and the second probability represents the probability that it is not.
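For example, such labels might be encoded as follows (an illustrative sketch only; the present application does not fix a concrete tensor format):

```python
import torch

live_label  = torch.tensor([1.0, 0.0])    # identifier: the face is a live face
spoof_label = torch.tensor([0.0, 1.0])    # identifier: the face is not a live face
soft_result = torch.tensor([0.93, 0.07])  # (first probability, second probability)
```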
Step 206: train the initial candidate region generation network with the first training sample set until convergence, to obtain a first candidate region generation network.
Specifically, the color images in the first training samples are input into the initial candidate region generation network, and the corresponding target face images and target face candidate region position information are used as the desired outputs for training. The parameters of the initial candidate region generation network are adjusted continuously during training; when a convergence condition is met, training stops and the currently trained candidate region generation network, namely the first candidate region generation network, is obtained. In some embodiments, the convergence condition may be that the training time exceeds a preset duration, that the number of training iterations exceeds a preset number, or that the difference between the actual output and the desired output falls below a difference threshold.
It can be understood that the initial candidate region generation network may be trained in various ways in this embodiment, for example with the BP (Back Propagation) algorithm or the SGD (Stochastic Gradient Descent) algorithm.
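For instance, such a stage-one training loop might be sketched as follows in PyTorch; rpn, rpn_loss, loader1, and max_epochs are hypothetical placeholder names, not identifiers defined by the present application:

```python
import torch

optimizer = torch.optim.SGD(rpn.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(max_epochs):              # convergence check simplified to an epoch cap
    for color_img, target_box in loader1:    # first training samples
        pred_cls, pred_box = rpn(color_img)  # face/background score + regressed box
        loss = rpn_loss(pred_cls, pred_box, target_box)
        optimizer.zero_grad()
        loss.backward()                      # back propagation of the error
        optimizer.step()                     # stochastic gradient descent update
```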
Step 208: train the initial living body classification network with the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network.
Specifically, when the initial living body classification network is trained, the parameters of the currently trained candidate region generation network are fixed. The color images in the second training samples are first input into the first candidate region generation network to obtain first target face images and their corresponding first face candidate region position information; the initial living body classification network is then trained with the first face candidate region position information, the near-infrared images corresponding to the color images in the second training samples, the target living body detection results, and the corresponding target living body position information, until a convergence condition is met, at which point training stops and the currently trained living body classification network, namely the first living body classification network, is obtained.
During this training process, an image at the corresponding position is first cropped from the near-infrared image according to the first face candidate region position information to obtain a region-of-interest image. The region-of-interest image is input into the initial living body classification network, and the target living body detection result and the corresponding target living body position information are used as the desired outputs to adjust the parameters of the initial living body classification network until the convergence condition is met, at which point training ends.
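Continuing the same hypothetical names, the parameter freezing described here might look like the following; classifier, crop_roi, cls_loss, box_loss, and loader2 are likewise assumed stand-ins:

```python
import torch

for p in rpn.parameters():
    p.requires_grad = False                       # fix the stage-one parameters

optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-3, momentum=0.9)

for color_img, nir_img, target_cls, target_box in loader2:
    with torch.no_grad():
        _, face_box = rpn(color_img)              # candidate region from the RGB image
    roi = crop_roi(nir_img, face_box)             # region of interest on the NIR image
    pred_cls, pred_box = classifier(roi)
    loss = cls_loss(pred_cls, target_cls) + box_loss(pred_box, target_box)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```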
Step 210: input the color image into the first candidate region generation network to obtain current face candidate region position information, and input the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information.
Specifically, inputting the color image of a second training sample into the first candidate region generation network yields the current face image corresponding to that color image and the current face candidate region position information corresponding to that face image. The current face candidate region position information and the near-infrared image corresponding to the color image in the second training sample are then input into the first living body classification network, which first crops from the near-infrared image the image region corresponding to the current face candidate region position information to obtain a region-of-interest image, and then performs living body classification on the region-of-interest image to obtain the current living body detection result and the corresponding current living body position information. The current living body position information is the position coordinates obtained by position regression on the region-of-interest image.
Step 212: adjust the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and return to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network.
The difference may be an error, and the error may be the mean absolute error (MAE), the mean squared error (MSE), the root mean squared error (RMSE), or the like.
Specifically, a cost function, usually also called a loss function, may be constructed from the error between the current living body position information and the target living body position information. It should be understood that the cost function reflects the difference between the current living body position information and the target living body position information and may include a regularization term to prevent overfitting. In this embodiment, because the position information of the face region in the candidate region generation network corresponds to that in the living body classification network, the two networks share a consistent cost function and gradients can be propagated back between them; the parameters of the candidate region generation network can therefore be tuned by minimizing the cost function of the living body classification network.
In some embodiments, the parameters of the first candidate region generation network may be adjusted by gradient descent. Specifically, the gradient determined from the error between the current living body position information and the target living body position information (for example, the partial derivatives of the cost function with respect to the model parameters) is propagated back to the first candidate region generation network to adjust its parameters.
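A hedged sketch of this step with the same assumed names follows. Note that for the gradient to actually reach the candidate region generation network, crop_roi must be implemented so that it is differentiable with respect to the box coordinates (for example, an ROI-Align-style sampling operation); that differentiability is an assumption of this sketch:

```python
import torch

for p in rpn.parameters():
    p.requires_grad = True                 # unfreeze the region network
for p in classifier.parameters():
    p.requires_grad = False                # hold the classification network fixed

optimizer = torch.optim.SGD(rpn.parameters(), lr=1e-4)
mse = torch.nn.MSELoss()                   # MSE as one possible error measure

for color_img, nir_img, _, target_box in loader2:
    _, face_box = rpn(color_img)           # no torch.no_grad(): keep the graph
    roi = crop_roi(nir_img, face_box)      # assumed differentiable w.r.t. face_box
    _, current_box = classifier(roi)       # current living body position
    loss = mse(current_box, target_box)    # difference from the target position
    optimizer.zero_grad()
    loss.backward()                        # gradient flows back into the region network
    optimizer.step()
```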
Steps 210 and 212 are repeated so that the first candidate region generation network is trained multiple times; when the convergence condition is met, training stops and the trained target candidate region generation network is obtained.
Step 214: train the first living body classification network with the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and obtain a trained target living body detection model from the target candidate region generation network and the target living body classification network.
Specifically, the parameters of the target candidate region generation network are fixed and the first living body classification network is trained on the second training sample set. The color images in the second training samples are first input into the target candidate region generation network to obtain second target face images and their corresponding second face candidate region position information; the first living body classification network is then trained with the second face candidate region position information, the near-infrared images corresponding to the color images in the second training samples, the target living body detection results, and the corresponding target living body position information, until the convergence condition is met, at which point training stops and the currently trained living body classification network, namely the target living body classification network, is obtained.
During this training, an image at the corresponding position is first cropped from the near-infrared image according to the second face candidate region position information to obtain a region-of-interest image. The region-of-interest image is input into the first living body classification network, and the target living body detection result and the corresponding target living body position information are used as the desired outputs to adjust the parameters of the first living body classification network until the convergence condition is met, at which point training ends.
After the target candidate region generation network and the target living body classification network are obtained, the output of the target candidate region generation network is connected to the input of the target living body classification network to obtain the trained target living body detection model.
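One possible way to wire the two trained networks together, again using the hypothetical names from the sketches above:

```python
import torch.nn as nn

class LivenessDetector(nn.Module):
    """Target candidate region generation network + target living body classification network."""
    def __init__(self, rpn: nn.Module, classifier: nn.Module):
        super().__init__()
        self.rpn = rpn
        self.classifier = classifier

    def forward(self, color_img, nir_img):
        _, face_box = self.rpn(color_img)   # face candidate region from the RGB image
        roi = crop_roi(nir_img, face_box)   # corresponding NIR region of interest
        cls, box = self.classifier(roi)     # liveness result + regressed position
        return cls, box
```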
In the above training method for a living body detection model, the initial candidate region generation network is trained first to obtain the first candidate region generation network; its parameters are then fixed and the initial living body classification network is trained to obtain the first living body classification network. Next, the current living body position information is obtained through the first candidate region generation network and the first living body classification network, and the parameters of the first candidate region generation network are adjusted by back-propagating the difference between the current living body position information and the target living body position information, yielding the target candidate region generation network. With the target candidate region generation network fixed, the first living body classification network is trained further to obtain the target living body classification network, and the trained target living body detection model is finally obtained from the target candidate region generation network and the target living body classification network. The present application integrates face detection and living body classification into one model and adopts an end-to-end training method; because the loss of the living body classification network can be back-propagated to the candidate region generation network during training, the two networks fit each other closely, and the resulting living body detection model is markedly more accurate than the two separate models of the traditional technique.
In some embodiments, the above method further includes: acquiring the target living body detection model; acquiring a to-be-detected color image and a to-be-detected near-infrared image corresponding to a face to be detected; inputting the to-be-detected color image into the target candidate region generation network of the target living body detection model to obtain target face candidate region position information; and inputting the target face candidate region position information and the to-be-detected near-infrared image into the target living body classification network of the target living body detection model to obtain a living body detection result.
The to-be-detected color image is a color image used for living body detection to judge whether the face to be detected is a live face; the to-be-detected near-infrared image is a near-infrared image used for the same purpose.
In this embodiment, inputting the to-be-detected color image into the target candidate region generation network yields the target face image and the corresponding target face candidate region position information. The target face candidate region position information and the to-be-detected near-infrared image are then input into the target living body classification network, which may first crop the image at the corresponding position from the to-be-detected near-infrared image according to the target face candidate region position information to obtain a region-of-interest image, and then perform living body classification on the region-of-interest image to obtain the living body detection result for the face to be detected.
In the above embodiment, because a comparatively accurate end-to-end target living body detection model is used for living body detection, the accuracy of living body detection is improved.
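A usage sketch at inference time, assuming the hypothetical LivenessDetector wrapper above and assuming that index 0 of the classification output encodes "live face":

```python
import torch

detector = LivenessDetector(rpn, classifier).eval()

with torch.no_grad():
    cls, box = detector(color_to_detect, nir_to_detect)

is_live = cls.argmax(dim=-1) == 0   # assumed encoding: index 0 = live face
```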
In some embodiments, as shown in Fig. 3, the target candidate region generation network includes a first convolutional layer, a second convolutional layer, and a first pooling layer, and inputting the to-be-detected color image into the target candidate region generation network of the target living body detection model to obtain the target face candidate region position information includes:
Step 302: input the to-be-detected color image into the first convolutional layer, and perform a convolution operation on the to-be-detected color image through the first convolutional layer to obtain a first feature matrix.
Specifically, the target candidate region generation network includes at least one convolutional layer, and the convolutional layer performs a convolution operation on the to-be-detected color image to obtain the first feature matrix. A convolution operation multiplies the input with a convolution kernel; kernel convolution can reduce the feature dimension while expressing local features of the image, and different convolution windows have different expressive power. The size of the convolution window is determined by the dimension (embedding size) of the feature vector corresponding to the image and by the filter width, which is tuned experimentally. In some embodiments, the filter width takes the values 3, 4, 5, 6, 7, and 8; assuming the feature vector is 128-dimensional, the convolution window can accordingly be 128*3, 128*4, 128*5, 128*6, 128*7, or 128*8. One convolution kernel corresponds to one output; for example, if the convolutional layer has 10 convolution kernels, applying them yields 10 outputs, i.e., a 10-dimensional first feature matrix.
Step 304: input the first feature matrix into the first pooling layer, and project the largest weight of each vector in the first feature matrix through the first pooling layer to obtain a normalized second feature matrix.
Specifically, the target candidate region generation network includes at least one pooling layer. In some embodiments, the pooling layer is a max-pooling layer, which projects the element with the largest energy (i.e., the largest weight) in each vector produced by the convolutional layer onto the input of the next layer. This normalizes the outputs of different feature vectors and different convolution kernels while ensuring that the most salient information is not lost. The first feature matrix is composed of multiple vectors; projecting the largest weight of each vector yields the normalized second feature matrix.
Step 306: input the second feature matrix into the second convolutional layer, and perform a convolution calculation on the second feature matrix through the second convolutional layer to obtain the target face candidate region position information.
Specifically, the candidate region generation network in this embodiment is a fully convolutional network: after the pooling layer, the image is fed directly into the second convolutional layer, which takes the place of a fully connected layer and performs a convolution calculation on the second feature matrix to obtain the target face image corresponding to the to-be-detected color image and the corresponding target face candidate region position information.
In the above embodiment, replacing the fully connected layer with a convolutional layer means the kernel computations run in parallel and do not all need to be read into memory at once; this saves storage overhead and improves the efficiency of face classification and position regression in the candidate region generation network.
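For illustration only, a conv, max-pool, conv stack of this shape could be sketched as follows; every channel count and kernel size here is an assumption, since the present application does not fix them:

```python
import torch.nn as nn

rpn_head = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=3, padding=1),  # first conv: 10 kernels -> 10 feature maps
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),                 # first pooling: keep the largest weights
    nn.Conv2d(10, 5, kernel_size=1),             # second conv in place of a fully connected
)                                                # layer: e.g. 1 face score + 4 box coordinates
```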
In some embodiments, inputting the target face candidate region position information and the to-be-detected near-infrared image into the target living body classification network of the target living body detection model to obtain the living body detection result includes: cropping the corresponding region-of-interest image from the to-be-detected near-infrared image according to the target face candidate region position information; inputting the region-of-interest image into a third convolutional layer and performing a convolution operation on it through the third convolutional layer to obtain a third feature matrix; inputting the third feature matrix into a second pooling layer and projecting the largest weight of each vector in the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix; and inputting the fourth feature matrix into a fourth convolutional layer and performing a convolution calculation on it through the fourth convolutional layer to obtain the living body detection result.
In this embodiment, the living body classification network is a fully convolutional network including at least one third convolutional layer, at least one fourth convolutional layer, and at least one second pooling layer. After the corresponding region-of-interest image is cropped from the to-be-detected near-infrared image according to the target face candidate region position information, the region-of-interest image is first input into the third convolutional layer, where a convolution operation expresses its local characteristics and yields the third feature matrix. The third feature matrix is then input into the second pooling layer connected to the third convolutional layer, yielding the fourth feature matrix; because the fourth feature matrix is obtained by projecting the largest weight of each vector in the third feature matrix, the number of parameters is markedly reduced, which lowers the feature dimension. Finally, the fourth feature matrix is input into the fourth convolutional layer connected to the second pooling layer, where a convolution calculation yields the living body detection result and the corresponding living body position information. It can be understood that the living body position information here is the position information obtained by position regression on the region-of-interest image, which may correspond to a live face or to a non-live face. Because a fully convolutional network is used in this embodiment, storage overhead is saved and the efficiency of living body detection is also improved.
In some embodiments, cropping the corresponding region-of-interest image from the to-be-detected near-infrared image according to the target face candidate region position information includes: mapping the target face candidate region position information onto the to-be-detected near-infrared image according to a pre-calibrated camera parameter matrix, locating the face position in the to-be-detected near-infrared image, and cropping the corresponding region-of-interest image according to the located face position.
In this embodiment, a dual-camera module captures the color image and the near-infrared image separately, and the camera parameter matrix between the camera module for the color image and the camera module for the near-infrared image is calibrated in advance. After the target candidate region generation network performs position regression to obtain the target face candidate region position information for the face to be detected, a matrix transformation can be applied to this position information according to the camera parameter matrix to obtain the corresponding position information in the near-infrared image; from that position information the face position can be located on the near-infrared image, and the image region at that face position is cropped to obtain the region-of-interest image.
In the above embodiment, calibrating the camera parameter matrix in advance allows the region-of-interest image to be cropped from the near-infrared image accurately and quickly, improving the efficiency and accuracy of living body detection.
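As a sketch, under the assumption that the pre-calibrated camera parameter matrix can be treated as a 3x3 planar homography H between the two views (the application itself only speaks of a camera parameter matrix):

```python
import numpy as np

def map_box_to_nir(box_rgb: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map a face box (x1, y1, x2, y2) from RGB coordinates to NIR coordinates."""
    x1, y1, x2, y2 = box_rgb
    corners = np.array([[x1, y1, 1.0], [x2, y2, 1.0]]).T  # homogeneous corner points
    mapped = H @ corners                                  # matrix transformation
    mapped = mapped[:2] / mapped[2]                       # back to pixel coordinates
    return mapped.T.reshape(-1)                           # (x1', y1', x2', y2')

def crop_nir_roi(nir_img: np.ndarray, box_nir: np.ndarray) -> np.ndarray:
    """Cut the region-of-interest image out of the NIR frame."""
    x1, y1, x2, y2 = box_nir.round().astype(int)
    return nir_img[y1:y2, x1:x2]
```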
In some embodiments, before the to-be-detected color image and the to-be-detected near-infrared image corresponding to the face to be detected are acquired, the above method further includes: capturing a color image and a near-infrared image corresponding to the face to be detected with a dual-camera module, and performing face detection on the captured color image; when it is determined from the face detection result that a face is detected, determining the captured color image and near-infrared image as the to-be-detected color image and the to-be-detected near-infrared image, respectively; and when it is determined from the face detection result that no face is detected, returning to the step of capturing a color image and a near-infrared image corresponding to the face to be detected with the dual-camera module.
In this embodiment, after the color image and the near-infrared image are captured by the dual-camera module, face detection is performed on the color image. When a face is detected in the color image, the near-infrared image necessarily contains the face region as well, because the two images are captured simultaneously; the color image and near-infrared image captured at that moment can therefore be determined as the to-be-detected color image and the to-be-detected near-infrared image, respectively. Conversely, if no face is detected in the color image, the near-infrared image cannot contain the face region either; in that case, capture of the color image and near-infrared image corresponding to the face to be detected must continue until images containing a face and usable for living body detection are obtained.
In the above embodiment, the dual-camera module captures the color image and near-infrared image corresponding to the face to be detected, and face detection on the color image alone suffices to judge accurately whether an image containing a face and usable for living body detection has been captured; this improves the efficiency of image capture and thereby the efficiency of living body detection.
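A minimal sketch of this capture-and-gate loop; grab_rgb_nir_pair and detect_face are hypothetical stand-ins for the dual-camera driver and an off-the-shelf face detector:

```python
def acquire_images_to_detect():
    """Loop until a simultaneously captured RGB/NIR pair contains a face."""
    while True:
        color_img, nir_img = grab_rgb_nir_pair()  # simultaneous dual-camera capture
        if detect_face(color_img):                # gate on the RGB frame only;
            return color_img, nir_img             # the NIR frame then also has the face
```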
It should be understood that although the steps in the flowcharts of Figs. 2 and 3 are displayed in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2 and 3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In some embodiments, as shown in Fig. 4, a training apparatus 400 for a living body detection model is provided, including:
an initial model acquisition module 402, configured to acquire an initial living body detection model, the initial living body detection model including an initial candidate region generation network and an initial living body classification network;
a training sample acquisition module 404, configured to acquire a first training sample set and a second training sample set, the training samples in the second training sample set including a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
a first training module 406, configured to train the initial candidate region generation network with the first training sample set until convergence, to obtain a first candidate region generation network;
a second training module 408, configured to train the initial living body classification network with the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
an input module 410, configured to input the color image into the first candidate region generation network to obtain current face candidate region position information, and to input the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
a parameter adjustment module 412, configured to adjust the parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and to return to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
a living body detection model obtaining module 414, configured to train the first living body classification network with the target candidate region generation network and the second training sample set until convergence to obtain a target living body classification network, and to obtain a trained target living body detection model from the target candidate region generation network and the target living body classification network.
In some embodiments, the above apparatus further includes a living body detection module configured to acquire the target living body detection model; acquire a to-be-detected color image and a to-be-detected near-infrared image corresponding to a face to be detected; input the to-be-detected color image into the target candidate region generation network of the target living body detection model to obtain target face candidate region position information; and input the target face candidate region position information and the to-be-detected near-infrared image into the target living body classification network of the target living body detection model to obtain a living body detection result.
In some embodiments, the target candidate region generation network includes a first convolutional layer, a second convolutional layer, and a first pooling layer, and the living body detection module is further configured to input the to-be-detected color image into the first convolutional layer and perform a convolution operation on it through the first convolutional layer to obtain a first feature matrix; input the first feature matrix into the first pooling layer and project the largest weight of each vector in the first feature matrix through the first pooling layer to obtain a normalized second feature matrix; and input the second feature matrix into the second convolutional layer and perform a convolution calculation on it through the second convolutional layer to obtain the target face candidate region position information.
In some embodiments, the target living body classification network includes a third convolutional layer, a fourth convolutional layer, and a second pooling layer, and the living body detection module is further configured to crop the corresponding region-of-interest image from the to-be-detected near-infrared image according to the target face candidate region position information; input the region-of-interest image into the third convolutional layer and perform a convolution operation on it through the third convolutional layer to obtain a third feature matrix; input the third feature matrix into the second pooling layer and project the largest weight of each vector in the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix; and input the fourth feature matrix into the fourth convolutional layer and perform a convolution calculation on it through the fourth convolutional layer to obtain the living body detection result.
In some embodiments, the living body detection module is further configured to map the target face candidate region position information onto the to-be-detected near-infrared image according to a pre-calibrated camera parameter matrix, locate the face position in the to-be-detected near-infrared image, and crop the corresponding region-of-interest image according to the located face position.
In some embodiments, the above apparatus further includes an image capture module configured to capture a color image and a near-infrared image corresponding to the face to be detected with a dual-camera module and perform face detection on the captured color image; when it is determined from the face detection result that a face is detected, determine the captured color image and near-infrared image as the to-be-detected color image and the to-be-detected near-infrared image, respectively; and when it is determined from the face detection result that no face is detected, return to the step of capturing a color image and a near-infrared image corresponding to the face to be detected with the dual-camera module.
For the specific limitations of the training apparatus for the living body detection model, reference may be made to the limitations of the training method for the living body detection model above, which are not repeated here. Each module in the above training apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in the form of hardware, or stored in a memory of the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, whose internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The database is used to store training sample data. The network interface is used to communicate with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement a training method for a living body detection model.
Those skilled in the art will understand that the structure shown in FIG. 5 is merely a block diagram of a partial structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processor, implement the steps of the training method for a living body detection model provided in any one of the embodiments of the present application.
One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the steps of the training method for a living body detection model provided in any one of the embodiments of the present application.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments may be completed by instructing relevant hardware through computer-readable instructions, which may be stored in a non-volatile computer-readable storage medium; when executed, the computer-readable instructions may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as no contradiction exists in a combination, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the patent of this application shall be subject to the appended claims.
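As recited in claim 1 below, training alternates between the two networks: pretrain the candidate region generation network, pretrain the living body classification network on its proposals, refine the proposal network against the liveness position error, then retrain the classifier. The following is a minimal sketch of that schedule; it assumes PyTorch, fixed epoch counts in place of convergence tests, a smooth-L1 position loss, and a classifier whose region-of-interest step is differentiable with respect to the boxes — none of which are fixed by the disclosure.

```python
import torch
import torch.nn.functional as F

def train_liveness_model(rpn, classifier, set1_loader, set2_loader, epochs=10):
    """Illustrative alternating schedule; fixed epoch counts stand in for
    the claimed 'until convergence', and smooth-L1 stands in for the
    unspecified losses (both are assumptions)."""
    rpn_opt = torch.optim.SGD(rpn.parameters(), lr=1e-3)
    cls_opt = torch.optim.SGD(classifier.parameters(), lr=1e-3)

    # Step 1: train the initial candidate region generation network (sample set 1).
    for _ in range(epochs):
        for color, target_boxes in set1_loader:
            loss = F.smooth_l1_loss(rpn(color), target_boxes)
            rpn_opt.zero_grad(); loss.backward(); rpn_opt.step()

    # Step 2: train the classifier on proposals from the frozen RPN (sample set 2).
    for _ in range(epochs):
        for color, nir, target_pos in set2_loader:
            with torch.no_grad():
                boxes = rpn(color)
            loss = F.smooth_l1_loss(classifier(nir, boxes), target_pos)
            cls_opt.zero_grad(); loss.backward(); cls_opt.step()

    # Steps 3-4: refine the RPN against the liveness position error,
    # back-propagating through the classifier (assumes the classifier's
    # ROI step is differentiable with respect to the boxes).
    for _ in range(epochs):
        for color, nir, target_pos in set2_loader:
            current_pos = classifier(nir, rpn(color))
            loss = F.smooth_l1_loss(current_pos, target_pos)
            rpn_opt.zero_grad(); loss.backward(); rpn_opt.step()

    # Step 5: retrain the classifier against the now-fixed target RPN.
    for _ in range(epochs):
        for color, nir, target_pos in set2_loader:
            with torch.no_grad():
                boxes = rpn(color)
            loss = F.smooth_l1_loss(classifier(nir, boxes), target_pos)
            cls_opt.zero_grad(); loss.backward(); cls_opt.step()

    return rpn, classifier
```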

Claims (20)

1. A training method for a living body detection model, comprising:
    obtaining an initial living body detection model, the initial living body detection model comprising an initial candidate region generation network and an initial living body classification network;
    obtaining a first training sample set and a second training sample set, wherein the training samples corresponding to the second training sample set comprise a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
    training the initial candidate region generation network according to the first training sample set until convergence, to obtain a first candidate region generation network;
    training the initial living body classification network according to the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
    inputting the color image into the first candidate region generation network to obtain current face candidate region position information, and inputting the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
    adjusting parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and returning to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
    training the first living body classification network according to the target candidate region generation network and the second training sample set until convergence, to obtain a target living body classification network, and obtaining a trained target living body detection model according to the target candidate region generation network and the target living body classification network.
2. The method according to claim 1, further comprising:
    obtaining the target living body detection model;
    obtaining a color image to be detected and a near-infrared image to be detected corresponding to a face to be detected;
    inputting the color image to be detected into the target candidate region generation network corresponding to the target living body detection model to obtain position information of a target face candidate region; and
    inputting the position information of the target face candidate region and the near-infrared image to be detected into the target living body classification network corresponding to the target living body detection model to obtain a living body detection result.
3. The method according to claim 2, wherein the target candidate region generation network comprises a first convolutional layer, a second convolutional layer, and a first pooling layer, and inputting the color image to be detected into the target candidate region generation network corresponding to the target living body detection model to obtain the position information of the target face candidate region comprises:
    inputting the color image to be detected into the first convolutional layer, and performing a convolution operation on the color image to be detected through the first convolutional layer to obtain a first feature matrix;
    inputting the first feature matrix into the first pooling layer, and projecting the largest weight in each vector of the first feature matrix through the first pooling layer to obtain a normalized second feature matrix; and
    inputting the second feature matrix into the second convolutional layer, and performing a convolution computation on the second feature matrix through the second convolutional layer to obtain the position information of the target face candidate region.
4. The method according to claim 2, wherein the target living body classification network comprises a third convolutional layer, a fourth convolutional layer, and a second pooling layer, and inputting the position information of the target face candidate region and the near-infrared image to be detected into the target living body classification network corresponding to the target living body detection model to obtain the living body detection result comprises:
    cropping a corresponding region-of-interest image from the near-infrared image to be detected according to the position information of the target face candidate region, inputting the region-of-interest image into the third convolutional layer, and performing a convolution operation on the region-of-interest image through the third convolutional layer to obtain a third feature matrix;
    inputting the third feature matrix into the second pooling layer, and projecting the largest weight in each vector of the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix; and
    inputting the fourth feature matrix into the fourth convolutional layer, and performing a convolution computation on the fourth feature matrix through the fourth convolutional layer to obtain the living body detection result.
5. The method according to claim 4, wherein cropping the corresponding region-of-interest image from the near-infrared image to be detected according to the position information of the target face candidate region comprises:
    mapping the position information of the target face candidate region onto the near-infrared image to be detected according to a pre-calibrated camera parameter matrix, locating the face position in the near-infrared image to be detected, and cropping the corresponding region-of-interest image according to the located face position.
6. The method according to any one of claims 2 to 5, wherein before obtaining the color image to be detected and the near-infrared image to be detected corresponding to the face to be detected, the method further comprises:
    collecting, through a dual-camera module, a color image and a near-infrared image corresponding to the face to be detected, and performing face detection on the collected color image;
    when it is determined according to the face detection result that a face is detected, determining the collected color image and near-infrared image as the color image to be detected and the near-infrared image to be detected, respectively; and
    when it is determined according to the face detection result that no face is detected, returning to the step of collecting, through the dual-camera module, the color image and the near-infrared image corresponding to the face to be detected.
7. The method according to claim 1, wherein before obtaining the initial living body detection model, the method further comprises:
    determining network structure information of the initial living body detection model; and
    initializing parameter values of network parameters of the initial candidate region generation network and the initial living body classification network in the initial living body detection model.
8. A training apparatus for a living body detection model, comprising:
    an initial model obtaining module, configured to obtain an initial living body detection model, the initial living body detection model comprising an initial candidate region generation network and an initial living body classification network;
    a training sample obtaining module, configured to obtain a first training sample set and a second training sample set, wherein the training samples corresponding to the second training sample set comprise a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
    a first training module, configured to train the initial candidate region generation network according to the first training sample set until convergence, to obtain a first candidate region generation network;
    a second training module, configured to train the initial living body classification network according to the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
    an input module, configured to input the color image into the first candidate region generation network to obtain current face candidate region position information, and input the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
    a parameter adjustment module, configured to adjust parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and return to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
    a living body detection model obtaining module, configured to train the first living body classification network according to the target candidate region generation network and the second training sample set until convergence, to obtain a target living body classification network, and obtain a trained target living body detection model according to the target candidate region generation network and the target living body classification network.
9. The apparatus according to claim 8, further comprising a living body detection module configured to: obtain the target living body detection model; obtain a color image to be detected and a near-infrared image to be detected corresponding to a face to be detected; input the color image to be detected into the target candidate region generation network corresponding to the target living body detection model to obtain position information of a target face candidate region; and input the position information of the target face candidate region and the near-infrared image to be detected into the target living body classification network corresponding to the target living body detection model to obtain a living body detection result.
10. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining an initial living body detection model, the initial living body detection model comprising an initial candidate region generation network and an initial living body classification network;
    obtaining a first training sample set and a second training sample set, wherein the training samples corresponding to the second training sample set comprise a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
    training the initial candidate region generation network according to the first training sample set until convergence, to obtain a first candidate region generation network;
    training the initial living body classification network according to the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
    inputting the color image into the first candidate region generation network to obtain current face candidate region position information, and inputting the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
    adjusting parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and returning to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
    training the first living body classification network according to the target candidate region generation network and the second training sample set until convergence, to obtain a target living body classification network, and obtaining a trained target living body detection model according to the target candidate region generation network and the target living body classification network.
11. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining the target living body detection model;
    obtaining a color image to be detected and a near-infrared image to be detected corresponding to a face to be detected;
    inputting the color image to be detected into the target candidate region generation network corresponding to the target living body detection model to obtain position information of a target face candidate region; and
    inputting the position information of the target face candidate region and the near-infrared image to be detected into the target living body classification network corresponding to the target living body detection model to obtain a living body detection result.
12. The computer device according to claim 11, wherein the target candidate region generation network comprises a first convolutional layer, a second convolutional layer, and a first pooling layer, and the processor, when executing the computer-readable instructions, further performs the following steps:
    inputting the color image to be detected into the first convolutional layer, and performing a convolution operation on the color image to be detected through the first convolutional layer to obtain a first feature matrix;
    inputting the first feature matrix into the first pooling layer, and projecting the largest weight in each vector of the first feature matrix through the first pooling layer to obtain a normalized second feature matrix; and
    inputting the second feature matrix into the second convolutional layer, and performing a convolution computation on the second feature matrix through the second convolutional layer to obtain the position information of the target face candidate region.
13. The computer device according to claim 11, wherein the target living body classification network comprises a third convolutional layer, a fourth convolutional layer, and a second pooling layer, and the processor, when executing the computer-readable instructions, further performs the following steps:
    cropping a corresponding region-of-interest image from the near-infrared image to be detected according to the position information of the target face candidate region, inputting the region-of-interest image into the third convolutional layer, and performing a convolution operation on the region-of-interest image through the third convolutional layer to obtain a third feature matrix;
    inputting the third feature matrix into the second pooling layer, and projecting the largest weight in each vector of the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix; and
    inputting the fourth feature matrix into the fourth convolutional layer, and performing a convolution computation on the fourth feature matrix through the fourth convolutional layer to obtain the living body detection result.
14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    mapping the position information of the target face candidate region onto the near-infrared image to be detected according to a pre-calibrated camera parameter matrix, locating the face position in the near-infrared image to be detected, and cropping the corresponding region-of-interest image according to the located face position.
15. One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining an initial living body detection model, the initial living body detection model comprising an initial candidate region generation network and an initial living body classification network;
    obtaining a first training sample set and a second training sample set, wherein the training samples corresponding to the second training sample set comprise a color image, a near-infrared image corresponding to the color image, and corresponding target living body position information;
    training the initial candidate region generation network according to the first training sample set until convergence, to obtain a first candidate region generation network;
    training the initial living body classification network according to the first candidate region generation network and the second training sample set until convergence, to obtain a first living body classification network;
    inputting the color image into the first candidate region generation network to obtain current face candidate region position information, and inputting the current face candidate region position information and the near-infrared image into the first living body classification network to obtain current living body position information;
    adjusting parameters of the first candidate region generation network according to the difference between the current living body position information and the target living body position information, and returning to the step of inputting the color image into the first candidate region generation network until convergence, to obtain a target candidate region generation network; and
    training the first living body classification network according to the target candidate region generation network and the second training sample set until convergence, to obtain a target living body classification network, and obtaining a trained target living body detection model according to the target candidate region generation network and the target living body classification network.
16. The storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    obtaining the target living body detection model;
    obtaining a color image to be detected and a near-infrared image to be detected corresponding to a face to be detected;
    inputting the color image to be detected into the target candidate region generation network corresponding to the target living body detection model to obtain position information of a target face candidate region; and
    inputting the position information of the target face candidate region and the near-infrared image to be detected into the target living body classification network corresponding to the target living body detection model to obtain a living body detection result.
17. The storage medium according to claim 16, wherein the target candidate region generation network comprises a first convolutional layer, a second convolutional layer, and a first pooling layer, and the computer-readable instructions, when executed by the processor, further perform the following steps:
    inputting the color image to be detected into the first convolutional layer, and performing a convolution operation on the color image to be detected through the first convolutional layer to obtain a first feature matrix;
    inputting the first feature matrix into the first pooling layer, and projecting the largest weight in each vector of the first feature matrix through the first pooling layer to obtain a normalized second feature matrix; and
    inputting the second feature matrix into the second convolutional layer, and performing a convolution computation on the second feature matrix through the second convolutional layer to obtain the position information of the target face candidate region.
18. The storage medium according to claim 16, wherein the target living body classification network comprises a third convolutional layer, a fourth convolutional layer, and a second pooling layer, and the computer-readable instructions, when executed by the processor, further perform the following steps:
    cropping a corresponding region-of-interest image from the near-infrared image to be detected according to the position information of the target face candidate region, inputting the region-of-interest image into the third convolutional layer, and performing a convolution operation on the region-of-interest image through the third convolutional layer to obtain a third feature matrix;
    inputting the third feature matrix into the second pooling layer, and projecting the largest weight in each vector of the third feature matrix through the second pooling layer to obtain a normalized fourth feature matrix; and
    inputting the fourth feature matrix into the fourth convolutional layer, and performing a convolution computation on the fourth feature matrix through the fourth convolutional layer to obtain the living body detection result.
19. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    mapping the position information of the target face candidate region onto the near-infrared image to be detected according to a pre-calibrated camera parameter matrix, locating the face position in the near-infrared image to be detected, and cropping the corresponding region-of-interest image according to the located face position.
20. The storage medium according to any one of claims 16 to 19, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    collecting, through a dual-camera module, a color image and a near-infrared image corresponding to the face to be detected, and performing face detection on the collected color image;
    when it is determined according to the face detection result that a face is detected, determining the collected color image and near-infrared image as the color image to be detected and the near-infrared image to be detected, respectively; and
    when it is determined according to the face detection result that no face is detected, returning to the step of collecting, through the dual-camera module, the color image and the near-infrared image corresponding to the face to be detected.
PCT/CN2019/116269 2019-10-10 2019-11-07 Training method and apparatus for living body detection model, computer device, and storage medium WO2021068322A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910958191.5A CN110941986B (en) 2019-10-10 2019-10-10 Living body detection model training method, living body detection model training device, computer equipment and storage medium
CN201910958191.5 2019-10-10

Publications (1)

Publication Number Publication Date
WO2021068322A1 (en)

Family

ID=69906043

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116269 WO2021068322A1 (en) 2019-10-10 2019-11-07 Training method and apparatus for living body detection model, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110941986B (en)
WO (1) WO2021068322A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582155B (en) * 2020-05-07 2024-02-09 腾讯科技(深圳)有限公司 Living body detection method, living body detection device, computer equipment and storage medium
CN113822302A (en) * 2020-06-18 2021-12-21 北京金山数字娱乐科技有限公司 Training method and device for target detection model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038474B (en) * 2017-12-28 2020-04-14 深圳励飞科技有限公司 Face detection method, convolutional neural network parameter training method, device and medium
CN108537152B (en) * 2018-03-27 2022-01-25 百度在线网络技术(北京)有限公司 Method and apparatus for detecting living body
CN108875833B (en) * 2018-06-22 2021-07-16 北京儒博科技有限公司 Neural network training method, face recognition method and device
CN108898112A (en) * 2018-07-03 2018-11-27 东北大学 A kind of near-infrared human face in-vivo detection method and system
CN109255322B (en) * 2018-09-03 2019-11-19 北京诚志重科海图科技有限公司 A kind of human face in-vivo detection method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034702A1 (en) * 2017-07-26 2019-01-31 Baidu Online Network Technology (Beijing) Co., Ltd. Living body detecting method and apparatus, device and storage medium
US20190095701A1 (en) * 2017-09-27 2019-03-28 Lenovo (Beijing) Co., Ltd. Living-body detection method, device and storage medium
CN108830188A (en) * 2018-05-30 2018-11-16 西安理工大学 Vehicle checking method based on deep learning
CN108921071A (en) * 2018-06-24 2018-11-30 深圳市中悦科技有限公司 Human face in-vivo detection method, device, storage medium and processor
CN109034059A (en) * 2018-07-25 2018-12-18 深圳市中悦科技有限公司 Silent formula human face in-vivo detection method, device, storage medium and processor
CN109446981A (en) * 2018-10-25 2019-03-08 腾讯科技(深圳)有限公司 A kind of face's In vivo detection, identity identifying method and device

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139460A (en) * 2021-04-22 2021-07-20 广州织点智能科技有限公司 Face detection model training method, face detection method and related device thereof
CN113239762A (en) * 2021-04-29 2021-08-10 中国农业大学 Vision and infrared signal-based living body detection method and device
CN113343826A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Training method of human face living body detection model, human face living body detection method and device
CN113343826B (en) * 2021-05-31 2024-02-13 北京百度网讯科技有限公司 Training method of human face living body detection model, human face living body detection method and human face living body detection device
CN113378715A (en) * 2021-06-10 2021-09-10 北京华捷艾米科技有限公司 Living body detection method based on color face image and related equipment
CN113378715B (en) * 2021-06-10 2024-01-05 北京华捷艾米科技有限公司 Living body detection method based on color face image and related equipment
CN113283388A (en) * 2021-06-24 2021-08-20 中国平安人寿保险股份有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113379772B (en) * 2021-07-06 2022-10-11 新疆爱华盈通信息技术有限公司 Mobile temperature measurement method based on background elimination and tracking algorithm in complex environment
CN113379772A (en) * 2021-07-06 2021-09-10 新疆爱华盈通信息技术有限公司 Mobile temperature measurement method based on background elimination and tracking algorithm in complex environment
CN113658113A (en) * 2021-07-28 2021-11-16 武汉联影医疗科技有限公司 Medical image detection method and training method of medical image detection model
CN113658113B (en) * 2021-07-28 2024-02-27 武汉联影医疗科技有限公司 Medical image detection method and training method of medical image detection model
CN113807407B (en) * 2021-08-25 2023-04-18 西安电子科技大学广州研究院 Target detection model training method, model performance detection method and device
CN113807407A (en) * 2021-08-25 2021-12-17 西安电子科技大学广州研究院 Target detection model training method, model performance detection method and device
CN114049289A (en) * 2021-11-10 2022-02-15 合肥工业大学 Near infrared-visible light face image synthesis method based on contrast learning and StyleGAN2
CN114049289B (en) * 2021-11-10 2024-03-05 合肥工业大学 Near infrared-visible light face image synthesis method based on contrast learning and StyleGAN2
CN114067445A (en) * 2021-11-26 2022-02-18 中科海微(北京)科技有限公司 Data processing method, device and equipment for face authenticity identification and storage medium
WO2023124869A1 (en) * 2021-12-30 2023-07-06 杭州萤石软件有限公司 Liveness detection method, device and apparatus, and storage medium
CN115147902A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Training method and device for human face living body detection model and computer program product
CN115147902B (en) * 2022-06-30 2023-11-07 北京百度网讯科技有限公司 Training method, training device and training computer program product for human face living body detection model
CN114965441A (en) * 2022-07-28 2022-08-30 中国科学院国家天文台 Training method of element probabilistic prediction model and element probabilistic prediction method
CN115512427A (en) * 2022-11-04 2022-12-23 北京城建设计发展集团股份有限公司 User face registration method and system combined with matched biopsy
CN115601818A (en) * 2022-11-29 2023-01-13 海豚乐智科技(成都)有限责任公司(Cn) Lightweight visible light living body detection method and device

Also Published As

Publication number Publication date
CN110941986B (en) 2023-08-01
CN110941986A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
WO2021068322A1 (en) Training method and apparatus for living body detection model, computer device, and storage medium
US11403876B2 (en) Image processing method and apparatus, facial recognition method and apparatus, and computer device
US10325181B2 (en) Image classification method, electronic device, and storage medium
CN109034078B (en) Training method of age identification model, age identification method and related equipment
JP7058373B2 (en) Lesion detection and positioning methods, devices, devices, and storage media for medical images
WO2019096029A1 (en) Living body identification method, storage medium and computer device
WO2020215557A1 (en) Medical image interpretation method and apparatus, computer device and storage medium
WO2021017261A1 (en) Recognition model training method and apparatus, image recognition method and apparatus, and device and medium
WO2019184124A1 (en) Risk-control model training method, risk identification method and apparatus, and device and medium
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
CN109344742B (en) Feature point positioning method and device, storage medium and computer equipment
WO2016124103A1 (en) Picture detection method and device
WO2022033220A1 (en) Face liveness detection method, system and apparatus, computer device, and storage medium
WO2020024395A1 (en) Fatigue driving detection method and apparatus, computer device, and storage medium
US10635894B1 (en) Systems and methods for passive-subject liveness verification in digital media
CN111368672A (en) Construction method and device for genetic disease facial recognition model
WO2021114612A1 (en) Target re-identification method and apparatus, computer device, and storage medium
WO2022057309A1 (en) Lung feature recognition method and apparatus, computer device, and storage medium
KR20150128510A (en) Apparatus and method for liveness test, and apparatus and method for image processing
WO2022033219A1 (en) Face liveness detection method, system and apparatus, computer device, and storage medium
CN112884782B (en) Biological object segmentation method, apparatus, computer device, and storage medium
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN110956628B (en) Picture grade classification method, device, computer equipment and storage medium
CN111183455A (en) Image data processing system and method
WO2022134354A1 (en) Vehicle loss detection model training method and apparatus, vehicle loss detection method and apparatus, and device and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19948216

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19948216

Country of ref document: EP

Kind code of ref document: A1