CN117746482A - Training method, detection method, device and equipment of face detection model - Google Patents

Training method, detection method, device and equipment of face detection model

Info

Publication number
CN117746482A
CN117746482A (application CN202311766835.3A)
Authority
CN
China
Prior art keywords
image
model
result
training
extraction module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311766835.3A
Other languages
Chinese (zh)
Inventor
王珂尧
张国生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311766835.3A priority Critical patent/CN117746482A/en
Publication of CN117746482A publication Critical patent/CN117746482A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The present disclosure provides a training method, a detection method, a device, and equipment for a face detection model, applied to the technical field of artificial intelligence, in particular to computer vision, deep learning, face recognition, and related fields, and applicable to scenarios such as smart cities and smart finance. The training method includes: obtaining a training set that includes a first image, a second image, and a third image, where the first image is an image that meets a preset condition, the second image has a false attribute that can be detected using low-order image features, and the third image has a false attribute that cannot be detected using low-order image features; training a first model according to the first image and the second image to obtain a second model; and training the second model according to the first image and the third image to obtain the face detection model.

Description

Training method, detection method, device and equipment of face detection model
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the fields of computer vision, deep learning, and face recognition, and can be applied to scenarios such as smart cities and smart finance; it relates in particular to a training method, a detection method, a device, and equipment for a face detection model.
Background
Currently, in scenarios such as security and finance, user identity verification requires collecting face images in real time. During face detection, it must be determined whether the current image was captured from a real person (i.e., face liveness detection) and whether the collected image has been tampered with or synthesized (i.e., face synthesis detection), so as to ensure the authenticity of the collected face image.
Therefore, how to detect faces accurately is a problem that needs to be solved.
Disclosure of Invention
The present disclosure provides a training method, a detection method, a device, and equipment for a face detection model, so as to improve the accuracy of face detection results.
According to a first aspect of the present disclosure, there is provided a training method of a face detection model, including:
acquiring a training set, where the training set includes: at least one first image, at least one second image, and at least one third image; the first image is an image that meets a preset condition; the second image has a false attribute that can be detected using low-order image features; the third image has a false attribute that cannot be detected using low-order image features; and a false attribute is an attribute that does not meet the preset condition;
Training the first model according to the first image and the second image to obtain a second model;
and training the second model according to the first image and the third image to obtain a face detection model.
According to a second aspect of the present disclosure, there is provided a method for detecting a face image, including:
acquiring an image to be detected;
performing face detection processing on the image to be detected according to a face detection model to obtain a detection result, where the face detection model is trained according to the method described in the first aspect, and the detection result indicates whether the image to be detected meets a preset condition.
According to a third aspect of the present disclosure, there is provided a training apparatus of a face detection model, including:
a first acquisition unit, configured to acquire a training set, where the training set includes: at least one first image, at least one second image, and at least one third image; the first image is an image that meets a preset condition; the second image has a false attribute that can be detected using low-order image features; the third image has a false attribute that cannot be detected using low-order image features; and a false attribute is an attribute that does not meet the preset condition;
The first training unit is used for training the first model according to the first image and the second image to obtain a second model;
and the second training unit is used for training the second model according to the first image and the third image to obtain a face detection model.
According to a fourth aspect of the present disclosure, there is provided a face image detection apparatus, including:
the second acquisition unit is used for acquiring the image to be detected;
a processing unit, configured to perform face detection processing on the image to be detected according to a face detection model to obtain a detection result, where the face detection model is trained by the apparatus described in the third aspect, and the detection result indicates whether the image to be detected meets a preset condition.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect; or to enable the at least one processor to perform the method of the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the first aspect; alternatively, the computer instructions are for causing the computer to perform the method of the second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the method of the first aspect; alternatively, execution of the computer program by the at least one processor causes the electronic device to perform the method of the second aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flow chart of a training method of a face detection model according to an embodiment of the disclosure;
fig. 2 is a flowchart of a training method of a face detection model according to another embodiment of the disclosure;
fig. 3 is a flowchart of another training method of a face detection model according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram of a model training provided by an embodiment of the present disclosure;
fig. 5 is a flowchart of a method for detecting a face image according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a training device for a face detection model according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a training device for a face detection model according to another embodiment of the disclosure;
fig. 8 is a schematic structural diagram of a face image detection device according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure;
fig. 10 is a block diagram of an electronic device used to implement a training method of a face detection model, or a detection method of a face image, in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, when detecting the authenticity of a face, two models are usually trained: one model determines whether the currently acquired image was captured from a real person, and the other detects whether the current image is a tampered or synthesized image. Only when the image to be detected passes both models is it considered a real image captured from a real person. However, this approach is inefficient.
In one example, the training sets used for liveness detection and synthesis detection are mixed together to train a single model; however, directly mixing the training sets in this way tends to reduce the recognition accuracy of the final model.
To avoid at least one of the above technical problems, the inventors of the present disclosure arrived at the inventive concept of the present disclosure through creative work: divide the training set into three categories, one of which is the first image that meets the preset condition; the remaining images that do not meet the preset condition (i.e., that have a false attribute) are further divided into the second image and the third image. The second image and the third image differ in whether the image's false attribute can be determined from its corresponding low-order image features. After this distinction is made, model training can be performed in stages: first train on the first image and the second image, and then, starting from the model obtained in the first stage, train on the first image and the third image, thereby improving the accuracy of the face detection model.
The present disclosure provides a training method, a detection method, a device, and equipment for a face detection model, applied to the fields of computer vision, deep learning, and face recognition within artificial intelligence, and applicable to scenarios such as smart cities and smart finance, so as to improve the accuracy of face detection.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user's personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
Fig. 1 is a flow chart of a training method of a face detection model according to an embodiment of the disclosure, as shown in fig. 1, where the method includes:
s101, acquiring a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting preset conditions; the second image has false attributes, and the false attributes of the second image can be detected by using low-order image features; the third image has false attributes, and the false attributes of the third image cannot be obtained through low-order image feature detection; the false attribute is an attribute which does not meet a preset condition.
The execution body of this embodiment may be a training apparatus for a face detection model, which may be a server (such as a local server or a cloud server), a computer, a processor, a chip, or the like; this embodiment is not limited in this respect.
In this embodiment, the images included in the training set are classified. And taking the face image meeting the preset condition as a first image. Further, the face images which do not meet the preset conditions are further divided into the category to which the second image belongs and the category to which the third image belongs.
The second image and the third image are both images that do not meet the preset condition (i.e., that have a false attribute), but the false attribute of the second image can be identified from the low-order image features corresponding to the second image, whereas the false attribute of the third image cannot be identified from the low-order image features corresponding to the third image. Low-order image features can be understood as features that can be extracted from an image automatically, without shape information (i.e., spatial-relationship information), such as edge features, contour features, color, texture, and shape features of the image. That is, the false attribute of the second image is easy to identify, while the false attribute of the third image is difficult to identify: it must be determined from the high-order image features corresponding to the image.
It should be noted that the high-order image features in this embodiment can be understood as the semantic features corresponding to the image.
In addition, the preset condition in this embodiment may be determined according to the purpose of face detection. If the face detection model needs to perform liveness detection, the preset condition may be that the face image is captured from a live person. If the face detection model needs to detect whether a face image is a synthesized image, the preset condition is that the face image is not synthesized. This embodiment does not specifically limit the preset condition.
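As a sketch of the three-way split described above (a simplification with hypothetical field names; the disclosure does not specify a data format), each training sample can carry two annotations — whether it meets the preset condition, and, if not, whether its false attribute is visible in low-order features:

```python
# Hypothetical sketch of the training-set partition described above.
# `real` marks images that meet the preset condition; `low_order_detectable`
# marks fakes whose false attribute shows up in edges/texture (low-order).

def partition_training_set(samples):
    """Split samples into first / second / third image groups."""
    first, second, third = [], [], []
    for s in samples:
        if s["real"]:
            first.append(s)          # meets the preset condition
        elif s["low_order_detectable"]:
            second.append(s)         # false attribute visible in low-order features
        else:
            third.append(s)          # needs high-order (semantic) features
    return first, second, third

samples = [
    {"id": 0, "real": True,  "low_order_detectable": False},
    {"id": 1, "real": False, "low_order_detectable": True},   # easy fake
    {"id": 2, "real": False, "low_order_detectable": False},  # hard fake
]
first, second, third = partition_training_set(samples)
```

The two fake groups then feed the two training stages described below in the embodiment.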
S102, training the first model according to the first image and the second image to obtain a second model.
Illustratively, in this embodiment, training of the face detection model is divided into two stages. In the first stage, the first model is trained so that the resulting second model can accurately identify the false attribute of the second image.
That is, in this embodiment, the first image and the second image in the training set serve as the training data for the first model at this stage.
In one example, during the training of the first model, the loss function corresponding to the first model may be determined based on the detection result output by the first model and the category of the image currently input into the first model (i.e., whether it meets the preset condition), and the parameters of the first model are adjusted continuously until the model converges or a preset training-stop condition is reached, so as to obtain the second model.
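The patent does not name a specific loss function; a common choice for such a binary "meets the preset condition or not" detection result is binary cross-entropy, sketched here as an assumption:

```python
import math

def binary_cross_entropy(pred, label, eps=1e-12):
    """Loss between the model's detection score `pred` (probability that the
    image meets the preset condition) and the ground-truth label (1 or 0)."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp for numerical stability
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

# A confident, correct prediction yields a small loss; a confident, wrong
# prediction yields a large loss, which drives the parameter adjustment.
low = binary_cross_entropy(0.99, 1)
high = binary_cross_entropy(0.01, 1)
```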
And S103, training the second model according to the first image and the third image to obtain a face detection model.
Illustratively, in this embodiment, after the second model is obtained, the second model may be further trained, based on the first image and the third image in the training set, so that it gains the ability to identify the false attribute of the third image.
In one example, during the training of the second model, the parameters of the second model may be adjusted, based on the category of the image currently input into the second model (for example, whether it meets the preset condition) and the detection result correspondingly output by the current second model, until the model converges or a preset training-stop condition is reached, so as to obtain the final face detection model.
In one example, the face detection model obtained in this embodiment may be used to detect which image category an input image belongs to, where the image categories are the category of the first image, the category of the second image, and the category of the third image.
It will be appreciated that in this embodiment, the model is first trained on the first image and the second image, so that the model obtained at this stage can reliably determine whether an image has a false attribute based on low-order image features. Model training then continues with the first image and the third image, so that the model further gains the ability to detect the false attribute of the third image, which is harder to identify. This way of partitioning the training set, together with the two-stage training, improves the accuracy of the resulting model's face detection results.
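The two-stage schedule of steps S102 and S103 can be outlined as follows (a sketch only: `train` stands in for an ordinary supervised training loop, whose details the disclosure leaves unspecified, and all data names are hypothetical):

```python
# Sketch of the two-stage curriculum: stage 1 uses real images plus easy fakes
# (second images); stage 2 continues from the stage-1 model with real images
# plus hard fakes (third images).

def train(model, dataset, stage):
    # Stand-in for gradient-based training: a real implementation would run
    # forward/backward passes here. We just record what each stage used.
    return {"base": model, "stage": stage, "data": sorted(dataset)}

first_images = {"real_1", "real_2"}
second_images = {"easy_fake_1"}   # false attribute visible in low-order features
third_images = {"hard_fake_1"}    # needs high-order (semantic) features

second_model = train("first_model", first_images | second_images, stage=1)
face_detector = train(second_model, first_images | third_images, stage=2)
```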
Fig. 2 is a flow chart of a training method of a face detection model according to another embodiment of the disclosure, as shown in fig. 2, the method includes:
s201, acquiring a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting preset conditions; the second image has false attributes, and the false attributes of the second image can be detected by using low-order image features; the third image has false attributes, and the false attributes of the third image cannot be obtained through low-order image feature detection; the false attribute is an attribute which does not meet a preset condition.
The execution body of this embodiment may be a training apparatus for a face detection model, which may be a server (such as a local server or a cloud server), a computer, a processor, a chip, or the like; this embodiment is not limited in this respect.
The technical principle of step S201 may be referred to step S101, and will not be described herein.
In one example, X convolution layers are provided in the first model; the first K of the X convolution layers serve as a first extraction module, and the last Y of the X convolution layers serve as a second extraction module, where X, K, and Y are positive integers and X is greater than or equal to the sum of K and Y. The first extraction module is used to extract low-order image features of an image; the second extraction module is used to extract high-order image features of an image.
Illustratively, in the present embodiment, a first extraction module and a second extraction module are provided in the first model. The first extraction module may be understood as a model component in the first model for extracting low-order image features of the image input into the model. While the second extraction module may be understood as a model component in the first model for extracting higher-order image features of the image input into the first model.
In addition, in this embodiment, X sequentially connected convolution layers may be provided, where the first K of the X convolution layers may serve as the first extraction module in the first model, and the last Y of the X convolution layers serve as the second extraction module for extracting high-order image features.
It can be understood that in this embodiment, the first K convolution layers of the plurality of serially connected convolution layers are used as the first extraction module, and the last Y convolution layers of the plurality of serially connected convolution layers are used as the second extraction module, so that the detection result corresponding to the image can be determined by analyzing the high-order image feature and the low-order image feature in the image, and the network structure is simpler and easy to implement.
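Structurally, the split of the X convolution layers into the two extraction modules might look like this (a sketch with placeholder layer names; the patent does not fix the values of X, K, or Y, only that X ≥ K + Y):

```python
# Sketch: X sequentially connected convolution layers, the first K serving as
# the first (low-order) extraction module and the last Y as the second
# (high-order) extraction module. Layers are placeholders, not real convs.

X, K, Y = 8, 3, 5          # example values; only X >= K + Y is required
conv_layers = [f"conv{i}" for i in range(X)]

first_extraction_module = conv_layers[:K]    # low-order features (edges, texture)
second_extraction_module = conv_layers[-Y:]  # high-order (semantic) features
```

With X = K + Y, as here, the two modules tile the backbone exactly; X > K + Y would leave middle layers outside both modules.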
S202, training and correcting parameters of the first extraction module according to the first image and the second image to obtain a second model. Wherein the first model comprises a first extraction module; the first extraction module is used for extracting low-order image features of the image.
In this embodiment, when the first model is trained according to the first image and the second image, the second image is an image whose false attribute can be identified from low-order image features. Therefore, during the training stage, the parameters of the first extraction module in the first model, which extracts low-order image features, may be corrected continuously until a preset training-stop condition is met. It should be noted that, in this embodiment, the model parameters used for extracting the remaining image features in the first model need not be adjusted.
In one example, when step S202 is executed, the loss function may be determined directly from the detection result output by the first model and the category of the image currently input into the first model, so that the parameters of the first extraction module are adjusted continuously to obtain the second model.
It can be appreciated that in this embodiment, during the training of the first model, only the parameters of the first extraction module in the first model are adjusted, so as to improve the accuracy of the first extraction module in identifying the low-order image features.
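Updating only the first extraction module's parameters can be sketched as follows (parameter names are hypothetical; in a framework such as PyTorch this would correspond to setting `requires_grad` on the relevant parameters):

```python
# Sketch of stage-1 selective training: only parameters belonging to the
# first (low-order) extraction module are marked trainable; all other
# parameters of the first model stay frozen.

params = {
    "first_extractor.conv0.weight": {"trainable": None},
    "first_extractor.conv1.weight": {"trainable": None},
    "second_extractor.conv5.weight": {"trainable": None},
    "head.weight": {"trainable": None},   # hypothetical detection head
}

for name, p in params.items():
    p["trainable"] = name.startswith("first_extractor.")

trainable = [n for n, p in params.items() if p["trainable"]]
```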
In one example, step S202 includes the steps of:
A first step of step S202: an image to be trained is selected from the first image and the second image.
In this embodiment, when the first model is trained by combining the first image and the second image, the image to be trained may be selected from the first image and the second image included in the training set.
A second step of step S202: inputting an image to be trained into a first model to obtain a first detection result and a first feature extraction result; the first detection result represents whether the image to be trained accords with preset conditions or not; the first feature extraction result is the feature in the image to be trained identified by the first extraction module.
For example, after the image to be trained is input to the first model, a result output by the first model for determining whether the image to be trained meets a preset condition may be used as the first detection result. In addition, since the first extraction module for extracting the low-order image features of the image is further provided in the first model, a processing result obtained when the first extraction module performs low-order image feature extraction on the image to be trained can be further used as a first feature extraction result.
A third step of step S202: extracting a second feature extraction result of the image to be trained, where the second feature extraction result is a low-order image feature determined by a preset feature extractor.
In this embodiment, the low-order image feature processing is performed on the image to be trained based on the preset feature extractor, so as to obtain a second feature extraction result corresponding to the image to be trained. It should be noted that the preset feature extractor in this embodiment is independent of the model in this embodiment. For example, the preset feature extractor may be an edge feature extraction operator provided in the related art.
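One concrete choice of such an off-the-shelf extractor is the classic Sobel edge operator (an assumption for illustration; the patent only says an edge-feature extraction operator from the related art may be used):

```python
import numpy as np

def sobel_edges(img):
    """Low-order edge features via the 3x3 Sobel operator (valid padding)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx = (patch * kx).sum()   # horizontal gradient
            gy = (patch * ky).sum()   # vertical gradient
            out[i, j] = np.hypot(gx, gy)  # gradient magnitude
    return out

# A vertical step edge produces a strong response along the boundary and
# zero response in the flat region.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edges = sobel_edges(img)
```

In practice a library routine (e.g., OpenCV's Sobel filter) would replace this loop; the point is only that the extractor is fixed and external to the trained model.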
Fourth step of step S202: and carrying out parameter correction on the first extraction module according to the first detection result, the first feature extraction result and the second feature extraction result to obtain a second model.
For example, after the first detection result, the first feature extraction result, and the second feature extraction result are obtained, the parameters of the first extraction module may be adjusted on that basis to obtain the adjusted second model. For example, a first loss function may be determined from the first detection result and the category of the image to be trained (i.e., whether it meets the preset condition), and a second loss function may be obtained from the first feature extraction result and the second feature extraction result; the parameters of the first extraction module are then corrected using the first loss function and the second loss function together.
It can be appreciated that in this embodiment, in the stage of training the first model, the preset feature extractor is introduced to perform low-order image feature extraction on the image to be trained, so that the second feature extraction result obtained can be combined with the first feature extraction result and the first detection result obtained in the model training process, and parameters of the first model are adjusted together, so that the low-order image feature extracted by the first extraction module in the first model is more accurate, and the accuracy of the finally obtained model is improved.
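Combining the two losses might look like this (a sketch: the equal weighting and the use of mean-squared error for the feature-matching term are assumptions, not specified by the patent):

```python
# Sketch of the stage-1 objective: a detection loss on the model output plus
# a feature-matching loss pulling the first extraction module's features
# toward those of the fixed preset low-order feature extractor.

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def stage1_loss(detect_loss, model_feats, preset_feats, weight=1.0):
    """First loss (detection) + weighted second loss (feature matching)."""
    return detect_loss + weight * mse(model_feats, preset_feats)

total = stage1_loss(detect_loss=0.5,
                    model_feats=[0.2, 0.4, 0.8],   # first feature extraction result
                    preset_feats=[0.0, 0.4, 1.0])  # second feature extraction result
```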
In one example, the fourth step of step S202 may be implemented as follows: sampling the first feature extraction result to obtain a first processing result; the matrix dimension of the first processing result is the same as the matrix dimension of the second feature extraction result; and carrying out parameter correction on the first extraction module according to the first processing result, the second characteristic extraction result and the first detection result to obtain a second model.
For example, in this embodiment, the matrix dimensions of the first feature extraction result obtained from the first model may differ from those of the second feature extraction result, that is, the data sizes of the two feature extraction results may differ. Therefore, after the first feature extraction result is obtained, it is further subjected to sampling processing, for example downsampling and/or upsampling, so that the first processing result obtained after sampling has the same size as the second feature extraction result, which allows a loss function to be constructed later. After the first processing result is obtained, the parameters of the first extraction module of the first model may be further adjusted using the first detection result and the second feature extraction result, so as to obtain the second model.
It can be understood that in this embodiment, the first feature extraction result may be sampled to ensure that the size of the first processing result and the size of the second feature extraction result are the same, which is beneficial to improving accuracy of model parameter adjustment.
In one example, sampling the first feature extraction result to obtain a first processing result includes: performing downsampling processing on the first feature extraction result based on N preset convolution layers to obtain a second processing result; and, based on M preset convolution layers, sequentially performing upsampling processing on the second processing result according to the first feature extraction result to obtain the first processing result, where N and M are positive integers.
In this embodiment, when the first feature extraction result obtained by the first extraction module is sampled, the first feature extraction result may first be downsampled sequentially through the N preset convolution layers to obtain a second processing result. Then, based on the second processing result and the first feature extraction result, upsampling is performed sequentially to obtain a first processing result whose matrix dimensions are the same as those of the second feature extraction result. In this embodiment, when the second processing result is upsampled, the previously obtained first feature extraction result is also used, so as to improve the accuracy of the sampling result. In addition, it should be noted that the preset convolution layers in this embodiment are not components of the face detection model; they are only auxiliary processing structures introduced during the training process to assist model training.
It will be appreciated that in this embodiment, the first feature extraction result may be adjusted to the same size as the second feature extraction result by combining the downsampling and upsampling processes. In addition, in the up-sampling process, the matrix dimension corresponding to the result output by each preset convolution layer is continuously enlarged by combining the first feature extraction result obtained before, so that the accuracy of the finally obtained first processing result is improved.
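The dimension-matching step can be sketched with nearest-neighbour down- and up-sampling (a simplification: the patent's N and M preset convolution layers are replaced here by plain strided slicing and repetition, and the skip use of the first feature extraction result during upsampling is omitted):

```python
import numpy as np

def downsample(x, factor=2):
    """Stand-in for the N preset convolution layers: strided subsampling."""
    return x[::factor, ::factor]

def upsample(x, factor=2):
    """Stand-in for the M preset convolution layers: nearest-neighbour repeat."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

first_feats = np.arange(16.0).reshape(4, 4)   # first feature extraction result
second_result = downsample(first_feats)       # second processing result (2x2)
first_processing = upsample(second_result)    # back to the target 4x4 dims
```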
In one example, the third step of step S202 may be implemented by:
performing image size transformation processing on the image to be trained to obtain a transformed image to be trained; and carrying out feature extraction processing on the transformed image to be trained based on a preset feature extractor to obtain a second feature extraction result.
For example, in this embodiment, to ensure that the matrix dimensions of the first feature extraction result and the second feature extraction result are the same, the image to be trained may undergo image size transformation before feature extraction by the preset feature extractor; feature extraction is then performed on the transformed image by the preset feature extractor to obtain the final second feature extraction result.
It can be understood that, in this embodiment, by performing the image size transformation processing on the image to be trained first and then extracting the low-order image features of the transformed image, the size of the second feature extraction result matches the size of the first feature extraction result. A loss function can therefore be constructed from the first feature extraction result and the second feature extraction result, and the model parameters can be adjusted based on the obtained loss function.
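A minimal sketch of this step follows. The patent does not name the preset feature extractor; here a fixed Sobel edge filter stands in as a typical low-order (edge) feature extractor, and nearest-neighbour resizing stands in for the image size transformation, so the helper names and sizes are illustrative assumptions.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Image size transformation (nearest-neighbour, illustrative)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

def sobel_features(img):
    """Stand-in 'preset feature extractor': fixed 3x3 Sobel edge filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)      # gradient magnitude = low-order edge feature

image = np.random.rand(64, 64)             # image to be trained (grayscale)
resized = resize_nearest(image, 34, 34)    # size chosen so features come out 32x32
second_feat = sobel_features(resized)
print(second_feat.shape)                   # sized to match the first feature extraction result
```

The resize target is chosen so that, after the filter shrinks the borders, the second feature extraction result has the matrix dimension required for the loss computation.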
S203, training and correcting parameters of a second extraction module in the second model according to the first image and the third image to obtain a face detection model; the face detection model is used for identifying whether an image meets the preset condition. Wherein the first model further comprises a second extraction module; the second extraction module is used for extracting high-order image features of the image.
In this embodiment, when training with the first image and the third image, only the parameters of the second extraction module in the second model, which extracts high-order image features, are adjusted; the parameters of the first extraction module are not adjusted at this stage. Adjusting and training only the second extraction module improves the accuracy with which the face detection model extracts high-order image features, and thus the accuracy of the final face detection model. It should be noted that the first model is provided with both a first extraction module and a second extraction module, so the second model obtained by training the first model likewise comprises a first extraction module and a trained second extraction module, and this step trains on the basis of the second model. In addition, the face detection model obtained through the training method in this embodiment may specifically be used to identify whether an image input into the model meets the preset condition.
In this embodiment, training the first model adjusts only the parameters of the first extraction module, improving the accuracy with which that module identifies low-order image features. In the second training stage, the first image and the third image can be combined, and when the second model is trained, only the parameters of the second extraction module, which extracts high-order image features, are adjusted, thereby improving the accuracy of the resulting face detection model.
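The staged parameter adjustment described above can be sketched with framework-level parameter freezing. The toy convolution stacks below are stand-ins, not the patent's network; only the freezing logic (train one extraction module while the other is fixed) reflects the text.

```python
import torch
import torch.nn as nn

# Toy stand-ins: first K conv layers = first (low-order) extraction module,
# last Y conv layers = second (high-order) extraction module + classifier.
first_module = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
second_module = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                              nn.Linear(64, 2))       # meets / fails condition
model = nn.Sequential(first_module, second_module)

def set_stage(stage):
    """Stage 2 tunes only the first module; stage 3 only the second."""
    for p in first_module.parameters():
        p.requires_grad = (stage == 2)
    for p in second_module.parameters():
        p.requires_grad = (stage == 3)

set_stage(3)   # third stage: first extraction module frozen
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(all(name.startswith("1.") for name in trainable))  # only second-module params
```

An optimizer for each stage would then be built over only the trainable parameters, e.g. `torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-3)`.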
Fig. 3 is a flow chart of another training method of a face detection model according to an embodiment of the present disclosure. As shown in fig. 3, the method includes:
S301, acquiring a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting a preset condition; the second image has a false attribute, and the false attribute of the second image can be detected using low-order image features; the third image has a false attribute, and the false attribute of the third image cannot be detected using low-order image features; the false attribute is an attribute that does not meet the preset condition.
The execution subject of this embodiment may be a training apparatus for a face detection model; the apparatus may be a server (such as a local server or a cloud server), a computer, a processor, a chip, or the like, which is not limited in this embodiment.
In one example, the images contained in the training set may also be preprocessed before being used for model training. For example, the face region corresponding to a face in an image is first determined. Face key point detection is then performed on the face in the face region to obtain the key point coordinate values of the face contained in the image. Face alignment processing is then performed according to the key point coordinate values, so that the face in the image is aligned. Meanwhile, the aligned face image can be cropped out based on affine transformation processing, so that the cropped face image meets the input data requirements of the model. Normalization processing can also be performed on the cropped face image to reduce the amount of computation in the subsequent model processing.
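A minimal sketch of the crop-and-normalize part of this preprocessing follows; the face detector, key point detection, and affine alignment steps are not shown, the bounding box is assumed given, and the 112x112 input size and zero-mean/unit-variance normalization are illustrative assumptions.

```python
import numpy as np

def crop_and_normalize(image, bbox, out_size=112):
    """Crop the detected face region and normalize it for model input.

    `bbox` = (x, y, w, h) from a face detector (the detector itself,
    key point detection, and affine alignment are not shown here).
    """
    x, y, w, h = bbox
    face = image[y:y + h, x:x + w].astype(float)
    # nearest-neighbour resize to the model's fixed input size
    rows = np.arange(out_size) * face.shape[0] // out_size
    cols = np.arange(out_size) * face.shape[1] // out_size
    face = face[rows[:, None], cols]
    # normalization, as described above, to reduce later computation
    return (face - face.mean()) / (face.std() + 1e-8)

frame = np.random.rand(200, 200)                 # grayscale input frame
face = crop_and_normalize(frame, (40, 30, 120, 140))
print(face.shape)                                # fixed model input size
```

In practice the crop would follow the affine-aligned face rather than a raw detector box, but the shape and normalization contract is the same.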
In one example, the preset condition is that the image is a face image captured live (of a living subject) and not subjected to synthesis processing.
That is, the preset condition in this embodiment is that the face image is captured live and has not undergone tampering or synthesis processing. The face detection model trained in this embodiment can therefore be used both to recognize whether an image was captured live and to recognize whether an image has been tampered with by synthesis.
Further, under this preset condition, the second image may include: a second image of a first type and a second image of a second type. The first type is an image not captured of a living subject; for example, the second image of the first type may be: an image containing the edges of an electronic screen (i.e., an image obtained by photographing a face displayed on a display device), a face image obtained by photographing a face printed on paper, a face image obtained by photographing a paper mask, an image obtained by photographing a document, or the like. For such images, whether the image was captured live, i.e., whether it fails the preset condition, can be recognized from low-order image features such as salient edge features of the image background. The second type is a synthetic forged image; for example, the second image of the second type may be a face image in which face A is cut out and pasted onto face B using image synthesis software, and whether the image is a forged composite can be identified from low-order image features.
Further, the third image may include: a third image of a first type and a third image of a second type. The third image of the first type may be obtained by photographing a three-dimensional head model, a three-dimensional mask, or the like, and high-order image features are needed to recognize whether it is an image captured of a living subject. The third image of the second type may be a synthetic image forged by a face-swapping method such as FaceSwap or Face2Face, and high-order image features are needed to identify whether it is a forged composite.
It can be understood that, by combining the preset condition with this construction of the training set, the trained face detection model can perform both face liveness detection and face synthesis detection. Compared with the related art, in which two separate models are required for detection, this embodiment can employ a single model for image detection, and the subsequent two-stage training further improves the accuracy of model training.
S302, training a first model according to the first image, the second image and the third image to obtain a trained first model; the trained first model is obtained by carrying out parameter adjustment on a first extraction module and a second extraction module in the first model; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
In this embodiment, before the first model is trained with the first image and the second image, the first model may first be trained using all categories of images contained in the training set (i.e., the first image, the second image, and the third image), and the parameters of both the first extraction module and the second extraction module in the first model are adjusted according to the training result of this stage.
After the overall parameters of the first extraction module and the second extraction module in the first model have been adjusted, in subsequent steps S303 and S304 the model can be trained in stages, each stage targeting different images, on the basis of the model obtained in step S302, so as to improve the accuracy of model training.
It can be understood that, in this embodiment, before the model is trained in stages with different training images, it may first be trained on all categories of images in the training set; the training process of step S303 then proceeds on the basis of the model obtained in step S302, which improves the training efficiency of the subsequent model training stages.
S303, training the trained first model according to the first image and the second image to obtain a second model.
S304, training the second model according to the first image and the third image to obtain a face detection model; the face detection model is used for identifying whether the image accords with preset conditions.
For example, steps S303 to S304 provided in this embodiment may refer to steps S102 to S103 in fig. 1, or steps S202 to S203 in fig. 2, which are not described in detail in this embodiment.
In this embodiment, the training process of the face detection model may be divided into three stages: before the first extraction module and the second extraction module are trained separately, their parameters may first be jointly adjusted using every category of image in the training set, and each extraction module is then trained independently in the subsequent stages, so as to improve the accuracy of the obtained face detection model.
For example, fig. 4 is a schematic diagram of model training provided in an embodiment of the present disclosure. As shown in fig. 4, the model training process in this embodiment can be divided into three stages. A plurality of convolution layers are disposed in the first model: the first K convolution layers serve as the first extraction module, and the last Y convolution layers serve as the second extraction module. Optionally, the first model may employ a ResNet-18 network structure.
In the first stage, the first image, the second image, and the third image in the training set are used as training set 1 for the first training stage. After face region detection, face key point detection, face alignment, and image preprocessing are performed in sequence on the images in training set 1, the preprocessed images are input into the first model, which is provided with a first extraction module and a second extraction module, and the parameters of both extraction modules are adjusted to obtain the trained first model.
In the second stage, the first image and the second image in the training set are used as training set 2 for the second training stage. After face region detection, face key point detection, face alignment, and image preprocessing are performed in sequence on the images in training set 2, the preprocessed images are input into the trained first model obtained in the first stage. In this training stage the parameters of the second extraction module are fixed, and only the parameters of the first extraction module are adjusted, so as to obtain the second model. In addition, during training, N preset convolution layers may be set to sequentially downsample the final output result of the first extraction module to obtain a second processing result, and, based on M preset convolution layers, the second processing result is sequentially upsampled in combination with the first feature extraction result to obtain a first processing result. Then, low-order image feature extraction processing is performed on the images in training set 2 based on a preset feature extractor to obtain a second feature extraction result. Finally, the first extraction module is adjusted by combining the second feature extraction result, the first processing result, and the detection result output by the model.
For example, in practical application, the K convolutional layers in the first extraction module, the N preset convolutional layers for downsampling, and the M preset convolutional layers for upsampling may be connected by using a UNet network architecture to obtain the first processing result.
In the third stage, the first image and the third image in the training set may be used as training set 3 for the third training stage. Face region detection, face key point detection, face alignment, and image preprocessing are performed in sequence on the images in training set 3, the preprocessed images are input into the second model, and the parameters of the second extraction module in the second model are adjusted to obtain the face detection model.
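The three training stages above can be summarized in a single skeleton. The toy linear modules and random batches below are stand-ins for the real extraction modules and image sets; only the staging (stage 1 tunes both modules, stage 2 only the first, stage 3 only the second) mirrors the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-ins: two linear layers play the first / second extraction
# modules, random vectors play the preprocessed face images.
first_module = nn.Linear(8, 8)
second_module = nn.Linear(8, 2)
model = nn.Sequential(first_module, second_module)
loss_fn = nn.CrossEntropyLoss()

def make_batches(n):        # dummy (image, label) batches for one stage
    return [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(n)]

def run_stage(batches, train_first, train_second):
    for p in first_module.parameters():
        p.requires_grad = train_first
    for p in second_module.parameters():
        p.requires_grad = train_second
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
    for x, y in batches:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

run_stage(make_batches(3), True, True)     # stage 1: all images, both modules
run_stage(make_batches(3), True, False)    # stage 2: first+second images, first module
first_frozen = first_module.weight.clone()
second_before = second_module.weight.clone()
run_stage(make_batches(3), False, True)    # stage 3: first+third images, second module
print(torch.equal(first_module.weight, first_frozen))   # first module untouched in stage 3
```

Each `run_stage` call would, in the real setting, iterate over the stage's training set (1, 2, or 3) after the preprocessing steps shown in fig. 4.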
Fig. 5 is a flowchart of a face image detection method according to an embodiment of the present disclosure. As shown in fig. 5, the method includes:
S501, acquiring an image to be detected.
S502, carrying out face detection processing on an image to be detected according to a face detection model to obtain a detection result; the face detection model is trained according to the method in any embodiment; the detection result is used for representing whether the image to be detected accords with preset conditions or not.
In this embodiment, the image to be detected may be input to the face detection model, and the result output by the face detection model may be used as the face detection result, so as to determine whether the current image to be detected meets the preset condition.
It can be understood that the face detection model obtained by the model training method in the above embodiment performs face detection on the image to be detected, so as to improve accuracy of the face detection result.
Fig. 6 is a schematic structural diagram of a training device for a face detection model according to an embodiment of the present disclosure, and as shown in fig. 6, a training device 600 for a face detection model includes:
a first obtaining unit 601, configured to obtain a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting a preset condition; the second image has a false attribute, and the false attribute of the second image can be detected using low-order image features; the third image has a false attribute, and the false attribute of the third image cannot be detected using low-order image features; the false attribute is an attribute that does not meet the preset condition;
a first training unit 602, configured to train the first model according to the first image and the second image, to obtain a second model;
The second training unit 603 is configured to train the second model according to the first image and the third image, to obtain a face detection model.
The device provided in this embodiment is configured to implement the technical scheme provided by the method, and the implementation principle and the technical effect are similar and are not repeated.
Fig. 7 is a schematic structural diagram of a training device for a face detection model according to another embodiment of the present disclosure, and as shown in fig. 7, a training device 700 for a face detection model includes:
a first obtaining unit 701, configured to obtain a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting a preset condition; the second image has a false attribute, and the false attribute of the second image can be detected using low-order image features; the third image has a false attribute, and the false attribute of the third image cannot be detected using low-order image features; the false attribute is an attribute that does not meet the preset condition;
a first training unit 702, configured to train the first model according to the first image and the second image, so as to obtain a second model;
a second training unit 703, configured to train the second model according to the first image and the third image, so as to obtain a face detection model.
In one example, the first model includes a first extraction module; the first extraction module is used for extracting low-order image features of the image; the first training unit is specifically configured to:
and training and correcting parameters of the first extraction module according to the first image and the second image to obtain a second model.
In one example, first training unit 702 includes:
a selection module 7021, configured to select an image to be trained from the first image and the second image;
a first determining module 7022, configured to input an image to be trained into a first model, to obtain a first detection result and a first feature extraction result; the first detection result represents whether the image to be trained accords with preset conditions or not; the first feature extraction result is the feature in the image to be trained identified by the first extraction module;
an obtaining module 7023, configured to extract a second feature extraction result of the image to be trained; the second feature extraction result is the low-order image feature determined based on the preset feature extractor;
the second determining module 7024 is configured to perform parameter correction on the first extracting module according to the first detection result, the first feature extraction result, and the second feature extraction result, so as to obtain a second model.
In one example, the second determination module 7024 includes:
The first processing sub-module is used for sampling the first feature extraction result to obtain a first processing result; the matrix dimension of the first processing result is the same as the matrix dimension of the second feature extraction result;
and the correction sub-module is used for carrying out parameter correction on the first extraction module according to the first processing result, the second characteristic extraction result and the first detection result to obtain a second model.
In one example, the first processing sub-module is specifically configured to:
performing downsampling processing on the first feature extraction result based on N preset convolution layers to obtain a second processing result;
performing up-sampling processing on the second processing result sequentially according to the first feature extraction result, based on M preset convolution layers, to obtain the first processing result; wherein N and M are positive integers.
In one example, the acquisition module 7023 includes:
the transformation submodule is used for carrying out image size transformation processing on the image to be trained to obtain a transformed image to be trained;
and the second processing sub-module is used for carrying out feature extraction processing on the transformed image to be trained based on a preset feature extractor to obtain a second feature extraction result.
In one example, the first model further includes a second extraction module; the second extraction module is used for extracting high-order image features of the image; the second training unit 703 is specifically configured to:
And training and correcting parameters of a second extraction module in the second model according to the first image and the third image to obtain the face detection model.
In one example, the apparatus further comprises:
a third training unit 704, configured to train, before the first training unit trains the first model according to the first image and the second image to obtain the second model, the first model according to the first image, the second image, and the third image to obtain a trained first model; the trained first model is obtained by carrying out parameter adjustment on a first extraction module and a second extraction module in the first model; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
In one example, X convolution layers are provided in the first model; the first K of the X convolution layers serve as the first extraction module; the last Y of the X convolution layers serve as the second extraction module; wherein X, K, and Y are positive integers, and X is greater than or equal to the sum of K and Y; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
In one example, the preset condition is that the image is a face image captured live (of a living subject) and not subjected to synthesis processing.
In one example, a face detection model is used to identify whether an image meets preset conditions.
The device provided in this embodiment is configured to implement the technical scheme provided by the method, and the implementation principle and the technical effect are similar and are not repeated.
Fig. 8 is a schematic structural diagram of a face image detection device according to an embodiment of the present disclosure, and as shown in fig. 8, a training device 800 of a face detection model includes:
a second acquiring unit 801 configured to acquire an image to be detected;
a processing unit 802, configured to perform face detection processing on an image to be detected according to a face detection model, so as to obtain a detection result; wherein the face detection model is trained from the apparatus according to any one of claims 12-21; the detection result is used for representing whether the image to be detected accords with preset conditions or not.
The device provided in this embodiment is configured to implement the technical scheme provided by the method, and the implementation principle and the technical effect are similar and are not repeated.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
The present disclosure provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method provided in any one of the embodiments described above.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 9, an electronic device 900 in the present disclosure may include: a processor 901 and a memory 902.
A memory 902 for storing a program; the memory 902 may include volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory 902 is used to store computer programs (e.g., application programs or functional modules implementing the methods described above), computer instructions, and the like, which may be stored in one or more of the memories 902 in a partitioned manner, and which may be called by the processor 901.
A processor 901 for executing a computer program stored in the memory 902 to implement the steps in the method according to the above embodiment.
Reference may be made in particular to the description of the embodiments of the method described above.
The processor 901 and the memory 902 may be separate structures or may be integrated structures. When the processor 901 and the memory 902 are separate structures, the memory 902 and the processor 901 may be coupled by a bus 903.
The electronic device in this embodiment may execute the technical scheme in the above method, and the specific implementation process and the technical principle are the same, which are not described herein again.
The present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method provided by any one of the embodiments described above.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1001 performs the methods and processes described above, for example, the training method of a face detection model, or the detection method of a face image. For example, in some embodiments, the training method of the face detection model, or the detection method of the face image, may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the training method of the face detection model, or of the detection method of the face image, described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the training method of the face detection model, or the detection method of the face image, in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (27)

1. A training method of a face detection model comprises the following steps:
acquiring a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting a preset condition; the second image has a false attribute, and the false attribute of the second image can be detected using low-order image features; the third image has a false attribute, and the false attribute of the third image cannot be detected using low-order image features; the false attribute is an attribute that does not meet the preset condition;
training the first model according to the first image and the second image to obtain a second model;
and training the second model according to the first image and the third image to obtain a face detection model.
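The patent discloses no code, so the two-stage schedule of claim 1 (train on genuine images plus spoofs detectable by low-order features, then continue training on genuine images plus spoofs that require high-order features) can only be sketched. The following minimal sketch uses a toy logistic-regression stand-in for the model; all names, the toy data, and the learning rule are hypothetical, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: label 1 = genuine (meets the preset condition), 0 = spoof.
first_images  = [(rng.normal(size=8), 1) for _ in range(4)]   # genuine
second_images = [(rng.normal(size=8), 0) for _ in range(4)]   # low-order spoof
third_images  = [(rng.normal(size=8), 0) for _ in range(4)]   # high-order spoof

def train_stage(params, data, lr=0.01, epochs=5):
    """One logistic-regression gradient step per sample; returns updated params."""
    w = params.copy()
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + np.exp(-w @ x))
            w -= lr * (p - y) * x        # gradient of binary cross-entropy
    return w

first_model = np.zeros(8)
# Stage 1: genuine + low-order spoofs -> "second model"
second_model = train_stage(first_model, first_images + second_images)
# Stage 2: genuine + high-order spoofs -> face detection model
face_detection_model = train_stage(second_model, first_images + third_images)
```

The easy-then-hard ordering is the point: stage 1 fits the cheaply detectable spoofs, and stage 2 starts from those weights rather than from scratch.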
2. The method of claim 1, wherein the first model comprises a first extraction module; the first extraction module is used for extracting low-order image features of the image; training the first model according to the first image and the second image to obtain a second model, wherein the training comprises the following steps:
training and correcting parameters of the first extraction module according to the first image and the second image to obtain the second model.
3. The method of claim 2, wherein training and correcting parameters of the first extraction module according to the first image and the second image to obtain a second model comprises:
selecting an image to be trained from the first image and the second image;
inputting the image to be trained into the first model to obtain a first detection result and a first feature extraction result; the first detection result represents whether the image to be trained accords with the preset condition or not; the first feature extraction result is the features in the image to be trained identified by the first extraction module;
extracting a second feature extraction result of the image to be trained; the second feature extraction result is the low-order image feature determined based on a preset feature extractor;
and carrying out parameter correction on the first extraction module according to the first detection result, the first feature extraction result and the second feature extraction result to obtain a second model.
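Claim 3 combines two signals when correcting the first extraction module: the detection result against the label, and the gap between the module's own features and the low-order features from the preset extractor. A hedged sketch of such a combined loss (the function name, the weighting term `alpha`, and the use of binary cross-entropy plus mean squared error are assumptions, not the patent's disclosed formulation):

```python
import numpy as np

def combined_loss(detection_prob, label, model_feat, preset_feat, alpha=1.0):
    """Detection loss plus a term aligning model features with preset low-order features."""
    # Detection term: binary cross-entropy on "meets the preset condition".
    bce = -(label * np.log(detection_prob)
            + (1 - label) * np.log(1 - detection_prob))
    # Alignment term: pull the first extraction module's features toward
    # the low-order features produced by the preset feature extractor.
    align = np.mean((model_feat - preset_feat) ** 2)
    return bce + alpha * align

low = combined_loss(0.9, 1, np.zeros(4), np.zeros(4))   # features already aligned
high = combined_loss(0.9, 1, np.ones(4), np.zeros(4))   # misaligned features
```

With this shape of loss, parameter correction penalizes the first extraction module whenever its features drift away from the reference low-order features, even when the detection result itself is correct.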
4. The method of claim 3, wherein performing parameter correction on the first extraction module according to the first detection result, the first feature extraction result, and the second feature extraction result to obtain a second model comprises:
sampling the first feature extraction result to obtain a first processing result; the matrix dimension of the first processing result is the same as the matrix dimension of the second feature extraction result;
and carrying out parameter correction on the first extraction module according to the first processing result, the second characteristic extraction result and the first detection result to obtain a second model.
5. The method of claim 4, wherein sampling the first feature extraction result to obtain a first processing result, comprising:
performing downsampling processing on the first feature extraction result based on N preset convolution layers to obtain a second processing result; and
performing, based on M preset convolution layers, up-sampling processing on the second processing result sequentially according to the first feature extraction result to obtain the first processing result; wherein N and M are positive integers.
6. The method of claim 3, wherein extracting the second feature extraction result of the image to be trained comprises:
performing image size transformation processing on the image to be trained to obtain a transformed image to be trained;
and carrying out feature extraction processing on the transformed image to be trained based on a preset feature extractor to obtain the second feature extraction result.
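Claim 6 resizes the image, then runs the preset feature extractor on the resized image. The patent does not name the extractor, so this sketch uses simple image gradients as a stand-in low-order feature and nearest-neighbour indexing as the size transform; both choices are hypothetical:

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour size transform of a grayscale image to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

def preset_low_order_extractor(img):
    """Horizontal/vertical gradients as a stand-in low-order feature."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return np.stack([gx, gy])

img = np.random.default_rng(1).normal(size=(16, 16))
transformed = resize_nearest(img, 8)
second_feature_result = preset_low_order_extractor(transformed)
```

Resizing first fixes the spatial dimension of the second feature extraction result, which claim 4 then compares against the (resampled) first feature extraction result.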
7. The method of any of claims 2-6, wherein the first model further comprises a second extraction module; the second extraction module is used for extracting high-order image features of the image; training the second model according to the first image and the third image to obtain a face detection model, including:
and training and correcting parameters of a second extraction module in the second model according to the first image and the third image to obtain a face detection model.
8. The method of any of claims 1-7, wherein prior to training the first model according to the first image and the second image to obtain the second model, the method further comprises:
training the first model according to the first image, the second image and the third image to obtain a trained first model; the trained first model is obtained by carrying out parameter adjustment on a first extraction module and a second extraction module in the first model; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
9. The method of any of claims 1-8, wherein X convolution layers are provided in the first model; the first K convolution layers among the X convolution layers serve as a first extraction module; the last Y convolution layers among the X convolution layers serve as a second extraction module; wherein X, K, and Y are positive integers, and X is greater than or equal to the sum of K and Y; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
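Claim 9 partitions one stack of X convolution layers: the first K layers form the low-order (first) extraction module and the last Y form the high-order (second) extraction module, with X >= K + Y leaving room for shared middle layers. A sketch of that partition with simple callables standing in for convolution layers (all values and names are illustrative):

```python
# Hypothetical stand-ins for X convolution layers; each "layer" is a callable.
X, K, Y = 6, 2, 2                                  # X >= K + Y, all positive
layers = [lambda v, i=i: 0.9 * v + i for i in range(X)]

first_extraction_module = layers[:K]               # first K layers: low-order
second_extraction_module = layers[-Y:]             # last Y layers: high-order

def run(modules, x):
    for layer in modules:
        x = layer(x)
    return x

low_order = run(first_extraction_module, 1.0)      # features for stage-1 correction
high_order = run(second_extraction_module, low_order)  # features for stage-2 correction
```

Under this split, stage 1 of training can correct only `first_extraction_module` and stage 2 only `second_extraction_module`, matching the two-stage schedule of claims 2 and 7.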
10. The method according to any one of claims 1 to 9, wherein the preset condition is that the image is a face image captured of a live subject and not subjected to synthesis processing.
11. The method according to any one of claims 1-10, wherein the face detection model is used to identify whether an image meets the preset condition.
12. A face image detection method comprises the following steps:
acquiring an image to be detected;
carrying out face detection processing on the image to be detected according to a face detection model to obtain a detection result; wherein the face detection model is trained according to the method of any one of claims 1-11; and the detection result is used for representing whether the image to be detected accords with a preset condition or not.
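At inference time (claim 12), the trained model maps an image to a detection result indicating whether the preset condition is met. A minimal hedged sketch, again with a logistic stand-in for the trained face detection model (the weight vector, threshold, and function name are assumptions):

```python
import numpy as np

def detect(model_weights, image_vec, threshold=0.5):
    """Returns True when the image is predicted to meet the preset condition."""
    score = 1.0 / (1.0 + np.exp(-model_weights @ image_vec))
    return bool(score >= threshold)

weights = np.ones(4)                     # stand-in for trained model parameters
result = detect(weights, np.ones(4))     # detection result for one image vector
```

The boolean detection result plays the role described in the claim: it represents whether the image to be detected meets the preset condition.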
13. A training device for a face detection model, comprising:
the first acquisition unit is used for acquiring a training set; wherein the training set comprises: at least one first image, at least one second image, and at least one third image; the first image is an image meeting a preset condition; the second image has a false attribute, and the false attribute of the second image can be detected using low-order image features; the third image has a false attribute, and the false attribute of the third image cannot be detected using low-order image features; the false attribute is an attribute that does not meet the preset condition;
the first training unit is used for training the first model according to the first image and the second image to obtain a second model;
and the second training unit is used for training the second model according to the first image and the third image to obtain a face detection model.
14. The apparatus of claim 13, wherein the first model comprises a first extraction module; the first extraction module is used for extracting low-order image features of the image; the first training unit is specifically configured to:
training and correcting parameters of the first extraction module according to the first image and the second image to obtain the second model.
15. The apparatus of claim 14, wherein the first training unit comprises:
the selection module is used for selecting an image to be trained from the first image and the second image;
the first determining module is used for inputting the image to be trained into the first model to obtain a first detection result and a first feature extraction result; the first detection result represents whether the image to be trained accords with the preset condition or not; the first feature extraction result is the features in the image to be trained identified by the first extraction module;
the acquisition module is used for extracting a second feature extraction result of the image to be trained; the second feature extraction result is the low-order image feature determined based on a preset feature extractor;
and the second determining module is used for carrying out parameter correction on the first extracting module according to the first detecting result, the first characteristic extracting result and the second characteristic extracting result to obtain a second model.
16. The apparatus of claim 15, wherein the second determination module comprises:
the first processing sub-module is used for sampling the first feature extraction result to obtain a first processing result; the matrix dimension of the first processing result is the same as the matrix dimension of the second feature extraction result;
and the correction sub-module is used for carrying out parameter correction on the first extraction module according to the first processing result, the second feature extraction result, and the first detection result to obtain a second model.
17. The apparatus of claim 16, wherein the first processing sub-module is specifically configured to:
perform downsampling processing on the first feature extraction result based on N preset convolution layers to obtain a second processing result; and
perform, based on M preset convolution layers, up-sampling processing on the second processing result sequentially according to the first feature extraction result to obtain the first processing result; wherein N and M are positive integers.
18. The apparatus of claim 15, wherein the acquisition module comprises:
the transformation submodule is used for carrying out image size transformation processing on the image to be trained to obtain a transformed image to be trained;
and the second processing sub-module is used for carrying out feature extraction processing on the transformed image to be trained based on a preset feature extractor to obtain a second feature extraction result.
19. The apparatus of any of claims 14-18, wherein the first model further comprises a second extraction module; the second extraction module is used for extracting high-order image features of the image; the second training unit is specifically configured to:
and training and correcting parameters of a second extraction module in the second model according to the first image and the third image to obtain a face detection model.
20. The apparatus of any of claims 13-19, wherein the apparatus further comprises:
the third training unit is used for training the first model according to the first image, the second image, and the third image to obtain a trained first model, before the first training unit trains the first model according to the first image and the second image to obtain the second model; the trained first model is obtained by performing parameter adjustment on a first extraction module and a second extraction module in the first model; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
21. The apparatus of any of claims 13-20, wherein X convolution layers are provided in the first model; the first K convolution layers among the X convolution layers serve as a first extraction module; the last Y convolution layers among the X convolution layers serve as a second extraction module; wherein X, K, and Y are positive integers, and X is greater than or equal to the sum of K and Y; the first extraction module is used for extracting low-order image features of the image; the second extraction module is used for extracting high-order image features of the image.
22. The apparatus according to any one of claims 13-21, wherein the preset condition is that the image is a face image captured of a live subject and not subjected to synthesis processing.
23. The apparatus according to any one of claims 13-22, wherein the face detection model is configured to identify whether an image meets the preset condition.
24. A face image detection apparatus comprising:
the second acquisition unit is used for acquiring the image to be detected;
the processing unit is used for carrying out face detection processing on the image to be detected according to the face detection model to obtain a detection result; wherein the face detection model is trained by the apparatus according to any one of claims 13-23; and the detection result is used for representing whether the image to be detected meets a preset condition.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-12.
27. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-12.
CN202311766835.3A 2023-12-20 2023-12-20 Training method, detection method, device and equipment of face detection model Pending CN117746482A (en)

Publications (1)

Publication Number Publication Date
CN117746482A 2024-03-22

Family

ID=90277187


Similar Documents

Publication Publication Date Title
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN113343826B (en) Training method of human face living body detection model, human face living body detection method and human face living body detection device
CN112949767B (en) Sample image increment, image detection model training and image detection method
CN113033537A (en) Method, apparatus, device, medium and program product for training a model
CN113221771A (en) Living body face recognition method, living body face recognition device, living body face recognition equipment, storage medium and program product
US11893685B2 (en) Landform map building method and apparatus, electronic device and readable storage medium
CN113344862B (en) Defect detection method, device, electronic equipment and storage medium
CN112802037A (en) Portrait extraction method, device, electronic equipment and storage medium
CN112102201A (en) Image shadow reflection eliminating method and device, computer equipment and storage medium
JP2022185144A (en) Object detection method and training method and device of object detection model
TWI803243B (en) Method for expanding images, computer device and storage medium
CN113962845B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113869253A (en) Living body detection method, living body training device, electronic apparatus, and medium
CN113569707A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN113361535B (en) Image segmentation model training, image segmentation method and related device
CN115660991A (en) Model training method, image exposure correction method, device, equipment and medium
CN117746482A (en) Training method, detection method, device and equipment of face detection model
CN113642428B (en) Face living body detection method and device, electronic equipment and storage medium
CN113781653B (en) Object model generation method and device, electronic equipment and storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN113327194A (en) Image style migration method, device, equipment and storage medium
CN115049895B (en) Image attribute identification method, attribute identification model training method and device
CN110334667B (en) Vein recognition method and system with scale rotation invariance based on IRCNN and MTCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination