CN114612971A - Face detection method, model training method, electronic device, and program product

Info

Publication number
CN114612971A
Authority
CN
China
Prior art keywords
image, face, processed, pixel points, belong
Legal status
Pending
Application number
CN202210211585.6A
Other languages
Chinese (zh)
Inventor
张丽
杜悦艺
孙亚生
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210211585.6A
Publication of CN114612971A

Abstract

The present disclosure provides a face detection method, a model training method, an electronic device, and a program product, relating to computer vision technology and deep learning technology. The method includes: acquiring an image to be processed, and processing the image to be processed with a face detection model to obtain a face detection result, where the face detection result characterizes whether the pixel points included in the image to be processed belong to a face; and determining a face region in the image to be processed according to the detection result. With the face detection method, the model training method, the electronic device, and the program product of the present disclosure, the detected face region includes no pixel points that do not belong to the face, so a face region with a more accurate edge contour is obtained, which facilitates subsequent processing of the face region.

Description

Face detection method, model training method, electronic device, and program product
Technical Field
The present disclosure relates to computer vision technology and deep learning technology within artificial intelligence, and in particular to a face detection method, a model training method, an electronic device, and a program product.
Background
With the development of artificial intelligence, face detection technology is applied ever more widely; for example, the position of a face in an image can be determined through face detection, and the region where the face is located can then be processed.
In the prior art, an image can be input into a neural network that outputs the face region in the image. Specifically, the network can output a face rectangular frame, that is, the coordinates of the upper-left, lower-left, upper-right, and lower-right corners of the region where the face is located.
However, a face is not rectangular, so the face region output by the prior art includes pixels that do not belong to the face. When the face region is processed later, those non-face pixels are processed along with it, resulting in a poor processing effect.
Disclosure of Invention
The present disclosure provides a face detection method, a model training method, an electronic device, and a program product to improve the effect of face detection in an image.
According to a first aspect of the present disclosure, there is provided a face detection method, including:
acquiring an image to be processed, and processing the image to be processed by using a face detection model to obtain a face detection result; the face detection result is used for representing whether pixel points included in the image to be processed belong to a face part or not;
and determining a face region in the image to be processed according to the detection result.
According to a second aspect of the present disclosure, there is provided a model training method for detecting a human face, including:
acquiring a training image, wherein pixel points included in the training image have label data, and the label data is used for representing whether the pixel points belong to a face part;
processing the training image by using a preset model to obtain prediction data of pixel points included in the training image; the prediction data is used for representing whether the pixel points belong to human face parts or not;
and adjusting parameters of the model according to the label data and the prediction data which belong to the same pixel point in the training image to obtain the model for detecting the human face.
According to a third aspect of the present disclosure, there is provided a face detection apparatus comprising:
the acquisition unit is used for acquiring an image to be processed;
the detection unit is used for processing the image to be processed by using a face detection model to obtain a face detection result; the face detection result is used for representing whether pixel points included in the image to be processed belong to a face part;
and the determining unit is used for determining a face area in the image to be processed according to the detection result.
According to a fourth aspect of the present disclosure, there is provided a training apparatus for a model for detecting a human face, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a training image, pixel points in the training image have label data, and the label data is used for representing whether the pixel points belong to a face part;
the detection unit is used for processing the training image by using a preset model to obtain prediction data of pixel points included in the training image; the prediction data is used for representing whether the pixel point belongs to a face part;
and the adjusting unit is used for adjusting the parameters of the model according to the label data and the prediction data which belong to the same pixel point in the training image to obtain the model for detecting the human face.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the first or second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first or second aspect.
The present disclosure provides a face detection method, a model training method, an electronic device, and a program product. The method includes: acquiring an image to be processed, and processing the image to be processed with a face detection model to obtain a face detection result, where the face detection result characterizes whether the pixel points included in the image to be processed belong to a face; and determining a face region in the image to be processed according to the detection result. With the face detection method, the model training method, the electronic device, and the program product of the present disclosure, the detected face region includes no pixel points that do not belong to the face, so a face region with a more accurate edge contour is obtained, which facilitates subsequent processing of the face region.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram illustrating detection of a face in an image according to an exemplary embodiment;
fig. 2 is a schematic flow chart illustrating a face detection method according to an exemplary embodiment of the present disclosure;
fig. 3 is a schematic flow chart of a face detection method according to another exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a model training method for detecting human faces according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram illustrating a model training method for detecting human faces according to another exemplary embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a face detection apparatus according to an exemplary embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a face detection apparatus according to another exemplary embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a model training apparatus for detecting human faces according to an exemplary embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a model training apparatus for detecting human faces according to another exemplary embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a diagram illustrating detection of a human face in an image according to an exemplary embodiment.
As shown in fig. 1, the image 11 may be input into a preset neural network 12, which outputs the face detection result 13 for the image 11.
The face detection result 13 output in this way consists of the coordinates of the upper-left, lower-left, upper-right, and lower-right corners of the area where the face is located; only a face rectangular frame can be obtained in this way.
In some application processes, a face region in an image needs to be processed, for example, the face region may be subjected to blurring processing.
In the prior art, the face region detected in an image is a region containing the face, delimited by a rectangular frame. Because the detected face region is a rectangular frame, it can include pixel points that do not belong to the face, so processing the face region detected in the image gives a poor effect; for example, blurring would also be applied to pixel points that do not belong to the face.
In order to solve this technical problem, the scheme provided by the present disclosure detects whether each pixel point included in the image to be processed belongs to a face, and then determines the face region accordingly.
Fig. 2 is a flowchart illustrating a face detection method according to an exemplary embodiment of the present disclosure.
As shown in fig. 2, the face detection method provided by the present disclosure includes:
step 201, acquiring an image to be processed, and processing the image to be processed by using a face detection model to obtain a face detection result; and the face detection result is used for representing whether the pixel points included in the image to be processed belong to the face part or not.
The method provided by the present disclosure may be executed by an electronic device with computing capability, for example, a computer. The computer can process the image to be processed based on the face detection method provided by the disclosure, so as to detect the face region in the image to be processed.
Specifically, the image to be processed may be an image that meets the input conditions of the face detection model; for example, its size meets a preset size, and the pixel values of its pixel points are normalized values.
Further, the computer may obtain the original images in which faces are to be detected; these may be uploaded by a user or fetched by the computer automatically. For example, if a face detection task requires detecting face regions in N images, the computer may obtain those N original images and then process them to obtain the images to be processed.
Furthermore, the computer can process the image to be processed to detect whether each pixel point in it belongs to the face, thereby obtaining the face detection result. For example, if the image to be processed includes m × n pixel points, processing the image determines, for each of the m × n pixel points, whether it belongs to the face.
The face detection model can be trained in advance, and the face detection result can be obtained by processing the image to be processed by using the model. For example, the model can determine whether each pixel point in the image to be processed belongs to a face part.
Specifically, training data may be prepared in advance, and the model may be trained using the training data. The training data may be, for example, a training image, and a pixel point of the training image has label data for characterizing whether the pixel point belongs to a face part.
Further, the face detection model may specifically be an MAE (Masked Autoencoders) model, whose structure includes an encoder and a decoder. The basic building block of both the encoder and the decoder is the Vision Transformer (ViT), which consists of multiple multi-head attention modules, feed-forward neural network modules, normalization layers, and residual connections.
In practical application, the trained MAE can be fine-tuned using the training data and its label data to obtain the face detection model.
The MAE is an unsupervised pre-training model: it learns from a large number of unlabeled image samples, and its learning objective is to reconstruct the remaining 75% of unknown pixel information from the 25% of known pixel information. This fine-grained objective gives the final pre-trained model strong feature representation and modeling capability. Therefore, using the MAE model to detect the face region in the image to be processed lets the model extract the features of each pixel point more accurately, and thus detect the pixel points belonging to the face more accurately.
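To make this building block concrete, the following is a minimal sketch of one ViT block as just described, written in PyTorch (the framework, the dimensions, and the module names are illustrative assumptions; the patent does not specify an implementation):

```python
import torch
from torch import nn

class ViTBlock(nn.Module):
    """One ViT building block: multi-head attention and a feed-forward
    network, each wrapped in layer normalization and a residual connection."""

    def __init__(self, dim: int = 768, num_heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual connection
        x = x + self.mlp(self.norm2(x))                    # residual connection
        return x
```

An MAE-style encoder and decoder would each stack several such blocks.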
According to the scheme provided by the disclosure, an end-to-end feature extraction mode learns features directly from the original picture without manually constructing features, making full use of the data-mining capability of big data and deep learning; the resulting features are comprehensive and generalize well.
In practical application, the computer can process the pixel points in the image to be processed one by one and detect whether each belongs to the face. For example, the computer may determine the probability that a pixel point belongs to the face from that pixel point and the pixel points around it. If the probability is high, the pixel point can be considered a face pixel point; if it is low, the pixel point can be considered not to belong to the face.
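As a sketch of this per-pixel decision, thresholding a predicted probability map (the 0.5 threshold, the tensor shapes, and the PyTorch usage are assumptions):

```python
import torch

@torch.no_grad()
def detect_face_pixels(model: torch.nn.Module,
                       image: torch.Tensor,
                       threshold: float = 0.5) -> torch.Tensor:
    """Return a boolean mask marking pixel points predicted to belong to a face.

    `image` is a normalized tensor of shape (1, 3, H, W); the model is
    assumed to emit one face logit per pixel."""
    logits = model(image)                  # (1, 1, H, W)
    probs = torch.sigmoid(logits)          # probability of belonging to the face
    return (probs > threshold).squeeze()   # (H, W) boolean mask
```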
Step 202, according to the detection result, determining a face region in the image to be processed.
The computer can determine the region where the pixel points belonging to the face part in the image to be processed are located as the face region.
Specifically, the face region in an image should be a single continuous region. Therefore, connected regions can be formed from the pixel points in the image to be processed that belong to the face, and a connected region whose size matches a face is determined to be a face region. If a connected region is very small, or if none of the pixel points surrounding a face-classified pixel point belong to the face, that region or pixel point is removed and is not treated as part of a face region.
In this way, pixel points mistakenly classified as belonging to the face can be removed, so the face region is detected more accurately in the image to be processed.
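One way to implement this post-processing is connected-component labeling; the sketch below assumes SciPy is available, and the minimum-area value is purely illustrative:

```python
import numpy as np
from scipy import ndimage

def filter_face_regions(mask: np.ndarray, min_area: int = 500) -> np.ndarray:
    """Keep only connected regions of face pixels large enough to be a face."""
    labeled, num = ndimage.label(mask)
    out = np.zeros_like(mask, dtype=bool)
    for i in range(1, num + 1):
        component = labeled == i
        if component.sum() >= min_area:    # drop tiny, spurious regions
            out |= component
    return out
```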
In the scheme provided by the disclosure, the region where the pixel points belonging to the face are located can be determined as the face region. A region matching the actual outline of the face is thus obtained, free of the constraint of a rectangular frame, and the detected face region includes no pixel points that do not belong to the face. Other processing, such as blurring, can then be applied to the face region; the blurring may, for example, turn the face region into a mosaic image.
The face detection method provided by the present disclosure includes: acquiring an image to be processed, and processing the image to be processed with a face detection model to obtain a face detection result, where the face detection result characterizes whether the pixel points included in the image to be processed belong to a face; and determining a face region in the image to be processed according to the detection result. With the face detection method provided by the present disclosure, the detected face region includes no pixel points that do not belong to the face, so a face region with a more accurate edge contour is obtained, facilitating subsequent processing of the face region.
Fig. 3 is a flowchart illustrating a face detection method according to another exemplary embodiment of the present disclosure.
As shown in fig. 3, the face detection method of the present disclosure includes:
step 301, an original image is acquired.
In the scheme provided by the disclosure, the computer can acquire the original image first and then process the original image to obtain the image to be processed. The original image may be uploaded by a user, or may be obtained by a computer automatically, for example.
Step 302, preprocessing the original image to obtain an image to be processed.
In the method provided by the disclosure, a face detection model can be preset, and then the face detection model is used for detecting pixel points belonging to a face part in an original image.
Specifically, before the original image is input into the model, the original image may be preprocessed to obtain an image to be processed, and then the image to be processed is input into the face detection model, so as to improve the accuracy of the model output result.
Further, when the original image is preprocessed, it may be resized to 840 × 840 × 3, and the resized image may then be normalized, for example by normalizing its pixel values, to obtain the image to be processed.
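A sketch of this preprocessing step; the use of OpenCV and division by 255 as the normalization are assumptions, since the patent only specifies resizing and normalizing:

```python
import cv2
import numpy as np

def preprocess(original: np.ndarray, size: int = 840) -> np.ndarray:
    """Resize an H x W x 3 image and normalize its pixel values to [0, 1]."""
    resized = cv2.resize(original, (size, size))   # -> 840 x 840 x 3
    return resized.astype(np.float32) / 255.0      # simple normalization
```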
Step 303, extracting image features of the image to be processed using the encoder in the face detection model.
Step 304, processing the image features using the decoder in the face detection model to determine whether the pixel points included in the image to be processed belong to the face.
Further, the face detection model may specifically be an MAE (Masked Autoencoders) model, whose structure includes an encoder and a decoder. The basic building block of both the encoder and the decoder is the Vision Transformer (ViT), which consists of multiple multi-head attention modules, feed-forward neural network modules, normalization layers, and residual connections.
According to the scheme provided by the disclosure, an end-to-end feature extraction mode learns features directly from the original picture without manually constructing features, making full use of the data-mining capability of big data and deep learning; the resulting features are comprehensive and generalize well.
The computer can extract the image characteristics of the image to be processed by using the encoder, process the image characteristics by using the decoder and determine whether the pixel points included in the image to be processed belong to the face part.
Further, the trained MAE model comprises an encoder and a decoder: the encoder extracts the image features of the image to be processed, and the decoder determines from those features whether each pixel point in the image to be processed belongs to the face, yielding the face detection result.
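As an illustration of this encoder-decoder split, here is a minimal sketch standing in for a fine-tuned MAE (PyTorch; the patch size, block depths, and the omission of positional embeddings are simplifications and assumptions, not the patent's exact architecture):

```python
import torch
from torch import nn

def vit_blocks(dim: int, heads: int, depth: int) -> nn.TransformerEncoder:
    """A stack of transformer blocks standing in for the ViT blocks above."""
    layer = nn.TransformerEncoderLayer(
        d_model=dim, nhead=heads, dim_feedforward=4 * dim,
        activation="gelu", batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class FaceSegmenter(nn.Module):
    """Encoder-decoder sketch: the encoder embeds image patches, the decoder
    emits one face logit per pixel. Positional embeddings are omitted."""

    def __init__(self, patch: int = 20, dim: int = 768, heads: int = 12):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.encoder = vit_blocks(dim, heads, depth=6)
        self.decoder = vit_blocks(dim, heads, depth=2)
        self.head = nn.Linear(dim, patch * patch)  # one logit per pixel of a patch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape                                    # (B, 3, 840, 840)
        tokens = self.embed(x).flatten(2).transpose(1, 2)       # (B, N, dim)
        logits = self.head(self.decoder(self.encoder(tokens)))  # (B, N, patch*patch)
        gh, gw = h // self.patch, w // self.patch
        logits = logits.view(b, gh, gw, self.patch, self.patch)
        return logits.permute(0, 1, 3, 2, 4).reshape(b, h, w).unsqueeze(1)
```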
Compared with a convolutional neural network, ViT performs global modeling: it takes in the information of all pixels in the image at once, and while the model is built each pixel point can interact deeply with all other pixel points, so the ViT model has stronger feature expression, learning, and modeling capability. Therefore, the scheme provided by the disclosure can detect whether each pixel point in the image to be processed belongs to the face through an encoder and a decoder with the ViT structure.
In an optional implementation, the decoder can process the image features to determine the probability that each pixel point included in the image to be processed belongs to the face, and then determine from that probability whether the pixel point belongs to the face.
For example, the probability that one pixel point belongs to the face region may be determined to be 80%, and the probability that another may be determined to be 20%.
In practical application, the decoder determines whether a pixel point belongs to the face according to its probability; for example, a pixel point whose probability exceeds a threshold can be determined to be a face pixel point. By determining the probability that each individual pixel point belongs to the face, a per-pixel detection result is obtained, so the face region can be detected in the image to be processed more accurately.
Step 305, determining a face region in the original image according to the face region in the image to be processed.
The computer can determine the face region in the image to be processed. Because the image to be processed is obtained by preprocessing the original image, there is a correspondence between the pixel points in the image to be processed and the pixel points in the original image. Based on this correspondence, the region in the original image corresponding to the face region in the image to be processed can be determined as the face region.
In an optional implementation, the image to be processed may be inverse-processed, in a manner corresponding to the preprocessing, to recover the original image, so that the face region can be determined in the original image from the face region of the image to be processed.
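A sketch of using this correspondence: the mask predicted on the preprocessed image is resized back to the original resolution (nearest-neighbor interpolation keeps the mask binary; OpenCV is an assumption):

```python
import cv2
import numpy as np

def mask_to_original(mask: np.ndarray, original_shape: tuple) -> np.ndarray:
    """Map a face mask predicted on the preprocessed image back onto the
    pixel grid of the original image."""
    oh, ow = original_shape[:2]
    return cv2.resize(mask.astype(np.uint8), (ow, oh),
                      interpolation=cv2.INTER_NEAREST).astype(bool)
```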
Step 306, performing blurring processing on the face region in the original image, so that the face in the original image cannot be recognized.
In a practical application scheme, after a face region in an original image is detected, the face region can be subjected to blurring processing, so that user information is prevented from being leaked from an image including a face.
After the face region is blurred, it can no longer be matched to a real user. This scheme effectively protects the privacy of the users in the image, and because pixel points that do not belong to the face are not blurred, the processed image does not lose excessive information.
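One possible blurring step is the mosaic mentioned above, produced by block-wise downscaling and upscaling; the block size and the use of OpenCV are assumptions:

```python
import cv2
import numpy as np

def mosaic_face(image: np.ndarray, mask: np.ndarray, block: int = 16) -> np.ndarray:
    """Pixelate only the masked face pixels so the face cannot be recognized."""
    h, w = image.shape[:2]
    small = cv2.resize(image, (max(1, w // block), max(1, h // block)),
                       interpolation=cv2.INTER_LINEAR)
    coarse = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    out = image.copy()
    out[mask] = coarse[mask]     # pixel points outside the face are untouched
    return out
```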
Fig. 4 is a flowchart illustrating a model training method for detecting a human face according to an exemplary embodiment of the present disclosure.
As shown in fig. 4, the training method for a model for detecting a human face provided by the present disclosure includes:
step 401, acquiring a training image, where pixel points included in the training image have label data, and the label data is used to represent whether the pixel points belong to a face part.
The method provided by the disclosure can be applied to a computer, training images can be prepared in advance, and the model is trained by using the training images, so that the model learns the capability of detecting the face in the image.
Specifically, the training images may be images including faces in the public data set, and the images may be labeled by the user, so as to label pixel points belonging to the face part and pixel points not belonging to the face part in the images.
Furthermore, a user can annotate the training images, with intelligent assistance or manually, using semantic segmentation annotation tools such as EISeg and labelme; specifically, all boundary pixel points of every face target are labeled along the contour of each face in each image. From the labeled boundary pixel points, the computer can then record the label information of every pixel point of each picture, that is, whether each pixel point belongs to a part of a face.
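A sketch of how the recorded per-pixel labels could be derived from the annotated contours (the polygon input format and the use of cv2.fillPoly are assumptions; EISeg and labelme export formats vary):

```python
import cv2
import numpy as np

def contours_to_labels(image_shape: tuple, face_polygons: list) -> np.ndarray:
    """Rasterize annotated face boundary points into a per-pixel label map:
    1 where the pixel point belongs to a face, 0 elsewhere."""
    labels = np.zeros(image_shape[:2], dtype=np.uint8)
    for poly in face_polygons:                     # poly: list of (x, y) points
        pts = np.asarray(poly, dtype=np.int32).reshape(-1, 1, 2)
        cv2.fillPoly(labels, [pts], 1)
    return labels
```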
Step 402, processing a training image by using a preset model to obtain prediction data of pixel points included in the training image; the prediction data is used for representing whether the pixel point belongs to the face part or not.
In practical application, the training image can be input into the preset model, and the model outputs whether each pixel point in the image belongs to the face, yielding the prediction data.
The prediction data can be the prediction result for any pixel point in the training image, namely whether that pixel point belongs to the face or not.
The model can output the prediction data of each pixel point in the training image, so that the detection result of whether each pixel point in the training image belongs to the face part is obtained.
Specifically, in order to improve the training efficiency, in the scheme provided by the disclosure, a preset model can be used for processing a plurality of training images each time, and then prediction data of pixel points included in the plurality of training images is obtained.
And 403, adjusting parameters of the model according to the label data and the prediction data which belong to the same pixel point in the training image to obtain the model for detecting the face.
Furthermore, the computer can adjust the parameters of the model according to the label data and the prediction data belonging to the same pixel point in the same training image. For example, for a training image in which a pixel point P has label data, the prediction data of P can also be obtained, and the parameters of the model can then be adjusted according to the label data and the prediction data of P.
In practical application, the computer can construct a loss function from the label data and the prediction data, and then back-propagate gradients of the loss function to adjust the parameters in the model.
Through multiple iterations, the parameters in the model are adjusted repeatedly so that the model outputs increasingly accurate prediction data; when the prediction data output by the model is approximately consistent with the corresponding label data, training can stop, and the model for detecting faces is obtained.
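A sketch of this iterative adjustment, using per-pixel binary cross-entropy as the loss (the loss choice, optimizer, learning rate, and epoch count are assumptions):

```python
import torch
from torch import nn

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4) -> nn.Module:
    """Fine-tune the model so its per-pixel predictions match the label data."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for images, labels in loader:   # labels: (B, 1, H, W) with values in {0, 1}
            logits = model(images)      # (B, 1, H, W)
            loss = loss_fn(logits, labels.float())
            opt.zero_grad()
            loss.backward()             # gradient back-propagation
            opt.step()
    return model
```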
The model for detecting the human face, which is trained by the present disclosure, can be applied to the scheme shown in fig. 2 or fig. 3.
With the model trained by the scheme provided by the disclosure, the detected face region includes no pixel points that do not belong to the face, so a face region with a more accurate edge contour is obtained, further facilitating subsequent processing of the face region.
Fig. 5 is a flowchart illustrating a model training method for detecting a human face according to another exemplary embodiment of the present disclosure.
As shown in fig. 5, the training method for a model for detecting a human face provided by the present disclosure includes:
step 501, a training image is obtained, pixel points included in the training image have label data, and the label data is used for representing whether the pixel points belong to a face part.
Step 501 is similar to the corresponding content in step 401, and is not described again.
Step 502, preprocessing the training image to obtain an input image.
In the method provided by the disclosure, a preset model can be used, and a training image is used for training the model, so that the model has the function of detecting the face region in the image.
Specifically, before the image is input into the model, the training image may be preprocessed to obtain an input image, and then the input image is input into the model, so as to improve the accuracy of the output result of the model.
Further, when preprocessing the training image, it may be resized to 840 × 840 × 3, and the resized training image may then be normalized, for example by normalizing its pixel values, to obtain the input image.
Step 503, extracting the image features of the input image by using the encoder in the model.
Step 504, processing the image features by using a decoder in the model, and determining whether pixel points included in the input image belong to a face part.
In practical application, a model can be set in a computer, the preprocessed input image is processed by the model, and the model can determine face pixel points belonging to faces in the input image.
Further, the preset model may specifically be an MAE (Masked Autoencoders) model, whose structure includes an encoder and a decoder. The basic building block of both the encoder and the decoder is the Vision Transformer (ViT), which consists of multiple multi-head attention modules, feed-forward neural network modules, normalization layers, and residual connections.
In practical application, the trained MAE can be fine-tuned using the training images and their label data to obtain the face detection model.
The MAE is an unsupervised pre-training model: it learns from a large number of unlabeled image samples, and its learning objective is to reconstruct the remaining 75% of unknown pixel information from the 25% of known pixel information. This fine-grained objective gives the final pre-trained model strong feature representation and modeling capability.
According to the scheme provided by the disclosure, an end-to-end feature extraction mode learns features directly from the original picture without manually constructing features, making full use of the data-mining capability of big data and deep learning; the resulting features are comprehensive and generalize well.
Compared with a convolutional neural network, ViT performs global modeling: it takes in the information of all pixels in the image at once, and while the model is built each pixel point can interact deeply with all other pixel points, so the ViT model has stronger feature expression, learning, and modeling capability.
The computer can extract the image features of the input image using the encoder, and process the image features using the decoder to determine whether the pixel points included in the input image belong to the face.
Specifically, the decoder processes the image features to determine the probability that each pixel point included in the input image belongs to the face, and then determines from that probability whether the pixel point belongs to the face.
Further, the trained MAE model comprises an encoder and a decoder: the encoder extracts the image features of the input image, and the decoder determines from those features whether each pixel point in the input image belongs to the face. Therefore, the scheme provided by the disclosure can detect whether each pixel point in the input image belongs to the face through an encoder and a decoder with the ViT structure.
For example, the probability that one pixel point belongs to the face region may be determined to be 80%, and the probability that another may be determined to be 20%. In practical application, the decoder determines whether a pixel point belongs to the face according to its probability; for example, if the probability of a pixel point is higher than a threshold, the pixel point is determined to belong to the face. By determining the probability that each individual pixel point belongs to the face, a per-pixel detection result is obtained, so the face region can be detected in the input image more accurately.
Step 505, determining whether the pixel points included in the training image belong to the face part according to whether the pixel points included in the input image belong to the face part, thereby obtaining prediction data.
The computer can determine the face region in the input image. Because the input image is obtained by preprocessing the training image, there is a correspondence between the pixel points in the input image and the pixel points in the training image. Based on this correspondence, the region in the training image corresponding to the face region in the input image can be determined as the face region.
In an optional implementation, the input image may be inverse-processed, in a manner corresponding to the preprocessing, to recover the training image, so that the face region can be determined in the training image from the face region of the input image.
Step 506, according to the label data and the prediction data belonging to the same pixel point in the training image, adjusting parameters of the model to obtain a model for detecting the human face.
Step 506 is similar to the implementation of step 403, and is not described in detail.
Fig. 6 is a schematic structural diagram of a face detection apparatus according to an exemplary embodiment of the present disclosure.
As shown in fig. 6, the present disclosure provides a face detection apparatus 600, including:
an acquisition unit 610 configured to acquire an image to be processed;
a detecting unit 620, configured to process the image to be processed by using a face detection model to obtain a face detection result; the face detection result is used for representing whether pixel points included in the image to be processed belong to a face part;
a determining unit 630, configured to determine a face region in the image to be processed according to the detection result.
With the face detection apparatus provided by the present disclosure, the detected face region includes no pixel points that do not belong to the face, so a face region with a more accurate edge contour is obtained, facilitating subsequent processing of the face region.
Fig. 7 is a schematic structural diagram of a face detection apparatus according to another exemplary embodiment of the present disclosure.
As shown in fig. 7, in the face detection apparatus 700 provided by the present disclosure, the acquisition unit 710 is similar to the acquisition unit 610 shown in fig. 6, the detection unit 720 is similar to the detection unit 620 shown in fig. 6, and the determination unit 730 is similar to the determination unit 630 shown in fig. 6.
The structure of the face detection model comprises an encoder and a decoder;
the detecting unit 720 includes:
a feature extraction module 721, configured to extract an image feature of the image to be processed by using the encoder;
the feature processing module 722 is configured to process the image features by using the decoder, and determine whether a pixel point included in the image to be processed belongs to a face part.
The feature processing module 722 is specifically configured to:
processing the image characteristics by using the decoder, and determining the probability that pixel points included in the image to be processed belong to a face part;
and determining whether the pixel belongs to a human face part or not based on the decoder and the probability of the pixel.
The obtaining unit 710 is specifically configured to obtain an original image, and perform preprocessing on the original image to obtain the image to be processed;
the apparatus further comprises a mapping unit 740 configured to:
and determining a face region in the original image according to the face region in the image to be processed.
The apparatus further comprises a processing unit 750 for:
and blurring the face area in the original image so as to make the face in the original image unrecognizable.
Fig. 8 is a schematic structural diagram of a model training apparatus for detecting a human face according to an exemplary embodiment of the present disclosure.
As shown in fig. 8, the present disclosure provides a training apparatus 800 for detecting a model of a human face, including:
an obtaining unit 810, configured to obtain a training image, where a pixel point included in the training image has label data, and the label data is used to represent whether the pixel point belongs to a face part;
a detecting unit 820, configured to process the training image by using a preset model to obtain prediction data of a pixel point included in the training image; the prediction data is used for representing whether the pixel point belongs to a face part;
and an adjusting unit 830, configured to adjust parameters of the model according to the label data and the prediction data that belong to the same pixel point in the training image, so as to obtain a model for detecting a human face.
With the model trained by the apparatus provided by the present disclosure, the detected face region includes no pixel points that do not belong to the face, so a face region with a more accurate edge contour is obtained, further facilitating subsequent processing of the face region.
Fig. 9 is a schematic structural diagram of a model training apparatus for detecting a human face according to another exemplary embodiment of the present disclosure.
As shown in fig. 9, in the training apparatus 900 for a model for detecting a human face provided by the present disclosure, the obtaining unit 910 is similar to the obtaining unit 810 shown in fig. 8, the detecting unit 920 is similar to the detecting unit 820 shown in fig. 8, and the adjusting unit 930 is similar to the adjusting unit 830 shown in fig. 8.
Wherein the structure of the model comprises an encoder and a decoder;
the detecting unit 920 includes:
the preprocessing module 921 is configured to preprocess the training image to obtain an input image;
a feature extraction module 922, configured to extract image features of the input image by using the encoder;
a feature processing module 923, configured to process the image features by using the decoder, and determine whether a pixel point included in the input image belongs to a face part;
a mapping module 924, configured to determine whether a pixel point included in the training image belongs to a face part according to whether the pixel point included in the input image belongs to the face part.
The feature processing module 923 is specifically configured to:
processing the image characteristics by using the decoder, and determining the probability that pixel points included in the input image belong to a face part;
determining whether the pixel points included in the input image belong to a face part based on the decoder and the probability of the pixel points.
The present disclosure provides a face detection method, a model training method, an electronic device, and a program product, applied to computer vision technology and deep learning technology within artificial intelligence to improve the effect of face detection in images.
It should be noted that the face images in this embodiment are not images of any specific user and cannot reflect the personal information of any specific user; the two-dimensional face images in this embodiment come from a public data set.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, and the execution of the computer program by the at least one processor causes the electronic device to perform the solutions provided by any of the above embodiments.
FIG. 10 shows a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to one another by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1001 executes the methods and processes described above, such as the face detection method or the training method for a model for detecting faces. For example, in some embodiments, the face detection method or the training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the face detection method or the training method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (for example, by means of firmware) to perform the face detection method or the training method for a model for detecting faces.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises from computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host; it is a host product in the cloud computing service system that remedies the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A face detection method, comprising:
acquiring an image to be processed, and processing the image to be processed by using a face detection model to obtain a face detection result; the face detection result is used for representing whether pixel points included in the image to be processed belong to a face part or not;
and determining a face region in the image to be processed according to the detection result.
2. The method of claim 1, wherein the structure of the face detection model comprises an encoder and a decoder;
the processing the image to be processed by using the face detection model to obtain a face detection result comprises the following steps:
extracting image features of the image to be processed by using the encoder;
and processing the image characteristics by using the decoder to determine whether pixel points included in the image to be processed belong to a face part.
3. The method according to claim 2, wherein the processing the image features by the decoder to determine whether pixel points included in the image to be processed belong to a face part comprises:
processing the image characteristics by using the decoder, and determining the probability that pixel points included in the image to be processed belong to a face part;
and determining whether the pixel belongs to a human face part or not based on the decoder and the probability of the pixel.
4. The method according to any one of claims 1-3, wherein acquiring the image to be processed comprises:
acquiring an original image, and preprocessing the original image to obtain the image to be processed;
after the face region is determined in the image to be processed, the method further comprises the following steps:
and determining a face region in the original image according to the face region in the image to be processed.
5. The method of claim 4, further comprising:
and blurring the face area in the original image so as to make the face in the original image unrecognizable.
6. A model training method for detecting faces, comprising:
acquiring a training image, wherein pixel points included in the training image have label data, and the label data is used for representing whether the pixel points belong to a face part;
processing the training image by using a preset model to obtain prediction data of pixel points included in the training image; the prediction data is used for representing whether the pixel point belongs to a face part;
and adjusting parameters of the model according to the label data and the prediction data which belong to the same pixel point in the training image to obtain the model for detecting the human face.
7. The method of claim 6, wherein the structure of the model comprises an encoder and a decoder;
the processing the training image by using the preset model to obtain the prediction data of the pixel points included in the training image comprises:
preprocessing the training image to obtain an input image;
extracting image features of the input image with the encoder;
processing the image features by using the decoder to determine whether pixel points included in the input image belong to a face part;
and determining whether the pixel points included in the training image belong to a face part according to whether the pixel points included in the input image belong to a face part.
8. The method of claim 7, wherein processing the image features by using the decoder to determine whether pixel points included in the input image belong to a face part comprises:
processing the image features by using the decoder, and determining the probability that pixel points included in the input image belong to a face part;
and determining, based on the decoder and the probabilities of the pixel points, whether the pixel points included in the input image belong to a face part.
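The training loop of claims 6-8 can be sketched as follows. Binary cross-entropy, the Adam optimizer, and the toy one-batch dataset are illustrative assumptions standing in for whatever loss and data the disclosure actually uses.

import torch
import torch.nn as nn

# Stand-in for the encoder-decoder model sketched after claim 3.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1), nn.Sigmoid(),
)
criterion = nn.BCELoss()  # compares label data and prediction data per pixel
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy labelled batch: each pixel point carries a 0/1 label indicating
# whether it belongs to a face part.
loader = [(torch.rand(2, 3, 224, 224),
           torch.randint(0, 2, (2, 1, 224, 224)).float())]

for images, labels in loader:
    pred = model(images)             # prediction data for every pixel point
    loss = criterion(pred, labels)   # same-pixel label/prediction pairs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                 # adjust the parameters of the model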
9. A face detection apparatus comprising:
the acquisition unit is used for acquiring an image to be processed;
the detection unit is used for processing the image to be processed by using a face detection model to obtain a face detection result; the face detection result is used for representing whether pixel points included in the image to be processed belong to a face part;
and the determining unit is used for determining a face area in the image to be processed according to the detection result.
10. The apparatus of claim 9, wherein the structure of the face detection model comprises an encoder and a decoder; the detection unit includes:
the feature extraction module is used for extracting image features of the image to be processed by using the encoder;
and the feature processing module is used for processing the image features by using the decoder and determining whether pixel points included in the image to be processed belong to a face part.
11. The apparatus of claim 10, wherein the feature processing module is specifically configured to:
processing the image features by using the decoder, and determining the probability that pixel points included in the image to be processed belong to a face part;
and determining, based on the decoder and the probabilities of the pixel points, whether the pixel points belong to a face part.
12. The apparatus of any one of claims 9-11, wherein
the acquisition unit is specifically used for acquiring an original image and preprocessing the original image to obtain the image to be processed;
the apparatus further comprises a mapping unit configured to:
and determining a face region in the original image according to the face region in the image to be processed.
13. The apparatus of claim 12, further comprising a processing unit to:
and blurring the face region in the original image so as to make the face in the original image unrecognizable.
14. A training apparatus for a model for detecting faces, comprising:
the acquisition unit is used for acquiring a training image, wherein pixel points included in the training image have label data, and the label data is used for representing whether the pixel points belong to a face part;
the detection unit is used for processing the training image by using a preset model to obtain prediction data of pixel points included in the training image; the prediction data is used for representing whether the pixel point belongs to a face part;
and the adjusting unit is used for adjusting the parameters of the model according to the label data and the prediction data which belong to the same pixel point in the training image to obtain the model for detecting faces.
15. The apparatus of claim 14, wherein the structure of the model comprises an encoder and a decoder;
the detection unit includes:
the preprocessing module is used for preprocessing the training image to obtain an input image;
a feature extraction module for extracting image features of the input image using the encoder;
the feature processing module is used for processing the image features by using the decoder and determining whether pixel points included in the input image belong to a face part;
and the mapping module is used for determining whether the pixel points included in the training image belong to the face part according to whether the pixel points included in the input image belong to the face part.
16. The apparatus of claim 15, wherein the feature processing module is specifically configured to:
processing the image features by using the decoder, and determining the probability that pixel points included in the input image belong to a face part;
and determining, based on the decoder and the probabilities of the pixel points, whether the pixel points included in the input image belong to a face part.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202210211585.6A 2022-03-04 2022-03-04 Face detection method, model training method, electronic device, and program product Pending CN114612971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210211585.6A CN114612971A (en) 2022-03-04 2022-03-04 Face detection method, model training method, electronic device, and program product

Publications (1)

Publication Number Publication Date
CN114612971A true CN114612971A (en) 2022-06-10

Family

ID=81861809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210211585.6A Pending CN114612971A (en) 2022-03-04 2022-03-04 Face detection method, model training method, electronic device, and program product

Country Status (1)

Country Link
CN (1) CN114612971A (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330900A (en) * 2017-06-22 2017-11-07 成都品果科技有限公司 A kind of automatic portrait dividing method
CN108010021A (en) * 2017-11-30 2018-05-08 上海联影医疗科技有限公司 A kind of magic magiscan and method
US20210406516A1 (en) * 2018-11-16 2021-12-30 Bigo Technology Pte. Ltd. Method and apparatus for training face detection model, and apparatus for detecting face key point
CN111582278A (en) * 2019-02-19 2020-08-25 北京嘀嘀无限科技发展有限公司 Portrait segmentation method and device and electronic equipment
WO2021120695A1 (en) * 2019-12-20 2021-06-24 北京迈格威科技有限公司 Image segmentation method and apparatus, electronic device and readable storage medium
CN113256643A (en) * 2020-02-10 2021-08-13 武汉Tcl集团工业研究院有限公司 Portrait segmentation model training method, storage medium and terminal equipment
WO2021179820A1 (en) * 2020-03-12 2021-09-16 Oppo广东移动通信有限公司 Image processing method and apparatus, storage medium and electronic device
CN111429416A (en) * 2020-03-19 2020-07-17 深圳数联天下智能科技有限公司 Face pigment spot identification method and device and electronic equipment
WO2021249053A1 (en) * 2020-06-12 2021-12-16 Oppo广东移动通信有限公司 Image processing method and related apparatus
CN112800847A (en) * 2020-12-30 2021-05-14 广州广电卓识智能科技有限公司 Face acquisition source detection method, device, equipment and medium
CN112802037A (en) * 2021-01-20 2021-05-14 北京百度网讯科技有限公司 Portrait extraction method, device, electronic equipment and storage medium
CN113111817A (en) * 2021-04-21 2021-07-13 中山大学 Semantic segmentation face integrity measurement method, system, equipment and storage medium
CN113393468A (en) * 2021-06-28 2021-09-14 北京百度网讯科技有限公司 Image processing method, model training device and electronic equipment
CN113674282A (en) * 2021-07-08 2021-11-19 浙江一山智慧医疗研究有限公司 Image segmentation method, image segmentation device, electronic device, computer equipment and storage medium
CN113705461A (en) * 2021-08-30 2021-11-26 平安银行股份有限公司 Face definition detection method, device, equipment and storage medium
CN113947613A (en) * 2021-12-21 2022-01-18 腾讯科技(深圳)有限公司 Target area detection method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KAIMING HE et al.: "Masked Autoencoders Are Scalable Vision Learners", arXiv:2111.06377v3 [cs.CV], 19 December 2021, pages 1-14 *
ROBIN STRUDEL et al.: "Segmenter: Transformer for Semantic Segmentation", arXiv:2105.05633v3 [cs.CV], 2 September 2021, pages 1-17 *
YANG Jie; WU Peng: "Implementation of a Face Detection System on a DSP", Journal of Wuhan University of Technology (Information & Management Engineering Edition), no. 10, 15 October 2007, pages 5-8 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115112669A (en) * 2022-07-05 2022-09-27 重庆大学 Pavement nondestructive testing identification method based on small sample
US11908124B2 (en) 2022-07-05 2024-02-20 Chongqing University Pavement nondestructive detection and identification method based on small samples

Similar Documents

Publication Publication Date Title
CN112966742A (en) Model training method, target detection method and device and electronic equipment
CN112989995B (en) Text detection method and device and electronic equipment
CN113436100B (en) Method, apparatus, device, medium, and article for repairing video
CN113657395B (en) Text recognition method, training method and device for visual feature extraction model
CN113139543A (en) Training method of target object detection model, target object detection method and device
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN113344089A (en) Model training method and device and electronic equipment
CN112861885A (en) Image recognition method and device, electronic equipment and storage medium
CN112528858A (en) Training method, device, equipment, medium and product of human body posture estimation model
CN115861462A (en) Training method and device for image generation model, electronic equipment and storage medium
CN113378832A (en) Text detection model training method, text prediction box method and device
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN114581732A (en) Image processing and model training method, device, equipment and storage medium
CN113963197A (en) Image recognition method and device, electronic equipment and readable storage medium
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN114612971A (en) Face detection method, model training method, electronic device, and program product
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN115761698A (en) Target detection method, device, equipment and storage medium
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN114549904A (en) Visual processing and model training method, apparatus, storage medium, and program product
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN115376137A (en) Optical character recognition processing and text recognition model training method and device
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN114612651A (en) ROI detection model training method, detection method, device, equipment and medium
CN113936158A (en) Label matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination