CN111985374A

CN111985374A - Face positioning method and device, electronic equipment and storage medium

Info

Publication number: CN111985374A
Application number: CN202010808926.9A
Authority: CN
Inventors: 隆超; 黄磊; 彭菲; 张健
Original assignee: Hanwang Technology Co Ltd
Current assignee: Hanvon Manufacturer Co ltd
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2020-11-24
Anticipated expiration: 2040-08-12
Also published as: CN111985374B

Abstract

The application discloses a face positioning method, which belongs to the technical field of face positioning and helps improve face positioning accuracy. The method comprises: preprocessing a target face image to obtain an image data matrix; performing feature mapping on the image data matrix through a pre-trained face positioning model to determine image features corresponding to the target face image; and performing face positioning on the target face image according to the image features. The face positioning model is trained on labeled face images occluded by a mask and labeled face images not occluded by a mask, where each label comprises the true values of the position and size of the face positioning frame and a heat map generated by a Gaussian mask technique. During training, the loss function of the face positioning model weights the error between the estimated value and the true value of the face positioning frame by the weight indicated by the heat map to calculate the model error, and the model is trained with the goal of minimizing the model error.

Description

Face positioning method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of face positioning technologies, and in particular, to a face positioning method and apparatus, an electronic device, and a computer-readable storage medium.

Background

Face positioning is widely used in face recognition applications such as attendance checking, access control and security. Single-stage face detection methods in the prior art are mainly based on anchor classification and regression, and are usually optimized on top of a classical framework such as SSD (Single Shot MultiBox Detector) or the YOLO (You Only Look Once) series. They are faster than two-stage methods and perform better than cascade methods, so they balance detection performance and speed and are the mainstream direction for optimizing current face detection algorithms. However, the detection performance of prior-art single-stage face detection methods is low when the face is partially occluded. For example, special industries such as construction, manufacturing, medicine and catering require personnel to wear a mask while working, and face recognition equipment cannot be used normally because the mask occludes the face. In this case, most of the useful information of the face is hidden by the mask, and face detection methods produce serious face positioning errors when the information is incomplete.

Therefore, the single-stage face positioning method in the prior art needs to be improved.

Disclosure of Invention

The application provides a face positioning method which is beneficial to improving face positioning accuracy.

In order to solve the above problem, in a first aspect, an embodiment of the present application provides a face positioning method, including:

preprocessing a target face image to obtain an image data matrix;

performing feature mapping processing on the image data matrix through a pre-trained face positioning model to determine image features corresponding to the target face image; wherein,

the face positioning model is trained on labeled face images occluded by a mask and labeled face images not occluded by a mask, and the label at least comprises: the true values of the position and size of the face positioning frame, and a heat map, wherein the heat map of a face image not occluded by a mask comprises: heat map data of the unoccluded face generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image; the heat map of a face image occluded by a mask comprises: semicircular heat map data of the unoccluded face part generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image, and single-color map data for the semicircle of the mask part; the loss function of the face positioning model weights the error between the estimated value and the true value of the face positioning frame by the weight indicated by the heat map to calculate the model error of the face positioning model, and the model is trained with the goal of minimizing the model error;

and carrying out face positioning on the target face image according to the image characteristics.

In a second aspect, an embodiment of the present application provides a face positioning apparatus, including:

the image preprocessing module is used for preprocessing a target face image to obtain an image data matrix;

the image feature extraction module is used for performing feature mapping processing on the image data matrix through a pre-trained face positioning model to determine image features corresponding to the target face image; wherein the face positioning model is trained on labeled face images occluded by a mask and labeled face images not occluded by a mask, and the label at least comprises: the true values of the position and size of the face positioning frame, and a heat map, wherein the heat map of a face image not occluded by a mask comprises: heat map data of the unoccluded face generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image; the heat map of a face image occluded by a mask comprises: semicircular heat map data of the unoccluded face part generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image, and single-color map data for the semicircle of the mask part; the loss function of the face positioning model weights the error between the estimated value and the true value of the face positioning frame by the weight indicated by the heat map to calculate the model error of the face positioning model, and the model is trained with the goal of minimizing the model error;

and the face positioning module is used for carrying out face positioning on the target face image according to the image characteristics.

In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the face positioning method according to the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where, when the computer program is executed by a processor, the steps of the face positioning method disclosed in the embodiments of the present application are performed.

The face positioning method disclosed in the embodiments of the present application preprocesses a target face image to obtain an image data matrix; performs feature mapping processing on the image data matrix through a pre-trained face positioning model to determine image features corresponding to the target face image; and performs face positioning on the target face image according to the image features. The face positioning model is trained on labeled face images occluded by a mask and labeled face images not occluded by a mask, and the label at least comprises: the true values of the position and size of the face positioning frame, and a heat map. The heat map of a face image not occluded by a mask comprises: heat map data of the unoccluded face generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image. The heat map of a face image occluded by a mask comprises: semicircular heat map data of the unoccluded face part generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image, and single-color map data for the semicircle of the mask part. The loss function of the face positioning model weights the error between the estimated value and the true value of the face positioning frame by the weight indicated by the heat map to calculate the model error of the face positioning model, and the model is trained with the goal of minimizing the model error. In this way, face positioning accuracy is effectively improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a face positioning method according to a first embodiment of the present application;

Fig. 2 is a schematic structural diagram of a face positioning model in a face positioning method according to a first embodiment of the present application;

Fig. 3 is a flowchart of an image feature determination step in a face positioning method according to a first embodiment of the present application;

Fig. 4 is a schematic diagram of a front-end module according to a first embodiment of the present application;

Fig. 5 is a schematic diagram of a bidirectional feature pyramid structure according to a first embodiment of the present application;

Fig. 6 is a first schematic structural diagram of a face positioning device according to a second embodiment of the present application;

Fig. 7 is a second schematic structural diagram of a face positioning device according to a second embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Example one

As shown in Fig. 1, the face positioning method disclosed in this embodiment of the present application includes steps 110 to 130.

Step 110: preprocess the target face image to obtain an image data matrix.

The preprocessing described in the embodiments of the present application includes, but is not limited to: adjusting the image size, normalizing the image content data, and the like.

For example, because acquisition devices differ, or because faces are at different distances from the image acquisition device, the acquired target face images differ in size. In some embodiments of the present application, the acquired target face image may first be adjusted to a preset size.

As another example, the color values of an acquired color image usually range from 0 to 255. To reduce the amount of subsequent computation, the pixel values of the target face image may be normalized from the range 0 to 255 to the range 0 to 1. In some embodiments of the present application, further image preprocessing operations such as mean subtraction can be performed on the normalized image to reduce the amount of computation and improve positioning accuracy.

Step 120: perform feature mapping processing on the image data matrix through a pre-trained face positioning model, and determine the image features corresponding to the target face image.
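The preprocessing of step 110 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual implementation: the image is a single-channel 2-D list, resizing uses nearest-neighbour sampling, and the function names, target size and mean value are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D list of pixel rows with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(image, out_h, out_w, mean):
    """Resize to a preset size, scale 0..255 to 0..1, then subtract a mean."""
    resized = resize_nearest(image, out_h, out_w)
    return [[pixel / 255.0 - mean for pixel in row] for row in resized]


# Tiny 2x2 grayscale image enlarged to 4x4 (size and mean are illustrative).
image = [[0, 255], [255, 0]]
matrix = preprocess(image, 4, 4, mean=0.5)
```

The resulting `matrix` plays the role of the image data matrix passed to the face positioning model.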

The face positioning model is trained on labeled face images occluded by a mask and labeled face images not occluded by a mask, and the label at least comprises: the true values of the position and size of the face positioning frame, and a heat map. Further, the heat map of a face image not occluded by a mask comprises: heat map data of the unoccluded face generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image; the heat map of a face image occluded by a mask comprises: semicircular heat map data of the unoccluded face part generated by a Gaussian mask technique according to the position and size of the face positioning frame in that image, and single-color map data for the semicircle of the mask part. The loss function of the face positioning model weights the error between the estimated value and the true value of the face positioning frame by the weight indicated by the heat map to calculate the model error of the face positioning model, and the model is trained with the goal of minimizing the model error.

The face positioning model described in the embodiments of the present application is trained in advance. In some embodiments of the present application, before the step of performing feature mapping processing on the image data matrix through a pre-trained face positioning model and determining the image features corresponding to the target face image, the method further includes: constructing training samples from face images with a mask and face images without a mask; and training the face positioning model based on the training samples. The sample data of each training sample is a preprocessed image matrix, and the sample label comprises: the sample category (indicating whether the sample data is occluded by a mask or not), the true values of the position and size of the face positioning frame, and a heat map.

In some embodiments of the present application, in order to improve the positioning accuracy of the face positioning model, training samples are constructed separately from face images occluded by a mask (that is, face images with a mask) and face images not occluded by a mask (that is, face images without a mask), and are used to train the face positioning model.

Specifically, each face image is used as sample data, and a corresponding sample label is set for the sample data, so that a training sample can be generated according to each face image, a plurality of training samples are collected, and a sample set is constructed.

The face positioning frame in the sample label holds the true values of the position and size of the face positioning frame in the face image corresponding to the sample data, and the heat map in the sample label is the heat map of that face image.

In some embodiments of the present application, the true values of the position and size of the face positioning frame are the true values of the center of the face positioning frame and of its length and width, and can be obtained from manual labeling. In other embodiments of the present application, the true values of the position and size of the face positioning frame may also be obtained by machine labeling, in which a computer executes an existing algorithm. The present application does not limit how the true values of the position and size of the face positioning frame in the sample label are obtained.

In some embodiments of the present application, the heat map in the sample label of sample data constructed from a face image occluded by a mask is generated differently from the heat map in the sample label of sample data constructed from a face image not occluded by a mask.

A heat map visually shows where data is concentrated through color changes and distribution. For example, the page areas that website visitors are most interested in, or the geographic areas where visitors are located, are displayed in specially highlighted form. In the embodiments of the present application, the focus region of a face image is identified by the heat map. For example, the center position of the face is identified by a highlighted region.

In some embodiments of the present application, generating the heat map data of the unoccluded face by the Gaussian mask technique according to the position and size of the face positioning frame in the face image not occluded by a mask includes: initializing a heat map; determining the Gaussian mask radius according to the size of the face positioning frame in the face image not occluded by a mask; mapping the center of the face positioning frame, through a preset Gaussian kernel, to the center position on the heat map, updating the circular region of the heat map according to the Gaussian mask radius, and determining the heat map data of the unoccluded face from the updated heat map. For example, first, a heat map of all zeros is generated; then the Gaussian mask radius is calculated using the width and height of the face positioning frame as the base proportion, for example as follows:
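The sample label described above can be pictured as a small record. The sketch below is illustrative only: the class name, field names and the corner-style input annotation are assumptions, and the heat map is merely initialized to zeros here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceSampleLabel:
    """Label of one training sample (field names are hypothetical)."""
    masked: bool                 # sample category: occluded by a mask or not
    center_x: float              # true center of the face positioning frame
    center_y: float
    width: float                 # true width and height of the frame
    height: float
    heatmap: List[List[float]]   # per-pixel weights, filled in separately


def label_from_corners(x1, y1, x2, y2, masked, map_h, map_w):
    """Convert an assumed corner-style annotation to center/size form."""
    heatmap = [[0.0] * map_w for _ in range(map_h)]  # initialized to zeros
    return FaceSampleLabel(
        masked=masked,
        center_x=(x1 + x2) / 2.0,
        center_y=(y1 + y2) / 2.0,
        width=float(x2 - x1),
        height=float(y2 - y1),
        heatmap=heatmap,
    )


label = label_from_corners(10, 20, 50, 80, masked=True, map_h=4, map_w=4)
```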

$$\text{radius} = (h + w) + \sqrt{(h + w)^2 - 4\,h\,w\,\frac{1 - 0.7}{1 + 0.7}}$$

where h represents the height of the face positioning frame and w represents its width. Then the face positioning frame of the sample data is distributed onto the heat map through a Gaussian kernel, producing a bright circle on the black image. The position of the face positioning frame determines the center of the Gaussian kernel on the heat map (that is, the center of the bright circle), and the length and width of the face positioning frame determine the radius of the bright circle.
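A minimal sketch of the heat map for a face not occluded by a mask is given below. It assumes integer pixel coordinates and takes the radius as an input rather than deriving it from h and w. The spread `sigma` of the Gaussian kernel is not stated in the text, so the default used here is an assumption, as are all function names.

```python
import math


def draw_gaussian_circle(heatmap, cx, cy, radius, sigma=None):
    """Write a Gaussian bright spot of the given radius centered at (cx, cy)."""
    if sigma is None:
        sigma = (2 * radius + 1) / 6.0  # assumed spread, not given in the text
    height, width = len(heatmap), len(heatmap[0])
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            if dist_sq > radius ** 2:
                continue  # outside the circle: stays 0 (black)
            value = math.exp(-dist_sq / (2.0 * sigma ** 2))
            heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def unmasked_heatmap(map_h, map_w, cx, cy, radius):
    """All-zero heat map with one full Gaussian circle at the frame center."""
    heatmap = [[0.0] * map_w for _ in range(map_h)]
    return draw_gaussian_circle(heatmap, cx, cy, radius)


heatmap = unmasked_heatmap(map_h=9, map_w=9, cx=4, cy=4, radius=3)
```

The value is 1 at the center, decays toward the edge of the circle, and remains 0 elsewhere.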

In some embodiments of the present application, generating, by the Gaussian mask technique and according to the position and size of the face positioning frame in the face image occluded by a mask, the semicircular heat map data of the unoccluded face part and the single-color map data of the semicircle of the mask part includes: initializing a heat map; determining the Gaussian mask radius according to the size of the face positioning frame in the face image occluded by the mask; mapping the center of the face positioning frame, through a preset Gaussian kernel, to the center position on the heat map, updating the upper semicircular region of the heat map according to the Gaussian mask radius, filling the lower semicircular region of the heat map, according to the Gaussian mask radius, with a preset single color value, and determining the heat map data of the unoccluded face part from the updated heat map. For example, first, a heat map of all zeros is generated; then the Gaussian mask radius is calculated using the width and height of the face positioning frame as the base proportion, for example as follows:

$$\text{radius} = (h + w) + \sqrt{(h + w)^2 - 4\,h\,w\,\frac{1 - 0.7}{1 + 0.7}}$$

where h represents the height of the face positioning frame and w represents its width. Then the face positioning frame of the sample data is distributed onto the heat map through a Gaussian kernel, producing a bright upper semicircle on a black image. The position of the face positioning frame determines the center of the Gaussian kernel on the heat map (that is, the center of the bright upper semicircle), and the length and width of the face positioning frame determine the radius of the bright upper semicircle. Then the lower semicircle, which corresponds to the part of the face occluded by the mask, is filled with a preset single color value (such as 0.0001), giving the heat map of the face image occluded by the mask. The preset single color value in the embodiments of the present application is generally a color value close to black.
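A minimal sketch of the heat map for a face occluded by a mask follows. It assumes integer pixel coordinates, image rows that grow downward (so the upper semicircle has the smaller row indices), and that the row through the center belongs to the upper semicircle. The Gaussian spread `sigma` and the function name are assumptions; the fill value 0.0001 is the example value given in the text.

```python
import math

MASK_FILL = 0.0001  # preset single color value, close to black


def masked_heatmap(map_h, map_w, cx, cy, radius, sigma=None):
    """Gaussian upper semicircle plus a constant-valued lower semicircle."""
    if sigma is None:
        sigma = (2 * radius + 1) / 6.0  # assumed spread, not given in the text
    heatmap = [[0.0] * map_w for _ in range(map_h)]  # start from all zeros
    for y in range(max(0, cy - radius), min(map_h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(map_w, cx + radius + 1)):
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            if dist_sq > radius ** 2:
                continue  # outside the circle: stays 0
            if y <= cy:
                # Upper semicircle: unoccluded face part, Gaussian weights.
                heatmap[y][x] = math.exp(-dist_sq / (2.0 * sigma ** 2))
            else:
                # Lower semicircle: mask part, single color value.
                heatmap[y][x] = MASK_FILL
    return heatmap


heatmap = masked_heatmap(map_h=9, map_w=9, cx=4, cy=4, radius=3)
```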

The training samples obtained by the above method are input into the face positioning model to train the parameters of the face positioning model.

In the model training process, the face positioning model first performs feature mapping processing on the image data matrix and determines the image features matching the input sample data. For example, for each input training sample, the networks arranged in sequence in the face positioning model perform feature extraction and mapping on the sample data to obtain the image features matching that sample data, where the image features indicate the estimated value of the face positioning frame.

Then, the loss function of the face positioning model weights the error between the estimated value and the true value of the face positioning frame by the weight indicated by the heat map to calculate the model error of the face positioning model, and the model parameters are optimized with the goal of minimizing the model error until the model converges. For example, the error of the face positioning model is calculated through the model's loss function from the difference between the estimated and true values of the face positioning frame for all training samples, together with the input heat maps, and the model parameters are optimized by gradient descent with the goal of minimizing the error until the error of the face positioning model converges to a preset range, at which point training of the face positioning model is complete.

In some embodiments of the present application, the loss function determines the calculation weight of each dimension vector in the image features according to the heat map data and, by changing the learning direction for the mask region in the face image, increases the weight of the part not occluded by the mask, thereby changing the attention mechanism of model training. Learning attention is focused on the region not occluded by the mask and on the head region, which improves positioning accuracy.

For the specific implementation of obtaining the estimated value of the face positioning frame and the estimated value of the heat map by performing feature extraction and mapping on the sample data of the input training samples through the networks arranged in sequence in the face positioning model, refer to the face positioning process for the target face image described below; details are not repeated here.
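The effect of weighting the frame error by the heat map can be sketched as below. The text does not give the exact form of the loss, so the absolute (L1) error and the normalization by the sum of weights used here are assumptions, as is the function name.

```python
def heatmap_weighted_l1(pred, target, heatmap):
    """Heat-map-weighted L1 error.

    pred and target are [H][W] grids of per-location tuples (for example
    estimated and true frame width and height); heatmap is an [H][W] grid
    of weights.
    """
    total, weight_sum = 0.0, 0.0
    for pred_row, target_row, weight_row in zip(pred, target, heatmap):
        for p, t, w in zip(pred_row, target_row, weight_row):
            error = sum(abs(pi - ti) for pi, ti in zip(p, t))
            total += w * error
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# One unoccluded location (weight 1.0) and one mask location (weight 0.0001).
pred = [[(10.0, 10.0), (10.0, 10.0)]]
target = [[(12.0, 10.0), (20.0, 10.0)]]
heatmap = [[1.0, 0.0001]]
loss = heatmap_weighted_l1(pred, target, heatmap)
```

In this toy case the large error at the mask location contributes almost nothing, so the loss stays close to the error at the unoccluded location, which matches the intent of concentrating learning on the part of the face not covered by the mask.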

In some embodiments of the present application, the backbone network structure of the face positioning model includes, connected in series: a front-end feature extraction network, a plurality of convolutional neural networks, and a feature fusion network. As shown in Fig. 2, the structure includes: a front-end feature extraction network 210, serially connected convolutional neural networks 220, 230 and 240, and a feature fusion network 250. The convolutional neural networks may adopt an Inception network, a ResNet network, a ShuffleNet network, a SqueezeNet network, or the like, which is not limited in the present application. Referring to Fig. 3, in some embodiments of the present application, performing feature mapping processing on the image data matrix through a pre-trained face positioning model to determine the image features corresponding to the target face image includes substeps S1 to S3.

Substep S1: perform multilayer convolution operations and feature dimension reduction mapping on the image data matrix through the front-end feature extraction network in the pre-trained face positioning model, and determine the hidden layer output and the module output of the front-end feature extraction network.

In some embodiments of the present application, the front-end feature extraction network 210 may further include at least two feature mapping structures. For example, the front-end feature extraction network 210 includes: a first feature mapping structure consisting of N convolutional layers and 1 pooling layer connected in series, and a second feature mapping structure consisting of M convolutional layers and 1 pooling layer connected in series, where M and N are integers larger than 1, and N is larger than M.

The input of the first convolutional neural network (e.g., convolutional neural network 220 in fig. 2) of the plurality of convolutional neural networks connected in series is connected to the output of the front-end feature extraction network 210.

In some embodiments of the present application, as shown in Fig. 4, the front-end feature extraction network 210 includes at least a first convolutional layer 2101, a second convolutional layer 2102, a first pooling layer 2103, a third convolutional layer 2104 and a second pooling layer 2105 connected in series, where the convolution kernels of the first convolutional layer 2101 and the second convolutional layer 2102 are larger than the convolution kernel of the third convolutional layer 2104. For example, the convolution kernels of the first convolutional layer 2101 and the second convolutional layer 2102 are 5 × 5 with a receptive field of 2, and the convolution kernel of the third convolutional layer 2104 is 3 × 3 with a receptive field of 2. In some embodiments of the present application, the kernels of the first pooling layer 2103 and the second pooling layer 2105 are 3 × 3 with a receptive field of 2. After passing through such a front-end feature extraction network, the original image features input to the network are compressed to 1/16 of the original size, and the feature content of the original image features is not lost.

In some embodiments of the present application, the performing multilayer convolution operation and feature dimension reduction mapping on the image data matrix through a front-end feature extraction network in a pre-trained face positioning model, and determining hidden layer output and module output of the front-end feature extraction network includes: performing convolution operation on the image data matrix sequentially through the first convolution layer and the second convolution layer, and performing feature mapping on the output of the second convolution layer through the first pooling layer to obtain hidden layer output of the front-end feature extraction network; and performing convolution operation and feature mapping on the hidden layer output sequentially through the third convolution layer and the second pooling layer, and determining the module output of the front-end feature extraction network.

First, the image data matrix obtained by preprocessing the target face image is input to the front-end feature extraction network 210 shown in Fig. 2. The image data matrix is convolved in sequence by the first convolutional layer 2101 and the second convolutional layer 2102 of the front-end feature extraction network 210, and the output of the second convolutional layer 2102 is then feature-mapped by the first pooling layer 2103, which outputs a set of hidden layer vectors that may be denoted, for example, as F1. Next, the hidden layer vector F1 output by the first pooling layer 2103 is further convolved by the third convolutional layer 2104 of the front-end feature extraction network 210, and the output of the third convolutional layer 2104 is feature-mapped by the second pooling layer 2105 to obtain the module output of the front-end feature extraction network 210. The module output of the front-end feature extraction network 210 is a set of vectors, which may be denoted, for example, as F2.

While extracting features from the image data matrix, the front-end feature extraction network 210 quickly reduces the feature map by a factor of 16, which helps the subsequent backbone network extract features quickly.

Substep S2: perform progressively deeper abstract mapping on the module output of the front-end feature extraction network through the plurality of convolutional neural networks connected in series in the face positioning model, and determine the output of each convolutional neural network.

Next, the module output of the front-end feature extraction network 210, such as the aforementioned vector F2, will be input to the first convolutional neural network 220 of the plurality of convolutional neural networks connected in series.

For example, the model output F2 of the front-end feature extraction network 210 is sequentially input to the convolutional neural network 220 shown in fig. 2, and the convolutional neural network 220 performs feature extraction on the input vector F2 to obtain feature information, which is represented as F3, for example; then, the convolutional neural network 230 performs feature extraction on the output vector F3 of the convolutional neural network 220 to obtain feature information, for example, represented as F4; then, feature extraction is performed on the output vector F4 of the convolutional neural network 230 by the convolutional neural network 240, and feature information, for example, represented as F5, is obtained.

In some embodiments of the present application, a plurality of convolutional neural networks connected in series are sequentially deepened, so that extracted feature levels are sequentially deepened, and features of a plurality of levels carry different information, which is information required for face detection and positioning. The structure of each convolutional neural network can be referred to as a convolutional neural network in the prior art, and is not described in detail herein.

Substep S3: splicing the hidden layer output and the module output of the front-end feature extraction network and the output of each convolutional neural network into a feature to be fused, performing multi-scale feature fusion on the feature to be fused through a feature fusion network in the face positioning model, and determining the image feature corresponding to the target face image.

Next, the hidden layer output F1 and the module output F2 of the front-end feature extraction network 210 shown in fig. 2, and the outputs F3, F4, and F5 of each convolutional neural network are spliced into a feature to be fused, the feature to be fused is input to the feature fusion network 250, the feature fusion network 250 performs multi-scale feature fusion on the feature to be fused, and finally, an image feature corresponding to the target face image is obtained.

In some embodiments of the present application, the feature fusion network is a bidirectional feature pyramid network. The bidirectional feature pyramid network may employ a network structure as shown in fig. 5. For example, the bidirectional feature pyramid network includes a forward propagation network branch 510, a backward propagation network branch 520, and a fusion network branch 530.

In some embodiments of the present application, performing multi-scale feature fusion on the feature to be fused through a feature fusion network in the face positioning model to determine an image feature corresponding to the target face image includes: sequentially down-sampling the feature to be fused from top to bottom through each convolutional layer of the forward propagation network branch of the bidirectional feature pyramid network, and up-sampling the feature to be fused from bottom to top through each convolutional layer of the backward propagation network branch of the bidirectional feature pyramid network; merging the output of each convolutional layer of the forward propagation network branch with the output of the corresponding convolutional layer of the backward propagation network branch; and performing convolution and regression processing on the fusion result, and outputting the image feature corresponding to the target face image.

For example, the feature to be fused is input into both the forward propagation network branch 510 and the backward propagation network branch 520 of the bidirectional feature pyramid network. In the forward propagation network branch 510, the feature to be fused is down-sampled by the convolutional layers arranged sequentially from top to bottom, and each convolutional layer outputs a feature map of a different scale; in the backward propagation network branch 520, the feature to be fused is up-sampled by the convolutional layers arranged sequentially from bottom to top, and each convolutional layer likewise outputs a feature map of a different scale. Then, feature maps of the same size in the two branches (for example, the feature maps output by the topmost convolutional layer 5101 of the forward propagation network branch 510 and the bottommost convolutional layer 5201 of the backward propagation network branch 520) are fused through the fusion network branch 530. After fusion, a convolution operation is performed on each fusion result, for example with a 3 × 3 convolution kernel, and the feature maps obtained after the convolution operation are adjusted to a specified scale. Performing a convolution operation on each fusion result eliminates the aliasing effect of up-sampling.

Taking the case where the forward propagation network branch 510 and the backward propagation network branch 520 each have 5 convolutional layers as an example, after the convolution operations the two branches each output feature maps of 5 scales. Each feature map has three dimensions, h, w, and c, where h and w represent the length and width of the feature map and c represents its thickness. Feature maps of different scales have the same thickness, while their length and width are successively halved; for example, the lengths and widths of the 5 scales of feature maps output by the convolutional layers of the forward propagation network branch 510 and the backward propagation network branch 520 are 512, 256, 128, 64, and 32, respectively. These 5 feature maps are then fused via the fusion network branch 530.

For example, the two large-scale feature maps (for example, the feature maps with lengths of 512 and 256) are convolved with a convolution kernel having a receptive field of 2, that is, down-sampled, so that the feature maps are reduced by half, yielding a feature map F7 (for example, a feature map with a length of 128); meanwhile, the two small-scale feature maps (for example, the feature maps with lengths of 32 and 64) are deconvolved with a convolution kernel having a receptive field of 2, that is, up-sampled, so that the feature maps are successively doubled, yielding a feature map F8 (for example, a feature map with a length of 128); finally, the feature maps F7 and F8 obtained by the convolution and deconvolution operations are fused with the fused intermediate-scale feature map (for example, the feature map with a length of 128) output by the forward propagation network branch 510 and the backward propagation network branch 520 to obtain a feature map of the specified scale, which is used as the image feature matched with the target face image.
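
The scale arithmetic of this example can be sketched as follows. This is a minimal sketch under stated assumptions, not the network of the application: the helper names `downsample2`, `upsample2`, and `fuse_to_middle` are hypothetical, 2 × 2 averaging and nearest-neighbour repetition stand in for the learned convolution and deconvolution with a receptive field of 2, and element-wise addition is assumed as the fusion operation.

```python
import numpy as np

def downsample2(x):
    """Halve h and w with a 2x2, stride-2 average (stand-in for a learned convolution)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    """Double h and w by nearest-neighbour repetition (stand-in for a learned deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_to_middle(maps):
    """Fuse five (h, w, c) maps, ordered from largest to smallest, at the middle scale."""
    p512, p256, p128, p64, p32 = maps
    f7 = downsample2(downsample2(p512) + p256)  # 512 -> 256 -> 128
    f8 = upsample2(upsample2(p32) + p64)        # 32 -> 64 -> 128
    return f7 + f8 + p128                       # assumed fusion: element-wise sum

# Five feature maps with the same thickness c and successively halved length and width.
maps = [np.ones((s, s, 2)) for s in (512, 256, 128, 64, 32)]
fused = fuse_to_middle(maps)
print(fused.shape)
```

The point of the sketch is only that both the large-scale and the small-scale maps arrive at the same intermediate scale (length 128 here) before being fused with the intermediate-scale map.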

Because facial features are severely lost when a mask is worn, feature fusion is performed on feature maps of multiple scales through the bidirectional feature pyramid network: low-dimensional features compensate for the information loss of high-dimensional features, and high-dimensional features compensate for the insufficient receptive field of low-dimensional features, which improves feature extraction precision and, in turn, face positioning accuracy.

Continuing to refer to fig. 1, in step 130, the target face image is subjected to face localization according to the image features.

The image features output by the face positioning model indicate the position and size information of the face positioning frame. The face positioning model also outputs category information of the target face image, which indicates whether the face is occluded by a mask. The position and size information of the face positioning frame can then be obtained by decoding the image features, thereby realizing face positioning for the target face image.
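
One way such decoding could work is sketched below. The layout of the image features is an assumption, since the text does not fix it: the sketch supposes a center map of shape (H, W) holding a confidence per location and a size map of shape (H, W, 2) holding a width and height per location, both on a grid that is `stride` times coarser than the input image. The function name `decode_box` and the parameters `stride` and `threshold` are hypothetical.

```python
import numpy as np

def decode_box(center_map, size_map, stride=4, threshold=0.5):
    """Return (x1, y1, x2, y2, score) for the strongest peak, or None if below threshold."""
    idx = int(np.argmax(center_map))
    cy, cx = np.unravel_index(idx, center_map.shape)  # row, column of the peak
    score = float(center_map[cy, cx])
    if score < threshold:
        return None
    w, h = size_map[cy, cx]
    # Convert centre and size on the feature grid to corner coordinates in the image.
    x1 = float((cx - w / 2.0) * stride)
    y1 = float((cy - h / 2.0) * stride)
    x2 = float((cx + w / 2.0) * stride)
    y2 = float((cy + h / 2.0) * stride)
    return (x1, y1, x2, y2, score)

center_map = np.zeros((16, 16))
size_map = np.zeros((16, 16, 2))
center_map[8, 6] = 0.9          # one confident face centre at row 8, column 6
size_map[8, 6] = (4.0, 6.0)     # width 4, height 6 on the feature grid
print(decode_box(center_map, size_map))
```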

According to the face positioning method disclosed by the embodiment of the application, a thermodynamic diagram label and a face positioning frame label are set on mask-occluded face images and mask-free face images to construct training samples. During model training, the loss function uses the thermodynamic diagram as the weight applied to the error between the true value corresponding to the sample data and the predicted value output by the model when calculating the model error. The thermodynamic diagram thereby changes the learning direction for the mask region of the face and increases the feature weight of the part not occluded by the mask, which changes the attention mechanism of model training and improves face positioning accuracy.
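
The weighting idea can be illustrated with a small sketch. The text states only that the error between the estimated and true values of the face positioning frame is weighted by the thermodynamic diagram (heat map); the absolute-error form, the normalisation by the sum of the weights, and the name `weighted_box_loss` are assumptions made for illustration.

```python
import numpy as np

def weighted_box_loss(pred, target, heatmap, eps=1e-6):
    """Heat-map-weighted box regression error.

    pred, target: arrays of shape (H, W, 4) with box values per location.
    heatmap: array of shape (H, W) whose values act as per-location weights.
    """
    abs_err = np.abs(pred - target).sum(axis=-1)   # per-location absolute error
    weighted = heatmap * abs_err                   # locations with weight 0 are ignored
    return float(weighted.sum() / (heatmap.sum() + eps))

pred = np.zeros((4, 4, 4))
target = np.full((4, 4, 4), 100.0)
target[1, 2] = 0.5                 # small error at the only weighted location
heatmap = np.zeros((4, 4))
heatmap[1, 2] = 1.0
print(weighted_box_loss(pred, target, heatmap))
```

In this example the large errors at locations with zero weight do not contribute, so the loss is driven by the region the heat map emphasises.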

Furthermore, multi-scale feature fusion is performed through a bidirectional feature pyramid network; fusing features bidirectionally, from top to bottom and from bottom to top of the pyramid, reduces information loss and can significantly improve face positioning accuracy.

On the other hand, the front-end feature extraction network arranged at the front end of the backbone network of the face positioning model can significantly improve the running speed of the subsequent model, which facilitates deployment and application of the model in an embedded environment. The method and the device are used for fast face positioning in mask-wearing scenes, and missed detections and false detections caused by the mask occluding part of the face information are avoided. Experimental comparison shows that the method is applicable to target face images of sizes ranging from 20 × 20 to 512 × 512, and that its face positioning speed is more than 4 times that of MTCNN (multi-task cascaded convolutional networks), which fully meets the application requirements of embedded devices.

Example two

Corresponding to the method embodiment, another embodiment of the present application discloses a face positioning device. As shown in fig. 6, the device includes:

an image preprocessing module 610, configured to preprocess the target face image to obtain an image data matrix;

an image feature extraction module 620, configured to perform feature mapping processing on the image data matrix through a pre-trained face positioning model, and determine an image feature corresponding to the target face image; wherein the face positioning model is trained on labeled mask-occluded face images and labeled mask-free face images, and the label includes at least: true values of the position and size of the face positioning frame, and a thermodynamic diagram; the thermodynamic diagram of a mask-free face image includes: thermodynamic diagram data of the unoccluded face part, generated by a Gaussian mask technique according to the position and size of the face positioning frame in the mask-free face image; the thermodynamic diagram of a mask-occluded face image includes: semicircle thermodynamic diagram data of the unoccluded face part, generated by a Gaussian mask technique according to the position and size of the face positioning frame in the mask-occluded face image, and single color diagram data of the semicircle of the mask part; through the loss function of the face positioning model, the error between the estimated value and the true value of the face positioning frame is weighted by the weight indicated by the thermodynamic diagram to calculate the model error of the face positioning model, and model training is performed with minimizing the model error as the target;

a face positioning module 630, configured to perform face positioning on the target face image according to the image feature.

In some embodiments of the present application, as shown in fig. 7, the image feature extraction module 620 further includes:

a front-end feature extraction submodule 6201, configured to perform multilayer convolution operations and feature dimension reduction mapping on the image data matrix through a front-end feature extraction network in a pre-trained face positioning model, and determine the hidden layer output and module output of the front-end feature extraction network;

a feature abstract mapping submodule 6202, configured to perform progressively deepened abstract mapping on module outputs of the front-end feature extraction network through a plurality of convolutional neural networks connected in series in the face positioning model, and determine outputs of each convolutional neural network respectively;

a feature fusion sub-module 6203, configured to splice the hidden layer output and module output of the front-end feature extraction network and the output of each convolutional neural network into a feature to be fused, perform multi-scale feature fusion on the feature to be fused through a feature fusion network in the face positioning model, and determine an image feature corresponding to the target face image.

In some embodiments of the present application, the feature fusion network is a bidirectional feature pyramid network, and the feature fusion sub-module 6203 is further configured to:

sequentially down-sampling the feature to be fused from top to bottom through each convolutional layer of the forward propagation network branch of the bidirectional feature pyramid network, and up-sampling the feature to be fused from bottom to top through each convolutional layer of the backward propagation network branch of the bidirectional feature pyramid network;

merging the output of the convolutional layer of the forward propagation network branch with the output of the corresponding convolutional layer of the backward propagation network branch;

and performing convolution and regression processing on the fusion result, and outputting the image characteristics corresponding to the target face image.

In some embodiments of the present application, the front-end feature extraction network includes at least a first convolutional layer, a second convolutional layer, a first pooling layer, a third convolutional layer, and a second pooling layer connected in series, where the convolution kernels of the first convolutional layer and the second convolutional layer are larger than the convolution kernel of the third convolutional layer, and the front-end feature extraction sub-module 6201 is further configured to:

sequentially performing convolution operations on the image data matrix through the first convolutional layer and the second convolutional layer, and performing feature mapping on the output of the second convolutional layer through the first pooling layer to obtain the hidden layer output of the front-end feature extraction network;

and performing convolution operation and feature mapping on the hidden layer output sequentially through the third convolution layer and the second pooling layer, and determining the module output of the front-end feature extraction network.
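
The effect of this layer sequence on the spatial size can be traced with a short calculation. Only the layer order and the relation "kernels of the first and second convolutional layers larger than that of the third" come from the text; every kernel size, stride, and padding below is a hypothetical choice, picked so that each side of a 512 × 512 input shrinks 16-fold, which is one possible reading of the 16-fold reduction attributed to the front-end feature extraction network.

```python
# Hypothetical layer settings: (name, kernel, stride, padding).
FRONT_END = [
    ("conv1", 7, 2, 3),   # first convolutional layer (larger kernel)
    ("conv2", 5, 2, 2),   # second convolutional layer (larger kernel)
    ("pool1", 2, 2, 0),   # first pooling layer -> hidden layer output F1
    ("conv3", 3, 1, 1),   # third convolutional layer (smaller kernel)
    ("pool2", 2, 2, 0),   # second pooling layer -> module output F2
]

def out_size(size, kernel, stride, padding):
    """Standard output-size formula for convolution and pooling layers."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(size):
    """Return the spatial size after each layer of the front-end network."""
    sizes = {}
    for name, kernel, stride, padding in FRONT_END:
        size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes

sizes = trace(512)
hidden_output = sizes["pool1"]   # side length of F1
module_output = sizes["pool2"]   # side length of F2
print(sizes)  # {'conv1': 256, 'conv2': 128, 'pool1': 64, 'conv3': 64, 'pool2': 32}
```

With these assumed settings the hidden layer output has side length 64 and the module output has side length 32, that is, 512 / 16.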

In some embodiments of the present application, generating the thermodynamic diagram data of the unoccluded face part by the Gaussian mask technique according to the position and size of the face positioning frame in the mask-free face image includes:

initializing a thermodynamic diagram;

determining the radius of a Gaussian mask according to the size of the face positioning frame in the mask-free face image;

mapping the center of the face positioning frame to the center of the thermodynamic diagram through a preset Gaussian kernel, updating the circular area in the thermodynamic diagram according to the Gaussian mask radius, and determining the thermodynamic diagram data of the unoccluded face part according to the updated thermodynamic diagram.
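
The three steps above can be sketched as follows. This is an illustrative sketch rather than the implementation of the application: the function name `gaussian_heatmap`, the rule that takes the radius as half the shorter box side, and the ratio between radius and Gaussian standard deviation (`sigma_ratio`) are all assumptions, and the mapped box center is treated as the center of the Gaussian.

```python
import numpy as np

def gaussian_heatmap(height, width, box, sigma_ratio=1.0 / 3.0):
    """Thermodynamic diagram (heat map) label for one unoccluded face.

    box is (cx, cy, w, h) in heat-map coordinates.
    """
    cx, cy, w, h = box
    heatmap = np.zeros((height, width), dtype=np.float32)  # step 1: initialise
    radius = max(1, int(min(w, h) / 2))                    # step 2: Gaussian mask radius
    sigma = radius * sigma_ratio
    ys, xs = np.ogrid[:height, :width]
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    gauss = np.exp(-dist2 / (2.0 * sigma ** 2))            # Gaussian kernel, peak 1 at centre
    inside = dist2 <= radius ** 2                          # step 3: circular region only
    heatmap[inside] = np.maximum(heatmap[inside], gauss[inside])
    return heatmap

hm = gaussian_heatmap(64, 64, (32, 32, 20, 20))
print(hm.shape, hm.max())
```

Values fall off smoothly from 1 at the center to near 0 at the edge of the circle, and everything outside the circle stays 0.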

In some embodiments of the present application, generating, by the Gaussian mask technique according to the position and size of the face positioning frame in the mask-occluded face image, the semicircle thermodynamic diagram data of the unoccluded face part and the single color diagram data of the semicircle of the mask part includes:

initializing a thermodynamic diagram;

determining the radius of a Gaussian mask according to the size of a face positioning frame in the face image shielded by the mask;

mapping the center of the face positioning frame to the center of the thermodynamic diagram through a preset Gaussian kernel, updating the upper semicircle region in the thermodynamic diagram according to the Gaussian mask radius, updating the lower semicircle region in the thermodynamic diagram with a preset single color value according to the Gaussian mask radius, and determining the thermodynamic diagram data of the unoccluded face part according to the updated thermodynamic diagram.
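
A sketch of the mask-occluded case follows, again under assumptions: the function name `masked_face_heatmap`, the radius rule, `sigma_ratio`, and the default single color value `mask_value=0.1` are hypothetical, since the text says only that the value is preset. Image rows are taken to grow downward, so the upper semicircle is the set of rows at or above the center.

```python
import numpy as np

def masked_face_heatmap(height, width, box, mask_value=0.1, sigma_ratio=1.0 / 3.0):
    """Heat map label for a mask-occluded face: Gaussian on top, constant below.

    box is (cx, cy, w, h) in heat-map coordinates.
    """
    cx, cy, w, h = box
    heatmap = np.zeros((height, width), dtype=np.float32)  # initialise
    radius = max(1, int(min(w, h) / 2))                    # Gaussian mask radius
    sigma = radius * sigma_ratio
    ys, xs = np.ogrid[:height, :width]
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    inside = dist2 <= radius ** 2
    upper = inside & (ys <= cy)      # unoccluded part of the face
    lower = inside & (ys > cy)       # part covered by the mask
    gauss = np.exp(-dist2 / (2.0 * sigma ** 2))
    heatmap[upper] = gauss[upper]    # Gaussian values in the upper semicircle
    heatmap[lower] = mask_value      # preset single value in the lower semicircle
    return heatmap

hm_masked = masked_face_heatmap(64, 64, (32, 32, 20, 20))
print(hm_masked[27, 32], hm_masked[37, 32])
```

Used as loss weights, such a label gives locations near the center of the unoccluded upper half more influence than the uniformly valued mask region.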

The face positioning device according to the embodiment of the present application is used for implementing the face positioning method disclosed in the embodiment of the present application; for the specific implementation of each module of the device, refer to the corresponding step of the method, which is not described herein again.

The face positioning device disclosed by the embodiment of the application obtains an image data matrix by preprocessing a target face image; performs feature mapping processing on the image data matrix through a pre-trained face positioning model to determine image features corresponding to the target face image; wherein the face positioning model is trained on labeled mask-occluded face images and labeled mask-free face images, and the label includes at least: true values of the position and size of the face positioning frame, and a thermodynamic diagram; the thermodynamic diagram of a mask-free face image includes: thermodynamic diagram data of the unoccluded face part, generated by a Gaussian mask technique according to the position and size of the face positioning frame in the mask-free face image; the thermodynamic diagram of a mask-occluded face image includes: semicircle thermodynamic diagram data of the unoccluded face part, generated by a Gaussian mask technique according to the position and size of the face positioning frame in the mask-occluded face image, and single color diagram data of the semicircle of the mask part; through the loss function of the face positioning model, the error between the estimated value and the true value of the face positioning frame is weighted by the weight indicated by the thermodynamic diagram to calculate the model error of the face positioning model, and model training is performed with minimizing the model error as the target; and performs face positioning on the target face image according to the image features, thereby effectively improving face positioning accuracy.

The face positioning device disclosed by the embodiment of the application sets a thermodynamic diagram label and a face positioning frame label on mask-occluded face images and mask-free face images to construct training samples. During model training, the loss function uses the thermodynamic diagram as the weight applied to the error between the true value corresponding to the sample data and the predicted value output by the model when calculating the model error. The thermodynamic diagram thereby changes the learning direction for the mask region of the face and increases the feature weight of the part not occluded by the mask, which changes the attention mechanism of model training and improves face positioning accuracy.

Correspondingly, the application also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the face positioning method according to the first embodiment of the application. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet computer and the like.

The present application also discloses a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the face localization method according to the first embodiment of the present application.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

The face positioning method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation to the present application.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Claims

1. A face localization method, comprising:

preprocessing a target face image to obtain an image data matrix;

2. The method according to claim 1, wherein the step of determining the image features corresponding to the target face image by performing feature mapping processing on the image data matrix through a pre-trained face localization model comprises:

carrying out multilayer convolution operation and feature dimension reduction mapping on the image data matrix through a front-end feature extraction network in a pre-trained face positioning model, and determining hidden layer output and module output of the front-end feature extraction network;

through a plurality of convolutional neural networks connected in series in the face positioning model, carrying out progressively deepened abstract mapping on the module output of the front-end feature extraction network, and respectively determining the output of each convolutional neural network;

and splicing the hidden layer output and module output of the front-end feature extraction network and the output of each convolutional neural network into a feature to be fused, and performing multi-scale feature fusion on the feature to be fused through a feature fusion network in the face positioning model to determine the image feature corresponding to the target face image.

3. The method according to claim 2, wherein the feature fusion network is a bidirectional feature pyramid network, and the step of determining the image features corresponding to the target face image by performing multi-scale feature fusion on the features to be fused through the feature fusion network in the face localization model includes:

4. The method according to claim 2, wherein the front-end feature extraction network at least comprises a first convolutional layer, a second convolutional layer, a first pooling layer, a third convolutional layer and a second pooling layer which are connected in series, wherein convolution kernels of the first convolutional layer and the second convolutional layer are larger than that of the third convolutional layer, and the step of determining hidden layer output and module output of the front-end feature extraction network by performing multi-layer convolution operation and feature dimension reduction mapping on the image data matrix through the front-end feature extraction network in the pre-trained face localization model comprises:

5. The method according to any one of claims 1 to 4, wherein the step of generating the thermodynamic diagram data of the unoccluded face part by adopting the Gaussian mask technology according to the position and the size of the face positioning frame in the non-mask occluded face image comprises the following steps:

initializing a thermodynamic diagram;

6. The method according to any one of claims 1 to 4, wherein the step of generating, by adopting the Gaussian mask technology according to the position and the size of the face positioning frame in the mask-occluded face image, the semicircle thermodynamic diagram data of the unoccluded face part and the single color diagram data of the semicircle of the mask part comprises the following steps:

initializing a thermodynamic diagram;

7. A face localization apparatus, comprising:

an image feature extraction module, configured to perform feature mapping processing on the image data matrix through a pre-trained face positioning model to determine image features corresponding to the target face image; wherein the face positioning model is trained on labeled mask-occluded face images and labeled mask-free face images, and the label includes at least: true values of the position and size of the face positioning frame, and a thermodynamic diagram; the thermodynamic diagram of a mask-free face image includes: thermodynamic diagram data of the unoccluded face part, generated by a Gaussian mask technique according to the position and size of the face positioning frame in the mask-free face image; the thermodynamic diagram of a mask-occluded face image includes: semicircle thermodynamic diagram data of the unoccluded face part, generated by a Gaussian mask technique according to the position and size of the face positioning frame in the mask-occluded face image, and single color diagram data of the semicircle of the mask part; through the loss function of the face positioning model, the error between the estimated value and the true value of the face positioning frame is weighted by the weight indicated by the thermodynamic diagram to calculate the model error of the face positioning model, and model training is performed with minimizing the model error as the target;

8. The apparatus of claim 7, wherein the image feature extraction module further comprises: the front-end feature extraction submodule is used for carrying out multilayer convolution operation and feature dimension reduction mapping on the image data matrix through a front-end feature extraction network in a pre-trained face positioning model, and determining hidden layer output and module output of the front-end feature extraction network;

the feature abstract mapping submodule is used for carrying out progressively deepened abstract mapping on the module output of the front-end feature extraction network through a plurality of convolutional neural networks connected in series in the face positioning model, and respectively determining the output of each convolutional neural network;

and the feature fusion submodule is used for splicing the hidden layer output and the module output of the front-end feature extraction network and the output of each convolutional neural network into features to be fused, performing multi-scale feature fusion on the features to be fused through a feature fusion network in the face positioning model, and determining image features corresponding to the target face image.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face localization method according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the face localization method of any one of claims 1 to 6.