CN110163049B - Face attribute prediction method, device and storage medium

Info

Publication number: CN110163049B
Application number: CN201810787870.6A
Authority: CN (China)
Prior art keywords: face, predicted, picture, attribute, processing
Legal status: Active (granted; the status is an assumption and not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN110163049A (en)
Inventors: 贺珂珂, 葛彦昊, 邰颖, 汪铖杰, 李季檩, 吴永坚, 黄飞跃
Current Assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Events: application filed by Tencent Technology Shenzhen Co Ltd and Tencent Cloud Computing Beijing Co Ltd; priority to CN201810787870.6A; publication of CN110163049A; application granted; publication of CN110163049B; anticipated expiration

Classifications

    • G06F18/214 - Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns, bootstrap methods, e.g. bagging or boosting
    • G06F18/253 - Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06V40/16 - Recognition of biometric, human-related or animal-related patterns in image or video data; Human or animal bodies; Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Human faces; Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention discloses a face attribute prediction method, apparatus, and storage medium. The method comprises: acquiring a face picture to be predicted; applying preset processing to the face picture to be predicted to obtain a face processing picture to be predicted; performing face attribute calculation on the face picture to be predicted and the face processing picture to be predicted to obtain face attribute predicted values of the face picture to be predicted; and predicting the face attributes of the face picture to be predicted from those predicted values. In the embodiment of the invention, the face attributes are predicted from the face picture to be predicted together with its corresponding face processing picture; because the face processing picture complements the face picture, interference from the background is reduced and the accuracy and robustness of face attribute prediction are improved.

Description

Face attribute prediction method, device and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a face attribute prediction method, apparatus, and storage medium.
Background
Predicting face attributes from face images is currently attracting increasing attention. Face attributes include expression, action units, gender, age, ethnicity, mouth size, nose bridge height, whether glasses or sunglasses are worn, eye size, whether the eyes are open or closed, whether the mouth is open or closed, hair length or hairstyle category, facial attractiveness, frontal or profile view, and so on. Face attribute prediction is widely applied in fields such as human-computer interaction and user modeling.
Existing face attribute prediction is mainly based on a traditional machine learning framework: manually designed features are extracted first, the feature dimensionality is then reduced to obtain compact features, and finally a classification or regression model is used to predict the face attributes.
Existing face attribute prediction takes the original image as input, and the original image contains a large amount of background noise; for example, a shopping-mall scene and an office scene differ greatly, and different backgrounds easily interfere with attribute classification, making face attribute prediction inaccurate and its robustness poor.
Disclosure of Invention
The embodiment of the application provides a face attribute prediction method, a face attribute prediction device and a storage medium, which reduce interference of a background on face attribute prediction and improve accuracy and robustness of face attribute prediction.
In a first aspect, the present application provides a face attribute prediction method, including:
acquiring a face picture to be predicted;
carrying out preset processing on the face picture to be predicted to obtain a face processing picture to be predicted;
performing face attribute calculation according to the face picture to be predicted and the face processing picture to be predicted to obtain a face attribute predicted value of the face picture to be predicted;
and predicting the face attribute of the face picture to be predicted according to the face attribute predicted value of the face picture to be predicted.
In a second aspect, the present application provides a face attribute prediction apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the face picture to be predicted;
the picture processing unit is used for carrying out preset processing on the face picture to be predicted to obtain a face processing picture to be predicted;
the computing unit is used for carrying out face attribute computation according to the face picture to be predicted and the face processing picture to be predicted to obtain a face attribute predicted value of the face picture to be predicted;
and the prediction unit is used for predicting the face attribute of the face picture to be predicted according to the face attribute prediction value of the face picture to be predicted.
Further, the splicing subunit is specifically configured to:
and splicing the first face feature data and the second face feature data to acquire the face splicing data.
Further, the splicing subunit is specifically configured to:
normalizing the first face feature data and the second face feature data so that the first face feature data and the second face feature data are in the same order of magnitude;
and splicing the first face feature data and the second face feature data that are of the same order of magnitude to obtain the face splicing data.
Further, the device further comprises a training unit, wherein the training unit is specifically configured to:
before the face picture to be predicted and the face processing picture to be predicted are respectively input into a preset multichannel neural network model, acquiring a plurality of sample face pictures, and acquiring a face attribute true value corresponding to each sample face picture;
acquiring a sample face processing diagram corresponding to each sample face picture;
adding the sample face picture, the sample face processing picture and the face attribute true value corresponding to each sample face picture into a training sample data set;
and training a preset multichannel neural network by using the training sample data set to obtain the multichannel neural network model.
Further, the face processing picture to be predicted includes a face abstract picture to be predicted, and the picture processing unit is specifically configured to:
perform abstraction processing on the face picture to be predicted to obtain the face abstract picture to be predicted.
Further, the acquisition unit includes:
the acquisition subunit is used for acquiring the original picture;
and the detection subunit is used for performing face detection on the original picture and face correction processing on it, to obtain the face picture to be predicted.
Further, the detection subunit is specifically configured to:
performing face detection on the original picture to determine a face area;
determining a preset number of face key points in the face area;
carrying out face correction on the original picture according to the face key points to obtain a face correction picture;
and carrying out size adjustment on the face correction picture according to a preset size to obtain the face picture to be predicted.
In a third aspect, the present invention also provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the face attribute prediction method of any one of the first aspects.
In the embodiment of the invention, a face picture to be predicted is acquired; preset processing is applied to it to obtain a face processing picture to be predicted; face attribute calculation is performed on the face picture to be predicted and the face processing picture to be predicted to obtain face attribute predicted values of the face picture to be predicted; and the face attributes of the face picture to be predicted are predicted from those values. Because the face attributes are predicted from the face picture to be predicted together with its corresponding face processing picture, and the face processing picture complements the information in the face picture, the background interference that arises when attributes are predicted from a single face picture is reduced, and the accuracy and robustness of face attribute prediction are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic diagram of one embodiment of a face recognition system provided in an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of a face attribute prediction method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment architecture of a multi-channel neural network provided in an embodiment of the present invention;
FIG. 4 is a flowchart of another embodiment of a face attribute prediction method according to an embodiment of the present invention;
FIG. 5 is a schematic view of an embodiment of a face attribute prediction apparatus provided in an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the description that follows, embodiments of the invention are described with reference to steps and symbolic operations performed by one or more computers, unless indicated otherwise. These steps and operations are therefore referred to, in several instances, as being computer-executed; they include the manipulation, by the computer's processing unit, of electrical signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which may reconfigure or otherwise alter the computer's operation in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, while the principles of the invention are described in the foregoing terms, this is not meant to be limiting, and those skilled in the art will recognize that various steps and operations described below may also be implemented in hardware.
The term "module" as used herein may be considered as a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as implementing objects on the computing system. The apparatus and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, all within the scope of the invention.
The embodiment of the invention provides a face attribute prediction method, a face attribute prediction device and a storage medium.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a face recognition system according to an embodiment of the present invention. As shown in FIG. 1, the face recognition system includes a terminal and a server connected through a network for data interaction. The terminal can act as an image acquisition device, capturing images and converting them into a computer-readable form, and can also act as a recognition device that performs face recognition on the acquired images to extract their useful information. The server can likewise act as a recognition device for the images acquired by the terminal, and can provide targeted services based on the recognition results; for example, identity authentication or payment can be completed through face recognition.
The face recognition system may also have face capture and tracking, face modeling and retrieval, live-person verification, and image quality detection functions, which are not described in detail here. In the embodiment of the invention, the face recognition system has the function of recognizing the face attributes of a face image. Specifically, the system may include a face attribute prediction apparatus, which can be integrated in a server, namely the server in FIG. 1. The server is mainly used to acquire a face picture to be predicted; apply preset processing to it to obtain a face processing picture to be predicted; perform face attribute calculation on the face picture to be predicted and the face processing picture to be predicted to obtain face attribute predicted values of the face picture to be predicted; and predict the face attributes of the face picture to be predicted from those predicted values.
In the embodiment of the invention, the face attributes may include expression, action units, gender, age, ethnicity, mouth size, nose bridge height, whether glasses or sunglasses are worn, eye size, whether the eyes are open or closed, whether the mouth is open or closed, hair length or hairstyle category, facial attractiveness, frontal or profile view, and so on.
In the embodiment of the invention, the server can be a single server, a server cluster composed of several servers, or a cloud computing service center.
It should be noted that the schematic diagram of the face recognition system shown in FIG. 1 is merely an example; the face recognition system and the scene are described to explain the technical solution of the embodiment of the present invention more clearly and do not limit it. Those of ordinary skill in the art will appreciate that, as face recognition systems evolve and new service scenarios appear, the technical solution provided by the embodiment of the present invention is equally applicable to similar technical problems.
The following describes in detail specific embodiments.
In the present embodiment, description will be made from the viewpoint of a face attribute prediction apparatus, which may be integrated in a terminal or a server in particular.
The invention provides a face attribute prediction method, which comprises the following steps: acquiring a face picture to be predicted; carrying out preset processing on the face picture to be predicted to obtain a face processing picture to be predicted; carrying out face attribute calculation according to the face picture to be predicted and the face processing picture to be predicted to obtain a face attribute predicted value of the face picture to be predicted; and predicting the face attribute of the face picture to be predicted according to the face attribute predicted value of the face picture to be predicted.
Referring to fig. 2, an embodiment of a face attribute prediction method according to an embodiment of the present invention includes:
101. Acquire a face picture to be predicted.
The face picture to be predicted mainly refers to a picture that meets preset requirements after an original picture has been processed. In the embodiment of the invention, the original picture can be obtained in various ways, for example by photographing the faces of a large number of users, by searching for face pictures on the internet, or by retrieving them from a face picture database. When photographing a user's face to obtain original pictures, several face pictures of the same user may be taken.
It should be noted that original pictures obtained by photographing users, searching the internet, or querying a face database may have the face located at a corner of the image (such as the lower-right corner) or at its edge, and may differ in size, so the obtained original pictures need to be processed uniformly to obtain face pictures to be predicted. Accordingly, the step of acquiring the face picture to be predicted may further include: acquiring an original picture; and performing face detection on the original picture and face correction processing on it, to obtain the face picture to be predicted.
Further, performing face detection on the original picture and face correction processing on it to obtain the face picture to be predicted may include: performing face detection on the original picture to determine a face region; determining a preset number of face key points in the face region; performing face correction on the original picture according to the face key points to obtain a face correction picture; and resizing the face correction picture to a preset size to obtain the face picture to be predicted. It will be understood that the original picture here is one containing a face; only such pictures are processed into face pictures to be predicted, and pictures without a face can simply be discarded without further processing.
In the embodiment of the invention, face detection on the original picture can be performed with an AdaBoost (adaptive boosting) classifier or a deep-learning face detection algorithm to determine the face region and the preset number of face key points within it. Determining face key points with an AdaBoost classifier and with a deep-learning algorithm are both conventional techniques and are not described further here.
After the face key points in the original picture are determined, face correction can be performed according to them to obtain the face correction picture. For example, if the face key points include the key points of the eyebrows on both sides, those key points can be connected into a straight line; if the line is tilted relative to the upper and lower borders of the image, the original picture is rotated until the line is parallel to those borders. Finally, the face correction picture is resized to the preset size (for example, 50 x 50) to obtain the face picture to be predicted. Note that the original picture generally needs to be larger than the preset size, which makes the resizing straightforward; in some embodiments the original picture may also be smaller than the preset size, in which case it is first enlarged by a preset factor and then resized.
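The correction just described can be sketched as follows, purely as an illustration: the patent prescribes no library, OpenCV and numpy are used here for convenience, and it is assumed that key-point detection has already produced the coordinates of the left and right key points (eyes or eyebrows) on the line to be leveled.

```python
import cv2
import numpy as np

def correct_face(image, left_pt, right_pt, preset_size=(50, 50)):
    """Rotate the original picture so the line joining two key points is
    parallel to the upper and lower borders, then resize to the preset size."""
    dy = right_pt[1] - left_pt[1]
    dx = right_pt[0] - left_pt[0]
    angle = np.degrees(np.arctan2(dy, dx))        # tilt of the key-point line

    # Rotate about the midpoint between the two key points.
    center = ((left_pt[0] + right_pt[0]) / 2.0,
              (left_pt[1] + right_pt[1]) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = image.shape[:2]
    corrected = cv2.warpAffine(image, rot, (w, h))   # face correction picture

    return cv2.resize(corrected, preset_size)        # face picture to be predicted
```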
102. Apply preset processing to the face picture to be predicted to obtain a face processing map to be predicted.
Specifically, applying preset processing to the face picture to be predicted to obtain the face processing map to be predicted may include: processing the face picture to be predicted with a preset processing map generation model. The preset processing may be abstraction, gray-level conversion, preset color level adjustment, preset brightness adjustment, preset channel processing, and so on; the corresponding face processing map to be predicted is then a face abstract map, a face gray map, a face color level adjustment map, a face brightness adjustment map, or a face channel processing map to be predicted, chosen according to actual needs.
It should be noted that, in the embodiment of the invention, the face processing map to be predicted may include one or more of these types, set according to the requirements of the practical application; for example, it may be a face abstract map or a face gray map to be predicted. In other embodiments of the invention, it may also include other types, for example maps obtained by applying only a color level adjustment, a brightness adjustment, or channel processing (such as removing one color channel from the RGB channels) to the face picture to be predicted.
In one embodiment, the face processing map to be predicted includes a face abstract map to be predicted. In that case, applying the preset processing may include performing abstraction on the face picture to be predicted to obtain the face abstract map to be predicted. The processing map generation model is then an abstract map generation model; that is, the face picture to be predicted is abstracted with a preset abstract map generation model to obtain the face abstract map to be predicted. The abstract map generation model is trained in the same way as the processing map generation model described below, so the training is not repeated here.
When the face processing map to be predicted includes several maps, a corresponding number of processing map generation models can be used, again set according to the practical application. For example, if the face processing map to be predicted includes a face abstract map and a face gray map, the models can include an abstract map generation model and a gray map generation model, and the face picture to be predicted is processed with each in turn to obtain the face abstract map and the face gray map to be predicted. Of course, not every face processing map requires a generation model: some can be produced directly by a program, for example removing the R channel of the RGB channels (the red, green, and blue channels) of the face picture by an algorithm, yielding a face processing map containing only the G-channel and B-channel image information. The way in which the face processing map to be predicted is obtained from the face picture to be predicted is not specifically limited here.
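For the program-generated variants just mentioned, no generation model is needed at all. A minimal numpy sketch of the channel-processing and gray-level cases (the function names and the luminance weights are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def remove_red_channel(face_picture: np.ndarray) -> np.ndarray:
    """Drop the R channel of an H x W x 3 RGB picture, keeping only the
    G-channel and B-channel image information."""
    return face_picture[:, :, 1:]

def to_gray_map(face_picture: np.ndarray) -> np.ndarray:
    """Gray-level variant: weighted sum of the R, G, B channels
    (standard luminance weights, shown as one common choice)."""
    return face_picture @ np.array([0.299, 0.587, 0.114])
```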
Further, the processing map generation model used to process the face picture to be predicted may be trained as follows: acquire a plurality of sample face data, the plurality of sample face data comprising a plurality of sample face pictures and the data values of the corresponding sample face processing maps; and train a preset processing map neural network with the plurality of sample face data to obtain the processing map generation model.
Digital image data can be represented as a matrix, so matrix theory and matrix algorithms can be used to analyze and process digital images. The most typical example is a gray image, whose pixel data form a matrix: the rows of the matrix correspond to the image height (in pixels), the columns to the image width (in pixels), the elements to the pixels, and the element values to the gray values of those pixels. In the embodiment of the invention, the sample face picture and the sample face processing map can be handled as such matrix representations. Representing a digital image with a matrix matches the row-and-column character of the image and makes program addressing convenient, which greatly simplifies computer image programming. The data values and predicted values of the sample face processing maps mentioned below may likewise be matrix representations of the images.
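For example, a tiny gray image and its matrix representation (values arbitrary):

```python
import numpy as np

# A 3 x 3 gray "image": rows = height in pixels, columns = width in
# pixels, element values = gray values of the corresponding pixels.
gray = np.array([[  0,  64, 128],
                 [ 32,  96, 160],
                 [ 64, 128, 255]], dtype=np.uint8)

print(gray.shape)   # (3, 3): height 3, width 3
print(gray[0, 2])   # 128: gray value of the pixel in row 0, column 2
```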
In the embodiment of the present invention, training the preset processing map neural network with the plurality of sample face data may include:
(1) Input each sample face picture into the processing map neural network to obtain a predicted value of the corresponding sample face processing map.
The processing map neural network is constructed in advance and may be a convolutional neural network (CNN). It may include an encoder and a decoder. Each sample face picture first enters the encoder, which is a succession of convolution operations that progressively map the input picture into a high-level feature space, yielding high-level features. The decoder then restores the high-level features produced by the encoder and generates the sample face processing map; specifically, the decoder applies successive deconvolutions so that the length and width of the features keep growing until a sample face processing map of the same size as the sample face picture is output. Its matrix representation is the predicted value of the sample face processing map.
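A minimal sketch of such an encoder-decoder, written in PyTorch purely as an illustration; the patent fixes neither the framework nor the number of layers, and the channel counts below are invented. The comments trace the spatial sizes for a 50 x 50 input:

```python
import torch
import torch.nn as nn

class ProcessingMapGenerator(nn.Module):
    """Encoder-decoder processing map network: the encoder maps the sample
    face picture into a high-level feature space; the decoder deconvolves
    until the output matches the input size."""
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),  # 50 -> 25
            nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.ReLU(),           # high-level features
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 25 -> 50
            nn.Conv2d(32, out_channels, 3, stride=1, padding=1),            # processing map
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```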
(2) Converge the predicted value and the data value of each sample face processing map to obtain the processing map generation model.
For example, a preset loss function may be used to converge the predicted value and the data value of each sample face processing map. The loss function can be set flexibly according to the practical application; for example, it may be a cross entropy loss function. Training proceeds by continually reducing the error between the predicted value and the data value of each sample face processing map, so that the parameters of the processing map neural network are adjusted to appropriate values, giving the processing map generation model. Specifically, the predicted values and data values of the sample face processing maps are evaluated under the preset loss function to obtain their loss values; the parameters of the processing map neural network are adjusted until those loss values are less than or equal to a preset threshold, at which point adjustment stops and the processing map generation model is obtained. A specific processing map generation model, such as an abstract map generation model or a gray map generation model, is trained in exactly this way, so separate examples are not given.
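A hedged sketch of that convergence loop; the optimizer, the threshold value, and the binary per-pixel form of the cross entropy loss are illustrative assumptions:

```python
import torch

def train_generator(model, loader, threshold=0.05, max_epochs=100):
    """Adjust the processing map network until the loss between predicted
    values and data values falls to the preset threshold."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.BCEWithLogitsLoss()        # per-pixel cross entropy
    for _ in range(max_epochs):
        total = 0.0
        for picture, data_value in loader:        # sample picture, target map
            loss = loss_fn(model(picture), data_value)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if total / len(loader) <= threshold:      # stop adjusting
            break
    return model                                  # processing map generation model
```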
103. Perform face attribute calculation on the face picture to be predicted and the face processing map to be predicted to obtain the face attribute predicted values of the face picture to be predicted.
Specifically, this may include:
(1) Extract features from the face picture to be predicted and from the face processing map to be predicted, respectively, to obtain a plurality of face feature data.
This may further include: extracting face features from the face picture to be predicted to obtain first face feature data, the first face feature data including face attribute predicted values of the face picture to be predicted; and extracting face features from the face processing map to be predicted to obtain second face feature data, the second face feature data including face attribute predicted values of the face processing map to be predicted. The plurality of face feature data then comprises the first face feature data and the second face feature data. Both extraction steps can be realized by the several neural sub-network models of a preset multi-channel network model; for the detailed process, see the description of the multi-channel neural network model below.
(2) Splice the plurality of face feature data to obtain face splicing data.
When the plurality of face feature data comprises the first face feature data and the second face feature data, this step may include splicing the first face feature data and the second face feature data to obtain the face splicing data.
Further, the splicing may include: normalizing the first face feature data and the second face feature data so that they are of the same order of magnitude, and then splicing the two to obtain the face splicing data. For example, suppose the first face feature data is a 2048-dimensional vector and the second a 1024-dimensional vector; after normalization both are adjusted to 2048 dimensions, and splicing them yields 4096-dimensional face splicing data.
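A sketch of the normalize-then-splice step; the patent does not name the normalization, so unit L2 normalization is shown as one common way to bring two feature vectors to the same order of magnitude (the adjustment of a 1024-dimensional vector up to 2048 dimensions is omitted here):

```python
import torch
import torch.nn.functional as F

def splice(first_feat: torch.Tensor, second_feat: torch.Tensor) -> torch.Tensor:
    """Normalize the two face feature vectors to the same order of
    magnitude, then splice (concatenate) them into one fused vector."""
    first_feat = F.normalize(first_feat, dim=-1)     # unit L2 norm
    second_feat = F.normalize(second_feat, dim=-1)
    return torch.cat([first_feat, second_feat], dim=-1)

# Two 2048-dimensional feature vectors splice into a 4096-dimensional one.
fused = splice(torch.randn(1, 2048), torch.randn(1, 2048))
print(fused.shape)    # torch.Size([1, 4096])
```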
(3) Linearly transform the face splicing data to obtain the face attribute predicted values of the face picture to be predicted.
To facilitate the prediction of the specific face attributes that follows, a linear transformation may be applied to the face splicing data, using a linear function such as Y = a*X + b, where Y is the output, X is the input, and a and b are parameters. For example, the input X may be a 10-dimensional vector, a a preset 4 x 10 matrix, and b a constant such as 0.2; the matrix multiplication is performed first, and the constant b is added to the result to obtain the output Y. If, say, 20 values are obtained after linearly transforming the face splicing data, those 20 values are the 20 face attribute predicted values of the face prediction.
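The worked example above, written out with numpy (random values stand in for the learned parameters):

```python
import numpy as np

X = np.random.rand(10)       # input: a 10-dimensional vector
a = np.random.rand(4, 10)    # preset 4 x 10 parameter matrix
b = 0.2                      # constant

Y = a @ X + b                # matrix multiplication first, then add b
print(Y.shape)               # (4,): four transformed output values
```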
Further, the face attribute calculation on the face picture to be predicted and the face processing map to be predicted can be realized with a preset multi-channel neural network model. Specifically, it may include: inputting the face picture to be predicted and the face processing map to be predicted into the preset multi-channel neural network model to obtain the face attribute predicted values of the face picture to be predicted.
The preset multi-channel neural network model is a neural network model with several input channels and can be configured according to the requirements of the practical application. For example, it may comprise several neural sub-network models, one per input channel, all with the same structure but with independently set parameters. The number of neural sub-network models equals the total number of input pictures: the face picture to be predicted corresponds to one sub-network model, and each face processing map to be predicted corresponds to one as well. For example, with one face picture to be predicted and two face processing maps to be predicted, the multi-channel neural network model comprises three neural sub-network models.
Specifically, the multi-channel neural network model may be obtained by training a preset multi-channel neural network. As shown in FIG. 3, the structure of the multi-channel neural network may include: an input layer, convolution layers, pooling layers, a specification layer, a fully connected layer, and a loss layer, described below (a code sketch follows the layer descriptions).
Input layer: accepts picture input, such as training samples (sample face pictures and sample face processing maps) or pictures whose face attributes are to be recognized (the face picture to be predicted and the face processing map to be predicted).
The multi-channel neural network contains several neural sub-network models, each consisting of alternating convolution and pooling layers. The sub-networks have identical structures, but their parameters are independent, so that the face picture to be predicted and the face processing map to be predicted can each be learned sufficiently.
Convolution layer: mainly used for feature extraction from the input image (a training sample or an image to be recognized), that is, for mapping the original data to a hidden-layer feature space. The convolution kernel size can be chosen for the practical application; for example, the kernel of the first convolution layer may be set to (5, 5) and those of the subsequent convolution layers to (3, 3). Optionally, to reduce computational complexity and improve efficiency, the kernels of all convolution layers in a neural sub-network model can be set to (3, 3). The convolution layers act as feature detectors, letting the neural sub-network model progress from low-level features to high-level features. Optionally, to improve the expressive power of the model, a nonlinear activation function can be added; in the embodiment of the invention each convolution layer is followed by one, chosen according to the practical application, for example f(x) = max(0, x).
Pooling layer: merges adjacent regions so that the neural sub-network model can tolerate a certain amount of deformation, which improves the robustness of the model. A pooling layer operates much like a convolution layer, except that its kernel takes only the maximum value (max pooling) or the average value (average pooling) of the corresponding positions.
Specification layer: the output features of the several neural sub-network models are spliced through the specification layer, which normalizes the features so that they are of the same order of magnitude, facilitating their fusion.
Fully connected layer: the fused features are then input into the fully connected layer, which maps the learned distributed feature representation to the sample label space and mainly acts as the classifier of the whole convolutional neural network. Each node of the fully connected layer is connected to all nodes output by the previous layer (such as the last pooling layer of a neural sub-network model); one node of the fully connected layer is called a neuron, and the number of neurons can be chosen according to the practical application, for example 2048. Because the multi-channel neural network model contains several neural sub-network models, each sub-network outputs a vector whose dimensionality matches its neuron count, and the fully connected layer splices these vectors into one fused vector. For example, if the model contains a first and a second neural sub-network model, each outputting a 2048-dimensional vector, the fully connected layer splices the two outputs into a 4096-dimensional vector.
Optionally, after the fused vector is obtained in the fully connected layer, a linear transformation is applied to it to facilitate the prediction of the specific face attributes; the transformation can use a linear function such as Y = a*X + b described above.
Loss layer: used when training the multi-channel neural network to obtain the multi-channel neural network model. It computes the difference between the face attribute predicted values and the face attribute true values, and the parameters of the multi-channel neural network are continually corrected and optimized through a back-propagation algorithm; the loss function may be a cross entropy loss function.
The multi-channel neural network can also include an output layer for outputting the result of the fully connected layer when predicting face attributes.
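Putting the layers together, here is a minimal PyTorch sketch of the two-channel case, purely for illustration: the framework, the exact layer counts, the pooling sizes, and the 20-attribute output are assumptions drawn from the examples in the text, not prescribed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_subnetwork(in_channels, feat_dim=2048):
    """One neural sub-network: alternating 3x3 convolution and pooling
    layers, each convolution followed by f(x) = max(0, x)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        nn.Flatten(),
        nn.Linear(128 * 4 * 4, feat_dim),   # 2048 neurons, as in the text
    )

class MultiChannelNet(nn.Module):
    """Two input channels: the face picture and its face processing map.
    The sub-networks share a structure but have independent parameters."""
    def __init__(self, num_attributes=20, feat_dim=2048):
        super().__init__()
        self.picture_branch = make_subnetwork(3, feat_dim)
        self.procmap_branch = make_subnetwork(1, feat_dim)
        self.fc = nn.Linear(2 * feat_dim, num_attributes)  # 4096 -> attributes

    def forward(self, picture, proc_map):
        f1 = F.normalize(self.picture_branch(picture), dim=-1)   # specification
        f2 = F.normalize(self.procmap_branch(proc_map), dim=-1)  # layer
        fused = torch.cat([f1, f2], dim=-1)                      # splice
        return self.fc(fused)        # linear transform -> predicted values
```

The specification layer is rendered here as unit L2 normalization of each branch output, one common way to bring the two 2048-dimensional vectors to the same order of magnitude before they are spliced into the 4096-dimensional fused vector.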
Before the face picture to be predicted and the face processing map to be predicted are input into the preset multi-channel neural network model, the model must first be trained. The method of the embodiment of the invention therefore further includes: collecting a plurality of sample face pictures and obtaining the face attribute true values corresponding to each sample face picture; acquiring the sample face processing map corresponding to each sample face picture; adding each sample face picture, its sample face processing map, and its face attribute true values to a training sample data set; and training the preset multi-channel neural network with the training sample data set to obtain the multi-channel neural network model.
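The training sample data set can be sketched as a container of such triples; a PyTorch Dataset is shown for illustration, and the class and field names are assumptions:

```python
import torch
from torch.utils.data import Dataset

class TrainingSampleDataSet(Dataset):
    """Each training sample consists of a sample face picture, the
    corresponding sample face processing map, and the face attribute
    true values of that picture."""
    def __init__(self, pictures, processing_maps, true_values):
        assert len(pictures) == len(processing_maps) == len(true_values)
        self.pictures = pictures
        self.processing_maps = processing_maps
        self.true_values = true_values

    def __len__(self):
        return len(self.pictures)

    def __getitem__(self, i):
        return (self.pictures[i],
                self.processing_maps[i],
                torch.as_tensor(self.true_values[i], dtype=torch.float32))
```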
In the embodiment of the invention, a number of original sample face pictures can be collected and each picture annotated with the corresponding face attributes. The annotated attributes cover the facial features and the hair, and specifically include: hair length, whether the hair is curly, whether the eyebrows are thick, whether the eyes are large, whether the nose is large, whether the chin is small, whether the person is smiling, whether a cap, mask, or sunglasses are worn, whether make-up is worn, whether the person is young, and so on. As before, the original sample face pictures can be collected by photographing a large number of user faces, by searching for face pictures on the internet, or from a face picture database; when photographing users, several face pictures of the same user may be taken. It should be noted that pictures collected in these ways may not be uniform in size and format; they can be processed in the same way as the picture to be predicted, described above, to obtain the plurality of sample face pictures, and this is not repeated here.
In the embodiment of the invention, the face attribute true value corresponding to a sample face picture is the preset correct face attribute value for that picture. For example, if a sample face picture contains a female face, the gender attribute of the picture can be determined manually; with 0 representing male and 1 representing female, the true value of the gender attribute of that sample face picture is 1.
In other embodiments of the present invention, the sample face processing map corresponding to each sample face picture may also be obtained by the face attribute prediction apparatus itself. Specifically, this may include: obtaining a sample face picture and processing it with the preset processing map generation model to obtain the sample face processing map. The training of the processing map generation model is as described in step 102 and is not repeated here.
It should be noted that the sample face processing map obtained for each sample face picture may include several types of maps, set according to the requirements of the practical application, for example a sample face abstract map, a sample face gray map, and so on. Other embodiments of the present invention may include further types, for example maps obtained by applying only a color level adjustment, a brightness adjustment, or channel processing (such as removing one color channel from the RGB channels) to the sample face picture: a sample face color level adjustment map, a sample face brightness adjustment map, or a sample face channel processing map.
In one embodiment, the sample face processing map includes a sample face abstract map; the processing map neural network is then an abstract map neural network and the processing map generation model an abstract map generation model. That is, the sample face picture can be processed with a preset abstract map generation model to obtain the sample face abstract map, trained as described for the processing map generation model. When the sample face processing map includes several maps, a corresponding number of processing map generation models can be used, set according to the practical application; for example, if the sample face processing map includes a sample face abstract map and a sample face gray map, an abstract map generation model and a gray map generation model can each process the sample face picture in turn to obtain them. Of course, not every sample face processing map needs a generation model: some can be produced directly by a program, for example removing the R channel of the RGB channels (the red, green, and blue channels) of the sample face picture by programming, yielding a sample face processing map containing only the G-channel and B-channel image information. The way the sample face processing map is obtained from the sample face picture is not specifically limited here.
Specifically, when the sample face picture, the sample face processing map, and the face attribute true values corresponding to each sample face picture are added to the training sample data set, each sample face picture, its corresponding sample face processing map, and its face attribute true values together form one training sample; that is, each training sample comprises a sample face picture, its sample face processing map, and its face attribute true values. The training sample data set contains a number of such training samples.
Further, training the preset multi-channel neural network with the training sample data set to obtain the multi-channel neural network model may specifically include:
(1) Train the several neural sub-network models of the multi-channel neural network with the training sample data set to obtain the face attribute predicted values of each sample face picture.
This may specifically include: taking each sample face picture in the training sample data set in turn as the target sample face picture, and inputting the target sample face picture and the corresponding target sample face processing map into the several neural sub-network models of the multi-channel neural network to obtain a plurality of sample face feature data; splicing the plurality of sample face feature data to obtain sample splicing data; and linearly transforming the sample splicing data to obtain the face attribute predicted values of the sample face picture. The fully connected layer of the multi-channel neural network performs the splicing of the plurality of sample face feature data to obtain the sample splicing data.
When the sample face processing map includes only one map, the several neural sub-networks comprise only a first neural sub-network model and a second neural sub-network model, one processing the sample face picture and the other the sample face processing map. In that case, inputting the target sample face picture and the corresponding target sample face processing map into the several neural sub-network models of the multi-channel neural network to obtain the plurality of sample face feature data includes: inputting the target sample face picture into the first neural sub-network model to obtain first sample face feature data, which includes the face attribute predicted values of the target sample face picture; and inputting the target sample face processing map into the second neural sub-network model to obtain second sample face feature data, which includes the face attribute predicted values of the target sample face processing map.
The process of obtaining the first and second sample face feature data through the first and second neural sub-network models is the same as the process, described below, of obtaining the first and second face feature data from the face picture to be predicted, and is not repeated here.
In addition, a face abstract map contains rich position and texture information of the face, which benefits fine-grained recognition of face attributes. It is therefore preferable for the sample face processing maps to include a sample face abstract map, and when only one sample face processing map is used, a sample face abstract map is the preferred choice.
Further, splicing the plurality of sample face feature data to obtain the sample splicing data may include: normalizing the first sample face feature data and the second sample face feature data so that they are of the same order of magnitude, and then splicing the two to obtain the sample splicing data. For example, suppose the first sample face feature data is a 2048-dimensional vector and the second sample face feature data a 1024-dimensional vector; after normalization both are adjusted to 2048 dimensions, and splicing them yields 4096-dimensional sample splicing data.
It should be noted that, in the above description, the multi-channel neural network includes only two neural sub-network models, it may be understood that, in other embodiments of the present invention, the neural sub-network models may further include more neural sub-network models, for example, further include a third neural sub-network model, where the sample face processing map may include a first sample face processing map and a second sample face processing map, and the target sample face image and the corresponding target sample face processing map are input to the multiple neural sub-network models of the multi-channel neural network respectively, so as to obtain multiple sample face feature data, including: inputting a target sample face picture into a first neural sub-network model to obtain first sample face feature data, wherein the first sample face feature data comprises predicted values of a plurality of face attributes of the face picture; inputting the first sample face processing diagram to a second neural sub-network model to obtain second sample face feature data, wherein the second sample face feature data comprises predicted values of a plurality of face attributes of the first face processing diagram; inputting the second sample face processing diagram to a third neural sub-network model to obtain third sample face feature data, wherein the third sample face feature data comprises predicted values of a plurality of face attributes of the second face processing diagram;
In this case, the step of splicing the sample face feature data to obtain sample splicing data includes: splicing the first sample face feature data, the second sample face feature data and the third sample face feature data to obtain the sample splicing data. The specific implementation of acquiring and splicing the multiple pieces of sample face feature data may refer to the case in which the multichannel neural network includes only two neural sub-network models, and is not repeated here.
To facilitate the subsequent prediction of specific face attributes, a linear transformation may be performed on the sample splicing data to obtain the face attribute predicted values of each sample face picture. Specifically, the linear transformation may use the linear function of the fully connected layer described above; in the running example it maps the 4096-dimensional splicing data to 20 values, which are the 20 face attribute predicted values, as in the sketch below.
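A minimal sketch of this linear transformation, matching the 4096-to-20 running example; it corresponds to the `head` layer in the structural sketch above.

    import torch
    import torch.nn as nn

    fc = nn.Linear(4096, 20)            # fully connected layer: y = Wx + b
    spliced = torch.randn(1, 4096)      # sample splicing data
    attr_pred = fc(spliced)             # 20 face attribute predicted values
    print(attr_pred.shape)              # torch.Size([1, 20])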
(2) And converging the face attribute predicted value and the face attribute true value to obtain the multichannel neural network model.
For example, a loss layer may be used to converge the face attribute predicted values and the face attribute true values. Specifically, a preset loss function is used to converge the predicted value and the true value for each sample face picture, so as to obtain the multichannel neural network model. The loss function can be set flexibly according to the actual application requirements; for example, it can be a cross-entropy loss function. Training proceeds by continuously reducing the error between the face attribute predicted values and the face attribute true values corresponding to each sample face picture, so as to adjust the parameters of the multichannel neural network to appropriate values and thereby obtain the multichannel neural network model. Specifically, the face attribute predicted values and the face attribute true values corresponding to each sample face picture are fed into the preset loss function to obtain the loss values of face prediction over the multiple sample face pictures; the parameters of the multichannel neural network are then adjusted until the adjusted parameters make the loss value less than or equal to a preset threshold, at which point adjustment stops and the multichannel neural network model is obtained. A training-loop sketch follows.
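A minimal training-loop sketch of this convergence step, assuming binary 0/1 attribute labels and a binary cross-entropy loss; `model`, `loader`, the optimizer and the threshold value are illustrative assumptions, not the patent's fixed choices.

    import torch
    import torch.nn as nn

    def train_until_converged(model, loader, threshold=0.05, max_epochs=100):
        criterion = nn.BCEWithLogitsLoss()                 # cross-entropy-style loss
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        for epoch in range(max_epochs):
            total, batches = 0.0, 0
            for face_img, face_proc, labels in loader:     # labels: float 0/1 true values
                preds = model(face_img, face_proc)         # face attribute predicted values
                loss = criterion(preds, labels)            # predicted vs. true values
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                           # adjust network parameters
                total += loss.item()
                batches += 1
            if total / batches <= threshold:               # preset threshold reached
                break                                      # stop adjusting
        return model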
Based on the structure of the multichannel neural network model, when only one face processing map to be predicted is used, the multiple neural sub-networks include only a first neural sub-network model, for processing the face picture to be predicted, and a second neural sub-network model, for processing the face processing map to be predicted. In this case, inputting the face picture to be predicted and the face processing map to be predicted into the preset multichannel neural network model to obtain the face attribute predicted values of the face picture to be predicted may specifically include: inputting the face picture to be predicted into the first neural sub-network model to obtain first face feature data, where the first face feature data includes the face attribute predicted values of the face picture to be predicted; and inputting the face processing map to be predicted into the second neural sub-network model to obtain second face feature data, where the second face feature data includes the face attribute predicted values of the face processing map to be predicted.
Inputting the face picture to be predicted into the first neural sub-network model to obtain the first face feature data may include: extracting the face features from the face picture to be predicted; and performing forward computation on those face features with the sub-models corresponding to the different face attributes in the first neural sub-network model to obtain the first face feature data. Similarly, inputting the face processing map to be predicted into the second neural sub-network model to obtain the second face feature data may include: extracting the face features from the face processing map to be predicted; and performing forward computation on those face features with the sub-models corresponding to the different face attributes in the second neural sub-network model to obtain the second face feature data.
In each neural sub-network model, the attribute sub-models corresponding to different attributes identify different attributes and therefore extract different face features; for example, identifying the hairstyle attribute only requires extracting the face contour and the position coordinates of the hair. Through the attribute sub-models in each neural sub-network model, the face attribute prediction apparatus can extract the face features relevant to each attribute from the face picture to be predicted and the face processing map to be predicted, and perform forward computation on these features to obtain the predicted values of the multiple attributes. Initial parameters can be set in the attribute sub-models; any one of the attribute sub-models performs forward computation using its parameters and the extracted face features and outputs the predicted value of its corresponding attribute. The predicted values of all attributes of the face picture to be predicted constitute the face attribute predicted values of the face picture to be predicted, and the predicted values of all attributes of the face processing map to be predicted constitute the face attribute predicted values of the face processing map to be predicted.
In one possible implementation, the forward computation may be performed as follows: in each neural sub-network model, the computation proceeds layer by layer; each layer computes on the output of the previous layer, its output serves as the input of the next layer, and so on, until the output of the last layer is obtained, from which the predicted value of each attribute sub-model's attribute is determined. Within this multi-layer computation, each layer may compute the product of its input and a weight matrix and add an offset value, taking the sum as its output, as sketched below. Of course, this is only an illustration; other computations may also be involved, which are not enumerated here.
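In plain terms, each layer computes output = weight x input + offset. A toy sketch (the shapes are arbitrary examples, not the patent's):

    import numpy as np

    def forward(x, layers):
        out = x
        for W, b in layers:               # layer by layer
            out = W @ out + b             # product of input and weight, plus offset
        return out                        # output of the last layer

    rng = np.random.default_rng(0)
    layers = [(rng.standard_normal((64, 128)), rng.standard_normal(64)),
              (rng.standard_normal((20, 64)),  rng.standard_normal(20))]
    features = rng.standard_normal(128)   # extracted face features
    pred = forward(features, layers)      # 20 outputs, one per attribute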
In addition, since the face abstract map contains rich position and texture information of the face, which facilitates fine-grained recognition of face attributes, the face processing map to be predicted preferably includes a face abstract map to be predicted; when only one face processing map to be predicted is used, it may be the face abstract map to be predicted.
It should be noted that the above description again assumes only two neural sub-network models. It may be understood that in other embodiments of the present invention more neural sub-network models may be included, for example a third neural sub-network model. In that case the face processing map to be predicted may include a first face processing map to be predicted and a second face processing map to be predicted, and inputting the face picture to be predicted and the face processing maps to be predicted into the preset multichannel neural network model to obtain the face attribute predicted values of the face picture to be predicted may include: inputting the face picture to be predicted into the first neural sub-network model to obtain first face feature data, where the first face feature data includes the predicted values of multiple face attributes of the face picture to be predicted; and inputting the first and second face processing maps to be predicted into the second and third neural sub-network models respectively to obtain second face feature data.
Specifically, inputting the first and second face processing maps to be predicted into the second and third neural sub-network models respectively to obtain the second face feature data may further include: inputting the first face processing map to be predicted into the second neural sub-network model to obtain third face feature data, which includes the predicted values of multiple face attributes of the first face processing map to be predicted; and inputting the second face processing map to be predicted into the third neural sub-network model to obtain fourth face feature data, which includes the predicted values of multiple face attributes of the second face processing map to be predicted. The second face feature data consists of the third face feature data and the fourth face feature data.
In this case, the step of splicing the first face feature data and the second face feature data to obtain face splicing data includes: splicing the first face feature data, the third face feature data and the fourth face feature data to obtain the face splicing data. The specific implementation may refer to the case in which the multichannel neural network includes only two neural sub-network models, and is not repeated here.
104. And predicting the face attribute of the face picture to be predicted according to the face attribute predicted value of the face picture to be predicted.
Because face attributes are of different types, the attribute values used to indicate them in the embodiments of the present invention can also take different forms: numerical form, vector form, or other data forms such as arrays. For example, the face attributes of the face picture to be predicted may include gender, whether glasses are worn, whether a mask is worn, and the like. In practical applications the attributes may be represented numerically: gender may take two values, 0 for male and 1 for female; whether glasses are worn may take two values, 0 for no (glasses not worn) and 1 for yes (glasses worn); and whether a mask is worn may likewise take two values, 0 for no (mask not worn) and 1 for yes (mask worn). For example, if the attributes of two face pictures to be predicted are female, wearing glasses, not wearing a mask, and male, not wearing glasses, not wearing a mask, then the face attribute true values of the two pictures are 1, 1, 0 and 0, 0, 0 respectively. In the embodiments of the present invention, the attribute values of some face attributes may be represented numerically, such as the age attribute, while others may be represented as vectors, such as the gender attribute and the expression attribute.
In the embodiments of the present invention, once the face attribute predicted values of the face picture to be predicted are determined, the face attributes of the face picture to be predicted can be predicted from them. For example, suppose the face attribute predicted values are 1, 1 and 0, representing gender, whether glasses are worn and whether a mask is worn respectively, with 0 for male and 1 for female, 0 for glasses not worn and 1 for glasses worn, and 0 for mask not worn and 1 for mask worn. Then, according to the face attribute predicted values, the face in the face picture to be predicted is predicted to be female, wearing glasses and not wearing a mask, as the sketch below illustrates.
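For example, decoding the predicted values 1, 1, 0 back into attribute labels can be as simple as a lookup; the attribute names and their ordering below follow the example above and are not fixed by the method.

    # Encodings as given in the example above.
    ATTRS = [("gender",  {0: "male", 1: "female"}),
             ("glasses", {0: "not wearing glasses", 1: "wearing glasses"}),
             ("mask",    {0: "not wearing a mask", 1: "wearing a mask"})]

    def decode(pred_values):
        return {name: mapping[v] for (name, mapping), v in zip(ATTRS, pred_values)}

    print(decode([1, 1, 0]))
    # {'gender': 'female', 'glasses': 'wearing glasses', 'mask': 'not wearing a mask'}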
It should be noted that the above examples use only three face attributes and their corresponding face attribute predicted values; it is understood that in practical applications there may be more or fewer face attributes and predicted values, which is not limited here.
The embodiment of the present invention obtains a face picture to be predicted; performs preset processing on the face picture to be predicted to obtain a face processing map to be predicted; performs face attribute calculation according to the face picture to be predicted and the face processing map to be predicted to obtain the face attribute predicted values of the face picture to be predicted; and predicts the face attributes of the face picture to be predicted according to those predicted values. Because the face attributes are predicted from both the face picture to be predicted and the corresponding face processing map, and the face processing map complements the face picture, interference from the background on face attribute prediction is reduced, and the accuracy and robustness of face attribute prediction are improved.
The face attribute prediction method in the embodiment of the invention is described below with reference to a specific application scenario.
Referring to fig. 4, fig. 4 is another flow chart of a face attribute prediction method according to an embodiment of the present invention, where the method flow may include:
201. The server obtains a face picture A to be predicted;
Obtaining the face picture A from an original picture X may include: performing face detection on the original picture X to determine a face region; determining a preset number of face key points in the face region (for example, key points of the face contour, eyes, eyebrows, lips and nose contour); and performing face correction on the original picture according to the key points. For example, if the face key points include the eyebrow key points on both sides, those two key points can be connected into a straight line; if the line is tilted relative to the upper and lower boundaries of the picture, the original picture is rotated so that the line becomes parallel to those boundaries, yielding a face correction picture Y. The face correction picture Y is then resized to 50 x 50 to obtain the face picture A to be predicted. A sketch of this correction step follows.
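A hedged sketch of the correction-and-resize step using OpenCV. Key-point detection is assumed to have been done already, and using the two key points on either side (eyebrow ends in the example, eyes would work the same way) to define the reference line is the example's choice.

    import cv2
    import numpy as np

    def align_face(img, left_pt, right_pt, size=50):
        (x1, y1), (x2, y2) = left_pt, right_pt
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))   # tilt of the key-point line
        center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
        M = cv2.getRotationMatrix2D(center, angle, 1.0)    # rotate about the midpoint
        h, w = img.shape[:2]
        corrected = cv2.warpAffine(img, M, (w, h))         # face correction picture Y
        return cv2.resize(corrected, (size, size))         # face picture A, 50 x 50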
202. The server performs abstract processing on the face picture A to be predicted to obtain a face abstract map B to be predicted;
The face abstract map contains rich position and texture information of the face, which facilitates fine-grained recognition of face attributes.
In this embodiment, the face picture A to be predicted may be abstracted using a preset abstract-map generation model to obtain the face abstract map B to be predicted. The abstract-map generation model can comprise an encoder and a decoder. The encoder first processes the face picture A to be predicted to obtain high-level features; specifically, the encoder applies successive convolution operations that map the face picture A into a high-level feature space. The decoder then restores the high-level features produced by the encoder to generate the face abstract map B to be predicted; specifically, the decoder applies successive deconvolution operations that progressively enlarge the length and width of the high-level features, finally outputting a face abstract map B of the same size as the face picture A to be predicted. A minimal sketch follows.
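A minimal encoder-decoder sketch of such an abstract-map generation model. The layer counts, channel widths and activations are assumptions; only the convolution/deconvolution structure and the matching 50 x 50 input and output sizes follow the description.

    import torch
    import torch.nn as nn

    class AbstractMapGenerator(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(                  # successive convolutions
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 50 -> 25
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())  # 25 -> 13
            self.decoder = nn.Sequential(                  # successive deconvolutions
                nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1), nn.ReLU(),    # 13 -> 25
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())  # 25 -> 50

        def forward(self, x):
            return self.decoder(self.encoder(x))           # same size as the input

    gen = AbstractMapGenerator()
    print(gen(torch.randn(1, 3, 50, 50)).shape)            # torch.Size([1, 3, 50, 50])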
203. The server inputs the face picture A to be predicted and the face abstract map B to be predicted into the multichannel neural network model respectively to obtain the face attribute predicted values of the face picture A to be predicted;
In this embodiment, the face picture A to be predicted and the face abstract map B to be predicted are respectively input into the pre-trained multichannel neural network model to obtain the face attribute predicted values of the face picture A to be predicted. For example, the face attribute predicted values of the face picture A to be predicted cover three attributes and are specifically 1, 1 and 0.
204. And the server predicts the face attribute of the face picture A to be predicted according to the face attribute predicted value of the face picture A to be predicted.
For example, suppose the face attribute predicted values of the face picture A to be predicted are 1, 1 and 0, and that the face attributes represented are gender, whether glasses are worn and whether a mask is worn respectively, with 0 representing male and 1 representing female; 0 representing no (glasses not worn) and 1 representing yes (glasses worn); and 0 representing no (mask not worn) and 1 representing yes (mask worn). Then, according to the face attribute predicted values of the face picture A to be predicted, the face attributes of the face picture A to be predicted are predicted to be female, wearing glasses and not wearing a mask.
According to the embodiment of the present invention, the face attributes are predicted by inputting the face picture A to be predicted and the face abstract map B to be predicted into the preset multichannel neural network model. Because the face picture A to be predicted and the face abstract map B to be predicted complement each other, the accuracy and robustness of face attribute prediction are improved when the multichannel neural network model predicts the face attributes of the face picture A to be predicted.
To facilitate better implementation of the face attribute prediction method provided by the embodiments of the present invention, an embodiment of the present invention further provides an apparatus based on the face attribute prediction method. The terms used below have the same meanings as in the face attribute prediction method above, and implementation details may be found in the description of the method embodiments.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a face attribute prediction apparatus according to an embodiment of the present invention, where the face attribute prediction apparatus may include an obtaining unit 501, a picture processing unit 502, a calculating unit 503, and a prediction unit 504, and specifically includes:
an obtaining unit 501, configured to obtain a face picture to be predicted;
the image processing unit 502 is configured to perform preset processing on a face image to be predicted to obtain a face processing image to be predicted;
a calculating unit 503, configured to perform face attribute calculation according to the face image to be predicted and the face processing image to be predicted, so as to obtain a face attribute predicted value of the face image to be predicted;
the predicting unit 504 is configured to predict a face attribute of the face picture to be predicted according to the face attribute predicting value of the face picture to be predicted.
Further, the calculating unit 503 includes an extraction subunit, a splicing subunit and a transformation subunit, which are specifically as follows:
The extraction subunit is used for extracting the characteristics of the face picture to be predicted and the face processing picture to be predicted respectively so as to acquire a plurality of face characteristic data;
the splicing subunit is used for splicing the face characteristic data to acquire face splicing data;
and the transformation subunit is used for performing a linear transformation on the face splicing data to obtain the face attribute predicted values of the face picture to be predicted.
Further, the extraction subunit is specifically configured to:
extracting face features of a face picture to be predicted to obtain first face feature data, wherein the first face feature data comprises a face attribute predicted value of the face picture to be predicted;
extracting the face characteristics of the face processing diagram to be predicted to obtain second face characteristic data, wherein the second face characteristic data comprises a face attribute predicted value of the face processing diagram to be predicted.
Further, the splicing subunit is specifically configured to:
and splicing the first face characteristic data and the second face characteristic data to obtain face splicing data.
Further, the splicing subunit is specifically configured to:
normalizing the first face feature data and the second face feature data so that the first face feature data and the second face feature data are in the same order of magnitude;
And splicing the first face characteristic data and the second face characteristic data which are in the same order of magnitude to obtain face splicing data.
Further, the calculating unit 503 is specifically configured to:
and respectively inputting the face picture to be predicted and the face processing picture to be predicted into a preset multichannel neural network model to obtain a face attribute predicted value of the face picture to be predicted.
Further, the device further comprises a training unit, wherein the training unit is specifically configured to:
before a face picture to be predicted and a face processing picture to be predicted are respectively input into a preset multichannel neural network model, acquiring a plurality of sample face pictures, and acquiring a face attribute true value corresponding to each sample face picture;
acquiring a sample face processing diagram corresponding to each sample face picture;
adding the sample face picture, the sample face processing picture and the face attribute true value corresponding to each sample face picture into a training sample data set;
training a preset multichannel neural network by using the training sample data set to obtain a multichannel neural network model.
Further, the face processing map to be predicted includes a face abstract map to be predicted, and the picture processing unit 502 is specifically configured to:
And carrying out abstract processing on the face picture to be predicted to obtain a face abstract picture to be predicted.
Further, the acquisition unit 501 includes:
the acquisition subunit is used for acquiring the original picture;
and the detection subunit is used for carrying out face detection on the original picture so as to carry out face correction processing on the original picture and obtain the face picture to be predicted.
Further, the detection subunit is specifically configured to:
performing face detection on the original picture to determine a face area;
determining a preset number of face key points in a face area;
carrying out face correction on the original picture according to the face key points to obtain a face correction chart;
and carrying out size adjustment on the face correction picture according to the preset size to obtain the face picture to be predicted.
According to the embodiment of the invention, the face picture to be predicted is obtained through the obtaining unit 501; the picture processing unit 502 performs preset processing on the face picture to be predicted to obtain a face processing picture to be predicted; the calculating unit 503 calculates the face attribute according to the face picture to be predicted and the face processing picture to be predicted, and obtains the face attribute predicted value of the face picture to be predicted; the prediction unit 504 predicts the face attribute of the face picture to be predicted according to the face attribute prediction value of the face picture to be predicted. In the embodiment of the invention, the face attribute is predicted by the face picture to be predicted and the corresponding face processing picture, and the face processing picture can be complemented with the face picture, so that the interference of the background on the face attribute prediction is reduced, and the accuracy and the robustness of the face attribute prediction are improved.
The embodiment of the invention also provides a server, as shown in fig. 6, which shows a schematic structural diagram of the server according to the embodiment of the invention, specifically:
the server may include a processor 601 having one or more processing cores, a memory 602 having one or more computer-readable storage media, a power supply 603, an input unit 604, and other components. Those skilled in the art will appreciate that the server structure shown in fig. 6 does not limit the server, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 601 is the control center of the server: it connects the various parts of the entire server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the server as a whole. Optionally, the processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application programs required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the server, and the like. In addition, the memory 602 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 601 with access to the memory 602.
The server further includes a power supply 603 for supplying power to the various components. Preferably, the power supply 603 can be logically connected to the processor 601 through a power management system, so that functions such as managing charging, discharging and power consumption are performed through the power management system. The power supply 603 may also include one or more of a direct-current or alternating-current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other arbitrary components.
The server may further comprise an input unit 604, which input unit 604 may be used for receiving input numerical or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the server may further include a display unit or the like, which is not described herein. In this embodiment, the processor 601 in the server loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 601 executes the application programs stored in the memory 602, so as to implement various functions as follows:
acquiring a face picture to be predicted; carrying out preset processing on the face picture to be predicted to obtain a face processing picture to be predicted; carrying out face attribute calculation according to the face picture to be predicted and the face processing picture to be predicted to obtain a face attribute predicted value of the face picture to be predicted; and predicting the face attribute of the face picture to be predicted according to the face attribute predicted value of the face picture to be predicted.
The foregoing embodiments each emphasize different aspects; for parts not detailed in one embodiment, reference may be made to the detailed description of the face attribute prediction method above, which is not repeated here.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the face attribute prediction methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
acquiring a face picture to be predicted; carrying out preset processing on the face picture to be predicted to obtain a face processing picture to be predicted; carrying out face attribute calculation according to the face picture to be predicted and the face processing picture to be predicted to obtain a face attribute predicted value of the face picture to be predicted; and predicting the face attribute of the face picture to be predicted according to the face attribute predicted value of the face picture to be predicted.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.
Since the instructions stored in the storage medium can execute the steps of any face attribute prediction method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method; see the foregoing embodiments for details, which are not repeated here.
The face attribute prediction method, apparatus and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention; the above descriptions are intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may vary the specific implementations and the application scope in light of the ideas of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (12)

1. A method for predicting a face attribute, the method comprising:
acquiring a face picture to be predicted;
convolving the face picture to be predicted by adopting an encoder in an abstract picture generation model to obtain high-level characteristics of the face picture to be predicted;
Deconvolution is carried out on the high-level features by adopting a decoder in the abstract map generation model to obtain a face abstract map to be predicted, and the face abstract map to be predicted is used as a face processing map to be predicted;
performing face attribute calculation according to the face picture to be predicted and the face processing map to be predicted to obtain a face attribute predicted value of the face picture to be predicted, including:
respectively extracting features of the face picture to be predicted and the face processing map to be predicted by adopting a preset multichannel network model, so as to obtain a plurality of pieces of face feature data;
splicing the face characteristic data to obtain face splicing data;
performing linear transformation on the face stitching data to obtain a face attribute predicted value of a face picture to be predicted;
and predicting the face attribute of the face picture to be predicted according to the face attribute predicted value of the face picture to be predicted, wherein the face attribute is used for representing the sex attribute and the face part attribute of the face in the face picture to be predicted.
2. The face attribute prediction method according to claim 1, wherein the extracting features of the face picture to be predicted and the face processing picture to be predicted to obtain a plurality of face feature data includes:
Extracting face characteristics of the face picture to be predicted to obtain first face characteristic data, wherein the first face characteristic data comprises a face attribute predicted value of the face picture to be predicted;
and extracting the face characteristics of the face processing diagram to be predicted so as to obtain second face characteristic data, wherein the second face characteristic data comprises a face attribute prediction value of the face processing diagram to be predicted.
3. The method according to claim 2, wherein the stitching the plurality of face feature data to obtain face stitching data includes:
and splicing the first face feature data and the second face feature data to acquire the face splicing data.
4. A face attribute prediction method according to claim 3, wherein the stitching the first face feature data and the second face feature data to obtain the face stitching data includes:
normalizing the first face feature data and the second face feature data so that the first face feature data and the second face feature data are in the same order of magnitude;
And splicing the first face characteristic data and the second face characteristic data which are in the same order of magnitude to obtain the face splicing data.
5. The face attribute prediction method according to claim 1, wherein the calculating the face attribute according to the face picture to be predicted and the face processing picture to be predicted to obtain the face attribute predicted value of the face picture to be predicted includes:
and respectively inputting the face picture to be predicted and the face processing picture to be predicted into a preset multichannel neural network model to obtain a face attribute predicted value of the face picture to be predicted.
6. The face attribute prediction method according to claim 5, wherein before the face picture to be predicted and the face processing picture to be predicted are input to a preset multi-channel neural network model, respectively, the method further comprises:
collecting a plurality of sample face pictures, and obtaining a face attribute true value corresponding to each sample face picture;
acquiring a sample face processing diagram corresponding to each sample face picture;
adding the sample face picture, the sample face processing picture and the face attribute true value corresponding to each sample face picture into a training sample data set;
And training a preset multichannel neural network by using the training sample data set to obtain the multichannel neural network model.
7. The method for predicting a face attribute according to claim 1, wherein the obtaining a face picture to be predicted includes:
acquiring an original picture;
and carrying out face detection on the original picture so as to carry out face correction processing on the original picture, thereby obtaining the face picture to be predicted.
8. The method for predicting a face attribute according to claim 7, wherein the performing face detection on the original picture to perform face correction processing on the original picture to obtain the face picture to be predicted includes:
performing face detection on the original picture to determine a face area;
determining a preset number of face key points in the face area;
carrying out face correction on the original picture according to the face key points to obtain a face correction chart;
and carrying out size adjustment on the face correction picture according to a preset size to obtain the face picture to be predicted.
9. A face attribute prediction apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the face picture to be predicted;
The picture processing unit is used for convoluting the face picture to be predicted by adopting an encoder in the abstract picture generation model to obtain high-level characteristics of the face picture to be predicted; deconvolution is carried out on the high-level features by adopting a decoder in the abstract map generation model to obtain a face abstract map to be predicted, and the face abstract map to be predicted is used as a face processing map to be predicted;
the computing unit is used for carrying out face attribute computation according to the face picture to be predicted and the face processing picture to be predicted to obtain a face attribute predicted value of the face picture to be predicted;
wherein the computing unit includes:
the extraction subunit is used for respectively extracting features of the face picture to be predicted and the face processing map to be predicted by adopting a preset multichannel network model, so as to acquire a plurality of pieces of face feature data;
the splicing subunit is used for splicing the face characteristic data to acquire face splicing data;
the transformation subunit is used for carrying out linear transformation on the face splicing data to obtain a face attribute predicted value of the face picture to be predicted;
the predicting unit is used for predicting the face attribute of the face picture to be predicted according to the face attribute predicting value of the face picture to be predicted, wherein the face attribute is used for representing the sex attribute and the face part attribute of the face in the face picture to be predicted.
10. The face attribute prediction apparatus according to claim 9, wherein the extraction subunit is specifically configured to:
extracting face characteristics of the face picture to be predicted to obtain first face characteristic data, wherein the first face characteristic data comprises a face attribute predicted value of the face picture to be predicted;
and extracting the face characteristics of the face processing diagram to be predicted so as to obtain second face characteristic data, wherein the second face characteristic data comprises a face attribute prediction value of the face processing diagram to be predicted.
11. The face attribute prediction apparatus according to claim 9, wherein the computing unit is specifically configured to:
and respectively inputting the face picture to be predicted and the face processing picture to be predicted into a preset multichannel neural network model to obtain a face attribute predicted value of the face picture to be predicted.
12. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the face attribute prediction method of any one of claims 1 to 8.
CN201810787870.6A 2018-07-18 2018-07-18 Face attribute prediction method, device and storage medium Active CN110163049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810787870.6A CN110163049B (en) 2018-07-18 2018-07-18 Face attribute prediction method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110163049A CN110163049A (en) 2019-08-23
CN110163049B true CN110163049B (en) 2023-08-29

Family

ID=67645069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810787870.6A Active CN110163049B (en) 2018-07-18 2018-07-18 Face attribute prediction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110163049B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299B (en) * 2020-09-23 2024-02-09 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN113171214B (en) * 2021-05-27 2023-10-24 山东大学 Multi-path feedback myoelectric control prosthetic hand based on self-adaptive enhancement classifier and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484658A (en) * 2014-12-30 2015-04-01 中科创达软件股份有限公司 Face gender recognition method and device based on multi-channel convolution neural network
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
CN106529447A (en) * 2016-11-03 2017-03-22 河北工业大学 Small-sample face recognition method
CN106897675A (en) * 2017-01-24 2017-06-27 上海交通大学 The human face in-vivo detection method that binocular vision depth characteristic is combined with appearance features

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874921B (en) * 2015-12-11 2020-12-04 清华大学 Image classification method and device

Also Published As

Publication number Publication date
CN110163049A (en) 2019-08-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant