CN111401294A - Multitask face attribute classification method and system based on self-adaptive feature fusion - Google Patents

Multitask face attribute classification method and system based on self-adaptive feature fusion

Info

Publication number
CN111401294A
CN111401294A (application number CN202010228805.7A)
Authority
CN
China
Prior art keywords
fusion
face
feature fusion
adaptive feature
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010228805.7A
Other languages
Chinese (zh)
Other versions
CN111401294B (en)
Inventor
崔超然
申朕
黄瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Finance and Economics
Priority to CN202010228805.7A
Publication of CN111401294A
Application granted
Publication of CN111401294B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G06V40/172 (Human faces): Classification, e.g. identification
    • G06F18/2415 (Pattern recognition): Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/253 (Pattern recognition): Fusion techniques of extracted features
    • G06N3/045 (Neural networks): Combinations of networks
    • G06N3/047 (Neural networks): Probabilistic or stochastic networks
    • G06N3/08 (Neural networks): Learning methods
    • G06V40/168 (Human faces): Feature extraction; Face representation
    • G06V40/178 (Human faces): Estimating age from face image; using age information for improving recognition
    • G06V40/179 (Human faces): Metadata assisted face recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multitask face attribute classification method and system based on self-adaptive feature fusion, wherein the method comprises the following steps: acquiring a face image to be classified; carrying out a preprocessing operation on the face image to be classified; and inputting the preprocessed face image to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of each class on every face attribute of the image, and selecting the class with the maximum probability as the classification result on the corresponding attribute. The method constructs a self-adaptive feature fusion layer and connects the network branches of different tasks to form a unified multi-task deep convolutional neural network, so that information can be effectively shared among different tasks and classification accuracy is significantly improved.

Description

Multitask face attribute classification method and system based on self-adaptive feature fusion
Technical Field
The disclosure relates to the technical field of computer vision and machine learning, in particular to a multitask face attribute classification method and system based on adaptive feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, deep convolutional neural networks have achieved breakthroughs in many computer vision tasks, such as object detection, semantic segmentation, and depth prediction. A multi-task deep convolutional neural network aims to handle several related tasks jointly, improving learning efficiency while also improving prediction accuracy and generalization performance through feature interaction between tasks, which helps prevent overfitting.
When a multitask deep convolutional neural network is implemented, the most common scheme is to construct a network architecture based on hard parameter sharing. In this scheme, different tasks share the lower network layers and maintain respective branches in the higher network layers. Before training, the shared network layers need to be specified manually based on experience. This approach lacks theoretical guidance, and an unreasonable choice of the shared network layers may severely degrade the performance of the method.
In view of this, many researchers have proposed automatically building the shared network layers by learning optimal feature combinations for different tasks on a single network layer, thereby avoiding the complex enumeration and repeated model training required by hard parameter sharing.
For example, in the Cross Stitch method (see Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3994-4003, 2016), cross-stitch units are inserted between the task-specific networks to learn, at each layer, linear combinations of the feature maps of the different tasks, so that the degree of feature sharing is determined by the learned combination weights.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
although the above works have been demonstrated in experiments to achieve better performance, they are essentially all learning to construct a fixed feature fusion strategy. After training is complete, all input samples correspond to the same set of feature fusion weights. And the characteristics of the image cannot be well expressed by the features after feature fusion.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the present disclosure provides a multitask face attribute classification method and system based on self-adaptive feature fusion. In multi-task face attribute classification, for some samples the features to be fused between tasks may be very similar, while for other samples the features may be very different or even complementary to each other. Therefore, when performing feature fusion for multi-task learning, the characteristics of the features to be fused themselves should be fully considered. Based on this, the present disclosure introduces a dynamic feature fusion mechanism when designing the multitask deep convolutional neural network, and adaptively fuses features according to the dependency relationship between the features to realize the sharing and interaction of features between tasks.
In a first aspect, the present disclosure provides a multitask face attribute classification method based on adaptive feature fusion;
the multitask face attribute classification method based on the self-adaptive feature fusion comprises the following steps:
acquiring a face image to be classified;
carrying out preprocessing operation on the face image to be classified;
inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
In a second aspect, the present disclosure provides a multitask face attribute classification system based on adaptive feature fusion;
a multitask face attribute classification system based on self-adaptive feature fusion comprises the following steps:
an acquisition module configured to: acquiring a face image to be classified;
a pre-processing module configured to: carrying out preprocessing operation on the face image to be classified;
a classification module configured to: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein when the computer instructions are executed by the processor, the method of the first aspect is performed.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the beneficial effect of this disclosure is:
the method and the device take the relation among different task feature maps in the multitask deep convolutional neural network into consideration, namely, when feature fusion is carried out, the degree of sharing or retaining feature information is determined according to the characteristics of the feature maps.
When the method is realized, a self-adaptive feature fusion layer is constructed, network branches of different tasks are connected to form a uniform multi-task deep convolution neural network, so that information can be effectively shared among the different tasks, and the classification accuracy effect is improved remarkably.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a flowchart of a deep multitask learning method based on adaptive feature fusion according to a first embodiment of the present disclosure.
FIG. 2 is a schematic diagram of connecting the network branches of two tasks by using adaptive feature fusion layers to form a unified multitask deep convolutional neural network according to the first embodiment of the present disclosure;
FIG. 3 is a schematic view of the internal connection of a feature fusion layer according to the first embodiment of the disclosure;
fig. 4 is a schematic diagram of an internal connection relationship of a channel level fusion module according to a first embodiment of the present disclosure;
fig. 5 is a schematic diagram of a spatial hierarchy fusion module according to a first embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Embodiment I provides a multitask face attribute classification method based on self-adaptive feature fusion;
as shown in fig. 1, the multi-task face attribute classification method based on adaptive feature fusion includes:
s1: acquiring a face image to be classified;
s2: carrying out preprocessing operation on the face image to be classified;
s3: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
As one or more embodiments, the preprocessing operation specifically includes:
first, all images are scaled to 224 × 224 pixels;
and then, the pixel average value of the training-set images is calculated, and this average value is subtracted from each face image to be classified as a normalization operation.
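As an illustrative, non-limiting sketch, this preprocessing step may be written as follows; OpenCV and NumPy are assumed here, and the name train_mean is an assumed placeholder for the pixel average computed over the training-set images.

```python
import cv2
import numpy as np

def preprocess_face(image, train_mean):
    """Scale a face image to 224x224 pixels and subtract the training-set pixel mean."""
    resized = cv2.resize(image, (224, 224)).astype(np.float32)
    return resized - train_mean  # normalization by mean subtraction
```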
As one or more embodiments, the obtaining of the multitask face attribute classification model based on the adaptive feature fusion includes:
constructing a multitask neural network model based on self-adaptive feature fusion;
constructing a training set, wherein the training set comprises: the method comprises the following steps of (1) obtaining a plurality of face images, wherein each face image comprises at least two known attributes;
the preprocessing operation of the training-set images comprises: first, scaling all images to 224 × 224 pixels; then, calculating the pixel average value of the training-set images and subtracting this average value from each image as a normalization operation; and finally, before each round of training, carrying out horizontal flipping and Gaussian blur processing on the training images with a set probability (an illustrative sketch of this step is given below);
training a multi-task neural network model based on adaptive feature fusion by using the image after the preprocessing operation to obtain a trained multi-task neural network model based on adaptive feature fusion; namely, the multi-task human face attribute classification model based on the self-adaptive feature fusion.
The beneficial effects of the above technical scheme are: through the preprocessing step, the number of the training samples can be effectively expanded, and the diversity of the training samples is improved.
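A minimal sketch of this training-time preprocessing, assuming torchvision transforms; the flip and blur probabilities, the blur kernel size, and the per-channel mean shown here are illustrative placeholders rather than values fixed by this disclosure.

```python
import torchvision.transforms as T

train_mean = [0.5, 0.5, 0.5]  # placeholder; in practice the mean is computed from the training set

train_transform = T.Compose([
    T.Resize((224, 224)),                                    # scale to 224 x 224 pixels
    T.RandomHorizontalFlip(p=0.5),                           # horizontal flip with a set probability
    T.RandomApply([T.GaussianBlur(kernel_size=3)], p=0.3),   # Gaussian blur with a set probability
    T.ToTensor(),
    T.Normalize(mean=train_mean, std=[1.0, 1.0, 1.0]),       # subtract the mean only
])
```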
It is to be understood that the known attributes include one or more of, for example: age, gender, and expression.
It should be appreciated that in the present embodiment, the Adience dataset is selected to perform the age classification and gender classification tasks on face images simultaneously. In the Adience dataset, the age classification task is divided into eight categories: 0-2, 4-6, 8-12, 15-20, 25-32, 38-43, 48-53, and 60+; gender classification contains the male and female categories;
using the cross-entropy loss function, the loss on age classification is defined as L_age and the loss on gender classification is defined as L_sex; the total loss function is then L = λL_age + L_sex, where λ is a hyperparameter that balances the two kinds of losses of the model. Considering that gender classification is a two-class problem while age classification is a multi-class problem, the value of λ is set to 1/2. The network is trained with a stochastic gradient descent algorithm to determine the network weights that minimize the loss function;
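As an illustrative, non-limiting PyTorch sketch of this joint loss and the stochastic gradient descent update: the model is assumed to return (age_logits, gender_logits) as in the network sketched further below, and the learning rate, momentum, and data-loader name are illustrative assumptions rather than values fixed by this disclosure.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
lam = 0.5  # hyperparameter lambda balancing the two losses; set to 1/2 in the text

# `model` is assumed to return (age_logits, gender_logits);
# `train_loader` is assumed to yield (images, age_labels, gender_labels) batches.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for images, age_labels, gender_labels in train_loader:
    age_logits, gender_logits = model(images)
    loss = lam * criterion(age_logits, age_labels) + criterion(gender_logits, gender_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```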
as one or more embodiments, the adaptive feature fusion based multitasking neural network model comprises:
two network branches in parallel: a first network branch and a second network branch;
a first network branch comprising: the system comprises a convolution layer group A1, a convolution layer group A2, a convolution layer group A3, a convolution layer group A4, a convolution layer group A5, a full connection layer A6 and a softmax layer A7 which are connected in sequence;
a second network branch comprising: a convolution layer group B1, a convolution layer group B2, a convolution layer group B3, a convolution layer group B4, a convolution layer group B5, a full connection layer B6 and a Softmax layer B7 which are connected in sequence;
and the convolution layer groups corresponding to the first network branch and the second network branch are connected through four self-adaptive feature fusion layers.
Further, the convolution layer groups corresponding to the first network branch and the second network branch are connected by four adaptive feature fusion layers, which specifically includes:
the output end of the convolution layer group A1 and the output end of the convolution layer group B1 are both connected with the input end of the first adaptive characteristic fusion layer;
the input end of the convolution layer group A2 and the input end of the convolution layer group B2 are both connected with the output end of the first adaptive characteristic fusion layer;
the output end of the convolution layer group A2 and the output end of the convolution layer group B2 are both connected with the input end of the second adaptive characteristic fusion layer;
the input end of the convolution layer group A3 and the input end of the convolution layer group B3 are both connected with the output end of the second adaptive characteristic fusion layer;
the output end of the convolution layer group A3 and the output end of the convolution layer group B3 are both connected with the input end of the third adaptive characteristic fusion layer;
the input end of the convolution layer group A4 and the input end of the convolution layer group B4 are both connected with the output end of the third adaptive characteristic fusion layer;
the output end of the convolution layer group A4 and the output end of the convolution layer group B4 are both connected with the input end of the fourth adaptive characteristic fusion layer;
an input of the convolution layer group a5 and an input of the convolution layer group B5 are both connected to an output of the fourth adaptive feature fusion layer.
It should be understood that the working principle of the above multitask neural network model based on adaptive feature fusion is as follows:
the first network branch and the second network branch receive the same input image, the first network branch is responsible for classifying the age of the face in the input image, the second network branch is responsible for classifying the gender of the face in the input image, and the output of the network branches represents the probability that the input image belongs to each category on the corresponding attribute;
the first network branch and the second network branch are identical in structure and are based on the ResNet101 network structure (see Kaiming He, Xiangyu Zhuang, Shaoqingren, and Jian Sun. deep residual learning for image Recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016). Each network branch consists of five convolutional layer groups, one fully-connected layer and one softmax layer. Wherein each convolution layer group comprises a plurality of continuous convolution layers and a maximum pooling layer.
The first adaptive feature fusion layer, the second adaptive feature fusion layer, the third adaptive feature fusion layer and the fourth adaptive feature fusion layer are respectively introduced to connect the corresponding convolutional layer groups of the first network branch and the second network branch, thereby realizing feature interaction between the two tasks and constructing a unified multi-task deep convolutional neural network, the structure of which is shown in FIG. 2.
Further, the fully-connected layer A6 of the first network branch performs a nonlinear transformation on the input feature map and maps it into a column vector; the dimension of the column vector is equal to the number of categories on the age attribute, and each dimension corresponds to a specific age category.
Further, the fully-connected layer B6 of the second network branch performs a nonlinear transformation on the input feature map and maps it into a column vector; the dimension of the column vector is equal to the number of categories on the gender attribute, and each dimension corresponds to a specific gender category.
Further, the Softmax layer A7 of the first network branch converts each dimension of the input vector into a probability value representing the probability of the input image on each category of the age attribute.
Further, the Softmax layer B7 of the second network branch converts each dimension of the input vector into a probability value representing the probability of the input image on each category of the gender attribute. An illustrative sketch of the overall topology follows.
As one or more embodiments, the first adaptive feature fusion layer, the second adaptive feature fusion layer, the third adaptive feature fusion layer, and the fourth adaptive feature fusion layer are identical in structure.
As one or more embodiments, as shown in fig. 3, the first adaptive feature fusion layer includes:
the system comprises a channel level fusion module and a space level fusion module which are sequentially connected, wherein the input end of the channel level fusion module is the input end of the current adaptive feature fusion layer; and the output end of the spatial hierarchy fusion module is the output end of the current adaptive feature fusion layer.
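As an illustrative, non-limiting sketch, the adaptive feature fusion layer can be expressed as the sequential composition of the two modules; ChannelLevelFusion and SpatialLevelFusion are the sketches given after the respective module descriptions below, and the constructor arguments are illustrative assumptions.

```python
import torch.nn as nn

class AdaptiveFusionLayer(nn.Module):
    """Adaptive feature fusion layer: channel-level fusion followed by spatial-level fusion."""
    def __init__(self, channels, spatial_size):
        super().__init__()
        self.channel = ChannelLevelFusion(channels)      # sketched under the channel-level module
        self.spatial = SpatialLevelFusion(spatial_size)  # sketched under the spatial-level module

    def forward(self, xa, xb):
        xa, xb = self.channel(xa, xb)   # fuse per channel
        return self.spatial(xa, xb)     # then fuse per spatial position
```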
As one or more embodiments, the channel hierarchy fusion module includes:
the first average pooling layer and the second average pooling layer are parallel;
the output ends of the first average pooling layer and the second average pooling layer are connected with the series unit;
the series unit is connected with the first full connection layer, and the first full connection layer is connected with the second full connection layer;
the second full-connection layer is respectively connected with the third full-connection layer and the fourth full-connection layer;
the third full connection layer is connected with the first Softmax function layer;
the fourth full connection layer is connected with the second Softmax function layer;
the first Softmax function layer is respectively connected with the first multiplier and the second multiplier;
the second Softmax function layer is connected with the third multiplier and the fourth multiplier respectively;
the first multiplier and the second multiplier are both connected with the first adder;
the third multiplier and the fourth multiplier are both connected with the second adder.
As one or more embodiments, as shown in fig. 4, the channel level fusion module operates according to the following principle:
Firstly, in the channel level fusion module, the original feature maps x_A and x_B of the two network branches are respectively subjected to average pooling along the channel dimension to obtain the pooled vectors v_A and v_B, and v_A and v_B are concatenated together.
Then, the concatenated result is subjected to dimensionality reduction through the first fully-connected layer and the second fully-connected layer respectively, obtaining two guide vectors g_A and g_B.
Passing g_A through the third fully-connected layer yields the fusion weight vectors w_{A→A} and w_{B→A} corresponding to x_A and x_B respectively; passing g_B through the fourth fully-connected layer yields the fusion weight vectors w_{A→B} and w_{B→B} corresponding to x_A and x_B respectively. The dimensions of w_{A→A} and w_{A→B} are equal to the number of channels of the original feature map x_A, and the dimensions of w_{B→A} and w_{B→B} are equal to the number of channels of the original feature map x_B.
A Softmax operation is applied to w_{A→A} and w_{B→A} over each pair of corresponding position elements, so that w_{A→A} + w_{B→A} = 1; a Softmax operation is likewise applied to w_{A→B} and w_{B→B}, so that w_{A→B} + w_{B→B} = 1.
Finally, the original feature maps are multiplied by the fusion weight vectors and added, respectively obtaining the fused feature maps x̂_A = w_{A→A} ⊙ x_A + w_{B→A} ⊙ x_B and x̂_B = w_{A→B} ⊙ x_A + w_{B→B} ⊙ x_B, where ⊙ denotes channel-wise multiplication. x̂_A and x̂_B are then input to the spatial hierarchy fusion module.
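As an illustrative, non-limiting PyTorch sketch of the channel level fusion module described above: the pooling step is interpreted here as a global average over spatial positions (one value per channel), which is an assumption consistent with the per-channel fusion weights, and the reduction ratio of the guide-vector layers is likewise an illustrative choice.

```python
import torch
import torch.nn as nn

class ChannelLevelFusion(nn.Module):
    """Channel-level fusion: learns per-channel fusion weights for the two branch feature maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 8)            # illustrative size of the guide vectors
        self.fc1 = nn.Linear(2 * channels, hidden)        # first FC layer  -> guide vector g_A
        self.fc2 = nn.Linear(2 * channels, hidden)        # second FC layer -> guide vector g_B
        self.fc3 = nn.Linear(hidden, 2 * channels)        # third FC layer  -> weights (w_A->A, w_B->A)
        self.fc4 = nn.Linear(hidden, 2 * channels)        # fourth FC layer -> weights (w_A->B, w_B->B)

    def forward(self, xa, xb):
        n, c, _, _ = xa.shape
        va = xa.mean(dim=(2, 3))                          # pooled descriptor of x_A, shape (N, C)
        vb = xb.mean(dim=(2, 3))                          # pooled descriptor of x_B, shape (N, C)
        cat = torch.cat([va, vb], dim=1)                  # concatenated descriptors, shape (N, 2C)
        ga, gb = self.fc1(cat), self.fc2(cat)             # guide vectors
        wa = torch.softmax(self.fc3(ga).view(n, 2, c), dim=1)  # pairwise softmax: w_A->A + w_B->A = 1
        wb = torch.softmax(self.fc4(gb).view(n, 2, c), dim=1)  # pairwise softmax: w_A->B + w_B->B = 1
        fused_a = wa[:, 0].view(n, c, 1, 1) * xa + wa[:, 1].view(n, c, 1, 1) * xb
        fused_b = wb[:, 0].view(n, c, 1, 1) * xa + wb[:, 1].view(n, c, 1, 1) * xb
        return fused_a, fused_b
```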
As one or more embodiments, the spatial hierarchy fusion module includes:
the third average pooling layer and the fourth average pooling layer are arranged in parallel;
the output ends of the third average pooling layer and the fourth average pooling layer are connected with the stacking unit;
the stacking unit is connected with the first convolution layer and the second convolution layer respectively;
the first convolution layer is connected with the fifth full-connection layer, and the second convolution layer is connected with the sixth full-connection layer;
the fifth full connection layer is connected with the third Softmax function layer; the sixth fully connected layer is connected with the fourth Softmax function layer;
the third Softmax function layer is respectively connected with the fifth multiplier and the sixth multiplier;
the fourth Softmax function layer is connected with the seventh multiplier and the eighth multiplier respectively;
the fifth multiplier and the sixth multiplier are both connected with the third adder;
the seventh multiplier and the eighth multiplier are both connected with the fourth adder.
As one or more embodiments, as shown in fig. 5, the spatial hierarchy fusion module operates according to the following principle:
Firstly, in the spatial hierarchy fusion module, the input feature maps x̂_A and x̂_B are respectively subjected to average pooling along the spatial dimension to obtain the pooled maps p_A and p_B, and p_A and p_B are stacked together.
Then, the stacked result is passed through two convolutional layers respectively, each convolutional layer containing only one 1 × 1 convolution kernel, to obtain two guide matrices G_A and G_B.
G_A is vectorized and passed through a fully-connected layer to obtain the fusion weight vectors s_{A→A} and s_{B→A} corresponding to x̂_A and x̂_B respectively; G_B is vectorized and passed through a fully-connected layer to obtain the fusion weight vectors s_{A→B} and s_{B→B} corresponding to x̂_A and x̂_B respectively.
s_{A→A} and s_{B→A} are reshaped into matrices whose size equals the spatial size of the input feature map x̂_A; s_{A→B} and s_{B→B} are reshaped into matrices whose size equals the spatial size of the input feature map x̂_B.
A Softmax operation is applied to s_{A→A} and s_{B→A} over each pair of corresponding position elements, so that s_{A→A} + s_{B→A} = 1; a Softmax operation is likewise applied to s_{A→B} and s_{B→B}, so that s_{A→B} + s_{B→B} = 1.
Finally, the input feature maps are multiplied by the fusion weight matrices and added, respectively obtaining the output feature maps y_A = s_{A→A} ⊙ x̂_A + s_{B→A} ⊙ x̂_B and y_B = s_{A→B} ⊙ x̂_A + s_{B→B} ⊙ x̂_B, where ⊙ denotes element-wise multiplication over spatial positions. y_A and y_B are then fed into the next convolutional layer groups of the first network branch and the second network branch, respectively.
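As an illustrative, non-limiting PyTorch sketch of the spatial hierarchy fusion module described above: the pooling step is interpreted here as an average over channels (one value per spatial position), which is an assumption, and the fully-connected layers are tied to a fixed spatial size supplied at construction, consistent with 224 × 224 inputs.

```python
import torch
import torch.nn as nn

class SpatialLevelFusion(nn.Module):
    """Spatial-level fusion: learns per-position fusion weights for the two branch feature maps."""
    def __init__(self, spatial_size):
        super().__init__()
        hw = spatial_size * spatial_size
        self.conv_a = nn.Conv2d(2, 1, kernel_size=1)   # single 1x1 kernel -> guide matrix G_A
        self.conv_b = nn.Conv2d(2, 1, kernel_size=1)   # single 1x1 kernel -> guide matrix G_B
        self.fc_a = nn.Linear(hw, 2 * hw)              # weights (s_A->A, s_B->A), reshaped below
        self.fc_b = nn.Linear(hw, 2 * hw)              # weights (s_A->B, s_B->B), reshaped below

    def forward(self, xa, xb):
        n, _, h, w = xa.shape
        pa = xa.mean(dim=1, keepdim=True)              # pooled map of branch A, shape (N, 1, H, W)
        pb = xb.mean(dim=1, keepdim=True)              # pooled map of branch B, shape (N, 1, H, W)
        stacked = torch.cat([pa, pb], dim=1)           # stacked maps, shape (N, 2, H, W)
        ga = self.conv_a(stacked).flatten(1)           # vectorized guide matrix G_A, shape (N, H*W)
        gb = self.conv_b(stacked).flatten(1)           # vectorized guide matrix G_B, shape (N, H*W)
        sa = torch.softmax(self.fc_a(ga).view(n, 2, h, w), dim=1)  # pairwise: s_A->A + s_B->A = 1
        sb = torch.softmax(self.fc_b(gb).view(n, 2, h, w), dim=1)  # pairwise: s_A->B + s_B->B = 1
        out_a = sa[:, 0:1] * xa + sa[:, 1:2] * xb      # weights broadcast over channels
        out_b = sb[:, 0:1] * xa + sb[:, 1:2] * xb
        return out_a, out_b
```

With the network sketched earlier, a 224 × 224 input exercises these layers at spatial sizes 56, 56, 28 and 14.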
The method takes into consideration the relationship among the feature maps of different tasks in the multitask deep convolutional neural network; that is, when feature fusion is carried out, the degree to which feature information is shared or retained is determined according to the characteristics of the feature maps themselves, thereby realizing adaptive feature fusion.
The second embodiment provides a multitask face attribute classification system based on adaptive feature fusion;
a multitask face attribute classification system based on self-adaptive feature fusion comprises the following steps:
an acquisition module configured to: acquiring a face image to be classified;
a pre-processing module configured to: carrying out preprocessing operation on the face image to be classified;
a classification module configured to: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
In a third embodiment, the present embodiment further provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor, where the computer instructions, when executed by the processor, implement the method in the first embodiment.
In a fourth embodiment, the present embodiment further provides a computer-readable storage medium for storing computer instructions, and the computer instructions, when executed by a processor, implement the method of the first embodiment.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. The multitask face attribute classification method based on the self-adaptive feature fusion is characterized by comprising the following steps:
acquiring a face image to be classified;
carrying out preprocessing operation on the face image to be classified;
inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
2. The method of claim 1, wherein the preprocessing operation comprises:
first, all images are scaled to 224 × 224 pixels;
and then, calculating the pixel average value of the training set image, and subtracting the pixel average value from each face image to be classified to perform normalization operation.
3. The method of claim 1, wherein the obtaining of the multi-tasking face attribute classification model based on adaptive feature fusion comprises:
constructing a multitask neural network model based on self-adaptive feature fusion;
constructing a training set, wherein the training set comprises: the method comprises the following steps of (1) obtaining a plurality of face images, wherein each face image comprises at least two known attributes;
the preprocessing operation of the training-set images comprises: first, scaling all images to 224 × 224 pixels; then, calculating the pixel average value of the training-set images and subtracting this average value from each image as a normalization operation; and finally, before each round of training, carrying out horizontal flipping and Gaussian blur processing on the training images with a set probability;
training a multi-task neural network model based on adaptive feature fusion by using the image after the preprocessing operation to obtain a trained multi-task neural network model based on adaptive feature fusion; namely, the multi-task human face attribute classification model based on the self-adaptive feature fusion.
4. The method of claim 3, wherein the adaptive feature fusion based multitasking neural network model comprises:
two network branches in parallel: a first network branch and a second network branch;
a first network branch comprising: the system comprises a convolution layer group A1, a convolution layer group A2, a convolution layer group A3, a convolution layer group A4, a convolution layer group A5, a full connection layer A6 and a softmax layer A7 which are connected in sequence;
a second network branch comprising: a convolution layer group B1, a convolution layer group B2, a convolution layer group B3, a convolution layer group B4, a convolution layer group B5, a full connection layer B6 and a Softmax layer B7 which are connected in sequence;
and the convolution layer groups corresponding to the first network branch and the second network branch are connected through four self-adaptive feature fusion layers.
5. The method as set forth in claim 4, wherein,
the working principle of the multitask neural network model based on the self-adaptive feature fusion is as follows:
the first network branch and the second network branch receive the same input image, the first network branch is responsible for classifying the age of the face in the input image, the second network branch is responsible for classifying the gender of the face in the input image, and the output of the network branches represents the probability that the input image belongs to each category on the corresponding attribute;
the adaptive feature fusion layer comprises:
the system comprises a channel level fusion module and a space level fusion module which are sequentially connected, wherein the input end of the channel level fusion module is the input end of the current adaptive feature fusion layer; and the output end of the spatial hierarchy fusion module is the output end of the current adaptive feature fusion layer.
6. The method of claim 5, wherein the channel hierarchy fusion module operates on a principle comprising:
firstly, in the channel hierarchy fusion module, average pooling is respectively performed on the original feature maps x_A and x_B of the two network branches along the channel dimension to obtain pooled vectors v_A and v_B, and v_A and v_B are concatenated together;
then, dimensionality reduction is respectively performed on the concatenated result through a first fully-connected layer and a second fully-connected layer to obtain two guide vectors g_A and g_B;
g_A is passed through a third fully-connected layer to obtain fusion weight vectors w_{A→A} and w_{B→A} corresponding to x_A and x_B respectively, and g_B is passed through a fourth fully-connected layer to obtain fusion weight vectors w_{A→B} and w_{B→B} corresponding to x_A and x_B respectively; wherein the dimensions of w_{A→A} and w_{A→B} are equal to the number of channels of the original feature map x_A, and the dimensions of w_{B→A} and w_{B→B} are equal to the number of channels of the original feature map x_B;
a Softmax operation is performed on w_{A→A} and w_{B→A} over each pair of corresponding position elements, such that w_{A→A} + w_{B→A} = 1, and a Softmax operation is performed on w_{A→B} and w_{B→B} over each pair of corresponding position elements, such that w_{A→B} + w_{B→B} = 1;
finally, the original feature maps are multiplied by the fusion weight vectors and added to respectively obtain the fused feature maps x̂_A = w_{A→A} ⊙ x_A + w_{B→A} ⊙ x_B and x̂_B = w_{A→B} ⊙ x_A + w_{B→B} ⊙ x_B, where ⊙ denotes channel-wise multiplication; and x̂_A and x̂_B are input to the spatial hierarchy fusion module.
7. The method of claim 5, wherein the spatial hierarchy fusion module operates on a principle comprising:
firstly, in the spatial hierarchy fusion module, average pooling is respectively performed on the input feature maps x̂_A and x̂_B along the spatial dimension to obtain pooled maps p_A and p_B, and p_A and p_B are stacked together;
then, the stacked result is respectively passed through two convolutional layers, each convolutional layer containing only one 1 × 1 convolution kernel, to obtain two guide matrices G_A and G_B;
G_A is vectorized and passed through a fully-connected layer to obtain fusion weight vectors s_{A→A} and s_{B→A} corresponding to x̂_A and x̂_B respectively, and G_B is vectorized and passed through a fully-connected layer to obtain fusion weight vectors s_{A→B} and s_{B→B} corresponding to x̂_A and x̂_B respectively;
s_{A→A} and s_{B→A} are reshaped into matrices whose size is equal to the spatial size of the input feature map x̂_A, and s_{A→B} and s_{B→B} are reshaped into matrices whose size is equal to the spatial size of the input feature map x̂_B;
a Softmax operation is performed on s_{A→A} and s_{B→A} over each pair of corresponding position elements, such that s_{A→A} + s_{B→A} = 1, and a Softmax operation is performed on s_{A→B} and s_{B→B} over each pair of corresponding position elements, such that s_{A→B} + s_{B→B} = 1;
finally, the input feature maps are multiplied by the fusion weight matrices and added to respectively obtain the output feature maps y_A = s_{A→A} ⊙ x̂_A + s_{B→A} ⊙ x̂_B and y_B = s_{A→B} ⊙ x̂_A + s_{B→B} ⊙ x̂_B, where ⊙ denotes element-wise multiplication over spatial positions; and y_A and y_B are respectively input into the next convolutional layer groups of the first network branch and the second network branch.
8. The multitask face attribute classification system based on the self-adaptive feature fusion is characterized by comprising:
an acquisition module configured to: acquiring a face image to be classified;
a pre-processing module configured to: carrying out preprocessing operation on the face image to be classified;
a classification module configured to: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN202010228805.7A 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion Active CN111401294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228805.7A CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010228805.7A CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Publications (2)

Publication Number Publication Date
CN111401294A (en) 2020-07-10
CN111401294B CN111401294B (en) 2022-07-15

Family

ID=71432935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010228805.7A Active CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Country Status (1)

Country Link
CN (1) CN111401294B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832522A (en) * 2020-07-21 2020-10-27 深圳力维智联技术有限公司 Construction method and system of face data set and computer readable storage medium
CN112215157A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112651960A (en) * 2020-12-31 2021-04-13 上海联影智能医疗科技有限公司 Image processing method, device, equipment and storage medium
CN112784776A (en) * 2021-01-26 2021-05-11 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
CN107766850A (en) * 2017-11-30 2018-03-06 电子科技大学 Based on the face identification method for combining face character information
CN108615010A (en) * 2018-04-24 2018-10-02 重庆邮电大学 Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN109978074A (en) * 2019-04-04 2019-07-05 山东财经大学 Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning
CN110119689A (en) * 2019-04-18 2019-08-13 五邑大学 A kind of face beauty prediction technique based on multitask transfer learning
CN110197217A (en) * 2019-05-24 2019-09-03 中国矿业大学 It is a kind of to be interlocked the image classification method of fused packet convolutional network based on depth
CN110796239A (en) * 2019-10-30 2020-02-14 福州大学 Deep learning target detection method based on channel and space fusion perception

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SANGHYUN WOO et al.: "CBAM: Convolutional Block Attention Module", 《ARXIV:1807.06521V2》 *
YILONG CHEN et al.: "Channel-Unet: A Spatial Channel-Wise Convolutional Neural Network for Liver and Tumors Segmentation", 《FRONTIERS IN GENETICS》 *
曹洁 et al.: "Face recognition based on adaptive feature fusion" (基于自适应特征融合的人脸识别), 《计算机工程与应》 *
谢郑楠: "Facial landmark detection based on multi-task feature selection and adaptive models" (基于多任务特征选择和自适应模型的人脸特征点检测), 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832522A (en) * 2020-07-21 2020-10-27 深圳力维智联技术有限公司 Construction method and system of face data set and computer readable storage medium
CN111832522B (en) * 2020-07-21 2024-02-27 深圳力维智联技术有限公司 Face data set construction method, system and computer readable storage medium
CN112215157A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112215157B (en) * 2020-10-13 2021-05-25 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112651960A (en) * 2020-12-31 2021-04-13 上海联影智能医疗科技有限公司 Image processing method, device, equipment and storage medium
CN112784776A (en) * 2021-01-26 2021-05-11 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network

Also Published As

Publication number Publication date
CN111401294B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN111401294B (en) Multi-task face attribute classification method and system based on adaptive feature fusion
CN110175671B (en) Neural network construction method, image processing method and device
CN111767979B (en) Training method, image processing method and image processing device for neural network
Gao et al. Global second-order pooling convolutional networks
CN112613581B (en) Image recognition method, system, computer equipment and storage medium
CN112926641B (en) Three-stage feature fusion rotating machine fault diagnosis method based on multi-mode data
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN110717851A (en) Image processing method and device, neural network training method and storage medium
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN111950699A (en) Neural network regularization method based on characteristic space correlation
CN113298235A (en) Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN111079767A (en) Neural network model for segmenting image and image segmentation method thereof
CN112766458A (en) Double-current supervised depth Hash image retrieval method combining classification loss
CN114612761A (en) Network architecture searching method for image recognition
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN114846382A (en) Microscope and method with convolutional neural network implementation
CN114298289A (en) Data processing method, data processing equipment and storage medium
CN115330759B (en) Method and device for calculating distance loss based on Hausdorff distance
CN116246110A (en) Image classification method based on improved capsule network
CN116756391A (en) Unbalanced graph node neural network classification method based on graph data enhancement
CN113516580B (en) Method and device for improving neural network image processing efficiency and NPU
CN113688946B (en) Multi-label image recognition method based on spatial correlation
CN112529064B (en) Efficient real-time semantic segmentation method
CN112560824B (en) Facial expression recognition method based on multi-feature adaptive fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant