CN111401294A - Multitask face attribute classification method and system based on self-adaptive feature fusion - Google Patents

Multitask face attribute classification method and system based on self-adaptive feature fusion

Info

Publication number
CN111401294A
CN111401294A (application number CN202010228805.7A)
Authority
CN
China
Prior art keywords
fusion
face
feature fusion
adaptive feature
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010228805.7A
Other languages
Chinese (zh)
Other versions
CN111401294B (en)
Inventor
崔超然
申朕
黄瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Finance and Economics
Priority to CN202010228805.7A
Publication of CN111401294A
Application granted
Publication of CN111401294B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G06V40/172 (Human faces): Classification, e.g. identification
    • G06F18/2415 (Pattern recognition): Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/253 (Pattern recognition): Fusion techniques of extracted features
    • G06N3/045 (Neural networks): Combinations of networks
    • G06N3/047 (Neural networks): Probabilistic or stochastic networks
    • G06N3/08 (Neural networks): Learning methods
    • G06V40/168 (Human faces): Feature extraction; Face representation
    • G06V40/178 (Human faces): Estimating age from face image; using age information for improving recognition
    • G06V40/179 (Human faces): Metadata assisted face recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multitask face attribute classification method and system based on self-adaptive feature fusion, wherein the method comprises the following steps: acquiring a face image to be classified; carrying out a preprocessing operation on the face image to be classified; and inputting the preprocessed face image to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of each class on every face attribute of the image, and selecting the class with the maximum probability as the classification result on the corresponding attribute. The method constructs a self-adaptive feature fusion layer and connects the network branches of different tasks to form a unified multi-task deep convolutional neural network, so that information can be effectively shared among different tasks and classification accuracy is significantly improved.

Description

Multitask face attribute classification method and system based on self-adaptive feature fusion
Technical Field
The disclosure relates to the technical field of computer vision and machine learning, in particular to a multitask face attribute classification method and system based on adaptive feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, deep convolutional neural networks have achieved breakthroughs in many computer vision tasks, such as object detection, semantic segmentation, and depth prediction. A multi-task deep convolutional neural network aims to handle several related tasks jointly, improving learning efficiency while also improving prediction accuracy and generalization performance through feature interaction between tasks, which helps prevent overfitting.
When a multitask deep convolutional neural network is implemented, the most common scheme is to construct a network architecture based on hard parameter sharing. In this scheme, different tasks share the lower network layers and maintain respective branches in the higher network layers. Before training, the shared network layers need to be specified manually based on experience. This approach lacks theoretical guidance, and an unreasonable choice of the shared network layers may severely degrade the performance of the method.
In view of this, many researchers have proposed automatically building the shared network layers by learning optimal feature combinations for different tasks on a single network layer, thereby avoiding the complex enumeration and repeated model training required by hard parameter sharing.
For example, in the Cross Stitch method (see Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3994-4003, 2016), cross-stitch units are inserted between the task-specific networks to learn, at each layer, linear combinations of the feature maps of the different tasks, so that the degree of feature sharing is determined by the learned combination weights.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
although the above works have been demonstrated in experiments to achieve better performance, they are essentially all learning to construct a fixed feature fusion strategy. After training is complete, all input samples correspond to the same set of feature fusion weights. And the characteristics of the image cannot be well expressed by the features after feature fusion.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the present disclosure provides a multitask face attribute classification method and system based on self-adaptive feature fusion. In multi-task face attribute classification, for some samples the features to be fused between tasks may be very similar, while for other samples the features may be very different or even complementary to each other. Therefore, when performing feature fusion for multi-task learning, the characteristics of the features to be fused themselves should be fully considered. Based on this, the present disclosure introduces a dynamic feature fusion mechanism when designing the multitask deep convolutional neural network, and adaptively fuses features according to the dependency relationship between the features to realize the sharing and interaction of features between tasks.
In a first aspect, the present disclosure provides a multitask face attribute classification method based on adaptive feature fusion;
the multitask face attribute classification method based on the self-adaptive feature fusion comprises the following steps:
acquiring a face image to be classified;
carrying out preprocessing operation on the face image to be classified;
inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
In a second aspect, the present disclosure provides a multitask face attribute classification system based on adaptive feature fusion;
a multitask face attribute classification system based on self-adaptive feature fusion comprises the following steps:
an acquisition module configured to: acquiring a face image to be classified;
a pre-processing module configured to: carrying out preprocessing operation on the face image to be classified;
a classification module configured to: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein when the computer instructions are executed by the processor, the method of the first aspect is performed.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the beneficial effect of this disclosure is:
the method and the device take the relation among different task feature maps in the multitask deep convolutional neural network into consideration, namely, when feature fusion is carried out, the degree of sharing or retaining feature information is determined according to the characteristics of the feature maps.
When the method is realized, a self-adaptive feature fusion layer is constructed, network branches of different tasks are connected to form a uniform multi-task deep convolution neural network, so that information can be effectively shared among the different tasks, and the classification accuracy effect is improved remarkably.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a flowchart of a deep multitask learning method based on adaptive feature fusion according to a first embodiment of the present disclosure.
FIG. 2 is a schematic diagram of connecting the network branches of two tasks by using adaptive feature fusion layers to form a unified multitask deep convolutional neural network according to the first embodiment of the present disclosure;
FIG. 3 is a schematic view of the internal connection of a feature fusion layer according to the first embodiment of the disclosure;
fig. 4 is a schematic diagram of an internal connection relationship of a channel level fusion module according to a first embodiment of the present disclosure;
fig. 5 is a schematic diagram of a spatial hierarchy fusion module according to a first embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Embodiment I provides a multitask face attribute classification method based on self-adaptive feature fusion;
as shown in fig. 1, the multi-task face attribute classification method based on adaptive feature fusion includes:
s1: acquiring a face image to be classified;
s2: carrying out preprocessing operation on the face image to be classified;
s3: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
As one or more embodiments, the preprocessing operation specifically includes:
first, all images are scaled to 224 × 224 pixels;
and then, the pixel average value of the training-set images is calculated, and this average value is subtracted from each face image to be classified as a normalization operation.
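As an illustrative, non-limiting sketch, this preprocessing step may be written as follows; OpenCV and NumPy are assumed here, and the name train_mean is an assumed placeholder for the pixel average computed over the training-set images.

```python
import cv2
import numpy as np

def preprocess_face(image, train_mean):
    """Scale a face image to 224x224 pixels and subtract the training-set pixel mean."""
    resized = cv2.resize(image, (224, 224)).astype(np.float32)
    return resized - train_mean  # normalization by mean subtraction
```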
As one or more embodiments, the obtaining of the multitask face attribute classification model based on the adaptive feature fusion includes:
constructing a multitask neural network model based on self-adaptive feature fusion;
constructing a training set, wherein the training set comprises: the method comprises the following steps of (1) obtaining a plurality of face images, wherein each face image comprises at least two known attributes;
the preprocessing operation of the training-set images comprises: first, scaling all images to 224 × 224 pixels; then, calculating the pixel average value of the training-set images and subtracting this average value from each image as a normalization operation; and finally, before each round of training, carrying out horizontal flipping and Gaussian blur processing on the training images with a set probability (an illustrative sketch of this step is given below);
training a multi-task neural network model based on adaptive feature fusion by using the image after the preprocessing operation to obtain a trained multi-task neural network model based on adaptive feature fusion; namely, the multi-task human face attribute classification model based on the self-adaptive feature fusion.
The beneficial effects of the above technical scheme are: through the preprocessing step, the number of the training samples can be effectively expanded, and the diversity of the training samples is improved.
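A minimal sketch of this training-time preprocessing, assuming torchvision transforms; the flip and blur probabilities, the blur kernel size, and the per-channel mean shown here are illustrative placeholders rather than values fixed by this disclosure.

```python
import torchvision.transforms as T

train_mean = [0.5, 0.5, 0.5]  # placeholder; in practice the mean is computed from the training set

train_transform = T.Compose([
    T.Resize((224, 224)),                                    # scale to 224 x 224 pixels
    T.RandomHorizontalFlip(p=0.5),                           # horizontal flip with a set probability
    T.RandomApply([T.GaussianBlur(kernel_size=3)], p=0.3),   # Gaussian blur with a set probability
    T.ToTensor(),
    T.Normalize(mean=train_mean, std=[1.0, 1.0, 1.0]),       # subtract the mean only
])
```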
It is to be understood that the known attributes include one or more of, for example: age, gender, and expression.
It should be appreciated that in the present embodiment, the Adience dataset is selected to perform the age classification and gender classification tasks on face images simultaneously. In the Adience dataset, the age classification task is divided into eight categories: 0-2, 4-6, 8-12, 15-20, 25-32, 38-43, 48-53, and 60+; gender classification contains the male and female categories;
using the cross-entropy loss function, the loss on age classification is defined as L_age and the loss on gender classification is defined as L_sex; the total loss function is then L = λL_age + L_sex, where λ is a hyperparameter that balances the two kinds of losses of the model. Considering that gender classification is a two-class problem while age classification is a multi-class problem, the value of λ is set to 1/2. The network is trained with a stochastic gradient descent algorithm to determine the network weights that minimize the loss function;
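As an illustrative, non-limiting PyTorch sketch of this joint loss and the stochastic gradient descent update: the model is assumed to return (age_logits, gender_logits) as in the network sketched further below, and the learning rate, momentum, and data-loader name are illustrative assumptions rather than values fixed by this disclosure.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
lam = 0.5  # hyperparameter lambda balancing the two losses; set to 1/2 in the text

# `model` is assumed to return (age_logits, gender_logits);
# `train_loader` is assumed to yield (images, age_labels, gender_labels) batches.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for images, age_labels, gender_labels in train_loader:
    age_logits, gender_logits = model(images)
    loss = lam * criterion(age_logits, age_labels) + criterion(gender_logits, gender_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```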
as one or more embodiments, the adaptive feature fusion based multitasking neural network model comprises:
two network branches in parallel: a first network branch and a second network branch;
a first network branch comprising: the system comprises a convolution layer group A1, a convolution layer group A2, a convolution layer group A3, a convolution layer group A4, a convolution layer group A5, a full connection layer A6 and a softmax layer A7 which are connected in sequence;
a second network branch comprising: a convolution layer group B1, a convolution layer group B2, a convolution layer group B3, a convolution layer group B4, a convolution layer group B5, a full connection layer B6 and a Softmax layer B7 which are connected in sequence;
and the convolution layer groups corresponding to the first network branch and the second network branch are connected through four self-adaptive feature fusion layers.
Further, the convolution layer groups corresponding to the first network branch and the second network branch are connected by four adaptive feature fusion layers, which specifically includes:
the output end of the convolution layer group A1 and the output end of the convolution layer group B1 are both connected with the input end of the first adaptive characteristic fusion layer;
the input end of the convolution layer group A2 and the input end of the convolution layer group B2 are both connected with the output end of the first adaptive characteristic fusion layer;
the output end of the convolution layer group A2 and the output end of the convolution layer group B2 are both connected with the input end of the second adaptive characteristic fusion layer;
the input end of the convolution layer group A3 and the input end of the convolution layer group B3 are both connected with the output end of the second adaptive characteristic fusion layer;
the output end of the convolution layer group A3 and the output end of the convolution layer group B3 are both connected with the input end of the third adaptive characteristic fusion layer;
the input end of the convolution layer group A4 and the input end of the convolution layer group B4 are both connected with the output end of the third adaptive characteristic fusion layer;
the output end of the convolution layer group A4 and the output end of the convolution layer group B4 are both connected with the input end of the fourth adaptive characteristic fusion layer;
an input of the convolution layer group a5 and an input of the convolution layer group B5 are both connected to an output of the fourth adaptive feature fusion layer.
It should be understood that the working principle of the above multitask neural network model based on adaptive feature fusion is as follows:
the first network branch and the second network branch receive the same input image, the first network branch is responsible for classifying the age of the face in the input image, the second network branch is responsible for classifying the gender of the face in the input image, and the output of the network branches represents the probability that the input image belongs to each category on the corresponding attribute;
the first network branch and the second network branch are identical in structure and are based on the ResNet101 network structure (see Kaiming He, Xiangyu Zhuang, Shaoqingren, and Jian Sun. deep residual learning for image Recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016). Each network branch consists of five convolutional layer groups, one fully-connected layer and one softmax layer. Wherein each convolution layer group comprises a plurality of continuous convolution layers and a maximum pooling layer.
The first adaptive feature fusion layer, the second adaptive feature fusion layer, the third adaptive feature fusion layer and the fourth adaptive feature fusion layer are respectively introduced to connect the corresponding convolutional layer groups of the first network branch and the second network branch, thereby realizing feature interaction between the two tasks and constructing a unified multi-task deep convolutional neural network, the structure of which is shown in FIG. 2.
Further, the fully-connected layer A6 of the first network branch performs a nonlinear transformation on the input feature map and maps it into a column vector; the dimension of the column vector is equal to the number of categories on the age attribute, and each dimension corresponds to a specific age category.
Further, the fully-connected layer B6 of the second network branch performs a nonlinear transformation on the input feature map and maps it into a column vector; the dimension of the column vector is equal to the number of categories on the gender attribute, and each dimension corresponds to a specific gender category.
Further, the Softmax layer A7 of the first network branch converts each dimension of the input vector into a probability value representing the probability of the input image on each category of the age attribute.
Further, the Softmax layer B7 of the second network branch converts each dimension of the input vector into a probability value representing the probability of the input image on each category of the gender attribute. An illustrative sketch of the overall topology follows.
As one or more embodiments, the first adaptive feature fusion layer, the second adaptive feature fusion layer, the third adaptive feature fusion layer, and the fourth adaptive feature fusion layer are identical in structure.
As one or more embodiments, as shown in fig. 3, the first adaptive feature fusion layer includes:
the system comprises a channel level fusion module and a space level fusion module which are sequentially connected, wherein the input end of the channel level fusion module is the input end of the current adaptive feature fusion layer; and the output end of the spatial hierarchy fusion module is the output end of the current adaptive feature fusion layer.
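As an illustrative, non-limiting sketch, the adaptive feature fusion layer can be expressed as the sequential composition of the two modules; ChannelLevelFusion and SpatialLevelFusion are the sketches given after the respective module descriptions below, and the constructor arguments are illustrative assumptions.

```python
import torch.nn as nn

class AdaptiveFusionLayer(nn.Module):
    """Adaptive feature fusion layer: channel-level fusion followed by spatial-level fusion."""
    def __init__(self, channels, spatial_size):
        super().__init__()
        self.channel = ChannelLevelFusion(channels)      # sketched under the channel-level module
        self.spatial = SpatialLevelFusion(spatial_size)  # sketched under the spatial-level module

    def forward(self, xa, xb):
        xa, xb = self.channel(xa, xb)   # fuse per channel
        return self.spatial(xa, xb)     # then fuse per spatial position
```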
As one or more embodiments, the channel hierarchy fusion module includes:
the first average pooling layer and the second average pooling layer are parallel;
the output ends of the first average pooling layer and the second average pooling layer are connected with the series unit;
the series unit is connected with the first full connection layer, and the first full connection layer is connected with the second full connection layer;
the second full-connection layer is respectively connected with the third full-connection layer and the fourth full-connection layer;
the third full connection layer is connected with the first Softmax function layer;
the fourth full connection layer is connected with the second Softmax function layer;
the first Softmax function layer is respectively connected with the first multiplier and the second multiplier;
the second Softmax function layer is connected with the third multiplier and the fourth multiplier respectively;
the first multiplier and the second multiplier are both connected with the first adder;
the third multiplier and the fourth multiplier are both connected with the second adder.
As one or more embodiments, as shown in fig. 4, the channel level fusion module operates according to the following principle:
Firstly, in the channel level fusion module, the original feature maps x_A and x_B of the two network branches are respectively subjected to average pooling along the channel dimension to obtain the pooled vectors v_A and v_B, and v_A and v_B are concatenated together.
Then, the concatenated result is subjected to dimensionality reduction through the first fully-connected layer and the second fully-connected layer respectively, obtaining two guide vectors g_A and g_B.
Passing g_A through the third fully-connected layer yields the fusion weight vectors w_{A→A} and w_{B→A} corresponding to x_A and x_B respectively; passing g_B through the fourth fully-connected layer yields the fusion weight vectors w_{A→B} and w_{B→B} corresponding to x_A and x_B respectively. The dimensions of w_{A→A} and w_{A→B} are equal to the number of channels of the original feature map x_A, and the dimensions of w_{B→A} and w_{B→B} are equal to the number of channels of the original feature map x_B.
A Softmax operation is applied to w_{A→A} and w_{B→A} over each pair of corresponding position elements, so that w_{A→A} + w_{B→A} = 1; a Softmax operation is likewise applied to w_{A→B} and w_{B→B}, so that w_{A→B} + w_{B→B} = 1.
Finally, the original feature maps are multiplied by the fusion weight vectors and added, respectively obtaining the fused feature maps x̂_A = w_{A→A} ⊙ x_A + w_{B→A} ⊙ x_B and x̂_B = w_{A→B} ⊙ x_A + w_{B→B} ⊙ x_B, where ⊙ denotes channel-wise multiplication. x̂_A and x̂_B are then input to the spatial hierarchy fusion module.
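As an illustrative, non-limiting PyTorch sketch of the channel level fusion module described above: the pooling step is interpreted here as a global average over spatial positions (one value per channel), which is an assumption consistent with the per-channel fusion weights, and the reduction ratio of the guide-vector layers is likewise an illustrative choice.

```python
import torch
import torch.nn as nn

class ChannelLevelFusion(nn.Module):
    """Channel-level fusion: learns per-channel fusion weights for the two branch feature maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 8)            # illustrative size of the guide vectors
        self.fc1 = nn.Linear(2 * channels, hidden)        # first FC layer  -> guide vector g_A
        self.fc2 = nn.Linear(2 * channels, hidden)        # second FC layer -> guide vector g_B
        self.fc3 = nn.Linear(hidden, 2 * channels)        # third FC layer  -> weights (w_A->A, w_B->A)
        self.fc4 = nn.Linear(hidden, 2 * channels)        # fourth FC layer -> weights (w_A->B, w_B->B)

    def forward(self, xa, xb):
        n, c, _, _ = xa.shape
        va = xa.mean(dim=(2, 3))                          # pooled descriptor of x_A, shape (N, C)
        vb = xb.mean(dim=(2, 3))                          # pooled descriptor of x_B, shape (N, C)
        cat = torch.cat([va, vb], dim=1)                  # concatenated descriptors, shape (N, 2C)
        ga, gb = self.fc1(cat), self.fc2(cat)             # guide vectors
        wa = torch.softmax(self.fc3(ga).view(n, 2, c), dim=1)  # pairwise softmax: w_A->A + w_B->A = 1
        wb = torch.softmax(self.fc4(gb).view(n, 2, c), dim=1)  # pairwise softmax: w_A->B + w_B->B = 1
        fused_a = wa[:, 0].view(n, c, 1, 1) * xa + wa[:, 1].view(n, c, 1, 1) * xb
        fused_b = wb[:, 0].view(n, c, 1, 1) * xa + wb[:, 1].view(n, c, 1, 1) * xb
        return fused_a, fused_b
```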
As one or more embodiments, the spatial hierarchy fusion module includes:
the third average pooling layer and the fourth average pooling layer are arranged in parallel;
the output ends of the third average pooling layer and the fourth average pooling layer are connected with the stacking unit;
the stacking unit is connected with the first convolution layer and the second convolution layer respectively;
the first convolution layer is connected with the fifth full-connection layer, and the second convolution layer is connected with the sixth full-connection layer;
the fifth full connection layer is connected with the third Softmax function layer; the sixth fully connected layer is connected with the fourth Softmax function layer;
the third Softmax function layer is respectively connected with the fifth multiplier and the sixth multiplier;
the fourth Softmax function layer is connected with the seventh multiplier and the eighth multiplier respectively;
the fifth multiplier and the sixth multiplier are both connected with the third adder;
the seventh multiplier and the eighth multiplier are both connected with the fourth adder.
As one or more embodiments, as shown in fig. 5, the spatial hierarchy fusion module operates according to the following principle:
Firstly, in the spatial hierarchy fusion module, the input feature maps x̂_A and x̂_B are respectively subjected to average pooling along the spatial dimension to obtain the pooled maps p_A and p_B, and p_A and p_B are stacked together.
Then, the stacked result is passed through two convolutional layers respectively, each convolutional layer containing only one 1 × 1 convolution kernel, to obtain two guide matrices G_A and G_B.
G_A is vectorized and passed through a fully-connected layer to obtain the fusion weight vectors s_{A→A} and s_{B→A} corresponding to x̂_A and x̂_B respectively; G_B is vectorized and passed through a fully-connected layer to obtain the fusion weight vectors s_{A→B} and s_{B→B} corresponding to x̂_A and x̂_B respectively.
s_{A→A} and s_{B→A} are reshaped into matrices whose size equals the spatial size of the input feature map x̂_A; s_{A→B} and s_{B→B} are reshaped into matrices whose size equals the spatial size of the input feature map x̂_B.
A Softmax operation is applied to s_{A→A} and s_{B→A} over each pair of corresponding position elements, so that s_{A→A} + s_{B→A} = 1; a Softmax operation is likewise applied to s_{A→B} and s_{B→B}, so that s_{A→B} + s_{B→B} = 1.
Finally, the input feature maps are multiplied by the fusion weight matrices and added, respectively obtaining the output feature maps y_A = s_{A→A} ⊙ x̂_A + s_{B→A} ⊙ x̂_B and y_B = s_{A→B} ⊙ x̂_A + s_{B→B} ⊙ x̂_B, where ⊙ denotes element-wise multiplication over spatial positions. y_A and y_B are then fed into the next convolutional layer groups of the first network branch and the second network branch, respectively.
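As an illustrative, non-limiting PyTorch sketch of the spatial hierarchy fusion module described above: the pooling step is interpreted here as an average over channels (one value per spatial position), which is an assumption, and the fully-connected layers are tied to a fixed spatial size supplied at construction, consistent with 224 × 224 inputs.

```python
import torch
import torch.nn as nn

class SpatialLevelFusion(nn.Module):
    """Spatial-level fusion: learns per-position fusion weights for the two branch feature maps."""
    def __init__(self, spatial_size):
        super().__init__()
        hw = spatial_size * spatial_size
        self.conv_a = nn.Conv2d(2, 1, kernel_size=1)   # single 1x1 kernel -> guide matrix G_A
        self.conv_b = nn.Conv2d(2, 1, kernel_size=1)   # single 1x1 kernel -> guide matrix G_B
        self.fc_a = nn.Linear(hw, 2 * hw)              # weights (s_A->A, s_B->A), reshaped below
        self.fc_b = nn.Linear(hw, 2 * hw)              # weights (s_A->B, s_B->B), reshaped below

    def forward(self, xa, xb):
        n, _, h, w = xa.shape
        pa = xa.mean(dim=1, keepdim=True)              # pooled map of branch A, shape (N, 1, H, W)
        pb = xb.mean(dim=1, keepdim=True)              # pooled map of branch B, shape (N, 1, H, W)
        stacked = torch.cat([pa, pb], dim=1)           # stacked maps, shape (N, 2, H, W)
        ga = self.conv_a(stacked).flatten(1)           # vectorized guide matrix G_A, shape (N, H*W)
        gb = self.conv_b(stacked).flatten(1)           # vectorized guide matrix G_B, shape (N, H*W)
        sa = torch.softmax(self.fc_a(ga).view(n, 2, h, w), dim=1)  # pairwise: s_A->A + s_B->A = 1
        sb = torch.softmax(self.fc_b(gb).view(n, 2, h, w), dim=1)  # pairwise: s_A->B + s_B->B = 1
        out_a = sa[:, 0:1] * xa + sa[:, 1:2] * xb      # weights broadcast over channels
        out_b = sb[:, 0:1] * xa + sb[:, 1:2] * xb
        return out_a, out_b
```

With the network sketched earlier, a 224 × 224 input exercises these layers at spatial sizes 56, 56, 28 and 14.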
The method takes into consideration the relationship among the feature maps of different tasks in the multitask deep convolutional neural network; that is, when feature fusion is carried out, the degree to which feature information is shared or retained is determined according to the characteristics of the feature maps themselves, thereby realizing adaptive feature fusion.
The second embodiment provides a multitask face attribute classification system based on adaptive feature fusion;
a multitask face attribute classification system based on self-adaptive feature fusion comprises the following steps:
an acquisition module configured to: acquiring a face image to be classified;
a pre-processing module configured to: carrying out preprocessing operation on the face image to be classified;
a classification module configured to: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
In a third embodiment, the present embodiment further provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor, where the computer instructions, when executed by the processor, implement the method in the first embodiment.
In a fourth embodiment, the present embodiment further provides a computer-readable storage medium for storing computer instructions, and the computer instructions, when executed by a processor, implement the method of the first embodiment.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. The multitask face attribute classification method based on the self-adaptive feature fusion is characterized by comprising the following steps:
acquiring a face image to be classified;
carrying out preprocessing operation on the face image to be classified;
inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
2. The method of claim 1, wherein the preprocessing operation comprises:
first, all images are scaled to 224 × 224 pixels;
and then, calculating the pixel average value of the training set image, and subtracting the pixel average value from each face image to be classified to perform normalization operation.
3. The method of claim 1, wherein the obtaining of the multi-tasking face attribute classification model based on adaptive feature fusion comprises:
constructing a multitask neural network model based on self-adaptive feature fusion;
constructing a training set, wherein the training set comprises: the method comprises the following steps of (1) obtaining a plurality of face images, wherein each face image comprises at least two known attributes;
the preprocessing operation of the training-set images comprises: first, scaling all images to 224 × 224 pixels; then, calculating the pixel average value of the training-set images and subtracting this average value from each image as a normalization operation; and finally, before each round of training, carrying out horizontal flipping and Gaussian blur processing on the training images with a set probability;
training a multi-task neural network model based on adaptive feature fusion by using the image after the preprocessing operation to obtain a trained multi-task neural network model based on adaptive feature fusion; namely, the multi-task human face attribute classification model based on the self-adaptive feature fusion.
4. The method of claim 3, wherein the adaptive feature fusion based multitasking neural network model comprises:
two network branches in parallel: a first network branch and a second network branch;
a first network branch comprising: the system comprises a convolution layer group A1, a convolution layer group A2, a convolution layer group A3, a convolution layer group A4, a convolution layer group A5, a full connection layer A6 and a softmax layer A7 which are connected in sequence;
a second network branch comprising: a convolution layer group B1, a convolution layer group B2, a convolution layer group B3, a convolution layer group B4, a convolution layer group B5, a full connection layer B6 and a Softmax layer B7 which are connected in sequence;
and the convolution layer groups corresponding to the first network branch and the second network branch are connected through four self-adaptive feature fusion layers.
5. The method as set forth in claim 4, wherein,
the working principle of the multitask neural network model based on the self-adaptive feature fusion is as follows:
the first network branch and the second network branch receive the same input image, the first network branch is responsible for classifying the age of the face in the input image, the second network branch is responsible for classifying the gender of the face in the input image, and the output of the network branches represents the probability that the input image belongs to each category on the corresponding attribute;
the adaptive feature fusion layer comprises:
the system comprises a channel level fusion module and a space level fusion module which are sequentially connected, wherein the input end of the channel level fusion module is the input end of the current adaptive feature fusion layer; and the output end of the spatial hierarchy fusion module is the output end of the current adaptive feature fusion layer.
6. The method of claim 5, wherein the channel hierarchy fusion module operates on a principle comprising:
firstly, in the channel hierarchy fusion module, average pooling is respectively performed on the original feature maps x_A and x_B of the two network branches along the channel dimension to obtain pooled vectors v_A and v_B, and v_A and v_B are concatenated together;
then, dimensionality reduction is respectively performed on the concatenated result through a first fully-connected layer and a second fully-connected layer to obtain two guide vectors g_A and g_B;
g_A is passed through a third fully-connected layer to obtain fusion weight vectors w_{A→A} and w_{B→A} corresponding to x_A and x_B respectively, and g_B is passed through a fourth fully-connected layer to obtain fusion weight vectors w_{A→B} and w_{B→B} corresponding to x_A and x_B respectively; wherein the dimensions of w_{A→A} and w_{A→B} are equal to the number of channels of the original feature map x_A, and the dimensions of w_{B→A} and w_{B→B} are equal to the number of channels of the original feature map x_B;
a Softmax operation is performed on w_{A→A} and w_{B→A} over each pair of corresponding position elements, such that w_{A→A} + w_{B→A} = 1, and a Softmax operation is performed on w_{A→B} and w_{B→B} over each pair of corresponding position elements, such that w_{A→B} + w_{B→B} = 1;
finally, the original feature maps are multiplied by the fusion weight vectors and added to respectively obtain the fused feature maps x̂_A = w_{A→A} ⊙ x_A + w_{B→A} ⊙ x_B and x̂_B = w_{A→B} ⊙ x_A + w_{B→B} ⊙ x_B, where ⊙ denotes channel-wise multiplication; and x̂_A and x̂_B are input to the spatial hierarchy fusion module.
7. The method of claim 5, wherein the spatial hierarchy fusion module operates on a principle comprising:
firstly, in the spatial hierarchy fusion module, average pooling is respectively performed on the input feature maps x̂_A and x̂_B along the spatial dimension to obtain pooled maps p_A and p_B, and p_A and p_B are stacked together;
then, the stacked result is respectively passed through two convolutional layers, each convolutional layer containing only one 1 × 1 convolution kernel, to obtain two guide matrices G_A and G_B;
G_A is vectorized and passed through a fully-connected layer to obtain fusion weight vectors s_{A→A} and s_{B→A} corresponding to x̂_A and x̂_B respectively, and G_B is vectorized and passed through a fully-connected layer to obtain fusion weight vectors s_{A→B} and s_{B→B} corresponding to x̂_A and x̂_B respectively;
s_{A→A} and s_{B→A} are reshaped into matrices whose size is equal to the spatial size of the input feature map x̂_A, and s_{A→B} and s_{B→B} are reshaped into matrices whose size is equal to the spatial size of the input feature map x̂_B;
a Softmax operation is performed on s_{A→A} and s_{B→A} over each pair of corresponding position elements, such that s_{A→A} + s_{B→A} = 1, and a Softmax operation is performed on s_{A→B} and s_{B→B} over each pair of corresponding position elements, such that s_{A→B} + s_{B→B} = 1;
finally, the input feature maps are multiplied by the fusion weight matrices and added to respectively obtain the output feature maps y_A = s_{A→A} ⊙ x̂_A + s_{B→A} ⊙ x̂_B and y_B = s_{A→B} ⊙ x̂_A + s_{B→B} ⊙ x̂_B, where ⊙ denotes element-wise multiplication over spatial positions; and y_A and y_B are respectively input into the next convolutional layer groups of the first network branch and the second network branch.
8. The multitask face attribute classification system based on the self-adaptive feature fusion is characterized by comprising:
an acquisition module configured to: acquiring a face image to be classified;
a pre-processing module configured to: carrying out preprocessing operation on the face image to be classified;
a classification module configured to: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN202010228805.7A 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion Active CN111401294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228805.7A CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010228805.7A CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Publications (2)

Publication Number Publication Date
CN111401294A (en) 2020-07-10
CN111401294B CN111401294B (en) 2022-07-15

Family

ID=71432935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010228805.7A Active CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Country Status (1)

Country Link
CN (1) CN111401294B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832522A (en) * 2020-07-21 2020-10-27 深圳力维智联技术有限公司 Construction method and system of face data set and computer readable storage medium
CN112215157A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112651960A (en) * 2020-12-31 2021-04-13 上海联影智能医疗科技有限公司 Image processing method, device, equipment and storage medium
CN112784776A (en) * 2021-01-26 2021-05-11 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
CN107766850A (en) * 2017-11-30 2018-03-06 电子科技大学 Based on the face identification method for combining face character information
CN108615010A (en) * 2018-04-24 2018-10-02 重庆邮电大学 Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN109978074A (en) * 2019-04-04 2019-07-05 山东财经大学 Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning
CN110119689A (en) * 2019-04-18 2019-08-13 五邑大学 A kind of face beauty prediction technique based on multitask transfer learning
CN110197217A (en) * 2019-05-24 2019-09-03 中国矿业大学 It is a kind of to be interlocked the image classification method of fused packet convolutional network based on depth
CN110796239A (en) * 2019-10-30 2020-02-14 福州大学 Deep learning target detection method based on channel and space fusion perception

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SANGHYUN WOO et al.: "CBAM: Convolutional Block Attention Module", 《ARXIV:1807.06521V2》 *
YILONG CHEN et al.: "Channel-Unet: A Spatial Channel-Wise Convolutional Neural Network for Liver and Tumors Segmentation", 《FRONTIERS IN GENETICS》 *
曹洁 et al.: "Face recognition based on adaptive feature fusion" (基于自适应特征融合的人脸识别), 《计算机工程与应》 *
谢郑楠: "Facial landmark detection based on multi-task feature selection and adaptive models" (基于多任务特征选择和自适应模型的人脸特征点检测), 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832522A (en) * 2020-07-21 2020-10-27 深圳力维智联技术有限公司 Construction method and system of face data set and computer readable storage medium
CN111832522B (en) * 2020-07-21 2024-02-27 深圳力维智联技术有限公司 Face data set construction method, system and computer readable storage medium
CN112215157A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112215157B (en) * 2020-10-13 2021-05-25 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112651960A (en) * 2020-12-31 2021-04-13 上海联影智能医疗科技有限公司 Image processing method, device, equipment and storage medium
CN112784776A (en) * 2021-01-26 2021-05-11 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network

Also Published As

Publication number Publication date
CN111401294B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN111401294B (en) Multi-task face attribute classification method and system based on adaptive feature fusion
CN110175671B (en) Neural network construction method, image processing method and device
CN111767979B (en) Training method, image processing method and image processing device for neural network
Gao et al. Global second-order pooling convolutional networks
CN112613581B (en) Image recognition method, system, computer equipment and storage medium
CN112926641B (en) Three-stage feature fusion rotating machine fault diagnosis method based on multi-mode data
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN110717851A (en) Image processing method and device, neural network training method and storage medium
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN111950699A (en) Neural network regularization method based on characteristic space correlation
CN113298235A (en) Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN111079767A (en) Neural network model for segmenting image and image segmentation method thereof
CN112766458A (en) Double-current supervised depth Hash image retrieval method combining classification loss
CN114612761A (en) Network architecture searching method for image recognition
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN114846382A (en) Microscope and method with convolutional neural network implementation
CN114298289A (en) Data processing method, data processing equipment and storage medium
CN115330759B (en) Method and device for calculating distance loss based on Hausdorff distance
CN116246110A (en) Image classification method based on improved capsule network
CN116756391A (en) Unbalanced graph node neural network classification method based on graph data enhancement
CN113516580B (en) Method and device for improving neural network image processing efficiency and NPU
CN113688946B (en) Multi-label image recognition method based on spatial correlation
CN112529064B (en) Efficient real-time semantic segmentation method
CN112560824B (en) Facial expression recognition method based on multi-feature adaptive fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant