CN115565051B - Lightweight face attribute recognition model training method, recognition method and device - Google Patents

Lightweight face attribute recognition model training method, recognition method and device

Info

Publication number
CN115565051B
Authority
CN
China
Prior art keywords
face
face attribute
training
recognition model
extraction network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211421512.6A
Other languages
Chinese (zh)
Other versions
CN115565051A (en)
Inventor
郭理鹏
陆金刚
王为
方伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinsheng Electronic Technology Co Ltd
Original Assignee
Zhejiang Xinsheng Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Xinsheng Electronic Technology Co Ltd filed Critical Zhejiang Xinsheng Electronic Technology Co Ltd
Priority to CN202211421512.6A priority Critical patent/CN115565051B/en
Publication of CN115565051A publication Critical patent/CN115565051A/en
Application granted granted Critical
Publication of CN115565051B publication Critical patent/CN115565051B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight face attribute recognition model training method, a recognition method, a computer device and a storage medium. The training method preprocesses an acquired face data set to form a face training image set, then constructs, based on the face attribute data, a feature extraction network comprising a plurality of sequentially connected structure blocks, where the output feature map of each structure block is a multi-dimensional tensor fusing the input feature map information, channel information obtained by transforming the input feature map, and spatial position information. A predicted probability value for each face attribute category is obtained from the output of the feature extraction network, and the loss function used for that category in the error loss calculation is selected according to the relationship between the predicted probability value and a probability threshold hyperparameter. The constructed feature extraction network is trained on the error loss output by the selected loss function to obtain the face attribute recognition model.

Description

Lightweight face attribute recognition model training method, recognition method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a lightweight face attribute recognition model training method, a recognition method, a computer device, and a storage medium.
Background
With the rapid development of society, fast and effective automatic identity authentication has become increasingly important in the security field. Face attribute recognition is the most direct means of identity verification: compared with other human biometric features it is direct, accurate and efficient, is more readily accepted by users, and is hard for subjects to notice. It has become an important auxiliary technique in fields such as intelligent monitoring, public security systems and security verification systems.
Like other biometric recognition technologies, face attribute recognition first extracts features and then classifies the extracted features. Traditional machine learning methods extract features from face training images using methods such as Local Binary Patterns (LBP) and the Scale-Invariant Feature Transform (SIFT), then feed the features into classifiers such as a Support Vector Machine (SVM) or a Decision Tree (DT) to obtain the face attribute result. These methods improve face attribute recognition performance to a certain extent, but they are easily affected by factors such as environment, illumination and posture, and it is difficult for them to achieve good results in industrial applications.
At present, face attribute recognition methods based on deep learning dominate; a robust, high-precision model is trained on a large number of face training images in standard poses. These methods fall into two main classes. In the first, each attribute is recognized by a separate model; running multiple models simultaneously occupies a large amount of resources and slows recognition, and such methods are especially hard to deploy well on edge devices with limited computing resources. In the second, face attribute features are usually extracted with depthwise separable convolutions or a deep residual network. Because a deep residual network has many fixed feature layers, a face attribute model trained with it has a large parameter count and memory footprint and is difficult to apply on a low-power chip. A depthwise separable convolutional network can reduce the parameter count and computation to a certain extent, but when the model is quantized for deployment on a chip, the separable convolutions suffer a large precision loss, introducing errors into model inference and seriously harming recognition accuracy. In addition, when a multi-branch model predicts multiple attributes simultaneously, every picture in the data set must be labeled with all the attributes. In real scenes some attributes occur with low probability, so with large data volumes the data become severely imbalanced, and partial labeling errors and low data-set quality arise, leading to difficult model training, model overfitting, low recognition accuracy and difficulty in bringing the algorithm to production.
Disclosure of Invention
In order to overcome at least one defect of the prior art, the invention provides a lightweight face attribute recognition model training method, a recognition method, a computer device and a storage medium.
To achieve the above object, the invention provides a training method for a lightweight face attribute recognition model, comprising:
preprocessing the acquired face data set to form a face training image set, wherein each face training image in the set carries corresponding face attribute data and a label value;
constructing, based on the face attribute data, a feature extraction network fusing input information, channel information and spatial position information to extract features of the face training images; the feature extraction network comprises a plurality of sequentially connected structure blocks; the input feature map of each structure block is transformed to generate a first feature map, and the first feature map is aggregated along two mutually perpendicular spatial dimensions to obtain channel information and spatial position information, respectively; the obtained channel information and spatial position information are embedded into the first feature map to form a second feature map; the second feature map is fused with the input feature map of the structure block to form an output feature map that is a multi-dimensional tensor;
determining, from the output of the feature extraction network, the loss function each face attribute category uses when calculating the loss error; if the predicted probability value of a face attribute category is smaller than a probability threshold hyperparameter, a first loss function containing the probability threshold hyperparameter is selected to calculate the error loss between the predicted probability value of that category and the label value; otherwise a second loss function is selected to calculate the error loss of that category;
training the constructed feature extraction network according to the error loss to obtain a face attribute recognition model, the probability threshold hyperparameter and the numbers of structure blocks and channels in the feature extraction network being dynamically updated during training.
According to an embodiment of the invention, the output feature map of the last structure block in the feature extraction network is converted to lower dimensionality: keeping the dimension that indexes the input images, the remaining dimensions are flattened into a one-dimensional vector and output to a fully connected layer, which outputs the predicted values for all face attribute categories.
According to an embodiment of the invention, based on the types of face attributes in the face attribute data and the categories corresponding to each attribute, the predicted probability value of each face attribute category is obtained from the predicted values, output by the feature extraction network, that contain all face attribute categories.
According to an embodiment of the invention, the input feature map of each structure block is convolved, regularized and transformed by a nonlinear activation function to form a first feature map.
According to an embodiment of the invention, the first loss function introduces a probability threshold hyperparameter on the basis of the second loss function to attenuate the loss weight; the first and second loss functions each contain an equalization hyperparameter for balancing positive and negative samples and a difficulty hyperparameter for balancing simple and difficult samples.
According to an embodiment of the invention, the probability threshold hyperparameter is dynamically updated during training with a period of a preset number of epochs, i.e. full traversals of the face training image set.
According to one embodiment of the invention, each time the probability threshold hyperparameter is updated, the parameters of the feature extraction network are saved as a candidate recognition model; a test set is input into the candidate recognition models, and the candidate with the best prediction accuracy on the test set is selected as the trained face attribute recognition model.
The invention also provides a lightweight face attribute identification method, which comprises the following steps:
acquiring a face image to be recognized;
and carrying out face attribute recognition on the face image to be recognized by using the face attribute recognition model obtained by training by using the lightweight face attribute recognition model training method to obtain a recognition result.
The invention also provides computer equipment which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the lightweight face attribute recognition model training method or the lightweight face attribute recognition method when executing the computer program.
In another aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the lightweight face attribute recognition model training method or the lightweight face attribute recognition method.
In summary, the lightweight face attribute recognition model training method and recognition method provided by the invention use a single model to recognize multiple facial attributes. When the feature extraction network is constructed, the first feature map formed by transforming the input of each structure block is aggregated along two spatial dimensions to generate channel information and spatial position information, respectively. The second feature map formed by embedding these two kinds of information considers both the relationships between feature map channels and the spatial position information, so the model can better locate and identify the target while the parameter count and computation of the model are effectively reduced. Meanwhile, the input feature map of each block is fused again with the second feature map to retain the input feature map information, effectively compensating for the loss of input information caused by the transformations applied during feature extraction and improving recognition accuracy.
In addition, on the data side, selecting the loss function based on the probability threshold hyperparameter and dynamically adjusting that hyperparameter during training effectively alleviates the model overfitting and low recognition accuracy caused by problematic training samples, greatly improving the recognition accuracy of the model. The training method and recognition method of the lightweight face attribute recognition model provided by the invention can be deployed on low-compute edge devices while maintaining high recognition accuracy.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
FIG. 1 is an application scenario diagram of a lightweight face attribute recognition model training method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a lightweight face attribute recognition model training method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a feature extraction network module.
Fig. 4 is a schematic diagram illustrating the principle of each module in the feature extraction network.
Fig. 5 is a schematic specific flowchart of step S30 in fig. 2.
Fig. 6 is a schematic diagram corresponding to fig. 5.
Fig. 7 is a schematic diagram illustrating a specific flowchart of step S40 in fig. 2.
Fig. 8 is a schematic structural diagram of a lightweight face attribute recognition model training device according to an embodiment of the present invention.
Fig. 9 is a schematic flow chart of a lightweight face attribute identification method according to an embodiment of the present invention.
Fig. 10 is a diagram showing an internal structure of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The training method for the lightweight face attribute recognition model provided by the embodiment can be applied to an application environment shown in fig. 1. Wherein the terminal 101 communicates with the server 102 via a network. The server 102 receives a model training instruction sent by the terminal 101, and the server 102 responds to the model training instruction to obtain a face training image set, wherein the face training image set comprises a plurality of face training images, and each face training image is correspondingly marked with face attribute data and a label value. The server 102 constructs a feature extraction network based on the face attribute data of each face training image and inputs a plurality of face training images in the face training image set into the constructed feature extraction network. The server 102 continuously trains the feature extraction network based on the error loss between the predicted probability value and the label value of each face attribute category output by the feature extraction network, and takes the trained feature extraction network as a face attribute recognition model. The terminal 101 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 102 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In an embodiment, as shown in fig. 2, a lightweight face attribute recognition model training method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
and step S10, acquiring a face data set. The face data set is a set comprising a plurality of images with different facial attributes, and the facial attributes of the images in the face data set can be understood as various types, such as various attributes including different sexes, different ages, different expressions, whether glasses are worn or not, whether a mask is worn or not, and the like. Specifically, the face data set is obtained through the following steps: and acquiring a face image from the starting database and the monitoring scene data and cleaning the face image to obtain a face data set. Specifically, the terminal 102 may carry a link of open source data in the issued training instruction so that the server can collect the face data set based on the link.
And step S20, preprocessing the acquired face data set to form a face training image set, wherein each face training image in the face training image set carries corresponding face attribute data and a label value.
Step S201: perform detection on each face image in the face data set to obtain the face region and key points of each image. Taking the positional relationship among the key points of a face in the standard pose as reference, the face region is rectified according to the key points detected on the face image, so that each acquired face image is converted to the standard pose and scaled to a uniform size to form a face training image.
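As one concrete illustration of step S201, a minimal alignment sketch follows, assuming OpenCV, a five-point landmark detector and a 112×112 output size; the reference landmark template and the crop size are illustrative assumptions, not fixed by this embodiment.

```python
import cv2
import numpy as np

# Assumed reference landmark positions (eyes, nose tip, mouth corners) for a
# frontal "standard pose" face inside a 112x112 crop.
REFERENCE_5PTS = np.float32([
    [38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2]
])

def align_face(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """Warp a detected face so its keypoints match the standard-pose template."""
    # Estimate a similarity transform from detected keypoints to the template.
    matrix, _ = cv2.estimateAffinePartial2D(landmarks.astype(np.float32),
                                            REFERENCE_5PTS)
    # Rectify the face region and scale it to the uniform training size.
    return cv2.warpAffine(image, matrix, (112, 112))
```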
Step S202: perform attribute labeling on each face training image to form the corresponding face attribute data and label values. The face attribute data includes the number of face attributes and the categories contained in each attribute. In this embodiment the number of attributes is five: gender (male, female), glasses (wearing glasses, not wearing glasses), expression (smiling, not smiling), age (child, young, old) and mask (wearing mask, not wearing mask); that is, the numbers of categories per attribute are n1 = 2 (gender), n2 = 2 (glasses), n3 = 2 (expression), n4 = 3 (age) and n5 = 2 (mask).
(mask). However, the present invention does not set any limit to the number of face attributes and the category of each attribute. In other embodiments, different types and numbers of attributes may be selected according to different application scenarios.
Each attribute category is encoded to form a corresponding label value, e.g. for gender: female is 0, male is 1; encoding and labeling according to this rule generates the label value corresponding to each attribute category. The face training images carrying the face attribute data and label values are stored in a database of the server to form the face training image set.
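A minimal sketch of this encoding rule follows; the attribute names and category order in the dictionary are illustrative, only the 0/1 coding for gender is given by the text above.

```python
ATTRIBUTE_CLASSES = {
    "gender":     ["female", "male"],            # n1 = 2
    "glasses":    ["no_glasses", "glasses"],     # n2 = 2
    "expression": ["not_smiling", "smiling"],    # n3 = 2
    "age":        ["child", "young", "old"],     # n4 = 3
    "mask":       ["no_mask", "mask"],           # n5 = 2
}

def encode_labels(annotation: dict) -> list[int]:
    """Map each annotated attribute category to its integer label value."""
    return [ATTRIBUTE_CLASSES[attr].index(annotation[attr])
            for attr in ATTRIBUTE_CLASSES]

# e.g. encode_labels({"gender": "male", "glasses": "no_glasses",
#                     "expression": "smiling", "age": "young",
#                     "mask": "no_mask"})  ->  [1, 0, 1, 1, 0]
```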
When the server 102 receives the model training instruction, it obtains the face training image set from the database in response to the instruction. Step S30: based on the number of face attributes M in the face attribute data and the number of categories ni of each attribute, construct a feature extraction network fusing input information, channel information and spatial position information to extract features of the face training images. This embodiment describes the construction of the feature extraction network in detail on the basis of the RegNet network structure. However, the invention places no limit on the choice of this underlying network; in other embodiments, other neural network structures comprising a plurality of sequentially connected structure blocks may be chosen as the model basis, such as a CNN.
As shown in fig. 3 and 4, the RegNet network mainly consists of three parts: a trunk (Stem), a body (Body) and a head (Head). The trunk and head are fixed: the trunk is an ordinary convolutional layer with a 3×3 kernel and a stride of 2, and the head is a classifier composed of global pooling and a fully connected layer. Most important is the body, which consists of a stack of 4 stages (Stages), each stage consisting of a series of structure blocks (Blocks) stacked in turn.
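This stem/body/head layout can be sketched as follows, assuming PyTorch; the per-stage block counts and channel widths are placeholders that this embodiment tunes during training, and Block stands in for the structure block detailed in steps S301 to S306 below.

```python
import torch.nn as nn

class Block(nn.Module):
    """Placeholder structure block; the attention-augmented version is
    sketched after step S306 below."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.conv(x)

class RegNetSkeleton(nn.Module):
    def __init__(self, stage_blocks=(1, 2, 4, 2),
                 stage_channels=(32, 64, 128, 256), num_outputs=11):
        super().__init__()
        # Stem: fixed 3x3 convolution with stride 2.
        self.stem = nn.Sequential(
            nn.Conv2d(3, stage_channels[0], 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(stage_channels[0]), nn.ReLU(inplace=True))
        # Body: 4 stages, each a stack of structure blocks.
        stages, in_ch = [], stage_channels[0]
        for depth, ch in zip(stage_blocks, stage_channels):
            stages += [Block(in_ch if i == 0 else ch, ch) for i in range(depth)]
            in_ch = ch
        self.body = nn.Sequential(*stages)
        # Head: global pooling followed by a fully connected classifier.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(stage_channels[-1], num_outputs))

    def forward(self, x):
        return self.head(self.body(self.stem(x)))
```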
The steps for constructing the feature extraction network based on the RegNet network structure provided in this embodiment are described in detail below with reference to figs. 3 and 6.
After obtaining the underlying RegNet model:
step S301, inputting characteristic diagram of each structure block
Figure 193380DEST_PATH_IMAGE006
Performing transformation processing to form a first feature map
Figure 257151DEST_PATH_IMAGE007
. In this embodiment, the transformation process includes convolution, regularization, and Relu activation function. However, the present invention is not limited thereto.
Step S302, a first characteristic diagram
Figure 633816DEST_PATH_IMAGE007
Feature aggregation is performed along two spatial dimensions, horizontal and vertical, respectively, and the encoded height is
Figure 833853DEST_PATH_IMAGE008
To (1) a
Figure 233742DEST_PATH_IMAGE009
The output of each channel is represented as:
Figure 909443DEST_PATH_IMAGE010
has a width of
Figure 177613DEST_PATH_IMAGE011
To (1) a
Figure 373102DEST_PATH_IMAGE009
The output of each channel is represented as:
Figure 334105DEST_PATH_IMAGE012
wherein, the first and the second end of the pipe are connected with each other,
Figure 998567DEST_PATH_IMAGE013
indicating the second in the width direction
Figure 680215DEST_PATH_IMAGE013
A plurality of coordinate points;
Figure 120423DEST_PATH_IMAGE014
inputting the first feature map on the c channel;
Figure 111382DEST_PATH_IMAGE015
indicating the first in the height direction
Figure 512408DEST_PATH_IMAGE015
A plurality of coordinate points;
Figure 122380DEST_PATH_IMAGE007
is a first characteristic diagram of the light source,
Figure 774685DEST_PATH_IMAGE016
and
Figure 77491DEST_PATH_IMAGE017
are respectively a first characteristic diagram
Figure 434654DEST_PATH_IMAGE007
C is the channel index in the first characteristic diagram,
Figure 848318DEST_PATH_IMAGE018
and
Figure 856594DEST_PATH_IMAGE019
width and height of the first profile, respectively.
To the first feature map by the above two transformations
Figure 330300DEST_PATH_IMAGE007
In the horizontal and vertical directionsAnd the direction is aggregated to obtain a characteristic diagram of the perception channel information in the horizontal dimension, and obtain a characteristic diagram of the perception space position information in the vertical dimension.
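In tensor terms, these two aggregations are simply means over the width and height axes; a sketch assuming a PyTorch tensor of shape [S, C, H, W]:

```python
import torch

F = torch.randn(8, 64, 56, 56)           # first feature map of one block
z_h = F.mean(dim=3, keepdim=True)        # [S, C, H, 1]: averages over width W
z_w = F.mean(dim=2, keepdim=True)        # [S, C, 1, W]: averages over height H
# z_h keeps, per channel, one value for each height coordinate (channel plus
# vertical position); z_w does the same along the horizontal direction.
```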
Step S303: perform intermediate feature mapping on the two acquired perceptual feature maps. Specifically, the two perceptual feature maps obtained in step S302 are concatenated, transformed by a 1×1 convolution, and processed with a nonlinear activation function to form an intermediate feature map f that encodes the channel information and spatial position information in the horizontal and vertical directions:

$$ f = \delta\big(F_1([z^h, z^w])\big) $$

where [·,·] is the concatenation operation along the spatial dimension, F_1 is a 1×1 convolution, and δ is the nonlinear activation function; the ReLU activation function is used here.
Step S304: decompose the intermediate feature map f into two tensors f^h and f^w along the two spatial dimensions, horizontal and vertical. Two 1×1 convolutions F_h and F_w transform f^h and f^w, respectively, into tensors with the same number of channels as the first feature map F:

$$ g^h = \sigma\big(F_h(f^h)\big) $$

$$ g^w = \sigma\big(F_w(f^w)\big) $$

where σ is the sigmoid activation function.
Step S305: the two transformed tensors g^h and g^w, which represent the channel information and spatial position information, are embedded into the first feature map F to output the second feature map Y:

$$ Y_c(h, w) = F_c(h, w) \times g_c^h(h) \times g_c^w(w) $$

Step S306: after the second feature map Y is obtained, it is fused with the input feature map X of the structure block to obtain an output feature map that is a multi-dimensional tensor containing the input feature map information, channel information and spatial position information. In this embodiment, the second feature map Y is added to the input feature map X, and the sum is passed through the nonlinear activation function ReLU to give the output feature map:

$$ \mathrm{Out}_c = \mathrm{ReLU}(Y_c + X_c) $$

where c is the channel index and ReLU is the activation function.
The feature extraction network provided by this embodiment builds on a RegNet network that models the relationships between feature map channels, and embeds spatial position information so that the relationships between channels and the feature space are considered simultaneously (in figs. 3 and 4, CA denotes the embedding of channel information and spatial position information in each Block); thus every weight parameter in the second feature map contains both inter-channel information and spatial position information. Considering channel and spatial information together greatly improves the network's ability to locate target information, improves recognition accuracy, and greatly reduces the parameter count and computation. Furthermore, the second feature map is fused again with the input feature map, so the output feature map of each structure block is a four-dimensional tensor [S, C, H, W], where S is the number of face training images input per training step; C is the number of feature map channels, characterizing the channel information; and H and W are the feature map height and width, characterizing the spatial position information. Fusing the second feature map with the input feature map preserves the information of the input feature map, effectively avoiding the information loss caused by the feature map transformations applied while embedding spatial position information, and improves the comprehensiveness and integrity of feature extraction to further guarantee recognition accuracy. However, the invention places no limit on the specific form of the output feature map; in other embodiments, the output feature map may include several dimensions characterizing multiple kinds of input feature map information.
As shown in figs. 3 and 4, within one stage the output of each structure block is the input of the next structure block, and the output of the last structure block in a stage is input to the first structure block of the next stage. The output of the last structure block in the last stage forms the output of the body, which is connected into the head.
In this embodiment, step S307: the output feature map of the last structure block in the feature extraction network is converted to lower dimensionality; keeping the dimension that indexes the input images, the remaining dimensions are flattened into a one-dimensional vector and output to a fully connected layer, which outputs the predicted values for all face attribute categories. Specifically, the four-dimensional tensor [S, C, H, W] output by the last structure block in the last stage is expanded from the first dimension S, the later dimensions being converted into a one-dimensional vector, giving [S, C×H×W]. This two-dimensional tensor [S, C×H×W] is then used as the input of a fully connected layer with N outputs, producing the output vector [S, N], where N is the sum of the category counts ni over the M face attributes. As stated above, the sum of all categories of the five face attributes is N = 2 + 2 + 2 + 3 + 2 = 11, so the output vector [S, 11] contains the predicted values of the 11 face attribute categories. However, the invention places no limit on the face attribute categories or on the number of categories each attribute contains.
Step S40: obtain the predicted probability value of each face attribute category from the output of the feature extraction network, and determine the loss function used for that category when calculating the error loss, based on the relationship between the predicted probability value and the probability threshold hyperparameter. Since the fully connected layer of the feature extraction network constructed in step S30 outputs the predicted value of each face attribute category, this embodiment provides an implementation of step S40, shown in fig. 7, to determine that relationship, as follows:
Step S401: convert the predicted value of each face attribute category into a corresponding predicted probability. Each attribute has two or three output categories: for the gender attribute the two categories are male and female; likewise the glasses, expression and mask attributes each have two output categories, while the age attribute has the three categories child, young and old. During model training and recognition only one output category per attribute is confirmed correct, so a softmax function converts the predicted values of each attribute's categories into corresponding predicted probability values. In this embodiment the five attributes have 11 predicted values, and the softmax functions produce the corresponding 11 predicted probability values.
After the predicted probability values of all face attribute categories are obtained, the error loss between the predicted probability value of each category and its corresponding label value must be calculated to guide the optimization of the model. Because each face image contains multiple attributes and each attribute contains multiple categories, the face attribute data set is a multi-attribute data set, which is very prone to problems such as data imbalance, partially incorrect labels and low data-set quality; error loss computed with the existing cross-entropy loss function then depends excessively on the classes with few samples, causing overfitting. In view of this, this embodiment introduces, on the basis of the FocalLoss function, an equalization hyperparameter, a difficulty hyperparameter and a probability threshold hyperparameter θ to form a first loss function and a second loss function, and selects the first or second loss function according to the relationship between the predicted probability value and θ, which greatly alleviates the data imbalance, partial labeling errors and low quality of the multi-attribute data set. The specific steps are as follows:
Step S402: judge whether the predicted probability value of each face attribute category obtained in step S401 is smaller than the probability threshold hyperparameter θ. If the predicted probability value p_t of a face attribute category is smaller than θ, step S403 is performed: the first loss function, which contains θ, is selected to calculate the error loss between the predicted probability value p_t of that category and its label value, the loss weight being attenuated on the basis of θ. If step S402 finds that the predicted probability value of a face attribute category is greater than or equal to θ, the second loss function is selected to calculate the error loss of that category. The first loss function introduces the probability threshold hyperparameter θ on the basis of the second loss function; the two are expressed as follows:
$$
L(p_t) =
\begin{cases}
-\dfrac{p_t}{\theta}\,\alpha\,(1-p_t)^{\gamma}\,\log(p_t), & p_t < \theta \quad \text{(first loss function)} \\[6pt]
-\alpha\,(1-p_t)^{\gamma}\,\log(p_t), & p_t \ge \theta \quad \text{(second loss function)}
\end{cases}
$$

where L is the error loss; p_t is the predicted probability value of the face attribute category after the softmax function, 0 < p_t < 1; α and γ are hyperparameters, α being the equalization hyperparameter that balances positive and negative samples, 0 ≤ α ≤ 1, and γ the difficulty hyperparameter that balances simple and difficult samples, γ ≥ 0; θ is the probability threshold hyperparameter (the attenuation factor p_t/θ shown above is one concrete form of the weight attenuation described here). When the predicted probability value p_t of a face attribute category is smaller than θ, the loss weight is attenuated, effectively reducing the influence of the small number of erroneous samples on the model.
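A sketch of the loss selection of steps S402 and S403 follows, assuming PyTorch; the p_t/θ attenuation factor and the α, γ defaults are assumptions consistent with, but not fixed by, the description above.

```python
import torch

def face_attribute_loss(p_t: torch.Tensor, alpha: float = 0.25,
                        gamma: float = 2.0, theta: float = 0.5) -> torch.Tensor:
    """p_t: probability predicted for the true category of one attribute."""
    # Second loss function: the FocalLoss term.
    focal = -alpha * (1.0 - p_t) ** gamma * torch.log(p_t)
    # First loss function: attenuate the weight when p_t < theta, so a few
    # mislabeled samples cannot dominate the gradient (assumed p_t/theta form).
    attenuated = (p_t / theta) * focal
    return torch.where(p_t < theta, attenuated, focal).mean()
```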
An initial value is set for the probability threshold hyperparameter θ, and k denotes the number of times θ has been updated, with k = 0 initially. As the number of training iterations increases, θ is gradually increased as a function of k so as to progressively reduce the effect of mislabeled samples on the model. Specifically, θ is dynamically updated during training with a period of a preset number of epochs, i.e. full traversals of the face training image set; for example, after every 50 training epochs k increases by 1, and θ is dynamically adjusted accordingly.
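A sketch of this periodic update; the initial value, step size and ceiling are assumptions, and only the 50-epoch period and the monotone increase follow from the description above.

```python
UPDATE_PERIOD = 50     # epochs between threshold updates

def update_theta(epoch: int, theta0: float = 0.3, step: float = 0.05,
                 ceiling: float = 0.7) -> float:
    k = epoch // UPDATE_PERIOD          # number of completed update periods
    return min(theta0 + k * step, ceiling)
```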
Step S50: train the constructed feature extraction network according to the error loss, dynamically updating the probability threshold hyperparameter and the numbers of structure blocks and channels in the feature extraction network during training, to obtain the face attribute recognition model. Specifically, the loss errors calculated by the first and second loss functions in step S40 are fed to an Adam optimizer with an initial learning rate of 0.001 and a weight decay coefficient of 0.0005. The training parameters are set as: initial learning rate 0.001, batch size 128, and 1000 training epochs. The pre-training model constructed in step S30 is loaded, the face attribute data and label values of the face training image set from step S10 are used as input, and steps S20 to S50 are repeated to train the model until a face attribute recognition model meeting the requirements is obtained.
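Combining the pieces above, the optimization loop of step S50 might be sketched as follows, assuming PyTorch; train_set is an assumed dataset object yielding (image, labels) pairs with labels as a [S, 5] integer tensor, and the helper functions are the sketches given earlier.

```python
import torch

model = RegNetSkeleton()   # the feature extraction network built in step S30
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(1000):
    theta = update_theta(epoch)                    # dynamic threshold
    for images, labels in loader:                  # labels: [S, 5], long dtype
        logits = model(images)
        probs = per_attribute_probs(logits)
        # Gather the probability of the labelled category per attribute and
        # accumulate the thresholded focal loss over the five attributes.
        loss = sum(face_attribute_loss(p.gather(1, labels[:, i:i+1]).squeeze(1),
                                       theta=theta)
                   for i, p in enumerate(probs))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```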
The model constructed in step S30 of this embodiment is based on the RegNet network; exploiting the adjustability of the number of structure blocks (Blocks) and channels in the model, the number of blocks and channels in each stage and the probability threshold hyperparameter θ are dynamically adjusted during training to obtain the model with the best recognition accuracy. In this embodiment, each time θ is updated, the parameters of the feature extraction network at that moment are saved as a candidate recognition model; for example, θ is updated after every 50 training epochs and the feature extraction network parameters at that point are stored to form a candidate recognition model. A face image test set obtained in advance is input into the candidate recognition models, and the candidate with the best prediction accuracy on the test set is selected as the trained face attribute recognition model. The face test images in the test set are obtained with the acquisition and preprocessing of steps S10 and S20. However, the invention places no limit on how the optimal face attribute recognition model is determined; in other embodiments, when the numbers of structure blocks and channels of the underlying network cannot be adjusted, the optimal face attribute recognition model can be selected according to the convergence of the loss function and the test accuracy.
In one embodiment, after the face attribute recognition model is obtained through training, the face attribute recognition model can be used for face attribute recognition. Specifically, a face image to be recognized is acquired, and the face image to be recognized is input to the face attribute recognition model. The facial attribute recognition model determines a plurality of attributes in the facial image, such as attributes of different sexes, different glasses, different expressions, different ages, different masks and the like, by performing feature extraction and feature classification on the facial image to be recognized.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In an embodiment, as shown in fig. 8, the embodiment further provides a lightweight face attribute recognition model training device, including:
the acquisition module 10 acquires a face data set.
And the preprocessing module 20 is used for preprocessing the acquired face data set to form a face training image set, wherein each face training image in the face training image set carries corresponding face attribute data and a label value.
The network construction module 30 constructs a feature extraction network fusing input information, channel information and spatial position information based on the face attribute data to extract features of the face training images; the feature extraction network comprises a plurality of sequentially connected structure blocks; the input feature map of each structure block is transformed to generate a first feature map, and the first feature map is aggregated along two mutually perpendicular spatial dimensions to obtain channel information and spatial position information, respectively; the obtained channel information and spatial position information are embedded into the first feature map to form a second feature map; and the second feature map is fused with the input feature map of the structure block to form an output feature map that is a multi-dimensional tensor.
The loss function determining module 40 determines, from the output of the feature extraction network, the loss function each face attribute category uses when calculating the loss error; if the predicted probability value of a face attribute category is smaller than the probability threshold hyperparameter, a first loss function containing the probability threshold hyperparameter is selected to calculate the error loss between the predicted probability value of that category and the label value; otherwise a second loss function is selected to calculate the error loss of that category.
the training module 50 trains the constructed feature extraction network according to the error loss to obtain a face attribute recognition model; and dynamically updating the probability threshold value hyperparameter and the number of the structural blocks in the feature extraction network in the training process.
In an embodiment, the network construction module 30 further converts the output feature map of the last structure block in the feature extraction network to lower dimensionality: keeping the dimension that indexes the input images, the remaining dimensions are flattened into a one-dimensional vector and output to a fully connected layer, which outputs the predicted values for all face attribute categories.
In one embodiment, based on the types of face attributes in the face attribute data and the categories corresponding to each attribute, the loss function determining module 40 further obtains the predicted probability value of each face attribute category from the predicted values, output by the feature extraction network, that contain all face attribute categories.
In one embodiment, the training module 50 dynamically updates the probability threshold hyperparameters during the training process with a preset number of passes through the training set of face training images as a period.
In one embodiment, each time the probability threshold hyperparameter is updated, the training module 50 saves the feature extraction network parameters as a candidate recognition model; the test set is input into the candidate recognition models, and the candidate with the best prediction accuracy on the test set is selected as the trained face attribute recognition model.
For the specific limitation of the training apparatus for the lightweight face attribute recognition model, reference may be made to the above limitation on the training method for the lightweight face attribute recognition model, and details are not described here again. All modules in the lightweight face attribute recognition model training device can be completely or partially realized through software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In an embodiment, as shown in fig. 9, there is further provided a lightweight face attribute identification method, where the identification method includes:
step S100, a face image to be recognized is obtained, where the face image to be recognized may be an image obtained by shooting through a camera on a monitoring device or an electronic terminal.
And S200, correcting the face image to be recognized to a standard face posture based on the preprocessing step of the step S20 in the model training method.
And step S300, loading the face attribute recognition model obtained by training the lightweight face attribute recognition model training method, and inputting the preprocessed face image to be recognized to obtain a face attribute recognition result.
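A sketch of this recognition flow, reusing the preprocessing and model sketches above; camera_frame and detected_landmarks are assumed inputs from an upstream face detector.

```python
import torch

model.eval()
with torch.no_grad():
    face = align_face(camera_frame, detected_landmarks)       # step S200
    x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    probs = per_attribute_probs(model(x))                     # step S300
    result = {attr: classes[p.argmax(dim=1).item()]
              for (attr, classes), p in zip(ATTRIBUTE_CLASSES.items(), probs)}
# e.g. {'gender': 'male', 'glasses': 'no_glasses', ...}
```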
FIG. 10 is a diagram that illustrates an internal structure of the computer device in one embodiment. The computer device may specifically be the server 102 in fig. 1. As shown in fig. 10, the computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program, which, when executed by the processor, causes the processor to implement the lightweight face attribute recognition model training method. The internal memory may also store a computer program, and when the computer program is executed by the processor, the computer program may enable the processor to execute a lightweight face attribute recognition model training method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the lightweight face attribute recognition model training apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 10. The memory of the computer device may store various program modules constituting the lightweight face attribute recognition model training apparatus, such as the obtaining module 10, the preprocessing module 20, the network construction module 30, the loss function determination module 40 and the training module 50 shown in fig. 8. The program modules constitute computer programs that cause the processor to execute the steps of the lightweight face attribute recognition model training method of the embodiments of the present application described in the present specification.
In one embodiment, a computer device is provided, which includes a memory and a processor, the memory storing a computer program, which when executed by the processor, causes the processor to perform the steps of the above-mentioned lightweight face attribute recognition model training method. Here, the steps of the lightweight face attribute recognition model training method may be steps in the lightweight face attribute recognition model training methods of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, which stores a computer program, and when the computer program is executed by a processor, the computer program causes the processor to execute the steps of the training method for the lightweight face attribute recognition model. The steps of the training method for the lightweight face attribute recognition model may be steps in the training method for the lightweight face attribute recognition model in the foregoing embodiments.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the lightweight face attribute recognition method described above. Here, the steps of the lightweight face attribute identification method may be steps in the lightweight face attribute identification methods of the above embodiments.
In one embodiment, a computer readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the lightweight face attribute recognition method described above. Here, the steps of the lightweight face attribute identification method may be steps in the lightweight face attribute identification methods of the foregoing embodiments.
In summary, for the problems of face attribute algorithm models and their deployment, the lightweight face attribute recognition model training method and recognition method provided by the invention propose a feature extraction network based on RegNet fused with feature map spatial position information. After fusion, the feature extraction network considers both the relationships between feature map channels and the position information of the feature space. While effectively reducing the model's parameter count and computation, this solves the large quantization precision loss, model errors and low recognition accuracy caused by the conventional use of depthwise separable convolutions, effectively improving the performance of the face attribute algorithm on low-compute edge devices.
For the problems of face attribute data, the method improves FocalLoss with a combination of multiple hyperparameters. Optimizing on the basis of the FocalLoss loss, it adjusts the balance between samples and the weight of hard-to-classify samples, effectively handling the classification difficulty caused by imbalanced face attribute sample data and low face quality. The probability threshold hyperparameter is dynamically adjusted as the training epochs progress to reduce the influence of mislabeled samples on the model. This effectively resolves the complex data preprocessing, model overfitting and low recognition accuracy caused by imbalanced samples, low image quality and a small number of erroneous labels in the training set.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A training method for a lightweight face attribute recognition model is characterized by comprising the following steps:
preprocessing the acquired face data set to form a face training image set, wherein each face training image in the face training image set carries corresponding face attribute data and a label value;
constructing, based on the face attribute data, a feature extraction network fusing input information, channel information and spatial position information to extract features of the face training images; the feature extraction network comprises a plurality of sequentially connected structure blocks; an input feature map of each structure block is transformed to generate a first feature map, and the first feature map is aggregated along two mutually perpendicular spatial dimensions to obtain channel information and spatial position information respectively; the obtained channel information and spatial position information are embedded into the first feature map to form a second feature map; and the second feature map is fused with the input feature map of the structure block to form an output feature map of a multi-dimensional tensor;
determining, according to the output of the feature extraction network, the loss function selected for each face attribute category when calculating loss errors; if the predicted probability value of a face attribute category is smaller than a probability threshold hyperparameter, selecting a first loss function containing the probability threshold hyperparameter to calculate the error loss between the predicted probability value of the attribute category and the label value; otherwise, selecting a second loss function to calculate the error loss of the attribute category;
training the constructed feature extraction network according to the error loss to obtain a face attribute recognition model; wherein the probability threshold hyperparameter, the number of structure blocks and the number of channels in the feature extraction network are dynamically updated during the training process.
2. The training method of the lightweight face attribute recognition model according to claim 1, wherein the output feature map of the last structure block in the feature extraction network is subjected to dimensionality reduction: starting from the dimension in which the input feature map information is located, the remaining dimensions are flattened into a one-dimensional vector and output to a fully connected layer, and the fully connected layer outputs predicted values covering all face attribute categories.
3. The training method of the lightweight face attribute recognition model according to claim 2, wherein the predicted probability value of each face attribute category is obtained from the predicted values covering all face attribute categories output by the feature extraction network, based on the types of face attributes in the face attribute data and the categories corresponding to each attribute.
4. The training method of the lightweight face attribute recognition model according to claim 1, wherein the input feature map of each structure block is transformed into the first feature map after convolution, regularization and a nonlinear activation function.
5. The training method of the lightweight face attribute recognition model according to claim 1, wherein the first loss function introduces the probability threshold hyperparameter on the basis of the second loss function to attenuate the loss weight; the first loss function and the second loss function each comprise a balance hyperparameter for balancing positive and negative samples and a difficulty hyperparameter for balancing simple and hard samples, both of which are hyperparameters.
6. The training method of the lightweight face attribute recognition model according to claim 1, wherein the probability threshold hyperparameter is dynamically updated during the training process, with a preset number of rounds of traversing the face training image set as the update cycle.
7. The training method of the lightweight face attribute recognition model according to claim 6, wherein each time the probability threshold hyperparameter is updated, the parameters of the feature extraction network are saved as a candidate recognition model; and a test set is input into the plurality of candidate recognition models, and the candidate recognition model with the best prediction accuracy on the test set is selected as the trained face attribute recognition model.
8. A lightweight face attribute identification method is characterized by comprising the following steps:
acquiring a face image to be recognized;
carrying out face attribute recognition on the face image to be recognized by using the face attribute recognition model trained with the lightweight face attribute recognition model training method according to any one of claims 1 to 7, to obtain a recognition result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202211421512.6A 2022-11-15 2022-11-15 Lightweight face attribute recognition model training method, recognition method and device Active CN115565051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211421512.6A CN115565051B (en) 2022-11-15 2022-11-15 Lightweight face attribute recognition model training method, recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211421512.6A CN115565051B (en) 2022-11-15 2022-11-15 Lightweight face attribute recognition model training method, recognition method and device

Publications (2)

Publication Number Publication Date
CN115565051A CN115565051A (en) 2023-01-03
CN115565051B true CN115565051B (en) 2023-04-18

Family

ID=84769736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421512.6A Active CN115565051B (en) 2022-11-15 2022-11-15 Lightweight face attribute recognition model training method, recognition method and device

Country Status (1)

Country Link
CN (1) CN115565051B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368672A (en) * 2020-02-26 2020-07-03 苏州超云生命智能产业研究院有限公司 Construction method and device for genetic disease facial recognition model
WO2021068487A1 (en) * 2019-10-12 2021-04-15 深圳壹账通智能科技有限公司 Face recognition model construction method, apparatus, computer device, and storage medium
CN112766176A (en) * 2021-01-21 2021-05-07 深圳市安软科技股份有限公司 Training method of lightweight convolutional neural network and face attribute recognition method
CN113920571A (en) * 2021-11-06 2022-01-11 北京九州安华信息安全技术有限公司 Micro-expression identification method and device based on multi-motion feature fusion
CN114693963A (en) * 2021-12-15 2022-07-01 全球能源互联网研究院有限公司 Recognition model training and recognition method and device based on electric power data feature extraction
CN114821736A (en) * 2022-05-13 2022-07-29 中国人民解放军国防科技大学 Multi-modal face recognition method, device, equipment and medium based on contrast learning

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Bozhen Hu. A lightweight spatial and temporal multi-feature fusion network for defect detection. IEEE Transactions on Image Processing. 2020, full text. *
Michael Weber. Automated Focal Loss for Image based Object Detection. arXiv:1904.09048v1. 2019, full text. *
Jiang Kaiyong; Gan Junying; Tan Haiying. Face beauty prediction model based on deep learning and its application. Journal of Wuyi University (Natural Science Edition). 2018, (02), full text. *
Yin Qian. Face detection algorithm based on a lightweight neural network. Journal of Changzhou College of Information Technology. 2019, (06), full text. *
Li Ya; Zhang Yunan; Peng Cheng; Yang Junqin; Liu Miao. Face attribute recognition method based on multi-task learning. Computer Engineering. 2020, (03), full text. *
Ge Bailin. Research and application of object detection algorithms based on lightweight deep convolutional neural networks. CNKI Master's Electronic Journals. 2022, full text. *

Also Published As

Publication number Publication date
CN115565051A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
WO2019100724A1 (en) Method and device for training multi-label classification model
CN109271958B (en) Face age identification method and device
CN109840531A (en) The method and apparatus of training multi-tag disaggregated model
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN110222718B (en) Image processing method and device
CN109902192B (en) Remote sensing image retrieval method, system, equipment and medium based on unsupervised depth regression
CN110414541B (en) Method, apparatus, and computer-readable storage medium for identifying an object
CN111832581B (en) Lung feature recognition method and device, computer equipment and storage medium
US20230316733A1 (en) Video behavior recognition method and apparatus, and computer device and storage medium
CN113505797B (en) Model training method and device, computer equipment and storage medium
CN112699941B (en) Plant disease severity image classification method, device, equipment and storage medium
CN113221645B (en) Target model training method, face image generating method and related device
Dai et al. Hybrid deep model for human behavior understanding on industrial internet of video things
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN113705596A (en) Image recognition method and device, computer equipment and storage medium
CN113434699A (en) Pre-training method of BERT model, computer device and storage medium
CN114821736A (en) Multi-modal face recognition method, device, equipment and medium based on contrast learning
CN111275005A (en) Drawn face image recognition method, computer-readable storage medium and related device
CN109101984B (en) Image identification method and device based on convolutional neural network
CN115565051B (en) Lightweight face attribute recognition model training method, recognition method and device
CN116758379A (en) Image processing method, device, equipment and storage medium
CN113516182B (en) Visual question-answering model training and visual question-answering method and device
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant