CN115063862A - Age estimation method based on feature contrast loss

Age estimation method based on feature contrast loss

Info

Publication number
CN115063862A
Authority
CN
China
Prior art keywords: constructing, network, sub, window, head
Legal status
Granted
Application number
CN202210731136.4A
Other languages
Chinese (zh)
Other versions
CN115063862B (en)
Inventor
孟明明
张亮
潘力立
李宏亮
孟凡满
吴庆波
许林峰
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN202210731136.4A
Publication of CN115063862A
Application granted
Publication of CN115063862B
Current legal status: Active

Classifications

    • G06V 40/178 — Human faces: estimating age from face image; using age information for improving recognition
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 40/168 — Human faces: feature extraction; face representation

Abstract

The invention discloses an age estimation method based on feature contrast loss, and belongs to the field of computer vision. First, an attention mechanism is chosen as the basic building block of the feature extraction network, and an offset (shifted) window transformer network based on this attention mechanism serves as the backbone for extracting robust age features from face images. Then, a distance estimation network is designed to compute the relative distance between pairs of features; a feature-based contrast loss guides the feature space to preserve the ordering constraints of the label space, so that tail features can exploit information from head features. This improves prediction accuracy on tail data and alleviates the long-tail distribution problem in age estimation.

Description

Age estimation method based on feature contrast loss
Technical Field
The invention belongs to the field of machine learning and mainly relates to age estimation from face images; it primarily addresses the long-tail distribution problem in the age estimation task.
Background
The long-tail distribution phenomenon is widespread across data sets, and machine learning models that rely on such data for training are affected by it: the fitting error on tail data is far larger than on head data. For example, in face attribute analysis, existing age data sets show a pronounced long-tail distribution in their age labels: most samples are concentrated in the middle-age range, while only a few samples fall in the infant and elderly intervals. A deep model trained on such a long-tailed age data set gives accurate predictions for the middle-age group but large errors for infants and the elderly, which is an urgent problem in current long-tail regression analysis for age estimation.
Existing solutions to the long-tail problem fall into two categories: data-based methods and model-based methods. Data-based methods include re-sampling and re-weighting. Re-sampling can under-sample the head data or over-sample the tail data, but this may over-fit the tail data and fails to fully exploit the abundant head data. Re-weighting constructs different loss weights for samples according to the label distribution over the whole training set, usually assigning smaller weights to head data and larger weights to tail data; when the data volume is huge, re-weighting can make optimization unstable. Model-based methods include two-stage training, transfer learning, and so on. The two-stage approach first trains a feature extraction network on the original data set, then freezes its parameters and re-trains the prediction head with re-weighting; transfer learning models the head and tail data separately and transfers knowledge from the head data to the tail data. These methods apply to any long-tailed distribution, but they do not adequately account for the differences and connections between the long-tail regression task and the long-tail classification task. Reference: Zhou B, Cui Q, Wei X S, et al. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition [C], Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 9719-9728.
Unlike the long-tail classification task, where no ordering relation exists among class labels, long-tail regression tasks such as age estimation have an ordinal relation among the age labels. To make full use of this relation, the correlation of data between the label space and the feature space has been mined, and label-smoothing and feature-smoothing methods have been proposed for long-tail regression analysis: the label space and feature space are first smoothed so that adjacent labels can make full use of each other's features, and general long-tail techniques are then applied on top, ultimately reducing the error on tail data. This line of work starts from the two dimensions of label space and feature space and opens a new research direction for solving long-tail regression. Reference: Yang Y, Zha K, Chen Y, et al. Delving into deep imbalanced regression [C], International Conference on Machine Learning, PMLR, 2021: 11842-11851.
Aiming at the large errors on tail data in age estimation, the invention proposes an age estimation method based on feature contrast loss that improves the estimation accuracy on tail data.
Disclosure of Invention
The invention provides an age estimation method based on feature contrast loss, which addresses the large tail-data errors caused by long-tail distribution in the age estimation task.
The invention consists of three parts: a feature extraction network, a distance estimation network, and a prediction head. The feature extraction network adopts a Swin Transformer (offset window transformer) structure and extracts features from the face image. The distance estimation network accepts a pair of features as input and outputs the distance between them, which is used to compute the feature-based contrast loss. The prediction head receives a single feature as input and outputs the corresponding predicted value, which is used to compute the L2 loss between the true and predicted values. The training procedure first samples a batch and computes the corresponding features with the feature extraction network; then every pair of features is combined and fed to the distance estimation network to compute the feature-based contrast loss, while each individual feature is fed to the prediction head to compute the L2 loss; finally the two losses are combined and all parameters of the whole model are optimized simultaneously by back-propagation. In the test stage, a sample is passed through the feature extraction network to obtain its feature, and the prediction head then produces the predicted value. By introducing the feature contrast loss on top of the L2 loss, the features of head and tail data can correct each other and the order constraints between labels are preserved in the feature space, improving the model's fit on tail data. The overall structure of the method is shown in Figure 1.
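The following is a minimal PyTorch sketch of one training step under this scheme; the module and helper names (feature_net, distance_net, pred_head, feature_contrast_loss) are illustrative assumptions that mirror the sketches given later in the embodiment, not names fixed by the patent.

```python
import torch
import torch.nn.functional as F

def training_step(feature_net, distance_net, pred_head, optimizer, images, ages, lam=1.0):
    """One optimization step: L2 prediction loss plus lam * feature contrast loss."""
    feats = feature_net(images)                   # features from the offset-window backbone
    preds = pred_head(feats)                      # predicted ages, shape (N,)
    l2_loss = F.mse_loss(preds, ages)             # L2 loss between predicted and true values
    con_loss = feature_contrast_loss(feats, ages, distance_net)  # see the sketch under step 5
    loss = l2_loss + lam * con_loss               # combine the two losses
    optimizer.zero_grad()
    loss.backward()                               # back-propagate through all three parts at once
    optimizer.step()
    return loss.item()
```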
For convenience in describing the present disclosure, certain terms are first defined.
Definition 1: the softmax function. The softmax function normalizes a vector x so that each element lies in (0, 1) and all elements sum to 1 after normalization; the normalized value of the i-th element can be expressed as:
softmax(x)_i = exp(x_i) / Σ_{k=1}^{K} exp(x_k)
where K is the total number of elements of the vector x.
Definition 2: attention is paid to the mechanism. The attention mechanism is a method for transforming features, and usually requires that the features are mapped into query, key and value 3 modules, which are abbreviated as Q, K, V; then calculating the matching degree of the query and the key; and finally, carrying out weighted output with value, wherein the attention mechanism used by the invention can be expressed as: attention (Q, K, V) ═ softmax (QK) T ) V, the schematic structural diagram of which is shown in FIG. 2.
Definition 3: a multi-head attention mechanism. Using different mappings for characteristics to obtain a plurality of groups of different query, key and value modules, respectively calculating an attention mechanism for each group Q, K, V, and then cascading and linearly transforming the attention results of each group to realize a multi-head attention mechanism which can be expressed as MultiHead (Q, K, V) ═ Concat (head) 1 ,…,head h )W o Head therein h Attention results for group h are shown.
Definition 4: and (5) layer normalization. Layer Normalization (LN) is to normalize all neurons in a certain Layer, and to scale and translate after Normalization. The layer normalization can be expressed as:
Figure BDA0003713539740000031
where μ, σ denotes the mean and variance of all neurons in the layer, and γ, β denotes the scaling and translation parameters.
Definition 5: the GELU activation function. The expression of the GELU activation function is GELU (x) ═ x × Φ (x), where Φ (x) is the cumulative distribution function of the standard gaussian distribution.
Definition 6: the ReLU function. The ReLU function expression is ReLU (x) max (0, x).
Definition 7: the scatter function. The Flatten function transforms the shape of the tensor, and expands the high-dimensional tensor into a one-dimensional vector.
Therefore, the technical scheme of the invention is an age estimation method based on feature contrast loss, which comprises the following steps:
Step 1: preprocessing the data set;
First, acquire an image data set for age estimation, perform face alignment on the images, and normalize them to [-1, 1]; then randomly split the data into a training set and a test set; apply random cropping and random mirror flipping to the training images, while the test images are only center-cropped to the same size as the training images;
Step 2: constructing a feature extraction network;
1) constructing an area embedding unit;
First, divide the image into sub-regions: partition the image obtained in step 1 into a number of a × a sub-regions, then apply an a × a convolution kernel with stride a to the partitioned image, and normalize the result with layer normalization;
2) constructing window division;
On the basis of the sub-regions, two forms of window division are carried out, denoted division mode one and division mode two. Division mode one selects adjacent b × b sub-regions as one window and partitions the whole image accordingly; division mode two translates the windows of division mode one by half a window size to the right and downward, with the top-left windows cyclically shifted;
3) constructing a multilayer perceptron module;
The multilayer perceptron module consists of two fully connected layers; the first fully connected layer is followed by a GELU activation, and the second uses no activation function;
4) constructing an offset window module;
The offset window module consists of two consecutive multi-head attention modules: one computes multi-head attention under division mode one and is denoted W-MSA; the other computes multi-head attention under division mode two and is denoted SW-MSA;
5) constructing a sub-region merging module;
The sub-region merging module is a down-sampling module that merges adjacent a × a sub-regions into one sub-region while enlarging the channel dimension of the features by a factor of a;
6) constructing a feature extraction network;
The input image is first passed through the region embedding unit to obtain features; the features are then transformed and extracted by several offset window modules, each followed by a sub-region merging module that down-samples the features; the offset window modules and sub-region merging modules are stacked repeatedly to build the offset window transformer network; finally the output features are flattened and linearly transformed;
Step 3: constructing a distance estimation network;
The distance estimation network consists of 3 fully connected layers; the outputs of the first two fully connected layers are activated with ReLU, and the last fully connected layer has no activation;
Step 4: constructing a prediction head;
The prediction head consists of 2 fully connected layers; the output of the first fully connected layer is activated with softmax, and the second fully connected layer has no activation;
Step 5: determining the loss function;
1) constructing the feature contrast loss;
First construct image-label sample pairs {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)} from the training set of step 1, where x_n denotes an image, y_n its label, and N the total number of samples; then feed the images of the sample pairs into the feature extraction network to obtain the corresponding features (f_1, …, f_N). Denoting the distance estimation network DE, the feature contrast loss is:
L_con = (1 / N^2) Σ_{i=1}^{N} Σ_{j=1}^{N} | DE(f_i, f_j) - |y_i - y_j| |
2) constructing a prediction loss;
Denoting the prediction head G, the prediction loss L_pred is calculated as:
L_pred = (1 / N) Σ_{i=1}^{N} (G(f_i) - y_i)^2
where G(f_i) denotes the prediction for feature f_i;
3) constructing a total loss function;
The total loss function is the weighted sum of the feature contrast loss and the prediction loss, with weight coefficient λ:
L = L_pred + λ · L_con
Step 6: training network parameters;
performing network training by using the total loss function constructed in the step 5, and updating parameters of the feature extraction network, the distance estimation network and the prediction head;
Step 7: testing stage; select the feature extraction network and the prediction head trained in step 6; for a given picture, first input it into the feature extraction network to extract the feature, then input the feature into the prediction head to obtain the predicted age.
The innovations of the invention are:
1) Features are extracted from the face image with an attention-based offset (shifted) window transformer network structure, yielding a more robust feature representation;
2) A feature-based contrast loss function is proposed: a distance estimation module computes the distance between two features, and the contrast loss constrains the feature space to preserve the order relation of the label space, so that tail-data features can obtain information from head-data features, improving the accuracy of age estimation on tail data.
Drawings
FIG. 1 is a schematic diagram of the network architecture of the method of the present invention;
FIG. 2 is a schematic view of the attention mechanism of the present invention;
FIG. 3 is a schematic diagram of a region embedding unit according to the present invention;
FIG. 4 is a schematic diagram of window division according to the present invention;
FIG. 5 is a schematic diagram of a multi-layered perceptron of the present invention;
FIG. 6 is a schematic diagram of an offset window according to the present invention;
FIG. 7 is a schematic diagram of a feature extraction network according to the present invention;
FIG. 8 is a schematic diagram of a distance estimation network according to the present invention;
FIG. 9 is a schematic diagram of the prediction head according to the present invention.
The specific embodiments are as follows:
Step 1: preprocessing the data set;
Acquire the MORPH II data set, an age estimation data set containing 55,134 images. First perform face alignment on the images and normalize them to [-1, 1]; then randomly select 80% of the data as the training set and the remaining 20% as the test set. The training images are randomly cropped to 224 × 224 and randomly mirror-flipped, while the test images are only center-cropped, also to 224 × 224.
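A possible torchvision realization of this preprocessing, assuming the face-aligned images are at least 224 × 224; the per-channel mean/std of 0.5 is one way to map pixel values to [-1, 1].

```python
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # [0,1] -> [-1,1]

train_transform = transforms.Compose([
    transforms.RandomCrop(224),          # random crop to 224 x 224
    transforms.RandomHorizontalFlip(),   # random mirror flip
    transforms.ToTensor(),
    normalize,
])
test_transform = transforms.Compose([
    transforms.CenterCrop(224),          # center crop only, same 224 x 224 size
    transforms.ToTensor(),
    normalize,
])
```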
Step 2: constructing the feature extraction network;
1) Constructing the region embedding unit (Patch Embedding). The image is first divided into sub-regions: the 224 × 224 image is partitioned into 56 × 56 sub-regions of size 4 × 4; a 4 × 4 convolution kernel with stride 4 is then applied to the partitioned image, and the result is normalized with layer normalization. The structure of the region embedding unit is shown in fig. 3.
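A minimal sketch of such a region embedding unit; the embedding dimension of 96 is an assumption, not a value fixed by the patent.

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """4x4 convolution with stride 4 over the image, followed by layer normalization."""
    def __init__(self, in_channels=3, embed_dim=96):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=4, stride=4)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, embed_dim, 56, 56)
        x = x.flatten(2).transpose(1, 2)     # (B, 56*56, embed_dim): one token per sub-region
        return self.norm(x)
```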
2) Constructing the window division. On the basis of the sub-regions, two forms of window division are carried out, denoted division mode one and division mode two; division mode two translates the windows of division mode one by half a window size to the right and downward, with the top-left windows cyclically shifted. Division modes one and two correspond to the left and right sub-images of fig. 4, where thin boxes denote sub-regions, thick boxes denote windows, and the numbers denote sub-region labels.
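A sketch of the two window divisions on a (B, H, W, C) feature map; realizing the half-window translation and cyclic shift with torch.roll is a common implementation choice, not one mandated by the text.

```python
import torch

def window_partition(x, window_size):
    """Division mode one: split (B, H, W, C) into non-overlapping windows of window_size x window_size sub-regions."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

def shifted_window_partition(x, window_size):
    """Division mode two: cyclically shift the map by half a window, then partition as above."""
    shift = window_size // 2
    shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    return window_partition(shifted, window_size)
```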
3) Constructing the multilayer perceptron module. The multilayer perceptron module (MLP) consists of two fully connected layers; the first fully connected layer is followed by a GELU activation, and the second uses no activation function. The structure of the multilayer perceptron module is shown in fig. 5.
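A direct sketch of this module; the hidden width (4× expansion) is an assumption borrowed from common transformer MLPs, as the text does not fix it.

```python
import torch.nn as nn

class MLP(nn.Module):
    """Two fully connected layers; GELU after the first, no activation after the second."""
    def __init__(self, dim, hidden_dim=None):
        super().__init__()
        hidden_dim = hidden_dim or 4 * dim
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```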
4) Constructing the offset window module. The offset window module consists of two consecutive multi-head attention blocks that differ only in the windows used for the attention computation, i.e. the two partitions given by division modes one and two; the multi-head attention computed under division modes one and two is denoted W-MSA and SW-MSA, respectively. Before each multi-head attention the module applies layer normalization, and the attention output is again layer-normalized and passed through a nonlinear transformation. The structure of the offset window module is shown in fig. 6.
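A simplified, self-contained sketch of one offset window module built from standard PyTorch attention; window partitioning/un-partitioning and the attention mask for the shifted windows are omitted for brevity, so the block assumes its input already holds the tokens of one window.

```python
import torch.nn as nn

class OffsetWindowBlock(nn.Module):
    """W-MSA then SW-MSA, each preceded by layer normalization and followed by a normalized MLP, with residual connections."""
    def __init__(self, dim, num_heads, mlp_ratio=4):
        super().__init__()
        mlp = lambda: nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                    nn.Linear(dim * mlp_ratio, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.w_msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)   # division mode one
        self.norm2, self.mlp1 = nn.LayerNorm(dim), mlp()
        self.norm3 = nn.LayerNorm(dim)
        self.sw_msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # division mode two
        self.norm4, self.mlp2 = nn.LayerNorm(dim), mlp()

    def forward(self, x):                                       # x: (B, tokens, dim)
        h = self.norm1(x)
        x = x + self.w_msa(h, h, h, need_weights=False)[0]      # W-MSA with residual connection
        x = x + self.mlp1(self.norm2(x))
        h = self.norm3(x)
        x = x + self.sw_msa(h, h, h, need_weights=False)[0]     # SW-MSA with residual connection
        x = x + self.mlp2(self.norm4(x))
        return x
```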
5) Constructing the sub-region merging module. The sub-region merging module (Patch Merge) is a down-sampling module that merges adjacent 2 × 2 sub-regions into one sub-region while doubling the channel dimension of the features.
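A sketch of the 2 × 2 merging; concatenating the four neighbours and then applying a linear reduction so that the channels double (rather than quadruple) is the usual Swin-style realization and is an assumption here.

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Merge adjacent 2x2 sub-regions into one and double the channel dimension."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)   # 4C concatenated channels -> 2C

    def forward(self, x, H, W):                 # x: (B, H*W, C)
        B, _, C = x.shape
        x = x.view(B, H, W, C)
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)   # (B, H/2, W/2, 4C)
        x = x.view(B, (H // 2) * (W // 2), 4 * C)
        return self.reduction(self.norm(x))     # (B, H/2 * W/2, 2C)
```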
6) Constructing the feature extraction network. The offset window transformer network first passes the input image through the region embedding unit to obtain features; the features are then transformed and extracted by several offset window modules, each followed by a sub-region merging module that down-samples the features; the offset window modules and sub-region merging modules are stacked repeatedly to form the offset window transformer network; finally the output features are flattened and linearly transformed. The structure of the feature extraction network is shown in fig. 7.
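A backbone assembly sketch that reuses the PatchEmbedding, OffsetWindowBlock and PatchMerging sketches above; the number of stages, the head counts and the 512-dimensional output feature are illustrative assumptions.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Region embedding -> offset window blocks and sub-region merging -> flatten -> linear projection."""
    def __init__(self, embed_dim=96, feat_dim=512):
        super().__init__()
        self.embed = PatchEmbedding(embed_dim=embed_dim)            # 224x224 image -> 56x56 tokens
        self.stage1 = OffsetWindowBlock(embed_dim, num_heads=3)
        self.merge = PatchMerging(embed_dim)                        # 56x56 -> 28x28, channels x2
        self.stage2 = OffsetWindowBlock(embed_dim * 2, num_heads=6)
        self.proj = nn.Linear(28 * 28 * embed_dim * 2, feat_dim)    # flatten, then linear transform

    def forward(self, x):                  # x: (B, 3, 224, 224)
        x = self.embed(x)                  # (B, 3136, 96)
        x = self.stage1(x)
        x = self.merge(x, 56, 56)          # (B, 784, 192)
        x = self.stage2(x)
        x = x.flatten(1)                   # expand the output features into a vector
        return self.proj(x)                # (B, feat_dim)
```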
Step 3: constructing the distance estimation network. The distance estimation network consists of 3 fully connected layers; the outputs of the first two fully connected layers are activated with ReLU, and the last fully connected layer has no activation. The structure of the distance estimation network is shown in fig. 8.
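A minimal sketch of such a distance estimation network; feeding it the concatenation of the two features and the 256-unit hidden width are assumptions (the text only fixes the layer count and activations).

```python
import torch.nn as nn

class DistanceEstimator(nn.Module):
    """Three fully connected layers; ReLU after the first two, no activation on the last."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, pair):               # pair: (M, 2 * feat_dim) concatenated feature pairs
        return self.net(pair).squeeze(-1)  # estimated distance DE(f_i, f_j) per pair
```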
Step 4: constructing the prediction head. The prediction head consists of 2 fully connected layers; the output of the first fully connected layer is activated with softmax, and the second fully connected layer has no activation. The structure of the prediction head is shown in fig. 9.
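A minimal sketch of such a prediction head; the 101-unit first layer (e.g. one unit per candidate age from 0 to 100) is an assumption, as the patent does not fix the layer widths.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Two fully connected layers; softmax after the first, no activation on the second."""
    def __init__(self, feat_dim=512, hidden_dim=101):
        super().__init__()
        self.fc1 = nn.Linear(feat_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 1)

    def forward(self, f):                         # f: (B, feat_dim)
        p = torch.softmax(self.fc1(f), dim=-1)    # softmax over the first layer's output
        return self.fc2(p).squeeze(-1)            # scalar age prediction per sample
```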
Step 5: designing the loss function;
1) Constructing the feature contrast loss. First construct image-label pairs {(x_1, y_1), …, (x_N, y_N)} from the training set in step 1, then feed the images of the sample pairs into the feature extraction network to obtain the corresponding features (f_1, …, f_N). Denoting the distance estimation network DE, the feature contrast loss is:
L_con = (1 / N^2) Σ_{i=1}^{N} Σ_{j=1}^{N} | DE(f_i, f_j) - |y_i - y_j| |
The feature contrast loss involves two distances: DE(f_i, f_j) is the estimated distance between features f_i and f_j, and |y_i - y_j| is the L1 distance between the corresponding labels. The feature distance is optimized toward the label distance, so that information is exchanged between the features and the order constraint of the label space is preserved.
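A sketch of this loss under the reconstructed formula above, assuming the distance network takes a concatenated feature pair (as in the DistanceEstimator sketch) and that the loss is averaged over all N^2 pairs.

```python
import torch

def feature_contrast_loss(feats, ages, distance_net):
    """Mean over all (i, j) pairs of | DE(f_i, f_j) - |y_i - y_j| |."""
    N, D = feats.shape
    fi = feats.unsqueeze(1).expand(N, N, D)             # f_i broadcast over j
    fj = feats.unsqueeze(0).expand(N, N, D)             # f_j broadcast over i
    pairs = torch.cat([fi, fj], dim=-1).reshape(N * N, 2 * D)
    feat_dist = distance_net(pairs).reshape(N, N)       # DE(f_i, f_j)
    label_dist = (ages.unsqueeze(1) - ages.unsqueeze(0)).abs()   # |y_i - y_j|
    return (feat_dist - label_dist).abs().mean()
```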
2) Constructing the prediction loss. Denoting the prediction head G, the prediction loss is calculated as:
L_pred = (1 / N) Σ_{i=1}^{N} (G(f_i) - y_i)^2
3) Constructing the total loss function. The total loss function is the weighted sum of the feature contrast loss and the prediction loss, with weight coefficient λ:
L = L_pred + λ · L_con
Step 6: training network parameters. Perform network training with the total loss function constructed in step 5, updating the parameters of the feature extraction network, the distance estimation network, and the prediction head.
and 7: and in the testing stage, the trained feature extraction network and the prediction head in the step 6 are selected, for a given picture, the feature extraction network is firstly input to extract features, and then the features are input into the prediction head to obtain the predicted age.

Claims (1)

1. An age estimation method based on feature contrast loss, the method comprising:
step 1: preprocessing the data set;
first, acquire an image data set for age estimation, perform face alignment on the images, and normalize them to [-1, 1]; then randomly split the data into a training set and a test set; apply random cropping and random mirror flipping to the training images, while the test images are only center-cropped to the same size as the training images;
step 2: constructing a feature extraction network;
1) constructing an area embedding unit;
first, divide the image into sub-regions: partition the image obtained in step 1 into a number of a × a sub-regions, then apply an a × a convolution kernel with stride a to the partitioned image, and normalize the result with layer normalization;
2) constructing window division;
on the basis of the sub-regions, two forms of window division are carried out, denoted division mode one and division mode two; division mode one selects adjacent b × b sub-regions as one window and partitions the whole image accordingly; division mode two translates the windows of division mode one by half a window size to the right and downward, with the top-left windows cyclically shifted;
3) constructing a multilayer perceptron module;
the multilayer perceptron module consists of two fully connected layers; the first fully connected layer is followed by a GELU activation, and the second uses no activation function;
4) constructing an offset window module;
the offset window module consists of two consecutive multi-head attention modules: one computes multi-head attention under division mode one and is denoted W-MSA; the other computes multi-head attention under division mode two and is denoted SW-MSA;
5) constructing a subregion merging module;
the sub-region merging module is a down-sampling module that merges adjacent a × a sub-regions into one sub-region while enlarging the channel dimension of the features by a factor of a;
6) constructing a feature extraction network;
the input image is first passed through the region embedding unit to obtain features; the features are then transformed and extracted by several offset window modules, each followed by a sub-region merging module that down-samples the features; the offset window modules and sub-region merging modules are stacked repeatedly to build the offset window transformer network; finally the output features are flattened and linearly transformed;
step 3: constructing a distance estimation network;
the distance estimation network consists of 3 fully connected layers; the outputs of the first two fully connected layers are activated with ReLU, and the last fully connected layer has no activation;
step 4: constructing a prediction head;
the prediction head consists of 2 fully connected layers; the output of the first fully connected layer is activated with softmax, and the second fully connected layer has no activation;
step 5: determining the loss function;
1) constructing the feature contrast loss;
first construct image-label sample pairs {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)} from the training set of step 1, where x_n denotes an image, y_n its label, and N the total number of samples; then feed the images of the sample pairs into the feature extraction network to obtain the corresponding features (f_1, …, f_N); denoting the distance estimation network DE, the feature contrast loss is:
L_con = (1 / N^2) Σ_{i=1}^{N} Σ_{j=1}^{N} | DE(f_i, f_j) - |y_i - y_j| |
2) constructing a prediction loss;
denoting the prediction head G, the prediction loss L_pred is calculated as:
L_pred = (1 / N) Σ_{i=1}^{N} (G(f_i) - y_i)^2
where G(f_i) denotes the prediction for feature f_i;
3) constructing a total loss function;
the total loss function is the weighted sum of the feature contrast loss and the prediction loss, with weight coefficient λ:
L = L_pred + λ · L_con
step 6: training network parameters;
performing network training by using the total loss function constructed in the step 5, and updating parameters of the feature extraction network, the distance estimation network and the prediction head;
step 7: testing stage; select the feature extraction network and the prediction head trained in step 6; for a given picture, first input it into the feature extraction network to extract the feature, then input the feature into the prediction head to obtain the predicted age.
CN202210731136.4A — filed 2022-06-24, priority 2022-06-24 — Age estimation method based on feature contrast loss — Active — granted as CN115063862B

Priority Applications (1)

CN202210731136.4A — priority/filing date 2022-06-24 — Age estimation method based on feature contrast loss — granted as CN115063862B

Publications (2)

CN115063862A (application publication) — 2022-09-16
CN115063862B (granted publication) — 2024-04-23

Family

ID=83202233


Country Status (1)

Country Link
CN (1) CN115063862B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140300758A1 (en) * 2013-04-04 2014-10-09 Bao Tran Video processing systems and methods
US20170351905A1 (en) * 2016-06-06 2017-12-07 Samsung Electronics Co., Ltd. Learning model for salient facial region detection
CN108171209A (en) * 2018-01-18 2018-06-15 中科视拓(北京)科技有限公司 A kind of face age estimation method that metric learning is carried out based on convolutional neural networks
CN112950631A (en) * 2021-04-13 2021-06-11 西安交通大学口腔医院 Age estimation method based on saliency map constraint and X-ray head skull positioning lateral image
CN114038055A (en) * 2021-10-27 2022-02-11 电子科技大学长三角研究院(衢州) Image generation method based on contrast learning and generation countermeasure network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HONGYU PAN et al.: "Revised Contrastive Loss for Robust Age Estimation from Face", 2018 24th International Conference on Pattern Recognition (ICPR), 29 November 2018 (2018-11-29) *
LILI PAN, MINGMING MENG, YAZHOU REN, YALI ZHENG, ZENGLIN XU: "Self-Paced Deep Regression Forests with Consideration of Ranking Fairness", Computer Vision and Pattern Recognition, 11 June 2022 (2022-06-11) *
孟明明: "Research on deep discriminative models for face attribute analysis" (面向人脸面部属性分析的深度判别模型研究), China Master's Theses Full-text Database, Information Science and Technology, 15 January 2023 (2023-01-15)
李大湘; 马宣; 任娅琼; 刘颖: "Age estimation algorithm based on deep cost-sensitive CNN" (基于深度代价敏感CNN的年龄估计算法), Pattern Recognition and Artificial Intelligence (模式识别与人工智能), no. 02, 15 February 2020 (2020-02-15)

Also Published As

Publication number Publication date
CN115063862B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
CN113673489A (en) Video group behavior identification method based on cascade Transformer
CN111738363B (en) Alzheimer disease classification method based on improved 3D CNN network
CN113010656B (en) Visual question-answering method based on multi-mode fusion and structural control
CN115731441A (en) Target detection and attitude estimation method based on data cross-modal transfer learning
Liu et al. Iterative relaxed collaborative representation with adaptive weights learning for noise robust face hallucination
CN115223082A (en) Aerial video classification method based on space-time multi-scale transform
CN112651316A (en) Two-dimensional and three-dimensional multi-person attitude estimation system and method
CN112132878A (en) End-to-end brain nuclear magnetic resonance image registration method based on convolutional neural network
CN114241191A (en) Cross-modal self-attention-based non-candidate-box expression understanding method
CN117574904A (en) Named entity recognition method based on contrast learning and multi-modal semantic interaction
CN114723787A (en) Optical flow calculation method and system
CN116519106B (en) Method, device, storage medium and equipment for determining weight of live pigs
CN115063862A (en) Age estimation method based on feature contrast loss
CN117409279A (en) Multi-mode information fusion key method based on data privacy protection
CN116386079A (en) Domain generalization pedestrian re-recognition method and system based on meta-graph perception
CN115861384A (en) Optical flow estimation method and system based on generation of countermeasure and attention mechanism
CN115662565A (en) Medical image report generation method and equipment integrating label information
CN116012903A (en) Automatic labeling method and system for facial expressions
CN114170460A (en) Multi-mode fusion-based artwork classification method and system
CN114155406A (en) Pose estimation method based on region-level feature fusion
CN113313210A (en) Method and apparatus for data processing
Burugupalli Image classification using transfer learning and convolution neural networks
CN111626923B (en) Image conversion method based on novel attention model
CN117392392B (en) Rubber cutting line identification and generation method
Chatterjee Adaptive machine learning algorithms with python

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant