CN115063862A - Age estimation method based on feature contrast loss - Google Patents
- Publication number
- CN115063862A (Application No. CN202210731136.4A)
- Authority
- CN
- China
- Prior art keywords
- constructing
- network
- sub
- window
- head
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/178—Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The invention discloses an age estimation method based on feature contrast loss, belonging to the field of computer vision. First, an attention mechanism is selected as the basic structure of the feature extraction network, and an offset window transformation network built on the attention mechanism serves as the backbone for extracting robust age features from face images. Then a distance estimation network that computes relative distances between features is designed; a feature-based contrast loss guides the feature space to preserve the ordinal constraint relation of the label space, so that tail features can exploit the information of head features. This improves the prediction accuracy on tail data and alleviates the long-tail distribution problem in age estimation.
Description
Technical Field
The invention belongs to the field of machine learning and relates to age estimation from face images; it mainly addresses the long-tail distribution problem in the age estimation task.
Background
The long-tail distribution phenomenon is widespread in real-world data sets, and machine learning models that rely on such data for training are often affected by it: the model's fitting error on tail data is far larger than on head data. For example, in face attribute analysis, the age distribution of existing age data sets is markedly long-tailed: a large amount of data falls in the middle-age range, while only a few samples exist in the infant and elderly age ranges. A deep model trained on such a long-tailed age data set gives accurate predictions for the middle-age group but has large errors for infants and the elderly, which is an urgent problem in current long-tail age regression.
Existing solutions to the long-tail distribution problem fall into two categories: data-based methods and model-based methods. Data-based methods include resampling and re-weighting. Resampling can be realized by undersampling the head data or oversampling the tail data, but this may cause overfitting on the tail data and fail to fully exploit the large amount of head data. Re-weighting constructs different loss weights for samples according to the label distribution over the whole training set, usually assigning smaller weights to head data and larger weights to tail data; when the data volume is huge, re-weighting may make optimization unstable. Model-based methods include two-stage methods, transfer learning, etc. A two-stage method first trains a feature extraction network on the actual data set, then freezes its parameters and retrains the prediction head with a re-weighting method; transfer learning models the head data and tail data separately so that knowledge of the head data is transferred to the tail data. These methods apply to any long-tail distribution, but they do not adequately account for the differences and connections between the long-tail regression task and the long-tail classification task. Reference: Zhou B, Cui Q, Wei X S, et al. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition [C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 9719-9728.
Unlike the long-tail classification task, where no constraint relation exists among the class labels, a long-tail regression task such as age estimation has an ordinal relation among the age labels. To exploit this relation, a label-smoothing and feature-smoothing method was proposed for long-tail regression analysis: it mines the correlations of data between the label space and the feature space, first performing smooth transformations on both spaces so that adjacent labels can make full use of each other's features, and then combining general long-tail methods on this basis, finally reducing the error on tail data. The method starts from the two dimensions of label space and feature space and offers a new research direction for solving long-tail regression. Reference: Yang Y, Zha K, Chen Y, et al. Delving into deep imbalanced regression [C]. International Conference on Machine Learning, PMLR, 2021: 11842-11851.
Aiming at the large error on tail data in age estimation, the invention proposes an age estimation method based on feature contrast loss that improves the estimation accuracy on tail data.
Disclosure of Invention
The invention provides an age estimation method based on feature contrast loss to address the large tail-data error caused by long-tail distribution in the age estimation task.
The invention consists of three parts: a feature extraction network, a distance estimation network, and a prediction head. The feature extraction network adopts a Swin Transformer structure (an offset window transformation network) and extracts features from face images; the distance estimation network accepts a pair of features as input and outputs the distance between them, which is used to compute the feature-based contrast loss; the prediction head receives a single feature as input and outputs the corresponding predicted value, which is used to compute the L2 loss between the true value and the predicted value. The training process first samples a batch and computes each sample's features through the feature extraction network; then every two features are paired and the pairs are input into the distance estimation network to compute the feature-based contrast loss, while each single feature is input into the prediction head to compute the L2 loss; finally, the feature-based contrast loss and the L2 loss are combined, and all parameters of the whole model are optimized simultaneously through backpropagation. In the testing stage, a sample first passes through the feature extraction network to obtain its features, and the prediction head then produces the predicted value from the features. In this way, the feature contrast loss introduced on top of the L2 loss lets the features of head and tail data correct each other and preserves the ordinal constraint relation between labels in the feature space, improving the model's fitting ability on tail data. The overall structure of the method is shown in FIG. 1.
For convenience in describing the present disclosure, certain terms are first defined.
Definition 1: softmax function. The softmax function normalizes a vector x so that each element lies in the range (0, 1) and all elements sum to 1 after normalization. The normalized value of the i-th element can be expressed as softmax(x)_i = exp(x_i) / Σ_{k=1}^{K} exp(x_k), where K is the total number of elements of the vector x.
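Definition 1 can be checked numerically; the following is a generic NumPy illustration of the softmax function, not code from the patent:

```python
import numpy as np

def softmax(x):
    # Subtracting the max leaves the result unchanged but avoids overflow.
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
```

Every element of `probs` lies in (0, 1) and the elements sum to 1, as the definition requires.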
Definition 2: attention is paid to the mechanism. The attention mechanism is a method for transforming features, and usually requires that the features are mapped into query, key and value 3 modules, which are abbreviated as Q, K, V; then calculating the matching degree of the query and the key; and finally, carrying out weighted output with value, wherein the attention mechanism used by the invention can be expressed as: attention (Q, K, V) ═ softmax (QK) T ) V, the schematic structural diagram of which is shown in FIG. 2.
Definition 3: a multi-head attention mechanism. Using different mappings for characteristics to obtain a plurality of groups of different query, key and value modules, respectively calculating an attention mechanism for each group Q, K, V, and then cascading and linearly transforming the attention results of each group to realize a multi-head attention mechanism which can be expressed as MultiHead (Q, K, V) ═ Concat (head) 1 ,…,head h )W o Head therein h Attention results for group h are shown.
Definition 4: and (5) layer normalization. Layer Normalization (LN) is to normalize all neurons in a certain Layer, and to scale and translate after Normalization. The layer normalization can be expressed as:where μ, σ denotes the mean and variance of all neurons in the layer, and γ, β denotes the scaling and translation parameters.
Definition 5: the GELU activation function. The expression of the GELU activation function is GELU (x) ═ x × Φ (x), where Φ (x) is the cumulative distribution function of the standard gaussian distribution.
Definition 6: the ReLU function. The ReLU function expression is ReLU (x) max (0, x).
Definition 7: the scatter function. The Flatten function transforms the shape of the tensor, and expands the high-dimensional tensor into a one-dimensional vector.
Therefore, the technical scheme of the invention is an age estimation method based on feature contrast loss, which comprises the following steps:
step 1: preprocessing the data set;
firstly, acquire an image data set for age estimation, perform face alignment on the images, and normalize them to [-1, 1]; then randomly divide the data into a training set and a test set; the images in the training set are randomly cropped and randomly mirror-flipped, while the images in the test set are only center-cropped, to the same size as the training images;
Step 2: constructing a feature extraction network;
1) constructing an area embedding unit;
firstly, the image is divided into sub-regions: the image obtained in step 1 is divided into a number of a×a sub-regions; each sub-region is then convolved with an a×a convolution kernel at stride a, and the result is normalized with layer normalization;
2) constructing window division;
on the basis of the sub-regions, two forms of window division are performed, using division mode one and division mode two; division mode one selects adjacent b×b sub-regions as one window and partitions the whole image accordingly; division mode two translates the windows by half a window size to the right and downward on the basis of division mode one and cyclically shifts the top-left window;
3) constructing a multilayer perceptron module;
the multilayer perceptron module consists of two fully connected layers; the first fully connected layer is followed by GELU activation, and the second uses no activation function;
4) constructing an offset window module;
the offset window module consists of two consecutive multi-head attention modules: one computes multi-head attention under division mode one and is denoted W-MSA; the other computes multi-head attention under division mode two and is denoted SW-MSA;
5) constructing a sub-region merging module;
the sub-region merging module is a down-sampling module that merges adjacent a×a sub-regions into one sub-region while enlarging the channel dimension of the features by a factor of a;
6) constructing a feature extraction network;
firstly, the input image is fed into the region embedding unit to obtain features; then the features are transformed and extracted by several offset window modules, followed by a sub-region merging module that reduces the feature dimensions; the offset window modules and sub-region merging modules are stacked repeatedly to construct the offset window transformation network; finally, the output features are flattened and linearly transformed;
Step 3: constructing a distance estimation network;
the distance estimation network consists of 3 fully connected layers; the first two fully connected layers are followed by ReLU activation, and the last has no activation;
Step 4: constructing a prediction head;
the prediction head consists of 2 fully connected layers; the first fully connected layer is followed by softmax activation, and the second has no activation;
Step 5: determining the loss function;
1) constructing the feature contrast loss;
first, construct image-label sample pairs {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)} from the training set of step 1, where x_n denotes an image, y_n denotes its label, and N denotes the total number of samples; then input the images of the sample pairs into the feature extraction network to obtain the corresponding features (f_1, …, f_N). With the distance estimation network denoted DE, the feature contrast loss is:

L_con = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} | DE(f_i, f_j) − |y_i − y_j| |
2) constructing a prediction loss;
if the prediction head is denoted G, the prediction loss L_pred is calculated by:

L_pred = (1/N) Σ_{i=1}^{N} (G(f_i) − y_i)²

where G(f_i) denotes the prediction result for feature f_i;
3) constructing a total loss function;
the total loss function is the weighted sum of the feature contrast loss and the prediction loss with weight coefficient λ, and has the form:

L_total = L_pred + λ · L_con
step 6: training network parameters;
performing network training by using the total loss function constructed in the step 5, and updating parameters of the feature extraction network, the distance estimation network and the prediction head;
Step 7: testing stage; select the feature extraction network and the prediction head trained in step 6; for a given picture, first input it into the feature extraction network to extract features, then input the features into the prediction head to obtain the predicted age.
The innovations of the invention are:
1) features of the face image are extracted with an attention-based offset window transformation network structure, yielding a more robust feature representation;
2) a feature-based contrast loss function is proposed: a distance estimation module is introduced to compute the distance between two features, and the feature-based contrast loss constrains the feature space to preserve the ordinal constraint relation of the label space, so that the features of tail data can acquire information from the features of head data, improving the accuracy of age estimation on tail data.
Drawings
FIG. 1 is a schematic diagram of the network architecture of the method of the present invention;
FIG. 2 is a schematic view of the attention mechanism of the present invention;
FIG. 3 is a schematic diagram of a region embedding unit according to the present invention;
FIG. 4 is a schematic diagram of window division according to the present invention;
FIG. 5 is a schematic diagram of a multi-layered perceptron of the present invention;
FIG. 6 is a schematic diagram of an offset window according to the present invention;
FIG. 7 is a schematic diagram of a feature extraction network according to the present invention;
FIG. 8 is a schematic diagram of a distance estimation network according to the present invention;
FIG. 9 is a diagram of the prediction head according to the present invention.
Detailed description of embodiments:
step 1: preprocessing the data set;
acquiring a MOPRPH II data set, wherein the MORPPH II data set is an age estimation data set and comprises 55134 images; firstly, carrying out face alignment on an image, and normalizing the image to [ -1,1 ]; then randomly selecting 80% of data as a training set, and taking the rest 20% as a test set; the images in the training set were randomly cropped to 224 x 224 size and randomly mirror-flipped, and the images in the test set were only cropped in the center, again to 224 x 224 size.
Step 2: constructing a feature extraction network;
1) Constructing a region embedding unit (Patch Embedding). The image is first divided into sub-regions: the 224×224 image is divided into 56×56 sub-regions of size 4×4; each sub-region is then convolved with a 4×4 convolution kernel at stride 4 and normalized with layer normalization. The structure of the region embedding unit is shown in FIG. 3.
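A 4×4 convolution applied at stride 4 is equivalent to cutting the image into non-overlapping 4×4 patches and projecting each flattened patch with one linear map; the sketch below uses that equivalence, with 96 as an assumed embedding dimension:

```python
import numpy as np

def patch_embed(img, W, a=4):
    # Split an H x W x C image into non-overlapping a x a patches and
    # project each flattened patch with W -- equivalent to an a x a
    # convolution applied at stride a.
    H, Wd, C = img.shape
    h, w = H // a, Wd // a
    patches = img.reshape(h, a, w, a, C).transpose(0, 2, 1, 3, 4).reshape(h * w, a * a * C)
    return patches @ W

rng = np.random.default_rng(0)
img = rng.normal(size=(224, 224, 3))
W = rng.normal(size=(4 * 4 * 3, 96))   # 96 is an assumed embedding dimension
tokens = patch_embed(img, W)           # 56 * 56 = 3136 tokens
```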
2) Constructing the window division. On the basis of the sub-regions, two forms of window division are performed, denoted division mode one and division mode two; division mode two translates the windows by half a window size to the right and downward on the basis of division mode one and cyclically shifts the top-left window. Division modes one and two correspond to the left and right sub-images of FIG. 4, where the thin boxes denote sub-regions, the thick boxes denote windows, and the numbers denote sub-region labels.
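The two division modes can be illustrated on a small grid of sub-region labels; the cyclic shift of division mode two is what `np.roll` performs (the 8×8 grid and window size 4 are illustrative, not the patent's configuration):

```python
import numpy as np

def window_partition(grid, b):
    # Cut an (H, W) grid of sub-region labels into non-overlapping b x b windows.
    H, W = grid.shape
    return grid.reshape(H // b, b, W // b, b).transpose(0, 2, 1, 3).reshape(-1, b, b)

grid = np.arange(64).reshape(8, 8)            # 8 x 8 sub-regions, window size 4
windows_one = window_partition(grid, 4)       # division mode one

# Division mode two: translate by half a window (2 sub-regions) right and down,
# realized as a cyclic shift so the top-left content wraps around.
shifted = np.roll(grid, shift=(-2, -2), axis=(0, 1))
windows_two = window_partition(shifted, 4)
```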
3) Constructing a multilayer perceptron module. The multilayer perceptron module (MLP) consists of two fully connected layers; the first is followed by GELU activation, and the second uses no activation function. The structure of the MLP module is shown in FIG. 5.
4) Constructing an offset window module. The offset window module consists of two consecutive multi-head attention blocks that differ only in the windows used when computing attention, i.e., the two division methods of division mode one and division mode two; the multi-head attention computed under division modes one and two is denoted W-MSA and SW-MSA, respectively. The offset window module applies layer normalization before each multi-head attention, and the multi-head attention output is again layer-normalized and nonlinearly transformed. The structure of the offset window module is shown in FIG. 6.
5) Constructing a sub-region merging module. The sub-region merging module (Patch Merge) is a down-sampling module that merges adjacent 2×2 sub-regions into one sub-region while doubling the channel dimension of the features.
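Merging 2×2 sub-regions while doubling the channels amounts to concatenating the four C-dimensional vectors of each 2×2 group into a 4C vector and projecting it to 2C; a sketch with assumed sizes:

```python
import numpy as np

def patch_merge(x, W):
    # Concatenate each 2 x 2 group of C-dim features into a 4C vector,
    # then project to 2C: resolution halves, channels double.
    H, Wd, C = x.shape
    merged = x.reshape(H // 2, 2, Wd // 2, 2, C).transpose(0, 2, 1, 3, 4)
    merged = merged.reshape(H // 2, Wd // 2, 4 * C)
    return merged @ W

rng = np.random.default_rng(0)
x = rng.normal(size=(56, 56, 96))          # assumed feature map size
W = rng.normal(size=(4 * 96, 2 * 96))      # 4C -> 2C projection
y = patch_merge(x, W)
```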
6) Constructing the feature extraction network. The offset window transformation network first feeds the input image into the region embedding unit to obtain features; the features are then transformed and extracted by several offset window modules, followed by a sub-region merging module that reduces the feature dimensions; the offset window modules and sub-region merging modules are stacked repeatedly to construct the offset window transformation network; finally, the output features are flattened and linearly transformed. The structure of the feature extraction network is shown in FIG. 7.
Step 3: constructing a distance estimation network. The distance estimation network consists of 3 fully connected layers; the first two are followed by ReLU activation, and the last has no activation. The structure of the distance estimation network is shown in FIG. 8.
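The 3-layer distance estimation network can be sketched as a small MLP; the layer widths and feature dimension here are assumptions for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def distance_estimator(fi, fj, params):
    # ReLU after the first two fully connected layers, no activation on the
    # last; the feature pair is concatenated into a single input vector.
    (W1, b1), (W2, b2), (W3, b3) = params
    h = relu(np.concatenate([fi, fj]) @ W1 + b1)
    h = relu(h @ W2 + b2)
    return (h @ W3 + b3).item()

rng = np.random.default_rng(0)
d = 96   # assumed feature dimension
params = [(0.1 * rng.normal(size=(2 * d, 64)), np.zeros(64)),
          (0.1 * rng.normal(size=(64, 64)), np.zeros(64)),
          (0.1 * rng.normal(size=(64, 1)), np.zeros(1))]
dist = distance_estimator(rng.normal(size=d), rng.normal(size=d), params)
```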
Step 4: constructing a prediction head. The prediction head consists of 2 fully connected layers; the first is followed by softmax activation, and the second has no activation. The structure of the prediction head is shown in FIG. 9.
Step 5: designing the loss function;
1) Constructing the feature contrast loss. First construct image-label pairs {(x_1, y_1), …, (x_N, y_N)} from the training set in step 1, then input the images into the feature extraction network to obtain the corresponding features (f_1, …, f_N). With the distance estimation network denoted DE, the feature contrast loss is:

L_con = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} | DE(f_i, f_j) − |y_i − y_j| |
The feature contrast loss computes two distances: DE(f_i, f_j) estimates the distance between features f_i and f_j, and |y_i − y_j| is the L1 distance between the two corresponding labels. Optimizing the feature distance to be close to the label distance lets the features exchange information and preserves the ordinal constraint relation of the label space.
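Under this description, the feature contrast loss can be written as the average over feature pairs of |DE(f_i, f_j) − |y_i − y_j||; the pairwise averaging below is an assumed normalization:

```python
import numpy as np

def feature_contrast_loss(feats, labels, DE):
    # Pull the estimated feature distance DE(f_i, f_j) toward the L1
    # distance |y_i - y_j| of the corresponding labels, averaged over pairs.
    N = len(labels)
    total = 0.0
    for i in range(N):
        for j in range(N):
            total += abs(DE(feats[i], feats[j]) - abs(labels[i] - labels[j]))
    return total / (N * N)

# Toy check: with a perfect estimator the loss vanishes (here each "feature"
# is simply the age itself, so the true distance is recoverable).
feats = np.array([[25.0], [30.0], [70.0]])
labels = np.array([25.0, 30.0, 70.0])
loss = feature_contrast_loss(feats, labels, lambda fi, fj: abs(fi[0] - fj[0]))
```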
2) Constructing the prediction loss. With the prediction head denoted G, the prediction loss is calculated by:

L_pred = (1/N) Σ_{i=1}^{N} (G(f_i) − y_i)²
3) Constructing the total loss function. The total loss function is the weighted sum of the feature contrast loss and the prediction loss with weight coefficient λ, and has the form:

L_total = L_pred + λ · L_con
step 6: training network parameters; performing network training by using the total loss function constructed in the step 5, and updating parameters of the feature extraction network, the distance estimation network and the prediction head;
and 7: and in the testing stage, the trained feature extraction network and the prediction head in the step 6 are selected, for a given picture, the feature extraction network is firstly input to extract features, and then the features are input into the prediction head to obtain the predicted age.
Claims (1)
1. An age estimation method based on feature contrast loss, the method comprising:
step 1: preprocessing the data set;
firstly, acquire an image data set for age estimation, perform face alignment on the images, and normalize them to [-1, 1]; then randomly divide the data into a training set and a test set; the images in the training set are randomly cropped and randomly mirror-flipped, while the images in the test set are only center-cropped, to the same size as the training images;
step 2: constructing a feature extraction network;
1) constructing an area embedding unit;
firstly, the image is divided into sub-regions: the image obtained in step 1 is divided into a number of a×a sub-regions; each sub-region is then convolved with an a×a convolution kernel at stride a, and the result is normalized with layer normalization;
2) constructing window division;
on the basis of the sub-regions, two forms of window division are performed, using division mode one and division mode two; division mode one selects adjacent b×b sub-regions as one window and partitions the whole image accordingly; division mode two translates the windows by half a window size to the right and downward on the basis of division mode one and cyclically shifts the top-left window;
3) constructing a multilayer perceptron module;
the multilayer perceptron module consists of two fully connected layers; the first fully connected layer is followed by GELU activation, and the second uses no activation function;
4) constructing an offset window module;
the offset window module consists of two consecutive multi-head attention modules: one computes multi-head attention under division mode one and is denoted W-MSA; the other computes multi-head attention under division mode two and is denoted SW-MSA;
5) constructing a subregion merging module;
the sub-region merging module is a down-sampling module that merges adjacent a×a sub-regions into one sub-region while enlarging the channel dimension of the features by a factor of a;
6) constructing a feature extraction network;
firstly, the input image is fed into the region embedding unit to obtain features; then the features are transformed and extracted by several offset window modules, followed by a sub-region merging module that reduces the feature dimensions; the offset window modules and sub-region merging modules are stacked repeatedly to construct the offset window transformation network; finally, the output features are flattened and linearly transformed;
Step 3: constructing a distance estimation network;
the distance estimation network consists of 3 fully connected layers; the first two fully connected layers are followed by ReLU activation, and the last has no activation;
Step 4: constructing a prediction head;
the prediction head consists of 2 fully connected layers; the first fully connected layer is followed by softmax activation, and the second has no activation;
Step 5: determining the loss function;
1) constructing the feature contrast loss;
first, construct image-label sample pairs {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)} from the training set of step 1, where x_n denotes an image, y_n denotes its label, and N denotes the total number of samples; then input the images of the sample pairs into the feature extraction network to obtain the corresponding features (f_1, …, f_N). With the distance estimation network denoted DE, the feature contrast loss is:

L_con = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} | DE(f_i, f_j) − |y_i − y_j| |
2) constructing a prediction loss;
if the prediction head is denoted G, the prediction loss L_pred is calculated by:

L_pred = (1/N) Σ_{i=1}^{N} (G(f_i) − y_i)²

where G(f_i) denotes the prediction result for feature f_i;
3) constructing a total loss function;
the total loss function is the weighted sum of the feature contrast loss and the prediction loss with weight coefficient λ, and has the form:

L_total = L_pred + λ · L_con
Step 6: training network parameters;
performing network training by using the total loss function constructed in the step 5, and updating parameters of the feature extraction network, the distance estimation network and the prediction head;
Step 7: testing stage; select the feature extraction network and the prediction head trained in step 6; for a given picture, first input it into the feature extraction network to extract features, then input the features into the prediction head to obtain the predicted age.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210731136.4A CN115063862B (en) | 2022-06-24 | 2022-06-24 | Age estimation method based on feature contrast loss |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115063862A true CN115063862A (en) | 2022-09-16 |
CN115063862B CN115063862B (en) | 2024-04-23 |
Family
ID=83202233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210731136.4A Active CN115063862B (en) | 2022-06-24 | 2022-06-24 | Age estimation method based on feature contrast loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115063862B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140300758A1 (en) * | 2013-04-04 | 2014-10-09 | Bao Tran | Video processing systems and methods |
US20170351905A1 (en) * | 2016-06-06 | 2017-12-07 | Samsung Electronics Co., Ltd. | Learning model for salient facial region detection |
CN108171209A (en) * | 2018-01-18 | 2018-06-15 | 中科视拓(北京)科技有限公司 | Face age estimation method based on metric learning with convolutional neural networks |
CN112950631A (en) * | 2021-04-13 | 2021-06-11 | 西安交通大学口腔医院 | Age estimation method based on saliency-map constraints and lateral cephalometric X-ray images |
CN114038055A (en) * | 2021-10-27 | 2022-02-11 | 电子科技大学长三角研究院(衢州) | Image generation method based on contrast learning and generation countermeasure network |
Non-Patent Citations (4)
Title |
---|
HONGYU PAN et al.: "Revised Contrastive Loss for Robust Age Estimation from Face", 2018 24th International Conference on Pattern Recognition (ICPR), 29 November 2018 (2018-11-29) *
LILI PAN, MINGMING MENG, YAZHOU REN, YALI ZHENG, ZENGLIN XU: "Self-Paced Deep Regression Forests with Consideration of Ranking Fairness", Computer Vision and Pattern Recognition, 11 June 2022 (2022-06-11) *
MENG MINGMING: "Research on Deep Discriminative Models for Facial Attribute Analysis", China Master's Theses Full-text Database, Information Science and Technology, 15 January 2023 (2023-01-15) *
LI DAXIANG; MA XUAN; REN YAQIONG; LIU YING: "Age Estimation Algorithm Based on Deep Cost-Sensitive CNN", Pattern Recognition and Artificial Intelligence, no. 02, 15 February 2020 (2020-02-15) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||