CN116189272A - Facial expression recognition method and system based on feature fusion and attention mechanism - Google Patents
Facial expression recognition method and system based on feature fusion and attention mechanism
- Publication number
- CN116189272A (Application CN202310493454.6A)
- Authority
- CN
- China
- Prior art keywords
- facial expression
- feature
- output
- neural network
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000008921 facial expression Effects 0.000 title claims abstract description 63
- 238000000034 method Methods 0.000 title claims abstract description 49
- 230000007246 mechanism Effects 0.000 title claims abstract description 40
- 230000004927 fusion Effects 0.000 title claims abstract description 38
- 239000013598 vector Substances 0.000 claims abstract description 53
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 34
- 230000014509 gene expression Effects 0.000 claims abstract description 15
- 238000003062 neural network model Methods 0.000 claims abstract description 15
- 238000007781 pre-processing Methods 0.000 claims abstract description 6
- 239000011159 matrix material Substances 0.000 claims description 18
- 230000008569 process Effects 0.000 claims description 15
- 238000011176 pooling Methods 0.000 claims description 12
- 238000004364 calculation method Methods 0.000 claims description 10
- 238000004590 computer program Methods 0.000 claims description 10
- 238000012549 training Methods 0.000 claims description 9
- 230000006870 function Effects 0.000 claims description 8
- 230000004913 activation Effects 0.000 claims description 4
- 239000000284 extract Substances 0.000 claims description 4
- 238000000605 extraction Methods 0.000 claims description 3
- 238000010276 construction Methods 0.000 claims 1
- 238000010586 diagram Methods 0.000 description 13
- 238000012545 processing Methods 0.000 description 5
- 238000013528 artificial neural network Methods 0.000 description 3
- 230000008034 disappearance Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000004630 mental health Effects 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Evolutionary Computation (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Technical Field

The present invention relates to a facial expression recognition method and belongs to the technical field of image processing.

Background Art

Apart from language, facial expressions are an important carrier for expressing inner emotions. In recent years, facial expression recognition (FER) has excelled in fields such as the Internet of Things, artificial intelligence, and mental health assessment, and has received broad attention and application across society.

However, existing expression recognition relies mainly on hand-crafted features and machine learning methods, which have the following main defects: hand-crafted features usually introduce unavoidable human factors and errors; human intervention is required to extract useful recognition features from the original image; and it is difficult to obtain deep, high-level semantic features and deep features from the original image.

To obtain deep, high-level semantic features, ever more convolutional layers are stacked. However, enhancing a network's learning ability by adding layers is not always feasible: once a network reaches a certain depth, adding further layers causes the vanishing-gradient problem in stochastic gradient training and also degrades the network's accuracy. The traditional remedy is data initialization and regularization, which solves the vanishing-gradient problem but does not improve network accuracy.

Summary of the Invention

The technical problem to be solved by the present invention is: in the process of facial expression recognition, how to obtain deep, high-level semantic features and thereby achieve a better facial expression recognition effect.
To solve the above technical problem, the present invention provides a facial expression recognition method based on feature fusion and an attention mechanism, comprising the following steps:

(1) Preprocess the acquired facial expression dataset.

(2) Construct a facial expression recognition neural network model, comprising a ResNet50 convolutional neural network and a Transformer model with a multi-head self-attention mechanism.

(3) Extract two intermediate-layer features and the last-layer features of the ResNet50 convolutional neural network; the intermediate-layer features contain image structure information, and the last-layer features contain semantic features.

(4) Concatenate the feature maps output by the two intermediate layers along the channel dimension to obtain a weighted feature vector, thereby fusing features of different levels.

(5) Apply convolution operations to the last-layer features and the weighted feature vector simultaneously to obtain output result one and output result two, respectively, and input output result one and output result two into the Transformer network model.

(6) In the Transformer network model, concatenate output result one and output result two a second time.

(7) Downsample the result of the second concatenation, feed it into a fully connected layer, and finally input it into a softmax classifier for classification to obtain the expression classification result. (A minimal sketch of this pipeline follows the list.)
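The seven steps above can be pictured as a single forward pass. The following PyTorch-style sketch is illustrative only, not the patented implementation: the choice of backbone stages, the interpolation used to align spatial sizes, and the pooling that stands in for the RKTM attention stage are all assumptions.

```python
# Minimal sketch of steps (1)-(7); layer choices are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class FERNet(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # layer2/layer3 supply the two intermediate features,
        # layer4 supplies the last-layer (semantic) features.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        # Step (5): 1x1 conv compresses channels, 3x3 conv fuses features.
        self.mid_proj = nn.Sequential(nn.Conv2d(512 + 1024, 256, 1),
                                      nn.Conv2d(256, 256, 3, padding=1))
        self.last_proj = nn.Sequential(nn.Conv2d(2048, 256, 1),
                                       nn.Conv2d(256, 256, 3, padding=1))
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.layer1(x)
        f2 = self.layer2(x)            # intermediate feature one
        f3 = self.layer3(f2)           # intermediate feature two
        f4 = self.layer4(f3)           # last-layer feature
        f3_up = nn.functional.interpolate(f3, size=f2.shape[-2:])
        fused = torch.cat([f2, f3_up], dim=1)   # step (4): channel concat
        out1 = self.last_proj(f4)               # step (5): output result one
        out2 = self.mid_proj(fused)             # step (5): output result two
        # Steps (6)-(7): the RKTM attention stage is sketched separately
        # below; here plain pooling stands in for it.
        out2 = nn.functional.adaptive_avg_pool2d(out2, out1.shape[-2:])
        z = torch.cat([out1, out2], dim=1)      # step (6): second concat
        z = nn.functional.adaptive_avg_pool2d(z, 1).flatten(1)
        return self.head(z)                     # step (7): logits for softmax
```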
In the aforementioned facial expression recognition method based on feature fusion and an attention mechanism, in step (1), preprocessing the acquired facial expression dataset comprises the following steps:

Create PIL objects, so that all operations on images in the facial expression dataset are based on PIL objects.

Resize the facial expression images to 224×224 and randomly flip the input data horizontally with a given flip probability p.

Normalize the horizontally flipped input data.

Load the normalized dataset images into the facial expression recognition neural network model.
In the aforementioned facial expression recognition method based on feature fusion and an attention mechanism, in step (2), the ResNet50 convolutional neural network structure comprises seven parts:

The first part pads the input image with padding parameters.

The second part contains no residual blocks and sequentially applies convolution, regularization, an activation function, and max pooling to the input image data.

The third, fourth, fifth, and sixth parts each contain several residual blocks, and each residual block has three convolutional layers.

The seventh part comprises an average pooling layer and a fully connected layer; the image data output by the sixth part passes through the average pooling layer and the fully connected layer in turn and outputs the resulting feature map.
In the aforementioned facial expression recognition method based on feature fusion and an attention mechanism, in step (4), the feature maps output by the two intermediate layers are concatenated along the channel dimension. The feature vectors in the two intermediate-layer feature maps have sizes 512×60×60 and 1024×60×60, respectively; after concatenation, a weighted feature vector of size 1536×60×60 is obtained, thereby fusing features of different levels.
In the aforementioned facial expression recognition method based on feature fusion and an attention mechanism, in step (5), the last-layer features and the weighted feature vector are each further passed through two convolutional layers, namely a (1 × 1) convolutional layer and a (3 × 3) convolutional layer. The (1 × 1) convolutional layer compresses the original channel number 2048 to 256, and the (3 × 3) convolutional layer performs feature fusion, yielding output result one and output result two, respectively.
In the aforementioned facial expression recognition method based on feature fusion and an attention mechanism, in step (6), Q, K, and V denote the query vector, the key vector, and the value vector, respectively; the key vector and the value vector appear in pairs. Output result one and output result two from step (5) are input into the RKTM module as the query vector and the key vector, respectively.

In ordinary differential equations, the Euler formula is expressed as:

$y_{n+1} = y_n + h f(t_n, y_n)$,  (1)

The residual connection used by the ResNet50 convolutional neural network is expressed as:

$x_{l+1} = x_l + F(x_l)$,  (2)

Solving the Transformer network model with the second-order Runge-Kutta formula gives:

$x_{n+1} = x_n + \tfrac{1}{2}\big[G_1(t_n, x_n, \theta) + G_2\big(t_{n+1}, x_n + G_1(t_n, x_n, \theta), \theta\big)\big]$,  (3)

where $t$ denotes time, $G$ denotes the Transformer network model, $\theta$ denotes the model parameters from the ResNet50 convolutional neural network, and $G_1$ and $G_2$ denote attention sub-module one and attention sub-module two in the RKTM module, respectively.

For an input image $I$, the ResNet50 convolutional neural network is first used to extract features, yielding $X \in \mathbb{R}^{C \times H \times W}$, where $X$ is the feature, $\mathbb{R}$ is the set of real numbers, and $C$, $H$, $W$ denote the number of channels, height, and width, respectively. Reducing the dimensionality of the multi-dimensional data gives $X \in \mathbb{R}^{C \times L}$; letting parameter one be $L$, we have $L = H \times W$. The size of the feature is recorded as $b \times C \times L$, where $b$ denotes the sample size of each training batch.

The calculation process of the Transformer network model $G$ is as follows:

Let the number of heads (head) of the attention mechanism be $h$, and reshape the feature $X$ into $X \in \mathbb{R}^{b \times h \times d \times L}$, where parameter two is $d = C/h$.

Exchanging the two channels of parameter two and parameter one yields $X \in \mathbb{R}^{b \times h \times L \times d}$. Setting matrix one $W_Q$, matrix two $W_K$, and matrix three $W_V$ as learnable parameters gives

$Q = X W_Q, \quad K = X W_K, \quad V = X W_V$,  (4)

The query vector $Q$ is multiplied by the transpose $K^{T}$ of the key vector $K$, and a softmax operation is applied over the last dimension to obtain the attention score matrix $A$; the operation is as follows:

$A = \operatorname{softmax}(Q K^{T})$,  (5)

The attention score matrix is then multiplied by the value vector $V$ to obtain the output

$Z = A V$,  (6)

The shape of the output $Z$ is $b \times h \times L \times d$. Substituting the output of equation (6) into the second-order Runge-Kutta formula yields the expression of the Transformer network model $G$.
A facial expression recognition system based on feature fusion and an attention mechanism comprises the following modules:

Preprocessing module: preprocesses the acquired facial expression dataset.

Neural network model construction module: constructs the facial expression recognition neural network model, comprising a ResNet50 convolutional neural network and a Transformer model with a multi-head self-attention mechanism.

Information extraction module: extracts two intermediate-layer features and the last-layer features of the ResNet50 convolutional neural network; the intermediate-layer features contain image structure information, and the last-layer features contain semantic features.

First concatenation module: concatenates the feature maps output by the two intermediate layers along the channel dimension to obtain a weighted feature vector, thereby achieving feature fusion.

Convolution operation module: applies convolution operations to the last-layer features and the weighted feature vector simultaneously to obtain output result one and output result two, respectively, and inputs output result one and output result two into the Transformer network model.

Second concatenation module: in the Transformer network model, concatenates output result one and output result two a second time.

Classification module: downsamples the result of the second concatenation, feeds it into a fully connected layer, and finally inputs it into a softmax classifier for classification to obtain the expression classification result.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the facial expression recognition method based on feature fusion and an attention mechanism as described above.
An embedded device is configured with a trusted execution environment, the trusted execution environment comprising:

a memory for storing instructions;

a processor for executing the instructions, so that the embedded device performs the facial expression recognition method based on feature fusion and an attention mechanism as described above.
Beneficial effects achieved by the present invention: the facial expression recognition method based on feature fusion and an attention mechanism of the present invention is built on the ResNet50 neural network, whose residual modules solve the gradient problem; the depth of the ResNet50 network also yields more expressive features and correspondingly stronger detection or classification performance, while reducing the number of parameters and, to a certain extent, the amount of computation. The defining traits of the Transformer model are its self-attention mechanism (Self-Attention) and residual connection structure (Residual Connection); compared with traditional sequence models, it can fully consider information from all positions of the input sequence globally and can therefore effectively train deeper networks. Overall, this doubly improves recognition accuracy while accelerating the training process.

At the same time, by solving the second-order Runge-Kutta formula, the present invention obtains a model with stronger generalization ability, i.e., one that also classifies well on data other than the training data of this embodiment.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the overall network structure of Example 1 of the present invention;

FIG. 2 is a schematic diagram of the ResNet50 convolutional neural network structure;

FIG. 3 is a schematic diagram of the structure of the RKTM module;

FIG. 4 is a schematic diagram of the recognition accuracy of the method of the present invention;

FIG. 5 is a schematic diagram of the accuracy of directly training the ResNet50 convolutional neural network.
DETAILED DESCRIPTION
The technical solution of the present invention is further described below with reference to the accompanying drawings and specific embodiments.
Example 1
This example uses the public facial expression dataset fer2013, which consists of 35,886 images of different facial expressions. Each image is a grayscale image of fixed size 48×48. There are seven expression categories: anger, disgust, fear, happiness, sadness, surprise, and neutral. Officially, the expression data are stored in a CSV file, which can be converted and stored as image data.
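As a hedged illustration of that conversion, the sketch below assumes the common release of fer2013.csv with columns `emotion`, `pixels`, and `Usage`; the file paths and naming scheme are illustrative.

```python
# Hedged sketch: convert fer2013.csv (assumed columns: emotion, pixels,
# Usage) into 48x48 grayscale PNG files grouped by split and label.
import csv
import os
import numpy as np
from PIL import Image

def csv_to_images(csv_path="fer2013.csv", out_dir="fer2013_images"):
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # Each row stores 48*48 space-separated pixel values.
            pixels = np.array(row["pixels"].split(), dtype=np.uint8)
            img = Image.fromarray(pixels.reshape(48, 48), mode="L")
            split_dir = os.path.join(out_dir, row["Usage"], row["emotion"])
            os.makedirs(split_dir, exist_ok=True)
            img.save(os.path.join(split_dir, f"{i}.png"))
```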
A facial expression recognition method based on feature fusion and an attention mechanism comprises the following steps:
1) Preprocess the acquired facial expression dataset, comprising the following steps:

Create PIL objects, so that all operations on images in the facial expression dataset are based on PIL objects.

Resize the facial expression images to 224×224 and randomly flip the input data horizontally with the default flip probability p = 0.5.

Normalize the horizontally flipped input data, with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].

Load the normalized dataset images into the facial expression recognition neural network model; through preprocessing, the data in the dataset are augmented to enrich the training data.
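A minimal sketch of this preprocessing using standard torchvision transforms; the dataset directory layout and loader settings are assumptions.

```python
# Hedged sketch of the preprocessing pipeline described above.
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

preprocess = T.Compose([
    T.Resize((224, 224)),              # resize to 224x224
    T.RandomHorizontalFlip(p=0.5),     # default flip probability 0.5
    T.ToTensor(),                      # PIL image -> tensor in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

train_set = ImageFolder("fer2013_images/Training", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
```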
2) Construct the facial expression recognition neural network model. FIG. 1 is a schematic diagram of the overall neural network structure of Example 1, comprising a ResNet50 convolutional neural network and a Transformer model with a multi-head self-attention mechanism. Compared with a convolutional neural network, the Transformer model has a global receptive field: the distance between any two pixels is the same, and the relationships between vectors across the entire feature map can be measured. The RKTM serves as the main encoding functional module of the encoder.
As shown in FIG. 2, the ResNet50 convolutional neural network structure comprises seven parts:

The first part (stage0) pads the input image with padding parameters (3, 3).

The second part (stage1) contains no residual blocks and sequentially applies convolution, regularization, an activation function, and max pooling to the input image data.

The third, fourth, fifth, and sixth parts contain 3, 4, 6, and 3 residual blocks, respectively. Each residual block has three convolutional layers with kernel sizes 1×1, 3×3, and 1×1; all convolutions use a stride of 1, but the second convolution uses padding (1, 1), i.e., a ring of zeros is added around the input image data.

The seventh part comprises an average pooling layer and a fully connected layer (avgpool, fc); the image data output by the sixth part passes through the average pooling layer and the fully connected layer in turn and outputs the result, turning a 224×224 input image into a 56×56 feature map and greatly reducing storage space.
3) The output features of the fourth and fifth parts of the ResNet50 convolutional neural network contain rich image structure information; both are called intermediate layers. The sixth part is the last layer of the ResNet50 convolutional neural network that contains convolution operations, and its output features contain rich semantic features; it is called the last layer. Since the ResNet50 convolutional neural network is trained on ImageNet, which corresponds to a classification task, the last-layer output of the feature extractor consists of semantic features.
4) The feature maps output by the two intermediate layers are concatenated along the channel dimension. The feature vectors in the two intermediate-layer feature maps have sizes 512×60×60 and 1024×60×60, respectively; after concatenation, a weighted feature vector of size 1536×60×60 is obtained, thereby fusing features of different levels. (A concatenation sketch follows.)
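A hedged sketch of this channel-dimension fusion; matching 60×60 spatial sizes are simply assumed, as the text does not detail how the two maps are aligned.

```python
# Hedged sketch: fuse two intermediate ResNet50 feature maps along
# the channel dimension (equal 60x60 spatial size is assumed).
import torch

f_mid1 = torch.randn(8, 512, 60, 60)    # intermediate feature one
f_mid2 = torch.randn(8, 1024, 60, 60)   # intermediate feature two
fused = torch.cat([f_mid1, f_mid2], dim=1)
print(fused.shape)                       # torch.Size([8, 1536, 60, 60])
```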
5) The last-layer features and the weighted feature vector are each further passed through two convolutional layers, namely a (1 × 1) convolutional layer and a (3 × 3) convolutional layer. The (1 × 1) convolutional layer compresses the original channel number 2048 to 256, and the (3 × 3) convolutional layer performs feature fusion, yielding output result one and output result two, respectively. This step ensures that the two output results can be fed smoothly into the following Transformer network model, a deep learning model that uses a self-attention mechanism. (A projection sketch follows.)
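A hedged sketch of step 5); the 1536-to-256 compression on the fused branch is an assumption inferred from the stated shapes, since the text only specifies the 2048-to-256 compression for the last-layer branch.

```python
# Hedged sketch of step 5: a 1x1 conv compresses channels and a 3x3
# conv fuses features; the 1536->256 branch is an assumption.
import torch
import torch.nn as nn

proj_last = nn.Sequential(nn.Conv2d(2048, 256, kernel_size=1),
                          nn.Conv2d(256, 256, kernel_size=3, padding=1))
proj_mid = nn.Sequential(nn.Conv2d(1536, 256, kernel_size=1),
                         nn.Conv2d(256, 256, kernel_size=3, padding=1))

out1 = proj_last(torch.randn(8, 2048, 7, 7))    # output result one
out2 = proj_mid(torch.randn(8, 1536, 60, 60))   # output result two
```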
6) In the Transformer network model, output result one and output result two are concatenated a second time.
7) The concatenated features are downsampled, the purpose of which is to extract features; they are then fed into the fully connected layer to obtain the final feature vector and finally input into the softmax classifier for computation, which outputs the class probabilities and yields the expression classification result.
8) Verification is performed on the fer2013 public facial expression dataset. As shown in FIG. 4, the recognition accuracy of the method of the present invention reaches 65%, whereas directly training the ResNet50 network reaches only 57%, as shown in FIG. 5; through feature fusion and an embedded improved attention mechanism, this example improves the facial expression recognition accuracy on this dataset by 8 percentage points. Extensive research shows that deep features extracted by convolutional neural networks are highly robust to deformations such as translation, rotation, and scaling, and that different convolutional layers extract features at different levels that effectively characterize the local and global properties of an image; the model of this example therefore has better robustness.
Example 2
A facial expression recognition method based on feature fusion and an attention mechanism comprises the following steps:

(1) Preprocess the acquired facial expression dataset.

(2) Construct a facial expression recognition neural network model, comprising a ResNet50 convolutional neural network and a Transformer model with a multi-head self-attention mechanism.

(3) Extract two intermediate-layer features and the last-layer features of the ResNet50 convolutional neural network; the intermediate-layer features contain image structure information, and the last-layer features contain semantic features.

(4) Concatenate the feature maps output by the two intermediate layers along the channel dimension to obtain a weighted feature vector, thereby fusing features of different levels.

(5) Apply convolution operations to the last-layer features and the weighted feature vector simultaneously to obtain output result one and output result two, respectively, and input output result one and output result two into the Transformer network model.

(6) In the Transformer network model, concatenate output result one and output result two a second time.

(7) Downsample the result of the second concatenation, feed it into a fully connected layer, and finally input it into a softmax classifier for classification to obtain the expression classification result.
In step 6), FIG. 3 shows the structure of the RKTM module, i.e., the multi-head self-attention module serving as the encoder in the Transformer model, where Q, K, and V denote the query vector (query), the key vector (key), and the value vector (value), respectively; the key vector and the value vector appear in pairs, and both depend on the input.

Output result one and output result two from step 5) are input into the RKTM module as the query vector and the key vector, respectively.
In ordinary differential equations, the Euler formula is expressed as:

$y_{n+1} = y_n + h f(t_n, y_n)$,  (1)

The residual connection used by the ResNet50 convolutional neural network is expressed as:

$x_{l+1} = x_l + F(x_l)$,  (2)

The Euler formula is the first-order form of the Runge-Kutta formula. Solving the Transformer network model with the second-order Runge-Kutta formula yields the following formula:

$x_{n+1} = x_n + \tfrac{1}{2}\big[G_1(t_n, x_n, \theta) + G_2\big(t_{n+1}, x_n + G_1(t_n, x_n, \theta), \theta\big)\big]$,  (3)

where $t$ denotes time, $G$ denotes the Transformer network model, $\theta$ denotes the model parameters from the ResNet50 convolutional neural network, and $G_1$ and $G_2$ denote attention sub-module one and attention sub-module two in the RKTM module, respectively.
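A hedged sketch of this second-order Runge-Kutta style combination, equation (3): `nn.MultiheadAttention` stands in for the two attention sub-modules, and self-attention on a single stream is an illustrative simplification of feeding output results one and two as query and key.

```python
# Hedged sketch: RK2-style combination of two attention sub-modules,
# following equation (3); the sub-modules' internals correspond to
# equations (4)-(6) below.
import torch.nn as nn

class RKTM(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.attn1 = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (batch, L, dim)
        k1, _ = self.attn1(x, x, x)              # G1(x)
        y = x + k1
        k2, _ = self.attn2(y, y, y)              # G2(x + G1(x))
        return x + 0.5 * (k1 + k2)               # x_{n+1} = x_n + (k1+k2)/2
```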
For an input image $I$, the ResNet50 convolutional neural network is first used to extract features, yielding $X \in \mathbb{R}^{C \times H \times W}$, where $X$ is the feature, $\mathbb{R}$ is the set of real numbers, and $C$, $H$, $W$ denote the number of channels, height, and width, respectively. Reducing the dimensionality of the multi-dimensional data gives $X \in \mathbb{R}^{C \times L}$; letting parameter one be $L$, we have $L = H \times W$. Since deep learning uses mini-batch training, the size of the feature is recorded as $b \times C \times L$, where $b$, the batch_size, is the sample size of each training batch.

The calculation process of the Transformer network model $G$ is as follows:

Let the number of heads (head) of the attention mechanism be $h$, and reshape the feature $X$ into $X \in \mathbb{R}^{b \times h \times d \times L}$, where parameter two is $d = C/h$.

Exchanging the two channels of parameter two and parameter one yields $X \in \mathbb{R}^{b \times h \times L \times d}$. Setting matrix one $W_Q$, matrix two $W_K$, and matrix three $W_V$ as learnable parameters gives

$Q = X W_Q, \quad K = X W_K, \quad V = X W_V$,  (4)

The query vector $Q$ is multiplied by the transpose $K^{T}$ of the key vector $K$ (a dot-product computation), and a softmax operation is applied over the last dimension to obtain the attention score matrix $A$; the operation is as follows:

$A = \operatorname{softmax}(Q K^{T})$,  (5)

The attention score measures the pairwise similarity between features. The attention score matrix is then multiplied by the value vector $V$ to obtain the output

$Z = A V$,  (6)

The shape of the output $Z$ is $b \times h \times L \times d$; it can be seen that the spatial dimensions of $Z$ and $X$ are consistent, so substituting the output of equation (6) into the second-order Runge-Kutta formula yields the expression of the Transformer network model $G$. This step yields the concrete model of the Transformer network model.
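A hedged sketch of equations (4) to (6): matrices one to three are realized as linear layers, and the unscaled softmax follows the text above (standard implementations typically also divide by $\sqrt{d}$).

```python
# Hedged sketch of equations (4)-(6): multi-head attention with the
# unscaled softmax(QK^T) described above.
import torch
import torch.nn as nn

def multi_head_attention(x, w_q, w_k, w_v, h):
    b, c, length = x.shape                        # feature of size b x C x L
    d = c // h                                    # parameter two: d = C / h
    x = x.view(b, h, d, length).transpose(2, 3)   # exchange d and L: b,h,L,d
    q, k, v = w_q(x), w_k(x), w_v(x)              # equation (4)
    a = torch.softmax(q @ k.transpose(-2, -1), dim=-1)  # equation (5)
    return a @ v                                  # equation (6): b x h x L x d

w_q, w_k, w_v = (nn.Linear(64, 64, bias=False) for _ in range(3))
z = multi_head_attention(torch.randn(8, 256, 49), w_q, w_k, w_v, h=4)
print(z.shape)  # torch.Size([8, 4, 49, 64])
```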
After output result one and output result two from step 5) are each processed by the Transformer network model, two output features are obtained, and these two output features undergo the second concatenation operation: 64×7×7 and 64×7×7 become 128×7×7 after concatenation, achieving the second splicing of features. (A sketch of this concatenation and the classification head follows.)
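A hedged sketch of the second concatenation and the classification head of step (7); the global average pooling used for downsampling and the layer sizes are assumptions consistent with the shapes above.

```python
# Hedged sketch: second concatenation, downsampling, fully connected
# layer, and softmax classification (shapes follow the text above).
import torch
import torch.nn as nn

z1 = torch.randn(8, 64, 7, 7)           # output feature one
z2 = torch.randn(8, 64, 7, 7)           # output feature two
z = torch.cat([z1, z2], dim=1)          # second concat: 128 x 7 x 7

pool = nn.AdaptiveAvgPool2d(1)          # downsampling (assumed global pool)
fc = nn.Linear(128, 7)                  # 7 expression classes

logits = fc(pool(z).flatten(1))
probs = torch.softmax(logits, dim=1)    # class probabilities
```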
A facial expression recognition system based on feature fusion and an attention mechanism comprises the following modules:

Preprocessing module: preprocesses the acquired facial expression dataset.

Neural network model construction module: constructs the facial expression recognition neural network model, comprising a ResNet50 convolutional neural network and a Transformer model with a multi-head self-attention mechanism.

Information extraction module: extracts two intermediate-layer features and the last-layer features of the ResNet50 convolutional neural network; the intermediate-layer features contain image structure information, and the last-layer features contain semantic features.

First concatenation module: concatenates the feature maps output by the two intermediate layers along the channel dimension to obtain a weighted feature vector, thereby achieving feature fusion.

Convolution operation module: applies convolution operations to the last-layer features and the weighted feature vector simultaneously to obtain output result one and output result two, respectively, and inputs output result one and output result two into the Transformer network model.

Second concatenation module: in the Transformer network model, concatenates output result one and output result two a second time.

Classification module: downsamples the result of the second concatenation, feeds it into a fully connected layer, and finally inputs it into a softmax classifier for classification to obtain the expression classification result.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the facial expression recognition method based on feature fusion and an attention mechanism as described above.
An embedded device is configured with a trusted execution environment, the trusted execution environment comprising:

a memory for storing instructions;

a processor for executing the instructions, so that the embedded device performs the facial expression recognition method based on feature fusion and an attention mechanism as described above.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310493454.6A CN116189272B (en) | 2023-05-05 | 2023-05-05 | Facial expression recognition method and system based on feature fusion and attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310493454.6A CN116189272B (en) | 2023-05-05 | 2023-05-05 | Facial expression recognition method and system based on feature fusion and attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116189272A (en) | 2023-05-30 |
CN116189272B CN116189272B (en) | 2023-07-07 |
Family
ID=86433105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310493454.6A Active CN116189272B (en) | 2023-05-05 | 2023-05-05 | Facial expression recognition method and system based on feature fusion and attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116189272B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118365974A (en) * | 2024-06-20 | 2024-07-19 | 山东省水利科学研究院 | Water quality class detection method, system and equipment based on hybrid neural network |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110081881A (en) * | 2019-04-19 | 2019-08-02 | 成都飞机工业(集团)有限责任公司 | It is a kind of based on unmanned plane multi-sensor information fusion technology warship bootstrap technique |
CN111680541A (en) * | 2020-04-14 | 2020-09-18 | 华中科技大学 | A Multimodal Sentiment Analysis Method Based on Multidimensional Attention Fusion Network |
CN112418095A (en) * | 2020-11-24 | 2021-02-26 | 华中师范大学 | A Facial Expression Recognition Method and System Combined with Attention Mechanism |
CN112541409A (en) * | 2020-11-30 | 2021-03-23 | 北京建筑大学 | Attention-integrated residual network expression recognition method |
CN114764941A (en) * | 2022-04-25 | 2022-07-19 | 深圳技术大学 | Expression recognition method and device and electronic equipment |
CN115424313A (en) * | 2022-07-20 | 2022-12-02 | 河海大学常州校区 | Expression recognition method and device based on deep and shallow layer multi-feature fusion |
CN115862091A (en) * | 2022-11-09 | 2023-03-28 | 暨南大学 | Facial expression recognition method, device, equipment and medium based on Emo-ResNet |
CN115984930A (en) * | 2022-12-26 | 2023-04-18 | 中国电信股份有限公司 | Micro expression recognition method and device and micro expression recognition model training method |
Also Published As
Publication number | Publication date |
---|---|
CN116189272B (en) | 2023-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Depth-wise separable convolutions and multi-level pooling for an efficient spatial CNN-based steganalysis | |
CN109615582B (en) | A Face Image Super-resolution Reconstruction Method Based on Attribute Description Generative Adversarial Network | |
CN109948691B (en) | Image description generation method and device based on depth residual error network and attention | |
CN111626300B (en) | Image segmentation method and modeling method of image semantic segmentation model based on context perception | |
CN108596024B (en) | Portrait generation method based on face structure information | |
CN110796111B (en) | Image processing method, device, equipment and storage medium | |
CN110689599B (en) | 3D visual saliency prediction method based on non-local enhancement generation countermeasure network | |
CN109829959B (en) | Facial analysis-based expression editing method and device | |
CN107437096A (en) | Image classification method based on the efficient depth residual error network model of parameter | |
CN115565238B (en) | Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product | |
CN110175248B (en) | A face image retrieval method and device based on deep learning and hash coding | |
CN111984772A (en) | Medical image question-answering method and system based on deep learning | |
CN113255788B (en) | Method and system for generating confrontation network face correction based on two-stage mask guidance | |
CN115393933A (en) | A video face emotion recognition method based on frame attention mechanism | |
CN116580278A (en) | Lip language identification method, equipment and storage medium based on multi-attention mechanism | |
CN115830666A (en) | A video expression recognition method and application based on spatio-temporal feature decoupling | |
CN117876793A (en) | A hyperspectral image tree species classification method and device | |
CN116189272A (en) | Facial expression recognition method and system based on feature fusion and attention mechanism | |
CN114821770B (en) | Text-to-image cross-modal person re-identification methods, systems, media and devices | |
CN115984700A (en) | Remote sensing image change detection method based on improved Transformer twin network | |
CN117036368A (en) | Image data processing method, device, computer equipment and storage medium | |
CN118918336A (en) | Image change description method based on visual language model | |
CN116778577A (en) | Lip language identification method based on deep learning | |
CN116311455A (en) | An Expression Recognition Method Based on Improved Mobile-former | |
CN114782995A (en) | A Human Interaction Behavior Detection Method Based on Self-Attention Mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20241218 Address after: Room 401, 4th Floor, Building 6, No. 6 Fengxin Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210012 Patentee after: Nanjing EasyVision Cuizhi Technology Co.,Ltd. Country or region after: China Address before: No. 9, Wenyuan Road, Qixia District, Nanjing, Jiangsu 210033 Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS Country or region before: China |