CN112183419B - A Micro-Expression Classification Method Based on Optical Flow Generation Network and Reordering - Google Patents
- Publication number: CN112183419B (application CN202011070119.8A / CN202011070119A)
- Authority: CN (China)
- Prior art keywords: feature, optical flow, frame, image, network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/174: Human faces, facial expression recognition (G06V, image or video recognition or understanding; G06V40/00, recognition of biometric, human-related or animal-related patterns in image or video data)
- G06V40/168: Human faces, feature extraction; face representation
- G06V40/172: Human faces, classification, e.g. identification
- G06F18/214: Pattern recognition, generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415: Pattern recognition, classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045: Neural networks, combinations of networks
- G06N3/08: Neural networks, learning methods
Abstract
The invention relates to a micro-expression classification method based on an optical flow generation network and re-ranking (reordering). First, a micro-expression dataset is acquired, the start frame and apex frame of each sequence are extracted, and preprocessing is performed. Next, an optical flow generation network is trained to generate optical flow features from all start and apex frames. The resulting optical flow images are then divided into training and test sets according to the LOSO (leave-one-subject-out) protocol and fed into a residual network for training. Finally, the preliminary classification results of the residual network are re-ranked to obtain a final result with higher accuracy.
Description
Technical Field
The invention relates to the fields of pattern recognition and computer vision, and in particular to a micro-expression classification method based on an optical flow generation network and re-ranking.
Background
In the field of affective computing, facial expressions are often studied to infer a person's emotional state at a given moment. Humans, however, sometimes disguise or conceal their emotions, and in such cases little useful information can be obtained from macroscopic facial expressions. To mine useful information from disguised expressions, Ekman identified a brief, involuntary, rapid facial movement known as the micro-expression: when a person tries to hide a genuine emotion, it is triggered and appears on the face involuntarily. A typical micro-expression lasts 1/25 to 1/5 of a second and usually appears only in specific regions of the face.
Micro-expressions have great promise in national security, criminal interrogation, and medical applications, but their subtlety and brevity pose a great challenge to the human eye. In recent years, therefore, many approaches based on computer vision and machine learning have been proposed for automatic micro-expression recognition.
Summary of the Invention
The purpose of the present invention is to provide a micro-expression classification method based on an optical flow generation network and re-ranking that can effectively classify micro-expression images.
To achieve the above object, the technical solution of the present invention is a micro-expression classification method based on an optical flow generation network and re-ranking, comprising the following steps:
Step S1: acquire a micro-expression dataset, extract the start frame and apex frame of each sequence, and perform preprocessing;
Step S2: train an optical flow generation network and generate optical flow features from all the start and apex frames;
Step S3: divide the resulting optical flow images into training and test sets according to the LOSO (leave-one-subject-out) protocol and feed them into a residual network for training;
Step S4: re-rank the classification results of the residual network to obtain a final result with higher accuracy.
In an embodiment of the present invention, step S1 specifically includes the following steps:
Step S11: acquire the micro-expression dataset, perform face alignment, and crop each image to 224*224 pixels;
Step S12: for datasets that already provide start-frame and apex-frame annotations, extract the start frame and apex frame directly from the annotations and go to step S15;
Step S13: for datasets without start-frame and apex-frame annotations, extract the start frame and apex frame of each video sequence with the frame difference method. The frame difference method works as follows: let P = {p_i}, i = 1, 2, ... denote the input image sequence, where p_i is the i-th input image. Take the first frame of the sequence as the start frame, i.e. p_start = p_1. Denote the gray values of corresponding pixels in the first frame and the n-th frame of the video sequence as f_1(x,y) and f_n(x,y), subtract the gray values of corresponding pixels of the two frames and take the absolute value to obtain the difference image D_n, D_n(x,y) = |f_n(x,y) - f_1(x,y)|, and compute the average inter-frame difference D_navg of the difference image as
D_navg = (Σ_x Σ_y D_n(x,y)) / (D_n.shape[0] * D_n.shape[1])
where D_n.shape[0] is the height of the difference image D_n and D_n.shape[1] is its width. Compute and sort the average inter-frame difference between the start frame and every other frame; the frame with the largest average inter-frame difference is the apex frame p_apex of the image sequence. After the start frame and apex frame have been extracted, go to step S15;
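As an illustration, a minimal NumPy sketch of this frame-difference apex selection could look as follows (the function and variable names are illustrative and not taken from the patent):

```python
import numpy as np

def find_apex_frame(frames):
    """Return the index of the apex frame: the frame whose mean absolute
    gray-level difference from the first (start) frame is largest."""
    start = frames[0].astype(np.float32)
    height, width = start.shape
    best_idx, best_avg = 1, -1.0
    for n in range(1, len(frames)):
        diff = np.abs(frames[n].astype(np.float32) - start)  # D_n(x, y)
        avg = diff.sum() / (height * width)                   # average inter-frame difference
        if avg > best_avg:
            best_idx, best_avg = n, avg
    return best_idx
```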
Step S15: apply Eulerian motion magnification to the extracted start frame and apex frame. The computation is:
I(x,t) = g(x + (1 + α)δ(t))
where I(x,t) is the brightness of the image at position x and time t, g(·) is the mapping function of the Eulerian motion magnification process, and δ(t) is the motion displacement; the magnified image is generated by adjusting the motion amplification factor α.
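A heavily simplified, first-order sketch of this idea is shown below; it amplifies the intensity change between the start frame and the apex frame by the factor α rather than running the full Eulerian video magnification pipeline (spatial pyramids and temporal filtering), so it should be read as an approximation, not as the patent's exact procedure:

```python
import numpy as np

def magnify_motion(start, apex, alpha=10.0):
    """First-order approximation of Eulerian motion magnification:
    amplify the change between the start frame and the apex frame by
    (1 + alpha) so that subtle facial movements become more visible."""
    start_f = start.astype(np.float32)
    delta = apex.astype(np.float32) - start_f      # ~ motion-induced intensity change
    magnified = start_f + (1.0 + alpha) * delta
    return np.clip(magnified, 0, 255).astype(np.uint8)
```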
In an embodiment of the present invention, in step S2 the optical flow generation network uses two sub-networks trained with point-to-point (pixel-wise) supervision. The two sub-networks sit at the same level: one estimates optical flow for large displacements and the other for small displacements. Each sub-network consists of a feature extraction module and an optical flow estimation module, and the optical flow estimates of the two sub-networks are finally fused to obtain the final optical flow image.
S21: for the large-displacement optical flow estimation sub-network, the feature extraction module consists of nine convolutional layers. Its input is the stack of the magnified input image pair. Let H(·) denote the feature mapping function of this module; the computation is:
feature_big = H(p_ls + p_la)
where p_ls is the magnified start frame, p_la is the magnified apex frame, and feature_big is the result of feature extraction for large-displacement motion.
The optical flow estimation module of the large-displacement sub-network consists of unpooling (upsampling) layers and convolutional layers. The feature feature_big output by the feature extraction module is stacked with feature_big4, the feature of layer 5-1, i.e. layer 4, of the feature extraction module, and unpooling is applied to estimate the optical flow and restore the resolution of the optical flow image, giving the output of layer 1 of the optical flow estimation module:
feature_Bflow1 = estimate(feature_big + feature_big4)
where feature_Bflow1 is the feature output by layer 1 of the optical flow estimation module of the large-displacement sub-network and estimate(·) is the mapping function of layer 1 of that module.
For the remaining layers 2-4 of the optical flow estimation module, the result computed by the previous layer is additionally added to the input of the next layer:
feature_Bflowi = estimate(feature_big + feature_big(5-i) + feature_Bflow(i-1))
where feature_Bflow(i-1) is the feature output by layer i-1 of the optical flow estimation module of the large-displacement sub-network.
S22: for the small-displacement optical flow estimation sub-network, the feature extraction module also consists of nine convolutional layers. The first three convolutions extract features separately from the input start frame p_s and apex frame p_a, neither of which has been motion-magnified; the input of the last six convolutions is the stack of the outputs of the first three convolutions for the two image frames. Let first(·) denote the mapping function of the first three convolutions and last(·) that of the last six convolutions; the computation is:
feature_small = last(first(p_s) + first(p_a))
where feature_small is the result of feature extraction for small-displacement motion.
The optical flow estimation module of the small-displacement sub-network likewise consists of unpooling and convolutional layers. The feature feature_small output by the feature extraction module is stacked with feature_small5, the feature of layer 6-1, i.e. layer 5, of the feature extraction module, and unpooling is applied to estimate the optical flow and restore the resolution of the optical flow image, giving the output of layer 1 of the optical flow estimation module:
feature_Sflow1 = estimate(feature_small + feature_small5)
where feature_Sflow1 is the feature output by layer 1 of the optical flow estimation module of the small-displacement sub-network and estimate(·) is the mapping function of layer 1 of that module.
For the remaining layers 2-5 of the optical flow estimation module, the result computed by the previous layer is additionally added to the input of the next layer:
feature_Sflowi = estimate(feature_small + feature_small(6-i) + feature_Sflow(i-1))
where feature_Sflow(i-1) is the feature output by layer i-1 of the optical flow estimation module of the small-displacement sub-network.
S23: fuse the results of the large-displacement and small-displacement optical flow estimation sub-networks to obtain the final output. Let fusion(·) denote the final fusion operation:
p_fusion = fusion(feature_Bflow4 + feature_Sflow5).
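The PyTorch-style skeleton below sketches the two-branch structure just described. It is a sketch under assumptions: the channel widths, kernel sizes, the exact form of estimate(·) and fusion(·), and the fact that the small-displacement branch here stacks its two frames at the input (instead of after three separate convolutions) are illustrative choices not specified by the patent.

```python
import torch
import torch.nn as nn

class FlowBranch(nn.Module):
    """One sub-network: nine convolutional feature-extraction layers followed by
    an estimation stage that reuses an intermediate feature map as a skip
    connection, mirroring steps S21/S22."""
    def __init__(self, in_ch, skip_layer=3):
        super().__init__()
        widths = [in_ch, 64, 64, 128, 128, 256, 256, 256, 512, 512]
        self.extract = nn.ModuleList(
            nn.Sequential(nn.Conv2d(widths[i], widths[i + 1], 3, padding=1), nn.ReLU())
            for i in range(9)
        )
        self.skip_layer = skip_layer
        # "estimate": fuse the deepest feature with the skip feature, predict 2-channel flow
        self.estimate = nn.Conv2d(widths[-1] + widths[skip_layer + 1], 2, 3, padding=1)

    def forward(self, x):
        feats = []
        for layer in self.extract:
            x = layer(x)
            feats.append(x)
        skip = feats[self.skip_layer]                      # intermediate skip feature
        return self.estimate(torch.cat([feats[-1], skip], dim=1))

class OpticalFlowGenerator(nn.Module):
    """Two parallel branches (large / small displacement) whose flow estimates
    are fused into the final optical flow image (step S23)."""
    def __init__(self):
        super().__init__()
        self.big = FlowBranch(in_ch=6)     # stacked magnified start + apex frames (3+3 channels)
        self.small = FlowBranch(in_ch=6)   # stacked un-magnified start + apex frames
        self.fusion = nn.Conv2d(4, 2, 3, padding=1)  # fuse the two 2-channel flow maps

    def forward(self, start_mag, apex_mag, start, apex):
        flow_big = self.big(torch.cat([start_mag, apex_mag], dim=1))
        flow_small = self.small(torch.cat([start, apex], dim=1))
        return self.fusion(torch.cat([flow_big, flow_small], dim=1))
```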
In an embodiment of the present invention, step S3 specifically includes the following steps:
Step S31: each optical flow image dataset contains multiple subjects, each subject corresponds to one participant, and each subject contains multiple micro-expression sequences produced by that participant. Following the leave-one-subject-out principle, the dataset is divided by taking one subject at a time as the test set and merging all remaining subjects into the training set, so each dataset yields Sub_i training/test splits, where Sub_i is the number of subjects in that dataset;
Step S32: the divided training and test sets are fed in turn into the residual network for classification, producing a preliminary classification result.
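A small sketch of the leave-one-subject-out split described in step S31 (the data layout and function name are illustrative assumptions):

```python
def loso_splits(samples):
    """Leave-one-subject-out splits.

    samples: list of (subject_id, optical_flow_image, label) tuples.
    Yields one (train, test) pair per subject: the test set contains every
    sequence of the held-out subject, the training set contains all the rest.
    """
    subjects = sorted({subject_id for subject_id, _, _ in samples})
    for held_out in subjects:
        test = [s for s in samples if s[0] == held_out]
        train = [s for s in samples if s[0] != held_out]
        yield train, test
```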
In an embodiment of the present invention, step S4 specifically includes the following steps:
Step S41: in the preliminary classification results produced by training the residual network, the probabilities of two classes for the same image may be very close, so the results need to be re-ranked;
Step S42: based on the classes whose probabilities are close in the classification result of the tested image, select the images of the corresponding classes from the training set and choose k nearest neighbors among the selected images, where e_i denotes the i-th selected training set image and p denotes the tested image;
Step S43: compute the distance between the tested image p and each selected image e_i as follows:
D_i = 1 - probe_max(e_i) + probe_max(p)
where probe_max denotes the largest probability in the classification result;
Step S44: for each selected training set image e_i, compute the Jaccard distance D_j to the tested image p; weighting D_i and D_j gives the final distance result.
Re-classifying the micro-expression images by re-ranking reduces the misclassifications that occur when the probabilities of two classes are too close during micro-expression recognition, and thus improves the accuracy of micro-expression recognition.
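The sketch below illustrates how such a re-ranking step could be assembled from the quantities defined above. The Jaccard term and the weighting factor lam are assumptions (the patent weights D_i and a Jaccard distance D_j but does not give their exact formulas in this text), so this is an illustration of the idea rather than the patented computation:

```python
import numpy as np

def rerank(p_probs, train_probs, train_labels, ambiguous, k=5, lam=0.7):
    """Re-rank a test image whose top class probabilities are close.

    p_probs:      (C,) class-probability vector of the tested image p.
    train_probs:  (N, C) class-probability matrix of the training images.
    train_labels: (N,) labels of the training images.
    ambiguous:    class indices whose probabilities are close for p.
    Returns the class with the smallest weighted average distance."""
    p_top = set(np.argsort(p_probs)[-2:])            # "support" of p: its two top classes
    scores = {}
    for c in ambiguous:
        idx = np.where(train_labels == c)[0]
        # D_i = 1 - probe_max(e_i) + probe_max(p) for every candidate e_i of class c
        d_i = 1.0 - train_probs[idx].max(axis=1) + p_probs.max()
        order = np.argsort(d_i)[:k]                   # k nearest neighbours by D_i
        neigh = idx[order]
        # assumed Jaccard-style distance between the top-class supports
        d_j = np.array([
            1.0 - len(p_top & set(np.argsort(train_probs[j])[-2:]))
                / len(p_top | set(np.argsort(train_probs[j])[-2:]))
            for j in neigh
        ])
        scores[c] = float((lam * d_i[order] + (1.0 - lam) * d_j).mean())
    return min(scores, key=scores.get)
```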
Compared with the prior art, the present invention has the following beneficial effects:
1. The micro-expression classification method based on the optical flow generation network and re-ranking constructed by the present invention can effectively classify micro-expression images and improves the classification of micro-expression images.
2. The present invention generates the optical flow estimate between two frames with a neural network, which is more robust, performs better, and yields sharper boundaries than traditional optical flow generation methods.
3. In previous micro-expression recognition work, certain pairs of expression classes were often difficult to distinguish. The present invention re-ranks the test images whose class probabilities are close, classifies individual micro-expressions better, and improves the classification performance.
Description of Drawings
FIG. 1 is a schematic diagram of the principle of the present invention.
Detailed Description of the Embodiments
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It should be noted that the terminology used herein is for the purpose of describing specific embodiments only and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well; furthermore, it should be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
As shown in FIG. 1, this embodiment provides a micro-expression classification method based on an optical flow generation network and re-ranking, which specifically includes the following steps:
Step S1: acquire a micro-expression dataset, extract the start frame and apex frame, and perform preprocessing;
Step S2: train the optical flow generation network and generate optical flow features from all the start and apex frames;
Step S3: divide the resulting optical flow images into training and test sets according to the LOSO protocol and feed them into the residual network for training;
Step S4: finally, re-rank the classification results of the residual network to obtain a final result with higher accuracy.
In this embodiment, step S1 specifically includes the following steps:
Step S11: acquire the micro-expression dataset, perform face alignment, and crop each image to 224*224 pixels;
Step S12: for datasets that already provide start-frame and apex-frame annotations, extract the start frame and apex frame directly from the annotations;
Step S13: for datasets without start-frame and apex-frame annotations, extract the start frame and apex frame of the video sequence with the frame difference method;
Step S14: the frame difference method works as follows. Let P = {p_i}, i = 1, 2, ... denote the input image sequence, where p_i is the i-th input image. Take the first frame of the sequence as the start frame, i.e. p_start = p_1. Denote the gray values of corresponding pixels in the first frame and the n-th frame of the video sequence as f_1(x,y) and f_n(x,y), subtract the gray values of corresponding pixels of the two frames and take the absolute value to obtain the difference image D_n, D_n(x,y) = |f_n(x,y) - f_1(x,y)|, and compute the average inter-frame difference D_navg of the difference image as
D_navg = (Σ_x Σ_y D_n(x,y)) / (D_n.shape[0] * D_n.shape[1])
where D_n.shape[0] is the height of the difference image D_n and D_n.shape[1] is its width. Compute and sort the average inter-frame difference between the start frame and every other frame; the frame with the largest average inter-frame difference is the apex frame p_apex of the image sequence;
Step S15: apply Eulerian motion magnification to the processed start frame and apex frame. The computation is:
I(x,t) = g(x + (1 + α)δ(t))
where I(x,t) is the brightness of the image at position x and time t, g(·) is the mapping function of the Eulerian motion magnification process, and δ(t) is the motion displacement; the method generates the magnified image by adjusting the motion amplification factor.
In this embodiment, step S2 specifically includes the following steps:
Step S21: the optical flow generation network uses two sub-networks trained with point-to-point (pixel-wise) supervision. The two sub-networks sit at the same level: one is dedicated to optical flow estimation for large displacements and the other to small displacements. Each sub-network consists of a feature extraction module and an optical flow estimation module, and the optical flow estimates of the two sub-networks are finally fused to obtain the final optical flow image;
Step S22: for the sub-network dedicated to large-displacement optical flow estimation, the feature extraction module mainly consists of nine convolutional layers. Its input is the stack of the magnified input image pair. Let H(·) denote the feature mapping function of this module; the computation is:
feature_big = H(p_ls + p_la)
where p_ls is the magnified start frame, p_la is the magnified apex frame, and feature_big is the result of feature extraction for large-displacement motion.
The optical flow estimation module of the large-displacement sub-network mainly consists of unpooling and convolutional layers. The feature feature_big output by the feature extraction module is stacked with feature_big4, the feature of layer 5-1, i.e. layer 4, of the feature extraction module, and unpooling is applied to estimate the optical flow and restore the resolution of the optical flow image, giving the output of the first layer:
feature_Bflow1 = estimate(feature_big + feature_big4)
where feature_Bflow1 is the feature output by layer 1 of the optical flow estimation module of the large-displacement sub-network and estimate(·) is the mapping function of that layer.
For the remaining layers 2-4, the result computed by the previous layer is additionally added to the input of the next layer:
feature_Bflowi = estimate(feature_big + feature_big(5-i) + feature_Bflow(i-1))
where feature_Bflow(i-1) is the feature output by layer i-1 of the optical flow estimation module of the large-displacement sub-network;
Step S23: for the sub-network dedicated to small-displacement optical flow estimation, the feature extraction module mainly consists of nine convolutional layers. The first three convolutions extract features separately from the input start frame p_s and apex frame p_a, neither of which has been motion-magnified; the input of the last six convolutions is the stack of the outputs of the first three convolutions for the two image frames. Let first(·) denote the mapping function of the first three convolutions and last(·) that of the last six convolutions; the computation is:
feature_small = last(first(p_s) + first(p_a))
where feature_small is the result of feature extraction for small-displacement motion.
The optical flow estimation module of the small-displacement sub-network mainly consists of unpooling and convolutional layers. The feature feature_small output by the feature extraction module is stacked with feature_small5, the feature of layer 6-1, i.e. layer 5, of the feature extraction module, and unpooling is applied to estimate the optical flow and restore the resolution of the optical flow image, giving the output of the first layer:
feature_Sflow1 = estimate(feature_small + feature_small5)
where feature_Sflow1 is the feature output by layer 1 of the optical flow estimation module of the small-displacement sub-network and estimate(·) is the mapping function of that layer.
For the remaining layers 2-5, the result computed by the previous layer is additionally added to the input of the next layer:
feature_Sflowi = estimate(feature_small + feature_small(6-i) + feature_Sflow(i-1))
where feature_Sflow(i-1) is the feature output by layer i-1 of the optical flow estimation module of the small-displacement sub-network;
Step S24: finally, fuse the results of the two sub-networks to obtain the final output. Let fusion(·) denote the final fusion operation:
p_fusion = fusion(feature_Bflow4 + feature_Sflow5)
Simulating the optical flow estimates for large and small displacements with convolutional neural networks and fusing them helps improve the generalization of the model, allowing it to handle micro-expression clips whose motion is too large or too small more reasonably. Compared with traditional methods, the convolutional implementation also reduces the edge blurring that can arise during micro-expression optical flow estimation, making the optical flow estimate more accurate.
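The patent does not state the training objective used for the point-to-point pixel supervision, so the loop below is only a plausible sketch. It assumes an endpoint-error loss against a reference flow field (for example one produced by a classical optical flow method), and model is assumed to be a network such as the OpticalFlowGenerator sketched earlier:

```python
import torch

def train_flow_network(model, loader, epochs=50, lr=1e-4):
    """Assumed training loop: pixel-wise (point-to-point) supervision of the
    predicted flow against a reference flow, using mean endpoint error."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for start_mag, apex_mag, start, apex, flow_ref in loader:
            flow_pred = model(start_mag, apex_mag, start, apex)
            # endpoint error: per-pixel Euclidean distance between flow vectors
            epe = torch.norm(flow_pred - flow_ref, dim=1).mean()
            opt.zero_grad()
            epe.backward()
            opt.step()
    return model
```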
In this embodiment, step S3 specifically includes the following steps:
Step S31: each dataset contains multiple subjects, each subject corresponds to one participant, and each subject contains multiple micro-expression sequences produced by that participant. Following the leave-one-subject-out principle, the dataset is divided by taking one subject at a time as the test set and merging all remaining subjects into the training set, so each dataset yields Sub_i training/test splits, where Sub_i is the number of subjects in that dataset.
Step S32: the divided training and test sets are fed in turn into the residual network for classification, producing a preliminary classification result;
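As an illustration of step S32, a residual network classifier over the generated optical flow images might be set up as follows; the choice of ResNet-18 and the number of micro-expression classes are assumptions, since the patent only specifies a residual network:

```python
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes=3):
    """Residual network for preliminary classification of optical flow images.
    The 2-channel flow map is assumed to be converted to a 3-channel image
    (e.g. horizontal flow, vertical flow, flow magnitude) before input."""
    net = models.resnet18(weights=None)
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # replace the final layer
    return net
```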
In this embodiment, step S4 specifically includes the following steps:
Step S41: in the preliminary classification results, the probabilities of two classes for the same image may be very close, which requires re-ranking the results;
Step S42: based on the classes whose probabilities are close in the classification result of the tested image, select the images of the corresponding classes from the training set and choose k nearest neighbors among the selected images, where e_i denotes the i-th selected training set image and p denotes the tested image.
Step S43: compute the distance between the tested image p and each selected image e_i as follows:
D_i = 1 - probe_max(e_i) + probe_max(p)
where probe_max denotes the largest probability in the classification result.
Step S44: for each selected training set image e_i, compute the Jaccard distance D_j to the tested image p; weighting D_i and D_j gives the final distance result.
Re-classifying the micro-expression images by re-ranking greatly reduces the misclassifications that occur when the probabilities of two classes are too close during micro-expression recognition, and improves the accuracy of micro-expression recognition.
The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the resulting functional effects do not exceed the scope of the technical solution of the present invention.
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011070119.8A (CN112183419B) | 2020-10-09 | 2020-10-09 | A Micro-Expression Classification Method Based on Optical Flow Generation Network and Reordering |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112183419A | 2021-01-05 |
| CN112183419B | 2022-06-10 |
Family ID: 73948334
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011070119.8A (CN112183419B, active) | A Micro-Expression Classification Method Based on Optical Flow Generation Network and Reordering | 2020-10-09 | 2020-10-09 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN112183419B (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113591660B * | 2021-07-24 | 2025-03-11 | 中国石油大学(华东) | Micro-expression recognition method based on meta-learning |
| CN114005157B * | 2021-10-15 | 2024-05-10 | 武汉烽火信息集成技术有限公司 | Micro-expression recognition method for pixel displacement vector based on convolutional neural network |
| CN114299606A * | 2021-12-09 | 2022-04-08 | 佛山弘视智能信息科技有限公司 | Sleep detection method and device based on front-end camera |
| CN116524608A * | 2023-04-14 | 2023-08-01 | 支付宝(杭州)信息技术有限公司 | Living body detection method, living body detection device, living body detection equipment and storage medium |
| CN117392727B * | 2023-11-02 | 2024-04-12 | 长春理工大学 | A facial micro-expression recognition method based on contrastive learning and feature decoupling |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108830223A * | 2018-06-19 | 2018-11-16 | 山东大学 | A kind of micro-expression recognition method based on batch mode Active Learning |
| US10423773B1 * | 2019-04-12 | 2019-09-24 | Coupang, Corp. | Computerized systems and methods for determining authenticity using micro expressions |
| CN110532950A * | 2019-08-29 | 2019-12-03 | 中国科学院自动化研究所 | Video feature extraction method and micro-expression recognition method based on micro-expression video |
| CN111626179A * | 2020-05-24 | 2020-09-04 | 中国科学院心理研究所 | Micro-expression detection method based on optical flow superposition |
Non-Patent Citations (5)
- Li, Qiuyu et al., "Facial micro-expression recognition based on the fusion of deep learning and enhanced optical flow", Multimedia Tools and Applications, 2019-10-30. *
- Li, Qiuyu et al., "Micro-expression Analysis by Fusing Deep Convolutional Neural Network and Optical Flow", IEEE, 2018-12-30. *
- 苏文超, "Facial action unit detection and micro-expression analysis" (人脸面部活动单元检测及微表情分析), China Master's Theses Full-text Database, Information Science and Technology, No. 08, 2019-08-15. *
- 李丹 et al., "Micro-expression capture based on optical flow direction information entropy statistics" (基于光流方向信息熵统计的微表情捕捉), Chinese Journal of Engineering (工程科学学报), No. 11, 2017-11-15. *
- 苏育挺, "Micro-expression recognition algorithm based on multi-motion feature fusion" (基于多运动特征融合的微表情识别算法), Laser & Optoelectronics Progress (激光与光电子学进展), Vol. 57, No. 14, 2020-07-31. *
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant