CN104933416A - Micro expression sequence feature extracting method based on optical flow field - Google Patents
Micro expression sequence feature extracting method based on optical flow field
- Publication number
- CN104933416A (application CN201510360969.4A)
- Authority
- CN
- China
- Prior art keywords
- optical flow
- micro
- flow field
- micro expression
- expression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 45
- 230000014509 gene expression Effects 0.000 title claims abstract description 40
- 230000003287 optical effect Effects 0.000 title claims abstract description 34
- 230000033001 locomotion Effects 0.000 claims abstract description 36
- 239000013598 vector Substances 0.000 claims abstract description 27
- 238000000605 extraction Methods 0.000 claims description 7
- 239000011159 matrix material Substances 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 3
- 230000008921 facial expression Effects 0.000 claims description 3
- 238000006073 displacement reaction Methods 0.000 claims description 2
- 238000013519 translation Methods 0.000 abstract description 7
- 238000011161 development Methods 0.000 abstract description 3
- 230000004438 eyesight Effects 0.000 abstract description 3
- 238000005516 engineering process Methods 0.000 abstract description 2
- 238000002474 experimental method Methods 0.000 description 9
- 238000001514 detection method Methods 0.000 description 5
- 238000012706 support-vector machine Methods 0.000 description 5
- 238000012937 correction Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000010801 machine learning Methods 0.000 description 3
- 230000018109 developmental process Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 210000003205 muscle Anatomy 0.000 description 2
- 238000005192 partition Methods 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 201000010099 disease Diseases 0.000 description 1
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000010195 expression analysis Methods 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000006996 mental state Effects 0.000 description 1
- 230000036651 mood Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 238000007670 refining Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of computer vision, in particular to a micro-expression sequence feature extraction method based on an optical flow field. The method comprises the following steps: first, with the number of frames of the micro-expression fixed, extracting a dense optical flow field between adjacent frames; eliminating the influence of whole-face translation on micro-expression recognition through fine alignment; then dividing the aligned optical flow fields into a series of space-time blocks and extracting a principal direction in each space-time block to represent the motion pattern of the majority of points in that block; finally, quantizing and concatenating the principal directions of all blocks and expressing them in vector form, thereby obtaining the designed micro-expression sequence feature. This novel motion-description-based feature can be used for micro-expression recognition. With the method disclosed by the invention, comprehensive indexes such as accuracy, precision and recall are superior to other existing methods, promoting further development of micro-expression recognition technology; meanwhile, the method depicts the dynamic pattern of micro-expressions and provides a deeper understanding for micro-expression analysis.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a micro-expression sequence feature extraction method based on an optical flow field.
Background
At present, micro-expression recognition faces many difficulties, and no practical method or theoretical framework has yet been established. The difficulty lies mainly in feature extraction: the features currently used are often generic video feature representations, which are neither optimized for the micro-expression application nor provide a deep understanding of micro-expressions.
Micro-expressions were first discovered in 1969, by psychologists observing video recordings of conversations with depression patients [1]. Patients in the videos often show a normal smile, yet an abnormally painful expression lasting only a few frames can be found. Psychologists named this phenomenon the micro-expression.
Unlike conventional expressions, a micro-expression is a tiny expression that a person cannot subjectively control. Observing micro-expressions to determine a person's real mental state therefore has potential and important application value in fields such as public security interrogation, diagnosis and treatment of psychological diseases, and business negotiation, and has received considerable attention.
However, recognizing micro-expressions is not easy; the main difficulties are that 1) the duration is short and 2) the motion amplitude is small. Even trained personnel achieve low recognition accuracy. An automatic recognition algorithm based on computer vision can therefore improve recognition stability and greatly save labor, giving it strong application value. The related technical fields mainly include: face detection, facial key-point localization, face alignment, image preprocessing, feature extraction, and machine learning.
Although micro-expression recognition is not yet mature, it has been studied by a large number of scholars; representative works are listed below.
Internationally, two major groups continue to delve into micro-expression recognition. Starting from spatio-temporal textures, the University of Oulu in Finland applies general video features to micro-expressions and extracts effective representations to recognize them, characterized by the local binary pattern on three orthogonal planes (LBP-TOP) used by Pfister: local binary patterns are extracted on the three planes X-Y, X-T and Y-T and jointly used to represent micro-expressions [2]. For each pixel, the local binary pattern encodes with a binary number the value relations between that pixel and its surrounding pixels; the distribution histogram of the binary codes is then counted as the feature representation. However, micro-expression analysis places high demands on fine alignment of the face, which this method cannot handle well.
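For orientation, the sketch below shows the basic single-plane local binary pattern computation that LBP-TOP builds on (LBP-TOP applies the same coding on the X-Y, X-T and Y-T planes and concatenates the three histograms); the 8-neighbour variant and 256-bin histogram are common choices assumed here, not details taken from [2]:

```python
import numpy as np

def lbp_8neighbour(img):
    """Basic local binary pattern on one plane: each pixel is encoded by
    an 8-bit number comparing it with its 8 neighbours. LBP-TOP applies
    this coding on the X-Y, X-T and Y-T planes of a video volume."""
    img = np.asarray(img, dtype=np.float64)
    center = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1 + dy:img.shape[0] - 1 + dy,
                    1 + dx:img.shape[1] - 1 + dx]
        code |= (neigh >= center).astype(np.uint8) << bit
    # The 256-bin histogram of the codes is the feature of this plane.
    return np.bincount(code.ravel(), minlength=256)
```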
Researchers at the Institute of Psychology, Chinese Academy of Sciences (Wang et al. [5]) start from machine learning theory: each micro-expression image sequence is regarded as a third-order tensor, and a group of subspace mappings is learned through discriminant tensor subspace analysis (DTSA) so that the distances among tensors of the same class are as small as possible and the distances among tensors of different classes are as large as possible. The mapped micro-expression tensors are then classified by an extreme learning machine [5]. This is essentially a machine learning algorithm and does not provide an in-depth understanding of micro-expressions at the level of feature representation.
Cited documents:
[1] P. Ekman and W. V. Friesen. Nonverbal leakage and clues to deception. Psychiatry, vol. 32, no. 1, pp. 88-106, 1969.
[2] T. Pfister, X. Li, G. Zhao, and M. Pietikainen. Recognising spontaneous facial micro-expressions. CVPR, 2011.
[3] M. Shreve, S. Godavarthy, V. Manohar, D. Goldgof, and S. Sarkar. Towards macro- and micro-expression spotting in video using strain patterns. IEEE Workshop on Applications of Computer Vision, 2009.
[4] M. Shreve, S. Godavarthy, D. Goldgof, and S. Sarkar. Macro- and micro-expression spotting in long videos using spatio-temporal strain. AFGR, 2011.
[5] S.-J. Wang, H.-L. Chen, W.-J. Yan, Y.-H. Chen, and X. Fu. Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine. Neural Processing Letters, vol. 39, no. 1, pp. 25-43, 2014.
[6] X. Li, T. Pfister, X. Huang, G. Zhao, and M. Pietikainen. A spontaneous micro-expression database: Inducement, collection and baseline. AFGR, 2013.
[7] W.-J. Yan, Q. Wu, Y.-J. Liu, S.-J. Wang, and X. Fu. CASME database: A dataset of spontaneous micro-expressions collected from neutralized faces. AFGR, 2013.
[8] W.-J. Yan, X. Li, S.-J. Wang, G. Zhao, Y.-J. Liu, Y.-H. Chen, and X. Fu. CASME II: An improved spontaneous micro-expression database and the baseline evaluation. PLoS ONE, vol. 9, no. 1, p. e86041, 2014.
[9] Q. Wu, X. Shen, and X. Fu. The machine knows what you are hiding: An automatic micro-expression recognition system. Affective Computing and Intelligent Interaction, Springer Berlin Heidelberg, pp. 152-162, 2011.
Disclosure of the Invention
The invention aims to provide an effective method for extracting features of a micro-expression sequence.
First, with the number of micro-expression frames fixed, a dense optical flow field between adjacent frames is extracted; on the basis of the dense optical flow field, fine alignment is carried out by a simple method, eliminating the influence of face translation on micro-expression recognition; then the aligned optical flow field is divided into a series of space-time blocks, and a principal direction is extracted from each block to represent the motion pattern of the majority of points in the block; the principal directions in all blocks are quantized, concatenated, and expressed in vector form, yielding the designed micro-expression sequence feature. FIG. 1 is a flow chart of the invention.
The invention provides a micro-expression sequence feature extraction method, which comprises the following specific steps:
1. Given a segment of facial expression sequence, align the video to a specified number of frames by interpolation to obtain a sequence of fixed length. The interpolation method may be linear interpolation, or the manifold-based interpolation described by Pfister [2].
2. In the micro-expression sequence of determined length, the Horn-Schunck method is used to estimate a dense optical flow field $F^t = (U^t, V^t)$ between each pair of adjacent frames $I^t$ and $I^{t+1}$. The optical flow field satisfies

$$I^t(i, j) \approx I^{t+1}\big(i + U^t(i, j),\; j + V^t(i, j)\big),$$

where $I^t(i, j)$ represents the pixel value of frame $I^t$ at row $i$, column $j$, and $U^t(i, j)$ and $V^t(i, j)$ are respectively the elements of $U^t$ and $V^t$ at row $i$, column $j$; the pair $(U^t(i, j), V^t(i, j))$ is referred to as the motion vector at that location. In practical problems the above relation does not hold strictly but only approximately, so certain errors exist; for this reason, an iterative method for estimating the principal direction is described in step 4.
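As an illustration of this step, here is a minimal NumPy sketch of the classic Horn-Schunck iteration between two grayscale frames; the smoothness weight `alpha` and the iteration count are illustrative assumptions, as the patent does not specify them:

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Estimate a dense optical flow field (U, V) between two grayscale
    frames with the classic Horn-Schunck iteration. alpha and n_iter
    are illustrative choices, not values given in the patent."""
    I1 = I1.astype(np.float64); I2 = I2.astype(np.float64)
    # Spatial and temporal derivatives via small finite-difference kernels.
    kx = np.array([[-1, 1], [-1, 1]]) * 0.25
    ky = np.array([[-1, -1], [1, 1]]) * 0.25
    Ix = convolve(I1, kx) + convolve(I2, kx)
    Iy = convolve(I1, ky) + convolve(I2, ky)
    It = convolve(I2 - I1, np.ones((2, 2)) * 0.25)
    # Kernel averaging the neighbourhood of each pixel (smoothness term).
    avg = np.array([[1/12, 1/6, 1/12], [1/6, 0, 1/6], [1/12, 1/6, 1/12]])
    U = np.zeros_like(I1); V = np.zeros_like(I1)
    for _ in range(n_iter):
        Ubar = convolve(U, avg); Vbar = convolve(V, avg)
        common = (Ix * Ubar + Iy * Vbar + It) / (alpha**2 + Ix**2 + Iy**2)
        U = Ubar - Ix * common
        V = Vbar - Iy * common
    return U, V
```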
3. A refined alignment algorithm is used to eliminate whole-face displacement. Taking the horizontal component $U^t$ as an example: for each $t$, compute the histogram $H_U^t(z)$, equal to the number of horizontal components of the optical flow field whose value is $z$. Let

$$\Delta U^t = -\arg\max_z H_U^t(z),$$

i.e. $\Delta U^t$ is the negative of the most frequently occurring horizontal component value. Adding $\Delta U^t$ to all values of $U^t$ yields the finely aligned horizontal optical flow field

$$\tilde U^t = U^t + \Delta U^t \cdot \mathbf{1},$$

where $\mathbf{1}$ is a matrix with the same dimensions as $U^t$ whose elements are all 1.

The refined alignment of the vertical component $V^t$ is similar:

$$\tilde V^t = V^t + \Delta V^t \cdot \mathbf{1}, \qquad \Delta V^t = -\arg\max_z H_V^t(z).$$
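A minimal sketch of this refined alignment, assuming the real-valued flow components are binned at a fixed precision so that the histogram mode is well defined (the patent does not state the binning):

```python
import numpy as np

def refine_align(U, V, precision=0.1):
    """Shift each flow component so that its most frequent value becomes 0.
    The flow is real-valued, so it is binned at the given (assumed)
    precision before taking the histogram mode."""
    def correction(C):
        bins = np.round(C / precision).astype(np.int64)
        values, counts = np.unique(bins, return_counts=True)
        mode = values[np.argmax(counts)] * precision  # most frequent value
        return -mode                                  # delta = -argmax_z H(z)
    return U + correction(U), V + correction(V)
```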
4. To achieve a compact representation, the aligned optical flow field is divided into space-time blocks of fixed size, and in each space-time block a principal direction is sought to describe the block. The algorithm flow is as follows:
(a) initialize the principal direction estimate $P$ as a two-dimensional unit vector: $P = (1, 0)$;
(b) for each plane coordinate $(i, j)$ in the block, find the time coordinate $t^*(i, j)$ such that the motion vector $M(i, j, t)$ at that position has maximum inner product with $P$:

$$t^*(i, j) = \arg\max_t \langle M(i, j, t), P \rangle;$$

(c) average the motion vectors found in (b), normalize the average, and take it as the updated value of $P$:

$$P \leftarrow \frac{\bar M}{\lVert \bar M \rVert},$$

where $\bar M$ is the average of $M(i, j, t^*(i, j))$ over all plane coordinates in the block;
(d) repeat steps (b)-(c) until $P$ converges or a maximum number of iterations is exceeded.
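A minimal sketch of this iterative estimation for a single space-time block, with the block's motion vectors stored as an array of shape (a, b, c, 2); the convergence tolerance and iteration cap are illustrative assumptions:

```python
import numpy as np

def principal_direction(M, max_iter=20, tol=1e-6):
    """Iteratively estimate the principal direction of one space-time
    block. M has shape (a, b, c, 2): one motion vector (u, v) per plane
    position and frame."""
    P = np.array([1.0, 0.0])                     # (a) initialize P = (1, 0)
    for _ in range(max_iter):
        # (b) at each plane coordinate, pick the frame maximizing <M, P>
        inner = M @ P                            # shape (a, b, c)
        t_star = inner.argmax(axis=2)            # shape (a, b)
        picked = np.take_along_axis(
            M, t_star[..., None, None], axis=2).squeeze(2)  # (a, b, 2)
        # (c) average the picked vectors and normalize
        mean = picked.reshape(-1, 2).mean(axis=0)
        P_new = mean / (np.linalg.norm(mean) + 1e-12)
        if np.linalg.norm(P_new - P) < tol:      # (d) stop when converged
            return P_new
        P = P_new
    return P
```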
5. Through the above steps, a principal direction is obtained in each space-time block. Each direction is quantized into several intervals and represented by its interval number; for example, FIG. 2 illustrates a strategy that quantizes the principal direction into 10 intervals. Concatenating the quantized principal directions of all blocks yields the descriptive feature of the whole sequence.
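A minimal sketch of the quantization and concatenation, assuming the principal direction is binned by angle into 10 equal intervals in the spirit of Fig. 2:

```python
import numpy as np

def quantize_direction(P, n_bins=10):
    """Map a unit vector to an interval number in {1, ..., n_bins} by its
    angle; equal angular intervals are an assumption based on Fig. 2."""
    angle = np.arctan2(P[1], P[0]) % (2 * np.pi)
    return int(angle / (2 * np.pi / n_bins)) + 1

def sequence_feature(block_directions, n_bins=10):
    """Concatenate the quantized principal directions of all space-time
    blocks into the feature vector of the whole sequence."""
    return np.array([quantize_direction(P, n_bins) for P in block_directions])
```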
The features resulting from the above steps can be used to describe a micro-expression. A supervised learning method is applied to the labeled micro-expressions in a data set to obtain a trained classifier (for example, a support vector machine). After extracting the above features from an unlabeled micro-expression sequence, the classifier can be used to predict its label.
The key points of the invention are steps 3 and 4, which are also its main contributions: the refined alignment method and the fast iterative principal-direction estimation method. Each is detailed below:
Refined alignment method
In step 3, the influence of the subject's whole-face translational motion on feature extraction must be eliminated. The optical flow field extracted by the invention contains both the global translational motion of the face and the local motion of the micro-expression. Since a micro-expression involves only a local part of the face, the aligned optical flow field should be 0 at most locations. The invention therefore decomposes the global translation into a horizontal and a vertical translation and seeks corrections $\Delta U^t$ and $\Delta V^t$ for the horizontal and vertical components of the optical flow field respectively; the corrected components are $\tilde U^t$ and $\tilde V^t$.

For each $t$, compute the histogram $H_U^t(z)$ of $U^t$, where $H_U^t(z)$ equals the number of horizontal components of the optical flow field whose value is $z$. Let $\Delta U^t = -\arg\max_z H_U^t(z)$, i.e. the negative of the most frequently occurring horizontal component value. Apply the correction to all values of $U^t$:

$$\tilde U^t = U^t + \Delta U^t \cdot \mathbf{1},$$

where $\mathbf{1}$ is a matrix with the same dimensions as $U^t$ whose elements are all 1. In the corrected horizontal component $\tilde U^t$, the value at most positions is 0; the number of such positions equals the original maximum $\max_z H_U^t(z)$.

The correction of $V^t$ is similar: for each $t$, compute the histogram $H_V^t(z)$ of $V^t$, where $H_V^t(z)$ equals the number of vertical components of the optical flow field whose value is $z$. Let $\Delta V^t = -\arg\max_z H_V^t(z)$ and apply the correction to all values of $V^t$:

$$\tilde V^t = V^t + \Delta V^t \cdot \mathbf{1},$$

where $\mathbf{1}$ is a matrix with the same dimensions as $V^t$ whose elements are all 1.
Fast iteration-based principal direction estimation method
For facial expressions, there are two reasonable assumptions: limited by the scale of facial muscles, the directions of motion over a small region of the face are consistent; and limited by the speed of muscle movement, the direction of motion within a very short time window is consistent.

After the corrected optical flow field is obtained, it is divided into space-time blocks. By the above assumptions, the motion vectors within a space-time block should be consistent, so each block can be characterized by a single principal direction. The simplest way is to take the average; however, this would also absorb the errors of the optical flow field into the principal direction and affect the correctness of the features to some degree. The invention therefore designs an iterative algorithm with the following flow:
(a) initialize the principal direction estimate $P$ as a two-dimensional unit vector: $P = (1, 0)$;
(b) for each plane coordinate $(i, j)$ in the block, find the time coordinate $t^*(i, j)$ such that the motion vector $M(i, j, t)$ at that position has maximum inner product with $P$: $t^*(i, j) = \arg\max_t \langle M(i, j, t), P \rangle$;
(c) average the motion vectors found in (b), normalize the average, and take it as the updated value of $P$;
(d) repeat steps (b)-(c) until $P$ converges or a maximum number of iterations is exceeded.
When the majority of the motion vectors are correctly estimated, the method can ignore the small number of optical flow errors and converges quickly.

The obtained principal directions are quantized into several intervals, each principal direction is represented by its interval number, and all principal directions are concatenated, yielding the feature of a micro-expression sequence.
The novel motion-description-based feature provided by the invention can be used for micro-expression recognition, and the method performs fine alignment on the micro-expression sequence, making subsequent analysis more reasonable. Experimental results show that the method is superior to other existing methods in the comprehensive indexes of accuracy, precision and recall, promoting further development of micro-expression recognition technology. Meanwhile, the method depicts the dynamic pattern of micro-expressions and provides a deeper understanding for micro-expression analysis.
The experimental effects of the present invention will be described in detail below.
Experiment 1. Two comparison methods are used: the tri-orthogonal-plane local binary pattern (LBP-TOP) method and the discriminant tensor subspace analysis (DTSA) method. The experiment is performed on four datasets: CASME I, CASME II, SMIC, and SMIC2, where SMIC2 contains three sub-datasets: HS, VIS, and NIR.
Among them, CASME I contains 8 classes: contempt, disgust, fear, happiness, repression, sadness, surprise, and tense. The frame rate of CASME I is 60 frames/second.

CASME II contains 7 classes: disgust, fear, happiness, repression, sadness, surprise, and others. The frame rate of CASME II is 200 frames/second.
Table 1 shows the number of samples in each category in CASME I and CASME II.
In CASME I and CASME II, to elicit valid micro-expressions, participants watch emotion-inducing videos while trying not to make expressions; otherwise the reward for participation is reduced.
Both SMIC and the three sub-datasets of SMIC2 contain two types of tasks: detection and classification. In the detection task, given a face sequence, it must be determined whether the sequence contains a micro-expression. In the classification task, given a micro-expression sequence, it must be indicated which class of micro-expression it belongs to.

For the classification task, SMIC has only two classes: positive and negative; SMIC2 contains three classes: positive, negative, and surprise.
The frame rate of SMIC and SMIC2-HS is 100 frames/sec; both SMIC2-VIS and SMIC2-NIR have frame rates of 25 frames/sec.
Table 2 shows the number of samples in each class of SMIC and SMIC2.
Similar to CASME I/II, the micro-expression elicitation protocol of SMIC and SMIC2 requires participants to watch emotion-inducing videos while suppressing their expressions; otherwise a lengthy questionnaire must be filled in as a penalty.
Fig. 3 shows some samples from these datasets.
The experiment uses two evaluation measures: the accuracy $ACC$ and the F1-score. The accuracy is defined as

$$ACC = \frac{TP + TN}{TP + TN + FP + FN},$$

and the per-class F1-score is defined as

$$F_1^c = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}.$$

In the above definitions, $TP_c$ denotes positive samples correctly classified, $TN_c$ negative samples correctly classified, $FP_c$ negative samples mistakenly classified as positive, and $FN_c$ positive samples mistakenly classified as negative. The subscript $c$ denotes the setting in which samples of class $c$ are taken as positive and samples of all remaining classes as negative.
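As a concrete reading of these definitions, the sketch below computes both measures; it assumes the second measure is the per-class F1-score implied by the $TP_c$, $FP_c$, $FN_c$ notation:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    y_true = np.asarray(y_true); y_pred = np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def per_class_f1(y_true, y_pred, cls):
    """F1 in the one-vs-rest setting where class `cls` is positive."""
    y_true = np.asarray(y_true); y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```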
Tables 3 and 4 show the results of the three methods on the six datasets. The method of the invention achieves the best result on every problem. Note that the results of the discriminant tensor subspace analysis (DTSA) based method on CASME I and CASME II are worse than in the original paper, because the original paper's experiments do not use the classes with fewer samples, whereas we use the complete datasets.
We therefore added a set of experiments on CASME I and CASME II using only the classes with more samples. Specifically, only the four classes disgust, repression, surprise, and tense are used in CASME I, and only the five classes disgust, happiness, repression, surprise, and others are used in CASME II. The results are shown in Tables 5 and 6: our method still achieves the best performance.
Experiment 2. To verify the effect of the refined alignment of the invention, we compared the results obtained with and without the refined alignment procedure. The results are shown in Table 7; across the experiments, refined alignment has a positive impact on the results.
Experiment 3. The iterative principal-direction solution cannot theoretically guarantee convergence, so a maximum number of iterations must be set; once it is exceeded, the iteration ends. The convergence speed of the algorithm in practical applications is therefore of interest. Fig. 4 shows the relationship between the number of iterations and the proportion of converged blocks: after three iterations, the principal directions in 90% of the blocks have converged. For some values of t, the principal direction in a small number of blocks fails to converge; this proportion is 0.500% at t = 2, 0.833% at t = 3, and 0.834% at t = 4. Such a small proportion does not affect the efficiency of the algorithm or the correctness of the features.
Drawings
Fig. 1 is a flow chart of a micro expression recognition method based on an optical flow field.
FIG. 2 is a diagram illustrating principal-direction quantization. The left image shows the motion vectors in a space-time block; the right image shows the estimated principal direction quantized into 10 intervals.
Fig. 3 Dataset samples. The first row, from SMIC2-VIS, is a negative micro-expression. The second row, from SMIC2-NIR, is a positive micro-expression; the original sample contains 13 pictures, of which the first 8 are shown. The third row, from SMIC2-HS, is a surprise micro-expression originally containing 25 pictures; 8 equally spaced frames are shown (1 in every 3). The fourth row, from SMIC, is a non-micro-expression sample for the detection task; the original sample contains 34 pictures, of which 8 equally spaced frames are shown (1 in every 4). The fifth row, from CASME I, is a disgust micro-expression originally containing 10 pictures, of which the first 8 are shown. The sixth row, from CASME II, originally consists of 66 pictures, of which 8 equally spaced frames are shown (1 in every 8).
Fig. 4 shows the convergence speed for different values of t. The horizontal axis is the number of iterations; the vertical axis is the proportion of space-time blocks whose principal direction has converged by that iteration.
Detailed Description
The invention provides a feature description method for micro-expressions, used to recognize and classify them. The following example illustrates the operation of the invention.
In practical use, candidate sequences must be segmented in advance from a long video. Segmentation can use a fixed-length time window, or can be performed by matching specific patterns. The invention does not concern segmentation techniques, so the following example uses only a simple time-window technique.
Face video is captured with a high-speed camera (50-200 fps), in which micro-expression sequences are detected and classified. A conventional 25 fps camera can also capture micro-expressions, but some very short micro-expressions may be missed; moreover, even for the micro-expressions it does capture, it cannot provide temporal information as fine-grained as a high-speed camera.
Studies show that a micro-expression typically lasts between 0.05 and 0.2 seconds [9]. We therefore maintain a time window 0.2 seconds long, so that at every moment the video sequence of the past 0.2 seconds is available.
Face detection is performed on this 0.2-second video sequence, yielding uniformly sized boxes surrounding the face. The remainder of each video frame is discarded, producing a 0.2-second face sequence.
Linear interpolation is then applied to obtain a face sequence with a fixed length of 20 frames. Specifically, for each plane position, the pixel values of all frames at that position are regarded as samples of a function taken at fixed intervals. The 0.2-second span is divided into 19 equal segments; at each segment endpoint, the pixel values of the left and right nearest-neighbour frames are linearly interpolated. This yields a uniform 20-frame face sequence.
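A minimal sketch of this temporal interpolation, assuming the face sequence is stored as a (T, H, W) grayscale array:

```python
import numpy as np

def interpolate_to_n_frames(frames, n=20):
    """Linearly interpolate a (T, H, W) sequence to n frames along the
    time axis, treating each pixel position independently."""
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    # n target time positions evenly spaced over the original span
    pos = np.linspace(0, T - 1, n)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (pos - lo)[:, None, None]        # interpolation weights per frame
    return (1 - w) * frames[lo] + w * frames[hi]
```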
In the 20-frame face sequence, the Horn-Schunck method is used to compute a dense optical flow field between each pair of adjacent frames.
To eliminate the effect of translation on subsequent features, fine alignment based on the optical flow field is required. Specifically, for the horizontal and vertical components $U^t$ and $V^t$ of each optical flow field, compute the histograms $H_U^t$ and $H_V^t$, where $H_U^t(z)$ returns the number of motion vectors whose horizontal component is $z$, and $H_V^t(z)$ returns the number of motion vectors whose vertical component is $z$. Compute $\Delta U^t = -\arg\max_z H_U^t(z)$ and $\Delta V^t = -\arg\max_z H_V^t(z)$.

Then let $\tilde U^t = U^t + \Delta U^t \cdot \mathbf{1}$ and $\tilde V^t = V^t + \Delta V^t \cdot \mathbf{1}$. This completes the fine alignment process.
The micro-expression spatio-temporal sequence is divided into smaller space-time blocks. In each space-time block, the motion pattern can be represented by a single principal motion direction. To find it, the principal direction P is initialized as the unit vector P = (1, 0). At each plane coordinate, the time coordinate is found such that the inner product of the motion vector at that position with P is maximal. The motion vectors found at all plane coordinates are averaged and normalized as the new estimate of P. This is iterated until P converges. The algorithm does not guarantee convergence, so a maximum of 20 iterations is set; once the number of iterations exceeds this maximum, the iteration ends.
The principal directions of all space-time blocks are thus obtained and discretized into 10 directions, denoted 1, ..., 10. Concatenating the principal directions of all space-time blocks yields the final feature of the micro-expression sequence.
The features of all micro-expressions in a database are computed, and an SVM (support vector machine) with a radial basis function kernel (RBF kernel) is trained, yielding a trained SVM classifier. Any new micro-expression sequence is first interpolated to 20 frames by linear interpolation and its principal-direction feature is extracted; the trained SVM classifier then determines its expression class.
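A sketch of this final stage using scikit-learn's RBF-kernel SVM, operating on the feature vectors produced by the pipeline above; scikit-learn is an assumed tool, not named by the patent:

```python
import numpy as np
from sklearn.svm import SVC

def train_classifier(features, labels):
    """Train an RBF-kernel SVM on precomputed micro-expression features
    (one principal-direction feature vector per labeled sequence)."""
    clf = SVC(kernel="rbf")   # RBF kernel, as in the patent's example
    clf.fit(np.stack(features), labels)
    return clf

def classify(clf, feature):
    """Predict the expression class of one unlabeled sequence's feature."""
    return clf.predict(np.asarray(feature)[None, :])[0]
```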
TABLE 1 Number of samples in each class of CASME I and CASME II
TABLE 2 Number of samples in each class of SMIC and SMIC2
TABLE 3 Classification results
TABLE 4 Accuracy of classification results
TABLE 5 Classification results on CASME I/II excluding the classes with few samples
TABLE 6 Accuracy of classification results on CASME I/II excluding the classes with few samples
Data set | The invention | LBP-TOP | DTSA
---|---|---|---
CASME I | 56.14% | 40.35% | 46.20%
CASME II | 45.93% | 40.65% | 36.18%
TABLE 7 Classification performance improvement from refined alignment
Claims (1)
1. A micro-expression sequence feature extraction method based on an optical flow field is characterized by comprising the following specific steps:
(1) giving a segment of facial expression sequence, aligning the video to a specified number of frames by interpolation to obtain a sequence of fixed length;
(2) in the micro-expression sequence of determined length, estimating a dense optical flow field $F^t = (U^t, V^t)$ between adjacent frames $I^t$ and $I^{t+1}$ using the Horn-Schunck method; the optical flow field satisfies

$$I^t(i, j) \approx I^{t+1}\big(i + U^t(i, j),\; j + V^t(i, j)\big),$$

where $I^t(i, j)$ represents the pixel value of $I^t$ at row $i$, column $j$; $U^t(i, j)$ and $V^t(i, j)$ are respectively the elements of $U^t$ and $V^t$ at row $i$, column $j$; and $(U^t(i, j), V^t(i, j))$ is referred to as the motion vector at that location;
(3) eliminating global face displacement using the refined alignment algorithm: for the horizontal component $U^t$ of each optical flow field, computing the histogram $H_U^t(z)$, equal to the number of horizontal components of the optical flow field whose value is $z$; letting

$$\Delta U^t = -\arg\max_z H_U^t(z),$$

i.e. $\Delta U^t$ is the negative of the most frequently occurring horizontal component value; adding $\Delta U^t$ to all values of $U^t$ to obtain the finely aligned horizontal optical flow field:

$$\tilde U^t = U^t + \Delta U^t \cdot \mathbf{1},$$

where $\mathbf{1}$ is a matrix with the same dimensions as $U^t$ whose elements are all 1; the refined alignment of the vertical component $V^t$ is similar:

$$\tilde V^t = V^t + \Delta V^t \cdot \mathbf{1}, \qquad \Delta V^t = -\arg\max_z H_V^t(z);$$
(4) dividing the aligned optical flow field into space-time blocks of fixed size and seeking a principal direction describing each space-time block, with the following algorithm flow:
(a) initializing the principal direction estimate $P$ as a two-dimensional unit vector: $P = (1, 0)$;
(b) for each plane coordinate $(i, j)$ in the block, finding the time coordinate $t^*(i, j)$ such that the motion vector $M(i, j, t)$ at that position has maximum inner product with $P$:

$$t^*(i, j) = \arg\max_t \langle M(i, j, t), P \rangle;$$

(c) averaging the motion vectors found in (b), normalizing the average, and taking it as the updated value of $P$:

$$P \leftarrow \frac{\bar M}{\lVert \bar M \rVert},$$

where $\bar M$ is the average of $M(i, j, t^*(i, j))$ over all plane coordinates in the block;
(d) repeating steps (b)-(c) until $P$ converges or a maximum number of iterations is exceeded;
(5) through the above steps, obtaining a principal direction in each space-time block; quantizing each direction into several intervals, representing the principal direction by its interval number, and concatenating the principal directions of all blocks to obtain the descriptive feature of the whole sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510360969.4A CN104933416B (en) | 2015-06-26 | 2015-06-26 | Micro- expression sequence characteristic extracting method based on optical flow field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510360969.4A CN104933416B (en) | 2015-06-26 | 2015-06-26 | Micro- expression sequence characteristic extracting method based on optical flow field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104933416A true CN104933416A (en) | 2015-09-23 |
CN104933416B CN104933416B (en) | 2018-11-02 |
Family
ID=54120576
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510360969.4A Expired - Fee Related CN104933416B (en) | 2015-06-26 | 2015-06-26 | Micro- expression sequence characteristic extracting method based on optical flow field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104933416B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608440A (en) * | 2016-01-03 | 2016-05-25 | 复旦大学 | Minimum -error-based feature extraction method for face microexpression sequence |
CN105913038A (en) * | 2016-04-26 | 2016-08-31 | 哈尔滨工业大学深圳研究生院 | Video based dynamic microexpression identification method |
CN106096537A (en) * | 2016-06-06 | 2016-11-09 | 山东大学 | A kind of micro-expression automatic identifying method based on multi-scale sampling |
CN106897671A (en) * | 2017-01-19 | 2017-06-27 | 山东中磁视讯股份有限公司 | A kind of micro- expression recognition method encoded based on light stream and FisherVector |
CN107358206A (en) * | 2017-07-13 | 2017-11-17 | 山东大学 | Micro- expression detection method that a kind of Optical-flow Feature vector modulus value and angle based on area-of-interest combine |
CN107403142A (en) * | 2017-07-05 | 2017-11-28 | 山东中磁视讯股份有限公司 | A kind of detection method of micro- expression |
CN108040217A (en) * | 2017-12-20 | 2018-05-15 | 深圳岚锋创视网络科技有限公司 | A kind of decoded method, apparatus of video and camera |
CN109034126A (en) * | 2018-08-31 | 2018-12-18 | 上海理工大学 | A kind of micro- expression recognition method based on light stream principal direction |
CN109398310A (en) * | 2018-09-26 | 2019-03-01 | 深圳万智联合科技有限公司 | A kind of pilotless automobile |
CN109922310A (en) * | 2019-01-24 | 2019-06-21 | 北京明略软件系统有限公司 | The monitoring method of target object, apparatus and system |
CN110717418A (en) * | 2019-09-25 | 2020-01-21 | 北京科技大学 | Method and system for automatically identifying favorite emotion |
CN111461021A (en) * | 2020-04-01 | 2020-07-28 | 中国科学院心理研究所 | Micro-expression detection method based on optical flow |
CN111626179A (en) * | 2020-05-24 | 2020-09-04 | 中国科学院心理研究所 | Micro-expression detection method based on optical flow superposition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070091085A1 (en) * | 2005-10-13 | 2007-04-26 | Microsoft Corporation | Automatic 3D Face-Modeling From Video |
CN103440509A (en) * | 2013-08-28 | 2013-12-11 | 山东大学 | Effective micro-expression automatic identification method |
CN103971137A (en) * | 2014-05-07 | 2014-08-06 | 上海电力学院 | Three-dimensional dynamic facial expression recognition method based on structural sparse feature study |
- 2015-06-26 CN CN201510360969.4A patent/CN104933416B/en not_active Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070091085A1 (en) * | 2005-10-13 | 2007-04-26 | Microsoft Corporation | Automatic 3D Face-Modeling From Video |
CN103440509A (en) * | 2013-08-28 | 2013-12-11 | 山东大学 | Effective micro-expression automatic identification method |
CN103971137A (en) * | 2014-05-07 | 2014-08-06 | 上海电力学院 | Three-dimensional dynamic facial expression recognition method based on structural sparse feature study |
Non-Patent Citations (2)
Title |
---|
FENG XU et al.: "Microexpression Identification and Categorization using a Facial Dynamics Map", IEEE Transactions on Affective Computing *
BEN Xianye et al.: "A survey of automatic micro-expression recognition", Journal of Computer-Aided Design & Computer Graphics *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608440B (en) * | 2016-01-03 | 2019-05-31 | 复旦大学 | Based on the micro- expression sequence signature abstracting method of face for minimizing error |
CN105608440A (en) * | 2016-01-03 | 2016-05-25 | 复旦大学 | Minimum -error-based feature extraction method for face microexpression sequence |
CN105913038A (en) * | 2016-04-26 | 2016-08-31 | 哈尔滨工业大学深圳研究生院 | Video based dynamic microexpression identification method |
CN105913038B (en) * | 2016-04-26 | 2019-08-06 | 哈尔滨工业大学深圳研究生院 | A kind of micro- expression recognition method of dynamic based on video |
CN106096537A (en) * | 2016-06-06 | 2016-11-09 | 山东大学 | A kind of micro-expression automatic identifying method based on multi-scale sampling |
CN106897671A (en) * | 2017-01-19 | 2017-06-27 | 山东中磁视讯股份有限公司 | A kind of micro- expression recognition method encoded based on light stream and FisherVector |
CN107403142A (en) * | 2017-07-05 | 2017-11-28 | 山东中磁视讯股份有限公司 | A kind of detection method of micro- expression |
CN107403142B (en) * | 2017-07-05 | 2018-08-21 | 山东中磁视讯股份有限公司 | A kind of detection method of micro- expression |
CN107358206A (en) * | 2017-07-13 | 2017-11-17 | 山东大学 | Micro- expression detection method that a kind of Optical-flow Feature vector modulus value and angle based on area-of-interest combine |
CN107358206B (en) * | 2017-07-13 | 2020-02-18 | 山东大学 | Micro-expression detection method based on region-of-interest optical flow features |
CN108040217A (en) * | 2017-12-20 | 2018-05-15 | 深圳岚锋创视网络科技有限公司 | A kind of decoded method, apparatus of video and camera |
CN108040217B (en) * | 2017-12-20 | 2020-01-24 | 深圳岚锋创视网络科技有限公司 | Video decoding method and device and camera |
CN109034126B (en) * | 2018-08-31 | 2021-09-28 | 上海理工大学 | Micro-expression recognition method based on optical flow main direction |
CN109034126A (en) * | 2018-08-31 | 2018-12-18 | 上海理工大学 | A kind of micro- expression recognition method based on light stream principal direction |
CN109398310A (en) * | 2018-09-26 | 2019-03-01 | 深圳万智联合科技有限公司 | A kind of pilotless automobile |
CN109922310A (en) * | 2019-01-24 | 2019-06-21 | 北京明略软件系统有限公司 | The monitoring method of target object, apparatus and system |
CN110717418A (en) * | 2019-09-25 | 2020-01-21 | 北京科技大学 | Method and system for automatically identifying favorite emotion |
CN111461021A (en) * | 2020-04-01 | 2020-07-28 | 中国科学院心理研究所 | Micro-expression detection method based on optical flow |
CN111626179A (en) * | 2020-05-24 | 2020-09-04 | 中国科学院心理研究所 | Micro-expression detection method based on optical flow superposition |
CN111626179B (en) * | 2020-05-24 | 2023-04-28 | 中国科学院心理研究所 | Micro-expression detection method based on optical flow superposition |
Also Published As
Publication number | Publication date |
---|---|
CN104933416B (en) | 2018-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104933416B (en) | Micro- expression sequence characteristic extracting method based on optical flow field | |
Fan et al. | Fusing dynamic deep learned features and handcrafted features for facial expression recognition | |
Lin et al. | A heat-map-based algorithm for recognizing group activities in videos | |
CN109685045B (en) | Moving target video tracking method and system | |
Zhang et al. | Person re-identification with triplet focal loss | |
CN110097115B (en) | Video salient object detection method based on attention transfer mechanism | |
AU2014368997A1 (en) | System and method for identifying faces in unconstrained media | |
Li et al. | Facial micro-expression recognition based on the fusion of deep learning and enhanced optical flow | |
CN110827312B (en) | Learning method based on cooperative visual attention neural network | |
Chen et al. | Person re-identification by exploiting spatio-temporal cues and multi-view metric learning | |
Yao et al. | Improving long range and high magnification face recognition: Database acquisition, evaluation, and enhancement | |
Wu et al. | An end-to-end heterogeneous restraint network for RGB-D cross-modal person re-identification | |
CN105608440B (en) | Based on the micro- expression sequence signature abstracting method of face for minimizing error | |
Ming et al. | A unified 3D face authentication framework based on robust local mesh SIFT feature | |
Lu et al. | Micro-expression recognition by regression model and group sparse spatio-temporal feature learning | |
Abdulghafoor et al. | Real-time object detection with simultaneous denoising using low-rank and total variation models | |
Zhang et al. | Lightweight mobile network for real-time violence recognition | |
Luo et al. | CI-Net: Appearance-Based Gaze Estimation via Cooperative Network | |
CN110222599B (en) | Gait recognition method based on Gaussian mapping | |
Almalki et al. | Characterizing scattered occlusions for effective dense-mode crowd counting | |
CN107273873A (en) | Pedestrian based on irregular video sequence recognition methods and system again | |
Jiang et al. | CE-PeopleSeg: Real-time people segmentation with 10% CPU usage for video conference | |
Cai | Intelligent automatic processing of sports image based on deep learning algorithm | |
Jabr | Novel deep learning system for person re-identification using sequence frames of motion. | |
Duan | An Automatic Extraction Method for Binocular Stereo Colour Vision Image. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20181102 |