CN109101881A - A kind of real-time blink detection method based on multiple dimensioned timing image - Google Patents

A real-time blink detection method based on multi-scale time-series images

Info

Publication number
CN109101881A
CN109101881A (application CN201810743856.6A); granted as CN109101881B
Authority
CN
China
Prior art keywords
human eye
time-series images
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810743856.6A
Other languages
Chinese (zh)
Other versions
CN109101881B (en)
Inventor
肖阳
胡桂雷
曹治国
孟璐斌
熊拂
张博深
姜文祥
王焱乘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Huazhong University of Science and Technology
Priority to CN201810743856.6A
Publication of CN109101881A
Application granted
Publication of CN109101881B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G06V40/193: Preprocessing; Feature extraction


Abstract

The invention discloses a real-time blink detection method based on multi-scale time-series images, comprising: locating both eyes in the first frame of a time-series image sequence; using the eye positions to extract an eye image from the first frame, obtaining an eye template; initializing an eye tracker with the template, tracking the sequence with the continually updated tracker to obtain time-series eye images while updating the tracker; extracting hand-crafted descriptors from the preprocessed time-series eye images, computing the descriptors' differ features, and concatenating the two to obtain eye features; encoding the eye features in temporal order into a blink-behavior feature heat map; feeding the heat map row by row into an LSTM network to obtain multiple hidden states; and concatenating the hidden states into a multi-scale temporal feature for blink detection, deciding whether the sequence contains a blink. The invention improves the accuracy and stability of blink detection under unconstrained conditions.

Description

A real-time blink detection method based on multi-scale time-series images

Technical Field

The invention belongs to the technical field of digital image recognition, and more particularly relates to a real-time blink detection method based on multi-scale time-series images.

Background Art

With the spread of intelligent application devices, inferring a subject's current state by observing blinking behavior has become a valuable mode of human-computer interaction, so there is strong demand for blink detection in scenarios such as liveness verification, driver-fatigue detection, and lie detection.

Current mainstream blink detection algorithms fall into two categories: one extracts traditional single-frame features (LBP, HOG, etc.) and trains a classifier (SVM, AdaBoost, etc.) to judge whether the eye in the current frame is open or closed; the other performs blink detection from heuristic rules, for example detecting the pupil with the Hough transform.

Both approaches have shortcomings. For the first, blinking is a temporal behavior, so judging the eye state from single-frame features cannot explain that behavior; moreover, single-image detection has a low success rate and poor robustness under natural conditions. The second, heuristic-rule-based detection, lacks model capacity under unconstrained conditions and is prone to misjudgment; rules such as tracking changes in the center of a connected component can break down entirely under natural conditions, leaving blinks undetectable.

Summary of the Invention

In view of the above defects of, or needs for improvement in, the prior art, the present invention provides a real-time blink detection method based on multi-scale time-series images. Its purpose is to improve the accuracy and stability of blink detection under natural conditions by introducing the temporal information of multi-frame image sequences, extracting hand-crafted descriptors from the images, encoding them into a blink-behavior feature map, and then considering multiple temporal scales. This solves the prior art's technical problems of low accuracy and poor stability.

To achieve the above object, the present invention provides a real-time blink detection method based on multi-scale time-series images, comprising:

(1) building a database of time-series images containing human eyes, and locating both eyes in the first frame of each sequence in the database;

(2) using the eye positions to extract an eye image from the first frame, obtaining an eye template;

(3) initializing an eye tracker with the eye template, tracking the frames that follow the first frame with the initialized tracker, and updating the tracker;

(4) tracking the sequence with the updated eye tracker to obtain time-series eye images, then applying grayscale conversion and histogram equalization in turn to obtain the preprocessed time-series eye images;

(5) extracting hand-crafted descriptors from the preprocessed time-series eye images, computing the descriptors' differ features, and concatenating the differ features with the descriptors to obtain eye features; then encoding the eye features in temporal order into a single blink-behavior feature heat map;

(6) feeding the blink-behavior feature heat map row by row into an LSTM network to obtain multiple hidden states;

(7) concatenating the hidden states into a multi-scale temporal feature and performing blink detection, deciding whether the time-series images contain a blink.

Further, step (1) comprises:

building a database of time-series images containing human eyes from a video database containing human eyes, and locating both eyes in the first frame of each sequence in the database with a face-alignment algorithm.

Further, step (2) comprises:

(2-1) computing the distance between the two eyes from their positions, and determining the width and height of the eye region from that distance to obtain the eye region;

(2-2) extracting the eye image from the first frame according to the eye region, converting it to grayscale, deciding from its size whether to shrink the eye image by a factor of K (K ≥ 0), and finally taking the resulting eye image as the eye template.

Further, step (3) comprises:

(3-1) initializing the eye tracker with the eye template;

(3-2) tracking the frames that follow the first frame with the initialized eye tracker, obtaining the eye position Pos in each subsequent frame and returning a confidence C for each tracking result;

(3-3) setting a tracking-score threshold T: if C > T, a new eye image is extracted at position Pos, giving a new eye template; if C ≤ T, step (1) is executed again;

(3-4) performing a weighted fusion of the tracker initialized with the new eye template and the previously initialized eye tracker, thereby updating the eye tracker.

Further, the differ feature is the difference feature obtained by subtracting, element-wise, the previous frame's hand-crafted descriptor from the descriptor extracted for each frame of the preprocessed time-series eye images.

Further, step (6) comprises:

(6-1) building an LSTM network of n cells, each containing h hidden layers; no dropout (the technique for reducing overfitting) is applied to the network's input, while dropout with rate d (d < 1) is applied to the cell state passed between cells;

(6-2) training the LSTM network on labeled sample heat maps to obtain multiple sample hidden states, concatenating the sample hidden states into sample multi-scale temporal features, and training a SphereFace model on them, obtaining the SphereFace model's parameters and the trained LSTM network;

(6-3) feeding the blink-behavior feature heat map row by row into the trained LSTM network to obtain multiple hidden states.

Further, step (7) comprises:

concatenating the hidden states into a multi-scale temporal feature and feeding it into the SphereFace model for blink detection, deciding whether the time-series images contain a blink.

In general, compared with the prior art, the above technical solution conceived by the present invention can achieve the following beneficial effects:

(1) First, the invention runs in real time. Second, concatenating the differ features with the hand-crafted descriptors yields eye features with stronger semantic information. Third, encoding blinking behavior into a blink-behavior feature heat map facilitates both intuitive inspection and processing by image-domain algorithms. Finally, the LSTM network yields multiple hidden states, which accommodates the fact that different people blink at different speeds. As a result, the invention outperforms traditional blink detection algorithms in both speed and accuracy on a time-series blink database collected under unconstrained conditions.

(2) Blinking is a temporal behavior, so introducing temporal information is essential for identifying its defining characteristics. Since the prior art offers no blink database built on temporal information, the invention builds a database of time-series eye images from a video database containing human eyes. The face-alignment algorithm locates the eyes fairly accurately, and because it runs only during initialization, its cost does not affect the real-time performance of the program. Starting from the fairly accurate eye image produced by the alignment algorithm, the initialized eye tracker can follow the eye region accurately and quickly, so the program achieves satisfactory results in both real-time performance and localization accuracy.

(3) The invention uses histogram equalization to reduce, to some extent, the influence of varying illumination, enhancing robustness. Because the program is affected by the environment, localization accuracy, and other factors, noise is inevitably introduced; the differencing mechanism suppresses low-frequency noise to some extent. The LSTM has a strong advantage in extracting temporal information from sequential samples.

Brief Description of the Drawings

Fig. 1 is a flowchart of a real-time blink detection method based on multi-scale time-series images provided by an embodiment of the present invention;

Fig. 2(a) is a schematic diagram of a time-series image sequence with blinking provided by Embodiment 1 of the present invention;

Fig. 2(b) is a schematic diagram of a time-series image sequence without blinking provided by Embodiment 1 of the present invention;

Fig. 3 is a schematic flowchart of the blink detection method provided by Embodiment 1 of the present invention;

Fig. 4 is a schematic flowchart of eye-feature extraction provided by Embodiment 1 of the present invention.

Detailed Description of the Embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

As shown in Fig. 1, a real-time blink detection method based on multi-scale time-series images comprises:

(1) building a database of time-series images containing human eyes from a video database containing human eyes, and locating both eyes in the first frame of each sequence with a face-alignment algorithm;

(2) computing the distance between the two eyes from their positions and determining the width and height of the eye region from that distance to obtain the eye region; extracting the eye image from the first frame according to the eye region, converting it to grayscale, deciding from its size whether to shrink the eye image by a factor of K (K ≥ 0), and finally taking the resulting eye image as the eye template;

(3) initializing the eye tracker with the eye template; tracking the frames that follow the first frame with the initialized tracker, obtaining the eye position Pos in each subsequent frame and returning a confidence C for each tracking result; setting a tracking-score threshold T, such that if C > T, a new eye image is extracted at position Pos to obtain a new eye template, and if C ≤ T, step (1) is executed again; performing a weighted fusion of the tracker initialized with the new template and the previously initialized tracker to update the eye tracker;

(4) tracking the sequence with the updated eye tracker to obtain time-series eye images, then applying grayscale conversion and histogram equalization in turn to obtain the preprocessed time-series eye images;

(5) extracting hand-crafted descriptors from the preprocessed time-series eye images, computing the descriptors' differ features, and concatenating the differ features with the descriptors to obtain eye features; then encoding the eye features in temporal order into a single blink-behavior feature heat map; the differ feature is the difference obtained by subtracting, element-wise, the previous frame's descriptor from each frame's descriptor;

(6) building an LSTM network of n cells, each containing h hidden layers, with no dropout on the input and dropout of rate d (d < 1) on the cell state passed between cells; training the LSTM on labeled sample heat maps to obtain sample hidden states, concatenating them into sample multi-scale temporal features, and training a SphereFace model on them, obtaining the SphereFace parameters and the trained LSTM; feeding the blink-behavior feature heat map row by row into the trained LSTM to obtain multiple hidden states;

(7) concatenating the hidden states into a multi-scale temporal feature and feeding it into the SphereFace model for blink detection, deciding whether the time-series images contain a blink.

Embodiment 1

Step 1: build a database of time-series images containing human eyes from a video database containing human eyes, and locate both eyes in the first frame of each sequence with a face-alignment algorithm. This comprises the following sub-steps:

(1-1) building a video database containing human eyes under natural conditions; the database contains more than 200 eye-containing sequences of varying frame lengths, from which the sequences with blinking shown in Fig. 2(a) and the sequences without blinking shown in Fig. 2(b) are built;

in this embodiment, more than 370 sequences under natural conditions and about 400 indoor sequences containing human eyes are built from the video database;

(1-2) as shown in Fig. 3, the database images are the time-series images containing human eyes; the face-alignment algorithm locates both eyes in the first frame of each sequence.

Step 2: build the initial template for the tracking algorithm, comprising the following sub-steps:

(2-1) computing the distance between the two eyes from their positions and determining the width and height of the eye region from that distance, where x_r and y_r denote the x and y coordinates of the right eye, x_l and y_l denote the x and y coordinates of the left eye, and width and height denote the width and height of the eye region; centered on the eye positions, the eye region is determined from this width and height;

(2-2) extracting the eye image according to the eye region and converting it to grayscale:

K = 0.2989 × R + 0.5870 × G + 0.1140 × B

where K is the grayscale image and R, G, B are the three channels of the eye image;

(2-3) deciding from the size of the eye image whether to resize it to 0.7 times its original size, and finally taking the resulting eye image as the eye template.
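As a sketch of sub-steps (2-1) to (2-3): the grayscale weights and the 0.7× shrink are the patent's, while the region-size ratios (width 2d and height 1d for interocular distance d) and the size threshold `min_side` are assumptions, since the patent gives the exact sizing formula only as an image.

```python
import numpy as np

def eye_template(frame, right_eye, left_eye, shrink=0.7, min_side=40):
    """Build a grayscale eye template from the first frame.

    `frame` is an H x W x 3 RGB array; `right_eye`/`left_eye` are (x, y).
    The region ratios (width = 2d, height = 1d) and `min_side` are assumed;
    the grayscale weights and the 0.7x shrink come from the patent.
    """
    (xr, yr), (xl, yl) = right_eye, left_eye
    d = float(np.hypot(xr - xl, yr - yl))           # interocular distance
    w, h = max(int(2.0 * d), 1), max(int(1.0 * d), 1)
    cx, cy = (xr + xl) / 2.0, (yr + yl) / 2.0       # midpoint between the eyes
    x0, y0 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    crop = frame[y0:y0 + h, x0:x0 + w].astype(np.float64)
    # K = 0.2989 R + 0.5870 G + 0.1140 B (the patent's grayscale conversion)
    gray = 0.2989 * crop[..., 0] + 0.5870 * crop[..., 1] + 0.1140 * crop[..., 2]
    if min(gray.shape) > min_side:                  # shrink large crops to 0.7x
        ys = (np.arange(int(gray.shape[0] * shrink)) / shrink).astype(int)
        xs = (np.arange(int(gray.shape[1] * shrink)) / shrink).astype(int)
        gray = gray[np.ix_(ys, xs)]                 # nearest-neighbor resize
    return gray.astype(np.uint8)
```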

Step 3: using the eye template together with the KCF (Kernelized Correlation Filter) algorithm, track in real time and update the tracker, comprising the following sub-steps:

(3-1) feeding the eye template into the eye tracker and initializing the tracker with it:

tracker = init(image)

where tracker is the initialized eye tracker and image is the eye template used for initialization;

(3-2) tracking the subsequent frames with the initialized eye tracker to obtain the eye position Pos, and returning a confidence C for each tracking result:

[Pos, C] = tracker(image0)

where image0 is the subsequent frame, Pos is the eye position in that frame, and C is its confidence;

(3-3) setting the tracking-score threshold to 0.35: if C > 0.35, a new eye image is extracted at position Pos, giving a new eye template; if C ≤ 0.35, step (1) is executed again;

(3-4) performing a weighted fusion of the tracker initialized with the new eye template and the previously initialized tracker to update the eye tracker, where image is the current frame, tracker_new is the tracker initialized with the new eye template, image_next is the next frame to be tested, and Pos is the localization result for that next frame.
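The track / re-template / fuse cycle of sub-steps (3-2) to (3-4) can be illustrated with a toy tracker. Plain normalized cross-correlation stands in for the KCF filter, and the fusion weight `alpha = 0.5` is an assumption; the confidence threshold 0.35 is the patent's.

```python
import numpy as np

class SimpleEyeTracker:
    """Toy stand-in for the patent's KCF tracker: exhaustive normalized
    cross-correlation template matching. It demonstrates the update rule of
    step 3 (re-template when confidence C > T, weighted-fuse old and new
    templates), not the KCF algorithm itself."""

    def __init__(self, template, T=0.35, alpha=0.5):
        self.template = template.astype(np.float64)
        self.T = T                 # the patent's tracking-score threshold
        self.alpha = alpha         # assumed fusion weight

    @staticmethod
    def _ncc(a, b):
        """Normalized cross-correlation of two equally sized patches."""
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return 0.0 if denom == 0 else float((a * b).sum() / denom)

    def track(self, frame):
        """Return the best-matching (row, col) position and its confidence C."""
        th, tw = self.template.shape
        best, pos = -1.0, (0, 0)
        for y in range(frame.shape[0] - th + 1):
            for x in range(frame.shape[1] - tw + 1):
                c = self._ncc(frame[y:y + th, x:x + tw], self.template)
                if c > best:
                    best, pos = c, (y, x)
        return pos, best

    def update(self, frame):
        """One tracking step with the template-fusion update of (3-4)."""
        pos, C = self.track(frame)
        if C <= self.T:
            return None, C         # caller should re-run face alignment (step 1)
        y, x = pos
        th, tw = self.template.shape
        new_tpl = frame[y:y + th, x:x + tw].astype(np.float64)
        # weighted fusion of the old and new templates
        self.template = self.alpha * self.template + (1 - self.alpha) * new_tpl
        return pos, C
```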

Step 4: extract the eye images and preprocess them, comprising the following sub-steps:

(4-1) tracking the sequence with the updated eye tracker, obtaining the eye image in each frame, and arranging the images in temporal order into the time-series eye images E_1, E_2, ..., E_n:

E = image(Pos)

where image is the current frame and E is the extracted eye image;

(4-2) for each frame E_i (i = 1, 2, ..., n) of the time-series eye images, first converting to grayscale and then applying a histogram-equalization algorithm, obtaining the preprocessed time-series eye images:

E_pro = histgram(rgb2gray(E))

where E is the current eye image and E_pro is the preprocessed eye image.
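The preprocessing of sub-step (4-2) can be sketched in plain numpy; a production implementation would typically call OpenCV's `cv2.equalizeHist`, which applies the same cumulative-histogram mapping.

```python
import numpy as np

def equalize_hist(gray):
    """Histogram equalization of an 8-bit grayscale image (the image
    equalization applied to each tracked eye image in step 4)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # first nonzero CDF value
    denom = max(gray.size - cdf_min, 1)    # guard against division by zero
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]                       # remap every pixel through the LUT
```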

Step 5: extract hand-crafted descriptors from the preprocessed time-series eye images and encode them into a blink-behavior feature heat map:

(5-1) adopting the LBP (Local Binary Pattern) feature for the time-series eye images: the LBP feature of each frame is extracted to obtain that frame's LBP feature map:

LBP(x_c, y_c) = Σ_{p=0}^{7} s(i_p − i_c) · 2^p

where (x_c, y_c) is the center coordinate, p indexes the p-th pixel of the 8-neighborhood, i_p is the gray value of the neighborhood pixel, i_c is the gray value of the center pixel, and s(x) is the sign function: s(x) = 1 if x ≥ 0, and s(x) = 0 otherwise;

(5-2) binarizing each pixel of the resulting LBP feature map to obtain a long binary string:

S = {S_xy[b_1, b_2, ..., b_8]}, b_i = 0 or 1 (i = 1, 2, ..., 8),

where S_xy denotes the 8-bit binary value at coordinate (x, y) of the LBP feature map;

(5-3) for each S_xy in S, first counting the strings with no transition, i.e., the all-0 and all-1 strings, which form 2 classes, and then counting the strings with exactly one transition (0→1 or 1→0), which are divided into 14 classes according to the position p in the string at which the transition occurs;

(5-4) then counting the strings S_xy with exactly two transitions, which are divided into 42 classes according to the distinct positions of the first and second transitions;

(5-5) assigning all remaining patterns to one final class; labeling all strings of the same class with the same value replaces the LBP feature with a 59-dimensional uniform-LBP feature;
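The class counting above reproduces the standard 59-value uniform-LBP mapping. A sketch of the lookup table follows, using the conventional circular-transition definition (at most two 0/1 transitions around the 8-bit ring), which likewise yields 58 distinct uniform patterns plus one shared non-uniform value:

```python
def circular_transitions(code):
    """Number of 0/1 transitions when the 8-bit LBP code is read circularly."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# Lookup table mapping each of the 256 raw LBP codes to one of 59 uniform-LBP
# values: uniform patterns (<= 2 circular transitions) get their own label,
# and all non-uniform patterns share the final label 58.
UNIFORM_LUT = {}
next_label = 0
for code in range(256):
    if circular_transitions(code) <= 2:
        UNIFORM_LUT[code] = next_label
        next_label += 1
    else:
        UNIFORM_LUT[code] = 58
```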

(5-6) subtracting, element-wise, the previous frame's uniform-LBP feature (i.e., the hand-crafted descriptor) from the n-th frame's uniform-LBP feature to obtain the difference feature (differ feature), then concatenating the differ feature after the n-th frame's uniform-LBP feature to obtain the n-th frame's LBP-differ feature (the eye feature):

LBP_differ_i = [uniform_LBP_i, (uniform_LBP_i ⊙ uniform_LBP_{i-1})]

where ⊙ denotes element-wise subtraction of corresponding bins.

Specifically, as shown in Fig. 4, the uniform-LBP features of Frame1 and Frame2 are extracted; Frame1's uniform-LBP feature is subtracted element-wise from Frame2's to obtain Frame2's differ feature, which is then concatenated after Frame2's uniform-LBP feature to obtain Frame2's LBP-differ feature.

(5-7) concatenating the n frames' eye features in temporal order into a 9 × 118 blink-behavior feature heat map:

map = [LBP_differ_1; LBP_differ_2; ...; LBP_differ_n]
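Given per-frame 59-dimensional uniform-LBP descriptors, the differ concatenation of (5-6) and the heat-map assembly of (5-7) reduce to a difference plus a concatenation. Zero-filling the first row's differ part is an assumption here; the patent defines differ features only from the second frame onward.

```python
import numpy as np

def blink_heatmap(descriptors):
    """Encode n per-frame uniform-LBP descriptors (n x 59) into the
    blink-behavior feature heat map (n x 118): each row concatenates the
    frame's descriptor with its element-wise difference from the previous
    frame's descriptor. Row 0's differ part is zero-filled (assumption)."""
    D = np.asarray(descriptors, dtype=np.float64)
    differ = np.vstack([np.zeros((1, D.shape[1])), np.diff(D, axis=0)])
    return np.hstack([D, differ])
```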

Step 6: the LSTM processes the blink-behavior feature heat map and produces the result, comprising the following sub-steps:

(6-1) building an LSTM network of 9 cells, each containing 2 hidden layers of 128 units; no dropout is applied to the network input, while dropout of rate 0.5 is applied to the cell state passed between cells;

(6-2) feeding the rows of the 9 × 118 blink-behavior feature heat map, in the temporal order they represent, one by one into the trained LSTM network, obtaining multiple hidden states.
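The row-by-row consumption of the heat map in sub-step (6-2) can be illustrated with a minimal numpy forward pass over a single LSTM layer with random weights; the patent's network additionally stacks two 128-unit layers, is trained on labeled heat maps, and applies dropout of 0.5 to the cell state.

```python
import numpy as np

def lstm_forward(X, W, U, b):
    """Run one LSTM layer over the rows of heat map X (n_steps x n_features),
    returning all hidden states (n_steps x H). Gate order in W, U, b is
    [input, forget, cell, output]."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    H = b.size // 4
    h = np.zeros(H)
    c = np.zeros(H)
    states = []
    for x in X:                              # one heat-map row per time step
        z = W @ x + U @ h + b
        i = sigmoid(z[:H])                   # input gate
        f = sigmoid(z[H:2 * H])              # forget gate
        g = np.tanh(z[2 * H:3 * H])          # candidate cell state
        o = sigmoid(z[3 * H:])               # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h.copy())
    return np.stack(states)
```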

步骤7,将多个隐藏状态串接得到多尺度时序特征后输入sphereface模型进行眨眼检测,判断时序图像是否包含眨眼行为,包括以下几个步骤:Step 7, concatenating multiple hidden states to obtain multi-scale time-series features and then input the sphereface model for eye blink detection to determine whether the time-series images contain eye blinking behavior, including the following steps:

(7-1) Take the last two 128-dimensional output hidden states O_i (i = 1, 2, ..., n) of the LSTM network and concatenate them in time order:

O_comb = [O_{n-1}, O_n]

where O_comb is the 256-dimensional multi-time-scale output of the LSTM network and O_i is the output of the i-th cell;
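Feeding the heat-map rows through the LSTM and concatenating the last two hidden states (7-1) can be illustrated with a bare NumPy LSTM cell. The weight shapes and random initialization are hypothetical stand-ins; the patent's network is a trained 2-layer, 128-unit model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM time step; W has shape (4H, D+H), b has shape (4H,)."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])          # input / forget gates
    o, g = sigmoid(z[2 * H:3 * H]), np.tanh(z[3 * H:])  # output gate / candidate
    c = f * c + i * g
    return o * np.tanh(c), c

rng = np.random.default_rng(0)
D, H, n = 118, 128, 9                    # row width, hidden size, number of cells
W = rng.standard_normal((4 * H, D + H)) * 0.01
b = np.zeros(4 * H)
heatmap = rng.standard_normal((n, D))    # stands in for the 9x118 feature map

h, c, states = np.zeros(H), np.zeros(H), []
for row in heatmap:                      # one heat-map row per LSTM cell
    h, c = lstm_step(row, h, c, W, b)
    states.append(h)

O_comb = np.concatenate(states[-2:])     # [O_{n-1}, O_n]: 256-dim multi-scale feature
```

Concatenating the outputs of the last two cells rather than using only the final state is what gives the classifier a view of two time scales at once.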

(7-2) Feed O_comb into the sphereface model with trained parameters and compute the result res = [res1, res2]; then apply:

state = argmax(res)

where state is the final result: 0 indicates that no blink occurs in the time-series images and 1 indicates that blinking behavior occurs; argmax returns the index of the largest element of res.
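The final decision in (7-2) is simply the index of the larger classifier score; the score values below are made up for illustration:

```python
import numpy as np

res = np.array([0.31, 0.69])   # hypothetical [no-blink, blink] scores from sphereface
state = int(np.argmax(res))    # 0: no blink, 1: blink detected
```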

The sphereface model uses an A-softmax loss function, as follows:

where the subscript i indexes the i-th class, the label 1 indicates that blinking behavior occurs in the image sequence, and θ denotes the angle between y_i and w_i.
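The A-softmax loss referred to above has the following standard published form (from the SphereFace paper; assuming the patent uses it unchanged, with x_i the input feature, y_i its label, θ_{j,i} the angle between x_i and the j-th class weight w_j, and m the angular margin):

```latex
L = \frac{1}{N}\sum_{i} -\log
    \frac{e^{\|x_i\|\,\psi(\theta_{y_i,i})}}
         {e^{\|x_i\|\,\psi(\theta_{y_i,i})} + \sum_{j\neq y_i} e^{\|x_i\|\cos(\theta_{j,i})}},
\qquad
\psi(\theta) = (-1)^k \cos(m\theta) - 2k,\quad
\theta \in \Big[\tfrac{k\pi}{m},\, \tfrac{(k+1)\pi}{m}\Big],\; k \in [0, m-1].
```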

First, the present invention achieves real-time computation. Second, concatenating the differ feature with the artificial descriptor yields a human-eye feature carrying stronger semantic information. Third, encoding the blinking behavior into a blink-behavior feature heat map makes it easier both to inspect visually and to process with image-domain algorithms. Finally, the multiple hidden states produced by the LSTM network are concatenated into multi-scale time-series features for training, which accommodates the different blinking speeds of different people. As a result, the present invention outperforms traditional blink detection algorithms in both speed and accuracy on a time-series blink database collected under unconstrained conditions.

Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A real-time blink detection method based on multi-scale time-series images, characterized in that it comprises:
(1) establishing a database of time-series images containing human eyes, and locating the positions of both eyes in the first frame of the time-series images;
(2) extracting a human-eye image from the first frame using the positions of both eyes to obtain a human-eye template;
(3) initializing a human-eye tracker with the human-eye template, using the initialized tracker to track the frames following the first frame, and updating the tracker;
(4) tracking the time-series images with the updated human-eye tracker to obtain time-series human-eye images, and applying grayscale conversion and image equalization in turn to obtain processed time-series human-eye images;
(5) extracting an artificial descriptor from the processed time-series human-eye images, extracting the differ feature of the descriptor, concatenating the differ feature and the descriptor to obtain the human-eye feature, and encoding the human-eye features into a blink-behavior feature heat map according to their temporal order;
(6) feeding the blink-behavior feature heat map into an LSTM network row by row to obtain multiple hidden states;
(7) concatenating the multiple hidden states into a multi-scale time-series feature and performing blink detection to judge whether the time-series images contain blinking behavior.
2. The real-time blink detection method based on multi-scale time-series images of claim 1, characterized in that step (1) comprises: establishing a database of time-series images containing human eyes from a video database containing human eyes, and locating the positions of both eyes in the first frame of the time-series images using a face-alignment algorithm.
3. The real-time blink detection method based on multi-scale time-series images of claim 1 or 2, characterized in that step (2) comprises:
(2-1) obtaining the distance between the two eyes from their positions, and determining the width and height of the human-eye region from that distance to obtain the region;
(2-2) extracting the human-eye image from the first frame according to the human-eye region, converting it to grayscale, deciding from its size whether to shrink it by a factor of K, K ≥ 0, and taking the resulting image as the human-eye template.
4. The real-time blink detection method based on multi-scale time-series images of claim 1 or 2, characterized in that step (3) comprises:
(3-1) initializing the human-eye tracker with the human-eye template;
(3-2) tracking the frames following the first frame with the initialized tracker, obtaining the human-eye position Pos in each subsequent frame, and returning a confidence C for each tracking result;
(3-3) setting a tracking-score threshold T: if C > T, extracting a new human-eye image at position Pos to obtain a new human-eye template; if C ≤ T, returning to step (1);
(3-4) fusing, with weights, a tracker initialized from the new template with the previously initialized tracker to update the human-eye tracker.
5. The real-time blink detection method based on multi-scale time-series images of claim 1 or 2, characterized in that the differ feature is the difference obtained by subtracting, element-wise, the artificial descriptor of the previous frame from the artificial descriptor extracted from each frame of the processed time-series human-eye images.
6. The real-time blink detection method based on multi-scale time-series images of claim 1 or 2, characterized in that step (6) comprises:
(6-1) building an LSTM network of n cells, each containing h hidden layers, without applying overfitting-suppressing dropout to the network input, and applying dropout of rate d, d < 1, to the cell state passed between cells;
(6-2) training the LSTM network with labelled sample heat maps to obtain multiple sample hidden states, concatenating them in time order into sample multi-scale time-series features, and training the sphereface model to obtain its parameters and the trained LSTM network;
(6-3) feeding the blink-behavior feature heat map row by row into the trained LSTM network to obtain multiple hidden states.
7. The real-time blink detection method based on multi-scale time-series images of claim 6, characterized in that step (7) comprises: feeding the multi-scale time-series feature, obtained by concatenating the multiple hidden states in time order, into the sphereface model for blink detection to judge whether the time-series images contain blinking behavior.
CN201810743856.6A 2018-07-06 2018-07-06 A real-time blink detection method based on multi-scale time series images Active CN109101881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810743856.6A CN109101881B (en) 2018-07-06 2018-07-06 A real-time blink detection method based on multi-scale time series images


Publications (2)

Publication Number Publication Date
CN109101881A true CN109101881A (en) 2018-12-28
CN109101881B CN109101881B (en) 2021-08-20

Family

ID=64845821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810743856.6A Active CN109101881B (en) 2018-07-06 2018-07-06 A real-time blink detection method based on multi-scale time series images

Country Status (1)

Country Link
CN (1) CN109101881B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738129A (en) * 2019-09-20 2020-01-31 华中科技大学 An end-to-end video timing behavior detection method based on R-C3D network
CN110969109A (en) * 2019-11-26 2020-04-07 华中科技大学 An eye-blink detection model under unrestricted conditions and its construction method and application
CN111046742A (en) * 2019-11-20 2020-04-21 腾讯科技(深圳)有限公司 Eye behavior detection method and device and storage medium
CN112183215A (en) * 2020-09-02 2021-01-05 重庆利龙科技产业(集团)有限公司 Human eye positioning method and system combining multi-feature cascade SVM and human eye template
CN117152025A (en) * 2023-10-30 2023-12-01 硕橙(厦门)科技有限公司 Method, device and equipment for enhancing over-bright image based on composite scale features

Citations (3)

Publication number Priority date Publication date Assignee Title
US20080252745A1 (en) * 2007-04-13 2008-10-16 Fujifilm Corporation Apparatus for detecting blinking state of eye
CN106096544A (en) * 2016-06-02 2016-11-09 安徽大学 Non-contact blink and heart rate joint detection system and method based on second-order blind identification
CN107992794A (en) * 2016-12-30 2018-05-04 腾讯科技(深圳)有限公司 A kind of biopsy method, device and storage medium


Non-Patent Citations (2)

Title
ANDREJ FOGELTON: "Eye Blink Detection", Information Sciences and Technologies Bulletin of the ACM Slovakia *
LIU Ruian et al.: "Blink Detection and Eye Tracking", Journal of Computer Applications *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN110738129A (en) * 2019-09-20 2020-01-31 华中科技大学 An end-to-end video timing behavior detection method based on R-C3D network
CN110738129B (en) * 2019-09-20 2022-08-05 华中科技大学 End-to-end video time sequence behavior detection method based on R-C3D network
CN111046742A (en) * 2019-11-20 2020-04-21 腾讯科技(深圳)有限公司 Eye behavior detection method and device and storage medium
CN110969109A (en) * 2019-11-26 2020-04-07 华中科技大学 An eye-blink detection model under unrestricted conditions and its construction method and application
CN110969109B (en) * 2019-11-26 2023-04-18 华中科技大学 Blink detection model under non-limited condition and construction method and application thereof
CN112183215A (en) * 2020-09-02 2021-01-05 重庆利龙科技产业(集团)有限公司 Human eye positioning method and system combining multi-feature cascade SVM and human eye template
CN117152025A (en) * 2023-10-30 2023-12-01 硕橙(厦门)科技有限公司 Method, device and equipment for enhancing over-bright image based on composite scale features
CN117152025B (en) * 2023-10-30 2024-03-01 硕橙(厦门)科技有限公司 Method, device and equipment for enhancing over-bright image based on composite scale features

Also Published As

Publication number Publication date
CN109101881B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN109101881B (en) A real-time blink detection method based on multi-scale time series images
Han et al. Two-stage learning to predict human eye fixations via SDAEs
CN112766159A (en) Cross-database micro-expression identification method based on multi-feature fusion
CN107330444A (en) A kind of image autotext mask method based on generation confrontation network
CN107785061A (en) Autism-spectrum disorder with children mood ability interfering system
CN106407958A (en) Double-layer-cascade-based facial feature detection method
CN108520215A (en) Single-sample face recognition method based on multi-scale joint feature encoder
CN109670559A (en) Recognition methods, device, equipment and the storage medium of handwritten Chinese character
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN116311387B (en) A cross-modal person re-identification method based on feature intersection
CN116934747A (en) Fundus image segmentation model training method, equipment and glaucoma auxiliary diagnosis system
Huang et al. Meta clothing status calibration for long-term person re-identification
Yang et al. HeadPose-Softmax: Head pose adaptive curriculum learning loss for deep face recognition
Zheng et al. Attention assessment based on multi‐view classroom behaviour recognition
Yang et al. Fusion of retinaface and improved facenet for individual cow identification in natural scenes
Cheng et al. Bottom-up 2D pose estimation via dual anatomical centers for small-scale persons
Zhu et al. NAGNet: A novel framework for real‐time students' sentiment analysis in the wisdom classroom
CN108573219A (en) A method for precise positioning of eyelid key points based on deep convolutional neural network
Kotwal et al. Yolov5-based convolutional feature attention neural network for plant disease classification
Kanjanawattana et al. Deep Learning-Based Emotion Recognition through Facial Expressions
CN110163130A (en) A kind of random forest grader and classification method of the feature pre-align for gesture identification
Anand et al. Transformer-based Multiclass Classification of Cervicogram Images for Improved Cervical Cancer Detection
CN109583423A (en) A kind of method, apparatus and associated component of Handwritten Digit Recognition
CN114372926A (en) Traditional Chinese medicine tongue tenderness identification method based on image restoration and convolutional neural network
CN111738177B (en) A method of student classroom behavior recognition based on gesture information extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant