CN109508679B - Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium - Google Patents


Info

Publication number
CN109508679B
Authority
CN
China
Prior art keywords
detection network
eyeball
dimensional
face image
prelu
Prior art date
Legal status
Active
Application number
CN201811375929.7A
Other languages
Chinese (zh)
Other versions
CN109508679A (en)
Inventor
张国生
李东
冯广
章云
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201811375929.7A
Publication of CN109508679A
Application granted
Publication of CN109508679B

Classifications

    • G06V 40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 40/171: Human faces, e.g. facial parts, sketches or expressions; feature extraction; local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a computer-readable storage medium for realizing eyeball three-dimensional sight tracking, comprising the following steps: inputting a face image to be detected into a pre-constructed head posture detection network to obtain the head posture in the face image; inputting the face image into a pre-constructed eyeball motion detection network to obtain the eyeball motion in the face image; and inputting the head posture and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain the three-dimensional sight line direction vector of the eyeball in the face image. The method, apparatus, device and computer-readable storage medium provided by the invention can extract the three-dimensional sight line direction vector of the photographed person's eyeball from a two-dimensional face image, and therefore have a wide range of application scenarios.

Description

Method, device and equipment for realizing three-dimensional eye gaze tracking, and storage medium
Technical Field
The present invention relates to the field of eyeball tracking technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for realizing three-dimensional eye gaze tracking.
Background
Research on eyeball tracking algorithms is well developed and has been successfully deployed in many commercial applications, such as VR/AR technology. Although traditional eyeball tracking technology can achieve high precision, current eyeball tracking algorithms are mostly based on traditional image processing methods: they depend on expensive infrared equipment and require special detection devices to be mounted on the head to detect eyeball features. The detection precision of traditional image processing methods is affected by changes in lighting, and the detection distance is severely constrained. Therefore, an algorithm that realizes eyeball tracking from RGB images captured by an ordinary camera is urgently needed. In the field of computer vision, deep convolutional neural networks have achieved remarkable results in many areas, such as object detection and instance segmentation.
The prior art also includes an eyeball tracking technology based on deep learning, with the following specific steps: acquiring retinopathy image data; annotating the retinopathy image data to obtain annotated data; establishing an initial deep learning network; inputting the retinopathy image data into the initial deep learning network and outputting corresponding prediction data; comparing the annotated data with the prediction data using a loss function to obtain a comparison result; adjusting the parameters of the initial deep learning network according to the comparison result until the comparison result reaches a preset threshold, thereby obtaining the final deep learning network model; and processing the retinopathy image data to be detected with the deep learning network model to obtain the corresponding eyeball center coordinates and eyeball diameter.
Thus, existing eyeball tracking technology falls into two categories. The first is based on traditional image processing algorithms; although relatively mature commercial applications exist, its detection accuracy is affected by lighting changes, and it depends on expensive head-worn infrared equipment, so wearing comfort is poor and the detection distance is also constrained. The second is an eyeball tracking algorithm based on deep learning; however, existing deep-learning-based eyeball tracking algorithms can only detect the center position and diameter of the eyeball, i.e. only two-dimensional information of the eyeball motion, which restricts the application scenarios.
In summary, it can be seen that how to obtain a three-dimensional eye gaze direction vector of an eyeball through a two-dimensional face image is a problem to be solved at present.
Disclosure of Invention
The invention aims to provide a method, a device, equipment and a computer readable storage medium for realizing eyeball three-dimensional sight tracking, so as to solve the problem that in the prior art, an eyeball tracking algorithm based on deep learning can only detect two-dimensional information of eyeballs.
In order to solve the above technical problem, the present invention provides a method for realizing three-dimensional eye gaze tracking, comprising: inputting a human face image to be detected into a pre-constructed head posture detection network to obtain a head posture in the human face image; inputting the face image into a pre-constructed eyeball action detection network to obtain the eyeball action of the face image; and inputting the head posture and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image.
Preferably, before the inputting of the face image to be detected into the pre-constructed head posture detection network to obtain the head posture in the face image, the method comprises:
acquiring a plurality of face images with three-dimensional labels of head postures and eye sight lines, and constructing a face image data set, wherein the face images are RGB images;
constructing an initial head posture detection network and an initial eyeball action detection network;
and respectively training the initial head posture detection network and the initial eyeball action detection network by utilizing the face image data set to obtain the trained head posture detection network and the trained eyeball action detection network.
Preferably, the acquiring of a plurality of face images with three-dimensional labels of head posture and eyeball sight line to construct a face image data set comprises:
respectively acquiring face images of a data provider by using each camera in an area array camera array to obtain a first subset of the face images;
the face images collected by each row of cameras in the area array camera array represent different head postures of the data provider in the y direction;
the face images collected by each column of cameras in the area array camera array represent different head postures of the data provider in the p direction;
rotating the face images acquired by the area array camera array in the clockwise direction and the anticlockwise direction respectively to obtain a second subset of the face images representing different head postures of the data provider in the r direction;
and combining the first subset of the face images and the second subset of the face images to obtain the face image data set.
Preferably, the respectively acquiring the face images of the data providers by using each camera in the area array camera array includes:
when each face image is collected, the moving point on the display screen at which the data provider's eyeball is gazing is recorded, so as to determine the three-dimensional vector label of the data provider's eyeball sight line, and the head posture in each face image is recorded at the same time.
Preferably, said constructing an initial head pose detection network comprises:
constructing the initial head posture detection network by taking an AlexNet model as the basic structure, wherein the network structure of the initial head posture detection network is as follows:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);
wherein C(k, s, c) denotes a convolution layer with kernel size k, stride s and c channels, P(k, s) denotes a max pooling layer with kernel size k and stride s, BN denotes batch normalization, PReLU denotes the activation function, and FC(n) denotes a fully connected layer with n neurons.
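By way of a non-limiting illustration, a minimal PyTorch sketch of this head posture branch is given below. The input resolution and the use of a lazily initialized first fully connected layer are assumptions, since the patent does not state the input size:

    import torch
    import torch.nn as nn

    def conv_bn_prelu(c_in, c_out, k=3, s=1):
        # C(k, s, c) followed by BN and PReLU, in the patent's notation
        return [nn.Conv2d(c_in, c_out, kernel_size=k, stride=s),
                nn.BatchNorm2d(c_out), nn.PReLU()]

    # C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-
    # C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3)
    head_pose_net = nn.Sequential(
        *conv_bn_prelu(3, 6), nn.MaxPool2d(2, 2),
        *conv_bn_prelu(6, 16), nn.MaxPool2d(2, 2),
        *conv_bn_prelu(16, 24),
        nn.Conv2d(24, 24, 3, 1), nn.PReLU(),
        *conv_bn_prelu(24, 16), nn.MaxPool2d(2, 2),
        nn.Flatten(),
        nn.LazyLinear(256),               # input resolution is not given in the patent
        nn.Linear(256, 128), nn.PReLU(),
        nn.Linear(128, 3),                # head posture (y, p, r)
    )

    pose = head_pose_net(torch.randn(1, 3, 96, 96))  # 96x96 face crop is an assumed size
    print(pose.shape)                                # torch.Size([1, 3])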
Preferably, the respectively training the initial head posture detection network and the initial eyeball motion detection network by using the face image data set comprises:
training the head posture detection network and the initial eyeball motion detection network by using the face image data set;
wherein the loss function Loss_1 = Loss_h + Loss_e is the sum of the loss function of the preliminary head posture detection network, Loss_h = ||h - h_GT||_2, and the loss function of the preliminary eyeball motion detection network, Loss_e = ||e - e_GT||_2.
Preferably, the inputting the head posture and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image comprises:
detecting the face images in the face image data set respectively by using the head posture detection network and the eyeball motion detection network to obtain the head posture and the eyeball motion of each face image;
training a pre-established initial three-dimensional sight line vector detection network by using the head postures and the eyeball motions of the face images, so as to obtain the trained three-dimensional sight line vector detection network;
wherein the loss function Loss_2 = Loss_1 + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss_1 and the loss function of the initial three-dimensional sight line vector detection network, Loss_g = ||g_c - g_c^GT||_2.
The invention also provides a device for realizing eyeball three-dimensional sight tracking, which comprises:
the head posture detection module is used for inputting a human face image to be detected into a pre-constructed head posture detection network to obtain a head posture in the human face image;
the eyeball motion detection module is used for inputting the face image into a pre-constructed eyeball motion detection network to obtain the eyeball motion of the face image;
and the three-dimensional sight line detection module is used for inputting the head posture and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image.
The invention also provides equipment for realizing three-dimensional eye gaze tracking, which comprises:
a memory for storing a computer program; and the processor is used for realizing the steps of the method for realizing the three-dimensional eye gaze tracking of the eyeball when the computer program is executed.
The present invention also provides a computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the above-mentioned method for implementing three-dimensional gaze tracking of an eyeball.
According to the method for realizing eyeball three-dimensional sight tracking provided by the invention, the face image to be detected is input into the pre-constructed head posture detection network to obtain the head posture in the face image; the face image is input into the pre-constructed eyeball motion detection network to obtain the eyeball motion in the face image; and the head posture and the eyeball motion are input into the pre-constructed three-dimensional sight line vector detection network, so that the three-dimensional sight line direction vector of the eyeball in the face image is obtained through a geometrically constrained sight line conversion network. The eyeball tracking method provided by the invention is based on a deep learning network: it extracts the head posture and the eyeball motion of the photographed person from a two-dimensional face image and inputs them into a pre-trained three-dimensional sight line vector detection network to obtain the three-dimensional sight line direction vector of the photographed person's eyeball in the face image. The method has a wide range of applications: the three-dimensional sight line direction of the eyeball obtained from a face image can be used in fields such as safe-driving monitoring, human-computer interaction and psychological research. This solves the problem in the prior art that eyeball tracking realized by a deep neural network can only detect the eyeball center position and eyeball diameter and therefore lacks wide application scenarios. Correspondingly, the apparatus, the device and the computer-readable storage medium provided by the invention have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flowchart illustrating a method for tracking three-dimensional eye gaze according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for tracking three-dimensional eye gaze according to a second embodiment of the present invention;
fig. 3 is a block diagram of a device for tracking three-dimensional eye gaze according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a method, a device, equipment and a computer-readable storage medium for realizing eyeball three-dimensional sight tracking, which can obtain the three-dimensional sight line vector of the eyeball from a two-dimensional face image and therefore have wide application scenarios.
In order that those skilled in the art will better understand the disclosure, reference will now be made in detail to the embodiments of the disclosure as illustrated in the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for tracking an eyeball three-dimensional visual line according to a first embodiment of the present invention; the specific operation steps are as follows:
step S101: inputting a human face image to be detected into a pre-constructed head posture detection network to obtain a head posture in the human face image;
inputting a human face image to be detected into a pre-constructed head posture detection network, firstly collecting a plurality of human face images with three-dimensional labels of head postures and eyeball sight before obtaining the head postures in the human face images, and constructing a human face image data set; constructing an initial head posture detection network and an initial eyeball action detection network; and respectively training the initial head posture detection network and the initial eyeball action detection network by using the facial image data set to obtain the trained head posture detection network and the trained eyeball action detection network.
In order to give the initial head posture detection network and the initial eyeball motion detection network better generalization capability, the face image data set collected in this embodiment needs to have the following features: a. the data images are widely distributed, covering as many head postures and eyeball motions as possible, and include different light intensities and even reflection interference from glasses; b. the face image data set has three-dimensional labels of head posture and eyeball sight line; c. the face images in the face image data set are preferably ordinary RGB images rather than depending on a particular camera device.
In order to give the face image data set a wider distribution, this embodiment employs a 3 × 4 camera array, with different camera angles representing different head postures. However, the area array camera array can only represent head posture differences in two directions (y, p). Therefore, in order to obtain head posture differences in the r direction, the collected face images are rotated clockwise and anticlockwise respectively to represent changes in the head's sideways tilting motion. For each head posture, the position of the camera in the array together with the image rotation angle corresponds to one head posture label (y_GT, p_GT, r_GT).
In order to obtain richer eyeball motions, while the face image data set is collected the data provider's eyeballs track a moving point on a display screen. The moving point contains random letters that the data provider must recognize, which ensures that the data provider's eyeballs are indeed gazing at the moving point on the screen and thereby guarantees the accuracy of the data labels. Different eyeball motions are thus obtained, and for each position tracked by the eyeballs the eyeball sight line vector label (φ_GT, θ_GT) at that moment is recorded. While the face image data set is acquired, the head posture and the corresponding three-dimensional eyeball sight line vector label in each face image are recorded.
In this embodiment, when the face image data set is acquired, only RGB face images need to be collected, without relying on other special devices, which reduces the application cost; and the head remains free and unconstrained, which offers better convenience than the prior art, in which expensive infrared equipment is worn on the head.
Before the construction of the initial head posture detection network, the initial eyeball motion detection network and the initial three-dimensional sight line vector detection network is described, the geometric analysis and the coordinate systems adopted in this embodiment are first explained. This embodiment uses two coordinate systems: the head coordinate system (X_h, Y_h, Z_h) and the camera coordinate system (X_c, Y_c, Z_c), with g denoting the sight line vector. To simplify the representation of the head posture, the embodiment of the invention adopts a three-dimensional representation of spherical rotation angles (y, p, r), where y denotes the yaw angle (rotation about the Y_h axis), p denotes the pitch angle (rotation about the X_h axis), and r denotes the roll angle (rotation about the Z_h axis). The eyeball motion is represented by a two-dimensional spherical coordinate system (θ, φ), where θ and φ denote the angles between the sight line vector and the horizontal and vertical directions of the head coordinate system, respectively.
In the head coordinate system, the sight line vector is described by the eyeball motion as follows:
g_h = [-cos(φ)sin(θ), sin(φ), -cos(φ)cos(θ)]^T
The camera coordinate system (X_c, Y_c, Z_c) takes the camera center as the origin, with the depth direction of the camera as the Z_c axis and the two directions of the plane perpendicular to the depth direction as the X_c and Y_c axes. Since the network ultimately outputs the three-dimensional sight line vector expressed in the camera coordinate system, the embodiment of the invention defines g_c as the three-dimensional sight line vector in the camera coordinate system. From geometry, g_c depends on g_h, which is defined in the head coordinate system, so the overall mapping relation of the embodiment of the invention can be written as:
g_c = R(y, p, r) · g_h
where R(y, p, r) is the rotation matrix determined by the head posture.
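As a worked example of this geometry, the following sketch computes g_h from (θ, φ) and maps it into the camera coordinate system with NumPy; the yaw-pitch-roll composition order of the rotation matrix is an assumption, since the patent does not state the convention:

    import numpy as np

    def gaze_in_head(theta, phi):
        # g_h = [-cos(phi)sin(theta), sin(phi), -cos(phi)cos(theta)]^T
        return np.array([-np.cos(phi) * np.sin(theta),
                         np.sin(phi),
                         -np.cos(phi) * np.cos(theta)])

    def head_rotation(y, p, r):
        # Rotations about Y_h (yaw y), X_h (pitch p) and Z_h (roll r);
        # the composition order Rz @ Rx @ Ry is an assumption.
        Ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
        Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
        Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
        return Rz @ Rx @ Ry

    # g_c = R(y, p, r) . g_h: the sight line vector in the camera coordinate system
    g_c = head_rotation(0.1, -0.05, 0.0) @ gaze_in_head(theta=0.2, phi=0.1)
    print(g_c)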
step S102: inputting the face image into a pre-constructed eyeball action detection network to obtain eyeball action of the face image;
step S103: and inputting the head posture and the eyeball action into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image.
And inputting the head posture and the eyeball action into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line vector of the eyeball in the face image.
In order to reuse existing data sets, the network in this embodiment adopts an end-to-end structure. An initial head posture detection network and an initial eyeball motion detection network are established respectively, and the detection results of these two sub-networks are then input into a fully connected network to obtain the final three-dimensional sight line vector. The network is thus divided into two branches: the upper branch detects the head posture and the lower branch detects the eyeball motion, after which the three-dimensional sight line direction vector in the camera coordinate system is obtained through a geometrically constrained sight line conversion layer.
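For illustration, the inference chain through the two branches and the fusion network can be sketched as follows; the function and variable names are placeholders, and normalizing the output to a unit direction vector is an assumption:

    import torch

    def track_gaze(face_img, eye_img, head_net, eye_net, gaze_net):
        # face_img: (1, 3, H, W) face crop; eye_img: (1, 3, 36, 36) eye patch
        # (one tensor stands in here for the two symmetric left/right branches).
        h = head_net(face_img)                      # (1, 3): head posture (y, p, r)
        e = eye_net(eye_img)                        # (1, 2): eyeball motion (theta, phi)
        g_c = gaze_net(torch.cat([h, e], dim=1))    # (1, 3): 3D sight line vector
        return g_c / g_c.norm(dim=1, keepdim=True)  # unit direction vector (assumption)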
Based on the above embodiment, and in order to reuse the collected face image data set, this embodiment likewise adopts the end-to-end structure described above: a head posture detection network and an eyeball motion detection network are established respectively, their detection results are input into a fully connected network to obtain the final three-dimensional sight line vector, and the three-dimensional sight line direction vector in the camera coordinate system is obtained through the geometrically constrained sight line conversion layer. Referring to fig. 2, fig. 2 is a flowchart illustrating a method for realizing eyeball three-dimensional sight tracking according to a second embodiment of the present invention; the specific operation steps are as follows:
step S201: acquiring a plurality of face images of a data provider by using an area array camera array, and recording three-dimensional vector labels of head gestures and eyeball actions in each face image to obtain a first subset of the face images;
step S202: respectively rotating the face images in the first subset of the face images in the clockwise direction and the anticlockwise direction to obtain a second subset of the face images;
step S203: combining the first subset of the face images and the second subset of the face images to obtain a face image data set;
step S204: respectively training a pre-constructed initial head posture detection network and an initial eyeball action detection network by utilizing the face image data set to obtain a target head posture detection network and a target eyeball detection network;
the basic network structure of the initial head posture detection network adopts an Alex Net structure, and the Alex Net structure is simplified and modified correspondingly. The number of layers of the network is unchanged, but the number of channels of each layer is reduced properly, meanwhile, the local response normalization is changed into batch normalization, and the PReLU is adopted as the activation function. The network structure of the initial head pose detection network is as follows: c (3, 1, 6) -BN-PReLU-P (2, 2) -C (3, 1, 16) -BN-PReLU-P (2, 2) -C (3, 1, 24) -BN-PReLU-C (3, 1, 24) -PReLU (3, 1, 16) -BN-PReLU-P (2, 2) -FC (256) -FC (128) -PReLU-FC (3)
Wherein, C (k, s, C) represents convolution layer with convolution kernel size k, convolution step size s and channel number C, P (k, s) represents maximum pooling layer with kernel size k and step size s, BN represents batch normalization, PReLU represents activation function, FC (n) represents full connection layer, and number of neurons is n.
The input of the eyeball motion detection network is the eye region cropped from the original face image, divided into a left-eye part and a right-eye part. Since the two sub-networks are completely symmetric, only one is described in detail here. The eyeball image patches are resized to the same size of 36×36 and then passed through a convolutional neural network followed by a fully connected network; the initial eyeball motion detection network has the following structure: C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2).
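A corresponding PyTorch sketch of one eye branch follows. The patent does not specify convolution padding, and with a 36×36 input the stated kernel sizes only fit if padding is used, so 'same'-style padding of k//2 is assumed here:

    import torch
    import torch.nn as nn

    def stage(c_in, c_out, k, s):
        # C(k, s, c)-BN-PReLU-P(2, 2); padding k // 2 is an assumption
        return [nn.Conv2d(c_in, c_out, k, s, padding=k // 2),
                nn.BatchNorm2d(c_out), nn.PReLU(), nn.MaxPool2d(2, 2)]

    # C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-
    # C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2)
    eye_net = nn.Sequential(
        *stage(3, 96, 11, 2),
        *stage(96, 256, 5, 1),
        *stage(256, 384, 3, 1),
        *stage(384, 64, 1, 1),
        nn.Flatten(),
        nn.LazyLinear(128),   # flattened size depends on the padding assumption
        nn.Linear(128, 2),    # eyeball motion (theta, phi)
    )

    out = eye_net(torch.randn(1, 3, 36, 36))  # 36x36 eye patch, as in the patent
    print(out.shape)                          # torch.Size([1, 2])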
Step S205: detecting each human face in the human face image data set by using the target head posture detection network and the target eyeball action detection network to obtain the head posture and the eyeball action of each human face image;
step S206: inputting the head posture and eyeball motion of each facial image in the facial image data set to a pre-constructed initial three-dimensional sight line vector detection network for training to obtain the target three-dimensional sight line vector detection network;
the initial three-dimensional sight line vector detection network takes (y, p, r) obtained by the target head posture detection network and (theta, phi) obtained by the target eyeball motion detection network as the input of the initial three-dimensional sight line vector detection network, the initial three-dimensional sight line vector detection network is a two-layer fully-connected network, the number of neurons in the first layer of the network is 128, the number of neurons in the last layer of the network is 3, and the network corresponds to a three-dimensional sight line vector.
When the head posture detection network and the initial eyeball motion detection network are trained, the loss function Loss_1 = Loss_h + Loss_e is the sum of the loss function of the preliminary head posture detection network, Loss_h, and the loss function of the preliminary eyeball motion detection network, Loss_e. When the pre-established initial three-dimensional sight line vector detection network is trained, the loss function Loss_2 = Loss_1 + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss_1 and the loss function of the initial three-dimensional sight line vector detection network, Loss_g, where:
Loss_h = ||h - h_GT||_2, h = {y, p, r}
Loss_e = ||e - e_GT||_2, e = {φ, θ}
Loss_g = ||g_c - g_c^GT||_2, g_c = {x, y, z}
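A sketch of the two-stage loss under these formulas is given below; batching over samples and reduction by the mean are assumptions:

    import torch

    def loss_stage1(h, h_gt, e, e_gt):
        # Loss_1 = Loss_h + Loss_e, with Loss_h = ||h - h_GT||_2, Loss_e = ||e - e_GT||_2
        return (torch.norm(h - h_gt, dim=1) + torch.norm(e - e_gt, dim=1)).mean()

    def loss_stage2(h, h_gt, e, e_gt, g_c, g_c_gt):
        # Loss_2 = Loss_1 + Loss_g, with Loss_g = ||g_c - g_c^GT||_2
        return loss_stage1(h, h_gt, e, e_gt) + torch.norm(g_c - g_c_gt, dim=1).mean()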
Step S207: inputting a human face image to be detected into the target head posture detection network to obtain a head posture in the human face image to be detected;
step S208: inputting the face image to be detected into the target eyeball action detection network to obtain the eyeball action of the face image to be detected;
step S209: and inputting the head posture of the face image to be detected and the eyeball action of the face image to be detected into the target three-dimensional sight line vector detection network to obtain the three-dimensional sight line direction vector of the eyeball in the face image to be detected.
In the prior art, eyeball recognition only annotates the two-dimensional eyeball center position, so ultimately only two-dimensional information of the eyeball can be obtained, which limits its applications. In this embodiment, the network is trained end-to-end in stages: in the first training stage, existing head posture data sets and eyeball motion data sets can be fully utilized, which greatly enlarges the training data and gives the deep network in this embodiment better generalization capability.
Referring to fig. 3, fig. 3 is a block diagram illustrating a structure of an apparatus for tracking three-dimensional eye gaze according to an embodiment of the present invention; the specific device may include:
the head pose detection module 100 is configured to input a human face image to be detected into a pre-constructed head pose detection network, so as to obtain a head pose in the human face image;
an eyeball motion detection module 200, configured to input the face image into a pre-constructed eyeball motion detection network, so as to obtain an eyeball motion of the face image;
the three-dimensional sight line detection module 300 is configured to input the head pose and the eyeball motion to a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image.
The device for realizing eyeball three-dimensional sight tracking of this embodiment is used to realize the aforementioned method for realizing eyeball three-dimensional sight tracking, so specific implementations of the device may be found in the foregoing embodiments of the method. For example, the head posture detection module 100, the eyeball motion detection module 200 and the three-dimensional sight line detection module 300 implement steps S101, S102 and S103 of the method respectively; reference may therefore be made to the descriptions of the corresponding embodiments of each part, which are not repeated here.
The embodiment of the invention also provides equipment for realizing three-dimensional eye gaze tracking, which comprises: a memory for storing a computer program; and the processor is used for realizing the steps of the method for realizing the three-dimensional eye gaze tracking of the eyeball when the computer program is executed.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above method for implementing three-dimensional eye gaze tracking.
In the present specification, the embodiments are described in a progressive manner, and each embodiment focuses on differences from other embodiments, and the same or similar parts between the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method, apparatus, device and computer readable storage medium for three-dimensional eye gaze tracking provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (8)

1. A method for realizing three-dimensional eye gaze tracking of an eyeball is characterized by comprising the following steps:
inputting a human face image to be detected into a pre-constructed head posture detection network to obtain a head posture in the human face image;
inputting the face image into a pre-constructed eyeball action detection network to obtain eyeball action of the face image;
inputting the head posture and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image;
the method comprises the following steps of inputting a human face image to be detected into a pre-constructed head posture detection network, and obtaining the head posture in the human face image: collecting a plurality of face images with three-dimensional labels of head postures and eye sight lines to construct a face image data set, wherein the face images are RGB images; constructing an initial head posture detection network and an initial eyeball action detection network; respectively training the initial head posture detection network and the initial eyeball action detection network by using the facial image data set to obtain the trained head posture detection network and the trained eyeball action detection network;
the network structure of the initial head pose detection network is as follows:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);
the initial eyeball motion detection network structure is as follows:
C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2);
wherein C(k, s, c) denotes a convolution layer with kernel size k, stride s and c channels, P(k, s) denotes a max pooling layer with kernel size k and stride s, BN denotes batch normalization, PReLU denotes the activation function, and FC(n) denotes a fully connected layer with n neurons;
the step of inputting the head pose and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image comprises the following steps:
respectively detecting the face images in the face image data set by using the head posture detection network and the eyeball action detection network to obtain the head posture and the eyeball action of each face image;
training a pre-established initial three-dimensional sight line vector detection network by utilizing the head postures and the eye movements of the human face images so as to obtain a trained three-dimensional sight line vector detection network;
the initial three-dimensional sight line vector detection network is a two-layer fully-connected network, the number of neurons in the first layer of the network is 128, the number of neurons in the last layer of the network is 3, and the three-dimensional sight line vector corresponds to the initial three-dimensional sight line vector.
2. The method of claim 1, wherein acquiring a plurality of facial images having three-dimensional labels of head pose and eye gaze, constructing a facial image dataset comprises:
respectively acquiring face images of a data provider by using each camera in an area array camera array to obtain a first subset of the face images;
each row of cameras in the area array camera array acquires a plurality of face images, and the face images represent different head postures of the data provider in the y direction;
a plurality of face images collected by each column of cameras in the area array camera array represent different head postures of the data provider in the p direction;
rotating the face images acquired by the area array camera array in the clockwise direction and the anticlockwise direction respectively to obtain a second subset of the face images representing different head postures of the data provider in the r direction;
and combining the first subset of the facial images and the second subset of the facial images to obtain the facial image data set.
3. The method of claim 2, wherein the separately acquiring the facial images of the data providers using each camera in the area array camera array comprises:
when the face images of the data provider are collected, the moving point on the display screen at which the data provider's eyeball is gazing is recorded, so as to determine the three-dimensional vector label of the data provider's eyeball sight line, and the head posture in each face image is recorded at the same time.
4. The method of claim 1, wherein the separately training the initial head pose detection network and the initial eye movement detection network with the facial image dataset comprises:
training the head posture detection network and the initial eyeball motion detection network by using the facial image data set;
wherein the loss function Loss_1 = Loss_h + Loss_e is the sum of the loss function Loss_h of the preliminary head posture detection network and the loss function Loss_e of the preliminary eyeball motion detection network.
5. The method according to claim 4, wherein before the head posture and the eyeball motion are input into the pre-constructed three-dimensional sight line vector detection network to obtain the three-dimensional sight line direction vector of the eyeball in the face image:
the loss function Loss_2 = Loss_1 + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss_1 and the loss function Loss_g of the initial three-dimensional sight line vector detection network.
6. An apparatus for three-dimensional eye gaze tracking, comprising:
the head posture detection module is used for inputting a face image to be detected into a pre-constructed head posture detection network to obtain a head posture in the face image; wherein before the face image to be detected is input into the pre-constructed head posture detection network to obtain the head posture in the face image: a plurality of face images with three-dimensional labels of head posture and eyeball sight line are acquired to construct a face image data set, wherein the face images are RGB images; an initial head posture detection network and an initial eyeball motion detection network are constructed; and the initial head posture detection network and the initial eyeball motion detection network are respectively trained by using the face image data set to obtain the trained head posture detection network and the trained eyeball motion detection network; the network structure of the initial head posture detection network is as follows: C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3); the structure of the initial eyeball motion detection network is as follows: C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2); wherein C(k, s, c) denotes a convolution layer with kernel size k, stride s and c channels, P(k, s) denotes a max pooling layer with kernel size k and stride s, BN denotes batch normalization, PReLU denotes the activation function, and FC(n) denotes a fully connected layer with n neurons;
the eyeball motion detection module is used for inputting the face image into a pre-constructed eyeball motion detection network to obtain the eyeball motion of the face image;
the three-dimensional sight line detection module is used for inputting the head posture and the eyeball motion into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image; wherein the inputting of the head posture and the eyeball motion into the pre-constructed three-dimensional sight line vector detection network to obtain the three-dimensional sight line direction vector of the eyeball in the face image comprises: detecting the face images in the face image data set respectively by using the head posture detection network and the eyeball motion detection network to obtain the head posture and the eyeball motion of each face image; and training a pre-established initial three-dimensional sight line vector detection network by using the head postures and the eyeball motions of the face images, so as to obtain the trained three-dimensional sight line vector detection network; the initial three-dimensional sight line vector detection network is a two-layer fully connected network, wherein the first layer has 128 neurons and the last layer has 3 neurons, corresponding to the three-dimensional sight line vector.
7. An apparatus for three-dimensional eye gaze tracking, comprising:
a memory for storing a computer program;
a processor for implementing the steps of a method of three-dimensional eye gaze tracking according to any of claims 1 to 5 when executing said computer program.
8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of a method for three-dimensional eye gaze tracking according to any one of claims 1 to 5.
CN201811375929.7A 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium Active CN109508679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811375929.7A CN109508679B (en) 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811375929.7A CN109508679B (en) 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium

Publications (2)

Publication Number Publication Date
CN109508679A CN109508679A (en) 2019-03-22
CN109508679B true CN109508679B (en) 2023-02-10

Family

ID=65749029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811375929.7A Active CN109508679B (en) 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium

Country Status (1)

Country Link
CN (1) CN109508679B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058694B (en) 2019-04-24 2022-03-25 腾讯科技(深圳)有限公司 Sight tracking model training method, sight tracking method and sight tracking device
CN110191234B (en) * 2019-06-21 2021-03-26 中山大学 Intelligent terminal unlocking method based on fixation point analysis
CN110555426A (en) * 2019-09-11 2019-12-10 北京儒博科技有限公司 Sight line detection method, device, equipment and storage medium
CN110909611B (en) * 2019-10-29 2021-03-05 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN111178278B (en) * 2019-12-30 2022-04-08 上海商汤临港智能科技有限公司 Sight direction determining method and device, electronic equipment and storage medium
CN111847147B (en) * 2020-06-18 2023-04-18 闽江学院 Non-contact eye-movement type elevator floor input method and device
CN112114671A (en) * 2020-09-22 2020-12-22 上海汽车集团股份有限公司 Human-vehicle interaction method and device based on human eye sight and storage medium
CN114529731A (en) * 2020-10-30 2022-05-24 北京眼神智能科技有限公司 Face feature point positioning and attribute analysis method and device, storage medium and equipment
CN112465862B (en) * 2020-11-24 2024-05-24 西北工业大学 Visual target tracking method based on cross-domain depth convolution neural network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391574A (en) * 2014-11-14 2015-03-04 京东方科技集团股份有限公司 Sight processing method, sight processing system, terminal equipment and wearable equipment
CN105740846A (en) * 2016-03-02 2016-07-06 河海大学常州校区 Horizontal visual angle estimation and calibration method based on depth camera
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
JP2017213191A (en) * 2016-05-31 2017-12-07 富士通株式会社 Sight line detection device, sight line detection method and sight line detection program
CN107818310A (en) * 2017-11-03 2018-03-20 电子科技大学 A kind of driver attention's detection method based on sight
CN108171218A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of gaze estimation method for watching network attentively based on appearance of depth
CN108229284A (en) * 2017-05-26 2018-06-29 北京市商汤科技开发有限公司 Eye-controlling focus and training method and device, system, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5044237B2 (en) * 2006-03-27 2012-10-10 富士フイルム株式会社 Image recording apparatus, image recording method, and image recording program
CN103809737A (en) * 2012-11-13 2014-05-21 华为技术有限公司 Method and device for human-computer interaction
WO2017013913A1 (en) * 2015-07-17 2017-01-26 ソニー株式会社 Gaze detection device, eyewear terminal, gaze detection method, and program

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391574A (en) * 2014-11-14 2015-03-04 京东方科技集团股份有限公司 Sight processing method, sight processing system, terminal equipment and wearable equipment
CN105740846A (en) * 2016-03-02 2016-07-06 河海大学常州校区 Horizontal visual angle estimation and calibration method based on depth camera
JP2017213191A (en) * 2016-05-31 2017-12-07 富士通株式会社 Sight line detection device, sight line detection method and sight line detection program
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
CN108229284A (en) * 2017-05-26 2018-06-29 北京市商汤科技开发有限公司 Eye-controlling focus and training method and device, system, electronic equipment and storage medium
CN107818310A (en) * 2017-11-03 2018-03-20 电子科技大学 A kind of driver attention's detection method based on sight
CN108171218A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of gaze estimation method for watching network attentively based on appearance of depth

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Gaze Tracking Techniques Based on 3D Human Eye Models; Zhou Xiaolong, Tang Fanyang, Guan Qiu, Hua Min; Journal of Computer-Aided Design & Computer Graphics; 2017-09-15; vol. 09, no. 29; 1579-1589 *

Also Published As

Publication number Publication date
CN109508679A (en) 2019-03-22

Similar Documents

Publication Publication Date Title
CN109508679B (en) Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium
CN108229284B (en) Sight tracking and training method and device, system, electronic equipment and storage medium
US10157477B2 (en) Robust head pose estimation with a depth camera
WO2020125499A1 (en) Operation prompting method and glasses
Gorodnichy et al. Nouse 'use your nose as a mouse': perceptual vision technology for hands-free games and interfaces
Qiao et al. Viewport-dependent saliency prediction in 360 video
US11703949B2 (en) Directional assistance for centering a face in a camera field of view
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
US10755438B2 (en) Robust head pose estimation with a depth camera
CN110363133B (en) Method, device, equipment and storage medium for sight line detection and video processing
WO2020091891A1 (en) Cross-domain image translation
US11574424B2 (en) Augmented reality map curation
Schauerte et al. Saliency-based identification and recognition of pointed-at objects
CN111710036A (en) Method, device and equipment for constructing three-dimensional face model and storage medium
US20170316610A1 (en) Assembly instruction system and assembly instruction method
JP2023545190A (en) Image line-of-sight correction method, device, electronic device, and computer program
CN111046734A (en) Multi-modal fusion sight line estimation method based on expansion convolution
Sidenko et al. Eye-tracking technology for the analysis of dynamic data
Perra et al. Adaptive eye-camera calibration for head-worn devices
JP2022095332A (en) Learning model generation method, computer program and information processing device
Funes Mora et al. Eyediap database: Data description and gaze tracking evaluation benchmarks
CN112099330B (en) Holographic human body reconstruction method based on external camera and wearable display control equipment
WO2022178210A1 (en) Clustered dynamic graph convolutional neural network (cnn) for biometric three-dimensional (3d) hand recognition
Li et al. Estimating gaze points from facial landmarks by a remote spherical camera
Kumano et al. Automatic gaze analysis in multiparty conversations based on collective first-person vision

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant