CN109508679A - Method, apparatus, device and storage medium for three-dimensional gaze tracking of the eyeball

Info

Publication number: CN109508679A
Application number: CN201811375929.7A
Authority: CN
Inventors: 张国生; 李东; 冯广; 章云
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2018-11-19
Filing date: 2018-11-19
Publication date: 2019-03-22
Anticipated expiration: 2038-11-19
Also published as: CN109508679B

Abstract

The invention discloses a method, apparatus, device and computer-readable storage medium for three-dimensional gaze tracking of the eyeball. The method comprises: inputting a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image; inputting the face image into a pre-built eyeball movement detection network to obtain the eyeball movement in the face image; and inputting the head pose and the eyeball movement into a pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image. The method, apparatus, device and computer-readable storage medium provided by the invention can extract the three-dimensional gaze direction vector of the photographed person's eyeball from a two-dimensional face image, and therefore have a wide range of application scenarios.

Description

Method, apparatus, device and storage medium for three-dimensional gaze tracking of the eyeball

Technical field

The present invention relates to the technical field of gaze tracking, and more particularly to a method, apparatus, device and computer-readable storage medium for three-dimensional gaze tracking of the eyeball.

Background art

Research on gaze tracking algorithms has produced fairly mature results, which have been deployed successfully in many commercial applications such as VR/AR. Although traditional gaze tracking can achieve high accuracy, current algorithms are largely based on traditional image processing, depend on expensive infrared equipment, and require a special detection device mounted on the head to detect eyeball features. The detection accuracy of traditional image processing methods is affected by lighting changes, and the detection distance is severely constrained. An algorithm that can perform gaze tracking from RGB images captured by an ordinary camera is therefore urgently needed. In computer vision, deep convolutional neural networks have achieved great success in many areas, such as object detection and instance segmentation.

The prior art also includes gaze tracking techniques based on deep learning, with the following steps: acquiring retinopathy image data; annotating the retinopathy image data to obtain labeled data; building an initial deep learning network; inputting the retinopathy image data into the initial deep learning network to output the corresponding prediction data; comparing the labeled data and the prediction data for the retinopathy image data using a loss function to obtain a comparison result; adjusting the parameters of the initial deep learning network according to the comparison result until the comparison result reaches a preset threshold, yielding the final deep learning network model; and processing the retinopathy image data under test with the deep learning network model to obtain the corresponding eyeball center coordinates and eyeball diameter.

Existing gaze tracking techniques therefore fall into two groups. The first is based on traditional image processing algorithms. Although such algorithms have mature commercial applications, their detection accuracy is affected by lighting changes, they depend on expensive head-mounted infrared equipment that is inconvenient to wear, and their detection distance is constrained. The second is based on deep learning. However, the existing deep learning based gaze tracking algorithms can only detect the eyeball center and eyeball diameter, which contain only two-dimensional information about eyeball movement, so their application scenarios are limited.

In summary, how to obtain the three-dimensional gaze direction vector of the eyeball from a two-dimensional face image is a problem that currently needs to be solved.

Summary of the invention

The object of the present invention is to provide a method, apparatus, device and computer-readable storage medium for three-dimensional gaze tracking of the eyeball, so as to solve the problem that deep learning based gaze tracking algorithms in the prior art can only detect two-dimensional information about the eyeball.

To solve the above technical problem, the present invention provides a method for three-dimensional gaze tracking of the eyeball, comprising: inputting a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image; inputting the face image into a pre-built eyeball movement detection network to obtain the eyeball movement in the face image; and inputting the head pose and the eyeball movement into a pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.

Preferably, before the face image to be detected is input into the pre-built head pose detection network to obtain the head pose in the face image, the method comprises:

Acquiring a plurality of face images carrying three-dimensional labels of head pose and eyeball gaze, and constructing a face image dataset, wherein the face images are RGB images;

Constructing an initial head pose detection network and an initial eyeball movement detection network;

Training the initial head pose detection network and the initial eyeball movement detection network separately using the face image dataset, to obtain the trained head pose detection network and the trained eyeball movement detection network.

Preferably, acquiring a plurality of face images carrying three-dimensional labels of head pose and eyeball gaze and constructing the face image dataset comprises:

Capturing face images of a data provider with each camera in a planar camera array, to obtain a first subset of face images;

The face images captured by each row of cameras in the planar camera array represent different head poses of the data provider in the y direction;

The face images captured by each column of cameras in the planar camera array represent different head poses of the data provider in the p direction;

Rotating the face images captured by the planar camera array clockwise and counterclockwise respectively, to obtain a second subset of face images representing different head poses of the data provider in the r direction;

Merging the first subset of face images and the second subset of face images to obtain the face image dataset.

Preferably, capturing face images of the data provider with each camera in the planar camera array comprises:

When each face image is captured, recording the dynamic point on the display screen at which the data provider's eyeball is gazing, thereby determining the three-dimensional vector label of the data provider's gaze, and simultaneously recording the head pose in each face image.

Preferably, constructing the initial head pose detection network comprises:

Constructing the initial head pose detection network with the AlexNet model as its basic structure, the network structure of the initial head pose detection network being:

C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);

where C(k, s, c) denotes a convolutional layer with kernel size k, stride s and c channels; P(k, s) denotes a max pooling layer with kernel size k and stride s; BN denotes batch normalization; PReLU denotes the activation function; and FC(n) denotes a fully connected layer with n neurons.

Preferably, training the initial head pose detection network and the initial eyeball movement detection network separately using the face image dataset comprises:

Training the initial head pose detection network and the initial eyeball movement detection network using the face image dataset;

wherein the loss function Loss₁ = Loss_h + Loss_e is the sum of the loss function Loss_h of the initial head pose detection network and the loss function Loss_e of the initial eyeball movement detection network.

Preferably, before the head pose and the eyeball movement are input into the pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image, the method comprises:

Detecting the face images in the face image dataset with the head pose detection network and the eyeball movement detection network respectively, to obtain the head pose and eyeball movement of each face image;

Training a pre-established initial three-dimensional gaze vector detection network using the head pose and eyeball movement of each face image, to obtain the trained three-dimensional gaze vector detection network;

wherein the current loss function Loss₂ = Loss₁ + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss₁ and the loss function Loss_g of the initial three-dimensional gaze vector detection network.

The present invention also provides an apparatus for three-dimensional gaze tracking of the eyeball, comprising:

A head pose detection module, configured to input a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image;

An eyeball movement detection module, configured to input the face image into a pre-built eyeball movement detection network to obtain the eyeball movement in the face image;

A three-dimensional gaze detection module, configured to input the head pose and the eyeball movement into a pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.

The present invention also provides a device for three-dimensional gaze tracking of the eyeball, comprising:

A memory for storing a computer program, and a processor which, when executing the computer program, implements the steps of the above method for three-dimensional gaze tracking of the eyeball.

The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method for three-dimensional gaze tracking of the eyeball.

In the method for three-dimensional gaze tracking of the eyeball provided by the present invention, the face image to be detected is input into the pre-built head pose detection network to obtain the head pose in the face image. The face image is input into the pre-built eyeball movement detection network to obtain the eyeball movement in the face image. The head pose and the eyeball movement are input into the pre-built three-dimensional gaze vector detection network, so that the three-dimensional gaze direction vector of the eyeball in the face image is obtained through a gaze conversion network subject to geometric constraints. The gaze tracking method provided by the present invention is based on deep learning networks: it extracts the head pose and eyeball movement of the photographed person from a two-dimensional face image and inputs them into the pre-trained three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the photographed person's eyeball. The method has a wide range of applications: the three-dimensional gaze direction obtained from a face image can be used in safe-driving monitoring, human-computer interaction, psychological research and other fields. It solves the problem in the prior art that gaze tracking implemented with deep neural networks can only detect the eyeball center position and eyeball diameter and therefore lacks broad application scenarios. Correspondingly, the apparatus, device and computer-readable storage medium provided by the present invention all have the above beneficial effects.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a first specific embodiment of the method for three-dimensional gaze tracking of the eyeball provided by the present invention;

Fig. 2 is a flowchart of a second specific embodiment of the method for three-dimensional gaze tracking of the eyeball provided by the present invention;

Fig. 3 is a structural block diagram of an apparatus for three-dimensional gaze tracking of the eyeball provided by an embodiment of the present invention.

Specific embodiments

The core of the present invention is to provide a method, apparatus, device and computer-readable storage medium for three-dimensional gaze tracking of the eyeball, which can obtain the three-dimensional gaze vector of the eyeball from a two-dimensional face image and has a wide range of application scenarios.

To enable those skilled in the art to better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, Fig. 1 is a flowchart of a first specific embodiment of the method for three-dimensional gaze tracking of the eyeball provided by the present invention; the specific steps are as follows:

Step S101: inputting a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image;

Before the face image to be detected is input into the pre-built head pose detection network to obtain the head pose in the face image, a plurality of face images carrying three-dimensional labels of head pose and eyeball gaze are first acquired to construct a face image dataset; an initial head pose detection network and an initial eyeball movement detection network are constructed; and the two initial networks are trained separately using the face image dataset to obtain the trained head pose detection network and the trained eyeball movement detection network.

So that the initial head pose detection network and the initial eyeball movement detection network generalize well, the face image dataset acquired in this embodiment needs to have the following characteristics: a. It has a broad distribution, covering as many head poses and eyeball movements as possible; the images should also include different light intensities, and even interference from reflections on glasses. b. The face image dataset carries three-dimensional labels of head pose and eyeball gaze. c. The face images in the dataset are preferably ordinary RGB images that do not depend on a specific camera device.

To give the face image dataset a broad distribution, this embodiment uses a 3 × 4 camera array, in which the different camera viewpoints represent different head poses. However, the planar camera array can only represent differences in the two head pose directions (y, p). To obtain differences of the head pose in the r direction, the acquired face images are therefore rotated clockwise and counterclockwise respectively to represent sideways tilting of the head. For each head pose, the position of the camera in the array and the rotation angle of the image give the corresponding head pose label (y^GT, p^GT, r^GT).
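The labelling scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the numeric angle values, the choice of a roll of zero for unrotated images, the sign convention for clockwise rotation, and all names (`YAW_BY_COLUMN`, `PITCH_BY_ROW`, `ROLL_ROTATIONS`, `build_dataset`) are hypothetical, since the text gives none of them.

```python
from itertools import product

# Hypothetical angles in degrees; the source text gives no numeric values.
YAW_BY_COLUMN = [-30.0, -10.0, 10.0, 30.0]  # 4 columns: yaw varies along a row
PITCH_BY_ROW = [-15.0, 0.0, 15.0]           # 3 rows: pitch varies along a column
ROLL_ROTATIONS = [-20.0, 20.0]              # assumed in-plane image rotations


def build_first_subset():
    """One sample per camera; the label (yaw, pitch, roll) comes from the camera position."""
    return [
        {"row": r, "col": c, "label": (YAW_BY_COLUMN[c], PITCH_BY_ROW[r], 0.0)}
        for r, c in product(range(len(PITCH_BY_ROW)), range(len(YAW_BY_COLUMN)))
    ]


def build_second_subset(first_subset, rotations):
    """Rotated copies: only the roll component of the label changes."""
    return [
        {**sample, "label": (sample["label"][0], sample["label"][1], angle)}
        for sample in first_subset
        for angle in rotations
    ]


def build_dataset():
    """Merge the camera-array subset with its rotated copies."""
    first = build_first_subset()
    return first + build_second_subset(first, ROLL_ROTATIONS)


dataset = build_dataset()
print(len(dataset))  # 12 camera samples plus 24 rotated copies
```

Only the labels are modelled here; in practice each sample would also carry the captured (and, for the second subset, rotated) RGB image.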

To obtain richer eyeball movements, while the face image dataset is being acquired the data provider is asked to track and gaze at a dynamic point on the display screen. The dynamic point contains random letters that the data provider must identify, which ensures that the data provider's eyes are actually gazing at the dynamic point and thus guarantees the accuracy of the data labels. In this way different eyeball movements are obtained, and each gaze position corresponds to a recorded eyeball gaze vector label (φ^GT, θ^GT). The head pose in each face image and the corresponding three-dimensional vector label of the eyeball gaze are recorded while the face image dataset is acquired.

In this embodiment, acquiring the face image dataset requires only RGB face images and no other special equipment. Compared with the prior art, which relies on expensive head-mounted infrared equipment, this not only reduces the application cost but is also more convenient, since the head is free and unconstrained.

Before the initial head pose detection network, the initial eyeball movement detection network and the initial three-dimensional gaze vector detection network are constructed, the geometric analysis and coordinate systems used in this embodiment are described first. This embodiment uses two coordinate systems, the head coordinate system (X_h, Y_h, Z_h) and the camera coordinate system (X_c, Y_c, Z_c), and g denotes the gaze vector. To simplify the representation of head pose, the embodiment of the present invention uses a three-dimensional rotation angle representation (y, p, r), where y denotes the yaw angle (rotation about the Y_h axis), p denotes the pitch angle (rotation about the X_h axis), and r denotes the roll angle (rotation about the Z_h axis). The eyeball movement is represented in a two-dimensional spherical coordinate system (θ, φ), where θ and φ denote the angles of the gaze vector in the horizontal and vertical directions of the head coordinate system, respectively.

In the head coordinate system, the gaze vector is described in terms of the eyeball movement as follows:

g_h = [-cos(φ)·sin(θ), sin(φ), -cos(φ)·cos(θ)]^T
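The formula for g_h can be checked numerically with a few lines of Python. The function name `gaze_vector_head` is introduced here for illustration, and the angles are assumed to be in radians, which the text does not state.

```python
import math


def gaze_vector_head(theta, phi):
    """Gaze vector g_h in head coordinates.

    theta: horizontal angle, phi: vertical angle (radians, assumed).
    """
    return (
        -math.cos(phi) * math.sin(theta),
        math.sin(phi),
        -math.cos(phi) * math.cos(theta),
    )


# Looking straight ahead (theta = phi = 0) points along the negative Z_h axis.
g = gaze_vector_head(0.0, 0.0)
norm = math.sqrt(sum(v * v for v in gaze_vector_head(0.3, -0.2)))
```

Because cos²(φ)·(sin²(θ) + cos²(θ)) + sin²(φ) = 1, the vector always has unit length, so the two angles fully determine a gaze direction.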

The camera coordinate system (X_c, Y_c, Z_c) is defined with the camera center as the origin, the camera depth direction as the Z_c axis, and the two directions in the plane perpendicular to the depth direction as the X_c and Y_c axes. Because the three-dimensional gaze vector finally output by the network is expressed in the camera coordinate system, the embodiment of the present invention defines g_c as the three-dimensional gaze vector in the camera coordinate system. From geometry, g_c depends on g_h, and g_h is defined in the head coordinate system, so the overall mapping relationship of the embodiment of the present invention can be obtained (the mapping formula is not reproduced in this text).
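One common way to relate a direction in head coordinates to camera coordinates is a pure rotation built from the head pose angles, g_c = R(y, p, r)·g_h. The Python sketch below illustrates that idea only; it is not the patent's mapping. The composition order R = Rz(r)·Rx(p)·Ry(y), the sign conventions, radians as the unit, and the names `rotation_from_head_pose` and `head_to_camera` are all assumptions made here.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_head_pose(yaw, pitch, roll):
    """Assumed rotation R = Rz(roll) * Rx(pitch) * Ry(yaw); the order is hypothetical."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]   # about the Y axis
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]   # about the X axis
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]   # about the Z axis
    return matmul3(rz, matmul3(rx, ry))


def head_to_camera(g_h, yaw, pitch, roll):
    """Rotate a gaze direction from head coordinates into camera coordinates."""
    r = rotation_from_head_pose(yaw, pitch, roll)
    return [sum(r[i][k] * g_h[k] for k in range(3)) for i in range(3)]


# With a zero head pose the two coordinate systems coincide in this sketch.
g_c = head_to_camera([0.0, 0.0, -1.0], 0.0, 0.0, 0.0)
```

Only a rotation appears because a direction vector is unaffected by the translation between the two origins. In the embodiment the final mapping is learned by a fully connected network rather than computed in closed form like this.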

Step S102: inputting the face image into the pre-built eyeball movement detection network to obtain the eyeball movement in the face image;

Step S103: inputting the head pose and the eyeball movement into the pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.

That is, the head pose and the eyeball movement are fed together into the pre-built three-dimensional gaze vector detection network, whose output is the three-dimensional gaze vector of the eyeball in the face image.
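Steps S101 to S103 amount to composing three trained models. A minimal Python sketch of that data flow follows, assuming each network is available as a plain callable; the function `track_gaze`, its parameter names and the stand-in lambdas are hypothetical and only show the order of operations.

```python
def track_gaze(face_image, head_pose_net, eye_movement_net, gaze_vector_net):
    """Run the three-stage inference pipeline on one face image."""
    head_pose = head_pose_net(face_image)            # S101: (y, p, r)
    eye_movement = eye_movement_net(face_image)      # S102: (theta, phi)
    return gaze_vector_net(head_pose, eye_movement)  # S103: 3D gaze vector


# Stand-in callables used only to exercise the data flow; real networks
# would be trained models operating on image tensors.
example = track_gaze(
    face_image=None,
    head_pose_net=lambda img: (0.1, 0.2, 0.0),
    eye_movement_net=lambda img: (0.05, -0.1),
    gaze_vector_net=lambda h, e: (h[0] + e[0], h[1] + e[1], -1.0),
)
```

Passing the networks in as arguments keeps the pipeline independent of any particular deep learning framework.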

So that existing datasets can be reused, the network in this embodiment uses an end-to-end structure. The initial head pose detection network and the eyeball movement detection network are built first, and the detection results of the two networks are then input into a fully connected network to obtain the final three-dimensional gaze vector. The network is divided into two branches: the upper branch detects the head pose and the lower branch detects the eyeball movement, after which a gaze conversion layer subject to geometric constraints yields the three-dimensional gaze vector in the camera coordinate system.

Building on the above embodiment, this embodiment also uses an end-to-end structure so that the collected face image dataset can be reused: the head pose detection network and the eyeball movement detection network are built separately, their detection results are input into a fully connected network to obtain the final three-dimensional gaze vector, and a gaze conversion layer subject to geometric constraints yields the three-dimensional gaze vector in the camera coordinate system. Referring to Fig. 2, Fig. 2 is a flowchart of a second specific embodiment of the method for three-dimensional gaze tracking of the eyeball provided by the present invention; the specific steps are as follows:

Step S201: acquiring a plurality of face images of the data provider with the planar camera array, and recording the head pose and the three-dimensional vector label of the eyeball movement in each face image, to obtain a first subset of face images;

Step S202: rotating the face images in the first subset of face images clockwise and counterclockwise respectively, to obtain a second subset of face images;

Step S203: merging the first subset of face images and the second subset of face images to obtain the face image dataset;

Step S204: training the pre-built initial head pose detection network and initial eyeball movement detection network separately using the face image dataset, to obtain a target head pose detection network and a target eyeball movement detection network;

The basic network structure of the initial head pose detection network follows that of AlexNet, with corresponding simplifications and modifications. The number of layers is unchanged, but the number of channels in each layer is reduced appropriately; local response normalization is replaced with batch normalization, and PReLU is used as the activation function. The network structure of the initial head pose detection network is as follows: C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3)

where C(k, s, c) denotes a convolutional layer with kernel size k, stride s and c channels; P(k, s) denotes a max pooling layer with kernel size k and stride s; BN denotes batch normalization; PReLU denotes the activation function; and FC(n) denotes a fully connected layer with n neurons.

The input of the eyeball movement detection network is the eye region cropped from the original face image, split into a left-eye part and a right-eye part. Because the two sub-networks are fully symmetric, only one of them is described in detail. The eyeball image patch is resized to a uniform 36×36 and then passed through a convolutional neural network and a fully connected network. The structure of the initial eyeball movement detection network is as follows: C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2).
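The layer notation used for both networks is regular enough to be parsed mechanically, which is a convenient first step before building the layers in any framework. The Python sketch below is an illustration only: the names `ARITY`, `TOKEN`, `parse_structure` and `EYE_NET` are introduced here, and the parser checks just the parameter count that the notation defines for each layer type.

```python
import re

# Number of integer parameters each layer type takes in the notation.
ARITY = {"C": 3, "P": 2, "FC": 1, "BN": 0, "PReLU": 0}
TOKEN = re.compile(r"([A-Za-z]+)(?:\((\d+(?:,\d+)*)\))?")


def parse_structure(spec):
    """Turn 'C(3,1,6)-BN-...' into a list of (layer_type, params) tuples."""
    layers = []
    for token in spec.replace(" ", "").split("-"):
        match = TOKEN.fullmatch(token)
        if match is None or match.group(1) not in ARITY:
            raise ValueError(f"unrecognized layer token: {token!r}")
        name, args = match.groups()
        params = tuple(int(a) for a in args.split(",")) if args else ()
        if len(params) != ARITY[name]:
            raise ValueError(f"{name} expects {ARITY[name]} parameters, got {len(params)}")
        layers.append((name, params))
    return layers


# Structure of the eyeball movement detection network as given in the text.
EYE_NET = (
    "C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-"
    "C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2)"
)
eye_layers = parse_structure(EYE_NET)
conv_channels = [params[2] for name, params in eye_layers if name == "C"]
```

The text does not state the convolution padding, so this sketch deliberately stops at the parsed layer list and does not attempt to compute feature map sizes for the 36×36 input.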

Step S205: detecting each face image in the face image dataset with the target head pose detection network and the target eyeball movement detection network, to obtain the head pose and eyeball movement of each face image;

Step S206: inputting the head pose and eyeball movement of each face image in the face image dataset into the pre-built initial three-dimensional gaze vector detection network for training, to obtain the target three-dimensional gaze vector detection network;

The initial three-dimensional gaze vector detection network takes as input the (y, p, r) obtained by the target head pose detection network and the (θ, φ) obtained by the target eyeball movement detection network. It is a two-layer fully connected network: the first layer has 128 neurons and the final layer has 3 neurons, corresponding to the three-dimensional gaze vector.
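The shape of this two-layer network can be made concrete with a small pure-Python forward pass. This is a sketch under assumptions rather than the patent's network: the input is taken to be the 5 values (y, p, r, θ, φ) for a single eye, the hidden activation is assumed to be PReLU with a fixed slope of 0.25, and the weights are random placeholders. The names `make_layer`, `linear`, `prelu` and `gaze_net` are hypothetical.

```python
import random


def make_layer(n_in, n_out, rng):
    """Randomly initialised weights (n_out x n_in) and zero biases; placeholders only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Fully connected layer: y = W x + b."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def prelu(x, slope=0.25):
    """PReLU with a fixed (not learned) negative slope, assumed for this sketch."""
    return [v if v >= 0.0 else slope * v for v in x]


def gaze_net(head_pose, eye_movement, layers):
    """Map (y, p, r) and (theta, phi) to a 3D gaze vector: 5 -> 128 -> 3."""
    x = list(head_pose) + list(eye_movement)
    hidden = prelu(linear(x, layers[0]))
    return linear(hidden, layers[1])


rng = random.Random(0)
layers = [make_layer(5, 128, rng), make_layer(128, 3, rng)]
g_c = gaze_net((0.1, 0.2, 0.0), (0.05, -0.1), layers)
```

If the left-eye and right-eye outputs were both fed to the network, the input width would be 7 instead of 5; the text does not say which is used.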

When the initial head pose detection network and the initial eyeball movement detection network are trained, the loss function Loss₁ = Loss_h + Loss_e is the sum of the loss function Loss_h of the initial head pose detection network and the loss function Loss_e of the initial eyeball movement detection network.

When the pre-established initial three-dimensional gaze vector detection network is trained, the current loss function Loss₂ = Loss₁ + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss₁ and the loss function Loss_g of the initial three-dimensional gaze vector detection network.

Loss_h = ||h - h^GT||₂, h = {y, p, r}

Loss_e = ||e - e^GT||₂, e = {φ, θ}

Loss_g = ||g_c - g_c^GT||₂, g_c = {x, y, z}
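The three loss terms and the two staged totals can be written out directly in Python. This sketch evaluates the losses for a single sample; the function names `l2_distance`, `stage1_loss` and `stage2_loss` are introduced here, and any batching or averaging over samples is left out because the text does not specify it.

```python
import math


def l2_distance(pred, target):
    """Euclidean norm ||pred - target||_2."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))


def stage1_loss(h, h_gt, e, e_gt):
    """Loss_1 = Loss_h + Loss_e, used when training the two detection networks."""
    return l2_distance(h, h_gt) + l2_distance(e, e_gt)


def stage2_loss(h, h_gt, e, e_gt, g_c, g_c_gt):
    """Loss_2 = Loss_1 + Loss_g, used when training the gaze vector network."""
    return stage1_loss(h, h_gt, e, e_gt) + l2_distance(g_c, g_c_gt)


# Head pose off by (3, 4, 0), eye movement exact, gaze vector off by 1 along x.
loss1 = stage1_loss((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.2), (0.1, 0.2))
loss2 = stage2_loss(
    (3.0, 4.0, 0.0), (0.0, 0.0, 0.0),
    (0.1, 0.2), (0.1, 0.2),
    (1.0, 0.0, -1.0), (0.0, 0.0, -1.0),
)
```

Because Loss₂ contains Loss₁, the second training stage keeps penalising head pose and eyeball movement errors while the gaze vector term is added.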

Step S207: inputting the face image to be detected into the target head pose detection network to obtain the head pose in the face image to be detected;

Step S208: inputting the face image to be detected into the target eyeball movement detection network to obtain the eyeball movement in the face image to be detected;

Step S209: inputting the head pose and the eyeball movement of the face image to be detected into the target three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image to be detected.

Eyeball recognition in the prior art annotates only the two-dimensional eyeball center, so only two-dimensional information about the eyeball can be obtained and its use is limited. The method provided by this embodiment is likewise based on deep neural networks, but it not only processes the eyeball movement information, it also predicts the head pose and, at the same time, the three-dimensional gaze vector of the eyeball; it therefore provides higher-level information and has greater application value. Network training in this embodiment is end-to-end but carried out in stages. In the first training stage, existing head pose datasets and eyeball movement datasets can be fully exploited, which greatly enlarges the training data and gives the deep network in this embodiment better generalization ability.

Referring to Fig. 3, Fig. 3 is a structural block diagram of an apparatus for three-dimensional gaze tracking of the eyeball provided by an embodiment of the present invention; the apparatus may specifically include:

A head pose detection module 100, configured to input a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image;

An eyeball movement detection module 200, configured to input the face image into a pre-built eyeball movement detection network to obtain the eyeball movement in the face image;

A three-dimensional gaze detection module 300, configured to input the head pose and the eyeball movement into a pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.

The apparatus of this embodiment is used to implement the foregoing method for three-dimensional gaze tracking of the eyeball, so its specific implementation can be found in the method embodiments above. For example, the head pose detection module 100, the eyeball movement detection module 200 and the three-dimensional gaze detection module 300 implement steps S101, S102 and S103 of the method, respectively. Their specific implementations can therefore be found in the descriptions of the corresponding embodiments and are not repeated here.

A specific embodiment of the present invention also provides a device for three-dimensional gaze tracking of the eyeball, comprising: a memory for storing a computer program, and a processor which, when executing the computer program, implements the steps of the above method for three-dimensional gaze tracking of the eyeball.

A specific embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method for three-dimensional gaze tracking of the eyeball.

Each embodiment in this specification is described in a progressive manner, the highlights of each of the examples are with it is other The difference of embodiment, same or similar part may refer to each other between each embodiment.For being filled disclosed in embodiment For setting, since it is corresponded to the methods disclosed in the examples, so being described relatively simple, related place is referring to method part Explanation.

Professional further appreciates that, unit described in conjunction with the examples disclosed in the embodiments of the present disclosure And algorithm steps, can be realized with electronic hardware, computer software, or a combination of the two, in order to clearly demonstrate hardware and The interchangeability of software generally describes each exemplary composition and step according to function in the above description.These Function is implemented in hardware or software actually, the specific application and design constraint depending on technical solution.Profession Technical staff can use different methods to achieve the described function each specific application, but this realization is not answered Think beyond the scope of this invention.

The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The method, apparatus, equipment and computer readable storage medium for realizing eyeball three-dimensional eye tracking provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the present invention, and the above description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.

Claims

1. A method for realizing eyeball three-dimensional eye tracking, characterized by comprising:

inputting a facial image to be detected into a pre-constructed head pose detection network to obtain the head pose in the facial image;

inputting the facial image into a pre-constructed eyeball motion detection network to obtain the eyeball movement in the facial image;

inputting the head pose and the eyeball movement into a pre-constructed three-dimensional sight line vector detection network to obtain the three-dimensional line-of-sight direction vector of the eyeball in the facial image.
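The data flow of claim 1 can be illustrated with a minimal sketch. The function name `track_gaze`, the callable parameters and the representation of the eyeball movement as a sequence of floats are assumptions made for illustration only; the claim does not prescribe them, and any trained networks could be passed in as the three callables.

```python
from typing import Callable, Sequence, Tuple

Vector3 = Tuple[float, float, float]


def track_gaze(
    face_image: object,
    head_pose_net: Callable[[object], Vector3],
    eye_motion_net: Callable[[object], Sequence[float]],
    gaze_vector_net: Callable[[Vector3, Sequence[float]], Vector3],
) -> Vector3:
    """Run three pre-constructed networks in the order given in claim 1.

    All names here are hypothetical; the networks are supplied by the caller.
    """
    # Step 1: head pose from the facial image.
    head_pose = head_pose_net(face_image)
    # Step 2: eyeball movement from the same facial image.
    eye_motion = eye_motion_net(face_image)
    # Step 3: both intermediate results go into the sight line vector network.
    return gaze_vector_net(head_pose, eye_motion)
```

Note that the third network in this sketch receives only the two intermediate results and not the image itself, which mirrors the wording of the claim.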

2. The method according to claim 1, characterized in that, before the inputting of the facial image to be detected into the pre-constructed head pose detection network to obtain the head pose in the facial image, the method comprises:

acquiring a plurality of facial images carrying three-dimensional labels of head pose and eyeball line of sight, and constructing a facial image data set, wherein the facial images are RGB images;

training an initial head pose detection network and an initial eyeball motion detection network respectively with the facial image data set, to obtain the trained head pose detection network and the trained eyeball motion detection network.

3. The method according to claim 2, characterized in that the acquiring of a plurality of facial images carrying three-dimensional labels of head pose and eyeball line of sight and the constructing of a facial image data set comprise:

acquiring facial images of a data provider with each camera of an area-array camera array, to obtain a first subset of facial images;

wherein the facial images collected by each row of cameras in the area-array camera array represent different head poses of the data provider in the y direction;

the facial images collected by each column of cameras in the area-array camera array represent different head poses of the data provider in the p direction;

rotating the facial images collected by the area-array camera array clockwise and counterclockwise, respectively, to obtain a second subset of facial images representing different head poses of the data provider in the r direction;

merging the first subset of facial images and the second subset of facial images to obtain the facial image data set.
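The bookkeeping behind claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the `Sample` record, the linear mapping from camera position to angle, the angle steps and the sign convention for the r direction (counterclockwise positive) are all hypothetical, and the actual pixel rotation is represented only by a suffix on the image identifier.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    image_id: str  # hypothetical identifier of a stored RGB facial image
    yaw: float     # head pose in the y direction, degrees
    pitch: float   # head pose in the p direction, degrees
    roll: float    # head pose in the r direction, degrees


def capture_first_subset(rows: int, cols: int,
                         yaw_step: float, pitch_step: float) -> List[Sample]:
    """One image per camera; angles are assumed to vary linearly across the grid."""
    samples = []
    for row in range(rows):
        for col in range(cols):
            # Cameras along a row differ in the y direction,
            # cameras along a column differ in the p direction.
            yaw = (col - (cols - 1) / 2) * yaw_step
            pitch = (row - (rows - 1) / 2) * pitch_step
            samples.append(Sample(f"cam_r{row}_c{col}", yaw, pitch, 0.0))
    return samples


def make_second_subset(first: Sequence[Sample],
                       angles: Sequence[float]) -> List[Sample]:
    """Rotate every captured image both ways to vary the r direction."""
    rotated = []
    for sample in first:
        for angle in angles:
            for signed in (angle, -angle):  # counterclockwise, then clockwise
                rotated.append(replace(
                    sample,
                    image_id=f"{sample.image_id}_rot{signed:+g}",
                    roll=signed,
                ))
    return rotated


def build_dataset(first: Sequence[Sample], second: Sequence[Sample]) -> List[Sample]:
    """Merge the two subsets into the facial image data set."""
    return list(first) + list(second)
```

For a hypothetical 3 x 3 array and a single rotation angle, the first subset holds 9 samples, the second subset 18, and the merged data set 27.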

4. The method according to claim 3, characterized in that the acquiring of facial images of the data provider with each camera of the area-array camera array comprises:

when each facial image is acquired, recording the dynamic point on the display screen at which the eyeball of the data provider is looking, thereby determining the three-dimensional vector label of the data provider's line of sight, and simultaneously recording the head pose in each facial image.
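One plausible way to turn the recorded screen point into a three-dimensional vector label is to normalise the direction from the eye to that point. The sketch below assumes that the eye position and the screen point are both known in one common coordinate frame; the claim does not state how these positions are obtained, and the function name `gaze_label` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def gaze_label(eye_pos: Sequence[float],
               screen_point: Sequence[float]) -> Tuple[float, float, float]:
    """Unit vector pointing from the eye to the gazed point on the screen.

    Assumption: both positions are 3D coordinates in the same frame and units.
    """
    dx, dy, dz = (p - e for p, e in zip(screen_point, eye_pos))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("eye position and screen point coincide")
    return (dx / norm, dy / norm, dz / norm)
```

Storing a unit vector rather than the raw difference keeps the label independent of the distance between the data provider and the screen.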

5. The method according to claim 2, characterized in that constructing the initial head pose detection network comprises:

constructing the initial head pose detection network with the AlexNet model as the basic structure, the network structure of the initial head pose detection network being:

C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);

wherein C(k, s, c) denotes a convolutional layer with kernel size k, stride s and c channels; P(k, s) denotes a max pooling layer with kernel size k and stride s; BN denotes batch normalization; PReLU denotes the activation function; and FC(n) denotes a fully connected layer with n neurons.
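The layer notation defined above can be read mechanically, as the following sketch suggests. It parses a chain written in that notation and traces the feature-map shape through it. The structure string in the code assumes five convolutional layers, i.e. that the fourth PReLU is followed by a convolution C(3,1,16). The input size of 64 x 64 and the convolution padding of 1 are hypothetical values chosen for illustration; the claim specifies neither, and the helper names are not part of the original.

```python
import re
from typing import List, Tuple

Layer = Tuple[str, Tuple[int, ...]]
ARITY = {"C": 3, "P": 2, "FC": 1, "BN": 0, "PReLU": 0}  # arguments per layer type

SPEC = ("C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-"
        "C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3)")


def parse_structure(spec: str) -> List[Layer]:
    """Split a chain such as 'C(3,1,6)-BN-PReLU' into (name, args) pairs."""
    layers = []
    for token in spec.replace(" ", "").split("-"):
        match = re.fullmatch(r"([A-Za-z]+)(?:\(([\d,]+)\))?", token)
        if match is None or match.group(1) not in ARITY:
            raise ValueError(f"unrecognised layer token: {token!r}")
        name, raw = match.groups()
        args = tuple(int(a) for a in raw.split(",")) if raw else ()
        if len(args) != ARITY[name]:
            raise ValueError(f"{name} expects {ARITY[name]} arguments: {token!r}")
        layers.append((name, args))
    return layers


def trace_shapes(layers: List[Layer], height: int, width: int,
                 channels: int = 3, padding: int = 1) -> List[Tuple[int, ...]]:
    """Output shape after every layer: (c, h, w), or (n,) after an FC layer.

    Assumption: `padding` applies to convolutions only; pooling is unpadded.
    """
    shape: Tuple[int, ...] = (channels, height, width)
    shapes = []
    for name, args in layers:
        if name == "C":
            k, s, c = args
            shape = (c,
                     (shape[1] + 2 * padding - k) // s + 1,
                     (shape[2] + 2 * padding - k) // s + 1)
        elif name == "P":
            k, s = args
            shape = (shape[0], (shape[1] - k) // s + 1, (shape[2] - k) // s + 1)
        elif name == "FC":
            shape = (args[0],)
        shapes.append(shape)  # BN and PReLU leave the shape unchanged
    return shapes
```

Under these assumed values, `trace_shapes(parse_structure(SPEC), 64, 64)` gives a 16 x 8 x 8 feature map after the last pooling layer, so 1024 values feed FC(256), and the final FC(3) layer outputs three values.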

6. The method according to claim 2, characterized in that the training of the initial head pose detection network and the initial eyeball motion detection network respectively with the facial image data set comprises:

training the initial head pose detection network and the initial eyeball motion detection network with the facial image data set;

wherein the loss function Loss₁ = Loss_h + Loss_e is the sum of the loss function Loss_h of the initial head pose detection network and the loss function Loss_e of the initial eyeball motion detection network.

7. The method according to claim 6, characterized in that, before the inputting of the head pose and the eyeball movement into the pre-constructed three-dimensional sight line vector detection network to obtain the three-dimensional line-of-sight direction vector of the eyeball in the facial image, the method comprises:

detecting the facial images in the facial image data set with the head pose detection network and the eyeball motion detection network, respectively, to obtain the head pose and the eyeball movement of each facial image;

training a pre-established initial three-dimensional sight line vector detection network with the head pose and the eyeball movement of each facial image, to obtain the trained three-dimensional sight line vector detection network;

wherein the current loss function Loss₂ = Loss₁ + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss₁ and the loss function Loss_g of the initial three-dimensional sight line vector detection network.
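The composition of the two loss functions can be sketched as below. The claims state only that the losses are summed; they do not give the form of Loss_h, Loss_e or Loss_g. Mean squared error is used here purely as a stand-in, and the function names `mse`, `loss_1` and `loss_2` are hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, a stand-in for the unspecified individual losses."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def loss_1(head_pred, head_true, eye_pred, eye_true) -> float:
    """Loss1 = Loss_h + Loss_e (head pose term plus eyeball movement term)."""
    return mse(head_pred, head_true) + mse(eye_pred, eye_true)


def loss_2(head_pred, head_true, eye_pred, eye_true,
           gaze_pred, gaze_true) -> float:
    """Loss2 = Loss1 + Loss_g = Loss_h + Loss_e + Loss_g."""
    return (loss_1(head_pred, head_true, eye_pred, eye_true)
            + mse(gaze_pred, gaze_true))
```

Because Loss₂ contains Loss₁ as a term, minimising it in this sketch would penalise errors in all three networks at once rather than in the sight line vector network alone.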

8. A device for realizing eyeball three-dimensional eye tracking, characterized by comprising:

a head pose detection module, configured to input a facial image to be detected into a pre-constructed head pose detection network to obtain the head pose in the facial image;

an eyeball motion detection module, configured to input the facial image into a pre-constructed eyeball motion detection network to obtain the eyeball movement in the facial image;

a three-dimensional line-of-sight detection module, configured to input the head pose and the eyeball movement into a pre-constructed three-dimensional sight line vector detection network to obtain the three-dimensional line-of-sight direction vector of the eyeball in the facial image.

9. Equipment for realizing eyeball three-dimensional eye tracking, characterized by comprising:

a memory, configured to store a computer program;

a processor, configured to implement, when executing the computer program, the steps of the method for realizing eyeball three-dimensional eye tracking according to any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that a computer program is stored on the computer readable storage medium, and the computer program, when executed by a processor, implements the steps of the method for realizing eyeball three-dimensional eye tracking according to any one of claims 1 to 7.