CN114816060A - User fixation point estimation and precision evaluation method based on visual tracking - Google Patents

User fixation point estimation and precision evaluation method based on visual tracking

Info

Publication number
CN114816060A
Authority
CN
China
Prior art keywords
user
eye
eye movement
fixation point
offset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210432536.5A
Other languages
Chinese (zh)
Inventor
闫野
谢良
胡薇
印二威
张敬
张亚坤
罗治国
艾勇保
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202210432536.5A priority Critical patent/CN114816060A/en
Publication of CN114816060A publication Critical patent/CN114816060A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 - Eye tracking input arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 - Eye characteristics, e.g. of the iris

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention discloses a user fixation point estimation method based on visual tracking. A user wears a head-mounted eye movement interaction device; a fixation point extraction module obtains the user's fixation point coordinates; a residual estimation module calculates the residual between the user's eye movement offset and the fixation point coordinates; and the residual is sent to an offset adaptation module, which updates the fixation point coordinates to produce the final estimate of the user's fixation point. The invention also discloses a method for evaluating the precision of the fixation point estimation method: the user wears the head-mounted eye movement interaction device and gazes in turn at precision test points shown on the display interface, the user's fixation point coordinates are acquired, and the eye movement precision is calculated; the eye movement precision values of all sub-regions are then averaged to obtain the final eye movement precision evaluation value. The invention extracts an eye movement offset that adapts to individual differences, which is of great significance for improving the robustness and generalization capability of eye movement algorithms.

Description

User fixation point estimation and precision evaluation method based on visual tracking
Technical Field
The invention relates to the field of digital image processing, in particular to a user fixation point estimation and precision evaluation method based on visual tracking.
Background
Eyeball tracking, also called gaze tracking technology, is a technology for estimating gaze and fixation point coordinates by extracting eyeball motion-related parameters. With the continuous development of the eyeball tracking technology, the application scenes of the technology in the fields of human-computer interaction, behavior analysis and the like are also continuously enriched.
Because eyeball tracking collects and analyzes the physiological eye movement signals of the user through a head-mounted eye movement device, the technology is strongly affected by each user's individual eye physiology and by personalized habits of wearing the device. On the one hand, eye movement signals are physiological signals, and different individuals have different physiological eye structures, so an eye tracking algorithm must adapt to these individual differences to effectively guarantee eye movement precision. On the other hand, existing head-mounted eye movement devices, based on the optical recording method, illuminate the eyeball with an infrared light source and capture near-eye images with a high-speed eye camera. The image acquisition device is mounted in front of and close to the eye, at an angle to the horizontal plane of the eye, and different subjects wear the same eye tracking device with different habits; this directly affects the eye movement parameters that the device collects and can cause large fluctuations in eye movement precision. In addition, for users who wear glasses, the exit-pupil distance changes and the eyeball may be partially occluded, which also biases the inference results of the eye movement algorithm.
Disclosure of Invention
Aiming at the problem that existing eyeball tracking technology is strongly affected by differences in each user's eye physiology and by personalized habits of using the device, the invention discloses a user fixation point estimation and precision evaluation method based on visual tracking.
The invention discloses a user fixation point estimation method based on visual tracking, which specifically comprises the following steps:
the user wears the head-wearing eye movement interaction equipment, the fixation point coordinate of the user is obtained by the fixation point extraction module, the residual error between the eye movement offset of the user and the fixation point coordinate is calculated by the residual error estimation module, and then the obtained residual error is sent to the offset self-adaption module to update the fixation point coordinate, so that the final fixation point estimation value of the user is obtained.
The gaze point extraction module is implemented by a deep learning artificial neural network formed by stacking several dilated (expansion) convolutional layers after several deep convolutional layers. The binocular eye images of the user collected by the head-mounted eye movement interaction device are the input of the module, and the output of the module is the extracted gaze point coordinates of the user.
The method comprises the steps of obtaining the fixation point coordinates of a user by using a fixation point extraction module, firstly constructing a sample data set, then constructing a deep learning artificial neural network, training and testing the deep learning artificial neural network, using the trained deep learning artificial neural network as a fixation point extraction model, and obtaining the fixation point coordinates of the user by using the fixation point extraction model.
To construct the sample data set, several users wear the head-mounted eye movement interaction device and watch a continuously moving target anchor point on the device's display interface. The target anchor point traverses the display in a serpentine pattern, moving in turn to the pixel positions of every row and column, and changes among at least three different moving speeds during its motion. The head-mounted eye movement interaction device collects eye images while the user watches the moving target anchor point, and one round of sample data extraction is completed after one full serpentine traversal. During each round, as the target anchor point traverses the pixel positions of the display interface, the near-eye high-speed camera mounted on the head-mounted eye movement interaction device stores the user's binocular eye images together with the position coordinates of the target anchor point being watched at that moment; the binocular images and the corresponding anchor point coordinates serve as the samples and labels of the sample data set, completing its construction.
To construct the deep learning artificial neural network, deep convolutional layers are first used to extract features from the left-eye and right-eye images of the user's binocular pictures; each convolutional layer in this deep convolutional stack has a 3 × 3 kernel and a convolution stride of 2. Three dilated convolutional layers are stacked after the deep convolutional layers: the first has a 3 × 3 kernel with dilation rate (1, 2), the second a 3 × 3 kernel with dilation rate (2, 3), and the third a 3 × 3 kernel with dilation rate (4, 5); the convolution stride of all three dilated layers is 1. Dropout is applied to the final output of the dilated convolutional layers, which limits the parameter count of the network and preserves real-time inference. ReLU is used as the activation function, and the network parameters are normalized before activation.
To train and test the constructed deep learning artificial neural network, the sample data set is first standardized in size and pixel distribution: the resolution of the binocular images is reduced to a set value, all pixel values are divided by 256 so that they lie between 0 and 1 (normalizing the pixel values), and the pixel data are then standardized with a mean of 0.5 and a variance of 0.5. The standardized data are converted to tensors with the PyTorch framework and used as the network input. The network parameters are updated by stochastic gradient descent and optimized with the Adam optimizer; the sample data set is split into a training set and a test set at a 7:3 ratio using cross-validation; the L1 norm loss is used as the loss function, and Adam is used as the optimizer during training. The network is trained iteratively, and the set of network parameters with the best training result is taken as the final trained parameters, completing the training of the deep learning artificial neural network.
After the user gazes at the offset extraction identifier, the residual estimation module extracts the user's eye movement offset and calculates, with a first-order difference function, the residual between the eye movement offset and the fixation point coordinates.

The residual estimation module establishes a two-dimensional rectangular coordinate system in the display interface of the head-mounted eye movement interaction device and displays an offset extraction identifier at the center of the display interface; the position coordinates of the identifier are (x_0, y_0), and the identifier is a static picture or an animation. The user wearing the head-mounted eye movement interaction device gazes at the offset extraction identifier on the display interface, and the real-time fixation point coordinates (x_gi, y_gi) extracted by the gaze point extraction module at the i-th frame of the display interface are taken as the user's eye movement offset.
The frame rate of the display interface of the head-mounted eye movement interaction device is 30 fps and the offset extraction identifier is displayed for one second, so 30 gaze samples are collected. The residual [x_d, y_d] between the user's eye movement offset and the fixation point coordinates is calculated with a first-order difference function as:

x_d = (1/30) Σ_{i=0}^{29} (x_gi - x_0),  y_d = (1/30) Σ_{i=0}^{29} (y_gi - y_0)

where i is an integer from 0 to 29.
The user's eye movement offset comprises the user's usage-habit offset and the angle between the user's eye visual axis and eye optical axis. The usage-habit offset is a fixed value. For the angle between the visual axis and the optical axis, the visual axis is the line from the offset extraction identifier to the fovea of the macula, and the optical axis is the line from the pupil center to the center of the retina. Let P denote the coordinates of the pupil center, C the coordinates of the center of corneal curvature, V the direction vector of the eye's visual axis, U the direction vector from the pupil center to the offset extraction identifier, e the angle between the visual axis and the optical axis, and W the direction vector of the eye's optical axis, which is calculated as:

W = (P - C) / ||P - C||

The visual-axis direction vector V is obtained by applying the deviation correction (α, β) to the optical-axis direction vector W. The direction vector U is calculated from the position coordinates T of the offset extraction identifier as:

U = (T - P) / ||T - P||

The angle e between the eye's optical axis and visual axis is then e = arccos(U · V), completing the estimation of the angle between the eye's visual axis and optical axis.
The offset self-adapting module corrects the user fixation point coordinate acquired by the fixation point extracting module by using the residual error calculated by the residual error estimating module, and takes the corrected user fixation point coordinate as the final user fixation point estimated value.
The offset adaptation module is implemented by a deep learning artificial neural network comprising several deep convolutional layers, several dilated convolutional layers, an offset prediction branch and a fully connected layer, connected in sequence. The deep convolutional layers and dilated convolutional layers use the same structures as the corresponding layers in the gaze point extraction module. The loss function L_1new used in training this network is L_1new = L_1 + λ|b|, where L_1 is the L1 norm loss function used in the gaze point extraction module and λ|b| is a regularization term that adjusts the adaptive capability of the network, with λ the adjustment coefficient and b the residual extracted in the gaze point extraction module.
And calculating the residual error between the eye movement offset of the user and the fixation point coordinate through a residual error estimation module, and sending the obtained residual error into an offset self-adaption module to update the fixation point coordinate to obtain a final fixation point estimation value of the user.
The invention also discloses a method for evaluating the precision of the user fixation point estimation method, which specifically comprises the steps of rendering a plurality of precision test points on a display interface of the head-mounted eye movement interaction equipment in sequence according to preset positions, controlling the precision test points to be sequentially hidden after being sequentially displayed on the display interface according to a certain time sequence, and only displaying one precision test point at each moment;
the user wears the head-wearing eye movement interaction equipment, sequentially gazes at the precision test points displayed on the display interface, acquires the fixation point coordinates of the user by using the fixation point extraction module, and calculates and acquires the eye movement precision;
and calculating a plurality of eye movement precision values for each precision test point, and averaging the eye movement precision values to obtain the eye movement precision value of the precision test point. Dividing a display interface into a plurality of sub-regions, respectively setting a plurality of precision test points on each sub-region, respectively calculating and evaluating eye movement precision aiming at different sub-regions, averaging eye movement precision values obtained by all the precision test points in each sub-region to be used as the eye movement precision value of the sub-region, and averaging the eye movement precision values of all the sub-regions to obtain a final evaluation value of the eye movement precision.
The eye movement precision reflects the precision of the user's fixation point extraction and the concentration of the user's attention. It is obtained by calculating the angular deviation δ between a precision test point and the user fixation point coordinates obtained by the gaze point extraction module, where δ is the angle between the viewing rays toward the two points:

δ = arccos( (u · v) / (||u|| ||v||) ), with u = (x - W/2, y - H/2, Z) and v = (x_g - W/2, y_g - H/2, Z)

where (x, y) are the position coordinates of the precision test point, (x_g, y_g) are the user fixation point coordinates acquired by the gaze point extraction module, Z is the virtual screen depth of the display interface of the head-mounted eye movement interaction device, and W and H are the numbers of pixels in the horizontal and vertical directions of the display interface.
The invention has the beneficial effects that:
the invention is suitable for avoiding the complicated process of multi-point calibration and long waiting in the process of experiencing eye movement interaction by different users, and can also avoid a calibration page independently designed by developers for the purpose, thereby saving the storage and calculation resources of eye movement equipment. The eye movement offset extraction in the invention realizes the extraction of the eye movement offset of self-adaptive individual difference through the preset animation in the gaze interaction scene of the user, and the offset directly influences the precision of the gaze point deduced by the eye movement algorithm, thereby having very important significance for improving the robustness and generalization capability of the eye movement algorithm.
In the method, the offset extraction identifier is rendered in the VR eye movement device according to preset target position information; the user gazes at the identifier for a short time, from which the offset is extracted and the residual is calculated; the residual is then fed back to the algorithm for inference compensation. This effectively improves eye movement precision and makes the eye movement device easier and more efficient for different users to use.
Drawings
FIG. 1 is a flow chart of a method for estimating a user's gaze point based on visual tracking according to the present invention;
FIG. 2 is a schematic diagram of an offset extraction flag according to the present invention;
FIG. 3 is a schematic diagram of an angle between an optical axis of an eye and a visual axis according to the present invention;
FIG. 4 is a flowchart illustrating the accuracy evaluation of the user gaze point estimation method of the present invention;
FIG. 5 is a diagram illustrating an offset adaptation module according to the present invention.
Detailed Description
For a better understanding of the present disclosure, an example is given here.
Aiming at the problem that existing eyeball tracking technology is strongly affected by differences in each user's eye physiology and by personalized habits of using the device, the invention discloses a user fixation point estimation and precision evaluation method based on visual tracking.
The invention discloses a user fixation point estimation method based on visual tracking, which specifically comprises the following steps:
the user wears the head-wearing eye movement interaction equipment, the fixation point coordinate of the user is obtained by the fixation point extraction module, the residual error between the eye movement offset of the user and the fixation point coordinate is calculated by the residual error estimation module, and then the obtained residual error is sent to the offset self-adaption module to update the fixation point coordinate, so that the final fixation point estimation value of the user is obtained.
Because the extracted gaze point must be useful in real scenarios, the gaze point extraction module needs to guarantee real-time extraction while maintaining good gaze point precision, so the extraction is implemented with deep learning. The module is a deep learning artificial neural network formed by stacking several dilated (expansion) convolutional layers after several deep convolutional layers; the user's binocular eye images collected by the head-mounted eye movement interaction device are the input of the module, and the output is the extracted gaze point coordinates of the user.
The method comprises the steps of obtaining the fixation point coordinates of a user by using a fixation point extraction module, firstly constructing a sample data set, then constructing a deep learning artificial neural network, training and testing the deep learning artificial neural network, using the trained deep learning artificial neural network as a fixation point extraction model, and obtaining the fixation point coordinates of the user by using the fixation point extraction model.
To construct the sample data set, and considering the influence of differences in users' eye physiology and of personalized usage habits of the head-mounted eye movement interaction device on cross-subject model performance, several users wear the head-mounted eye movement interaction device and watch a continuously moving target anchor point on the device's display interface. The target anchor point traverses the display in a serpentine pattern, moving in turn to the pixel positions of every row and column, and changes among three or more different moving speeds during its motion. The head-mounted eye movement interaction device collects eye images while the user watches the moving target anchor point; one round of sample data extraction is completed after one full serpentine traversal, and each user contributes more than 10 rounds. During each round, as the target anchor point traverses the pixel positions of the display interface, the near-eye high-speed camera mounted on the device stores the user's binocular eye images together with the position coordinates of the target anchor point being watched at that moment. The resolution of the binocular images is 640 x 400, and the position coordinates are the x and y values in a two-dimensional rectangular coordinate system; the binocular images and the corresponding anchor point coordinates serve as the samples and labels of the sample data set, completing its construction. A sketch of the serpentine traversal is given below.
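For illustration, the serpentine traversal of the target anchor point can be sketched as follows. The display resolution, the row step and the three example speeds are assumptions made for this sketch; the description above only requires that every row and column position is visited and that at least three different speeds are used during one traversal.

```python
# Minimal sketch of a serpentine (boustrophedon) scan path for the target
# anchor point; resolution, row step and speed values are illustrative only.
def serpentine_path(width, height, row_step=1):
    """Yield (x, y) pixel positions row by row, reversing direction on each row."""
    for row, y in enumerate(range(0, height, row_step)):
        xs = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            yield (x, y)

# Pair each position with a movement speed so that the anchor point changes
# speed at least three times during one traversal.
speeds = [100, 200, 400]  # pixels per second (illustrative)
path = list(serpentine_path(1920, 1080))
schedule = [(x, y, speeds[(i * len(speeds)) // len(path)])
            for i, (x, y) in enumerate(path)]
```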
To construct the deep learning artificial neural network, deep convolutional layers are first used to extract features from the left-eye and right-eye images of the user's binocular pictures; each convolutional layer in this deep convolutional stack has a 3 × 3 kernel and a convolution stride of 2. Because dilated convolution has a larger receptive field and improves the efficiency of feature extraction, three dilated convolutional layers are stacked after the deep convolutional layers: the first has a 3 × 3 kernel with dilation rate (1, 2), the second a 3 × 3 kernel with dilation rate (2, 3), and the third a 3 × 3 kernel with dilation rate (4, 5); the convolution stride of all three dilated layers is 1. Dropout with a rate of 0.1 is applied to the final output of the dilated convolutional layers, which limits the parameter count of the network and preserves real-time inference. ReLU is used as the activation function, and the network parameters are normalized before activation.
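A minimal PyTorch sketch of such a network is given below. The kernel sizes, strides, dilation rates and the 0.1 dropout follow the description above; the number of strided layers, the channel widths, the use of batch normalization as the pre-activation normalization, and the fusion of the two eye branches by concatenation followed by a small regression head are assumptions made for the sketch.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, dilation=1, stride=2):
    """3 x 3 convolution -> normalization -> ReLU (parameters normalized before activation)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride,
                  padding=dilation, dilation=dilation),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class GazePointNet(nn.Module):
    """Gaze-point regression from left and right near-eye images."""

    def __init__(self):
        super().__init__()
        # Strided 3 x 3 convolutions (stride 2) for per-eye feature extraction.
        self.eye_encoder = nn.Sequential(
            conv_block(1, 16), conv_block(16, 32),
            conv_block(32, 64), conv_block(64, 64),
        )
        # Three dilated 3 x 3 convolutions with stride 1 and dilation rates
        # (1, 2), (2, 3) and (4, 5), applied to the concatenated eye features.
        self.dilated = nn.Sequential(
            conv_block(128, 128, dilation=(1, 2), stride=1),
            conv_block(128, 128, dilation=(2, 3), stride=1),
            conv_block(128, 128, dilation=(4, 5), stride=1),
        )
        self.dropout = nn.Dropout(p=0.1)  # dropout on the final dilated output
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, 2))  # (x, y) gaze coordinates

    def forward(self, left_eye, right_eye):
        feats = torch.cat([self.eye_encoder(left_eye),
                           self.eye_encoder(right_eye)], dim=1)
        return self.head(self.dropout(self.dilated(feats)))


# Example: a batch of 8 grayscale eye-image pairs resized to 128 x 192.
net = GazePointNet()
gaze = net(torch.randn(8, 1, 128, 192), torch.randn(8, 1, 128, 192))
print(gaze.shape)  # torch.Size([8, 2])
```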
To train and test the constructed deep learning artificial neural network, and considering that an excessive parameter count would slow real-time processing while effective feature extraction by the convolutional network should be preserved, the sample data set is standardized in size and pixel distribution: the resolution of the binocular images is reduced to a set value (for example from 640 x 400 to 128 x 192), all pixel values are divided by 256 so that they lie between 0 and 1 (normalizing the pixel values), and the pixel data are then standardized with a mean of 0.5 and a variance of 0.5. The standardized data are converted to tensors with the PyTorch framework and used as the network input. The network parameters are updated by stochastic gradient descent and optimized with the Adam optimizer; the sample data set is split into a training set and a test set at a 7:3 ratio using cross-validation; the L1 norm loss is used as the loss function; the epoch value is set to 64 during training, and Adam is used as the optimizer. The initial learning rate is 1.0e-3 and is reduced to 1/10 every 35 epochs, and the model is trained for a total of 100 epochs. The network is trained iteratively, and the set of network parameters with the best training result is taken as the final trained parameters, completing the training of the deep learning artificial neural network.
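A condensed training sketch under the quoted settings (L1 loss, Adam, initial learning rate 1.0e-3 reduced tenfold every 35 epochs, 100 epochs, 7:3 split) could look as follows. The EyeGazeDataset wrapper, the batch size of 64 and the use of torchvision transforms for resizing and normalization are assumptions; the sketch keeps the parameter set with the lowest test error, as described above.

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((128, 192)),                # shrink the 640 x 400 eye images
    transforms.ToTensor(),                        # pixel values scaled into [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # standardized distribution
])

dataset = EyeGazeDataset(transform=preprocess)    # hypothetical dataset wrapper
n_train = int(0.7 * len(dataset))                 # 7:3 train/test split
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64)

model = GazePointNet()                            # network sketched above
criterion = nn.L1Loss()                           # L1 norm loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=35, gamma=0.1)

best_err, best_state = float("inf"), None
for epoch in range(100):
    model.train()
    for left, right, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(left, right), target)
        loss.backward()
        optimizer.step()
    scheduler.step()

    model.eval()                                  # keep the best parameter set
    with torch.no_grad():
        err = sum(criterion(model(l, r), t).item() for l, r, t in test_loader)
    if err < best_err:
        best_err, best_state = err, copy.deepcopy(model.state_dict())
```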
Because of differences in eye physiology between users and in their habits of wearing the head-mounted eye movement interaction device, the gaze point coordinates extracted by the gaze point extraction module carry a fairly consistent individual offset, and this offset can strongly affect the precision of the gaze point coordinates. After the user gazes at the offset extraction identifier, the residual estimation module extracts the user's eye movement offset and calculates, with a first-order difference function, the residual between the eye movement offset and the fixation point coordinates.

The residual estimation module establishes a two-dimensional rectangular coordinate system in the display interface of the head-mounted eye movement interaction device and displays an offset extraction identifier at the center of the display interface; the position coordinates of the identifier are (x_0, y_0), and the identifier is a static picture or an animation. The user wearing the head-mounted eye movement interaction device gazes at the offset extraction identifier on the display interface, and the real-time fixation point coordinates (x_gi, y_gi) extracted by the gaze point extraction module at the i-th frame of the display interface are taken as the user's eye movement offset.
The frame rate of the display interface of the head-mounted eye movement interaction device is 30 fps and the offset extraction identifier is displayed for one second, so 30 gaze samples are collected. The residual [x_d, y_d] between the user's eye movement offset and the fixation point coordinates is calculated with a first-order difference function as:

x_d = (1/30) Σ_{i=0}^{29} (x_gi - x_0),  y_d = (1/30) Σ_{i=0}^{29} (y_gi - y_0)

where i is an integer from 0 to 29.
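As a sketch, and assuming the first-order difference reduces to averaging, over the 30 frames collected in one second, the offset between each extracted gaze sample (x_gi, y_gi) and the marker position (x_0, y_0), the residual can be computed as follows.

```python
import numpy as np

def estimate_residual(gaze_samples, marker_xy):
    """gaze_samples: (30, 2) array of gaze points extracted while the user
    fixates the offset-extraction identifier; marker_xy: (x_0, y_0)."""
    gaze = np.asarray(gaze_samples, dtype=float)
    x0, y0 = marker_xy
    x_d = float(np.mean(gaze[:, 0] - x0))   # horizontal residual
    y_d = float(np.mean(gaze[:, 1] - y0))   # vertical residual
    return x_d, y_d
```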
The user's eye movement offset comprises the user's usage-habit offset and the angle between the user's eye visual axis and eye optical axis. The usage-habit offset is a fixed value. For the angle between the visual axis and the optical axis, the visual axis is the line from the offset extraction identifier to the fovea of the macula, and the optical axis is the line from the pupil center to the center of the retina. Let P denote the coordinates of the pupil center, C the coordinates of the center of corneal curvature, V the direction vector of the eye's visual axis, U the direction vector from the pupil center to the offset extraction identifier, e the angle between the visual axis and the optical axis, and W the direction vector of the eye's optical axis, which is calculated as:

W = (P - C) / ||P - C||

The visual-axis direction vector V is obtained by applying the deviation correction (α, β) to the optical-axis direction vector W. The direction vector U is calculated from the position coordinates T of the offset extraction identifier as:

U = (T - P) / ||T - P||

The angle e between the eye's optical axis and visual axis is then e = arccos(U · V), completing the estimation of the angle between the eye's visual axis and optical axis.
The offset adaptive module corrects the user gaze point coordinates obtained by the gaze point extraction module using the residual calculated by the residual estimation module as shown in fig. 5, and uses the corrected user gaze point coordinates as the final user gaze point estimation value.
The offset adaptation module is implemented by a deep learning artificial neural network comprising several deep convolutional layers, several dilated convolutional layers, an offset prediction branch and a fully connected layer, connected in sequence. The deep convolutional layers and dilated convolutional layers use the same structures as the corresponding layers in the gaze point extraction module. The loss function L_1new used in training this network is L_1new = L_1 + λ|b|, where L_1 is the L1 norm loss function used in the gaze point extraction module and λ|b| is a regularization term that adjusts the adaptive capability of the network, with λ the adjustment coefficient and b the residual extracted in the gaze point extraction module. The larger λ is, the greater the influence of the introduced residual on the model; the smaller λ is, the smaller that influence; when λ is set to 0 the model degenerates into the backbone model used in the gaze point extraction module. Experiments show that setting λ to 0.01 in the offset adaptation module yields output with higher gaze point precision.
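A minimal sketch of this regularized loss is shown below; passing the residual in as a tensor and interpreting |b| as the summed absolute value of the residual vector are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class OffsetAdaptiveLoss(nn.Module):
    """L_1new = L_1 + lambda * |b|, with lambda = 0.01 by default."""

    def __init__(self, lam=0.01):
        super().__init__()
        self.lam = lam
        self.l1 = nn.L1Loss()

    def forward(self, predicted_gaze, target_gaze, residual):
        return self.l1(predicted_gaze, target_gaze) + self.lam * residual.abs().sum()

# Example: residual [x_d, y_d] produced by the residual estimation module.
loss_fn = OffsetAdaptiveLoss()
pred = torch.randn(8, 2, requires_grad=True)
target = torch.randn(8, 2)
b = torch.tensor([1.7, -0.9])
loss = loss_fn(pred, target, b)
loss.backward()
```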
And calculating the residual error between the eye movement offset of the user and the fixation point coordinate through a residual error estimation module, and sending the obtained residual error into an offset self-adaption module to update the fixation point coordinate to obtain a final fixation point estimation value of the user.
The invention also discloses a method for evaluating the precision of the user fixation point estimation method, which specifically comprises the steps of rendering a plurality of precision test points on a display interface of the head-mounted eye movement interaction equipment in sequence according to preset positions, controlling the precision test points to be sequentially hidden after being sequentially displayed on the display interface according to a certain time sequence, and only displaying one precision test point at each moment;
the user wears the head-wearing eye movement interaction equipment, sequentially gazes at the precision test points displayed on the display interface, acquires the fixation point coordinates of the user by using the fixation point extraction module, and calculates and acquires the eye movement precision;
and calculating a plurality of eye movement precision values for each precision test point, and averaging the eye movement precision values to obtain the eye movement precision values of the precision test points. Dividing a display interface into a plurality of sub-regions, respectively setting a plurality of precision test points on each sub-region, respectively calculating and evaluating eye movement precision aiming at different sub-regions, averaging eye movement precision values obtained by all the precision test points in each sub-region to be used as the eye movement precision value of the sub-region, and averaging the eye movement precision values of all the sub-regions to obtain a final evaluation value of the eye movement precision.
The eye movement precision reflects the precision of the user's fixation point extraction and the concentration of the user's attention. It is obtained by calculating the angular deviation δ between a precision test point and the user fixation point coordinates obtained by the gaze point extraction module; the smaller the angular deviation, the higher the precision. The angular deviation δ is the angle between the viewing rays toward the two points:

δ = arccos( (u · v) / (||u|| ||v||) ), with u = (x - W/2, y - H/2, Z) and v = (x_g - W/2, y_g - H/2, Z)

where (x, y) are the position coordinates of the precision test point, (x_g, y_g) are the user fixation point coordinates acquired by the gaze point extraction module, Z is the virtual screen depth of the display interface of the head-mounted eye movement interaction device, and W and H are the numbers of pixels in the horizontal and vertical directions of the display interface.
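As a sketch, and assuming the angular deviation δ is computed as the angle between the viewing rays toward the test point and toward the estimated gaze point (built from screen-centred pixel coordinates and the virtual screen depth Z), the per-region and overall accuracy can be evaluated as follows.

```python
import numpy as np

def angular_deviation(test_xy, gaze_xy, Z, W, H):
    """Angle (degrees) between the rays from the eye to the precision test point
    and to the estimated gaze point on a W x H pixel virtual screen at depth Z."""
    u = np.array([test_xy[0] - W / 2, test_xy[1] - H / 2, Z], dtype=float)
    v = np.array([gaze_xy[0] - W / 2, gaze_xy[1] - H / 2, Z], dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def region_accuracy(samples, Z, W, H):
    """samples: {region_id: [(test_xy, gaze_xy), ...]}. Returns the mean angular
    deviation per sub-region and the overall value (mean of the region means)."""
    per_region = {rid: float(np.mean([angular_deviation(t, g, Z, W, H) for t, g in pts]))
                  for rid, pts in samples.items()}
    return per_region, float(np.mean(list(per_region.values())))
```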
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A user fixation point estimation method based on visual tracking is characterized by specifically comprising the following steps:
the user wears the head-wearing eye movement interaction equipment, the fixation point coordinate of the user is obtained by the fixation point extraction module, the residual error between the eye movement offset of the user and the fixation point coordinate is calculated by the residual error estimation module, and then the obtained residual error is sent to the offset self-adaption module to update the fixation point coordinate, so that the final fixation point estimation value of the user is obtained.
2. The visual tracking-based user gaze point estimation method of claim 1, comprising in particular:
the gaze point extraction module is realized by a deep learning artificial neural network formed by superposing a plurality of deep convolutional neural layers on a plurality of expansion convolutional layers, the binocular pictures of the user, which are acquired by the head-mounted eye movement interaction equipment, are used as the input of the module, and the output of the module is the extracted gaze point coordinate values of the user;
the method comprises the steps of obtaining the fixation point coordinates of a user by using a fixation point extraction module, firstly constructing a sample data set, then constructing a deep learning artificial neural network, training and testing the deep learning artificial neural network, using the trained deep learning artificial neural network as a fixation point extraction model, and obtaining the fixation point coordinates of the user by using the fixation point extraction model.
3. The visual tracking-based user gaze point estimation method of claim 2, comprising in particular:
the method comprises the following steps that a sample data set is constructed, a plurality of users need to wear head-mounted eye movement interaction equipment, the users watch target anchor points which continuously move in a display interface of the equipment, the target anchor points sequentially move to the positions of pixel points of each row and each column of the display interface in a snake-shaped traversal mode, the target anchor points change more than three different moving speeds in the moving process, the head-mounted eye movement interaction equipment collects eye images of the target anchor points which continuously move and watched by the users, and one round of sample data extraction is completed after one-time snake-shaped traversal of the target anchor points is completed; in each round of sample data extraction process, when a near-eye high-speed camera carried on the head-mounted eye movement interaction equipment traverses each pixel point position on a display interface in a snake shape, the binocular image of the user and the position coordinate value of the target anchor point watched by the binocular image of the user at the moment are stored, and the binocular image of the user and the position coordinate value of the target anchor point watched by the binocular image of the user are used as the sample and the label of the sample data set, so that the construction of the sample data set is completed.
4. The visual tracking-based user gaze point estimation method of claim 2, comprising in particular:
firstly, extracting the characteristics of left and right eye diagrams of a user binocular picture by adopting a deep convolution neural layer, wherein the convolution kernel size of each convolution layer in the deep convolution neural layer is 3 multiplied by 3, and the convolution step length is 2; superposing three layers of expanded convolutional layers behind a deep convolutional neural layer, wherein the convolutional kernel size of the first layer of expanded convolutional layer is 3 multiplied by 3, the expansion rate is (1, 2), the convolutional kernel size of the second layer of expanded convolutional layer is 3 multiplied by 3, the expansion rate is (2, 3), the convolutional kernel size of the third layer of expanded convolutional layer is 3 multiplied by 3, the expansion rate is (4, 5), and the convolution step length of each three layer of expanded convolutional layer is 1; and performing inactivation treatment on the final output of the expansion convolutional layer, so that the parameter quantity of the deep learning artificial neural network is controlled, the real-time performance of the deep learning artificial neural network inference is ensured, and the parameters of the deep learning artificial neural network are normalized before the activation treatment by using the ReLU as an activation function.
5. The visual tracking-based user gaze point estimation method of claim 2, comprising in particular:
the built deep learning artificial neural network is trained and tested, the standard processing of size and pixel distribution is carried out on the sample data set, the resolution of the user binocular picture of the sample data set is reduced to a set value, all pixel values of the user binocular picture of the sample data set are divided by 256, the pixel values are distributed between 0 and 1, therefore the normalization of the pixel values is realized, and then the standard distribution processing is carried out on all pixel value data of the user binocular picture of the sample data set by taking 0.5 as a mean value and 0.5 as a variance; converting data after standardized distribution processing into tensor data by using a PyTorch frame, using the tensor data as input of the deep learning artificial neural network, updating parameters of the network by using a random gradient descent algorithm, optimizing the parameters of the network by using an Adam function, dividing a sample data set into a training set and a testing set by using a cross-validation method according to a data volume ratio of 7:3, using an L1 norm loss function as a loss function of the network, and using the Adam function as an optimizer when the network is trained; the set deep learning artificial neural network is subjected to iterative training, a group of network parameters with the best training result are taken as final parameters obtained by training the deep learning artificial neural network, and therefore training of the deep learning artificial neural network is completed.
6. The visual tracking-based user gaze point estimation method of claim 1, comprising in particular:
after the user gazes the offset extraction identifier, the residual error estimation module extracts the eye movement offset of the user and calculates the residual error between the eye movement offset of the user and the gazing point coordinate by using a first-order difference function.
7. The visual tracking-based user gaze point estimation method of claim 6, comprising in particular:
the residual error estimation module establishes two-dimensional plane straightness in a display interface of the head-mounted eye movement interaction equipmentAn angular coordinate system for displaying the offset extraction mark at the central position of the display interface, and the position coordinate is (x) 0 ,y 0 ) The offset extraction identifier is a static picture or animation; the user wearing the head-wearing eye movement interaction equipment gazes at the offset extraction identifier in the display interface of the user, and the real-time fixation point coordinate of the user extracted by the fixation point extraction module in the display interface appearing at the ith time is (x) gi ,y gi ) As the eye movement offset of the user;
the frame rate of a display interface of the head-mounted eye movement interaction equipment is 30fps, the time for displaying the offset extraction identification is set to be one second, and a first-order difference function is used for calculating the residual error between the eye movement offset of the user and the fixation point coordinate to be [ x ] d ,y d ]The calculation formula is as follows:
Figure FDA0003611491010000031
wherein i is an integer from 0 to 29;
the eye movement offset of the user comprises a usage-habit offset of the user and the included angle between the visual axis and the optical axis of the user's eye, the usage-habit offset being a fixed value; the estimation of the included angle comprises: the visual axis of the eye is the line from the offset extraction identifier to the fovea of the macula, and the optical axis of the eye is the line from the pupil center to the center of the retina; P denotes the coordinates of the pupil center, C the coordinates of the center of corneal curvature, V the direction vector of the visual axis, U the direction vector from the pupil center to the offset extraction identifier, e the included angle between the visual axis and the optical axis, and W the direction vector of the optical axis, which is calculated as:

W = (P - C) / ||P - C||

the visual-axis direction vector V is obtained by applying the deviation correction (α, β) to the optical-axis direction vector W; the direction vector U is calculated from the position coordinates T of the offset extraction identifier as:

U = (T - P) / ||T - P||

and the included angle e between the optical axis and the visual axis of the eye is calculated as e = arccos(U · V), completing the estimation of the included angle between the visual axis and the optical axis of the eye.
8. The visual tracking-based user gaze point estimation method of claim 1, comprising in particular:
the offset self-adapting module corrects the user fixation point coordinate acquired by the fixation point extracting module by using the residual error calculated by the residual error estimating module, and takes the corrected user fixation point coordinate as a final user fixation point estimated value;
the offset self-adaption module is realized through a deep learning artificial neural network, the deep learning artificial neural network comprises a plurality of deep convolution neural layers, a plurality of expansion convolution layers, an offset prediction branch and a full-connection layer, the four parts are connected in sequence, and a loss function L used in the training process of the deep learning artificial neural network 1new Is expressed as L 1new =L 1 + λ | b |, where L 1 For the L1 norm loss function used in the gaze point extraction module, λ | b | is a regularization term for adjusting the adaptive capability of the network, where λ is the adjustment coefficient and b is the residual extracted in the gaze point extraction module.
9. A method for performing precision evaluation on the user fixation point estimation method of any one of claims 1 to 8 is characterized in that a plurality of precision test points are rendered in sequence according to a preset position on a display interface of a head-mounted eye movement interaction device, the precision test points are controlled to be sequentially displayed on the display interface and then sequentially hidden according to a certain time sequence, and only one precision test point is displayed at each moment;
the user wears the head-wearing eye movement interaction equipment, sequentially gazes at the precision test points displayed on the display interface, acquires the fixation point coordinates of the user by using the fixation point extraction module, and calculates and acquires the eye movement precision;
calculating a plurality of eye movement precision values for each precision test point and averaging to obtain the eye movement precision values of the precision test points; dividing a display interface into a plurality of sub-regions, respectively setting a plurality of precision test points on each sub-region, respectively calculating and evaluating eye movement precision aiming at different sub-regions, averaging eye movement precision values obtained by all the precision test points in each sub-region to be used as the eye movement precision value of the sub-region, and averaging the eye movement precision values of all the sub-regions to obtain a final evaluation value of the eye movement precision.
10. The method for accuracy evaluation of a user gaze point estimation method according to claim 9, wherein the eye movement accuracy is used to reflect the accuracy of user gaze point extraction and the concentration of the user's attention, the eye movement accuracy is obtained by calculating an angle deviation δ between the accuracy test point and the user gaze point coordinates obtained by the gaze point extraction module, and the calculation formula of the angle deviation δ is:
δ = arccos( (u · v) / (||u|| ||v||) ), with u = (x - W/2, y - H/2, Z) and v = (x_g - W/2, y_g - H/2, Z),

wherein (x, y) represents the position coordinates of the precision test point, (x_g, y_g) represents the user fixation point coordinates acquired by the fixation point extraction module, Z represents the virtual screen depth of the display interface of the head-mounted eye movement interaction equipment, and W and H respectively represent the number of pixels in the horizontal and vertical directions of the display interface.
CN202210432536.5A 2022-04-23 2022-04-23 User fixation point estimation and precision evaluation method based on visual tracking Pending CN114816060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432536.5A CN114816060A (en) 2022-04-23 2022-04-23 User fixation point estimation and precision evaluation method based on visual tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210432536.5A CN114816060A (en) 2022-04-23 2022-04-23 User fixation point estimation and precision evaluation method based on visual tracking

Publications (1)

Publication Number Publication Date
CN114816060A true CN114816060A (en) 2022-07-29

Family

ID=82508055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210432536.5A Pending CN114816060A (en) 2022-04-23 2022-04-23 User fixation point estimation and precision evaluation method based on visual tracking

Country Status (1)

Country Link
CN (1) CN114816060A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117133043A (en) * 2023-03-31 2023-11-28 荣耀终端有限公司 Gaze point estimation method, electronic device, and computer-readable storage medium
CN116597288A (en) * 2023-07-18 2023-08-15 江西格如灵科技股份有限公司 Gaze point rendering method, gaze point rendering system, computer and readable storage medium
CN116597288B (en) * 2023-07-18 2023-09-12 江西格如灵科技股份有限公司 Gaze point rendering method, gaze point rendering system, computer and readable storage medium

Similar Documents

Publication Publication Date Title
JP6960494B2 (en) Collection, selection and combination of eye images
CN107929007B (en) Attention and visual ability training system and method using eye tracking and intelligent evaluation technology
Chen et al. Probabilistic gaze estimation without active personal calibration
Chen et al. A probabilistic approach to online eye gaze tracking without explicit personal calibration
CN114816060A (en) User fixation point estimation and precision evaluation method based on visual tracking
CN109712710B (en) Intelligent infant development disorder assessment method based on three-dimensional eye movement characteristics
JP2022527818A (en) Methods and systems for estimating geometric variables related to the user's eye
Nair et al. RIT-Eyes: Rendering of near-eye images for eye-tracking applications
John et al. An evaluation of pupillary light response models for 2D screens and VR HMDs
US20220175240A1 (en) A device and method for evaluating a performance of a visual equipment for a visual task
CN110472546B (en) Infant non-contact eye movement feature extraction device and method
Pizer et al. Fundamental properties of medical image perception
US9760772B2 (en) Eye image stimuli for eyegaze calibration procedures
CN110269586A (en) For capturing the device and method in the visual field of the people with dim spot
CN116503475A (en) VRAR binocular 3D target positioning method based on deep learning
CN112183160A (en) Sight estimation method and device
Chaudhary et al. : From real infrared eye-images to synthetic sequences of gaze behavior
Allen et al. Proximity and precision in spatial memory
Abbasov Features of the Perception and Recognition of Images in Art
Skowronek et al. Eye Tracking Using a Smartphone Camera and Deep Learning
Chugh An Eye Tracking System for a Virtual Reality Headset
US20240119594A1 (en) Determining Digital Markers Indicative of a Neurological Condition Using Eye Movement Parameters
Wang et al. Eye tracking method based on mobile big data in computer environment
Stengel Gaze-contingent Computer Graphics
Chaudhary Deep into the Eyes: Applying Machine Learning to Improve Eye-Tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination