CN117523650B - Eyeball motion tracking method and system based on rotation target detection - Google Patents

Eyeball motion tracking method and system based on rotation target detection

Info

Publication number
CN117523650B
CN117523650B (application CN202410008039.1A)
Authority
CN
China
Prior art keywords
pupil
frame
image
frame image
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410008039.1A
Other languages
Chinese (zh)
Other versions
CN117523650A (en)
Inventor
沈益冉
张桐瑜
赵广荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202410008039.1A priority Critical patent/CN117523650B/en
Publication of CN117523650A publication Critical patent/CN117523650A/en
Application granted granted Critical
Publication of CN117523650B publication Critical patent/CN117523650B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/197 Matching; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of eye movement tracking and discloses an eyeball motion tracking method and system based on rotation target detection. The method comprises: acquiring an eye image sequence; and inputting each frame of the eye image sequence into a trained eye movement tracking model, which outputs the pupil positioning result for each frame. The trained model performs feature extraction and feature fusion on the T-th frame image to obtain its image features; determines, from the pupil occlusion degree of the previous frame, whether to perform feature fusion in the time domain for the current frame; performs pupil positioning on the T-th frame image with a rotating target detection model; and estimates the pupil occlusion degree of the T-th frame image with semantic segmentation, taking the pupil of the T-th frame image as a new template if it is unoccluded. The invention can remarkably improve the accuracy and stability of eye movement tracking.

Description

Eyeball motion tracking method and system based on rotation target detection
Technical Field
The invention relates to the technical field of eye movement tracking, in particular to an eyeball movement tracking method and system based on rotation target detection.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Eye tracking technology, which analyzes the gaze point and gaze direction of the human eye through pupil positioning, plays an important role in applications such as foreground rendering in virtual reality, human-computer interaction, virtual classroom teaching, identity verification, and psychological analysis in the biomedical field. In the core pipeline of eye movement tracking, accurate identification of the pupil region is a critical step. Deep-learning-based algorithms have shown better performance in this field than conventional methods, but they still have certain limitations.
First, existing deep learning algorithms rely mainly on semantic segmentation, which identifies the pupil region by binary classification of the pixels in the image and then fits an ellipse to the irregular predicted shape in a post-processing step. This type of method does not fully exploit the prior information that the pupil shape is in fact elliptical.
Second, existing algorithms do not effectively handle the influence of blinking on pupil detection accuracy. During a blink the eyelid occludes the pupil, causing a large deviation between the model's prediction and reality. Such deviations directly degrade the performance of eye tracking in the above application scenarios.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an eyeball motion tracking method and system based on rotation target detection. By combining prior knowledge with an optimized algorithm flow, the accuracy and stability of eye movement tracking are expected to improve remarkably, providing more reliable technical support for virtual reality, human-computer interaction and other related fields.
In one aspect, an eye movement tracking method based on rotation target detection is provided, including: acquiring an eye image sequence; inputting each frame of the eye image sequence into a trained eye movement tracking model, which outputs the pupil positioning result for each frame: (1): performing feature extraction and feature fusion on the T-th frame image to obtain the image features of the T-th frame image; T is a positive integer greater than or equal to 1; (2): judging whether the current T-th frame image is the first frame of the eye image sequence, and if so, entering (3); if it is not the first frame, judging whether the pupil occlusion degree of frame T-1 is smaller than a first set threshold, and if so, entering (3); if it is not smaller than the first set threshold but is smaller than a second set threshold, performing feature fusion in the time domain between the T-th frame image and the template, and entering (3); if it is larger than the second set threshold, entering (4); (3): performing pupil positioning on the T-th frame image with a rotating target detection model; (4): estimating the pupil occlusion degree of the T-th frame image with semantic segmentation, and taking the pupil of the T-th frame image as a new template if it is unoccluded; (5): judging whether the T-th frame image is the last frame; if so, ending; if not, incrementing T by 1 and returning to (1).
In another aspect, an eye movement tracking system based on rotation target detection is provided, comprising: an acquisition module configured to acquire an eye image sequence; and a tracking module configured to input each frame of the eye image sequence into a trained eye movement tracking model, which outputs the pupil positioning result for each frame. The tracking module includes: a feature extraction and fusion unit configured to perform feature extraction and feature fusion on the T-th frame image to obtain its image features, where T is a positive integer greater than or equal to 1; a judgment unit configured to judge whether the current T-th frame image is the first frame of the eye image sequence and, if so, enter the pupil positioning unit; if it is not the first frame, to judge whether the pupil occlusion degree of frame T-1 is smaller than a first set threshold and, if so, enter the pupil positioning unit; if it is not smaller than the first set threshold but is smaller than a second set threshold, to perform feature fusion in the time domain between the T-th frame image and the template and enter the pupil positioning unit; and if it is larger than the second set threshold, to enter the occlusion degree estimation unit; a pupil positioning unit configured to perform pupil positioning on the T-th frame image with a rotating target detection model; an occlusion degree estimation unit configured to estimate the pupil occlusion degree of the T-th frame image with semantic segmentation and, if the pupil of the T-th frame image is unoccluded, take it as a new template; and a re-judgment unit configured to judge whether the T-th frame image is the last frame; if so, end; if not, increment T by 1 and return to the feature extraction and fusion unit.
The above technical scheme has the following advantages or beneficial effects: the invention maintains high detection accuracy when blinking occurs. It mainly exploits the prior knowledge that the pupil is essentially elliptical to obtain the ellipse corresponding to the pupil directly, avoiding the post-processing steps required by semantic-segmentation-based algorithms and making the algorithm more concise and elegant. The core mechanism is to apply a rotating target detection method to obtain the minimum circumscribed rectangle that shares its parameters with the pupil ellipse, thereby obtaining the center coordinates, major-axis length, minor-axis length and rotation angle of the pupil ellipse.
When handling partial pupil occlusion, the invention adopts a fusion technique in the time domain. Specifically, the features of a pupil image that was previously unoccluded by the eyelid are used as a template and fused in the time domain with the features of the current frame, so that even when the pupil is partially occluded by the eyelid the relevant pupil information can still be obtained accurately, achieving accurate pupil detection.
Meanwhile, the occlusion degree of the pupil is judged with a semantic segmentation technique, so that the model switches between different working modes according to how much of the pupil is occluded.
In addition, to address the sparse distribution of blink images in datasets caused by the high speed and low frequency of blinking, and to reduce the manual effort required for labelling partially occluded pupils, the invention provides an innovative data generation strategy. The strategy uses pupil images captured with the eye fully open to generate corresponding pupil images partially occluded by the eyelid, enriching the diversity of the dataset and improving the robustness and accuracy of the pupil detection algorithm in practical applications.
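The patent does not disclose the implementation details of this data generation strategy. Purely as a hypothetical illustration of the general idea (an assumption, not the patented procedure), the sketch below synthesizes a partially occluded training sample from an open-eye image by painting a flat eyelid-like band over the upper part of the pupil and clipping the visibility mask accordingly; realistic data generation would additionally need eyelid texture and curvature.

```python
import numpy as np

def synthesize_eyelid_occlusion(eye_img: np.ndarray, pupil_mask: np.ndarray,
                                cover_ratio: float = 0.5, lid_gray: int = 120):
    """Hypothetical sketch only: occlude the top `cover_ratio` of the pupil with a
    flat gray band standing in for the eyelid, and zero out the covered pupil pixels
    in the label mask so the sample is annotated as partially occluded."""
    ys = np.nonzero(pupil_mask)[0]
    if ys.size == 0:                               # no pupil in this image
        return eye_img, pupil_mask
    y_top, y_bot = int(ys.min()), int(ys.max())
    lid_edge = y_top + int(cover_ratio * (y_bot - y_top))   # lower edge of the fake eyelid
    img, mask = eye_img.copy(), pupil_mask.copy()
    img[:lid_edge, :] = lid_gray                   # paint the "eyelid" band
    mask[:lid_edge, :] = 0                         # these pupil pixels are now occluded
    return img, mask
```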
The invention makes full use of the prior information that the pupil is elliptical and avoids the post-processing operations required by semantic segmentation algorithms. It achieves good pupil detection results whether or not the pupil is partially occluded, and when the occlusion degree is high it achieves the best results compared with existing methods: when more than 80% of the pupil area is occluded, the invention improves the intersection-over-union (IoU) by 20% and the F1 score by 12.5% compared with the prior art. Moreover, ablation experiments show that the temporal fusion technique and the data generation strategy proposed by the invention are both effective.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of a method according to a first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Embodiment 1
As shown in fig. 1, this embodiment provides an eye movement tracking method based on rotation target detection, including: S101: acquiring an eye image sequence; S102: inputting each frame of the eye image sequence into a trained eye movement tracking model, which outputs the pupil positioning result for each frame. Step S102 specifically includes: S102-1: performing feature extraction and feature fusion on the T-th frame image to obtain its image features; T is a positive integer greater than or equal to 1; S102-2: judging whether the current T-th frame image is the first frame of the eye image sequence; if so, go to S102-3; if not, judging whether the pupil occlusion degree of frame T-1 is smaller than a first set threshold; if it is smaller than the first set threshold, go to S102-3; if it is not smaller than the first set threshold but is smaller than a second set threshold, perform feature fusion in the time domain between the T-th frame image and the template and go to S102-3; if it is larger than the second set threshold, go to S102-4; S102-3: performing pupil positioning on the T-th frame image with a rotating target detection model; S102-4: estimating the pupil occlusion degree of the T-th frame image with semantic segmentation, and taking the pupil of the T-th frame image as a new template if it is unoccluded; S102-5: judging whether the T-th frame image is the last frame; if so, end; if not, increment T by 1 and return to S102-1.
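For clarity, the per-frame control flow of S102 can be summarized in the following Python sketch. It is a minimal illustration of the branching described above; the callables passed in (extract_and_fuse, temporal_fuse, rotated_detect, estimate_occlusion) are placeholders standing in for the modules of this embodiment, and the threshold values anticipate the 25% / 87.5% settings given further below.

```python
from typing import Callable, List, Optional, Sequence, Tuple

FIRST_THRESHOLD = 0.25    # first set threshold (25%)
SECOND_THRESHOLD = 0.875  # second set threshold (87.5%)

def track_sequence(
    frames: Sequence,
    extract_and_fuse: Callable,    # S102-1: multi-scale spatial feature extraction + fusion
    temporal_fuse: Callable,       # time-domain fusion of frame features with the template
    rotated_detect: Callable,      # S102-3: rotated-target pupil detection
    estimate_occlusion: Callable,  # S102-4: U-Net based estimate -> (degree, pupil_features)
) -> List[Optional[Tuple]]:
    """Per-frame control flow of step S102 (illustrative sketch only)."""
    template = None          # features of the most recent unoccluded pupil
    prev_occlusion = 0.0     # pupil occlusion degree of frame T-1
    results: List[Optional[Tuple]] = []
    for t, frame in enumerate(frames):
        feat = extract_and_fuse(frame)                               # S102-1
        if t == 0 or prev_occlusion < FIRST_THRESHOLD:               # S102-2: first frame / mild occlusion
            results.append(rotated_detect(feat))                     # S102-3
        elif prev_occlusion < SECOND_THRESHOLD:                      # moderate occlusion: fuse with template
            results.append(rotated_detect(temporal_fuse(feat, template)))
        else:                                                        # heavy occlusion: skip detection
            results.append(None)
        prev_occlusion, pupil_feat = estimate_occlusion(frame)       # S102-4
        if prev_occlusion == 0.0:                                    # unoccluded pupil -> new template
            template = pupil_feat
    return results                                                   # S102-5: ends after the last frame
```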
Further, the trained eye movement tracking model has the following structure: a multi-scale spatial feature extraction and fusion module, whose input is the eye image sequence and which performs feature extraction and feature fusion on the eye images to obtain primary fusion features; a judging module, whose input is the pupil occlusion degree of frame T-1 and which compares it with the first and second set thresholds: if the occlusion degree of frame T-1 is smaller than the first set threshold, the primary fusion features of the T-th frame image are input to the pupil positioning module; if it is not smaller than the first set threshold but is smaller than the second set threshold, the primary fusion features of the T-th frame image and the template features are input to the time-domain feature fusion module, which fuses them and passes the result to the pupil positioning module; if it is larger than the second set threshold, the T-th frame image is considered invalid; the input of the judging module is also connected to the output of the pupil occlusion degree estimation module; the pupil occlusion degree estimation module is used to estimate the pupil occlusion degree in the image; the pupil positioning module is used to determine the position of the pupil; and the time-domain feature fusion module is used to fuse features in the time domain.
Further, the training process of the trained eye movement tracking model includes: constructing a first training set, which is an eye image sequence with known pupil positions and shapes; and inputting the first training set into the eye movement tracking model, stopping training when the total loss function value of the model no longer decreases or the number of iterations exceeds the set number, to obtain the trained eye movement tracking model.
Illustratively, constructing the first training set includes: the invention adopts a picture sequence formed by near-eye 8-bit grayscale images continuously captured by a camera as input data, and pupils are positioned in these grayscale images.
Further, the multi-scale spatial feature extraction and fusion module uses a Swin Transformer network to perform feature extraction on the T-th frame image, and a feature pyramid network (FPN, Feature Pyramid Networks) to fuse the features of each stage extracted by the Swin Transformer network with the features of the next stage.
Further, S102-1, performing feature extraction and feature fusion on the T-th frame image to obtain the image features of the T-th frame image, specifically includes: extracting features at the different stages of the T-th frame image with a Swin Transformer network; and using a feature pyramid network (FPN, Feature Pyramid Networks) to fuse the features of each stage extracted by the Swin Transformer network with the features of the next stage.
Illustratively, the spatial features of the image are extracted with a Swin Transformer feature extractor, which computes self-attention within local windows of the feature map; the window size is 7×7. Meanwhile, lower-level features are merged through a patch-merging operation to generate higher-level features whose scale is only 1/2 that of the lower-level features; through this operation the semantic information of the feature map is extracted.
The feature pyramid FPN is used to fuse features of different scales in the spatial domain: each layer's feature map is generated by 2× upsampling the features of the layer above and adding them to the same-level features passed through a 1×1 convolution. The generated feature maps therefore carry both the rich detail information of the low-level features and the semantic information of the high-level features.
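A minimal PyTorch sketch of this top-down FPN fusion is given below. The backbone is assumed to be a Swin-style network that returns one feature map per stage; the stage channel counts 96/192/384/768 and the 256 output channels are typical Swin-T/FPN values and are assumptions here, not figures from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down feature pyramid fusion: each higher-level map is 2x upsampled and
    added to the next lower level after a 1x1 lateral convolution, so the fused
    maps carry both low-level detail and high-level semantics."""

    def __init__(self, in_channels=(96, 192, 384, 768), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, stage_feats):
        # stage_feats: list of backbone stage outputs, highest resolution first
        laterals = [lat(f) for lat, f in zip(self.lateral, stage_feats)]
        for i in range(len(laterals) - 1, 0, -1):                    # top-down pathway
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [sm(p) for sm, p in zip(self.smooth, laterals)]

# Example with dummy Swin-T-like stage outputs for a 224x224 input:
feats = [torch.randn(1, c, s, s) for c, s in zip((96, 192, 384, 768), (56, 28, 14, 7))]
fused = SimpleFPN()(feats)   # four fused maps, each with 256 channels
```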
Further, the pupil occlusion degree is obtained with a trained semantic segmentation network (U-Net) that identifies the number of pupil pixels in the image; the pupil occlusion degree is computed as

P = 1 - N_vis / N_full ; (1)

where N_full denotes the number of pupil pixels in the complete pupil region when the pupil is unoccluded, and N_vis denotes the number of pupil pixels the U-Net network identifies in the unoccluded part of the pupil when it is partially occluded.
Further, the training process of the trained semantic segmentation network U-Net includes: constructing a second training set comprising eye images with known pupil positions and shapes; and inputting the second training set into the semantic segmentation network U-Net and training it to obtain the trained network.
The invention uses semantic segmentation to judge the degree to which the pupil is occluded. Specifically, semantic segmentation of the pupil in the image yields the number of pupil pixels in the image. If the current pupil is unoccluded by the eyelid, or only slightly occluded, this count is compared directly with the pixel count of the complete pupil region; otherwise the segmented count N_vis is compared with the complete-pupil pixel count N_full predicted by the model to obtain the occlusion degree P of the pupil. The detection strategy for the next image in the sequence is then decided according to the occlusion degree of the pupil in the current image. If the current pupil is unoccluded, the template is updated with the features of the current pupil.
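As a concrete illustration, the occlusion degree of equation (1) can be computed from the U-Net output as below. Since equation (1) is reproduced only as an image in the source, the form P = 1 − N_vis / N_full used here is a reconstruction implied by the variable definitions above, and the probability-threshold handling of the mask is an assumption.

```python
import numpy as np

def occlusion_degree(pupil_prob: np.ndarray, full_pupil_pixels: int,
                     prob_threshold: float = 0.5) -> float:
    """Pupil occlusion degree following Eq. (1): P = 1 - N_vis / N_full.
    pupil_prob: per-pixel pupil probability map from the U-Net.
    full_pupil_pixels: pixel count of the complete, unoccluded pupil (N_full),
    taken from a recent frame in which the pupil was fully visible."""
    n_vis = int((pupil_prob > prob_threshold).sum())      # visible pupil pixels (N_vis)
    return max(0.0, 1.0 - n_vis / max(full_pupil_pixels, 1))

# Example: 3000 visible pupil pixels out of a 4000-pixel complete pupil gives P = 0.25,
# exactly the first set threshold mentioned below.
```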
Illustratively, the first set threshold is 25%; the second set threshold is 87.5%.
Further, the feature fusion in the time domain of the T-th frame image with the template comprises:

F_fused^k = F_T^k * F_tmpl^k ; (2)

where F_fused is the fused feature map, F_T is the feature of the T-th frame image, F_tmpl is the template feature, k denotes the k-th channel of the feature map, and * is the convolution operation. The template is allowed to be updated, and the initial template is an unoccluded pupil image in the image sequence.
This temporal feature fusion of the T-th frame image with the template is also the internal operation of the time-domain feature fusion module. If the pupil is partially occluded, the fused feature is used; if the pupil is not partially occluded, the feature of the current frame is used.
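The sketch below shows one way to realize equation (2) in PyTorch, interpreting the convolution '*' as a per-channel (depthwise) operation in which the k-th channel of the template acts as the kernel for the k-th channel of the current-frame features. This channel-wise interpretation is an assumption drawn from the variable definitions above, not an explicit statement of the patent.

```python
import torch
import torch.nn.functional as F

def temporal_fuse(frame_feat: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
    """Eq. (2) as a depthwise convolution: F_fused^k = F_T^k * F_tmpl^k.
    frame_feat: (1, C, H, W) features of the current (T-th) frame.
    template_feat: (1, C, h, w) features of the last unoccluded pupil (the template)."""
    c = frame_feat.shape[1]
    kernels = template_feat.reshape(c, 1, *template_feat.shape[-2:])   # one kernel per channel
    pad = (kernels.shape[-2] // 2, kernels.shape[-1] // 2)
    return F.conv2d(frame_feat, kernels, padding=pad, groups=c)

# Example: fusing 256-channel maps, with a template patch smaller than the frame map.
fused = temporal_fuse(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 7, 7))
```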
Further, step S102-3, performing pupil positioning on the T-th frame image with a rotating target detection model, includes: the rotating target detection model comprises a classification subnet and a regression subnet in parallel, both implemented as convolutional neural networks; the classification subnet judges whether a prior anchor frame contains the pupil, its input is the feature of the T-th frame image, and its output is the confidence that the anchor frame contains the pupil; the regression subnet predicts the offsets between the prior anchor frame and the rotated rectangular frame corresponding to the pupil, its input is the feature of the T-th frame image, and its output is the offsets between the ellipse corresponding to the pupil and the anchor frame.
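A compact sketch of such parallel subnets is shown below; the number of anchors per location, the channel width, and the two intermediate 3×3 convolution layers are illustrative assumptions rather than values given by the patent.

```python
import torch
import torch.nn as nn

class RotatedDetectionHead(nn.Module):
    """Parallel classification and regression subnets over the (fused) feature map:
    the classification branch scores each prior anchor for containing the pupil,
    and the regression branch predicts five offsets (center x/y, two axes, angle)."""

    def __init__(self, in_channels: int = 256, num_anchors: int = 3):
        super().__init__()

        def subnet(out_per_anchor: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, num_anchors * out_per_anchor, 3, padding=1),
            )

        self.cls_subnet = subnet(1)   # pupil-confidence per anchor
        self.reg_subnet = subnet(5)   # (dx, dy, dw, dh, dtheta) per anchor

    def forward(self, feat: torch.Tensor):
        cls_scores = torch.sigmoid(self.cls_subnet(feat))   # (B, A, H, W)
        reg_offsets = self.reg_subnet(feat)                  # (B, 5*A, H, W)
        return cls_scores, reg_offsets
```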
Further, the training process of the rotating target detection model in S102-3 includes: constructing a third training set of pupil-region images with known pupil position labels; and inputting the third training set into the rotating target detection model and training it, stopping when the loss function value no longer decreases or the number of iterations reaches the set number, to obtain the trained rotating target detection model.
Further, after the classification subnet obtains the anchor frame with the highest confidence, let the parameters of that anchor frame be the anchor center x-coordinate x_a, the anchor center y-coordinate y_a, the anchor width w_a, the anchor length h_a and the anchor rotation angle θ_a. From the offsets output by the regression subnet, the predicted rotated rectangular frame corresponding to the pupil is finally obtained according to formulas (3), (4), (5), (6), (7), (8) and (9): its center x-coordinate x, center y-coordinate y, length h, width w and rotation angle θ, where x and y are also the center coordinates of the pupil ellipse, h is also the major-axis length of the pupil ellipse, w is also the minor-axis length of the pupil ellipse, and θ is also the rotation angle of the pupil ellipse.
Formulas (3)-(9) involve two intermediate variables and the five regression offsets: the offsets of the predicted ellipse center x- and y-coordinates relative to the anchor center, the offset of the predicted major axis relative to the anchor major axis, the offset of the predicted minor axis relative to the anchor minor axis, and the offset of the predicted rotation angle relative to the anchor rotation angle.
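Because formulas (3)-(9) are reproduced only as images in the source text, the decoding below is not the patented equations; it is a common rotated-anchor decoding (RRPN/rotated-RetinaNet style), shown purely to illustrate how such offsets are typically turned into an ellipse's center, axes and rotation angle.

```python
import math
from typing import Tuple

def decode_rotated_anchor(anchor: Tuple[float, float, float, float, float],
                          offsets: Tuple[float, float, float, float, float]):
    """Illustrative decoding (assumed parameterization, not the patent's formulas (3)-(9)).
    anchor  = (xa, ya, wa, ha, theta_a): center, width, length, rotation of the anchor frame.
    offsets = (dx, dy, dw, dh, dtheta):  regression-subnet outputs."""
    xa, ya, wa, ha, theta_a = anchor
    dx, dy, dw, dh, dtheta = offsets
    x = xa + dx * wa                 # ellipse center x
    y = ya + dy * ha                 # ellipse center y
    w = wa * math.exp(dw)            # ellipse minor-axis length (box width)
    h = ha * math.exp(dh)            # ellipse major-axis length (box length)
    theta = theta_a + dtheta         # ellipse rotation angle
    return x, y, h, w, theta

# Example: an axis-aligned 20x40 anchor at (60, 60) with small predicted offsets.
print(decode_rotated_anchor((60.0, 60.0, 20.0, 40.0, 0.0), (0.1, -0.05, 0.0, 0.1, 0.05)))
```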
Embodiment 2
This embodiment provides an eye movement tracking system based on rotation target detection, comprising: an acquisition module configured to acquire an eye image sequence; and a tracking module configured to input each frame of the eye image sequence into a trained eye movement tracking model, which outputs the pupil positioning result for each frame. The tracking module includes: a feature extraction and fusion unit configured to perform feature extraction and feature fusion on the T-th frame image to obtain its image features, where T is a positive integer greater than or equal to 1; a judgment unit configured to judge whether the current T-th frame image is the first frame of the eye image sequence and, if so, enter the pupil positioning unit; if it is not the first frame, to judge whether the pupil occlusion degree of frame T-1 is smaller than a first set threshold and, if so, enter the pupil positioning unit; if it is not smaller than the first set threshold but is smaller than the second set threshold, to perform feature fusion in the time domain between the T-th frame image and the template and enter the pupil positioning unit; and if it is larger than the second set threshold, to enter the occlusion degree estimation unit; a pupil positioning unit configured to perform pupil positioning on the T-th frame image with a rotating target detection model; an occlusion degree estimation unit configured to estimate the pupil occlusion degree of the T-th frame image with semantic segmentation and, if the pupil of the T-th frame image is unoccluded, take it as a new template; and a re-judgment unit configured to judge whether the T-th frame image is the last frame; if so, end; if not, increment T by 1 and return to the feature extraction and fusion unit.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. An eyeball motion tracking method based on rotation target detection, characterized by comprising the following steps:
acquiring an eye image sequence;
inputting each frame of image in the eye image sequence into a trained eye movement tracking model, outputting pupil positioning results of each frame of image by the model, wherein the trained eye movement tracking model is used for:
(1): extracting features and fusing features of the T frame image to obtain image features of the T frame image; t is a positive integer greater than or equal to 1;
(2): judging whether the current T-th frame image is the first frame image of the eye image sequence, and if so, entering (3); if it is not the first frame image, judging whether the pupil occlusion degree of frame T-1 is smaller than a first set threshold, and if it is smaller than the first set threshold, entering (3); if it is not smaller than the first set threshold but is smaller than a second set threshold, carrying out feature fusion in the time domain between the T-th frame image and the template, and entering (3); if it is larger than the second set threshold, entering (4);
the feature fusion in the time domain of the T-th frame image with the template comprises:

F_fused^k = F_T^k * F_tmpl^k ; (2)

wherein F_fused is the fused feature map, F_T is the feature of the T-th frame image, F_tmpl is the template feature, k denotes the k-th channel of the feature map, and * is the convolution operation; the template is allowed to be updated, and the initial template is a set unoccluded pupil image;
(3): pupil positioning is carried out on the T-th frame image by adopting a rotating target detection model;
wherein pupil positioning on the T-th frame image with the rotating target detection model comprises:
the rotating target detection model includes a classification subnet and a regression subnet in parallel, both realized by convolutional neural networks;
the classification subnet is used for judging whether a prior anchor frame contains the pupil; its input is the feature of the T-th frame image, and its output is the confidence that the anchor frame contains the pupil;
the regression subnet is used for predicting the offsets between the prior anchor frame and the rotated rectangular frame corresponding to the pupil; its input is the feature of the T-th frame image, and its output is the offsets between the ellipse corresponding to the pupil and the anchor frame;
after the classification subnet obtains the anchor frame with the highest confidence, the parameters of that anchor frame are taken as the anchor center x-coordinate x_a, the anchor center y-coordinate y_a, the anchor width w_a, the anchor length h_a and the anchor rotation angle θ_a; from the offsets output by the regression subnet, the predicted rotated rectangular frame corresponding to the pupil is finally obtained according to formulas (3), (4), (5), (6), (7), (8) and (9): its center x-coordinate x, center y-coordinate y, length h, width w and rotation angle θ, wherein x and y are also the center coordinates of the pupil ellipse, h is also the major-axis length of the pupil ellipse, w is also the minor-axis length of the pupil ellipse, and θ is also the rotation angle of the pupil ellipse;
wherein formulas (3)-(9) involve two intermediate variables and the five regression offsets: the offsets of the predicted ellipse center x- and y-coordinates relative to the anchor center, the offset of the predicted major axis relative to the anchor major axis, the offset of the predicted minor axis relative to the anchor minor axis, and the offset of the predicted rotation angle relative to the anchor rotation angle;
(4): estimating the pupil occlusion degree of the T-th frame image by semantic segmentation, and taking the pupil of the T-th frame image as a new template if it is unoccluded;
(5): judging whether the T-th frame image is the last frame image; if so, ending; if not, incrementing T by 1 and returning to (1).
2. The eye movement tracking method based on rotation target detection according to claim 1, wherein performing feature extraction and feature fusion on the T-th frame image to obtain the image features of the T-th frame image specifically comprises: extracting features of the T-th frame image with a feature extraction network; and using a feature pyramid network to fuse the features of each stage extracted by the feature extraction network with the features of the next stage.
3. The eye movement tracking method based on rotation target detection according to claim 1, wherein the pupil occlusion degree is obtained by using a trained semantic segmentation network to identify the number of pupil pixels in the image, and the pupil occlusion degree is calculated as

P = 1 - N_vis / N_full ; (1)

wherein N_full denotes the number of pupil pixels in the complete pupil region when the pupil is unoccluded, and N_vis denotes the number of pupil pixels identified by the semantic segmentation network in the unoccluded pupil region when the pupil is partially occluded.
4. The eye movement tracking method based on rotation target detection according to claim 3, wherein the training process of the trained semantic segmentation network comprises:
constructing a second training set comprising eye images with known pupil positions and shapes;
and inputting the second training set into the semantic segmentation network and training it, to obtain the trained semantic segmentation network.
5. A method of eye movement tracking based on rotational object detection as claimed in claim 1, wherein the trained eye movement tracking model comprises:
the input end of the multi-scale space feature extraction fusion module is used for inputting an eye image sequence, and the multi-scale space feature extraction fusion module performs feature extraction and feature fusion processing on the eye image to obtain primary fusion features;
a judging module, whose input is the pupil occlusion degree of frame T-1 and which compares it with a first set threshold and a second set threshold: if the pupil occlusion degree of frame T-1 is smaller than the first set threshold, the primary fusion features of the T-th frame image are input to the pupil positioning module; if it is not smaller than the first set threshold but is smaller than the second set threshold, the primary fusion features of the T-th frame image and the template features are input to a time-domain feature fusion module, which fuses them and inputs the fused result to the pupil positioning module; if it is larger than the second set threshold, the T-th frame image is considered an invalid image;
the input end of the judging module is also connected with the output end of the pupil occlusion degree estimation module;
the pupil occlusion degree estimation module is used for estimating the pupil occlusion degree in the image;
the pupil positioning module is used for determining the position of the pupil;
and the time-domain feature fusion module is used for fusing features in the time domain.
6. A method of eye movement tracking based on rotational object detection as claimed in claim 1, wherein the training process of the trained eye movement tracking model comprises:
constructing a first training set, wherein the first training set is an eye image sequence with known pupil positions and shapes;
and inputting the first training set into the eye movement tracking model, and stopping training when the total loss function value of the eye movement tracking model is not reduced, or the iteration number exceeds the set number, so as to obtain the trained eye movement tracking model.
7. An eye movement tracking system based on rotational target detection, comprising:
an acquisition module configured to: acquire an eye image sequence;
a tracking module configured to: input each frame of image in the eye image sequence into a trained eye movement tracking model, which outputs the pupil positioning result of each frame; wherein the tracking module includes:
a feature extraction fusion unit configured to: perform feature extraction and feature fusion on the T-th frame image to obtain the image features of the T-th frame image; T is a positive integer greater than or equal to 1;
a judgment unit configured to: judge whether the current T-th frame image is the first frame image of the eye image sequence, and if so, enter the pupil positioning unit; if it is not the first frame image, judge whether the pupil occlusion degree of frame T-1 is smaller than a first set threshold, and if it is smaller than the first set threshold, enter the pupil positioning unit; if it is not smaller than the first set threshold but is smaller than the second set threshold, carry out feature fusion in the time domain between the T-th frame image and the template, and enter the pupil positioning unit; if it is larger than the second set threshold, enter the occlusion degree estimation unit;
the feature fusion in the time domain of the T-th frame image with the template comprises:

F_fused^k = F_T^k * F_tmpl^k ; (2)

wherein F_fused is the fused feature map, F_T is the feature of the T-th frame image, F_tmpl is the template feature, k denotes the k-th channel of the feature map, and * is the convolution operation; the template is allowed to be updated, and the initial template is a set unoccluded pupil image;
a pupil positioning unit configured to: carry out pupil positioning on the T-th frame image by adopting a rotating target detection model;
wherein pupil positioning on the T-th frame image with the rotating target detection model comprises:
the rotating target detection model includes a classification subnet and a regression subnet in parallel, both realized by convolutional neural networks;
the classification subnet is used for judging whether a prior anchor frame contains the pupil; its input is the feature of the T-th frame image, and its output is the confidence that the anchor frame contains the pupil;
the regression subnet is used for predicting the offsets between the prior anchor frame and the rotated rectangular frame corresponding to the pupil; its input is the feature of the T-th frame image, and its output is the offsets between the ellipse corresponding to the pupil and the anchor frame;
after the classification subnet obtains the anchor frame with the highest confidence, the parameters of that anchor frame are taken as the anchor center x-coordinate x_a, the anchor center y-coordinate y_a, the anchor width w_a, the anchor length h_a and the anchor rotation angle θ_a; from the offsets output by the regression subnet, the predicted rotated rectangular frame corresponding to the pupil is finally obtained according to formulas (3), (4), (5), (6), (7), (8) and (9): its center x-coordinate x, center y-coordinate y, length h, width w and rotation angle θ, wherein x and y are also the center coordinates of the pupil ellipse, h is also the major-axis length of the pupil ellipse, w is also the minor-axis length of the pupil ellipse, and θ is also the rotation angle of the pupil ellipse;
wherein formulas (3)-(9) involve two intermediate variables and the five regression offsets: the offsets of the predicted ellipse center x- and y-coordinates relative to the anchor center, the offset of the predicted major axis relative to the anchor major axis, the offset of the predicted minor axis relative to the anchor minor axis, and the offset of the predicted rotation angle relative to the anchor rotation angle;
an occlusion degree estimation unit configured to: estimate the pupil occlusion degree of the T-th frame image by semantic segmentation, and take the pupil of the T-th frame image as a new template if it is unoccluded;
a re-judgment unit configured to: judge whether the T-th frame image is the last frame image; if so, end; if not, increment T by 1 and return to the feature extraction fusion unit.
CN202410008039.1A 2024-01-04 2024-01-04 Eyeball motion tracking method and system based on rotation target detection Active CN117523650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410008039.1A CN117523650B (en) 2024-01-04 2024-01-04 Eyeball motion tracking method and system based on rotation target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410008039.1A CN117523650B (en) 2024-01-04 2024-01-04 Eyeball motion tracking method and system based on rotation target detection

Publications (2)

Publication Number Publication Date
CN117523650A CN117523650A (en) 2024-02-06
CN117523650B true CN117523650B (en) 2024-04-02

Family

ID=89766788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410008039.1A Active CN117523650B (en) 2024-01-04 2024-01-04 Eyeball motion tracking method and system based on rotation target detection

Country Status (1)

Country Link
CN (1) CN117523650B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006167256A (en) * 2004-12-17 2006-06-29 National Univ Corp Shizuoka Univ Pupil detecting apparatus
CN109857254A (en) * 2019-01-31 2019-06-07 京东方科技集团股份有限公司 Pupil positioning method and device, VR/AR equipment and computer-readable medium
CN110659674A (en) * 2019-09-05 2020-01-07 东南大学 Lie detection method based on sight tracking
CN113688733A (en) * 2021-08-25 2021-11-23 深圳龙岗智能视听研究院 Eye detection and tracking method, system, equipment and application based on event camera
CN113971834A (en) * 2021-10-23 2022-01-25 郑州大学 Eyeball tracking method and system based on virtual reality
WO2023001063A1 (en) * 2021-07-19 2023-01-26 北京鹰瞳科技发展股份有限公司 Target detection method and apparatus, electronic device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006167256A (en) * 2004-12-17 2006-06-29 National Univ Corp Shizuoka Univ Pupil detecting apparatus
CN109857254A (en) * 2019-01-31 2019-06-07 京东方科技集团股份有限公司 Pupil positioning method and device, VR/AR equipment and computer-readable medium
CN110659674A (en) * 2019-09-05 2020-01-07 东南大学 Lie detection method based on sight tracking
WO2023001063A1 (en) * 2021-07-19 2023-01-26 北京鹰瞳科技发展股份有限公司 Target detection method and apparatus, electronic device, and storage medium
CN113688733A (en) * 2021-08-25 2021-11-23 深圳龙岗智能视听研究院 Eye detection and tracking method, system, equipment and application based on event camera
CN113971834A (en) * 2021-10-23 2022-01-25 郑州大学 Eyeball tracking method and system based on virtual reality

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research progress on human eye detection technology (人眼检测技术研究进展); 常胜江; 孟春宁; 韩建民; 林淑玲; 数据采集与处理; 2015-11-15 (06); full text *
Eye movement position recognition based on eye tracking (基于眼动追踪的眼动位置识别); 隋秀娟; 薛雷; 许翠单; 工业控制计算机; 2020-05-25 (05); full text *

Also Published As

Publication number Publication date
CN117523650A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN107767405B (en) Nuclear correlation filtering target tracking method fusing convolutional neural network
CN106778664B (en) Iris image iris area segmentation method and device
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
US11403874B2 (en) Virtual avatar generation method and apparatus for generating virtual avatar including user selected face property, and storage medium
CN108062525B (en) Deep learning hand detection method based on hand region prediction
Wang et al. Blink detection using Adaboost and contour circle for fatigue recognition
WO2021179471A1 (en) Face blur detection method and apparatus, computer device and storage medium
CN106650574A (en) Face identification method based on PCANet
CN111158491A (en) Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN112101208A (en) Feature series fusion gesture recognition method and device for elderly people
CN112613579A (en) Model training method and evaluation method for human face or human head image quality and selection method for high-quality image
CN111158457A (en) Vehicle-mounted HUD (head Up display) human-computer interaction system based on gesture recognition
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
Saif et al. Robust drowsiness detection for vehicle driver using deep convolutional neural network
Wan et al. Robust and accurate pupil detection for head-mounted eye tracking
Kang et al. Real-time eye tracking for bare and sunglasses-wearing faces for augmented reality 3D head-up displays
CN112767440B (en) Target tracking method based on SIAM-FC network
CN117523650B (en) Eyeball motion tracking method and system based on rotation target detection
CN111898454A (en) Weight binarization neural network and transfer learning human eye state detection method and device
Yamamoto et al. Algorithm optimizations for low-complexity eye tracking
CN116403150A (en) Mask detection algorithm based on C3-CBAM (C3-CBAM) attention mechanism
CN115661894A (en) Face image quality filtering method
CN111898473B (en) Driver state real-time monitoring method based on deep learning
CN110675416B (en) Pupil center detection method based on abstract contour analysis
CN104102896B (en) A kind of method for recognizing human eye state that model is cut based on figure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant