CN114973391A - Eyeball tracking method, device and equipment applied to the metaverse - Google Patents

Eyeball tracking method, device and equipment applied to the metaverse

Info

Publication number
CN114973391A
CN114973391A (application number CN202210759801.0A)
Authority
CN
China
Prior art keywords
feature map
frame
eyeball
eye image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210759801.0A
Other languages
Chinese (zh)
Other versions
CN114973391B (en)
Inventor
郝炯辉
李茂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Superred Technology Co Ltd
Original Assignee
Beijing Superred Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Superred Technology Co Ltd filed Critical Beijing Superred Technology Co Ltd
Priority to CN202210759801.0A
Publication of CN114973391A
Application granted
Publication of CN114973391B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Ophthalmology & Optometry (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide methods, apparatuses, devices and computer-readable storage media for eye tracking applied to the metaverse. The method comprises: acquiring an eye image of a user through a metaverse capture device; extracting features from the eye image frame by frame to obtain a first feature map; processing the first feature map based on a self-attention mechanism to obtain a second feature map; performing keypoint detection on the second feature map to obtain an eyeball position; and drawing the eyeball's keypoint trajectory based on the eyeball position, predicting the eyeball position of the next frame, and completing eye tracking. In this way, the eye is tracked quickly and accurately.

Description

Eyeball tracking method, device and equipment applied to the metaverse
Technical Field
Embodiments of the present application relate to the field of iris image processing, and in particular to an eyeball tracking method, apparatus, device, and computer-readable storage medium applied to the metaverse.
Background
With the continuous development of virtual reality and the emergence of the metaverse concept, many researchers have taken virtual reality technology as a hot research topic. Eye tracking is a field in which the metaverse concept can achieve an important breakthrough: applied in virtual reality scenes, and without any loss of accuracy, it can improve the processing speed of images in the scene and the user's sense of immersion, while reducing performance consumption and the user's dizziness and fatigue.
The Transformer architecture has achieved excellent performance in natural language processing tasks, and vision tasks have since explored its capability in image processing. The Transformer structure can effectively capture the degree of correlation among multiple related vectors, and the eye-tracking task is highly correlated across frames in the time dimension.
Therefore, how to better use the Transformer structure for eye tracking, so as to obtain the eyeball's correlation in the time dimension, draw its motion trajectory, and predict the eyeball position, is an urgent problem to be solved.
Disclosure of Invention
According to embodiments of the present application, an eye tracking scheme applied to the metaverse is provided.
In a first aspect of the present application, an eye tracking method applied to the metaverse is provided. The method comprises the following steps:
acquiring an eye image of a user through a metaverse capture device;
extracting features from the eye image frame by frame to obtain a first feature map;
processing the first feature map based on a self-attention mechanism to obtain a second feature map;
performing keypoint detection on the second feature map to obtain an eyeball position; and drawing the eyeball's keypoint trajectory based on the eyeball position, predicting the eyeball position of the next frame, and completing eye tracking.
Further, extracting features from the eye image frame by frame to obtain a first feature map includes:
extracting features from the eye image frame by frame through a MobileNetV2 network to obtain an eye image feature map;
detecting the eye image feature map; if the eye is closed in the current frame, performing no tracking; if the eye is open, constructing the first feature map from the eye image feature map;
wherein the first convolution of the MobileNetV2 network is an ECB convolution block.
Further, processing the first feature map based on the self-attention mechanism to obtain a second feature map includes:
if the first feature map is the feature map of a single-frame image, performing self-attention calculation on it alone to obtain the second feature map;
and if the first feature map is the feature map of a multi-frame image, performing self-attention calculation within the current frame and between the current frame and the other frames, and fusing the calculation results to obtain the second feature map.
Further, if the first feature map is the feature map of a multi-frame image, performing self-attention calculation within the current frame and between the current frame and the other frames and fusing the calculation results to obtain the second feature map includes:
flattening the feature maps of the current frame and of at least one frame before the current frame into vectors of a preset size, obtaining a vector for each feature map and an embedding matrix for each frame, wherein the image of each frame comprises n feature maps and n is a positive integer;
and position-coding the vectors in the embedding matrix according to the time sequence, and inputting the position-coded embedding matrix into a preset Transformer encoder to obtain the second feature map.
Further, inputting the position-coded embedding matrix into the preset Transformer encoder to obtain the second feature map includes:
the Transformer encoder consists of projection matrices, multi-head self-attention, a residual block, normalization, and convolution;
inputting the position-coded embedding matrix into the preset Transformer encoder and computing the Q, K, V matrices through the projection matrices;
inputting the Q, K, V matrices of the current frame and the K, V matrices of the other frames into the multi-head self-attention to obtain a first output result;
adding the first output result and the embedding matrix of the current frame in the residual block;
normalizing the sum to obtain a second output result;
and inputting the second output result into a convolution block to obtain the second feature map.
Further, the multi-head self-attention includes:
z = MSA(q_t, (k_t, k_{t-1}), (v_t, v_{t-1}))
wherein z is the multi-head self-attention calculation result; q_t, k_t and v_t respectively denote the q, k, v matrices of the t-th frame; (v_t, v_{t-1}) denotes splicing the two matrices v_t and v_{t-1}; and (k_t, k_{t-1}) denotes splicing the two matrices k_t and k_{t-1}.
Further, performing keypoint detection on the second feature map to obtain the eyeball position includes:
applying one, two and three convolutions to the second feature map, respectively, to obtain outputs at three scales;
and splicing the outputs of the three scales and applying full connection to obtain the eyeball position.
In a second aspect of the present application, an eye tracking apparatus applied to the metaverse is provided. The apparatus comprises:
an acquisition module, used for acquiring an eye image of a user through a metaverse capture device;
an extraction module, used for extracting features from the eye image frame by frame to obtain a first feature map;
a processing module, used for processing the first feature map based on a self-attention mechanism to obtain a second feature map;
a tracking module, used for performing keypoint detection on the second feature map to obtain the eyeball position, drawing the eyeball's keypoint trajectory based on the eyeball position, predicting the eyeball position of the next frame, and completing eye tracking.
In a third aspect of the present application, an electronic device is provided. The electronic device includes: a memory having a computer program stored thereon and a processor implementing the method as described above when executing the program.
In a fourth aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the method according to the first aspect of the present application.
According to the eye tracking method applied to the metaverse provided herein, an eye image of the user is acquired through the metaverse capture device; features are extracted from the eye image frame by frame to obtain a first feature map; the first feature map is processed based on a self-attention mechanism to obtain a second feature map; keypoint detection is performed on the second feature map to obtain the eyeball position; and, based on the eyeball position, the eyeball's keypoint trajectory is drawn and the eyeball position of the next frame is predicted, completing eye tracking while improving its efficiency and accuracy.
It should be understood that what is described in this summary section is not intended to limit key or critical features of the embodiments of the application, nor is it intended to limit the scope of the application. Other features of the present application will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present application will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 shows a flowchart of an eye tracking method applied to the metaverse according to an embodiment of the application;
FIG. 2 shows a schematic diagram of the structure of an ECB convolution block according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a backbone network according to an embodiment of the present application;
FIG. 4 shows a feature extraction flow diagram for an eye image according to an embodiment of the application;
FIGS. 5a to 5d show overall network structure schematics according to embodiments of the present application;
FIG. 6 shows a schematic structural diagram of a Transformer encoder according to an embodiment of the present application;
FIG. 7 shows a block diagram of an eye tracking apparatus applied to the metaverse according to an embodiment of the present application;
FIG. 8 shows a schematic structural diagram of a terminal device or a server suitable for implementing the embodiments of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described clearly and completely with reference to the drawings; it is apparent that the described embodiments are some, but not all, embodiments of the present disclosure. All other embodiments derived by a person skilled in the art from the embodiments disclosed herein without creative effort shall fall within the protection scope of the present disclosure.
In addition, the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 shows a flowchart of an eye tracking method applied to the metaverse according to an embodiment of the present disclosure. The method comprises the following steps:
S110, acquiring an eye image of the user through a metaverse capture device.
In some embodiments, images of the user's eyes are captured by a head-mounted camera, a VR device camera, and/or another metaverse capture device.
S120, extracting features from the eye image frame by frame to obtain a first feature map.
In some embodiments, the eye image is input into a feature extraction network, and features in the eye image are extracted frame by frame to obtain the first feature map.
The feature extraction network adopts MobileNetV2 as its backbone. The first convolution of the MobileNetV2 network is an ECB convolution block (edge-oriented convolution block), and the last bottleneck and all subsequent convolution, pooling and fully-connected operations of the conventional MobileNetV2 network are discarded.
Further, as shown in fig. 2, the ECB convolution block is used to guide the network to learn the edge information of the image; its output is the sum of a 3 × 3 convolution, a dilated convolution, a Sobel edge extraction and a Laplacian operator.
Specifically, as shown in fig. 3, the eye image is input into the MobileNetV2 network: the input passes through the ECB convolution block, is resized and raised in dimension by a 1 × 1 convolution, is fused with the MobileNetV2 input, and finally passes through a 3 × 3 convolution to give the output of the network block; the features then pass through several bottleneck layers, whose dimension changes reduce the number of parameters and hence the amount of computation, to obtain the eye image feature map. A sketch of the ECB block follows.
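By way of illustration, below is a minimal PyTorch sketch of such an edge-oriented convolution block; the channel sizes, the dilation rate, the single Sobel direction and the per-branch 1 × 1 projections are assumptions for the sketch rather than details fixed by this description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ECB(nn.Module):
    """Edge-oriented convolution block (cf. fig. 2): the output is the sum of a
    plain 3x3 convolution, a dilated convolution, a Sobel edge branch and a
    Laplacian branch."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.in_ch = in_ch
        self.conv3x3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.dilated = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
        # Fixed edge filters, applied depthwise and then projected to out_ch.
        sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        laplace = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("sobel", sobel.repeat(in_ch, 1, 1, 1))
        self.register_buffer("laplace", laplace.repeat(in_ch, 1, 1, 1))
        self.proj_sobel = nn.Conv2d(in_ch, out_ch, 1)
        self.proj_laplace = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edge_s = F.conv2d(x, self.sobel, padding=1, groups=self.in_ch)
        edge_l = F.conv2d(x, self.laplace, padding=1, groups=self.in_ch)
        # The four branches are added to form the block output.
        return (self.conv3x3(x) + self.dilated(x)
                + self.proj_sobel(edge_s) + self.proj_laplace(edge_l))
```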
Further, as shown in fig. 4, the eye image feature map is detected. If the eye is closed in the current frame, that is, there is no eyeball, no tracking is performed; the image is re-shot and the next frame is processed. If the eye is not closed, it is checked whether several consecutive frames exist: if so, the feature map is stored for subsequent frames, and the features of the current frame and the other (consecutive) frames are fused to obtain the first feature map; if not, the current frame's feature map is taken as the first feature map. A sketch of this gating follows.
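The gating just described can be sketched as follows; `is_eye_open` stands in for the closed-eye detection on the eye image feature map, and the buffer length of three consecutive frames is an assumption:

```python
from collections import deque

feature_buffer = deque(maxlen=3)   # feature maps of recent consecutive frames

def build_first_feature_map(frame_features, is_eye_open):
    """Returns the first feature map for the current frame, or None when the
    eye is closed and tracking is skipped (cf. the flow of fig. 4)."""
    if not is_eye_open:
        feature_buffer.clear()     # the run of consecutive frames is broken
        return None                # re-shoot the image and handle the next frame
    feature_buffer.append(frame_features)
    if len(feature_buffer) > 1:
        # Current frame plus stored frames form a multi-frame first feature map.
        return list(feature_buffer)
    return [frame_features]        # single-frame first feature map
```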
S130, processing the first feature map based on a self-attention mechanism to obtain a second feature map.
In some embodiments, if the first feature map is the feature map of a single-frame image, self-attention calculation is performed on it alone to obtain the second feature map;
and if the first feature map is the feature map of a multi-frame image, self-attention calculation is performed within the current frame and between the current frame and the other frames; after several rounds of inter-frame attention calculation, the inter-frame information is fused and the relatively important information (set according to the application scene) is extracted to obtain the second feature map.
Specifically, as shown in figs. 5a to 5d, the feature maps of the current frame (t) and of at least one frame before it (t-1, t-2) are each flattened into vectors of a preset size, giving a vector for each feature map.
Preferably, the preset size is 1 × d. From the 1 × d vector of each feature map, an n × d embedding matrix is obtained for each frame; the image of each frame comprises n feature maps, n being a positive integer.
Further, the vectors in the embedding matrix are position-coded according to the time sequence, giving an n × (d + e) embedding matrix, which is input into a preset Transformer encoder to obtain the second feature map. The structure of the Transformer encoder, as shown in fig. 6, consists of projection matrices, multi-head self-attention, a residual block, normalization, and convolution.
That is, the Q, K, V matrices are computed through the projection matrices, and the Q, K, V matrices of the current frame together with the K, V matrices of the other (consecutive) frames are input into the multi-head self-attention, giving a first output result. The first output result and the embedding matrix of the current frame are added in the residual block, the sum is normalized to obtain a second output result, and the second output result is input into a convolution block to obtain the output of the Transformer encoder.
Further, the output of the Transformer encoder is re-encoded (converted into feature-map form) according to the eyeball position to obtain the second feature map. A sketch of the embedding construction follows.
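As a sketch of the embedding construction just described (the constant per-frame position code, and concatenating the code onto the vectors rather than adding it, are assumptions; any time-ordered code, e.g. sinusoidal, would fit the description):

```python
import torch

def embed_frame(frame_maps: torch.Tensor, t: int, e: int) -> torch.Tensor:
    """Flattens the n feature maps (h x w) of frame t into an n x d embedding
    matrix (d = h * w) and appends an e-dimensional time-order position code,
    giving the n x (d + e) matrix fed to the Transformer encoder."""
    n, h, w = frame_maps.shape
    vectors = frame_maps.reshape(n, h * w)     # each feature map -> 1 x d vector
    pos = torch.full((n, e), float(t))         # position code from the time sequence
    return torch.cat([vectors, pos], dim=1)    # n x (d + e) embedding matrix
```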
In some embodiments, the multi-head self-attention comprises:
z = MSA(q_t, (k_t, k_{t-1}), (v_t, v_{t-1}))
wherein z is the multi-head self-attention calculation result; q_t, k_t and v_t respectively denote the q, k, v matrices of the t-th frame; (v_t, v_{t-1}) denotes splicing the two matrices v_t and v_{t-1}; and (k_t, k_{t-1}) denotes splicing the two matrices k_t and k_{t-1}.
Furthermore, when the multi-head self-attention is calculated, the number of spliced matrices is not limited: the first frame of the video has only one matrix and is not spliced, while other frames may be spliced with one or more matrices.
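A minimal sketch of the encoder step follows, under the assumption that PyTorch's built-in multi-head attention stands in for the multi-head self-attention above; the head count and the 1 × 1 convolution standing in for the convolution block are illustrative:

```python
import torch
import torch.nn as nn

class CrossFrameEncoder(nn.Module):
    """Sketch of the Transformer encoder of fig. 6: projection matrices produce
    Q, K, V; the current frame's Q attends to K/V spliced from the current and
    previous frames; a residual add, normalization and a convolution follow."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, cur: torch.Tensor, prev: list) -> torch.Tensor:
        # cur: (n, dim) embedding of the current frame; prev: list of (n, dim).
        q = self.q_proj(cur).unsqueeze(0)
        kv_src = torch.cat([cur] + prev, dim=0)   # splice frames along the rows
        k = self.k_proj(kv_src).unsqueeze(0)
        v = self.v_proj(kv_src).unsqueeze(0)
        out, _ = self.attn(q, k, v)               # first output result
        out = self.norm(out.squeeze(0) + cur)     # residual add + normalization
        # Convolution block applied across the embedding dimension.
        return self.conv(out.t().unsqueeze(0)).squeeze(0).t()
```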
S140, performing keypoint detection on the second feature map to obtain an eyeball position; drawing the eyeball's keypoint trajectory based on the eyeball position, predicting the eyeball position of the next frame, and completing eye tracking.
In some embodiments, keypoint detection is performed on the second feature map to obtain the eyeball position: one, two and three convolutions are applied to the second feature map, respectively, to obtain outputs at three scales, and the outputs of the three scales are spliced and passed through a fully connected layer to obtain the eyeball position. A sketch of this keypoint head follows.
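A sketch of such a keypoint head, assuming each branch keeps the spatial size and the head regresses a two-dimensional (x, y) position; both are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """The second feature map passes through one, two and three convolutions to
    give three outputs, which are spliced and fully connected to regress the
    eyeball position."""
    def __init__(self, ch: int, flat_dim: int):
        super().__init__()
        def conv():
            return nn.Conv2d(ch, ch, 3, padding=1)
        self.branch1 = conv()
        self.branch2 = nn.Sequential(conv(), conv())
        self.branch3 = nn.Sequential(conv(), conv(), conv())
        self.fc = nn.Linear(3 * flat_dim, 2)   # spliced outputs -> (x, y)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # flat_dim must equal ch * h * w of the input feature map x.
        outs = [b(x).flatten(1) for b in (self.branch1, self.branch2, self.branch3)]
        return self.fc(torch.cat(outs, dim=1))
```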
Further, the eyeball position is associated with the previous eyeball positions, the eyeball's keypoint trajectory is drawn, the eyeball position of the next frame is predicted, and eye tracking is completed. A sketch of the next-frame prediction follows.
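The description does not fix the predictor, so the sketch below uses simple constant-velocity extrapolation from the last two trajectory points as an illustrative assumption:

```python
def predict_next_position(trajectory):
    """Given the keypoint trajectory (a list of (x, y) eyeball positions in
    time order), predicts the eyeball position of the next frame."""
    if len(trajectory) < 2:
        return trajectory[-1]                # not enough history: hold position
    (x1, y1), (x2, y2) = trajectory[-2], trajectory[-1]
    return (2 * x2 - x1, 2 * y2 - y1)        # constant-velocity extrapolation
```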
According to the embodiment of the disclosure, the following technical effects are achieved:
Through the optimized MobileNetV2 network and the Transformer encoder, the eyeball's association in the time dimension is captured in the metaverse application scene, and the efficiency and accuracy of eye tracking are improved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.
Fig. 7 shows a block diagram of an eye tracking apparatus 700 applied to the metaverse according to an embodiment of the application. As shown in fig. 7, the apparatus 700 comprises:
an obtaining module 710, configured to obtain an eye image of a user through a metaverse capture device;
an extracting module 720, configured to extract features from the eye image frame by frame to obtain a first feature map;
a processing module 730, configured to process the first feature map based on a self-attention mechanism to obtain a second feature map;
a tracking module 740, configured to perform keypoint detection on the second feature map to obtain an eyeball position, draw the eyeball's keypoint trajectory based on the eyeball position, predict the eyeball position of the next frame, and complete eye tracking.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the described module may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Fig. 8 shows a schematic structural diagram of a terminal device or a server suitable for implementing the embodiments of the present application.
As shown in fig. 8, the terminal device or server 800 includes a central processing unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage section 808 as needed.
In particular, the above method flow steps may be implemented as a computer software program according to embodiments of the present application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. Wherein the designation of a unit or module does not in some way constitute a limitation of the unit or module itself.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may be separate and not incorporated into the electronic device. The computer readable storage medium stores one or more programs that, when executed by one or more processors, perform the methods described herein.
The foregoing description is merely illustrative of the preferred embodiments of the application and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the application is not limited to embodiments formed by the particular combination of the above features, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the application; for example, the above features may be replaced with (but not limited to) features having similar functions disclosed in this application.

Claims (10)

1. An eye tracking method applied to the metaverse, comprising:
acquiring an eye image of a user through a metaverse capture device;
extracting features from the eye image frame by frame to obtain a first feature map;
processing the first feature map based on a self-attention mechanism to obtain a second feature map;
performing keypoint detection on the second feature map to obtain an eyeball position; and drawing a keypoint trajectory of the eyeball based on the eyeball position, predicting the eyeball position of the next frame, and completing eye tracking.
2. The method of claim 1, wherein extracting features from the eye image frame by frame to obtain a first feature map comprises:
extracting features from the eye image frame by frame through a MobileNetV2 network to obtain an eye image feature map;
detecting the eye image feature map; if the eye is closed in the current frame, performing no tracking; if the eye is open, constructing the first feature map from the eye image feature map;
wherein the first convolution of the MobileNetV2 network is an ECB convolution block.
3. The method of claim 2, wherein processing the first feature map based on the self-attention mechanism to obtain a second feature map comprises:
if the first feature map is the feature map of a single-frame image, performing self-attention calculation on it alone to obtain the second feature map;
and if the first feature map is the feature map of a multi-frame image, performing self-attention calculation within the current frame and between the current frame and the other frames, and fusing the calculation results to obtain the second feature map.
4. The method according to claim 3, wherein, if the first feature map is the feature map of a multi-frame image, performing self-attention calculation within the current frame and between the current frame and the other frames and fusing the calculation results to obtain the second feature map comprises:
flattening the feature maps of the current frame and of at least one frame before the current frame into vectors of a preset size, obtaining a vector for each feature map and an embedding matrix for each frame, wherein the image of each frame comprises n feature maps and n is a positive integer;
and position-coding the vectors in the embedding matrix according to the time sequence, and inputting the position-coded embedding matrix into a preset Transformer encoder to obtain the second feature map.
5. The method of claim 4, wherein inputting the position-coded embedding matrix into the preset Transformer encoder to obtain the second feature map comprises:
the Transformer encoder consists of projection matrices, multi-head self-attention, a residual block, normalization, and convolution;
inputting the position-coded embedding matrix into the preset Transformer encoder and computing the Q, K, V matrices through the projection matrices;
inputting the Q, K, V matrices of the current frame and the K, V matrices of the other frames into the multi-head self-attention to obtain a first output result;
adding the first output result and the embedding matrix of the current frame in the residual block;
normalizing the sum to obtain a second output result;
and inputting the second output result into a convolution block to obtain the second feature map.
6. The method of claim 5, wherein the multi-headed self-attention comprises:
z = MSA(q_t, (k_t, k_{t-1}), (v_t, v_{t-1}))
wherein z is the multi-head self-attention calculation result; q_t, k_t and v_t respectively denote the q, k, v matrices of the t-th frame; (v_t, v_{t-1}) denotes splicing the two matrices v_t and v_{t-1}; and (k_t, k_{t-1}) denotes splicing the two matrices k_t and k_{t-1}.
7. The method according to claim 6, wherein performing keypoint detection on the second feature map to obtain the eyeball position comprises:
applying one, two and three convolutions to the second feature map, respectively, to obtain outputs at three scales;
and splicing the outputs of the three scales and applying full connection to obtain the eyeball position.
8. An eye tracking apparatus applied to the metaverse, comprising:
an acquisition module, used for acquiring an eye image of a user through a metaverse capture device;
an extraction module, used for extracting features from the eye image frame by frame to obtain a first feature map;
a processing module, used for processing the first feature map based on a self-attention mechanism to obtain a second feature map;
a tracking module, used for performing keypoint detection on the second feature map to obtain the eyeball position, drawing the eyeball's keypoint trajectory based on the eyeball position, predicting the eyeball position of the next frame, and completing eye tracking.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the computer program, implements the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210759801.0A 2022-06-30 2022-06-30 Eyeball tracking method, device and equipment applied to the metaverse Active CN114973391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210759801.0A CN114973391B (en) 2022-06-30 2022-06-30 Eyeball tracking method, device and equipment applied to the metaverse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210759801.0A CN114973391B (en) 2022-06-30 2022-06-30 Eyeball tracking method, device and equipment applied to the metaverse

Publications (2)

Publication Number Publication Date
CN114973391A true CN114973391A (en) 2022-08-30
CN114973391B CN114973391B (en) 2023-03-21

Family

ID=82966680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210759801.0A Active CN114973391B (en) 2022-06-30 Eyeball tracking method, device and equipment applied to the metaverse

Country Status (1)

Country Link
CN (1) CN114973391B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984952A (en) * 2023-03-20 2023-04-18 杭州叶蓁科技有限公司 Eye movement tracking system and method based on bulbar conjunctiva blood vessel image recognition

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105917277A (en) * 2014-01-07 2016-08-31 视瑞尔技术公司 Display device for holographic reconstruction
CN106464673A (en) * 2014-05-02 2017-02-22 诺克诺克实验公司 Enhanced security for registration of authentication devices
CN110301143A (en) * 2016-12-30 2019-10-01 英特尔公司 Method and apparatus for radio communication
CN108542404A (en) * 2018-03-16 2018-09-18 成都虚实梦境科技有限责任公司 Attention assessment method, apparatus, VR device and readable storage medium
US20210158023A1 (en) * 2018-05-04 2021-05-27 Northeastern University System and Method for Generating Image Landmarks
CN110502100A (en) * 2019-05-29 2019-11-26 中国人民解放军军事科学院军事医学研究院 Virtual reality exchange method and device based on eye-tracking
CN110378264A (en) * 2019-07-08 2019-10-25 Oppo广东移动通信有限公司 Method for tracking target and device
US20210012525A1 (en) * 2019-07-09 2021-01-14 David Kind, Inc. System and method for eyewear sizing
CN112748797A (en) * 2019-10-31 2021-05-04 Oppo广东移动通信有限公司 Eyeball tracking method and related equipment
KR20210129503A (en) * 2020-04-20 2021-10-28 연세대학교 산학협력단 Object tracking apparatus and method using self-attention
WO2022003013A1 (en) * 2020-07-03 2022-01-06 Inivation Ag Eye tracking device, eye tracking method, and computer-readable medium
CN112184852A (en) * 2020-09-10 2021-01-05 珠海格力电器股份有限公司 Auxiliary drawing method and device based on virtual imaging, storage medium and electronic device
CN114565953A (en) * 2020-11-27 2022-05-31 北京三星通信技术研究有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113255719A (en) * 2021-04-01 2021-08-13 北京迈格威科技有限公司 Target detection method, target detection device, electronic equipment and computer-readable storage medium
CN113361540A (en) * 2021-05-25 2021-09-07 商汤集团有限公司 Image processing method and device, electronic equipment and storage medium
CN113359448A (en) * 2021-06-03 2021-09-07 清华大学 Autonomous underwater vehicle track tracking control method aiming at time-varying dynamics
CN113946211A (en) * 2021-10-14 2022-01-18 网易有道信息技术(江苏)有限公司 Method for interacting multiple objects based on the metaverse and related equipment
CN114373217A (en) * 2022-01-20 2022-04-19 天津大学 High-robustness pupil positioning method
CN114494347A (en) * 2022-01-21 2022-05-13 北京科技大学 Single-camera multi-mode sight tracking method and device and electronic equipment
CN114548692A (en) * 2022-01-25 2022-05-27 浙江大学 Regional energy system multi-future scheduling optimization method and system based on the metaverse
CN114248893A (en) * 2022-02-28 2022-03-29 中国农业大学 Operation type underwater robot for sea cucumber fishing and control method thereof

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
JIEWEI YU 等: ""Error Analysis and Calibration Improvement of the Imaging Section in a Mueller Matrix Microscope"", 《APPLIED SCIENCES》 *
NAH, S. 等: ""NTIRE 2020 Challenge on Image and Video Deblurring"", 《ARXIV:2005.01244V1 [CS.CV]》 *
SHANG LI 等: ""Conference Issue: Intelligent Media Computing Technology and Applications for Mobile Internet"", 《HINDAWI WIRELESS COMMUNICATIONS AND MOBILE COMPUTING》 *
SRIVASTAVA, HARSHVARDHAN: ""Poirot at CMCL 2022 Shared Task: Zero Shot Crosslingual Eye-Tracking Data Prediction using Multilingual Transformer Models"", 《ARXIV ABS/2203.16474》 *
ZHANG Tangkun: "Research on Lightweight Environment Perception and Understanding Algorithms for Smart Devices", China Master's Theses Full-text Database (Information Science and Technology) *
LI Haifeng et al.: "Metaverse + Education: A New Form of Future Education Development Integrating the Virtual and the Real", Modern Distance Education *
GOU Chao et al.: "Advances and Prospects of Eye Tracking Research", Acta Automatica Sinica (online first) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984952A (en) * 2023-03-20 2023-04-18 杭州叶蓁科技有限公司 Eye movement tracking system and method based on bulbar conjunctiva blood vessel image recognition
CN115984952B (en) * 2023-03-20 2023-11-24 杭州叶蓁科技有限公司 Eye movement tracking system and method based on bulbar conjunctiva blood vessel image recognition

Also Published As

Publication number Publication date
CN114973391B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
WO2020199931A1 (en) Face key point detection method and apparatus, and storage medium and electronic device
US10410315B2 (en) Method and apparatus for generating image information
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
CN106664467B Method, system, medium and device for video data stream capture and summarization
CN111054080B (en) Method, device and equipment for intelligently detecting perspective plug-in and storage medium thereof
US20190138816A1 (en) Method and apparatus for segmenting video object, electronic device, and storage medium
US20230030431A1 (en) Method and apparatus for extracting feature, device, and storage medium
CN113435365B (en) Face image migration method and device
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN114973391B (en) Eyeball tracking method, device and equipment applied to the metaverse
CN111723707A (en) Method and device for estimating fixation point based on visual saliency
JP2023530796A (en) Recognition model training method, recognition method, device, electronic device, storage medium and computer program
CN115497139A (en) Method for detecting and identifying face covered by mask and integrating attention mechanism
CN117237761A (en) Training method of object re-recognition model, object re-recognition method and device
CN116664603A (en) Image processing method, device, electronic equipment and storage medium
US20230115765A1 (en) Method and apparatus of transferring image, and method and apparatus of training image transfer model
CN112287945A (en) Screen fragmentation determination method and device, computer equipment and computer readable storage medium
CN116309158A (en) Training method, three-dimensional reconstruction method, device, equipment and medium of network model
CN113591838B (en) Target detection method, device, electronic equipment and storage medium
CN112861687B (en) Mask wearing detection method, device, equipment and medium for access control system
CN113177483B (en) Video object segmentation method, device, equipment and storage medium
CN113761965B (en) Motion capture method, motion capture device, electronic equipment and storage medium
CN114463734A (en) Character recognition method and device, electronic equipment and storage medium
WO2024051690A1 (en) Image restoration method and apparatus, and electronic device
CN115908982B (en) Image processing method, model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant