WO2023028171A1 - Systems and methods for using natural gaze dynamics to detect input recognition errors - Google Patents
Systems and methods for using natural gaze dynamics to detect input recognition errors Download PDFInfo
- Publication number
- WO2023028171A1 WO2023028171A1 PCT/US2022/041415 US2022041415W WO2023028171A1 WO 2023028171 A1 WO2023028171 A1 WO 2023028171A1 US 2022041415 W US2022041415 W US 2022041415W WO 2023028171 A1 WO2023028171 A1 WO 2023028171A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- gaze
- user interface
- computer
- tracking
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 78
- 230000003993 interaction Effects 0.000 claims abstract description 58
- 230000000246 remedial effect Effects 0.000 claims abstract description 34
- 230000004434 saccadic eye movement Effects 0.000 claims description 57
- 230000008859 change Effects 0.000 claims description 25
- 238000006073 displacement reaction Methods 0.000 claims description 23
- 230000015654 memory Effects 0.000 claims description 20
- 238000010801 machine learning Methods 0.000 claims description 16
- 239000006185 dispersion Substances 0.000 claims description 11
- 230000004044 response Effects 0.000 claims description 9
- 238000012549 training Methods 0.000 claims description 9
- 238000004590 computer program Methods 0.000 claims description 6
- 238000012790 confirmation Methods 0.000 claims description 4
- 230000003287 optical effect Effects 0.000 description 29
- 230000006399 behavior Effects 0.000 description 28
- 210000001747 pupil Anatomy 0.000 description 27
- 230000033001 locomotion Effects 0.000 description 16
- 230000008569 process Effects 0.000 description 16
- 230000010076 replication Effects 0.000 description 15
- 238000001514 detection method Methods 0.000 description 14
- 238000011084 recovery Methods 0.000 description 13
- 230000009471 action Effects 0.000 description 12
- 230000000694 effects Effects 0.000 description 11
- 238000007427 paired t-test Methods 0.000 description 11
- 230000005855 radiation Effects 0.000 description 10
- 238000004422 calculation algorithm Methods 0.000 description 9
- 238000012937 correction Methods 0.000 description 9
- 238000010586 diagram Methods 0.000 description 8
- 238000002474 experimental method Methods 0.000 description 8
- 230000004424 eye movement Effects 0.000 description 8
- 230000006870 function Effects 0.000 description 8
- 238000012545 processing Methods 0.000 description 8
- 230000000007 visual effect Effects 0.000 description 8
- 238000002347 injection Methods 0.000 description 7
- 239000007924 injection Substances 0.000 description 7
- 238000012898 one-sample t-test Methods 0.000 description 7
- 238000003860 storage Methods 0.000 description 7
- 238000004458 analytical method Methods 0.000 description 6
- 210000003128 head Anatomy 0.000 description 5
- 238000005259 measurement Methods 0.000 description 5
- 238000005070 sampling Methods 0.000 description 5
- 230000003044 adaptive effect Effects 0.000 description 4
- 230000004075 alteration Effects 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 210000000554 iris Anatomy 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 238000012800 visualization Methods 0.000 description 4
- 230000003190 augmentative effect Effects 0.000 description 3
- 238000002790 cross-validation Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 210000000613 ear canal Anatomy 0.000 description 3
- 230000004438 eyesight Effects 0.000 description 3
- 239000011521 glass Substances 0.000 description 3
- 238000005286 illumination Methods 0.000 description 3
- 230000000116 mitigating effect Effects 0.000 description 3
- 238000009877 rendering Methods 0.000 description 3
- 230000009466 transformation Effects 0.000 description 3
- 239000013598 vector Substances 0.000 description 3
- 229920001621 AMOLED Polymers 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000001149 cognitive effect Effects 0.000 description 2
- 210000004087 cornea Anatomy 0.000 description 2
- 230000008878 coupling Effects 0.000 description 2
- 238000010168 coupling process Methods 0.000 description 2
- 238000005859 coupling reaction Methods 0.000 description 2
- 230000007613 environmental effect Effects 0.000 description 2
- 230000004418 eye rotation Effects 0.000 description 2
- 230000008713 feedback mechanism Effects 0.000 description 2
- 238000013383 initial experiment Methods 0.000 description 2
- 239000004973 liquid crystal related substance Substances 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 210000001525 retina Anatomy 0.000 description 2
- 238000005096 rolling process Methods 0.000 description 2
- 230000000153 supplemental effect Effects 0.000 description 2
- 238000010408 sweeping Methods 0.000 description 2
- 210000003813 thumb Anatomy 0.000 description 2
- 238000002604 ultrasonography Methods 0.000 description 2
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 1
- WHXSMMKQMYFTQS-UHFFFAOYSA-N Lithium Chemical compound [Li] WHXSMMKQMYFTQS-UHFFFAOYSA-N 0.000 description 1
- HBBGRARXTFLTSG-UHFFFAOYSA-N Lithium ion Chemical compound [Li+] HBBGRARXTFLTSG-UHFFFAOYSA-N 0.000 description 1
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 1
- 230000004308 accommodation Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 210000000988 bone and bone Anatomy 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 210000000845 cartilage Anatomy 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000019771 cognition Effects 0.000 description 1
- 230000006998 cognitive state Effects 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000013502 data validation Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 210000000720 eyelash Anatomy 0.000 description 1
- 230000004886 head movement Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 230000003155 kinesthetic effect Effects 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 229910052744 lithium Inorganic materials 0.000 description 1
- 229910001416 lithium ion Inorganic materials 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 238000002156 mixing Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 201000005111 ocular hyperemia Diseases 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 230000019612 pigmentation Effects 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 208000014733 refractive error Diseases 0.000 description 1
- 238000005067 remediation Methods 0.000 description 1
- 230000002207 retinal effect Effects 0.000 description 1
- 210000001210 retinal vessel Anatomy 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000035807 sensation Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000021317 sensory perception Effects 0.000 description 1
- 229910052710 silicon Inorganic materials 0.000 description 1
- 239000010703 silicon Substances 0.000 description 1
- 239000004984 smart glass Substances 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
- 210000003462 vein Anatomy 0.000 description 1
- 210000000707 wrist Anatomy 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
Definitions
- the present disclosure is directed to systems and methods for using natural gaze dynamics to detect input recognition errors.
- Recognition-based input techniques are growing in popularity for augmented and virtual reality applications. These techniques must distinguish intentional input actions (e.g., the user performing a free-hand selection gesture) from all other user behaviors. When this recognition fails, two kinds of system errors can occur: false positives, where the system recognizes an input action when the user did not intentionally perform one, and false negatives, where the system fails to recognize an input action that was intentionally performed by the user.
- an input system could use this information to refine its recognition model to make fewer errors in the future. Additionally, the system could assist with error recovery if it could detect the errors soon enough after they occur. This capability would be particularly compelling for false positive errors. These false positive errors may be damaging to the user experience in part due to the attentional demands/costs to the user to detect and fix them when they occur. For example, if the system were to rapidly detect a false positive, it could increase the physical salience and size of an undo button or provide an “undo” confirmation dialogue.
- Gaze may be a compelling modality for this purpose because it may provide indications of fast, real-time changes in cognitive state, it may be tightly linked with behavior and gestures, and it may be sensitive to environmental inconsistencies.
- the present disclosure may focus on false positive errors because these have been shown to be particularly costly to users. Furthermore, there may be a number of emerging techniques that may aim to assist with false negative errors, such as bi-level thresholding, which may implicitly detect false negative errors through scores that are close to the recognizer threshold and then adjust the threshold to allow users to succeed when trying the gesture a second time.
- the systems and methods of the present disclosure may be distinct in that they may focus on detecting false positive errors.
- the systems and methods may also relate to the use of gaze to detect recognizer errors, as bi-level thresholding only focuses on the signal that the gesture recognizer uses.
- a computer-implemented method comprising: tracking a gaze of a user as the user interacts with a user interface; determining, based on tracking of the gaze of the user, that a detected user interaction with the user interface represents a false positive input inference by the user interface; and executing at least one remedial action based on determining that the detected user interaction represents the false positive input inference by the user interface.
- Tracking the gaze of the user may comprise extracting at least one gaze feature from the gaze of the user as the user interacts with the user interface.
- the at least one gaze feature may comprise at least one of: a fixation duration; an angular displacement between an initial fixation centroid and a subsequent fixation centroid; an angular displacement between an initial saccade centroid and a subsequent saccade centroid; an angular displacement between an initial saccade landing point and a subsequent saccade landing point; an amplitude of a saccade; a duration of a saccade; a fixation probability; a saccade probability; a gaze velocity; or a gaze dispersion.
- Determining, based on tracking of the gaze of the user, that the detected user interaction with the user interface represents the false positive input inference by the user interface may comprise: training, using gaze features of the user, a machine learning model to discriminate between true positive events and false positive events; and analyzing the tracked gaze of the user using the trained machine learning model.
- Determining, based on tracking of the gaze of the user, that the detected user interaction with the user interface represents the false positive input inference by the user interface may comprise: training, using gaze features of a group of users, a machine learning model to discriminate between true positive events and false positive events; and analyzing the tracked gaze of the user using the trained machine learning model.
- Executing the at least one remedial action may comprise receiving, via the user interface, user input associated with the false positive input inference.
- the method may further comprise determining, based on additional tracking of the gaze of the user and the user input associated with the false positive input inference, that an additional detected user interaction with the user interface represents an additional false positive input inference by the user interface.
- Executing the at least one remedial action may comprise: determining that the detected user interaction with the user interface caused a change in an application state of an application associated with the user interface; and automatically undoing the change in the application state.
- Executing the at least one remedial action may comprise presenting a notification within the user interface that indicates that a false positive input inference has occurred.
- the notification may further indicate that the detected user interaction caused a change in an application state of an application associated with the user interface.
- the notification may further comprise a confirmation control that enables the user to confirm the detected user interaction.
- the notification may comprise an undo control.
- the method may further comprise: receiving, via the undo control of the user interface, an instruction to undo a command executed as a result of the detected user interaction; and undoing, in response to receiving the instruction to undo the command executed as a result of the detected user interaction, the command executed as a result of the detected user interaction.
- a system configured to carry out the method of the first aspect, the system comprising: at least one physical processor; a memory; a tracking module, stored in the memory, that tracks the gaze of the user as the user interacts with the user interface; a determining module, stored in the memory, that determines that the detected user interaction with the user interface represents the false positive input inference by the user interface; and an executing module, stored in the memory, that executes the at least one remedial action.
- the determining module may determine, based on tracking of the gaze of the user, that the detected user interaction with the user interface represents the false positive input inference by the user interface by: training, using gaze features of the user, a machine learning model to discriminate between true positive events and false positive events; and analyzing the tracked gaze of the user using the trained machine learning model.
- the determining module may determine, based on tracking of the gaze of the user, that the detected user interaction with the user interface represents the false positive input inference by the user interface by: training, using gaze features of a group of users, a machine learning model to discriminate between true positive events and false positive events; and analyzing the tracked gaze of the user using the trained machine learning model.
- the executing module may execute the at least one remedial action by receiving, via the user interface, user input associated with the false positive input inference.
- the determining module may further determine, based on additional tracking of the gaze of the user and the user input associated with the false positive input inference, that an additional detected user interaction with the user interface represents an additional false positive input inference by the user interface.
- the executing module may execute the at least one remedial action by: determining that the detected user interaction with the user interface caused a change in an application state of an application associated with the user interface; and automatically undoing the change in the application state.
- a non-transitory computer-readable medium comprising computer-readable instructions that, when executed by at least one processor of a computing system, cause the computing system to carry out the method of the first aspect.
- the medium may be non-transitory.
- a computer program product comprising instructions which, when the computer program is executed by a computer, cause the computer to carry out the method of the first aspect.
- FIG. 1 shows an interface view of a study task interface in accordance with some examples provided herein.
- FIG. 2 shows example timelines for tile interaction around user clicks for true positives (e.g., intentional selections of a target) and false positives (e.g., injected selections on nontarget items).
- FIG. 3A through FIG. 3C show a set of plots that visualize a variety of time series of gaze data following true positive (TP) and false positive (FP) selections and may indicate whether there was a significant difference at each time point as per paired t-tests (as described above).
- FIG. 4A through FIG. 4D show a set of plots that may include area-under-the-curve (AUC) of the Receiver Operator Characteristic (ROC) (also “AUC-ROC” herein) scores from an individual model described herein.
- FIG. 5A through FIG. 5D show a set of plots that may include AUC-ROC scores from a group model described herein.
- FIG. 6A through FIG. 6C show a set of plots that may include a number of time series of gaze features from the matched participants in an original study and a replication study described herein.
- FIG. 7 shows a plot that shows the individual model results and the group model results as described herein.
- FIG. 8A through FIG. 8C show a set of plots of individual model averaged learning curves.
- FIG. 9A through FIG. 9C show a set of plots of group model learning curves.
- FIG. 10 shows a visualization of UI changes following serial true positives and end true positives.
- FIG. 11A through FIG. 11C show a set of plots that visualize the time series of the serial true positives and end true positives for each feature.
- FIG. 12A through FIG. 12D include a set of plots of AUC-ROC scores when the group model is tested on serial true positives and end true positives.
- FIG. 13A through FIG. 13D show a set of plots of AUC-ROC scores for the matched original and replication study participants.
- FIG. 14 is a block diagram of an example system for using natural gaze dynamics to detect input recognition errors.
- FIG. 15 is a block diagram of an example implementation of a system for using natural gaze dynamics to detect input recognition errors.
- FIG. 16 is a flow diagram of an example method for using natural gaze dynamics to detect input recognition errors.
- FIG. 17 is a flow diagram of example remedial actions and/or effects on a user experience of some examples described herein.
- FIG. 18 is an illustration of example augmented-reality glasses.
- FIG. 19 is an illustration of an example virtual-reality headset.
- FIG. 20 is an illustration of an example system that incorporates an eye-tracking subsystem capable of tracking a user’s eye(s).
- FIG. 21 is a more detailed illustration of various aspects of the eye-tracking subsystem illustrated in FIG. 20.
- FIG. 1 shows an interface view 100 of a study task interface. The study task involved uncovering and selecting target items using a ray-cast pointer.
- the pointer was enabled whenever participants rested their thumb on a touchpad of the HMD controller. On each “page”, six randomly selected tiles in a 3 x 3 grid were enabled. The user was instructed to search for a specified number of a target item (e.g., “Select 2 x green circles”). To reveal the contents of an enabled tile, the user was required to dwell on the tile for 1.25 seconds. During the dwell period, a radial progress indicator progressively filled. Once the dwell time was completed, the tile flipped to reveal one of six icons (e.g., a green circle, a red heart, an orange triangle, a yellow star, a blue moon, or a purple plus).
- the user was directed to select the tile by briefly breaking and then re-engaging contact between the user’s thumb and the controller’s touchpad. If the tile was not selected within 1.0 seconds, the tile closed automatically. If selected, the tile would close 0.5 seconds following the click.
- the tile would be given a blue border, the ray-cast pointer would change to yellow, and a click sound would occur.
- a 1.0 second lockout was imposed following a click.
- the ray-cast pointer would temporarily change to grey to communicate the lockout state.
- FIG. 2 shows a set of timelines 200 that indicate timelines for tile interaction around user clicks for true positives (e.g., intentional selections of a target) and false positives (e.g., injected selections on non-target items).
- the system occasionally injected false positive errors when a user uncovered a non-target icon.
- a click was injected at a randomly selected time between 0.2 seconds and 0.5 seconds after the tile was opened or at the moment when the user’s ray-cast pointer left the tile, whichever occurred first.
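- A minimal sketch of this injection logic is shown below for illustration only; the function names and the simple polling structure are assumptions and are not part of the disclosure.

```python
import random

def schedule_injected_click(tile_open_time_s):
    """Return the time at which an injected (false positive) click would fire,
    absent the pointer leaving the tile earlier: a random offset of 0.2-0.5 s."""
    return tile_open_time_s + random.uniform(0.2, 0.5)

def maybe_inject_click(now_s, scheduled_time_s, pointer_on_tile, already_injected):
    """Fire the injected click at the scheduled time or as soon as the ray-cast
    pointer leaves the tile, whichever occurs first."""
    if already_injected:
        return False
    return now_s >= scheduled_time_s or not pointer_on_tile
```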
- the non-target item would appear selected, and the click feedback would occur.
- the user was required to first re-open the tile and then click to de-select it.
- the system prevented the user from opening any other tiles until the error was corrected.
- Each participant experienced 12 “blocks” of the task described above, each consisting of 60 tile openings over a number of trials. Across all tile openings in a block, approximately 50% revealed target items, and the rest revealed a randomly selected non-target item; a total of 9 false positives were injected (9/60 trials, or 15% of the time). Before the start of each block, the icon to be used as the target item was communicated to the participant (e.g., “The target item for this block is the circle”). The order of the different target items was counterbalanced across participants using a balanced Latin square.
- the first step of pre-processing the gaze data involved transforming the 3D gaze vectors from the eye-in-head frame of reference to an eye-in-world direction using head orientation.
- Gaze data were then filtered to remove noise and unwanted segments before event detection and feature extraction. Data from the practice trials and breaks were discarded prior to analysis, and all gaze samples where gaze velocity exceeded 800 degrees per second were removed, as such velocities indicate unfeasibly fast eye movements. All missing values were then replaced through interpolation. Finally, a median filter with a width of seven samples was applied to the gaze velocity signal to smooth it and account for noise prior to event detection.
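- By way of illustration only, the filtering described above might be implemented as follows; the function name, the assumption that per-sample gaze velocity has already been computed, and the use of SciPy are not part of the disclosure.

```python
import numpy as np
from scipy.signal import medfilt

def preprocess_gaze_velocity(velocity_dps, max_velocity_dps=800.0, median_width=7):
    """Drop physiologically implausible samples, interpolate the gaps, and
    median-filter the gaze velocity signal prior to event detection."""
    v = np.asarray(velocity_dps, dtype=float).copy()
    v[v > max_velocity_dps] = np.nan                 # unfeasibly fast eye movements
    idx = np.arange(len(v))
    valid = ~np.isnan(v)
    v = np.interp(idx, idx[valid], v[valid])         # replace missing values by interpolation
    return medfilt(v, kernel_size=median_width)      # seven-sample median filter
```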
- I-VT saccade detection was performed on the filtered gaze velocity by identifying consecutive samples that exceeded 700 degrees per second. A minimum duration of 17 ms and a maximum duration of 200 ms were enforced for saccades. I-DT fixation detection was performed by computing dispersion over time windows as the largest angular displacement from the centroid of gaze samples. Time windows where dispersion did not exceed 1 degree were marked as fixations. A minimum duration of 50 ms and a maximum duration of 1.5 s were enforced for fixations.
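- The thresholds and durations above correspond to standard velocity-threshold (I-VT) and dispersion-threshold (I-DT) event detection. The sketch below is one possible reading of those steps; it assumes gaze directions are unit vectors sampled at a known rate and is illustrative rather than the claimed implementation.

```python
import numpy as np

def detect_saccades_ivt(velocity_dps, fs_hz, threshold_dps=700.0,
                        min_dur_s=0.017, max_dur_s=0.200):
    """I-VT: runs of consecutive samples above the velocity threshold, kept
    only if their duration falls within [min_dur_s, max_dur_s]."""
    above = np.asarray(velocity_dps) > threshold_dps
    saccades, start = [], None
    for i, flag in enumerate(np.append(above, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            dur = (i - start) / fs_hz
            if min_dur_s <= dur <= max_dur_s:
                saccades.append((start, i))
            start = None
    return saccades

def detect_fixations_idt(gaze_dirs, fs_hz, dispersion_deg=1.0,
                         min_dur_s=0.050, max_dur_s=1.5):
    """I-DT: grow a window while the largest angular displacement from the
    window centroid stays below the dispersion threshold."""
    def dispersion(window):
        c = window.mean(axis=0)
        c /= np.linalg.norm(c)
        cosines = np.clip(window @ c / np.linalg.norm(window, axis=1), -1.0, 1.0)
        return np.degrees(np.arccos(cosines)).max()

    fixations, i, n = [], 0, len(gaze_dirs)
    min_len = int(min_dur_s * fs_hz)
    while i + min_len <= n:
        j = i + min_len
        if dispersion(gaze_dirs[i:j]) > dispersion_deg:
            i += 1
            continue
        while j < n and (j - i) / fs_hz <= max_dur_s and dispersion(gaze_dirs[i:j + 1]) <= dispersion_deg:
            j += 1
        fixations.append((i, j))
        i = j
    return fixations
```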
- the inventors explored at least 10 total features including, without limitation, fixation duration, the angular displacement between fixation centroids, the angular displacement between the current and previous saccade centroids, the angular displacement between the current and previous saccade landing points, saccade amplitude, saccade duration, fixation probability, saccade probability, gaze velocity, and dispersion.
- both fixation durations and the distance between fixations and targets may be affected by incongruent scene information. Therefore, the inventors opted to look at fixation durations and the angular displacement between the current and previous fixation centroid.
- the angular displacement between fixation centroids may be related to how far the eyes move from fixation to fixation (i.e., saccades). The inventors therefore also looked at several saccade features: the angular displacement between the current and previous saccade centroid, the angular displacement between the current and previous saccade landing points, saccade amplitude, and saccade duration.
- the dispersion algorithm requires a time parameter that indicates the amount of gaze data to be included in the computation. In some examples, this time parameter may be set to 1000 ms.
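- For illustration, the displacement and dispersion features above reduce to angular distances between gaze-direction vectors. A minimal sketch follows, assuming unit gaze-direction vectors as input; the function names are illustrative only.

```python
import numpy as np

def angular_displacement_deg(a, b):
    """Angle, in degrees, between two unit gaze-direction vectors (e.g., the
    centroids of the previous and current fixation, or two saccade landing points)."""
    return np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))

def gaze_dispersion_deg(gaze_dirs):
    """Largest angular displacement of any sample from the centroid of the gaze
    directions within the time window (e.g., 1000 ms of data)."""
    centroid = gaze_dirs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    return max(angular_displacement_deg(g, centroid) for g in gaze_dirs)
```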
- the inventors conducted a statistical analysis over the time series. To do so, the inventors computed the average value for each feature at each time point for each participant. The inventors then statistically compared each time point via a paired t-test to determine which points in time were statistically different for each feature. All 36 time points, from 17 ms to 600 ms following selections, were used, resulting in 36 paired t-tests for each feature. The false discovery rate (FDR) correction was used to control for multiple comparisons across the 36 time points for each feature.
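- One way to run this per-time-point comparison, assuming per-participant feature averages have been arranged into participants-by-time-point matrices, is sketched below using SciPy and statsmodels; the function and argument names are assumptions, not the disclosed implementation.

```python
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

def compare_feature_timeseries(tp_means, fp_means, alpha=0.05):
    """Paired t-test at each time point (rows: participants, columns: the 36
    time points), with Benjamini-Hochberg FDR correction across time points."""
    pvals = np.array([ttest_rel(tp_means[:, t], fp_means[:, t]).pvalue
                      for t in range(tp_means.shape[1])])
    significant, p_corrected, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return significant, p_corrected
```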
- the inventors trained and tested a set of logistic regression models. Importantly, to explore how quickly a system might detect a false positive error, the inventors trained models with varying time durations following the selection event, which the inventors refer to as the lens approach.
- the inventors used gaze data following the selection event (true and false) from 50 ms to 600 ms in 50 ms bins (i.e., a total of 12 lens sizes). The inventors set 600 ms as the maximum time used since this was the average amount of time it took to select a new tile following a true selection.
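- A sketch of this lens approach is shown below, assuming feature matrices have already been assembled per lens size; the 5-fold cross-validation and function names are assumptions made for illustration and are not specified by the disclosure.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def lens_model_scores(features_by_lens, labels, cv=5):
    """Train one logistic regression per lens size (50-600 ms in 50 ms bins) on
    gaze features following each selection; labels are 1 for injected false
    positives and 0 for true selections."""
    scores = {}
    for lens_ms, X in features_by_lens.items():      # e.g., {50: X_50, 100: X_100, ...}
        probs = cross_val_predict(LogisticRegression(max_iter=1000), X, labels,
                                  cv=cv, method="predict_proba")[:, 1]
        scores[lens_ms] = roc_auc_score(labels, probs)
    return scores
```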
- the inventors only used true selections that were followed by another selection and eliminated true selections that occurred at the end of a trial since true selections at the end of the trial were followed by unique graphical visualizations (i.e., shuffling of tiles) rather than the standard selection feedback, which might elicit different gaze behaviors.
- each gaze sample within the lens corresponded to an eventual beta parameter (i.e., a predictor) in the model.
- Model performance for prediction was measured using the area-under-the-curve (AUC) of the Receiver Operator Characteristic (ROC).
- the first set of models were trained and tested for each individual, which allowed the models to represent individual differences in gaze features. Individual models were trained and tested on each participant's own data.
- Group models were used to determine whether the gaze behaviors that differentiate true selections from false positives are in fact consistent across people. Group models were trained using leave-one-participant-out cross-validation: models were trained on N-1 datasets and tested on the left-out dataset.
- Any comparison of the AUC-ROC value at each lens size to chance (0.5) was conducted using a one-sample t-test. Any comparisons of two AUC-ROC values for a given lens size were conducted using paired t-tests. The false discovery rate (FDR) correction was used to control for multiple comparisons across the lens sizes for each feature.
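- The group-model evaluation and the comparison against chance might be implemented as sketched below; the data layout (per-participant feature matrices and label vectors) and function names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from statsmodels.stats.multitest import multipletests

def group_model_auc(per_participant_X, per_participant_y):
    """Leave-one-participant-out: train on N-1 participants, test on the
    held-out participant, and collect one AUC-ROC per held-out participant."""
    aucs = []
    for held_out in per_participant_X:
        X_train = np.vstack([X for p, X in per_participant_X.items() if p != held_out])
        y_train = np.concatenate([y for p, y in per_participant_y.items() if p != held_out])
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        probs = model.predict_proba(per_participant_X[held_out])[:, 1]
        aucs.append(roc_auc_score(per_participant_y[held_out], probs))
    return np.array(aucs)

def aucs_above_chance(aucs_by_lens, alpha=0.05):
    """One-sample t-test of AUC-ROC against chance (0.5) at each lens size,
    FDR-corrected across lens sizes."""
    pvals = np.array([ttest_1samp(aucs, 0.5).pvalue for aucs in aucs_by_lens])
    reject, p_corr, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return reject, p_corr
```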
- FIGS. 3A through 3C show a set of plots 300 (e.g., plot 300(A) through plot 300(J)) that visualize a variety of time series of gaze data following true positive (TP) and false positive (FP) selections and may indicate whether there was a significant difference at each time point as per paired t-tests (as described above).
- FIGS. 3A through 3C visualize the time series of the fixation features, saccade features, and continuous features following the true positive (dashed line) and false positive (dashed and dotted line) selections. Brackets correspond to the points in the time series that were significantly different from each other per paired t-tests. Error bands correspond to one standard error of the mean.
- FIGS. 4A through 4D show a set of plots that may include AUC-ROC scores from the individual model.
- Plot 400 in FIG. 4A shows the AUC-ROC values for each lens size when considering all features at each lens size in the individual model.
- Plots 410 (e.g., plot 410(A) through plot 410(J)) in FIG. 4B through FIG. 4D show the AUC-ROC values for each individual feature at each lens size.
- Error bars refer to confidence intervals.
- the experimental results support a hypothesis that there are general gaze features that can discriminate between true selections and false positives across many participants. If a group model is effective even on a held-out participant, it may indicate that there are general patterns of gaze and that the general model can be useful even for entirely new users. If this is the case, then it is likely that a group model of gaze could be used as a cold start model in a system that is not yet personalized. As with the individual models, the inventors tested whether group models could detect errors above chance when considering individual features and when considering all gaze features.
- FIGS. 5A through 5D show a set of plots that may include AUC-ROC scores from the group model.
- Plot 500 in FIG. 5A shows the AUC-ROC values for each lens size when considering all features at each lens size in the group model.
- Plots 510 (e.g., plot 510(A) through plot 510(J)) in FIG. 5B through FIG. 5D show the AUC-ROC values for each individual feature at each lens size.
- Error bars refer to confidence intervals.
- One potential confound in the initial experiment may have been the method by which errors were injected. Specifically, errors were injected randomly between 200 and 500 ms after a tile opened, or when the participant's cursor left the bounds of the tile. This latter criterion could perhaps introduce a confound because the false positive errors were more likely to occur during hand motion, which the inventors know to correlate with gaze motion. To address this potential concern, the inventors reran the experiment without this contingency; instead, the inventors randomly injected false positives based upon time alone (200 to 500 ms after a tile opened).
- FIGS. 6A through 6C show a set of plots 600 (e.g., plot 600(A) through plot 600(J)) that may include a plurality of time series of gaze features from the matched participants in the original and replication studies.
- the plot visualizes the time series of the fixation features, saccade features, and continuous features from the matched participants from the original and replication studies.
- time series corresponding to true positive selections are visualized from the original study (dashed line with speckled fill pattern in error area/bands) and the replication (dotted-and-dashed line with downward diagonal fill pattern in error area/bands) as well as adaptive false positives from the original study (dashed line with upward diagonal fill pattern in error area/bands) and time-based false positives from the replication (dotted line with grid fill pattern in error area/bands).
- Error areas/bands correspond to one standard error of the mean.
- model performance may differ between individual and group models.
- the inventors also compared the performance of the group model and the individual model for each participant. This was a useful comparison to determine whether a group model could be used as a cold start model in a system that had not yet been personalized. The inventors did this for the group and individual models containing all features for simplicity.
- FIG. 7 shows a plot 700 that shows the individual model results and the group model results. As shown in plot 700, overall, paired t-tests at each lens size show no significant difference between the group model and the individual model using the FDR correction (ps > 0.05).
- FIGS. 8A through 8C show a set of plots 800 (e.g., plot 800(A) through plot 800(L)) of individual model averaged learning curves.
- FIGS. 9A through 9C show a set of plots 900 (e.g., plot 900(A) through plot 900(L)) of group model learning curves.
- the results showed that the group model had enough data but that the individual models would benefit from having more data. This suggests that although there was no significant difference in model performance between the group and individual models, the individual models would likely perform better than the group models if there was sufficient data to train the model.
- the lens model may be resilient to UI changes and task changes following TP selections.
- An additional follow-up analysis tested whether the inventors' model was resilient to changes in the user interface (UI) and task following true positive selections. This was important to test because it could be the case that the inventors' model learned behaviors that were specific to the UI and task rather than behaviors that were general across UIs and tasks.
- FIG. 10 shows a visualization 1000 of UI changes following serial true positives and end true positives. As shown, following serial true positives, there was no change in the user interface as people selected a new tile. Following end true positives, however, the user interface changed, as tiles shuffled to indicate a new trial was going to occur.
- FIGS. 11A through 11C show a set of plots 1100 (e.g., plot 1100(A) through plot 1100(J)) that visualize the time series of the serial true positives and end true positives for each feature.
- the plots visualize the time series of the fixation features, saccade features, and continuous features following serial true positive selections (dashed line with speckled fill pattern in error area/bands), end true positive selections (dotted-and-dashed line with downward diagonal fill pattern in error area/band), and false positive selections (dashed line with upward diagonal fill pattern in error area/bands). Error areas/bands correspond to one standard error of the mean. Overall, the relationship between end true positives and false positives was similar to that of the serial true positives and false positives.
- the fixation duration model performed better on end true positives than serial true positives for all time points (FDR ps < 0.05) except for 600 ms (FDR ps > 0.05) according to a paired t-test. This is likely because end true positives appeared more separable from false positives than serial true positives. For the angular displacement between the previous and current fixation centroid, the model performed significantly better on end true positives than serial true positives at lens size 200 (FDR ps < 0.05). Conversely, the model was better able to separate serial true positives from false positives at lens sizes 350, 400, and 450 (FDR ps < 0.05).
- FIGS. 12A through 12D include a set of plots of AUC-ROC scores when the group model is tested on serial true positives and end true positives.
- Plot 1200 shows the AUC-ROC values when the group model (that has only seen serial true positives) is tested on serial true positives and end true positives at each lens size when considering all features in the group model.
- Plots 1210 (e.g., plot 1210(A) through plot 1210(J)) show the AUC-ROC values for the serial true positives and the end true positives for each individual feature at each lens size. Error bars refer to confidence intervals.
- One group model was trained using the matched original study participants and a second was trained using the replication study participants at each lens size. Each of these models was tested using leave-one-out cross-validation. The inventors then compared the resulting AUC-ROC values for group models that were trained on individual features and group models that were trained on all features.
- FIGS. 13A through 13D show a set of plots of AUC-ROC scores for the matched original and replication study participants.
- Plot 1300 in FIG. 13A shows the AUC-ROC values for the matched original and replication participants when considering all features simultaneously.
- Plot 1310(A) through plot 1310(J), included in FIG. 13B through FIG. 13D, show the AUC-ROC values for the original and replication studies for each feature at each lens size. Error bars refer to confidence intervals.
- the AUC-ROC scores for each feature at each lens size were significantly greater than chance (FDR ps < 0.05) except for fixation durations at 50 ms and saccade durations from 100 to 450 ms in the replication when considering one-sample t-tests (FDR ps > 0.05).
- the inventors discovered that gaze features varied consistently following true selection events versus system-generated input errors. In fact, using gaze features alone, a simple machine learning model was able to discriminate true selections from false ones, demonstrating the potential utility of gaze for error detection. Importantly, the inventors found that the model could detect errors almost immediately (e.g., at 50 ms, 0.63 AUC-ROC), and that decoding performance increased as time continued (e.g., at 550 ms, 0.81 AUC-ROC). The model performance peaked between 300 ms and 550 ms, which suggests that systems might be able to leverage gaze dynamics to detect potential errors and provide low-friction error mediation.
- the model is capturing two types of signals as they relate to true and false selections.
- gaze behaviors that occur immediately after a selection reflect attention (or lack thereof) to the selected target. These behaviors occur within milliseconds of selection as evidenced by H1 (FIGS. 3A through 3C).
- the inventors’ model is likely capturing gaze behaviors related to noticing the error, which likely reflect attention to feedback and recognition of the need to reorient to the target to correct the error. These can be seen at later time frames in the figures provided herein (e.g., 300 ms to 450 ms).
- the inventors’ findings make intuitive sense with how users orient their gaze following true selections and false positive errors across interaction tasks. Indeed, the inventors' tile task mimics a broad class of situations (e.g., photo selection, movie selection, typing on calculator) where false positives occur in practice.
- a user might have focused attention on an interface element (e.g., a movie preview) but decided not to interact with it (e.g., select the movie).
- errors occur as their gaze is mid-flight to another selection (e.g., a movie is falsely selected). Once they receive feedback (e.g., the movie starts playing), they must reorient their gaze back to the erroneously selected target. While the inventors' study provided the first proof-of-concept that gaze is sensitive to errors and needs to be confirmed with future work, the pattern of behaviors observed leads us to believe that this pattern would generalize to new tasks and interfaces.
- gaze is sensitive to user input following a selection. Because gaze is sensitive to the onset and offset of intentional user input, this suggests that by treating user behaviors continuously (e.g., capturing user behavior before, during, and after an event), systems may produce stronger model performance than if they treat these behaviors as a one-off event.
- FIG. 14 is a block diagram of an example system 1400 for using natural gaze dynamics to detect input recognition errors.
- example system 1400 may include one or more modules 1402 for performing one or more tasks.
- modules 1402 may include a tracking module 1404 that tracks a gaze of a user as the user interacts with a user interface (e.g., user interface 1440, described below).
- Example system 1400 may also include a determining module 1406 that determines, based on tracking of the gaze of the user, that a detected user interaction with the user interface represents a false positive input inference by the user interface.
- example system 1400 may also include an executing module 1408 that may execute at least one remedial action based on determining that the detected user interaction represents the false positive input inference by the user interface.
- example system 1400 may also include one or more memory devices, such as memory 1420.
- Memory 1420 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or computer- readable instructions.
- memory 1420 may store, load, and/or maintain one or more of modules 1402.
- Examples of memory 1420 include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Hard Disk Drives (HDDs), Solid-State Drives (SSDs), optical disk drives, caches, variations or combinations of one or more of the same, or any other suitable storage memory.
- example system 1400 may also include one or more physical processors, such as physical processor 1430.
- Physical processor 1430 generally represents any type or form of hardware-implemented processing unit capable of interpreting and/or executing computer-readable instructions.
- physical processor 1430 may access and/or modify one or more of modules 1402 stored in memory 1420. Additionally or alternatively, physical processor 1430 may execute one or more of modules 1402 to facilitate using natural gaze dynamics to detect input recognition errors.
- Examples of physical processor 1430 include, without limitation, microprocessors, microcontrollers, central processing units (CPUs), Field-Programmable Gate Arrays (FPGAs) that implement softcore processors, Application-Specific Integrated Circuits (ASICs), portions of one or more of the same, variations or combinations of one or more of the same, or any other suitable physical processor.
- CPUs central processing units
- FPGAs Field-Programmable Gate Arrays
- ASICs Application-Specific Integrated Circuits
- example system 1400 may also include a user interface 1440 with an interface element 1442. As described herein, example system 1400 may track a gaze of a user as a user interacts with user interface 1440 and/or user interface element 1442.
- User interface 1440 may include and/or represent any suitable user interface including, without limitation, a graphical user interface, an auditory computer interface, a tactile user interface, and so forth.
- Many other devices or subsystems may be connected to system 1400 in FIG. 14. Conversely, all of the components and devices illustrated in FIG. 14 need not be present to practice the examples described and/or illustrated herein.
- the devices and subsystems referenced above may also be interconnected in different ways from those shown in FIG. 14.
- System 1400 may also employ any number of software, firmware, and/or hardware configurations.
- one or more of the examples disclosed herein may be encoded as a computer program (also referred to as computer software, software applications, computer-readable instructions, and/or computer control logic) on a computer-readable medium.
- Example system 1400 in FIG. 14 may be implemented in a variety of ways.
- example system 1400 may represent portions of an example system 1500 (“system 1500”) in FIG. 15.
- system 1500 may include a computing device 1502.
- computing device 1502 may be programmed with one or more of modules 1402.
- one or more of modules 1402 from FIG. 14 may, when executed by computing device 1502, enable computing device 1502 to track a gaze of a user as the user interacts with a user interface.
- tracking module 1404 may cause computing device 1502 to track (e.g., via an eye tracking subsystem 1508) a gaze (e.g., 1504) of a user (e.g., user 1506) as the user interacts with a user interface (e.g., user interface 1440).
- tracking module 1404 may track the gaze of the user by extracting at least one gaze feature (e.g., gaze feature 1510) from the gaze of the user.
- determining module 1406 may cause computing device 1502 to determine, based on tracking of the gaze of the user, that a detected user interaction with the user interface (e.g., detected user interaction 1512) represents a false positive input inference (e.g., “false positive 1514” in FIG. 15) by the user interface.
- executing module 1408 may cause computing device 1502 to execute at least one remedial action (e.g., remedial action 1516) based on determining that the detected user interaction represents the false positive input inference by the user interface.
- Computing device 1502 generally represents any type or form of computing device capable of reading and/or executing computer-executable instructions. Examples of computing device 1502 may include, without limitation, servers, desktops, laptops, tablets, cellular phones (e.g., smartphones), personal digital assistants (PDAs), multimedia players, embedded systems, wearable devices (e.g., smart watches, smart glasses, etc.), gaming consoles, combinations of one or more of the same, or any other suitable computing device.
- computing device 1502 may be a computing device programmed with one or more of modules 1402. All or a portion of the functionality of modules 1402 may be performed by computing device 1502. As will be described in greater detail below, one or more of modules 1402 from FIG. 14 may, when executed by at least one processor of computing device 1502, enable computing device 1502 to use natural gaze dynamics to detect input recognition errors.
- FIG. 14 Many other devices or subsystems may be connected to system 1400 in FIG. 14 and/or system 1500 in FIG. 15. Conversely, all of the components and devices illustrated in FIGS. 14 and 15 need not be present to practice the examples described and/or illustrated herein.
- the devices and subsystems referenced above may also be interconnected in different ways from those shown in FIG. 15.
- Systems 1400 and 1500 may also employ any number of software, firmware, and/or hardware configurations.
- one or more of the examples disclosed herein may be encoded as a computer program (also referred to as computer software, software applications, computer-readable instructions, and/or computer control logic) on a computer-readable medium.
- FIG. 16 is a flow diagram of an example computer-implemented method 1600 for using natural gaze dynamics to detect input recognition errors.
- the steps shown in FIG. 16 may be performed by any suitable computer-executable code and/or computing system, including system 1400 in FIG. 14 and/or variations or combinations thereof.
- each of the steps shown in FIG. 16 may represent an algorithm whose structure includes and/or is represented by multiple sub-steps, examples of which will be provided in greater detail below.
- one or more of the systems described herein may track a gaze of a user as the user interacts with a user interface.
- tracking module 1404 in FIG. 14 may, as part of computing device 1502 in FIG. 15, cause computing device 1502 to track gaze 1504 of user 1506 as user 1506 interacts with user interface 1440.
- Tracking module 1404 may track gaze 1504 in any suitable way, such as via an eye tracking subsystem 1508. Additional explanations, examples, and illustrations of eye tracking subsystems will be provided below in reference to FIGS. 20 and 21.
- one or more of the systems described herein may determine, based on tracking of the gaze of the user, that a detected user interaction with the user interface represents a false positive input inference by the user interface. For example, determining module 1406 in FIG. 14 may, as part of computing device 1502 in FIG. 15, cause computing device 1502 to determine, based on tracking of the gaze of the user (e.g., by tracking module 1404 and/or eye tracking subsystem 1508), that detected user interaction 1512 with user interface 1440 represents a false positive input inference 1514 by user interface 1440.
- Determining module 1406 may determine that detected user interaction 1512 represents a false positive input inference 1514 in a variety of contexts. For example, as described above in reference to FIGS. 1-14, one or more of modules 1402 may extract at least one gaze feature from tracking data generated by tracking module 1404 (e.g., via eye tracking subsystem 1508).
- a gaze feature may include, without limitation, a fixation duration, an angular displacement between an initial fixation centroid and a subsequent fixation centroid, an angular displacement between an initial saccade centroid and a subsequent saccade centroid, an angular displacement between an initial saccade landing point and a subsequent saccade landing point, an amplitude of a saccade, a duration of a saccade, a fixation probability, a saccade probability, a gaze velocity, a gaze dispersion, and so forth.
- Determining module 1406 may use gaze features of user 1506 and/or gaze features of a group of users to train a machine learning model to discriminate between true positive events and false positive events in any of the ways described herein, such as those disclosed above in reference to FIGS. 1-14. Determining module 1406 may further analyze the tracked gaze of user 1506 using the trained machine learning model in any of the ways described herein, such as those disclosed above in reference to FIGS. 1-14. This may enable determining module 1406 to determine that a detected user interaction with a user interface (e.g., detected user interaction 1512) represents a false positive input inference (e.g., false positive input inference 1514).
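- As a hypothetical illustration of how determining module 1406 might apply such a trained model at run time, the following sketch scores the gaze features collected in a lens window following a detected selection; the threshold value and function name are assumptions, not a disclosed implementation.

```python
def detect_false_positive(model, gaze_window_features, threshold=0.5):
    """Score the gaze features gathered in the lens window following a detected
    selection and flag the selection as a likely false positive input inference
    when the model's predicted probability exceeds the threshold."""
    prob_fp = model.predict_proba([gaze_window_features])[0, 1]
    return prob_fp >= threshold, prob_fp
```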
- one or more of the systems described herein may execute at least one remedial action based on determining that the detected user interaction represents the false positive input inference by the user interface.
- executing module 1408 in FIG. 14 may execute remedial action 1516 based on determining (e.g., by determining module 1406) that detected user interaction 1512 represents a false positive input inference (e.g., false positive 1514) by user interface 1440.
- Executing module 1408 may execute a variety of remedial actions in a variety of contexts.
- the capability to detect when a gesture recognizer (e.g., tracking module 1404, user interface 1440, etc.) has made a false positive input inference may enable interactive mediation techniques that assist the user with error recovery.
- when a false positive detected user interaction occurs as a user interacts with a user interface, the false positive may result in unintended input being provided to the system.
- where the system is configured to provide feedback associated with user input (e.g., visual feedback, haptic feedback, auditory feedback, etc.), the system may provide such feedback in response to the false positive.
- input resulting from the false positive may cause one or more changes to a state of an application associated with the user interface (e.g., selecting an item the user did not intend to select).
- Executing module 1408 may execute one or more remedial actions to aid a user in error recovery.
- error recovery may include cognitive and behavioral actions that a user must take in response to the consequences of an unintended input. For example, in the case where a false positive causes an item to be selected, the user may recover by identifying that an item has been unintentionally selected and de-selecting that item. In the case where no change to application state has occurred, error recovery may involve the user confirming that the unintended input did not change the application state.
- a first step to error recovery for the user may be to notice that the error has occurred, and to understand whether and what changes to application state have been made as a result of the unintended input.
- Executing module 1408 may execute one or more remedial actions to aid the user by indicating that a false positive error may have occurred and highlighting any changes to an application state that may have resulted from the associated input to the system. For example, in a system where the user can select items, executing module 1408 may provide a glowing outline around recently selected objects, which may fade after a short period of time.
- executing module 1408 may provide an indication that no change to application state has occurred as a result of a possible gesture false positive error. This may help the user confirm that the input did not make any changes and remove any need for the user to confirm this by inspecting the interface for changes.
- In some examples, where an input resulting from a false positive has resulted in changes to the application state, executing module 1408 may facilitate the user in reversing these changes. For example, executing module 1408 may display, within user interface 1440, a prominent button that, when interacted with by user 1506, may cause executing module 1408 to undo the change. Likewise, an undo action could be mapped to a micro-gesture or an easy-to-access button on an input device.
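- A hypothetical sketch of how executing module 1408 might dispatch these remedial actions is shown below; the DetectedSelection fields and the ui methods are invented for illustration and do not correspond to a disclosed API.

```python
from dataclasses import dataclass

@dataclass
class DetectedSelection:
    changed_app_state: bool
    undo_command: callable        # reverses the change when invoked

def remediate(selection: DetectedSelection, ui):
    """Illustrative remedial-action dispatch: highlight the change and offer an
    undo control when application state changed; otherwise reassure the user
    that the possible unintended input made no changes."""
    if selection.changed_app_state:
        ui.highlight_recent_change(fade_after_s=2.0)              # e.g., glowing outline that fades
        ui.show_prominent_undo_button(on_press=selection.undo_command)
    else:
        ui.notify("Possible unintended input detected; no changes were made.")
```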
- Recovery facilitation techniques can provide benefit by providing more consistent means of reversing unintended results caused by false positive detected user interaction errors (e.g., the same method, across many system actions), and also by making the recovery action easier to perform (e.g., an ‘Undo’ button on a delete file operation, in place of a multi-action process of navigating to the Recycle Bin, locating the deleted file, and clicking Restore).
- executing module 1408 may automatically reverse the changes to the application state on behalf of the user.
- automatic recovery operations may include and/or employ the previous techniques of notification and recovery facilitation. This may avoid, mitigate, or resolve some challenges that such automatic recovery operations may introduce.
- one or more of modules 1402 may further incorporate information on the user's behavior over longer time scales to aid in detection and/or remediation of input errors.
- information on the user's behavior surrounding an unintended input may stand out as clearly distinct from the user's other behavior.
- One or more of the systems described herein may use this ‘semantic’ information on the user's actions along with gaze information to produce a more holistic model of user actions and to determine whether detected user interactions represent false positives.
- One or more of modules 1402 (e.g., tracking module 1404, determining module 1406, executing module 1408, etc.) may perform the operations illustrated in FIG. 17, which includes a flow diagram 1700 that illustrates example remedial actions and/or effects on a user experience of an automatic error recovery operation.
- In this example, a user interface (e.g., user interface 1440) may recognize or receive a click gesture (e.g., detected user interaction 1512), register that a click has occurred, and change an application state.
- flow diagram 1700 distinguishes whether a user (e.g., user 1506) intended the user interface to recognize or receive the click gesture. If no (i.e., the user interface or gesture recognizer receives a false positive), at decision 1706, one or more of the systems described herein (e.g., determining module 1406) may determine whether a detection error has occurred. If yes (i.e., determining module 1406 determines that detected user interaction 1512 is a false positive), then, at process 1708, one or more of modules 1402 (e.g., executing module 1408) may execute a remedial action (e.g., remedial action 1516) by automatically undoing or rolling back changes to an application state and notifying the user with a dialog. If no, at process 1710 (i.e., determining module 1406 does not determine that detected user interaction 1512 is a false positive), the systems and methods described herein may execute no remedial action and/or an alternative action.
- If the user did intend the click, one or more of the systems described herein may likewise determine whether a detection error has occurred. If no (i.e., determining module 1406 determines that detected user interaction 1512 is a true negative), then, at process 1714, the systems and methods described herein may execute no remedial action and/or an alternative action.
- If yes (i.e., determining module 1406 determines that detected user interaction 1512 is a false positive even though the click was intended), one or more of modules 1402 may, at process 1716, execute a remedial action (e.g., remedial action 1516) by automatically undoing or rolling back changes to an application state and notifying the user with a dialog.
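- The branching in flow diagram 1700 can be summarized in a short Python sketch. The function and parameter names below (`handle_click_gesture`, `classified_false_positive`, `rollback_changes`, `notify_user`) are assumptions used only to mirror the decisions described above, not the disclosed implementation.

```python
def handle_click_gesture(intended_by_user, classified_false_positive,
                         rollback_changes, notify_user):
    """Illustrative mirror of the decisions in flow diagram 1700.

    intended_by_user          -- ground truth: did the user mean to click?
    classified_false_positive -- output of the determining step (e.g., module 1406)
    """
    if not intended_by_user:                 # the recognizer received a false positive
        if classified_false_positive:        # detection error correctly identified
            rollback_changes()               # automatically undo the state change
            notify_user("An unintended input was reverted.")
        # otherwise: miss, so no remedial action is taken
    else:                                    # the click was intended
        if classified_false_positive:        # intended input misclassified as an error
            rollback_changes()               # undo + dialog, which the user may then redo
            notify_user("Input was reverted; confirm to redo.")
        # otherwise: true positive handled normally, no remedial action
```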
- the disclosed systems and methods may provide one or more advantages. For example, by determining that a detected user interaction represents a false positive input inference by a user interface, an example of the disclosed systems and methods could use this information to take one or more remedial actions to refine the user interface’s recognition model to make fewer errors in the future. Additionally, the system could assist with error recovery if it could detect the errors soon enough after they occur. This capability may be particularly compelling for false positive errors. These false positive errors may be damaging to the user experience in part due to the attentional demands/costs to the user to detect and fix them when they occur. For example, if the system were to rapidly detect a false positive, it could increase the physical salience and size of an undo button or provide an “undo” confirmation dialogue.
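- One way to picture the "increase the salience of an undo affordance" idea is a small styling function keyed to the system's false-positive confidence. The field names, sizes, and thresholds below are made-up illustrative values, not part of the disclosure.

```python
# Illustrative sketch only: scale the salience of an 'Undo' affordance with
# the system's confidence that the preceding input was a false positive.

def undo_button_style(false_positive_confidence: float) -> dict:
    base_size_px, max_extra_px = 24, 24                 # assumed baseline sizes
    salience = max(0.0, min(1.0, false_positive_confidence))
    return {
        "size_px": base_size_px + int(max_extra_px * salience),
        "highlighted": salience > 0.5,                  # draw attention when FP is likely
        "show_confirm_dialog": salience > 0.8,          # strongest remediation when very likely
    }

print(undo_button_style(0.9))   # large, highlighted, with an "undo" confirmation dialog
```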
- Embodiments of the present disclosure may include or be implemented in conjunction with various types of artificial reality systems.
- Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, for example, a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivative thereof.
- Artificial-reality content may include completely computer-generated content or computer-generated content combined with captured (e.g., real-world) content.
- the artificial-reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional (3D) effect to the viewer).
- artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, for example, create content in an artificial reality and/or are otherwise used in (e.g., to perform activities in) an artificial reality.
- Artificial-reality systems may be implemented in a variety of different form factors and configurations. Some artificial reality systems may be designed to work without near-eye displays (NEDs). Other artificial reality systems may include an NED that also provides visibility into the real world (such as, e.g., augmented-reality system 1800 in FIG. 18) or that visually immerses a user in an artificial reality (such as, e.g., virtual-reality system 1900 in FIG. 19). While some artificial-reality devices may be self-contained systems, other artificial-reality devices may communicate and/or coordinate with external devices to provide an artificial-reality experience to a user. Examples of such external devices include handheld controllers, mobile devices, desktop computers, devices worn by a user, devices worn by one or more other users, and/or any other suitable external system.
- augmented-reality system 1800 may include an eyewear device 1802 with a frame 1810 configured to hold a left display device 1815(A) and a right display device 1815(B) in front of a user’s eyes.
- Display devices 1815(A) and 1815(B) may act together or independently to present an image or series of images to a user.
- While augmented-reality system 1800 includes two displays, embodiments of this disclosure may be implemented in augmented-reality systems with a single NED or more than two NEDs.
- augmented-reality system 1800 may include one or more sensors, such as sensor 1840.
- Sensor 1840 may generate measurement signals in response to motion of augmented-reality system 1800 and may be located on substantially any portion of frame 1810.
- Sensor 1840 may represent one or more of a variety of different sensing mechanisms, such as a position sensor, an inertial measurement unit (IMU), a depth camera assembly, a structured light emitter and/or detector, or any combination thereof.
- augmented-reality system 1800 may or may not include sensor 1840 or may include more than one sensor.
- the IMU may generate calibration data based on measurement signals from sensor 1840.
- Examples of sensor 1840 may include, without limitation, accelerometers, gyroscopes, magnetometers, other suitable types of sensors that detect motion, sensors used for error correction of the IMU, or some combination thereof.
- augmented-reality system 1800 may also include a microphone array with a plurality of acoustic transducers 1820(A)- 1820(J), referred to collectively as acoustic transducers 1820.
- Acoustic transducers 1820 may represent transducers that detect air pressure variations induced by sound waves.
- Each acoustic transducer 1820 may be configured to detect sound and convert the detected sound into an electronic format (e.g., an analog or digital format).
- The microphone array shown in FIG. 18 may include, for example, ten acoustic transducers: 1820(A) and 1820(B), which may be designed to be placed inside a corresponding ear of the user; acoustic transducers 1820(C), 1820(D), 1820(E), 1820(F), 1820(G), and 1820(H), which may be positioned at various locations on frame 1810; and/or acoustic transducers 1820(I) and 1820(J), which may be positioned on a corresponding neckband 1805.
- acoustic transducers 1820(A)-(J) may be used as output transducers (e.g., speakers).
- acoustic transducers 1820(A) and/or 1820(B) may be earbuds or any other suitable type of headphone or speaker.
- the configuration of acoustic transducers 1820 of the microphone array may vary. While augmented-reality system 1800 is shown in FIG. 18 as having ten acoustic transducers 1820, the number of acoustic transducers 1820 may be greater or less than ten. In some examples, using higher numbers of acoustic transducers 1820 may increase the amount of audio information collected and/or the sensitivity and accuracy of the audio information. In contrast, using a lower number of acoustic transducers 1820 may decrease the computing power required by an associated controller 1850 to process the collected audio information. In addition, the position of each acoustic transducer 1820 of the microphone array may vary. For example, the position of an acoustic transducer 1820 may include a defined position on the user, a defined coordinate on frame 1810, an orientation associated with each acoustic transducer 1820, or some combination thereof.
- Acoustic transducers 1820(A) and 1820(B) may be positioned on different parts of the user’s ear, such as behind the pinna, behind the tragus, and/or within the auricle or fossa. Or, there may be additional acoustic transducers 1820 on or surrounding the ear in addition to acoustic transducers 1820 inside the ear canal. Having an acoustic transducer 1820 positioned next to an ear canal of a user may enable the microphone array to collect information on how sounds arrive at the ear canal.
- augmented-reality system 1800 may simulate binaural hearing and capture a 3D stereo sound field around a user’s head.
- acoustic transducers 1820(A) and 1820(B) may be connected to augmented-reality system 1800 via a wired connection 1830, and in other examples acoustic transducers 1820(A) and 1820(B) may be connected to augmented-reality system 1800 via a wireless connection (e.g., a BLUETOOTH connection).
- acoustic transducers 1820(A) and 1820(B) may not be used at all in conjunction with augmented-reality system 1800.
- Acoustic transducers 1820 on frame 1810 may be positioned in a variety of different ways, including along the length of the temples, across the bridge, above or below display devices 1815(A) and 1815(B), or some combination thereof. Acoustic transducers 1820 may also be oriented such that the microphone array is able to detect sounds in a wide range of directions surrounding the user wearing the augmented-reality system 1800. In some examples, an optimization process may be performed during manufacturing of augmented-reality system 1800 to determine relative positioning of each acoustic transducer 1820 in the microphone array.
- augmented-reality system 1800 may include or be connected to an external device (e.g., a paired device), such as neckband 1805.
- Neckband 1805 generally represents any type or form of paired device.
- the following discussion of neckband 1805 may also apply to various other paired devices, such as charging cases, smart watches, smart phones, wrist bands, other wearable devices, hand-held controllers, tablet computers, laptop computers, other external computing devices, etc.
- neckband 1805 may be coupled to eyewear device 1802 via one or more connectors.
- the connectors may be wired or wireless and may include electrical and/or nonelectrical (e.g., structural) components.
- eyewear device 1802 and neckband 1805 may operate independently without any wired or wireless connection between them.
- FIG. 18 illustrates the components of eyewear device 1802 and neckband 1805 in example locations on eyewear device 1802 and neckband 1805, the components may be located elsewhere and/or distributed differently on eyewear device 1802 and/or neckband 1805.
- the components of eyewear device 1802 and neckband 1805 may be located on one or more additional peripheral devices paired with eyewear device 1802, neckband 1805, or some combination thereof.
- Pairing external devices, such as neckband 1805, with augmented-reality eyewear devices may enable the eyewear devices to achieve the form factor of a pair of glasses while still providing sufficient battery and computation power for expanded capabilities.
- Some or all of the battery power, computational resources, and/or additional features of augmented-reality system 1800 may be provided by a paired device or shared between a paired device and an eyewear device, thus reducing the weight, heat profile, and form factor of the eyewear device overall while still retaining desired functionality.
- neckband 1805 may allow components that would otherwise be included on an eyewear device to be included in neckband 1805 since users may tolerate a heavier weight load on their shoulders than they would tolerate on their heads.
- Neckband 1805 may also have a larger surface area over which to diffuse and disperse heat to the ambient environment. Thus, neckband 1805 may allow for greater battery and computation capacity than might otherwise have been possible on a standalone eyewear device. Since weight carried in neckband 1805 may be less invasive to a user than weight carried in eyewear device 1802, a user may tolerate wearing a lighter eyewear device and carrying or wearing the paired device for greater lengths of time than a user would tolerate wearing a heavy standalone eyewear device, thereby enabling users to more fully incorporate artificial reality environments into their day-to-day activities.
- Neckband 1805 may be communicatively coupled with eyewear device 1802 and/or to other devices. These other devices may provide certain functions (e.g., tracking, localizing, depth mapping, processing, storage, etc.) to augmented-reality system 1800.
- neckband 1805 may include two acoustic transducers (e.g., 1820(I) and 1820(J)) that are part of the microphone array (or potentially form their own microphone subarray).
- Neckband 1805 may also include a controller 1825 and a power source 1835.
- Acoustic transducers 1820(I) and 1820(J) of neckband 1805 may be configured to detect sound and convert the detected sound into an electronic format (analog or digital).
- acoustic transducers 1820(I) and 1820(J) may be positioned on neckband 1805, thereby increasing the distance between the neckband acoustic transducers 1820(I) and 1820(J) and other acoustic transducers 1820 positioned on eyewear device 1802.
- increasing the distance between acoustic transducers 1820 of the microphone array may improve the accuracy of beamforming performed via the microphone array.
- For example, if a sound is detected by two acoustic transducers that are spaced farther apart than acoustic transducers 1820(D) and 1820(E), the determined source location of the detected sound may be more accurate than if the sound had been detected by acoustic transducers 1820(D) and 1820(E).
- Controller 1825 of neckband 1805 may process information generated by the sensors on neckband 1805 and/or augmented-reality system 1800.
- controller 1825 may process information from the microphone array that describes sounds detected by the microphone array.
- controller 1825 may perform a direction-of-arrival (DOA) estimation to estimate a direction from which the detected sound arrived at the microphone array.
- controller 1825 may populate an audio data set with the information.
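- As a minimal sketch of how a controller could estimate direction of arrival from two microphones, the example below uses the time difference of arrival (TDOA) via cross-correlation. The geometry, constants, and function names are assumptions for illustration and are not the disclosed controller logic; the wider the microphone spacing, the more angular resolution each sample of delay provides, which is why the neckband placement discussed above can help.

```python
# Illustrative two-microphone DOA estimate from the time difference of arrival.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature (assumed)

def estimate_doa(sig_a, sig_b, mic_distance_m, sample_rate_hz):
    """Return the angle of arrival (radians) relative to the microphone-pair axis."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)            # delay between mics, in samples
    tdoa = lag / sample_rate_hz                         # delay in seconds
    # Clamp to the physically possible range before inverting the far-field geometry.
    cos_theta = np.clip(tdoa * SPEED_OF_SOUND / mic_distance_m, -1.0, 1.0)
    return float(np.arccos(cos_theta))
```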
- controller 1825 may compute all inertial and spatial calculations from the IMU located on eyewear device 1802.
- a connector may convey information between augmented-reality system 1800 and neckband 1805 and between augmented-reality system 1800 and controller 1825.
- the information may be in the form of optical data, electrical data, wireless data, or any other transmittable data form. Moving the processing of information generated by augmented-reality system 1800 to neckband 1805 may reduce weight and heat in eyewear device 1802, making it more comfortable for the user.
- Power source 1835 in neckband 1805 may provide power to eyewear device 1802 and/or to neckband 1805. Power source 1835 may include, without limitation, lithium-ion batteries, lithium-polymer batteries, primary lithium batteries, alkaline batteries, or any other form of power storage. In some cases, power source 1835 may be a wired power source. Including power source 1835 on neckband 1805 instead of on eyewear device 1802 may help better distribute the weight and heat generated by power source 1835.
- some artificial reality systems may, instead of blending an artificial reality with actual reality, substantially replace one or more of a user’s sensory perceptions of the real world with a virtual experience.
- One example of this type of system is a head-worn display system, such as virtual-reality system 1900 in FIG. 19, that mostly or completely covers a user’s field of view.
- Virtual-reality system 1900 may include a front rigid body 1902 and a band 1904 shaped to fit around a user’s head.
- Virtual-reality system 1900 may also include output audio transducers 1906(A) and 1906(B).
- front rigid body 1902 may include one or more electronic elements, including one or more electronic displays, one or more inertial measurement units (IMUs), one or more tracking emitters or detectors, and/or any other suitable device or system for creating an artificial-reality experience.
- Artificial reality systems may include a variety of types of visual feedback mechanisms.
- display devices in augmented-reality system 1800 and/or virtual-reality system 1900 may include one or more liquid crystal displays (LCDs), light emitting diode (LED) displays, microLED displays, organic LED (OLED) displays, digital light projector (DLP) microdisplays, liquid crystal on silicon (LCoS) micro-displays, and/or any other suitable type of display screen.
- These artificial reality systems may include a single display screen for both eyes or may provide a display screen for each eye, which may allow for additional flexibility for varifocal adjustments or for correcting a user’s refractive error.
- Some of these artificial reality systems may also include optical subsystems having one or more lenses (e.g., concave or convex lenses, Fresnel lenses, adjustable liquid lenses, etc.) through which a user may view a display screen.
- optical subsystems may serve a variety of purposes, including to collimate (e.g., make an object appear at a greater distance than its physical distance), to magnify (e.g., make an object appear larger than its actual size), and/or to relay (to, e.g., the viewer’s eyes) light.
- optical subsystems may be used in a non-pupil-forming architecture (such as a single lens configuration that directly collimates light but results in so-called pincushion distortion) and/or a pupil-forming architecture (such as a multi-lens configuration that produces so-called barrel distortion to nullify pincushion distortion).
- some of the artificial reality systems described herein may include one or more projection systems.
- display devices in augmented-reality system 1800 and/or virtual-reality system 1900 may include micro-LED projectors that project light (using, e.g., a waveguide) into display devices, such as clear combiner lenses that allow ambient light to pass through.
- the display devices may refract the projected light toward a user’s pupil and may enable a user to simultaneously view both artificial reality content and the real world.
- the display devices may accomplish this using any of a variety of different optical components, including waveguide components (e.g., holographic, planar, diffractive, polarized, and/or reflective waveguide elements), light-manipulation surfaces and elements (such as diffractive, reflective, and refractive elements and gratings), coupling elements, etc.
- Artificial reality systems may also be configured with any other suitable type or form of image projection system, such as retinal projectors used in virtual retina displays.
- augmented-reality system 1800 and/or virtual-reality system 1900 may include one or more optical sensors, such as two-dimensional (2D) or 3D cameras, structured light transmitters and detectors, time-of-flight depth sensors, single-beam or sweeping laser rangefinders, 3D LiDAR sensors, and/or any other suitable type or form of optical sensor.
- An artificial reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions.
- the artificial reality systems described herein may also include one or more input and/or output audio transducers.
- Output audio transducers may include voice coil speakers, ribbon speakers, electrostatic speakers, piezoelectric speakers, bone conduction transducers, cartilage conduction transducers, tragus-vibration transducers, and/or any other suitable type or form of audio transducer.
- input audio transducers may include condenser microphones, dynamic microphones, ribbon microphones, and/or any other type or form of input transducer.
- a single transducer may be used for both audio input and audio output.
- the artificial reality systems described herein may also include tactile (i.e., haptic) feedback systems, which may be incorporated into headwear, gloves, bodysuits, handheld controllers, environmental devices (e.g., chairs, floormats, etc.), and/or any other type of device or system.
- Haptic feedback systems may provide various types of cutaneous feedback, including vibration, force, traction, texture, and/or temperature.
- Haptic feedback systems may also provide various types of kinesthetic feedback, such as motion and compliance.
- Haptic feedback may be implemented using motors, piezoelectric actuators, fluidic systems, and/or a variety of other types of feedback mechanisms.
- Haptic feedback systems may be implemented independent of other artificial reality devices, within other artificial reality devices, and/or in conjunction with other artificial reality devices.
- artificial reality systems may create an entire virtual experience or enhance a user’s real-world experience in a variety of contexts and environments. For instance, artificial reality systems may assist or extend a user’s perception, memory, or cognition within a particular environment. Some systems may enhance a user’s interactions with other people in the real world or may enable more immersive interactions with other people in a virtual world.
- Artificial reality systems may also be used for educational purposes (e.g., for teaching or training in schools, hospitals, government organizations, military organizations, business enterprises, etc.), entertainment purposes (e.g., for playing video games, listening to music, watching video content, etc.), and/or for accessibility purposes (e.g., as hearing aids, visual aids, etc.).
- the embodiments and examples disclosed herein may enable or enhance a user’s artificial reality experience in one or more of these contexts and environments and/or in other contexts and environments.
- the systems described herein may also include an eye-tracking subsystem designed to identify and track various characteristics of a user’s eye(s), such as the user’s gaze direction.
- eye tracking may, in some examples, refer to a process by which the position, orientation, and/or motion of an eye is measured, detected, sensed, determined, and/or monitored.
- the disclosed systems may measure the position, orientation, and/or motion of an eye in a variety of different ways, including through the use of various optical-based eye-tracking techniques, ultrasound-based eye-tracking techniques, etc.
- An eye-tracking subsystem may be configured in a number of different ways and may include a variety of different eye-tracking hardware components or other computer-vision components.
- an eye-tracking subsystem may include a variety of different optical sensors, such as two-dimensional (2D) or 3D cameras, time-of-flight depth sensors, single-beam or sweeping laser rangefinders, 3D LiDAR sensors, and/or any other suitable type or form of optical sensor.
- a processing subsystem may process data from one or more of these sensors to measure, detect, determine, and/or otherwise monitor the position, orientation, and/or motion of the user’s eye(s).
- FIG. 20 is an illustration of an exemplary system 2000 that incorporates an eye-tracking subsystem capable of tracking a user’s eye(s).
- system 2000 may include a light source 2002, an optical subsystem 2004, an eye-tracking subsystem 2006, and/or a control subsystem 2008.
- light source 2002 may generate light for an image (e.g., to be presented to an eye 2001 of the viewer).
- Light source 2002 may represent any of a variety of suitable devices.
- light source 2002 can include a two-dimensional projector (e.g., a LCoS display), a scanning source (e.g., a scanning laser), or other device (e.g., an LCD, an LED display, an OLED display, an active-matrix OLED display (AMOLED), a transparent OLED display (TOLED), a waveguide, or some other display capable of generating light for presenting an image to the viewer).
- the image may represent a virtual image, which may refer to an optical image formed from the apparent divergence of light rays from a point in space, as opposed to an image formed from the light ray’s actual divergence.
- optical subsystem 2004 may receive the light generated by light source 2002 and generate, based on the received light, converging light 2020 that includes the image.
- optical subsystem 2004 may include any number of lenses (e.g., Fresnel lenses, convex lenses, concave lenses), apertures, filters, mirrors, prisms, and/or other optical components, possibly in combination with actuators and/or other devices.
- the actuators and/or other devices may translate and/or rotate one or more of the optical components to alter one or more aspects of converging light 2020.
- various mechanical couplings may serve to maintain the relative spacing and/or the orientation of the optical components in any suitable combination.
- eye-tracking subsystem 2006 may generate tracking information indicating a gaze angle of an eye 2001 of the viewer.
- control subsystem 2008 may control aspects of optical subsystem 2004 (e.g., the angle of incidence of converging light 2020) based at least in part on this tracking information.
- control subsystem 2008 may store and utilize historical tracking information (e.g., a history of the tracking information over a given duration, such as the previous second or fraction thereof) to anticipate the gaze angle of eye 2001 (e.g., an angle between the visual axis and the anatomical axis of eye 2001).
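- One simple way a control subsystem could anticipate a gaze angle from a short history of tracking samples is linear extrapolation, sketched below. The window length, prediction horizon, and sample data are assumptions; a real control subsystem could of course use a more sophisticated predictor.

```python
# Illustrative sketch: anticipate the next gaze angle from recent tracking samples.
import numpy as np

def anticipate_gaze_angle(timestamps_s, gaze_angles_deg, horizon_s=0.01):
    """Fit a line to recent (time, angle) samples and extrapolate horizon_s ahead."""
    t = np.asarray(timestamps_s, dtype=float)
    a = np.asarray(gaze_angles_deg, dtype=float)
    slope, intercept = np.polyfit(t, a, deg=1)      # least-squares linear fit
    return slope * (t[-1] + horizon_s) + intercept

# e.g., five samples spanning the last ~40 ms of tracking information
print(anticipate_gaze_angle([0.0, 0.01, 0.02, 0.03, 0.04],
                            [1.0, 1.2, 1.4, 1.6, 1.8]))   # ~2.0 degrees
```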
- eye-tracking subsystem 2006 may detect radiation emanating from some portion of eye 2001 (e.g., the cornea, the iris, the pupil, or the like) to determine the current gaze angle of eye 2001. In other examples, eye-tracking subsystem 2006 may employ a wavefront sensor to track the current location of the pupil.
- Any number of techniques can be used to track eye 2001. Some techniques may involve illuminating eye 2001 with infrared light and measuring reflections with at least one optical sensor that is tuned to be sensitive to the infrared light. Information about how the infrared light is reflected from eye 2001 may be analyzed to determine the position(s), orientation(s), and/or motion(s) of one or more eye feature(s), such as the cornea, pupil, iris, and/or retinal blood vessels.
- the radiation captured by a sensor of eye-tracking subsystem 2006 may be digitized (i.e., converted to an electronic signal). Further, the sensor may transmit a digital representation of this electronic signal to one or more processors (for example, processors associated with a device including eye-tracking subsystem 2006).
- Eye-tracking subsystem 2006 may include any of a variety of sensors in a variety of different configurations.
- eye-tracking subsystem 2006 may include an infrared detector that reacts to infrared radiation.
- the infrared detector may be a thermal detector, a photonic detector, and/or any other suitable type of detector.
- Thermal detectors may include detectors that react to thermal effects of the incident infrared radiation.
- one or more processors may process the digital representation generated by the sensor(s) of eye-tracking subsystem 2006 to track the movement of eye 2001.
- these processors may track the movements of eye 2001 by executing algorithms represented by computer-executable instructions stored on non-transitory memory and/or implemented in on-chip logic (e.g., an application-specific integrated circuit or ASIC). Eye-tracking subsystem 2006 may be programmed to use an output of the sensor(s) to track movement of eye 2001.
- eye-tracking subsystem 2006 may analyze the digital representation generated by the sensors to extract eye rotation information from changes in reflections.
- eye-tracking subsystem 2006 may use corneal reflections or glints (also known as Purkinje images) and/or the center of the eye’s pupil 2022 as features to track over time.
- eye-tracking subsystem 2006 may use the center of the eye’s pupil 2022 and infrared or near-infrared, non-collimated light to create corneal reflections. In these examples, eye-tracking subsystem 2006 may use the vector between the center of the eye’s pupil 2022 and the corneal reflections to compute the gaze direction of eye 2001.
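- A minimal two-dimensional sketch of this pupil-center/corneal-reflection idea is shown below: the vector from the glint to the pupil center, scaled by user-specific calibration gains, yields an approximate gaze offset. The linear mapping and gain values are simplifying assumptions, not the subsystem's actual computation.

```python
# Illustrative pupil-center / corneal-reflection (PCCR) gaze sketch.
import numpy as np

def gaze_from_pupil_and_glint(pupil_center_px, glint_px, gain_x=1.0, gain_y=1.0):
    """Return a (horizontal, vertical) gaze offset from the pupil-glint vector."""
    pupil = np.asarray(pupil_center_px, dtype=float)
    glint = np.asarray(glint_px, dtype=float)
    dx, dy = pupil - glint                 # pupil-glint vector in image coordinates
    return gain_x * dx, gain_y * dy        # scaled into gaze-offset units by calibration

print(gaze_from_pupil_and_glint((312.0, 240.5), (305.2, 238.1)))
```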
- the disclosed systems may perform a calibration procedure for an individual (using, e.g., supervised or unsupervised techniques) before tracking the user’s eyes.
- the calibration procedure may include directing users to look at one or more points displayed on a display while the eye-tracking system records the values that correspond to each gaze position associated with each point.
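- The recorded values from such a procedure can be turned into a mapping from measured gaze features to screen positions. The sketch below fits a per-axis polynomial, which is a deliberate simplification (real calibrations typically fit both axes jointly); the function names and polynomial order are assumptions.

```python
# Illustrative calibration sketch: map pupil-glint vectors to known target positions.
import numpy as np

def fit_calibration(feature_x, feature_y, target_x, target_y, order=2):
    """Return two coefficient arrays mapping features to (screen_x, screen_y)."""
    coeffs_x = np.polyfit(feature_x, target_x, order)   # needs > order samples
    coeffs_y = np.polyfit(feature_y, target_y, order)
    return coeffs_x, coeffs_y

def apply_calibration(coeffs_x, coeffs_y, vx, vy):
    """Convert a new pupil-glint measurement into an estimated screen position."""
    return np.polyval(coeffs_x, vx), np.polyval(coeffs_y, vy)
```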
- eye-tracking subsystem 2006 may use two types of infrared and/or near-infrared (also known as active light) eye-tracking techniques: bright-pupil and dark-pupil eye tracking, which may be differentiated based on the location of an illumination source with respect to the optical elements used. If the illumination is coaxial with the optical path, then eye 2001 may act as a retroreflector as the light reflects off the retina, thereby creating a bright pupil effect similar to a red-eye effect in photography. If the illumination source is offset from the optical path, then the eye’s pupil 2022 may appear dark because the retroreflection from the retina is directed away from the sensor.
- bright-pupil tracking may create greater iris/pupil contrast, allowing more robust eye tracking with iris pigmentation, and may feature reduced interference (e.g., interference caused by eyelashes and other obscuring features).
- Bright-pupil tracking may also allow tracking in lighting conditions ranging from total darkness to a very bright environment.
- control subsystem 2008 may control light source 2002 and/or optical subsystem 2004 to reduce optical aberrations (e.g., chromatic aberrations and/or monochromatic aberrations) of the image that may be caused by or influenced by eye 2001.
- control subsystem 2008 may use the tracking information from eye-tracking subsystem 2006 to perform such control.
- control subsystem 2008 may alter the light generated by light source 2002 (e.g., by way of image rendering) to modify (e.g., pre-distort) the image so that the aberration of the image caused by eye 2001 is reduced.
- the disclosed systems may track both the position and relative size of the pupil (since, e.g., the pupil dilates and/or contracts).
- the frequency range of the eye-tracking devices and components (e.g., sensors and/or sources) may be different (or separately calibrated) for eyes of different colors and/or different pupil types, sizes, and/or the like.
- the various eye-tracking components (e.g., infrared sources and/or sensors) described herein may need to be calibrated for each individual user and/or eye.
- the disclosed systems may track both eyes with and without ophthalmic correction, such as that provided by contact lenses worn by the user.
- ophthalmic correction elements (e.g., adjustable lenses) may also be incorporated directly into the artificial reality systems described herein.
- the color of the user’s eye may necessitate modification of a corresponding eye-tracking algorithm.
- eye-tracking algorithms may need to be modified based at least in part on the differing color contrast between a brown eye and, for example, a blue eye.
- FIG. 21 is a more detailed illustration of various aspects of the eye-tracking subsystem illustrated in FIG. 20.
- an eye-tracking subsystem 2100 may include at least one source 2104 and at least one sensor 2106.
- Source 2104 generally represents any type or form of element capable of emitting radiation.
- source 2104 may generate visible, infrared, and/or near-infrared radiation.
- source 2104 may radiate non-collimated infrared and/or near-infrared portions of the electromagnetic spectrum towards an eye 2102 of a user.
- Source 2104 may utilize a variety of sampling rates and speeds.
- the disclosed systems may use sources with higher sampling rates in order to capture fixational eye movements of a user’s eye 2102 and/or to correctly measure saccade dynamics of the user’s eye 2102.
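- The role of sampling rate can be made concrete with a small sketch that estimates peak saccade velocity from sampled gaze angles: at low rates a brief saccade spans only a couple of samples, so its dynamics are under-resolved. The gaze trace and rate below are made-up illustrative values.

```python
# Illustrative sketch: peak angular velocity of a saccade from sampled gaze angles.
import numpy as np

def peak_saccade_velocity(gaze_deg, sample_rate_hz):
    """Return the peak angular velocity (deg/s) of a gaze-angle trace."""
    velocities = np.abs(np.diff(gaze_deg)) * sample_rate_hz
    return float(velocities.max())

# A made-up ~5-degree saccade sampled at 500 Hz (2 ms between samples).
trace = [0.0, 0.2, 0.8, 1.6, 2.4, 3.2, 4.0, 4.6, 4.9, 5.0]
print(peak_saccade_velocity(trace, sample_rate_hz=500))   # ~400 deg/s
```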
- any type or form of eye-tracking technique may be used to track the user’s eye 2102, including optical-based eye-tracking techniques, ultrasound-based eye-tracking techniques, etc.
- Sensor 2106 generally represents any type or form of element capable of detecting radiation, such as radiation reflected off the user’s eye 2102.
- Examples of sensor 2106 include, without limitation, a charge coupled device (CCD), a photodiode array, a complementary metal-oxide-semiconductor (CMOS) based sensor device, and/or the like.
- sensor 2106 may represent a sensor having predetermined parameters, including, but not limited to, a dynamic resolution range, linearity, and/or other characteristic selected and/or designed specifically for eye tracking.
- eye-tracking subsystem 2100 may generate one or more glints.
- a glint 2103 may represent reflections of radiation (e.g., infrared radiation from an infrared source, such as source 2104) from the structure of the user’s eye.
- glint 2103 and/or the user’s pupil may be tracked using an eye-tracking algorithm executed by a processor (either within or external to an artificial reality device).
- an artificial reality device may include a processor and/or a memory device in order to perform eye tracking locally and/or a transceiver to send and receive the data necessary to perform eye tracking on an external device (e.g., a mobile phone, cloud server, or other computing device).
- FIG. 21 shows an example image 2105 captured by an eye-tracking subsystem, such as eye-tracking subsystem 2100.
- image 2105 may include both the user’s pupil 2108 and a glint 2110 near the same.
- pupil 2108 and/or glint 2110 may be identified using an artificial-intelligence-based algorithm, such as a computer-vision- based algorithm.
- image 2105 may represent a single frame in a series of frames that may be analyzed continuously in order to track the eye 2102 of the user. Further, pupil 2108 and/or glint 2110 may be tracked over a period of time to determine a user’s gaze.
- eye-tracking subsystem 2100 may be configured to identify and measure the inter-pupillary distance (IPD) of a user.
- eye-tracking subsystem 2100 may measure and/or calculate the IPD of the user while the user is wearing the artificial reality system.
- eye-tracking subsystem 2100 may detect the positions of a user’s eyes and may use this information to calculate the user’s IPD.
- the eye-tracking systems or subsystems disclosed herein may track a user’s eye position and/or eye movement in a variety of ways.
- one or more light sources and/or optical sensors may capture an image of the user’s eyes.
- the eye-tracking subsystem may then use the captured information to determine the user’s inter-pupillary distance, interocular distance, and/or a 3D position of each eye (e.g., for distortion adjustment purposes), including a magnitude of torsion and rotation (i.e., roll, pitch, and yaw) and/or gaze directions for each eye.
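- The IPD calculation itself is straightforward once 3D eye positions are available, as in the sketch below. The coordinate frame, millimeter units, and example positions are assumptions for illustration.

```python
# Illustrative sketch: IPD and interocular midpoint from estimated 3D eye positions.
import numpy as np

def ipd_and_midpoint(left_eye_mm, right_eye_mm):
    left = np.asarray(left_eye_mm, dtype=float)
    right = np.asarray(right_eye_mm, dtype=float)
    ipd = float(np.linalg.norm(right - left))   # straight-line pupil-to-pupil distance
    midpoint = (left + right) / 2.0             # a useful reference for distortion adjustment
    return ipd, midpoint

print(ipd_and_midpoint((-31.5, 0.0, 0.0), (31.5, 0.0, 0.0)))  # IPD = 63.0 mm
```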
- infrared light may be emitted by the eye-tracking subsystem and reflected from each eye. The reflected light may be received or detected by an optical sensor and analyzed to extract eye rotation data from changes in the infrared light reflected by each eye.
- the eye-tracking subsystem may use any of a variety of different methods to track the eyes of a user.
- For example, a light source (e.g., infrared light-emitting diodes) may emit a dot pattern onto each eye of the user.
- the eye-tracking subsystem may then detect (e.g., via an optical sensor coupled to the artificial reality system) and analyze a reflection of the dot pattern from each eye of the user to identify a location of each pupil of the user.
- the eye-tracking subsystem may track up to six degrees of freedom of each eye (i.e., 3D position, roll, pitch, and yaw) and at least a subset of the tracked quantities may be combined from two eyes of a user to estimate a gaze point (i.e., a 3D location or position in a virtual scene where the user is looking) and/or an IPD.
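- One common way to combine the two eyes' tracked directions into a 3D gaze point is to take the midpoint of the shortest segment between the two gaze rays, since the rays rarely intersect exactly. The sketch below assumes the eye origins and unit gaze directions come from the eye-tracking subsystem; the function name and tolerance are illustrative.

```python
# Illustrative sketch: 3D gaze point as the midpoint of the closest approach
# between the left-eye and right-eye gaze rays.
import numpy as np

def estimate_gaze_point(o_left, d_left, o_right, d_right):
    """o_*: ray origins (eye centers); d_*: unit gaze directions."""
    o1, d1 = np.asarray(o_left, float), np.asarray(d_left, float)
    o2, d2 = np.asarray(o_right, float), np.asarray(d_right, float)
    # Solve for ray parameters (t1, t2) minimizing |(o1 + t1*d1) - (o2 + t2*d2)|.
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = o1 - o2
    denom = a * c - b * b
    if abs(denom) < 1e-9:                  # near-parallel rays: no stable fixation point
        return None
    t1 = (b * (d2 @ w) - c * (d1 @ w)) / denom
    t2 = (a * (d2 @ w) - b * (d1 @ w)) / denom
    return ((o1 + t1 * d1) + (o2 + t2 * d2)) / 2.0
```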
- the distance between a user’s pupil and a display may change as the user’s eye moves to look in different directions.
- the varying distance between a pupil and a display as viewing direction changes may be referred to as “pupil swim” and may contribute to distortion perceived by the user as a result of light focusing in different locations as the distance between the pupil and the display changes.
- measuring distortion at different eye positions and pupil distances relative to the display, and generating distortion corrections for those positions and distances, may allow mitigation of distortion caused by pupil swim: the system may track the 3D position of a user’s eyes and apply the distortion correction corresponding to the 3D position of each of the user’s eyes at a given point in time.
- knowing the 3D position of each of a user’s eyes may allow for the mitigation of distortion caused by changes in the distance between the pupil of the eye and the display by applying a distortion correction for each 3D eye position. Furthermore, as noted above, knowing the position of each of the user’s eyes may also enable the eye-tracking subsystem to make automated adjustments for a user’s IPD.
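- A small sketch of selecting a distortion correction for the current eye position is shown below: it blends pre-measured correction profiles from nearby calibration positions using inverse-distance weighting. The grid of measured positions, the profile contents, and the weighting scheme are all assumptions for illustration.

```python
# Illustrative pupil-swim mitigation sketch: pick a correction profile by
# interpolating between profiles measured at nearby 3D eye positions.
import numpy as np

def select_correction(eye_pos_mm, measured_positions_mm, correction_profiles):
    """Inverse-distance-weighted blend of correction profiles near eye_pos_mm."""
    eye = np.asarray(eye_pos_mm, float)
    positions = np.asarray(measured_positions_mm, float)    # shape (N, 3)
    profiles = np.asarray(correction_profiles, float)       # shape (N, K) coefficients
    dists = np.linalg.norm(positions - eye, axis=1)
    if np.any(dists < 1e-6):                                 # exact match in the grid
        return profiles[int(np.argmin(dists))]
    weights = 1.0 / dists
    weights /= weights.sum()
    return weights @ profiles                                # blended correction coefficients
```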
- a display subsystem may include a variety of additional subsystems that may work in conjunction with the eye-tracking subsystems described herein.
- a display subsystem may include a varifocal subsystem, a scene-rendering module, and/or a vergence-processing module.
- the varifocal subsystem may cause left and right display elements to vary the focal distance of the display device.
- the varifocal subsystem may physically change the distance between a display and the optics through which it is viewed by moving the display, the optics, or both. Additionally, moving or translating two lenses relative to each other may also be used to change the focal distance of the display.
- the varifocal subsystem may include actuators or motors that move displays and/or optics to change the distance between them.
- This varifocal subsystem may be separate from or integrated into the display subsystem.
- the varifocal subsystem may also be integrated into or separate from its actuation subsystem and/or the eye-tracking subsystems described herein.
- the display subsystem may include a vergence-processing module configured to determine a vergence depth of a user’s gaze based on a gaze point and/or an estimated intersection of the gaze lines determined by the eye-tracking subsystem.
- Vergence may refer to the simultaneous movement or rotation of both eyes in opposite directions to maintain single binocular vision, which may be naturally and automatically performed by the human eye.
- a location where a user’s eyes are verged is where the user is looking and is also typically the location where the user’s eyes are focused.
- the vergence-processing module may triangulate gaze lines to estimate a distance or depth from the user associated with intersection of the gaze lines.
- the depth associated with intersection of the gaze lines may then be used as an approximation for the accommodation distance, which may identify a distance from the user where the user’s eyes are directed.
- the vergence distance may allow for the determination of a location where the user’s eyes should be focused and a depth from the user’s eyes at which the eyes are focused, thereby providing information (such as an object or plane of focus) for rendering adjustments to the virtual scene.
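- Under a simplifying symmetric-vergence assumption (both eyes rotate inward by the same angle toward a point straight ahead), the vergence depth follows directly from the IPD and the per-eye inward rotation, as sketched below. This symmetric geometry is an assumption for illustration, not the general gaze-line triangulation described above.

```python
# Illustrative symmetric-vergence approximation of fixation depth.
import math

def vergence_depth_m(ipd_m, inward_rotation_deg_per_eye):
    """Distance from the eyes to the fixation point for symmetric vergence."""
    half_angle = math.radians(inward_rotation_deg_per_eye)
    if half_angle <= 0:
        return float("inf")                      # parallel gaze lines: fixation at infinity
    return (ipd_m / 2.0) / math.tan(half_angle)

print(vergence_depth_m(0.063, 3.0))              # ~0.60 m for a 63 mm IPD
```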
- the vergence-processing module may coordinate with the eye-tracking subsystems described herein to make adjustments to the display subsystem to account for a user’s vergence depth.
- the eye-tracking subsystem may obtain information about the user’s vergence or focus depth and may adjust the display subsystem to be closer together when the user’s eyes focus or verge on something close and to be farther apart when the user’s eyes focus or verge on something at a distance.
- the eye-tracking information generated by the above-described eye-tracking subsystems may also be used, for example, to modify various aspects of how different computer-generated images are presented.
- a display subsystem may be configured to modify, based on information generated by an eye-tracking subsystem, at least one aspect of how the computer-generated images are presented. For instance, the computer-generated images may be modified based on the user’s eye movement, such that if a user is looking up, the computer-generated images may be moved upward on the screen. Similarly, if the user is looking to the side or down, the computer-generated images may be moved to the side or downward on the screen. If the user’s eyes are closed, the computer-generated images may be paused or removed from the display and resumed once the user’s eyes are back open.
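- The gaze-contingent presentation behavior described above reduces to a small per-frame update, sketched here. The smoothing factor, coordinate conventions, and return shape are assumptions used only to illustrate the idea.

```python
# Illustrative sketch: shift computer-generated content with the user's gaze and
# pause presentation while the eyes are closed.
def update_presentation(frame_offset_px, gaze_dx_px, gaze_dy_px, eyes_closed,
                        smoothing=0.2):
    """Return (new_offset, paused) for the next rendered frame."""
    if eyes_closed:
        return frame_offset_px, True               # pause/remove content until eyes reopen
    x, y = frame_offset_px
    # Low-pass the offset so content follows gaze without visible jitter.
    new_offset = (x + smoothing * (gaze_dx_px - x),
                  y + smoothing * (gaze_dy_px - y))
    return new_offset, False
```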
- eye-tracking subsystems can be incorporated into one or more of the various artificial reality systems described herein in a variety of ways.
- one or more of the various components of system 2000 and/or eye-tracking subsystem 2100 may be incorporated into augmented-reality system 1800 in FIG. 18 and/or virtual-reality system 1900 in FIG. 19 to enable these systems to perform various eye-tracking tasks (including one or more of the eye-tracking operations described herein).
- computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions, such as those contained within the modules described herein.
- these computing device(s) may each include at least one memory device and at least one physical processor.
- modules described and/or illustrated herein may represent portions of a single module or application.
- one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks.
- one or more of the modules described and/or illustrated herein may represent modules stored and configured to run on one or more of the computing devices or systems described and/or illustrated herein.
- One or more of these modules may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.
- one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another.
- one or more of the modules recited herein may receive eye-tracking data to be transformed, transform the eye-tracking data, output a result of the transformation to determine whether a user interaction with a user interface represents a false positive input inference by the user interface, use the result of the transformation to execute a remedial action, and store the result of the transformation to improve a model of user interaction.
- one or more of the modules recited herein may transform a processor, volatile memory, non- volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
- the term “computer-readable medium,” as used herein, generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions.
- Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Discs (CDs), Digital Video Discs (DVDs), and BLU-RAY discs), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
- embodiments of the instant disclosure may include or be implemented in conjunction with an artificial reality system.
- Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivatives thereof.
- Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content.
- the artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer).
- artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality.
- the artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Eye Examination Apparatus (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280057907.4A CN117897674A (en) | 2021-08-24 | 2022-08-24 | System and method for detecting input recognition errors using natural gaze dynamics |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163236657P | 2021-08-24 | 2021-08-24 | |
US63/236,657 | 2021-08-24 | ||
US17/866,179 US20230069764A1 (en) | 2021-08-24 | 2022-07-15 | Systems and methods for using natural gaze dynamics to detect input recognition errors |
US17/866,179 | 2022-07-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023028171A1 true WO2023028171A1 (en) | 2023-03-02 |
Family
ID=83354976
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/041415 WO2023028171A1 (en) | 2021-08-24 | 2022-08-24 | Systems and methods for using natural gaze dynamics to detect input recognition errors |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023028171A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180025245A1 (en) * | 2015-03-05 | 2018-01-25 | Koc Universitesi | Sketch misrecognition correction system based on eye gaze monitoring |
US20210074277A1 (en) * | 2019-09-06 | 2021-03-11 | Microsoft Technology Licensing, Llc | Transcription revision interface for speech recognition system |
-
2022
- 2022-08-24 WO PCT/US2022/041415 patent/WO2023028171A1/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180025245A1 (en) * | 2015-03-05 | 2018-01-25 | Koc Universitesi | Sketch misrecognition correction system based on eye gaze monitoring |
US20210074277A1 (en) * | 2019-09-06 | 2021-03-11 | Microsoft Technology Licensing, Llc | Transcription revision interface for speech recognition system |
Non-Patent Citations (1)
Title |
---|
DAVID-JOHN BRENDAN BRENDANJOHN@UFL EDU ET AL: "Towards gaze-based prediction of the intent to interact in virtual reality", ACM SYMPOSIUM ON EYE TRACKING RESEARCH AND APPLICATIONS, ACMPUB27, NEW YORK, NY, USA, 25 May 2021 (2021-05-25), pages 1 - 7, XP058533001, ISBN: 978-1-4503-8349-3, DOI: 10.1145/3448018.3458008 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7342191B2 (en) | Iris code accumulation and reliability assignment | |
US10831268B1 (en) | Systems and methods for using eye tracking to improve user interactions with objects in artificial reality | |
US11726324B2 (en) | Display system | |
US11176367B1 (en) | Apparatuses, systems, and methods for mapping a surface of an eye via an event camera | |
US10983591B1 (en) | Eye rank | |
US20230037329A1 (en) | Optical systems and methods for predicting fixation distance | |
US20230053497A1 (en) | Systems and methods for performing eye-tracking | |
WO2023147038A1 (en) | Systems and methods for predictively downloading volumetric data | |
US20230046155A1 (en) | Dynamic widget placement within an artificial reality display | |
US20230069764A1 (en) | Systems and methods for using natural gaze dynamics to detect input recognition errors | |
WO2023028171A1 (en) | Systems and methods for using natural gaze dynamics to detect input recognition errors | |
CN117897674A (en) | System and method for detecting input recognition errors using natural gaze dynamics | |
US11789544B2 (en) | Systems and methods for communicating recognition-model uncertainty to users | |
US20230341812A1 (en) | Multi-layered polarization volume hologram | |
CN118119915A (en) | System and method for communicating model uncertainty to a user | |
WO2023023299A1 (en) | Systems and methods for communicating model uncertainty to users | |
US20220236795A1 (en) | Systems and methods for signaling the onset of a user's intent to interact | |
WO2022235250A1 (en) | Handheld controller with thumb pressure sensing | |
WO2023014918A1 (en) | Optical systems and methods for predicting fixation distance | |
WO2023023206A1 (en) | Systems and methods for performing eye-tracking | |
CN117795395A (en) | Optical system and method for predicting gaze distance | |
WO2023018827A1 (en) | Dynamic widget placement within an artificial reality display | |
CN117882032A (en) | System and method for performing eye tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22772711 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280057907.4 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022772711 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022772711 Country of ref document: EP Effective date: 20240325 |