CN117795395A - Optical system and method for predicting gaze distance - Google Patents


Info

Publication number
CN117795395A
Authority
CN
China
Prior art keywords: user, eyes, gaze, eye, gaze distance
Prior art date
Legal status
Pending
Application number
CN202280054957.7A
Other languages
Chinese (zh)
Inventor
Thomas Scott Murdison
Ian Erkelens
Kevin James MacKenzie
Current Assignee
Meta Platforms Technologies LLC
Original Assignee
Meta Platforms Technologies LLC
Priority date
Filing date
Publication date
Priority claimed from US17/859,176 external-priority patent/US20230037329A1/en
Application filed by Meta Platforms Technologies LLC filed Critical Meta Platforms Technologies LLC
Priority claimed from PCT/US2022/039482 external-priority patent/WO2023014918A1/en
Publication of CN117795395A publication Critical patent/CN117795395A/en


Abstract

The head-mounted display system may include an eye-tracking subsystem and a gaze distance prediction subsystem. The eye-tracking subsystem may be configured to determine at least a gaze direction of both eyes of the user and an eye-movement speed of both eyes of the user. The gaze distance prediction subsystem may be configured to predict, based on the eye-movement speed and gaze direction of the user's eyes, the gaze distance at which the user's eyes will become fixated, before the user's eyes reach the gaze state associated with the predicted gaze distance. Additional methods, systems, and devices are also disclosed.

Description

Optical system and method for predicting gaze distance
Background
A head-mounted display (HMD) is a head-mounted device that may include a near-eye display (NED) that presents visual content to a user. The visual content may include stereoscopic images that enable a user to view the content as three-dimensional (3D). HMDs may be used for educational, gaming, healthcare, social, and various other applications.
Some HMDs may be configured to change visual content according to the location of the user's gaze. For example, a zoom system may be used to adjust the focal length of an optical element based on the user's gaze direction and/or gaze depth. As another example, gaze-driven rendering (e.g., foveal rendering, rendered depth of field, etc.) is a concept in which the portion of the visual content at which the user gazes remains in focus, while portions of the visual content away from the user's gaze (e.g., content in the visual periphery or at different perceived depths) are blurred. This technique simulates a person's real-world experience, because the eyes naturally focus on objects at the center of the person's field of view and at the gaze distance, while other parts of the person's vision (e.g., peripheral vision, objects at different depths) may be perceived as out of focus. Thus, gaze-driven rendering may provide a more immersive and realistic experience to the user. Furthermore, gaze-driven rendering may reduce computational requirements, because portions of the visual content that are far from the user's focus need not be fully rendered in high definition. This may reduce the size and/or weight of the HMD. However, gaze-based rendering systems may suffer from system latency in adjusting focus and blur after tracking the position of the user's gaze. As this latency increases, the user experience may be degraded in terms of image quality and/or comfort.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a head-mounted optical system including: an eye-tracking subsystem configured to determine at least a gaze direction of both eyes of the user and an eye-movement speed of both eyes of the user; and a gaze distance prediction subsystem configured to predict, based on the eye-movement speed and gaze direction of the user's eyes, a gaze distance at which both eyes of the user will become fixated, before the eyes of the user reach a gaze state associated with the predicted gaze distance.
In some embodiments, the head-mounted optical system further comprises a zoom optical element fixed in position in front of the eyes of the user when the head-mounted optical system is worn by the user, the zoom optical element configured to change at least one optical characteristic based on information from the eye-tracking subsystem and the gaze-distance prediction subsystem, the at least one optical characteristic comprising a focal length.
In some embodiments, the zoom optical element comprises: a substantially transparent support element; a substantially transparent deformable element coupled to the support element at least along an edge of the deformable element; and a substantially transparent deformable medium disposed between the support element and the deformable element.
In some embodiments, the zoom optical element further comprises a zoom actuator configured to change at least one optical characteristic of the zoom optical element when actuated.
In some embodiments, the zoom actuator includes at least one substantially transparent electrode coupled to the deformable element.
In some embodiments, the zoom optical element comprises a liquid crystal element configured to change at least one optical characteristic of the zoom optical element when activated.
In some embodiments, the head-mounted optical system further comprises a near-eye display configured to display visual content to a user.
In some embodiments, the near-eye display is operated to fully render only the portions of the visual content that are at the perceived depth of the user's binocular gaze.
In some embodiments, the gaze distance prediction subsystem is configured to predict the gaze distance at which both eyes of the user will become fixated within 600 ms before both eyes of the user reach the gaze state associated with the predicted gaze distance.
According to a second aspect of the present disclosure, there is provided a computer-implemented method of operating a head-mounted optical device, the method comprising: measuring gaze direction and movement speed of both eyes of a user using an eye-tracking element; and predicting, using the at least one processor and based on the measured gaze direction and movement speed of the eyes of the user, a gaze distance of the eyes of the user before the eyes of the user reach a gaze state associated with the predicted gaze distance.
In some embodiments, the method further comprises: based on the predicted gaze distance of the eyes of the user, at least the focal length of the zoom optical element is changed using the zoom optical element.
In some embodiments, the method further comprises: presenting visual content to both eyes of a user using a near-eye display; and fully rendering only portions of the visual content at the predicted gaze distance of both eyes of the user.
In some embodiments, the full rendering of only the portions of the visual content is completed before the vergence of the user's eyes reaches the gaze distance.
In some embodiments, measuring the speed of movement of the eyes of the user includes measuring a maximum speed of the eyes of the user; and the predicted gaze distance is based at least in part on a maximum speed of both eyes of the user.
In some embodiments, the prediction of the gaze distance at which both eyes of the user will become fixated is completed within 600 ms before both eyes of the user reach the gaze state associated with the predicted gaze distance.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to: measuring a gaze direction and a movement speed of both eyes of the user using the eye-tracking element; and predicting a gaze distance of the user's eyes before the user's eyes reach a gaze state associated with the predicted gaze distance based on the measured gaze direction and movement speed of the user's eyes.
In some embodiments, the one or more computer-executable instructions further cause the computing device to: based on the predicted gaze distance of the eyes of the user, at least the focal length of the zoom optical element is changed using the zoom optical element.
In some embodiments, the one or more computer-executable instructions further cause the computing device to: presenting visual content to both eyes of a user using a near-eye display; and fully rendering only portions of the visual content at the predicted gaze distance of both eyes of the user.
In some embodiments, the one or more computer-executable instructions further cause the computing device to: complete the full rendering of only the portions of the visual content before the vergence of the user's eyes reaches the gaze distance.
In some embodiments, the one or more computer-executable instructions further cause the computing device to: complete the prediction of the gaze distance at which the eyes of the user will become fixated within 600 ms before the eyes of the user reach the gaze state associated with the predicted gaze distance.
Drawings
The accompanying drawings illustrate various example embodiments and are a part of the specification. Together with the following description, these drawings illustrate and explain various principles of the disclosure.
Fig. 1A is a schematic diagram illustrating eye vergence in accordance with at least one embodiment of the present disclosure.
Fig. 1B is a graph illustrating example response times of a person's binocular vergence and autofocus when focusing on an object at a new distance in accordance with at least one embodiment of the present disclosure.
Fig. 2 is a block diagram illustrating a head-mounted optical system in accordance with at least one embodiment of the present disclosure.
Fig. 3 is a graph showing a relationship between peak velocity and response amplitude for full convergence of eyes in accordance with at least one embodiment of the present disclosure.
Fig. 4A-4C include three graphs showing position, velocity, and acceleration of eye movement during a converging motion in accordance with at least one embodiment of the present disclosure.
Fig. 5 is a graph illustrating actual eye movement data and an overlay model of a coarse alignment response of both eyes in accordance with at least one embodiment of the present disclosure.
Fig. 6 is a graph showing a relationship between peak velocity and response amplitude of coarse convergence of eyes in accordance with at least one embodiment of the present disclosure.
Fig. 7 is a flowchart illustrating a method of operating a head-mounted optical device in accordance with at least one embodiment of the present disclosure.
Fig. 8 is an illustration of example augmented reality glasses that may be used in connection with embodiments of the present disclosure.
Fig. 9 is an illustration of an example virtual reality headset that may be used in connection with embodiments of the present disclosure.
Fig. 10 is an illustration of an example system including an eye tracking subsystem capable of tracking one or both eyes of a user.
Fig. 11 is a more detailed illustration of various aspects of the eye-tracking subsystem shown in fig. 10.
Throughout the drawings, identical reference numbers and descriptions refer to similar, but not necessarily identical, elements. While the example embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the example embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the present disclosure.
Detailed Description
A Head Mounted Display (HMD) is a head mounted device that may include a near-eye display (NED) that presents visual content to a user. The visual content may include stereoscopic images that enable a user to view the content as three-dimensional (3D). HMDs may be used for educational, gaming, healthcare, social, and various other applications.
Some HMDs may be configured to change visual content according to the location of the user's gaze. For example, a zoom system may be used to adjust the focal length of an optical element based on the user's gaze direction and/or gaze depth. As another example, gaze-driven rendering (e.g., foveal rendering, rendered depth of field, etc.) is a concept in which the portion of the visual content at which the user gazes remains in focus, while portions of the visual content away from the user's gaze (e.g., content in the visual periphery or at different perceived depths) are blurred. This technique simulates a person's real-world experience, because the eyes naturally focus on objects at the center of the person's field of view and at the gaze distance, while other parts of the person's vision (e.g., peripheral vision, objects at different depths) may be perceived as out of focus. Thus, gaze-driven rendering may provide a more immersive and realistic experience to the user. Furthermore, gaze-driven rendering may reduce computational requirements, because portions of the visual content that are far from the user's focus need not be fully rendered in high definition. This may reduce the size and/or weight of the HMD. However, gaze-based rendering systems may suffer from system latency in adjusting focus and blur after tracking the position of the user's gaze. As this latency increases, the user experience may be degraded in terms of image quality and/or comfort.
In another example, eye tracking may enable a user to interact with visual content by simply visually hovering over a displayed object, scene, word or icon, or the like. Such visual interactions may be used to replace or supplement conventional hand-held controllers.
In yet another example, augmented reality glasses are a class of HMDs that display content to a user in a see-through display. Determining the location at which the user gazes in or is about to gaze in the real-world environment in front of the user may enable the augmented reality system to obtain information about what the user is looking at and what is being focused on. Determining the focal length of the user's eyes may be important to adjust the display content for comfort or context.
Determining the location of the user's gaze may be accomplished using an eye-tracking system. As explained further below, the eye-tracking system may employ optical tracking, ultrasonic tracking, or other types of tracking (e.g., electro-oculography (EOG), search coils, etc.) to determine the gaze directions of both eyes of the user. For example, a camera (e.g., a visible light camera and/or an infrared camera) or an ultrasonic transceiver may be directed toward the user's eye and may sense reflected light or reflected sound to generate data indicative of the location of the user's pupil, iris, sclera, and/or cornea. The processor may use the sensor data to calculate a gaze direction.
When a person gazes at objects at different distances, both eyes move in opposite directions (e.g., inward or outward) to fixate on the object and to overlap the images from each eye to obtain stereoscopic vision. For example, both eyes will be oriented at a wider gaze angle to view distant objects and at a narrower gaze angle to view near objects. This process of moving both eyes in opposite directions is called "vergence".
Fig. 1A is a schematic diagram showing the concept of vergence. A person may gaze at a first object relatively closer to his eyes 100 at a first gaze distance D1 and at a second object relatively farther from his eyes at a second gaze distance D2. The convergence angle may be defined as the angle between the respective gaze directions of the person's eyes 100. As shown in fig. 1A, when the person gazes at the first object, both eyes 100 may have a first convergence angle α1. When the person gazes at the second object, the eyes 100 may have a second convergence angle α2. The pupils of the person's two eyes 100 may be separated by an interpupillary distance (IPD).
The convergence angle may be calculated or estimated given the gaze direction and IPD (e.g., as determined by the eye-tracking system). Once the convergence angle is known, the gaze distances D1 and D2 may be calculated using the following equation:
Gaze distance = (IPD / 2) / tan(convergence angle / 2)
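To make the arithmetic concrete, the following is a minimal Python sketch of this relationship; the function name, units, and example values are illustrative and not taken from the disclosure itself:

```python
import math

def gaze_distance(ipd_mm: float, convergence_angle_deg: float) -> float:
    """Estimate the gaze distance (mm) from the interpupillary distance (mm)
    and the convergence angle between the two gaze directions (degrees)."""
    half_angle = math.radians(convergence_angle_deg) / 2.0
    return (ipd_mm / 2.0) / math.tan(half_angle)

# Example: an IPD of 63 mm and a 6-degree convergence angle correspond to a
# gaze distance of roughly 600 mm; as the convergence angle shrinks toward
# zero, the estimated gaze distance grows toward infinity.
print(round(gaze_distance(63.0, 6.0)))  # ~601
```

A smaller convergence angle therefore maps to a farther fixation, which is why a predicted convergence angle is sufficient to drive a focal-length change.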
Autofocus, more formally called accommodation, is the process by which each eye changes its optical power, for example by changing the shape of its lens, to maintain a clear image of, or focus on, an object as the gaze distance changes. Both autofocus and vergence should be completed in order to obtain the sharpest view of an object or scene.
Fig. 1B shows a graph 102 of example response times of a person's vergence and autofocus when focusing on an object at a new distance. The solid line shows the response time of vergence, and the broken line shows the response time of autofocus. As shown in graph 102, when a person gazes at an object at a new distance, both eyes typically adjust to substantially proper vergence and autofocus conditions in about one second (1000 ms). If the object is stationary, both eyes remain substantially verged and focused after about two to three seconds (2000 ms to 3000 ms) as they continue to look at the object.
The present disclosure relates generally to systems, devices, and methods for predicting the focal length (e.g., gaze distance) at which a user's eyes will fixate. The systems may include an eye-tracking subsystem configured to track at least a gaze direction and a movement speed of both eyes of the user, and a gaze distance prediction subsystem configured to predict, based on information from the eye-tracking subsystem, the gaze distance at which both eyes of the user will settle. The systems and methods of the present disclosure may reduce the overall system latency of an optical system (e.g., a head-mounted optical system), for example by providing in advance the information used to operate a zoom optical element and/or a near-eye display. Reducing latency may improve the user experience in terms of, for example, comfort and image quality.
Fig. 2 is a block diagram illustrating a head-mounted optical system 200 in accordance with at least one embodiment of the present disclosure. The head-mounted optical system 200 may include an eye-tracking subsystem 202 and a gaze distance prediction subsystem 204. In some embodiments, for example, where the head-mounted optical system 200 is or includes a head-mounted display, the head-mounted optical system 200 may include a near-eye display 206. In other embodiments, the head-mounted optical system 200 may include a zoom optical element 208. The zoom optical element 208 may be included in a head-mounted display and/or in a system that does not have a near-eye display 206. For example, the zoom optical element 208 may be included in such an eyeglass device: the eyeglass device is configured to correct and/or supplement a user's vision.
The eye-tracking subsystem 202 may be configured to track the gaze direction and/or the speed of movement of both eyes of the user. The eye-tracking subsystem 202 may include a set of elements for tracking each of the user's eyes. Using a combination of the two sets of eye-tracking elements, the convergence angle of the user's eyes may be sensed to determine (e.g., estimate) the distance (also referred to as gaze depth) at which the user is looking. In some examples, the eye-tracking subsystem 202 may include a substantially transparent lens element (e.g., a waveguide) configured to sense the position of a pupil, cornea, retina, sclera, limbus, or other eye feature indicative of a gaze direction. In some embodiments, the eye-tracking element may include a camera (e.g., a visible light camera and/or an infrared light camera) mounted to the frame of the head-mounted optical system 200 and directed toward the user's eye. Further description of example eye-tracking elements and their features is given below with reference to figs. 10 and 11.
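As one hedged illustration of how the sensed gaze directions can be turned into a convergence angle, the angle between the two tracked gaze direction vectors can be computed directly. The sketch below assumes the eye-tracking subsystem already reports a gaze direction vector for each eye in a common head-fixed coordinate frame; all names are illustrative:

```python
import numpy as np

def convergence_angle_deg(left_gaze: np.ndarray, right_gaze: np.ndarray) -> float:
    """Angle (degrees) between the left-eye and right-eye gaze direction vectors."""
    l = left_gaze / np.linalg.norm(left_gaze)
    r = right_gaze / np.linalg.norm(right_gaze)
    cos_angle = np.clip(np.dot(l, r), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

# Eyes converging on a near object: each gaze rotated about 3 degrees inward
# from straight ahead (+z), giving a convergence angle of roughly 6 degrees.
left = np.array([np.sin(np.radians(3.0)), 0.0, np.cos(np.radians(3.0))])
right = np.array([-np.sin(np.radians(3.0)), 0.0, np.cos(np.radians(3.0))])
print(convergence_angle_deg(left, right))  # ~6.0
```

The resulting angle can then be combined with the interpupillary distance, as in the gaze-distance relationship given above.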
The gaze distance prediction subsystem 204 may be configured to predict the gaze distance at which both eyes of the user will become fixated before the eyes of the user reach a final gaze state associated with the predicted gaze distance. For example, the gaze distance prediction subsystem 204 may predict the gaze distance within about 600 ms before the user's eyes reach the final gaze state. In additional examples, such as where the eye movement is short (e.g., when fixating on a new object that is relatively close, within the field of view, to the eyes' current gaze), the gaze distance prediction subsystem 204 may predict the gaze distance within about 400 ms, within about 200 ms, within about 150 ms, within about 100 ms, within about 50 ms, or within about 20 ms before both eyes of the user reach the final gaze state. The gaze distance prediction subsystem 204 may include at least one processor that receives, from the eye-tracking subsystem 202, gaze information 210 indicative of the user's eye-movement speed and gaze direction. The gaze distance prediction subsystem 204 may use the gaze information 210 to make gaze distance predictions 212.
In some embodiments, gaze distance prediction subsystem 204 may employ a machine learning model to make gaze distance predictions 212. For example, the machine learning module may be configured to train a machine learning model to facilitate and improve the making of predictions 212. The machine learning model may use any suitable system, algorithm, and/or model that may construct and/or implement a mathematical model based on sample data (referred to as training data) to make predictions or decisions without being explicitly programmed to do so. Examples of machine learning models may include, but are not limited to, artificial neural networks, decision trees, support vector machines, regression analysis, bayesian networks, genetic algorithms, and the like. Machine learning algorithms that may be used to construct, implement, and/or develop a machine learning model may include, but are not limited to, supervised learning algorithms, unsupervised learning algorithms, self-learning algorithms, feature learning algorithms, sparse dictionary learning algorithms, anomaly detection algorithms, robotic learning algorithms, and association rule learning methods, among others.
In some examples, the machine learning module may train a machine learning model (e.g., a regression model) to determine the gaze distance prediction 212 by analyzing data from the eye-tracking subsystem 202. The initial training data set provided to the machine learning model may include data representative of eye position, eye velocity, and/or eye acceleration. The machine learning model may include algorithms that update the model based on new information, such as data generated by the eye-tracking subsystem 202 for a particular user, feedback from the user or technician, and/or data from another sensor (e.g., an optical sensor, an ultrasonic sensor, etc.). The machine learning model may be trained to ignore or disregard (discount) noise data.
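The disclosure does not tie the prediction to one specific model, so the following is only a sketch of one possibility consistent with the description above: a per-user linear regression from peak vergence velocity to final response amplitude (the "main sequence" discussed below), initialized from assumed population-norm coefficients and refit as completed vergence movements are observed. The class name, default coefficients, and refit threshold are illustrative assumptions:

```python
import numpy as np

class MainSequenceModel:
    """Linear main-sequence fit: final amplitude ≈ slope * peak_velocity + intercept."""

    def __init__(self, slope: float = 0.25, intercept: float = 0.0):
        # Assumed starting coefficients (degrees of amplitude per deg/s of
        # peak velocity); a real system would load population-norm values.
        self.slope = slope
        self.intercept = intercept
        self._peak_velocities: list[float] = []
        self._amplitudes: list[float] = []

    def add_observation(self, peak_velocity: float, final_amplitude: float) -> None:
        """Record a completed vergence movement and refit the per-user regression."""
        self._peak_velocities.append(peak_velocity)
        self._amplitudes.append(final_amplitude)
        if len(self._amplitudes) >= 5:  # personalize only after a few samples
            self.slope, self.intercept = np.polyfit(
                self._peak_velocities, self._amplitudes, deg=1)

    def predict_amplitude(self, peak_velocity: float) -> float:
        """Predicted final vergence amplitude (degrees) from the observed peak velocity."""
        return self.slope * peak_velocity + self.intercept
```

Because the disclosure notes that convergence and divergence follow different main sequence slopes and intercepts, separate instances of such a model could be maintained for inward and outward movements.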
The gaze distance predictions 212 generated by the gaze distance prediction subsystem 204 may be used in various ways. For example, in a head-mounted optical system 200 that includes a zoom optical element 208, the gaze distance prediction 212 may be used to make the appropriate optical power change for the zoom optical element 208. This may enable a pair of zoom glasses and/or a head-mounted display to change optical power before, at the same time as, or only slightly after the user's convergence and/or autofocus naturally reaches a steady gaze state associated with the gaze distance prediction 212. Because the prediction 212 may be determined before the user's eyes reach the final gaze state, the power change may be made earlier than would be possible if the power change were made based on a measured actual gaze distance.
If the head-mounted optical system 200 includes a near-eye display 206, the gaze distance prediction 212 may be used to alter the displayed visual content, for example to provide focus cues to the user (e.g., to blur visual content at a perceived depth different from the gaze distance prediction 212 and/or at the periphery of the displayed visual content away from the gaze direction). These focus cues may be generated before, at the same time as, or only slightly after the user's convergence and/or autofocus naturally reaches a steady gaze state. Because the prediction 212 may be determined before the user's eyes reach the final gaze state, the focus cues may be generated earlier than would be possible if the focus cues were rendered based on a measured actual gaze distance.
The zoom optical element 208, if present, may be any optical element that can change at least one optical characteristic (e.g., focal length/optical power). In some examples, the zoom optical element 208 may be a substantially transparent element through which a user may look and which has at least one optical characteristic (e.g., optical power, focal length, astigmatism correction, etc.) that may be changed as desired. For example, the zoom optical element 208 may include a so-called "liquid lens", a deformable mirror, an electrically driven zoom lens, a mechanically adjustable lens, or the like. In the case of a liquid lens, the liquid lens may include a substantially transparent support element, a substantially transparent deformable element coupled to the support element at least along an edge of the deformable element, and a substantially transparent deformable medium disposed between the support element and the deformable element. Changing (e.g., electrically or mechanically) the shape of the deformable element and the deformable medium may change at least one optical characteristic (e.g., focal length) of the zoom optical element 208.
The liquid lens may also include a zoom actuator configured to change the shape of the zoom optical element 208 and thus at least one optical characteristic of the zoom optical element when actuated. For example, the zoom actuator may include a mechanical actuator, an electromechanical actuator, a piezoelectric actuator, an electrostatic actuator, or other actuator that may be configured and positioned to apply an actuation force to an edge region of the deformable element. The actuation force may cause the deformable medium to flow and the deformable element to change its shape (e.g., to become more concave and/or convex, to move the optical axis laterally, etc.), thereby causing a change in focal length or other optical characteristics.
In additional embodiments, the deformable element may include one or more electroactive materials (e.g., substantially transparent electroactive polymers) that may change shape when a voltage is applied thereto. In some examples, the one or more electroactive materials may be actuated by at least one substantially transparent electrode coupled to the deformable element. The electrodes may comprise a substantially transparent conductive material and/or an opaque conductive material applied in a manner that is substantially transparent to the user. In the latter case, for example, the electrodes may comprise sufficiently thin lines of conductive material that may be straight and/or curved (e.g., irregularly curved) such that the zoom optical element 208 appears substantially transparent from a user's perspective.
In some examples, the terms "substantially" and "essentially" with respect to a given parameter, characteristic, or condition may refer to the following degrees: those skilled in the art will understand the extent to which a given parameter, characteristic, or condition meets a small degree of variance (e.g., within acceptable manufacturing tolerances). For example, the substantially satisfied parameter may be at least about 90% satisfied, at least about 95% satisfied, at least about 99% satisfied, or fully satisfied.
In additional examples, the zoom optical element 208 may include a liquid crystal electroactive material that is operable to change focal length when a voltage is applied thereto.
The head-mounted optical system 200 according to the present disclosure may reduce or eliminate time delays in conventional optical systems, which may improve the user's experience in terms of comfort, immersion, and image quality.
Fig. 3 is a graph 300 showing a relationship between peak velocity and response amplitude for full convergence of eyes in accordance with at least one embodiment of the present disclosure. Vergence eye movement follows a predictable pattern; the peak velocity and the final response amplitude of the vergence eye movement are directly related. This relationship is referred to as the "main sequence".
Graph 300 shows a main sequence plot of converging eye movements. Convergence refers to the inward movement of both eyes, for example to fixate on an object at a closer distance. For convergence, the relationship between peak velocity (in degrees per second) and final response amplitude (in degrees) is typically linear, with the variance (dispersion) of the confidence limits increasing as the response amplitude and peak velocity increase. Divergence refers to the outward movement of both eyes, for example to fixate on an object at a greater distance. The vergence main sequence relationship is direction-specific, meaning that converging movements may follow a different main sequence slope and intercept than diverging movements. The systems and devices of the present disclosure may be configured to account for this difference, for example by formulating different algorithms for convergence and for divergence, to improve the accuracy of the predictive model.
The vergence main sequence relationship is also unique to each individual user, for both converging and diverging responses. As described above, the system of the present disclosure may employ and update a machine learning model to accurately predict the final gaze distance of a particular user. For example, a calibration process may set and/or improve initial performance, and the system may then continuously or periodically update the modeled main sequence relationship during use. In some embodiments, a baseline predictive model for convergence and divergence may initially be used based on a set of training data (e.g., from population norms). The baseline predictive model may be updated and personalized as the user uses the system. In this case, the predictions may become more accurate over time as the user uses the system.
Using the relationship between peak velocity and response amplitude, the systems and apparatus of the present disclosure can predict the amplitude of a vergence eye movement that is in progress before both eyes reach their final resting position. The vergence response peak velocity may occur between about 100 ms and about 600 ms before the vergence response is complete. This time depends on the final amplitude of the vergence change. For example, a larger response amplitude tends to exhibit a larger time difference between the peak velocity and the end of the response than a smaller response amplitude. By using the peak eye-movement velocity to estimate the final gaze depth position (e.g., the convergence angle), the system may direct the near-eye display and/or the zoom optical element to an appropriate focal length before both eyes arrive, thereby reducing the overall end-to-end delay.
Figs. 4A, 4B, and 4C include three respective graphs 400A, 400B, and 400C showing the position, velocity, and acceleration of eye movement during a converging motion, in accordance with at least one embodiment of the present disclosure. To determine when the peak velocity is reached, the system may use vergence acceleration data, which may be calculated from the eye-tracking element output as the second derivative of position. When the acceleration crosses 0°/s², the velocity is at its peak. The system may then use a customized or preloaded main sequence relationship to predict the final convergence angle and, thus, the gaze distance. Figs. 4A to 4C show how the vergence position, vergence velocity, and vergence acceleration are related.
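A minimal sketch of that detection step is shown below, assuming uniformly sampled vergence-angle data from the eye tracker; the sampling interval, the absence of filtering, and the converging-movement assumption are simplifications, and a real implementation would smooth the signal before differentiating:

```python
import numpy as np

def predict_final_vergence_angle(vergence_deg: np.ndarray, dt: float,
                                 slope: float, intercept: float):
    """Detect the vergence velocity peak via the acceleration zero crossing,
    then predict the final convergence angle from a main-sequence fit."""
    velocity = np.gradient(vergence_deg, dt)      # first derivative (deg/s)
    acceleration = np.gradient(velocity, dt)      # second derivative (deg/s^2)

    # For a converging movement, the velocity peaks where the acceleration
    # crosses from positive to negative (a diverging movement mirrors the signs).
    crossings = np.where((acceleration[:-1] > 0) & (acceleration[1:] <= 0))[0]
    if crossings.size == 0:
        return None  # still accelerating; no prediction available yet

    peak_idx = int(crossings[0])
    peak_velocity = abs(velocity[peak_idx])
    predicted_amplitude = slope * peak_velocity + intercept  # degrees

    # Predicted final convergence angle: starting angle plus the signed amplitude.
    return float(vergence_deg[0] + np.sign(velocity[peak_idx]) * predicted_amplitude)
```

The predicted convergence angle can then be converted to a gaze distance with the interpupillary-distance relationship given earlier, well before the eyes settle.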
In the position graph 400A of fig. 4A, an example convergence position of both eyes over a period of time is shown. The position is an angular position in diopters, which may be associated with the degrees of the convergence angle. The vergence position transitions from 0 diopter to about 2.5 diopters and reaches a substantially steady state within about 1250ms (about 1.25 seconds).
The velocity profile 400B of fig. 4B shows an example convergence velocity of both eyes aligned with the convergence position of the position profile 400A. The convergence speed is the angular velocity expressed in diopters per second. The speed increases rapidly and reaches a peak of about 5 diopters per second in about 400ms (about 0.4 seconds), after which the speed slows until reaching substantially zero in a steady state in about 1250 ms.
The acceleration graph 400C of fig. 4C shows an example vergence acceleration for both eyes, aligned with the vergence position of the position graph 400A and the vergence velocity of the velocity graph 400B. The vergence acceleration is the angular acceleration, expressed in diopters per second squared. The acceleration peaks at about 250 ms (about 0.25 seconds) and crosses below zero when the velocity peaks (at about 400 ms). While the velocity is decreasing, the acceleration is negative.
Fig. 5 is a graph 500 showing actual eye movement data (solid line) and an overlay model (dashed line) of a coarse alignment response of both eyes in accordance with at least one embodiment of the present disclosure. As shown in fig. 3, the main sequence is typically linear for response amplitudes of less than about 4°. When the response amplitude increases beyond about 4°, the regression limits (e.g., confidence limits) describing the main sequence widen. This means that the final amplitude of a larger response may be difficult to predict if the same algorithm used for smaller responses is applied. For large changes in gaze distance, this may affect the performance of the proposed method.
The system according to the present disclosure can improve the performance of the method by taking advantage of how the brain controls the vergence response. Initially, a vergence eye movement is initiated by a burst of neural firing. This initial response will continue until completion even without visual input (e.g., if the lights are turned off). As the gaze angle approaches its final destination, the visual system in the brain switches to using visual information (e.g., blur and/or vergence feedback) to fine-tune the vergence position to match the desired gaze distance of the object of interest. Thus, when the eyes verge to a new gaze distance, the brain makes a coarse vergence adjustment (the initial response) and a fine vergence adjustment (the latter part of the response). As the amplitude increases, the fine adjustment becomes a more important part of the overall response. When the main sequence prediction method is used, the fine adjustments are more variable than the coarse adjustments and thus less predictable.
The coarse adjustment is shown in fig. 5, where the actual eye velocity data generally follow a quadratic fit to the data. The fine adjustment is shown on the right side of graph 500 in fig. 5, where the eye velocity data deviate from the quadratic fit. The quadratic fit (dashed line) to the data in this figure shows what the vergence response amplitude would be if there were no visual feedback and the response were driven by the coarse alignment response only.
Fig. 6 is a graph 600 showing the relationship between peak velocity and response amplitude for coarse convergence of the eyes in accordance with at least one embodiment of the present disclosure. In contrast to the graph 300 of fig. 3, which shows the full response amplitude (including coarse and fine adjustments), the graph 600 shows only the predicted coarse response amplitude. Comparing fig. 6 with fig. 3, the variance of the main sequence regression is significantly reduced, and the relationship is more linear, when only the more predictable coarse response estimate is used. This technique may further refine the predictive algorithms and/or machine learning models described above. As more gaze data and their dynamics are collected through use, the system of the present disclosure may compare the coarse response amplitude estimate with the actual response amplitude in order to further calibrate and improve the accuracy of the predictive model.
Another challenge arises when multiple vergence responses are generated in sequence before the eyes reach a steady-state position. In this case, using only the first peak in the velocity signal may result in an inaccurate predicted final response amplitude. To counteract this effect, the system may identify all peaks in the velocity signal (e.g., each time the acceleration crosses zero) and may continually sum the expected amplitudes to improve the prediction of the final gaze distance.
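Continuing the earlier sketch, one hedged way to implement this accumulation is to treat every positive-to-negative acceleration crossing as one component response and sum the amplitudes predicted for each peak (the detection details and the lack of filtering remain assumptions):

```python
import numpy as np

def predict_total_amplitude(vergence_deg: np.ndarray, dt: float,
                            slope: float, intercept: float) -> float:
    """Sum the main-sequence amplitude predictions over sequential vergence responses."""
    velocity = np.gradient(vergence_deg, dt)
    acceleration = np.gradient(velocity, dt)

    # Each positive-to-negative acceleration crossing marks one velocity peak.
    peaks = np.where((acceleration[:-1] > 0) & (acceleration[1:] <= 0))[0]

    # Accumulate the amplitude predicted for each component response.
    return float(sum(slope * abs(velocity[i]) + intercept for i in peaks))
```

The running total can replace the single-peak amplitude in the final gaze-distance prediction as additional peaks are detected.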
Fig. 7 is a flow chart illustrating a method 700 of operating a head-mounted optical device in accordance with at least one embodiment of the present disclosure. At operation 710, a gaze direction and a movement speed of both eyes of a user may be measured using, for example, an eye-tracking element. Operation 710 may be performed in various ways. For example, the eye-tracking element may function as described with reference to fig. 2, 10, and/or 11.
At operation 720, a gaze distance of the user's eyes may be predicted before the user's eyes reach a final gaze state, the final gaze state being associated with the predicted gaze distance. The prediction may be made within about 600 ms before the final gaze state. Operation 720 may be performed in various ways. For example, the prediction may be made by the at least one processor based on the measured gaze direction and movement speed of the user's eyes. A peak velocity of the eyes may be determined, which may be used to make the prediction before the eyes reach the steady state. In some embodiments, a machine learning model may be employed to make the prediction.
The method 700 may also include additional operations. For example, at least the focal length of a zoom optical element may be changed based on the predicted gaze distance of the user's eyes (e.g., to substantially match the predicted gaze distance). A near-eye display (e.g., of a head-mounted display) may be used to present visual content to both eyes of the user, and portions of the visual content at perceived depths different from the predicted gaze distance may be blurred. In some examples, the blurring may be achieved by fully rendering only a portion of the visual content within the user's field of view, which in turn may reduce the computational requirements (and thus size and weight) of the overall system. This blurring can be completed before the vergence of the user's eyes reaches the gaze distance.
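To make the downstream use of the prediction concrete, the sketch below shows how a predicted gaze distance might drive a zoom (varifocal) element and depth-of-field blur; the set_focal_distance and draw interfaces, the scene-object structure, and the 0.25 m focus tolerance are hypothetical placeholders rather than anything specified in the disclosure:

```python
def apply_gaze_prediction(predicted_gaze_distance_m: float,
                          varifocal, renderer, scene_objects) -> None:
    """Update the optics and rendering from a predicted gaze distance (meters)."""
    # Drive the zoom optical element toward the predicted focal distance
    # before the eyes finish verging, hiding the actuation latency.
    varifocal.set_focal_distance(predicted_gaze_distance_m)

    # Fully render only content near the predicted depth; blur everything else.
    for obj in scene_objects:
        in_focus = abs(obj.depth_m - predicted_gaze_distance_m) < 0.25  # assumed tolerance
        renderer.draw(obj, blurred=not in_focus)
```

Because the prediction is available before the eyes settle, both updates can be issued early enough that the optics and focus cues are already correct when vergence completes.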
Accordingly, the present disclosure includes systems, devices, and methods that may be used to predict a user's gaze distance before both eyes reach a steady gaze state. The disclosed concepts may reduce the system latency of head-mounted optical systems such as a head-mounted display that renders focus cues/blur cues for the visual content presented to a user and/or a zoom optical element with a variable focal length (e.g., variable optical power).
Embodiments of the present disclosure may include or be implemented in conjunction with various types of artificial reality systems. Artificial reality is a form of reality that has been adjusted in some way before being presented to a user, and may include, for example, virtual reality, augmented reality, mixed reality, hybrid reality, or some combination and/or derivative thereof. Artificial reality content may include entirely computer-generated content or computer-generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (e.g., stereoscopic video that brings a three-dimensional (3D) effect to the viewer). Further, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof that are used, for example, to create content in an artificial reality and/or are otherwise used in an artificial reality (e.g., to perform activities in an artificial reality).
Artificial reality systems may be implemented in a variety of different form factors and configurations. Some artificial reality systems may be designed to operate without a near-eye display (NED). Other artificial reality systems may include an NED that also provides visibility into the real world (e.g., the augmented reality system 800 in fig. 8) or that visually immerses a user in an artificial reality (e.g., the virtual reality system 900 in fig. 9). While some artificial reality devices may be stand-alone systems, other artificial reality devices may communicate and/or coordinate with external devices to provide an artificial reality experience to a user. Examples of such external devices include handheld controllers, mobile devices, desktop computers, devices worn by a user, devices worn by one or more other users, and/or any other suitable external system.
Turning to fig. 8, the augmented reality system 800 may include an eyeglass device 802 having a frame 810 configured to hold a left display device 815 (A) and a right display device 815 (B) in front of both eyes of a user. The display devices 815 (A) and 815 (B) may act together or independently to present an image or a series of images to the user. Although the augmented reality system 800 includes two displays, embodiments of the present disclosure may be implemented in augmented reality systems having a single NED or more than two NEDs.
In some embodiments, the augmented reality system 800 may include one or more sensors, such as sensor 840. The sensor 840 may generate measurement signals in response to the motion of the augmented reality system 800, and the sensor may be located on substantially any portion of the frame 810. Sensor 840 may represent one or more of a variety of different sensing mechanisms such as a position sensor, an inertial measurement unit (inertial measurement unit, IMU), a depth camera assembly, a structured light emitter and/or detector, or any combination thereof. In some embodiments, the augmented reality system 800 may or may not include a sensor 840, or may include more than one sensor. In embodiments where the sensor 840 includes an IMU, the IMU may generate calibration data based on measurement signals from the sensor 840. Examples of sensors 840 may include, but are not limited to, accelerometers, gyroscopes, magnetometers, other suitable types of sensors that detect motion, sensors for error correction of an IMU, or some combination thereof.
In some examples, the augmented reality system 800 may also include a microphone array having a plurality of acoustic transducers 820 (a) through 820 (J), collectively referred to as acoustic transducers 820. The acoustic transducer 820 may represent a transducer that detects changes in air pressure caused by sound waves. Each acoustic transducer 820 may be configured to detect sound and to convert the detected sound into an electronic format (e.g., analog format or digital format). The microphone array in fig. 8 may for example comprise ten acoustic transducers: 820 (a) and 820 (B) that may be designed to be placed within respective ears of a user; acoustic transducers 820 (C), 820 (D), 820 (E), 820 (F), 820 (G), and 820 (H), which may be positioned at different locations on frame 810; and/or acoustic transducers 820 (I) and 820 (J) that may be positioned on the corresponding neck strap 805.
In some embodiments, one or more of acoustic transducers 820 (a) to 820 (J) may be used as output transducers (e.g., speakers). For example, acoustic transducers 820 (a) and/or 820 (B) may be earplugs or any other suitable type of headphones or speakers.
The configuration of each acoustic transducer 820 of the microphone array may vary. Although the augmented reality system 800 is shown in fig. 8 as having ten acoustic transducers 820, the number of acoustic transducers 820 may be more or less than ten. In some embodiments, using a greater number of acoustic transducers 820 may increase the amount of audio information collected and/or the sensitivity and accuracy of the audio information. In contrast, using a fewer number of acoustic transducers 820 may reduce the computational power required by the associated controller 850 to process the collected audio information. Furthermore, the location of the individual acoustic transducers 820 of the microphone array may vary. For example, the locations of the acoustic transducers 820 may include defined locations on the user, defined coordinates on the frame 810, orientations associated with each acoustic transducer 820, or some combination thereof.
Acoustic transducers 820 (A) and 820 (B) may be positioned on different parts of the user's ear, such as behind the pinna, behind the tragus, and/or within the auricle or fossa. Alternatively, there may be additional acoustic transducers 820 on or around the ear in addition to the acoustic transducers 820 in the ear canal. Positioning an acoustic transducer 820 near the ear canal of the user may enable the microphone array to collect information about how sounds reach the ear canal. By positioning at least two of the acoustic transducers 820 on both sides of the user's head (e.g., as binaural microphones), the augmented reality system 800 may simulate binaural hearing and capture a 3D stereo sound field around the user's head. In some embodiments, acoustic transducers 820 (A) and 820 (B) may be connected to the augmented reality system 800 via a wired connection 830, and in other embodiments acoustic transducers 820 (A) and 820 (B) may be connected to the augmented reality system 800 via a wireless connection (e.g., a Bluetooth connection). In still other embodiments, acoustic transducers 820 (A) and 820 (B) may not be used in conjunction with the augmented reality system 800 at all.
The acoustic transducers 820 on the frame 810 may be positioned in a variety of different ways, including along the length of the temples, across the bridge, above or below the display devices 815 (A) and 815 (B), or some combination thereof. The acoustic transducers 820 may also be oriented such that the microphone array is capable of detecting sound in a wide range of directions around a user wearing the augmented reality system 800. In some embodiments, an optimization process may be performed during manufacture of the augmented reality system 800 to determine the relative positioning of the acoustic transducers 820 in the microphone array.
In some examples, the augmented reality system 800 may include or be connected to an external device (e.g., a pairing device), such as a neck strap 805. Neck strap 805 generally represents any type or form of mating device. Accordingly, the following discussion of the neck strap 805 may also apply to various other paired devices, such as charging boxes, smartwatches, smartphones, wrist straps, other wearable devices, hand-held controllers, tablet computers, laptop computers, other external computing devices, and the like.
As shown, the neck strap 805 may be coupled to the eyeglass apparatus 802 via one or more connectors. The connectors may be wired or wireless and may include electronic components and/or non-electronic (e.g., structural) components. In some cases, the eyeglass device 802 and the neck strap 805 can operate independently without any wired or wireless connection therebetween. Although fig. 8 shows the components of the eyeglass apparatus 802 and neck strap 805 at example locations on the eyeglass apparatus 802 and neck strap 805, the components may be located elsewhere on the eyeglass apparatus 802 and/or neck strap 805 and/or distributed differently on the eyeglass apparatus and/or neck strap. In some embodiments, the components of the eyeglass device 802 and the neck strap 805 can be located on one or more additional peripheral devices paired with the eyeglass device 802, the neck strap 805, or some combination thereof.
Pairing an external device (e.g., the neck strap 805) with an augmented reality eyeglass device may enable the eyeglass device to achieve the form factor of a pair of eyeglasses while still providing sufficient battery and computing power for expanded capabilities. Some or all of the battery power, computing resources, and/or additional features of the augmented reality system 800 may be provided by, or shared between, the paired device and the eyeglass device, thereby reducing the weight, heat profile, and form factor of the eyeglass device as a whole while still retaining the desired functionality. For example, the neck strap 805 may allow components that would otherwise be included on the eyeglass device to be included in the neck strap 805, since users may tolerate a heavier weight load on their shoulders than they would tolerate on their heads. The neck strap 805 may also have a larger surface area over which to diffuse and disperse heat to the ambient environment. Thus, the neck strap 805 may allow for greater battery power and computing power than would otherwise be possible on a stand-alone eyeglass device. Because the weight carried in the neck strap 805 may be less invasive to the user than the weight carried in the eyeglass device 802, a user may tolerate wearing a lighter eyeglass device and carrying or wearing a paired device for a longer period of time than the user would tolerate wearing a heavy, stand-alone eyeglass device, thereby enabling the user to more fully incorporate the artificial reality environment into his or her daily activities.
The neck strap 805 may be communicatively coupled with the eyeglass device 802 and/or communicatively coupled to other devices. These other devices may provide certain functions (e.g., tracking, positioning, depth map construction (depth mapping), processing, storage, etc.) to the augmented reality system 800. In the embodiment of fig. 8, the neck strap 805 may include two acoustic transducers (e.g., 820 (I) and 820 (J)) that are part of the microphone array (or potentially form its own microphone sub-array). The neck strap 805 may also include a controller 825 and a power supply 835.
The acoustic transducers 820 (I) and 820 (J) of the neck strap 805 may be configured to detect sound and convert the detected sound to an electronic format (analog or digital). In the embodiment of fig. 8, acoustic transducers 820 (I) and 820 (J) may be positioned on the neck strap 805, thereby increasing the distance between the neck strap acoustic transducers 820 (I) and 820 (J) and other acoustic transducers 820 positioned on the eyeglass apparatus 802. In some cases, increasing the distance between the acoustic transducers 820 of the microphone array may increase the accuracy of the beamforming performed via the microphone array. For example, if acoustic transducers 820 (C) and 820 (D) detect sound and the distance between acoustic transducers 820 (C) and 820 (D) is greater than, for example, the distance between acoustic transducers 820 (D) and 820 (E), the determined source location of the detected sound may be more accurate than when the sound is detected by acoustic transducers 820 (D) and 820 (E).
The controller 825 of the neck strap 805 may process information generated by the various sensors on the neck strap 805 and/or the augmented reality system 800. For example, the controller 825 may process information from the microphone array describing sounds detected by the microphone array. For each detected sound, the controller 825 may perform a direction-of-arrival (DOA) estimation to estimate from which direction the detected sound arrives at the microphone array. When sound is detected by the microphone array, the controller 825 can use this information to populate the audio data set. In embodiments where the augmented reality system 800 includes an inertial measurement unit, the controller 825 may calculate all inertial and spatial calculations from the IMU located on the eyeglass device 802. The connector may communicate information between the augmented reality system 800 and the neck strap 805, as well as between the augmented reality system 800 and the controller 825. The information may be in the form of optical data, electronic data, wireless data, or any other transmissible data. Moving the processing of information generated by the augmented reality system 800 to the neck strap 805 may reduce the weight and heat of the eyeglass device 802, making the eyeglass device more comfortable for the user.
The power supply 835 in the neck strap 805 may provide power to the eyeglass device 802 and/or the neck strap 805. The power supply 835 may include, but is not limited to, a lithium ion battery, a lithium polymer battery, a disposable lithium battery, an alkaline battery, or any other form of power storage. In some cases, power supply 835 may be a wired power supply. The inclusion of the power supply 835 on the neck strap 805 rather than on the eyeglass device 802 may help better disperse the weight and heat generated by the power supply 835.
As mentioned, some artificial reality systems may use a virtual experience to substantially replace one or more of the user's multiple sensory perceptions of the real world, rather than mixing artificial reality with real reality. One example of this type of system is a head mounted display system (e.g., virtual reality system 900 in fig. 9) that covers most or all of the user's field of view. The virtual reality system 900 may include a front rigid body 902 and a strap 904 shaped to fit around the user's head. The virtual reality system 900 may also include output audio transducers 906 (a) and 906 (B). Further, although not shown in fig. 9, the front rigid body 902 may include one or more electronic components including one or more electronic displays, one or more Inertial Measurement Units (IMUs), one or more tracking emitters or detectors, and/or any other suitable device or system for creating an artificial reality experience.
The artificial reality system may include various types of visual feedback mechanisms. For example, a display device in the augmented reality system 800 and/or in the virtual reality system 900 may include one or more liquid crystal displays (LCDs), light-emitting diode (LED) displays, micro-LED displays, organic LED (OLED) displays, digital light projection (DLP) micro-displays, liquid crystal on silicon (LCoS) micro-displays, and/or any other suitable type of display screen. These artificial reality systems may include a single display screen for both eyes, or one display screen may be provided for each eye, which may provide additional flexibility for zoom adjustment or for correcting refractive errors of the user. Some of these artificial reality systems may also include multiple optical subsystems having one or more lenses (e.g., conventional concave or convex lenses, Fresnel lenses, adjustable liquid lenses, etc.) through which a user may view the display screen. These optical subsystems may be used for various purposes, including collimation of light (e.g., making objects appear to be at a greater distance than their physical distance), magnification (e.g., making objects appear to be larger than their physical size), and/or transmission (transmitting light to, for example, both eyes of a viewer). These optical subsystems may be used in a direct-view architecture (a non-pupil-forming architecture, e.g., a single-lens configuration that directly collimates light but results in so-called pincushion distortion) and/or in a non-direct-view architecture (a pupil-forming architecture, e.g., a multi-lens configuration that produces so-called barrel distortion to counteract pincushion distortion).
Some of the artificial reality systems described herein may include one or more projection systems in addition to or instead of display screens. For example, the display devices in the augmented reality system 800 and/or in the virtual reality system 900 may include micro-LED projectors that project light (using, for example, waveguides) into display devices, such as transparent combiner lenses that allow ambient light to pass through. The display device may refract the projected light toward the user's pupil, enabling the user to view both the artificial reality content and the real world simultaneously. The display device may use any of a variety of different optical components to achieve this end, including waveguide components (e.g., holographic, planar, diffractive, polarizing, and/or reflective waveguide elements), light-manipulating surfaces and elements (e.g., diffractive, reflective, and refractive elements), coupling elements, and the like. The artificial reality system may also be configured with any other suitable type or form of image projection system, such as the retinal projectors used in virtual retinal displays.
The artificial reality systems described herein may also include various types of computer vision components and subsystems. For example, the augmented reality system 800 and/or the virtual reality system 900 may include one or more optical sensors, such as two-dimensional (2D) or 3D cameras, structured light emitters and detectors, time-of-flight depth sensors, single-beam or scanning laser rangefinders, 3D LiDAR sensors, and/or any other suitable type or form of optical sensor. The artificial reality system may process data from one or more of these sensors to identify the user's location, map the real world, provide the user with context about the real-world environment, and/or perform various other functions.
The artificial reality system described herein may also include one or more input audio transducers and/or output audio transducers. The output audio transducer may include a voice coil speaker, a ribbon speaker, an electrostatic speaker, a piezoelectric speaker, a bone conduction transducer, a cartilage conduction transducer, a tragus vibration transducer, and/or any other suitable type or form of audio transducer. Similarly, the input audio transducer may include a condenser microphone, a dynamic microphone, a ribbon microphone, and/or any other type or form of input transducer. In some embodiments, a single transducer may be used for both the audio input and the audio output.
In some embodiments, the artificial reality systems described herein may also include haptic feedback systems, which may be incorporated into headwear, gloves, clothing, hand-held controllers, environmental devices (e.g., chairs, floor mats, etc.), and/or any other type of device or system. The haptic feedback system may provide various types of skin feedback, including vibration, force, traction, texture, and/or temperature. Haptic feedback systems may also provide various types of kinesthetic feedback, such as motion and compliance. Haptic feedback may be implemented using motors, piezoelectric actuators, fluid systems, and/or various other types of feedback mechanisms. The haptic feedback system may be implemented independently of and/or in conjunction with other artificial reality devices.
By providing haptic perception, auditory content, and/or visual content, an artificial reality system may create a complete virtual experience or enhance a user's real-world experience in various contexts and environments. For example, an artificial reality system may help or extend a user's perception, memory, or cognition within a particular environment. Some systems may enhance user interaction with others in the real world or may enable more immersive interaction with others in the virtual world. The artificial reality system may also be used for educational purposes (e.g., for teaching or training in schools, hospitals, government institutions, military institutions, commercial enterprises, etc.), entertainment purposes (e.g., for playing video games, listening to music, watching video content, etc.), and/or for accessibility purposes (e.g., as hearing aids, visual aids, etc.). Embodiments disclosed herein may be capable of implementing or enhancing a user's artificial reality experience in one or more of these contexts and environments and/or in other contexts and environments.
In some embodiments, the systems described herein may also include an eye-tracking subsystem designed to identify and track various characteristics of one or both of a user's eyes, such as the user's gaze direction. In some examples, the term "eye-tracking" may refer to a process by which the position, orientation, and/or movement of an eye is measured, detected, sensed, determined, and/or monitored. The disclosed systems may measure the position, orientation, and/or movement of the eye in a variety of different ways, including through the use of various optical-based eye-tracking techniques, ultrasonic-based eye-tracking techniques, and the like. The eye-tracking subsystem may be configured in a number of different ways and may include a variety of different eye-tracking hardware components or other computer vision components. For example, the eye-tracking subsystem may include a variety of different optical sensors, such as a two-dimensional (2D) camera or 3D camera, a time-of-flight depth sensor, a single-beam or scanning laser rangefinder, a 3D LiDAR sensor, and/or any other suitable type or form of optical sensor. In this example, the processing subsystem may process data from one or more of these sensors to measure, detect, determine, and/or otherwise monitor the position, orientation, and/or movement of one or both of the user's eyes.
Fig. 10 is an illustration of an exemplary system 1000 that includes an eye-tracking subsystem capable of tracking one or both of a user's eyes. As depicted in fig. 10, system 1000 may include a light source 1002, an optical subsystem 1004, an eye-tracking subsystem 1006, and/or a control subsystem 1008. In some examples, the light source 1002 may generate light for an image (e.g., an image to be presented to the eye 1001 of a viewer). The light source 1002 may represent any of a variety of suitable devices. For example, the light source 1002 may include a two-dimensional projector (e.g., an LCoS display), a scanning source (e.g., a scanning laser), or another device (e.g., an LCD, an LED display, an OLED display, an active-matrix OLED (AMOLED) display, a transparent OLED (TOLED) display, a waveguide, or some other display capable of generating light for presenting an image to a viewer). In some examples, the image may represent a virtual image, which may refer to an optical image formed from the apparent divergence of light rays from a point in space, rather than an image formed from the actual divergence of light rays.
In some embodiments, the optical subsystem 1004 may receive light generated by the light source 1002 and generate convergent light 1020 including an image based on the received light. In some examples, optical subsystem 1004 may include any number of lenses (e.g., fresnel lenses, convex lenses, concave lenses), apertures, filters, mirrors, prisms, and/or other optical components, possibly in combination with actuators and/or other devices. In particular, actuators and/or other devices may translate and/or rotate one or more of the plurality of optical components to change one or more aspects of the converging light 1020. Further, various mechanical couplings may be used to maintain the relative spacing and/or orientation of the optical components in any suitable combination.
In one embodiment, the eye-tracking subsystem 1006 may generate tracking information indicative of the gaze angle of the viewer's eyes 1001. In this embodiment, control subsystem 1008 may control aspects of optical subsystem 1004 (e.g., the angle of incidence of converging light 1020) based at least in part on the tracking information. Further, in some examples, control subsystem 1008 may store and utilize historical tracking information (e.g., a history of tracking information over a given duration (e.g., the previous second or a fraction of the previous second)) to anticipate a gaze angle of eye 1001 (e.g., an angle between a visual axis and an anatomical axis of eye 1001). In some embodiments, eye-tracking subsystem 1006 may detect radiation emanating from a portion of eye 1001 (e.g., cornea, iris, pupil, etc.) to determine the current gaze angle of eye 1001. In other examples, eye tracking subsystem 1006 may employ a wavefront sensor to track the current position of the pupil.
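One simple way to anticipate a gaze angle from a short history of tracking information, as described above, is to extrapolate a fitted trend a few tens of milliseconds forward. The sketch below assumes uniformly time-stamped numeric samples and a plain least-squares line; it is illustrative only, and is not the predictor actually used by control subsystem 1008.

```python
import numpy as np

def anticipate_gaze_angle_deg(timestamps_s, gaze_angles_deg, lookahead_s=0.05):
    """Extrapolate the gaze angle a short time into the future by fitting a
    least-squares line to the most recent tracking samples (e.g., the previous
    fraction of a second) and evaluating it lookahead_s beyond the newest sample."""
    t = np.asarray(timestamps_s, dtype=float)
    y = np.asarray(gaze_angles_deg, dtype=float)
    slope, intercept = np.polyfit(t, y, deg=1)   # deg/s and deg
    return slope * (t[-1] + lookahead_s) + intercept
```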
Any number of techniques may be used to track the eye 1001. Some techniques may involve illuminating the eye 1001 with infrared light and measuring the reflection using at least one optical sensor tuned to be sensitive to infrared light. Information regarding how infrared light is reflected from the eye 1001 may be analyzed to determine one or more locations, one or more orientations, and/or one or more movements of one or more eye features (e.g., cornea, pupil, iris, and/or retinal blood vessels).
In some examples, the radiation collected by the sensors of eye-tracking subsystem 1006 may be digitized (i.e., converted to electronic signals). Further, the sensor may send a digital representation of the electronic signal to one or more processors (e.g., a processor associated with a device that includes the eye-tracking subsystem 1006). Eye-tracking subsystem 1006 may include any of a variety of sensors in a variety of different configurations. For example, eye tracking subsystem 1006 may include an infrared detector that reacts to infrared radiation. The infrared detector may be a thermal detector, a photon detector, and/or any other suitable type of detector. The thermal detector may comprise a detector that reacts to thermal effects of the incident infrared radiation.
In some examples, the one or more processors may process the digital representation generated by the one or more sensors of the eye-tracking subsystem 1006 to track the movement of the eye 1001. In another example, the processors may track movement of the eye 1001 by executing algorithms represented by computer-executable instructions stored on non-transitory memory. In some examples, on-chip logic (e.g., an application-specific integrated circuit, or ASIC) may be used to execute at least portions of these algorithms. As noted, eye-tracking subsystem 1006 may be programmed to use the output of one or more sensors to track movement of the eye 1001. In some embodiments, eye-tracking subsystem 1006 may analyze the digital representation generated by the sensor to extract eye-rotation information from changes in reflection. In one embodiment, eye-tracking subsystem 1006 may use corneal reflections or glints (also known as Purkinje images) and/or the center of the eye pupil 1022 as features to track over time.
In some embodiments, eye-tracking subsystem 1006 may use the center of the eye pupil 1022 and non-collimated infrared or near-infrared light to produce corneal reflections. In these embodiments, the eye-tracking subsystem 1006 may use the vector between the center of the eye pupil 1022 and the corneal reflection to calculate the gaze direction of the eye 1001. In some embodiments, the disclosed system may perform a calibration process on an individual (using, for example, supervised or unsupervised techniques) before tracking both eyes of the user. For example, the calibration process may include directing the user to look at one or more points displayed on the display while the eye-tracking system records a value corresponding to each gaze location associated with each point.
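The pupil-center/corneal-reflection approach and the calibration procedure described above can be sketched as a regression from pupil-glint vectors to on-screen gaze points. The quadratic feature set, the function names, and the pixel-space inputs below are assumptions made for illustration; production calibrations often use different models (e.g., per-eye 3D eye models).

```python
import numpy as np

def fit_gaze_calibration(pupil_glint_vectors, target_points):
    """Fit a per-axis quadratic map from pupil-glint vectors (pixels) to gaze
    points, using samples recorded while the user fixates known targets."""
    vx, vy = np.asarray(pupil_glint_vectors, dtype=float).T
    # Design matrix with quadratic features: [1, x, y, xy, x^2, y^2].
    A = np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(target_points, dtype=float), rcond=None)
    return coeffs                                 # shape (6, 2)

def map_gaze(pupil_glint_vector, coeffs):
    """Map a single pupil-glint vector to an estimated on-screen gaze point."""
    x, y = pupil_glint_vector
    features = np.array([1.0, x, y, x * y, x**2, y**2])
    return features @ coeffs
```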
In some embodiments, eye-tracking subsystem 1006 may use two types of infrared and/or near-infrared (also referred to as active light) eye-tracking techniques: bright-pupil eye tracking and dark-pupil eye tracking, which can be distinguished based on the position of the illumination source relative to the optical elements used. If the illumination is coaxial with the optical path, the eye 1001 may act as a retroreflector because light is reflected from the retina, producing a bright-pupil effect similar to the red-eye effect in photography. If the illumination source is off the optical path, the eye pupil 1022 may appear dark because the retroreflection from the retina is directed away from the sensor. In some embodiments, bright-pupil tracking may produce greater iris/pupil contrast, allowing more robust eye tracking regardless of iris pigmentation, and may suffer less interference (e.g., interference caused by eyelashes and other obscuring features). Bright-pupil tracking may also allow tracking under illumination conditions ranging from complete darkness to very bright environments.
In some embodiments, control subsystem 1008 may control light source 1002 and/or optical subsystem 1004 to reduce optical aberrations (e.g., chromatic and/or monochromatic aberrations) of the image that may be caused by eye 1001 or affected by eye 1001. In some examples, control subsystem 1008 may use tracking information from eye-tracking subsystem 1006 to perform such control, as described above. For example, in controlling the light source 1002, the control subsystem 1008 may change the light generated by the light source 1002 (e.g., by image rendering) to modify (e.g., pre-distort) the image to reduce aberrations of the image caused by the eye 1001.
The disclosed system may track both the position and relative size of the pupil (e.g., due to pupil dilation and/or constriction). In some examples, eye tracking devices and components (e.g., sensors and/or sources) for detecting and/or tracking pupils may be different (or calibrated differently) for different types of eyes. For example, the frequency range of the sensor may be different (or calibrated separately) for eyes of different colors and/or different pupil types and/or sizes, etc. Thus, it may be desirable to calibrate the various eye-tracking components described herein (e.g., infrared sources and/or sensors) for each individual user and/or eye.
The disclosed system can track both eyes with and without ophthalmic correction (e.g., ophthalmic correction provided by contact lenses worn by a user). In some embodiments, an ophthalmic corrective element (e.g., an adjustable lens) may be incorporated directly into the artificial reality system described herein. In some examples, the color of the user's eye may necessitate modification of the corresponding eye-tracking algorithm. For example, it may be desirable to modify the eye-tracking algorithm based at least in part on the different color contrast between a brown eye and, for example, a blue eye.
Fig. 11 is a more detailed illustration of various aspects of the eye-tracking subsystem shown in fig. 10. As shown in this figure, eye-tracking subsystem 1100 may include at least one source 1104 and at least one sensor 1106. The source 1104 generally represents any type or form of element capable of emitting radiation. In one example, the source 1104 may generate visible radiation, infrared radiation, and/or near-infrared radiation. In some examples, the source 1104 may radiate non-collimated infrared and/or near-infrared portions of the electromagnetic spectrum toward the user's eye 1102. The source 1104 may utilize various sampling rates and speeds. For example, the disclosed system may use a source with a higher sampling rate in order to capture fixational eye movements of the user's eye 1102 and/or to properly measure the saccadic dynamics of the user's eye 1102. As described above, any type or form of eye-tracking technique (including optical-based eye-tracking techniques, ultrasonic-based eye-tracking techniques, etc.) may be used to track the user's eye 1102.
Sensor 1106 generally represents any type or form of element capable of detecting radiation (e.g., radiation reflected from the user's eye 1102). Examples of sensor 1106 include, but are not limited to: charge-coupled devices (CCDs), photodiode arrays, complementary metal-oxide-semiconductor (CMOS) based sensor devices, and the like. In one example, sensor 1106 may represent a sensor having predetermined parameters including, but not limited to: dynamic resolution range, linearity, and/or other characteristics specifically selected and/or designed for eye tracking.
As detailed above, the eye-tracking subsystem 1100 may generate one or more glints. A glint 1103 may represent a reflection of radiation (e.g., infrared radiation from an infrared source such as source 1104) from a structure of the user's eye. In various embodiments, the glints 1103 and/or the user's pupil may be tracked using an eye-tracking algorithm executed by a processor (internal or external to the artificial reality device). For example, the artificial reality device may include a processor and/or a storage device to perform eye tracking locally, and/or a transceiver for transmitting and receiving the data required to perform eye tracking on an external device (e.g., a mobile phone, cloud server, or other computing device).
Fig. 11 shows an example image 1105 acquired by an eye-tracking subsystem (e.g., eye-tracking subsystem 1100). In this example, image 1105 may include both the user's pupil 1108 and a glint 1110 in the vicinity of the pupil. In some examples, an artificial intelligence-based algorithm (e.g., a computer vision-based algorithm) may be used to identify pupil 1108 and/or glints 1110. In one embodiment, image 1105 may represent a single frame in a series of frames that may be continuously analyzed to track user's eye 1102. In addition, pupil 1108 and/or glints 1110 may be tracked over a period of time to determine the user's gaze.
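A computer-vision pipeline of the kind mentioned above might locate the pupil in a frame like image 1105 with something as simple as thresholding and contour analysis. The sketch below assumes OpenCV and a dark-pupil configuration; the threshold value and the overall approach are illustrative stand-ins, not the algorithm used by eye-tracking subsystem 1100.

```python
import cv2

def find_pupil_center(gray_frame, dark_threshold=40):
    """Locate the pupil centroid in a grayscale eye image (dark-pupil case)
    by thresholding and keeping the largest dark blob."""
    _, mask = cv2.threshold(gray_frame, dark_threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    m = cv2.moments(pupil)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])   # (x, y) in pixels
```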
In one example, eye-tracking subsystem 1100 may be configured to identify and measure a user's interpupillary distance (IPD). In some embodiments, eye-tracking subsystem 1100 may measure and/or calculate the user's IPD while the user is wearing an artificial reality system. In these embodiments, eye-tracking subsystem 1100 may detect the positions of both of the user's eyes and may use this information to calculate the user's IPD.
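Given estimated 3D positions for both eyes, the IPD computation itself reduces to the distance between the two pupil centers, as in this minimal sketch (the coordinate frame and units are assumptions):

```python
import numpy as np

def interpupillary_distance_mm(left_pupil_mm, right_pupil_mm):
    """Return the IPD as the Euclidean distance between the two estimated
    3D pupil positions, both expressed in the same headset frame."""
    return float(np.linalg.norm(np.asarray(left_pupil_mm) - np.asarray(right_pupil_mm)))
```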
As described above, the eye-tracking systems or eye-tracking subsystems disclosed herein may track a user's eye position and/or eye movement in various ways. In one example, one or more light sources and/or optical sensors may capture images of both eyes of the user. The eye-tracking subsystem may then use the captured information, including the magnitude of torsion and rotation of each eye (i.e., roll, pitch, and yaw) and/or the gaze direction of each eye, to determine the user's interpupillary distance, interocular distance, and/or the 3D position of each eye (e.g., for distortion adjustment purposes). In one example, infrared light may be emitted by the eye-tracking subsystem and reflected from each eye. The reflected light may be received or detected by an optical sensor and analyzed to extract eye-rotation data from changes in the infrared light reflected by each eye.
The eye-tracking subsystem may use any of a variety of different methods to track both eyes of a user. For example, a light source (e.g., an infrared light-emitting diode) may emit a dot pattern onto each eye of the user. The eye-tracking subsystem may then detect and analyze the reflection of the dot pattern from each eye of the user (e.g., via an optical sensor coupled to the artificial reality system) to identify the location of each of the user's pupils. Thus, the eye-tracking subsystem may track up to six degrees of freedom for each eye (i.e., 3D position, roll, pitch, and yaw) and may combine at least a subset of the tracked quantities from both eyes of the user to estimate a gaze point (i.e., the 3D location or position in the virtual scene where the user is looking) and/or an IPD.
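Combining the tracked quantities from both eyes into a single gaze point, as described above, is often done by intersecting (or nearly intersecting) the two gaze rays. The sketch below returns the midpoint of the closest approach between the two rays; the ray parameterization and function name are illustrative assumptions, not constructs defined in this disclosure.

```python
import numpy as np

def estimate_gaze_point(origin_left, dir_left, origin_right, dir_right, eps=1e-9):
    """Return the midpoint of the shortest segment between the left and right
    gaze rays (each given by an origin and a unit direction), or None if the
    rays are nearly parallel (gaze at or near infinity)."""
    o_l, d_l, o_r, d_r = (np.asarray(v, dtype=float)
                          for v in (origin_left, dir_left, origin_right, dir_right))
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    t_l = (b * e - c * d) / denom      # parameter along the left ray
    t_r = (a * e - b * d) / denom      # parameter along the right ray
    return (o_l + t_l * d_l + o_r + t_r * d_r) / 2.0
```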
In some cases, the distance between the user's pupil and the display may change as the user's eye moves to look in different directions. This varying distance between the pupil and the display as the viewing direction changes may be referred to as "pupil swim", and it may cause distortion perceptible to the user because light is focused at different locations as the pupil-to-display distance changes. Accordingly, distortion may be measured at different eye positions and pupil-to-display distances, and distortion corrections may be generated for those positions and distances. Distortion caused by pupil swim can then be reduced by tracking the 3D position of the user's eyes and applying the distortion correction corresponding to the 3D position of each of the user's eyes at a given point in time. Furthermore, knowing the position of each of the user's eyes may also enable the eye-tracking subsystem to make automated adjustments for the user's IPD, as described above.
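A simple way to apply per-position distortion corrections, as described above, is to precompute corrections at sampled eye positions and select (or blend) the appropriate one at runtime. The nearest-neighbor lookup below is an illustrative placeholder for whatever interpolation a real renderer would use; the names and units are assumptions.

```python
import numpy as np

def select_distortion_correction(eye_pos_mm, calibrated_positions_mm, corrections):
    """Return the distortion correction measured at the calibrated eye position
    nearest to the current 3D eye position (a production renderer might blend
    several neighboring corrections instead of picking one)."""
    deltas = np.asarray(calibrated_positions_mm, dtype=float) - np.asarray(eye_pos_mm, dtype=float)
    return corrections[int(np.argmin(np.linalg.norm(deltas, axis=1)))]
```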
In some embodiments, the display subsystem may include various additional subsystems that may work in conjunction with the eye-tracking subsystems described herein. For example, the display subsystem may include a zoom subsystem, a scene rendering module, and/or a vergence processing module. The zoom subsystem may cause the left and right display elements to change the focal length of the display device. In one embodiment, the zoom subsystem may physically change the distance between the display and the optics through which it is viewed by moving the display, the optics, or both. Furthermore, two lenses that move or translate relative to each other may also be used to change the focal length of the display. Thus, the zoom subsystem may include actuators or motors that move the display and/or the optics to change the distance between them. The zoom subsystem may be separate from or integrated into the display subsystem. The zoom subsystem may also be integrated into or separate from its actuation subsystem and/or the eye-tracking subsystems described herein.
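For a zoom subsystem that moves a display relative to a single ideal lens, the required display-to-lens distance can be sketched with the thin-lens equation, as below. Real varifocal elements (deformable membranes, liquid crystal lenses, translating lens pairs) do not reduce to this one formula, so this is a conceptual example only.

```python
def display_distance_for_focus(lens_focal_length_m, desired_image_distance_m):
    """Thin-lens sketch of a varifocal adjustment: return the display-to-lens
    distance that places the virtual image at the desired distance.

    Derived from 1/f = 1/d_object + 1/d_image with a virtual image
    (d_image = -desired distance); approaches f as the desired distance
    goes to infinity."""
    f = lens_focal_length_m
    d = desired_image_distance_m
    return f * d / (d + f)

# Example: a 50 mm lens with the virtual image pushed out to 2 m
# display_distance_for_focus(0.05, 2.0)  ->  ~0.0488 m (just inside the focal plane)
```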
In one example, the display subsystem may include a vergence processing module configured to determine a vergence depth of the user's gaze based on the gaze point and/or an estimated intersection of the lines of sight (gaze lines) determined by the eye-tracking subsystem. Vergence may refer to the simultaneous movement or rotation of both eyes in opposite directions to maintain single binocular vision, which is performed naturally and automatically by the human eyes. Thus, the position at which the user's eyes are verged is where the user is looking, and is typically also the position at which the user's eyes are focused. For example, the vergence processing module may triangulate the gaze lines to estimate a distance or depth from the user associated with their intersection. The depth associated with the intersection of the gaze lines may then be used as an approximation of the focus distance, which identifies the distance from the user toward which the user's eyes are directed. Thus, the vergence distance allows the system to determine where the user's eyes should be focused and the depth from the user's eyes at which the eyes are focused, thereby providing information (e.g., an object or plane of focus) for rendering adjustments to the virtual scene.
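Under the simplifying assumption of symmetric convergence on a point straight ahead, the vergence depth described above follows directly from the IPD and the vergence angle; a fuller implementation would triangulate the two measured gaze lines instead, which also handles asymmetric or off-axis gaze. A minimal sketch:

```python
import math

def vergence_distance_m(ipd_m, vergence_angle_deg):
    """Convert a vergence angle (the angle between the two visual axes) into a
    fixation distance, assuming symmetric convergence straight ahead:
    distance = (IPD / 2) / tan(vergence_angle / 2)."""
    half_angle = math.radians(vergence_angle_deg) / 2.0
    if half_angle <= 0.0:
        return float("inf")          # parallel gaze lines: fixation at infinity
    return (ipd_m / 2.0) / math.tan(half_angle)

# Example: a 63 mm IPD and a 3.6 degree vergence angle give roughly 1 m.
```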
The vergence processing module may cooperate with the eye-tracking subsystem described herein to adjust the display subsystem to account for the vergence depth of the user. When the user focuses on something far away, the user's pupils may be separated slightly farther than when the user focuses on something near. The eye tracking subsystem may acquire information about the user's convergence or depth of focus and may adjust the display subsystem closer when the user's eyes focus or converge on something near and farther when the user's eyes focus or converge on something far away.
Eye-tracking information generated by the eye-tracking subsystems described above may also be used to modify how various aspects of computer-generated images are presented. For example, the display subsystem may be configured to modify, based on information generated by the eye-tracking subsystem, at least one aspect of how a computer-generated image is presented. For instance, the computer-generated image may be modified based on the user's eye movement such that if the user looks up, the computer-generated image may be moved up on the screen. Similarly, if the user looks to one side or down, the computer-generated image may be moved to that side or down on the screen. If the user closes their eyes, the computer-generated image may be paused or removed from the display and resumed once the user opens both eyes again.
The eye-tracking subsystem described above may be incorporated into one or more of the various artificial reality systems described herein in various ways. For example, one or more of the various components of system 1000 and/or eye-tracking subsystem 1100 may be incorporated into augmented reality system 800 of fig. 8 and/or virtual reality system 900 of fig. 9 to enable these systems to perform various eye-tracking tasks (including one or more of the eye-tracking operations described herein).
The present disclosure also includes the following example embodiments:
Example 1: A head-mounted optical system may include: an eye-tracking subsystem configured to determine at least a gaze direction of both eyes of the user and an eye-movement speed of both eyes of the user; and a gaze distance prediction subsystem configured to predict a gaze distance at which both eyes of the user will become gazed, before both eyes of the user reach a gaze state associated with the predicted gaze distance, based on the eye movement speed and gaze direction of both eyes of the user.
Example 2: the head-mounted optical system of example 1, further comprising a zoom optical element fixed in position in front of both eyes of the user when the head-mounted optical system is worn by the user, the zoom optical element configured to change at least one optical characteristic based on information from the eye-tracking subsystem and the gaze-distance prediction subsystem, the at least one optical characteristic comprising a focal length.
Example 3: the head-mounted optical system according to example 2, wherein the zoom optical element includes: a substantially transparent support element; a substantially transparent deformable element coupled to the support element at least along an edge of the deformable element; and a substantially transparent deformable medium disposed between the support element and the deformable element.
Example 4: the headset optical system of example 3, wherein the zoom optical element further comprises a zoom actuator configured to change at least one optical characteristic of the zoom optical element when actuated.
Example 5: the head-mounted optical system of example 4, wherein the zoom actuator comprises at least one substantially transparent electrode coupled to the deformable element.
Example 6: the head-mounted optical system of any of examples 2-5, wherein the zoom optical element comprises a liquid crystal element configured to change at least one optical characteristic of the zoom optical element when activated.
Example 7: the head-mounted optical system of any of examples 1-6, further comprising a near-eye display configured to display visual content to a user.
Example 8: the head-mounted optical system of example 7, wherein the near-eye display is operated to fully render only portions of the visual content at perceived depths of the user's binocular gaze.
Example 9: the head-mounted optical system of any of examples 1-8, wherein the gaze distance prediction subsystem is configured to predict a gaze distance at which both eyes of the user will become gazed within 600ms before both eyes of the user reach a gaze state associated with the predicted gaze distance.
Example 10: a method of operating a head-mounted optical device may include: measuring a gaze direction and a movement speed of both eyes of the user using the eye-tracking element; and predicting, using the at least one processor and based on the measured gaze direction and movement speed of the user's eyes, a gaze distance of the user's eyes before the user's eyes reach a gaze state associated with the predicted gaze distance.
Example 11: the method of example 10, further comprising: based on the predicted gaze distance of the eyes of the user, at least the focal length of the zoom optical element is changed using the zoom optical element.
Example 12: the method of example 10 or 11, further comprising: presenting visual content to both eyes of a user using a near-eye display; and fully rendering only portions of the visual content at the predicted gaze distance of both eyes of the user.
Example 13: the method of example 12, wherein the full rendering of only portions of the visual content is completed before the user's convergence of eyes reaches the gaze distance.
Example 14: the method of any one of examples 10 to 13, wherein: measuring the moving speed of both eyes of the user includes measuring the maximum speed of both eyes of the user; and the predicted gaze distance is based at least in part on a maximum speed of both eyes of the user.
Example 15: the method of any of examples 10-14, wherein the predicting of the gaze distance at which the user's eyes will become gazed is done within 600ms before the user's eyes reach the gaze state associated with the predicted gaze distance.
Example 16: a non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to: measuring a gaze direction and a movement speed of both eyes of the user using the eye-tracking element; and predicting a gaze distance of the user's eyes before the user's eyes reach a gaze state associated with the predicted gaze distance based on the measured gaze direction and movement speed of the user's eyes.
Example 17: the non-transitory computer-readable medium of example 16, wherein the one or more computer-executable instructions further cause the computing device to: based on the predicted gaze distance of the eyes of the user, at least the focal length of the zoom optical element is changed using the zoom optical element.
Example 18: the non-transitory computer-readable medium of examples 16 or 17, wherein the one or more computer-executable instructions further cause the computing device to: presenting visual content to both eyes of a user using a near-eye display; and fully rendering only portions of the visual content at the predicted gaze distance of both eyes of the user.
Example 19: the non-transitory computer-readable medium of example 18, wherein the one or more computer-executable instructions further cause the computing device to: complete rendering of only portions of the visual content is completed before the user's convergence of both eyes reaches the gaze distance.
Example 20: the non-transitory computer-readable medium of any one of examples 16-19, wherein the one or more computer-executable instructions further cause the computing device to: within 600ms before the eyes of the user reach the gaze state associated with the predicted gaze distance, the prediction of the gaze distance at which the eyes of the user will become gazed is completed.
As noted above, the computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions (e.g., those included in the modules described herein). In its most basic configuration, the one or more computing devices may each include at least one storage device and at least one physical processor.
In some examples, the term "storage device" refers generally to any type or form of volatile or non-volatile storage device or medium capable of storing data and/or computer-readable instructions. In one example, a storage device may store, load, and/or maintain one or more of the plurality of modules described herein. Examples of storage devices include, but are not limited to, random access Memory (Random Access Memory, RAM), read Only Memory (ROM), flash Memory, hard Disk Drive (HDD), solid-State Drive (SSD), optical Disk Drive, cache Memory, variations or combinations of one or more of the above, or any other suitable storage Memory.
In some examples, the term "physical processor" refers generally to any type or form of hardware-implemented processing unit capable of interpreting and/or executing computer-readable instructions. In one example, a physical processor may access and/or modify one or more modules stored in the storage device described above. Examples of physical processors include, but are not limited to, microprocessors, microcontrollers, central processing units (Central Processing Unit, CPUs), field-programmable gate arrays (Field-Programmable Gate Array, FPGAs) implementing soft-core processors, application-specific integrated circuits (ASICs), portions of one or more of the foregoing, variations or combinations of one or more of the foregoing, or any other suitable physical processor.
Although the modules described and/or illustrated herein are illustrated as separate elements, these modules may represent individual modules or portions of an application. Further, in some embodiments, one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, one or more of the modules described and/or illustrated herein may represent such modules: the modules are stored in and configured to run on one or more of the computing devices or systems described and/or illustrated herein. One or more of these modules may also represent all or part of one or more special purpose computers configured to perform one or more tasks.
Further, one or more of the plurality of modules described herein may convert data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules described herein may receive measurement data to be converted, convert the measurement data, output conversion results to predict a gaze distance of a user's eyes, use the conversion results to change focus cues of visual content displayed to the user, and store the conversion results to update the machine learning model. Additionally or alternatively, one or more of the modules described herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on, storing data on, and/or otherwise interacting with the computing device.
In some embodiments, the term "computer-readable medium" refers generally to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer readable media include, but are not limited to, transmission type media such as carrier waves, and non-transitory type media such as magnetic storage media (e.g., hard Disk drives, tape drives, and floppy disks), optical storage media (e.g., compact disks, CDs), digital video disks (Digital Video Disk, DVDs), and blu-ray disks), electronic storage media (e.g., solid state drives and flash memory media), and other distribution systems.
The process parameters and sequence of steps described and/or illustrated herein are given by way of example only and may be varied as desired. For example, although steps illustrated and/or described herein may be shown or discussed in a particular order, the steps need not be performed in the order illustrated or discussed. The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.
The previous description has been provided to enable any person skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. The exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the scope of the disclosure. The embodiments disclosed herein are to be considered in all respects as illustrative and not restrictive. In determining the scope of the present disclosure, reference should be made to any claims appended hereto and their equivalents.
The terms "connected to" and "coupled to" (and derivatives thereof) as used in the specification and/or claims, are to be interpreted as allowing both direct connection and indirect connection (i.e., via other elements or components) unless otherwise indicated. Furthermore, the terms "a" or "an", as used in the description and claims, should be interpreted to mean "at least one". Finally, for convenience in use, the terms "comprising" and "having" (and their derivatives) used in the specification and claims are interchangeable with the term "comprising" and have the same meaning.

Claims (15)

1. A head-mounted optical system, comprising:
an eye-tracking subsystem configured to determine at least a gaze direction of both eyes of a user and an eye-movement speed of both eyes of the user; and
a gaze distance prediction subsystem configured to predict a gaze distance at which both eyes of the user will become gazed, before both eyes of the user reach a gaze state associated with the predicted gaze distance, based on the eye-movement speed and the gaze direction of both eyes of the user.
2. The head-mounted optical system of claim 1, further comprising: a zoom optical element fixed in a position in front of both eyes of the user when the head-mounted optical system is worn by the user, the zoom optical element configured to change at least one optical characteristic based on information from the eye-tracking subsystem and the gaze distance prediction subsystem, the at least one optical characteristic comprising a focal length.
3. The head-mounted optical system of claim 2, wherein the zoom optical element comprises:
a substantially transparent support element;
a substantially transparent deformable element coupled to the support element at least along an edge of the deformable element; and
a substantially transparent deformable medium disposed between the support element and the deformable element.
4. The head-mounted optical system of claim 3, wherein the zoom optical element further comprises a zoom actuator configured to change the at least one optical characteristic of the zoom optical element when actuated; and/or preferably wherein the zoom actuator comprises at least one substantially transparent electrode coupled to the deformable element.
5. The head-mounted optical system of claim 2, wherein the zoom optical element comprises a liquid crystal element configured to change the at least one optical characteristic of the zoom optical element when activated.
6. The head-mounted optical system of any of the preceding claims, further comprising a near-eye display configured to display visual content to the user; and/or preferably wherein the near-eye display is operated to fully render only portions of the visual content at perceived depths of the user's binocular gaze.
7. The head mounted optical system of any of the preceding claims, wherein the gaze distance prediction subsystem is configured to predict the gaze distance at which both eyes of the user will become gazed within 600ms before both eyes of the user reach the gaze state associated with the predicted gaze distance.
8. A computer-implemented method of operating a head-mounted optical device, the method comprising:
measuring gaze direction and movement speed of both eyes of a user using an eye-tracking element; and
predicting, using at least one processor and based on the measured gaze direction and the movement speed of both eyes of the user, a gaze distance of both eyes of the user before the both eyes of the user reach a gaze state, the gaze state being associated with the predicted gaze distance.
9. The method of claim 8, further comprising:
based on the predicted gaze distance of both eyes of the user, at least a focal length of the zoom optical element is changed using the zoom optical element.
10. The method of claim 8 or 9, further comprising:
presenting visual content to both eyes of the user using a near-eye display; and
fully rendering only portions of the visual content at the predicted gaze distance of both eyes of the user; and/or preferably, wherein the full rendering of only portions of the visual content is completed before the convergence of both eyes of the user reaches the gaze distance.
11. The method of any one of claims 8 to 10, wherein:
measuring the movement speed of both eyes of the user includes measuring a maximum speed of both eyes of the user; and
predicting the gaze distance is based at least in part on the maximum speeds of both eyes of the user; and/or preferably, wherein the prediction of the gaze distance at which both eyes of the user will become gazed is completed within 600ms before both eyes of the user reach the gaze state associated with the predicted gaze distance.
12. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
measuring a gaze direction and a movement speed of both eyes of a user using an eye-tracking element; and
predicting, based on the measured gaze direction and the movement speed of both eyes of the user, a gaze distance of both eyes of the user before both eyes of the user reach a gaze state associated with the predicted gaze distance.
13. The non-transitory computer-readable medium of claim 12, wherein the one or more computer-executable instructions further cause the computing device to: based on the predicted gaze distance of both eyes of the user, at least a focal length of the zoom optical element is changed using the zoom optical element.
14. The non-transitory computer-readable medium of claim 12 or 13, wherein the one or more computer-executable instructions further cause the computing device to:
presenting visual content to both eyes of the user using a near-eye display; and
fully rendering only portions of the visual content at the predicted gaze distance of both eyes of the user; and/or preferably, wherein the one or more computer-executable instructions further cause the computing device to: complete the full rendering of only portions of the visual content before the convergence of both eyes of the user reaches the gaze distance.
15. The non-transitory computer-readable medium of any of claims 12-14, wherein the one or more computer-executable instructions further cause the computing device to: within 600ms before both eyes of the user reach the gaze state associated with the predicted gaze distance, prediction of the gaze distance at which both eyes of the user will become gazed is completed.
CN202280054957.7A 2021-08-05 2022-08-04 Optical system and method for predicting gaze distance Pending CN117795395A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/229,539 2021-08-05
US17/859,176 2022-07-07
US17/859,176 US20230037329A1 (en) 2021-08-05 2022-07-07 Optical systems and methods for predicting fixation distance
PCT/US2022/039482 WO2023014918A1 (en) 2021-08-05 2022-08-04 Optical systems and methods for predicting fixation distance

Publications (1)

Publication Number Publication Date
CN117795395A true CN117795395A (en) 2024-03-29

Family

ID=90395063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280054957.7A Pending CN117795395A (en) 2021-08-05 2022-08-04 Optical system and method for predicting gaze distance

Country Status (1)

Country Link
CN (1) CN117795395A (en)

Similar Documents

Publication Publication Date Title
JP2021511564A (en) Display systems and methods for determining alignment between the display and the user's eye
JP5887026B2 (en) Head mounted system and method for computing and rendering a stream of digital images using the head mounted system
JP2021511723A (en) Eye rotation center determination, depth plane selection, and rendering camera positioning within the display system
JP2021532405A (en) A display system and method for determining the alignment between the display and the user's eye.
US10698483B1 (en) Eye-tracking systems, head-mounted displays including the same, and related methods
US11176367B1 (en) Apparatuses, systems, and methods for mapping a surface of an eye via an event camera
JP2021532464A (en) Display systems and methods for determining vertical alignment between the left and right displays and the user's eyes.
US20230037329A1 (en) Optical systems and methods for predicting fixation distance
US20230053497A1 (en) Systems and methods for performing eye-tracking
US20230043585A1 (en) Ultrasound devices for making eye measurements
US11659043B1 (en) Systems and methods for predictively downloading volumetric data
US11782279B2 (en) High efficiency pancake lens
CN117795395A (en) Optical system and method for predicting gaze distance
WO2023014918A1 (en) Optical systems and methods for predicting fixation distance
US11703618B1 (en) Display device including lens array with independently operable array sections
US20230411932A1 (en) Tunable laser array
US20230341812A1 (en) Multi-layered polarization volume hologram
CN117882032A (en) System and method for performing eye tracking
US11789544B2 (en) Systems and methods for communicating recognition-model uncertainty to users
US20240094594A1 (en) Gradient-index liquid crystal lens having a plurality of independently-operable driving zones
US11415808B1 (en) Illumination device with encapsulated lens
US20230067343A1 (en) Tunable transparent antennas implemented on lenses of augmented-reality devices
US11906747B1 (en) Head-mounted device having hinge assembly with wiring passage
WO2023031633A1 (en) Online calibration based on deformable body mechanics
WO2023023206A1 (en) Systems and methods for performing eye-tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination