CN109791605B - Adaptive parameters in image regions based on eye tracking information


Info

Publication number
CN109791605B
Authority
CN
China
Prior art keywords
region
image
screen
encoding
eye
Prior art date
Legal status
Active
Application number
CN201780059271.6A
Other languages
Chinese (zh)
Other versions
CN109791605A (en)
Inventor
Martin Henrik Tall
Javier San Agustin Lopez
Rasmus Dahl
Current Assignee
Meta Platforms Technologies LLC
Original Assignee
Meta Platforms Technologies LLC
Priority date
Filing date
Publication date
Application filed by Meta Platforms Technologies LLC filed Critical Meta Platforms Technologies LLC
Publication of CN109791605A publication Critical patent/CN109791605A/en
Application granted granted Critical
Publication of CN109791605B publication Critical patent/CN109791605B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/19Sensors therefor
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/37Details of the operation on graphic patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14Digital output to display device ; Cooperation and interconnection of the display device with other functional units
    • G06F3/147Digital output to display device ; Cooperation and interconnection of the display device with other functional units using display panels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/14Display of multiple viewports
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/39Control of the bit-mapped memory
    • G09G5/391Resolution modifying circuits, e.g. variable screen formats
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/162User input
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2320/00Control of display operating conditions
    • G09G2320/06Adjustment of display parameters
    • G09G2320/0693Calibration of display systems
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2340/00Aspects of display data processing
    • G09G2340/04Changes in size, position or resolution of an image
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2340/00Aspects of display data processing
    • G09G2340/04Changes in size, position or resolution of an image
    • G09G2340/0407Resolution change, inclusive of the use of different resolutions for different screen areas

Abstract

A display system divides a screen into a plurality of regions and applies a different set of rendering/encoding parameters to each region. The system applies a first set of parameters to a first region seen by the fovea of a user's eye. The system may also apply a second set of parameters to a second region seen by the parafovea of the eye and a third set of parameters to a third region seen by a region of the retina outside the parafovea. The first set of parameters is selected to yield relatively high image quality, while the second set of parameters yields intermediate quality and the third set of parameters yields lower quality. As a result, the second region and the third region may be rendered, encoded, and transmitted with less computing power and less bandwidth.

Description

Adaptive parameters in image regions based on eye tracking information
Technical Field
The present invention relates generally to eye tracking and, more particularly, to the use of adaptive parameters in image regions based on eye tracking information.
Background
In various contexts, images are rendered, encoded, and displayed to a user. In many cases, the process for rendering, encoding, and transmitting images for display to a user can consume significant computational resources, especially when the images have a relatively high resolution, such as 1080p or 4K, or when the images are part of a sequence of frames that make up a video, such as a video file or scene generated by a gaming application. This can lead to undesirable side effects such as higher power consumption and longer processing times.
Disclosure of Invention
The display system applies different sets of parameters to different image regions. The system receives eye tracking information for one or both eyes of a user viewing a screen of a display device. The system determines a first screen region and a second screen region based on the eye tracking information. In one embodiment, the first screen region is the portion of the screen seen by the fovea of the user's eye, and the second screen region is the portion of the screen seen by a portion of the retina outside the fovea. The system processes an image for display on the screen by applying a first set of parameters to a first image region and a second set of parameters to a second image region. The first image region is the portion of the image to be displayed in the first screen region, and the second image region is the portion of the image to be displayed in the second screen region. The second set of parameters results in lower image quality than the first set of parameters, but the user is less likely to perceive the lower image quality because the portion of the retina outside the fovea is less sensitive than the fovea. Thus, the image may be processed with less computing power and less bandwidth.
The display system may be part of a Head Mounted Display (HMD). The HMD may be part of a Virtual Reality (VR) system, an Augmented Reality (AR) system, a Mixed Reality (MR) system, or some combination thereof.
Embodiments according to the invention are disclosed in particular in the appended claims directed to a system, a storage medium, and a method, wherein any feature mentioned in one claim category (e.g., system) may also be claimed in another claim category (e.g., method, storage medium, or computer program). The dependencies or back-references in the appended claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate back-reference (in particular, multiple dependencies) to any preceding claim may also be claimed, so that any combination of claims and their features is disclosed and can be claimed regardless of the dependencies chosen in the appended claims. The subject matter that may be claimed comprises not only the combinations of features set out in the appended claims but also any other combination of features in the claims, wherein each feature mentioned in the claims may be combined with any other feature or combination of features in the claims. Furthermore, any embodiments and features described or depicted herein may be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any feature of the appended claims.
In an embodiment, a method may include:
receiving eye tracking information comprising at least one image of an eye of a user viewing a screen of a display device;
determining a first screen region corresponding to a region of the screen based on the eye tracking information, the first screen region containing a gaze point of the user, the gaze point representing a point on the screen that the user looks at when the eye tracking information is captured;
determining, based on the eye tracking information, a second screen region corresponding to a region of the screen and separate from the first screen region;
encoding an image for display on a screen, the encoding comprising:
encoding a first image region of the image based on a first set of encoding parameters, the first image region to be displayed in a first screen region, and
encoding a second image region of the image based on a second set of encoding parameters, the second image region to be displayed in a second screen region, the second set of encoding parameters resulting in lower quality than the first set of encoding parameters; and
transmitting the encoded image to the display device for display on the screen.
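The following is a minimal, illustrative sketch of the flow recited above: split a frame into a gaze-centered region and the rest, apply a higher-quality and a lower-quality parameter set, and produce the data to be transmitted. It is not the claimed encoder; the circular region radius, the quantization step standing in for the "encoding parameters," and all function names are assumptions for illustration only.

```python
import numpy as np

def encode_region(region, quantizer_step):
    # Toy "encoder": a coarser quantizer step stands in for a lower-quality set of encoding parameters.
    return (region // quantizer_step) * quantizer_step

def encode_frame_foveated(frame, gaze_xy, foveal_radius_px=120):
    """Encode the region around the gaze point finely and the rest of the frame coarsely."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    foveal_mask = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 <= foveal_radius_px ** 2
    encoded = encode_region(frame, quantizer_step=32)             # second (lower-quality) parameter set
    encoded[foveal_mask] = encode_region(frame, 4)[foveal_mask]   # first (higher-quality) parameter set
    return encoded                                                # "encoded image" to transmit to the display device

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)     # placeholder image
encoded = encode_frame_foveated(frame, gaze_xy=(320, 240))
```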
In an embodiment, the first screen region may correspond to a region of the screen seen by the fovea of the eye.
The second screen region may correspond to a region of the screen seen by a portion of the retina of the eye outside the fovea.
The first set of encoding parameters may include a first image resolution, the second set of encoding parameters includes a second image resolution, and the second image resolution is lower than the first image resolution.
Encoding the first image region may include generating a first encoded image region, wherein encoding the second image region may include generating a second encoded image region, and may further include:
generating a packed image that includes the first encoded image region and the second encoded image region.
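As an illustrative sketch of the packing step (assumptions: the regions are simply cropped and subsampled rather than codec-encoded, and the crop size, downscale factor, and layout are arbitrary), a full-resolution foveal crop and a reduced-resolution copy of the whole frame might be placed side by side in a single packed image:

```python
import numpy as np

def pack_regions(frame, gaze_xy, crop=160, downscale=4):
    """Pack a full-resolution foveal crop next to a downscaled copy of the whole frame."""
    h, w = frame.shape[:2]
    x0 = int(np.clip(gaze_xy[0] - crop // 2, 0, w - crop))
    y0 = int(np.clip(gaze_xy[1] - crop // 2, 0, h - crop))
    foveal = frame[y0:y0 + crop, x0:x0 + crop]            # first encoded image region (full resolution)
    peripheral = frame[::downscale, ::downscale]          # second encoded image region (reduced resolution)
    ph, pw = peripheral.shape[:2]
    packed = np.zeros((max(crop, ph), crop + pw), dtype=frame.dtype)
    packed[:crop, :crop] = foveal
    packed[:ph, crop:] = peripheral
    return packed, (x0, y0)                               # offsets let the display side place the crop back
```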
The first set of encoding parameters may include a first frame rate, the second set of encoding parameters may include a second frame rate, and the second frame rate may be lower than the first frame rate.
The method may further comprise:
determining a third screen region, separate from the first screen region and the second screen region, corresponding to a region of the screen based on the eye tracking information,
wherein encoding the image for display on the screen further comprises:
the third image region of the image is encoded based on a third set of encoding parameters, the third image region to be displayed in the third screen region, the third set of encoding parameters resulting in lower quality than the first set of encoding parameters and higher quality than the second set of encoding parameters.
The third screen region may correspond to a region of the screen seen by the parafovea of the eye.
The display device may be a virtual reality headset.
In an embodiment according to the present invention, a non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving eye tracking information comprising at least one image of an eye of a user viewing a screen of a display device;
determining a first screen region corresponding to a region of the screen based on the eye tracking information, the first screen region containing a gaze point of the user, the gaze point representing a point on the screen that the user looks at when the eye tracking information is captured;
determining, based on the eye tracking information, a second screen region corresponding to a region of the screen and separate from the first screen region;
encoding an image for display on a screen, the encoding comprising:
encoding a first image region of the image based on a first set of encoding parameters, the first image region to be displayed in a first screen region, and
encoding a second image region of the image based on a second set of encoding parameters, the second image region to be displayed in a second screen region, the second set of encoding parameters resulting in lower quality than the first set of encoding parameters; and
transmitting the encoded image to the display device for display on the screen.
The first screen region may correspond to a region of the screen seen by the fovea of the eye.
The second screen region may correspond to a region of the screen seen by a portion of the retina of the eye outside the fovea.
The first set of encoding parameters may include a first image resolution, the second set of encoding parameters may include a second image resolution, and the second image resolution may be lower than the first image resolution.
Encoding the first image region may include generating a first encoded image region, wherein encoding the second image region may include generating a second encoded image region, and may further include:
generating a packed image that includes the first encoded image region and the second encoded image region.
The first set of encoding parameters may include a first frame rate, the second set of encoding parameters may include a second frame rate, and the second frame rate may be lower than the first frame rate.
The operations may further include:
determining a third screen region, separate from the first screen region and the second screen region, corresponding to a region of the screen based on the eye tracking information,
wherein encoding the image for display on the screen may further comprise:
encoding a third image region of the image based on a third set of encoding parameters, the third image region to be displayed in the third screen region, the third set of encoding parameters resulting in lower quality than the first set of encoding parameters and higher quality than the second set of encoding parameters.
The third screen region may correspond to a region of the screen seen by the parafovea of the eye.
The display device may be a virtual reality headset.
In an embodiment, a system may include:
one or more processors; and
a non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, may cause the one or more processors to perform operations comprising:
receiving eye tracking information comprising at least one image of an eye of a user viewing a screen of the display device,
determining a first screen region corresponding to a region of the screen based on the eye tracking information, the first screen region containing a gaze point of the user, the gaze point representing a point on the screen that the user looks at when the eye tracking information is captured, and determining, based on the eye tracking information, a second screen region corresponding to a region of the screen and separate from the first screen region,
encoding an image for display on the screen, the encoding comprising:
encoding a first image region of the image based on a first set of encoding parameters, the first image region to be displayed in a first screen region, and
encoding a second image region of the image based on a second set of encoding parameters, the second image region to be displayed in the second screen region, the second set of encoding parameters resulting in lower quality than the first set of encoding parameters, and
transmitting the encoded image to the display device for display on the screen.
The operations may further include:
determining a third screen region, separate from the first screen region and the second screen region, corresponding to a region of the screen based on the eye tracking information,
wherein encoding the image for display on the screen further comprises:
encoding a third image region of the image based on a third set of encoding parameters, the third image region to be displayed in the third screen region, the third set of encoding parameters resulting in lower quality than the first set of encoding parameters and higher quality than the second set of encoding parameters.
Drawings
Fig. 1A is a block diagram of a system in which a display device according to an embodiment operates.
Fig. 1B is a block diagram of an image processing engine according to an embodiment.
Fig. 2 is a diagram of a head mounted display according to an embodiment.
Fig. 3A and 3B illustrate eyes gazing at a screen and depict foveal and parafoveal regions, and foveal and parafoveal cones, according to an embodiment.
Fig. 3C illustrates a foveal region, a parafoveal region, and an outer region on a screen according to an embodiment.
Fig. 4 is a block diagram illustrating a process of applying adaptive parameters in an image region based on eye tracking information according to an embodiment.
Fig. 5A is a block diagram illustrating a process of rendering different portions of an image with different sets of rendering parameters according to an embodiment.
Fig. 5B is a block diagram illustrating a process of encoding different portions of an image with different sets of encoding parameters according to an embodiment.
Fig. 5C is a block diagram illustrating a process of rendering and encoding different portions of an image with different sets of parameters according to an embodiment.
Fig. 6A illustrates an example of encoding different regions of an image with different sets of encoding parameters and then packing the encoded image regions of the image into a packed image, according to an embodiment.
Fig. 6B shows an example of a sequence of encoded and packed images according to an embodiment.
The figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure provided herein.
Detailed Description
Overview
Many applications display images to a user whose gaze continuously moves across different areas of the screen. Examples include video games, virtual reality scenes, video streams, and augmented reality projections.
When looking at a screen, the user's eyes are sensitive to higher resolution in the area of the screen surrounding the gaze point. The gaze point is aligned with the fovea, the area of the retina that provides the highest visual acuity and has the highest concentration of cones. In the area of the retina beyond the fovea, sensitivity to resolution drops off, and visual acuity decreases with distance from the fovea. In the parafovea (the ring-shaped region of the retina surrounding the fovea), the eye is still sensitive to resolution, but less so than in the fovea. In the region outside the parafovea, the eye is significantly less sensitive to differences in resolution.
When an image displayed on a screen has a higher resolution, it is common to render the entire image at that higher resolution, encode the entire image at that higher resolution, and transmit the encoded image using a correspondingly large amount of bandwidth. But because the eye is relatively insensitive to image resolution in the region outside the fovea and the parafovea, rendering, encoding, and transmitting the entire image at the higher resolution results in unnecessary use of computing power and bandwidth.
Instead of rendering and encoding the entire image at a higher resolution, the display system divides the screen of the display device into a plurality of regions and applies a different set of rendering/encoding parameters to each region. For example, the display system identifies a first region of the screen seen by the fovea of the user's eye (hereinafter the foveal region) and applies a first set of parameters to that region. The first set of parameters is selected to produce relatively high image quality; for example, it may specify a relatively high frame rate and resolution. Similarly, the display system may identify a second region of the screen seen by the parafovea of the user's eye (hereinafter the parafoveal region) and apply a second set of parameters to that region. The display system may further identify a third region of the screen seen by the area outside the fovea and parafovea of the user's eye (hereinafter the outer region) and apply a third set of parameters to that region. The second set of parameters is selected to produce intermediate image quality (e.g., intermediate frame rate and resolution), and the third set of parameters is selected to produce lower image quality (e.g., lower frame rate and resolution). Thus, the second and third regions may be rendered, encoded, and transmitted with less computing power and less bandwidth, which reduces the total amount of computing power and bandwidth used to render, encode, and transmit the image.
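For concreteness, the three parameter sets might be represented as a simple lookup keyed by region, as in the hypothetical sketch below; the specific resolution scales and frame rates are illustrative values, not values specified by this disclosure.

```python
# Hypothetical per-region rendering/encoding parameter sets; the values are illustrative only.
REGION_PARAMS = {
    "foveal":     {"resolution_scale": 1.0,  "frame_rate": 90},   # relatively high quality
    "parafoveal": {"resolution_scale": 0.5,  "frame_rate": 45},   # intermediate quality
    "outer":      {"resolution_scale": 0.25, "frame_rate": 30},   # lower quality
}

def params_for_region(region_name: str) -> dict:
    """Return the rendering/encoding parameter set to apply to the given screen region."""
    return REGION_PARAMS[region_name]
```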
The display system identifies the first region and the second region based on eye tracking information received from the eye tracking unit. The display system uses the eye tracking information to determine the point on the screen at which the user is looking (hereinafter the gaze point). The display system may then determine the boundaries of the foveal region, the parafoveal region, and the outer region based on the gaze point. In one embodiment, the foveal region is a circle centered at the gaze point with a radius of 2.5 degrees of visual angle, and the parafoveal region is an annulus centered at the gaze point with an inner radius of 2.5 degrees of visual angle and an outer radius of 5 degrees of visual angle. The outer region is the portion of the screen beyond the outer radius of the parafoveal region.
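The sketch below shows one way the 2.5-degree and 5-degree visual-angle radii could be turned into on-screen distances and used to classify a point against the gaze point. The eye-to-screen distance and pixel density are assumed example values, not parameters from this disclosure.

```python
import math

def visual_angle_to_pixels(angle_deg, eye_to_screen_mm, pixels_per_mm):
    # Radius on the screen subtending the given visual angle, in pixels.
    return eye_to_screen_mm * math.tan(math.radians(angle_deg)) * pixels_per_mm

def classify_pixel(px, py, gaze_x, gaze_y, eye_to_screen_mm=60.0, pixels_per_mm=15.0):
    """Return 'foveal', 'parafoveal', or 'outer' for a screen pixel, using the 2.5-/5-degree radii."""
    r_fovea = visual_angle_to_pixels(2.5, eye_to_screen_mm, pixels_per_mm)
    r_parafovea = visual_angle_to_pixels(5.0, eye_to_screen_mm, pixels_per_mm)
    d = math.hypot(px - gaze_x, py - gaze_y)
    if d <= r_fovea:
        return "foveal"
    if d <= r_parafovea:
        return "parafoveal"
    return "outer"
```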
After determining the boundaries of the screen regions, the display system may adaptively render and encode the image. Further, the display system may update the eye tracking information at regular intervals as the user's eyes shift between different positions on the screen, and it may recalculate the gaze point and the boundaries of the screen regions based on the updated eye tracking information. Thus, the area of the screen around the gaze point (i.e., the foveal region), where the eye is most sensitive to image quality, appears to the user with higher image quality. In the parafoveal region, intermediate parameters may be applied without significant perceived image degradation, and in the outer region, lower parameters may be applied without significant perceived degradation in image quality.
Even when the user's eyes move rapidly across the screen, the eye tracking information can keep up with the change in gaze location, and the updated eye tracking information can be relayed quickly enough for the adaptive parameters to be applied to the current foveal and parafoveal regions of the image. Thus, the image appears to have high resolution wherever the user looks, even though the display system renders and encodes only a portion of the image with the higher-quality parameters and renders and encodes the rest with intermediate- or lower-quality parameters.
The net effect of applying adaptive parameters based on eye tracking information is to reduce the overall computing power used to render and encode the image data and to reduce the amount of bandwidth used to transmit the image data to the user's display device.
Overview of the System
Fig. 1A is a block diagram of a system 100 in which a display system 110 operates. The system 100 may operate in a VR system environment, an AR system environment, an MR system environment, or some combination thereof. The system 100 illustrated by fig. 1A includes a display device 105 (e.g., a head mounted display), an imaging device 135, and an input interface 140, each coupled to a display system 110.
Although fig. 1A shows an exemplary system 100 including one display device 105, one imaging device 135, and one input interface 140, in other embodiments, any number of these components may be included in the system 100. For example, there may be a plurality of display devices 105 each having an associated input interface 140 and monitored by one or more imaging devices 135, where each display device 105, input interface 140, and imaging device 135 is in communication with the display system 110. In alternative configurations, different and/or additional components may be included in the system 100. Similarly, the functionality of one or more components may be distributed among the components in different ways than described herein. For example, some or all of the functionality of display system 110 may be contained within display device 105.
The display device 105 is a Head Mounted Display (HMD) that presents content including virtual and/or augmented views of a physical, real-world environment to a user using computer-generated elements (e.g., two- or three-dimensional images, two- or three-dimensional video, sound, etc.). In some implementations, audio is presented via an external device (e.g., a speaker and/or a headset) that receives audio information from the display device 105, the display system 110, or both, and presents audio data based on the audio information. Some embodiments of the display device 105 are further described below in conjunction with fig. 2. The display device 105 may include one or more rigid bodies, which may be rigidly or non-rigidly coupled to each other. A rigid coupling between rigid bodies allows the coupled rigid bodies to act as a single rigid entity. In contrast, a non-rigid coupling between rigid bodies allows the rigid bodies to move relative to each other. The display device 105 includes an electronic display 115, an optical block 118, one or more locators 120, one or more position sensors 125, an Inertial Measurement Unit (IMU) 130, and an eye tracking unit 160. Some embodiments of the display device 105 have components that differ from those described herein. Similarly, in various embodiments, the functionality of the various components may be distributed among other components of the system 100 in a manner different from that described herein. For example, some functions of the eye tracking unit 160 may be performed by the display system 110.
An electronic display 115 (also referred to herein as a screen) displays images to a user based on data received from the display system 110. In various embodiments, electronic display 115 may include a single electronic display or multiple electronic displays (e.g., a display for each eye of a user). Examples of electronic display 115 include: a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, an active matrix organic light emitting diode display (AMOLED), some other display, or some combination thereof.
The optical block 118 magnifies image light received from the electronic display 115, corrects optical errors associated with the image light, and presents the corrected image light to a user of the display device 105. In various embodiments, the optical block 118 includes one or more optical elements. Exemplary optical elements include: an aperture, a Fresnel lens, a convex lens, a concave lens, a filter, or any other suitable optical element that affects the image light emitted from the electronic display 115. Furthermore, the optical block 118 may include combinations of different optical elements. In some embodiments, one or more of the optical elements in the optical block 118 may have one or more coatings, such as partially reflective or anti-reflective coatings.
Magnifying the image light through the optical block 118 allows the electronic display 115 to be physically smaller, weigh less, and consume less power than larger displays. Additionally, magnification may increase the field of view of the displayed content. For example, the field of view of the displayed content is such that the displayed content is presented using almost all (e.g., 110° diagonal), and in some cases all, of the user's field of view. In some embodiments, the optical block 118 is designed such that its effective focal length is larger than the spacing to the electronic display 115, which magnifies the image light projected by the electronic display 115. Additionally, in some embodiments, the amount of magnification may be adjusted by adding or removing optical elements from the optical block 118.
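As an illustrative aside not stated in this disclosure, the thin-lens model makes the magnification explicit: when the display sits a distance d inside the focal length f of the optics, the virtual image is laterally magnified by

```latex
M = \frac{f}{f - d}, \qquad M > 1 \ \text{for } 0 < d < f,
```

so an effective focal length larger than the display spacing yields magnified image light, consistent with the design described above.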
The optical block 118 may be designed to correct one or more types of optical error. Examples of optical errors include: two-dimensional optical errors, three-dimensional optical errors, or some combination thereof. Two-dimensional errors are optical aberrations that occur in two dimensions. Example types of two-dimensional errors include: barrel distortion, pincushion distortion, longitudinal chromatic aberration, lateral chromatic aberration, or any other type of two-dimensional optical error. Three-dimensional errors are optical errors that occur in three dimensions. Example types of three-dimensional errors include spherical aberration, comatic aberration, field curvature, astigmatism, or any other type of three-dimensional optical error. In some implementations, the content provided to the electronic display 115 for display is pre-distorted, and the optical block 118 corrects the distortion when it receives image light, generated based on the content, from the electronic display 115.
The locators 120 are objects that are positioned at particular locations on the display device 105 relative to each other and relative to particular reference points on the display device 105. The locator 120 may be a Light Emitting Diode (LED), a corner cube reflector, a reflective marker, a type of light source that contrasts with the environment in which the display device 105 operates, or some combination thereof. In embodiments where the locator 120 is active (i.e., an LED or other type of light emitting device), the locator 120 may emit light in the visible band (380 nm to 750 nm), the Infrared (IR) band (750 nm to 1700 nm), the ultraviolet band (10 nm to 380 nm), some other portion of the electromagnetic spectrum, or some combination thereof.
In some embodiments, the locators 120 are located beneath an outer surface of the display device 105 that is transparent to the wavelengths of light emitted or reflected by the locators 120 or is thin enough not to substantially attenuate the wavelengths of light emitted or reflected by the locators 120. Additionally, in some embodiments, the outer surface or other portions of the display device 105 are opaque in the visible band of light wavelengths. Thus, the locators 120 may emit light in the IR band beneath an outer surface that is transparent in the IR band but opaque in the visible band.
The IMU 130 is an electronic device that generates fast calibration data based on measurement signals received from one or more position sensors 125. The position sensor 125 generates one or more measurement signals in response to movement of the display device 105. Examples of the position sensor 125 include: one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor that detects motion, one type of sensor for error correction of the IMU 130, or some combination thereof. The position sensor 125 may be located external to the IMU 130, internal to the IMU 130, or some combination thereof.
Based on the one or more measurement signals from the one or more position sensors 125, the IMU 130 generates fast calibration data indicating an estimated position of the display device 105 relative to an initial position of the display device 105. For example, the position sensors 125 include multiple accelerometers to measure translational motion (front/back, up/down, left/right) and multiple gyroscopes to measure rotational motion (e.g., pitch, yaw, roll). In some implementations, the IMU 130 rapidly samples the measurement signals and calculates the estimated position of the display device 105 from the sampled data. For example, the IMU 130 integrates the measurement signals received from the accelerometers over time to estimate a velocity vector and integrates the velocity vector over time to determine an estimated position of a reference point on the display device 105. Alternatively, the IMU 130 provides the sampled measurement signals to the display system 110, which determines the fast calibration data. The reference point is a point that may be used to describe the position of the display device 105. While the reference point may generally be defined as a point in space, in practice the reference point is defined as a point within the display device 105 (e.g., the center of the IMU 130).
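A minimal sketch of the double integration described above, assuming discrete accelerometer samples at a fixed time step; a real IMU pipeline would also use gyroscope data and drift correction, which are omitted here.

```python
import numpy as np

def integrate_accelerometer(accel_samples, dt, v0=None, p0=None):
    """Estimate a velocity vector and a reference-point position by double integration."""
    v = np.zeros(3) if v0 is None else np.asarray(v0, dtype=float)
    p = np.zeros(3) if p0 is None else np.asarray(p0, dtype=float)
    for a in accel_samples:                        # each sample is a 3-axis acceleration, e.g. in m/s^2
        v = v + np.asarray(a, dtype=float) * dt    # integrate acceleration -> velocity
        p = p + v * dt                             # integrate velocity -> estimated position
    return v, p
```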
The IMU 130 receives one or more calibration parameters from the display system 110. As discussed further below, one or more calibration parameters are used to keep track of the display device 105. Based on the received calibration parameters, the IMU 130 may adjust one or more IMU parameters (e.g., sample rates). In some implementations, certain calibration parameters cause the IMU 130 to update the initial position of the reference point so that it corresponds to the next calibration position of the reference point. Updating the initial position of the reference point to the next calibrated position of the reference point helps to reduce the accumulated error associated with the determined estimated position. The accumulated error, also referred to as drift error, causes the estimated position of the reference point to "drift" away from the actual position of the reference point over time.
The eye tracking unit 160 tracks movement of the user's eye. In general, tracked eye movement may include angular rotation of the eye as well as translation of the eye, a change in the torsion of the eye, or a change in the shape of the eye. Angular rotation of the eye is a change in the angular orientation of the eye. The angular orientation of the eye corresponds to the direction of the user's gaze within the display device 105 and is defined herein as the direction of the foveal axis, which is the axis between the fovea of the eye (a depression on the retina of the eye) and the center of the eye's pupil. In general, when the user's eye is fixed on a point, the foveal axis of the user's eye intersects that point. The eye also has a pupil axis, which is the axis perpendicular to the corneal surface that passes through the center of the pupil. In general, the pupil axis is not directly aligned with the foveal axis. The pupil axis and the foveal axis intersect at the center of the pupil, but the foveal axis is offset approximately -1° to 8° laterally and ±4° vertically from the pupil axis. Because the foveal axis is defined relative to the fovea, which is located at the back of the eye, detecting the foveal axis may be difficult or infeasible with some methods of eye tracking. Accordingly, in some embodiments, the eye tracking unit 160 detects the orientation of the pupil axis and estimates the foveal axis based on the detected pupil axis. Alternatively, the eye tracking unit 160 estimates the foveal axis by directly detecting the location of the fovea or of other features of the eye's retina.
Translation of the eye is a change in the position of the eye relative to the orbit. In some embodiments, the translation of the eye is not detected directly but is approximated based on a mapping from the detected angular orientation. Translation of the eye corresponding to a change in the eye's position relative to one or more components of the eye tracking unit 160 may also be detected; this may occur when the position of the display device 105 on the user's head shifts. The eye tracking unit 160 may also detect torsion of the eye, which is rotation of the eye about the pupil axis, and may use the detected torsion to estimate the orientation of the foveal axis from the detected pupil axis. The eye tracking unit 160 may also track a change in the shape of the eye, which may be approximated as a skew, a scaled linear transformation, or a distortion (e.g., due to torsional deformation). The eye tracking unit 160 may estimate the foveal axis based on a combination of the angular orientation of the pupil axis, the translation of the eye, the torsion of the eye, and the current shape of the eye.
The eye tracking unit 160 uses the tracked eye movement to determine eye tracking information. Eye tracking information describes a position and/or an orientation of the user's eye. From the determined eye tracking information, the eye tracking unit 160 may further estimate eye tracking values such as gaze direction (also referred to as the direction of the foveal axis, the axis between the center of the fovea and the center of the eye's pupil), gaze position (also referred to as the fixation point, where the user is looking), gaze time (how long the user looks in a particular direction), vergence angle (the angle between the gaze directions of the two eyes when the user changes viewing distance and gaze direction), the user's interpupillary distance (IPD, defined as the distance between the centers of the pupils of the two eyes), an identification of the user, a torsional state of the eye, a shape of the eye, some other function based on the position of one or both eyes, or some combination thereof. For example, the eye tracking unit 160 may determine the IPD by estimating the positions of the eyes while the user is focused at infinity or, for example, on another object far from the user. In another example, the eye tracking unit 160 determines the vergence angle by estimating changes in the user's viewing distance and gaze direction. The eye tracking unit 160 is also able to determine the torsional state of the eye by estimating the rotation of the eye about the pupil axis. In some embodiments, the eye tracking unit 160 is able to determine the foveal axis, the orientation of the foveal axis relative to the pupil axis, and a change in the shape of the eye.
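The sketch below illustrates, under simplified assumptions (3D pupil-center positions and unit gaze-direction vectors are already available from the tracker), how the IPD and vergence angle mentioned above could be derived; it is not the estimation procedure of this disclosure, and all names are hypothetical.

```python
import numpy as np

def interpupillary_distance(left_pupil_center, right_pupil_center):
    # IPD: distance between the centers of the two pupils (ideally measured while fixating far away).
    return float(np.linalg.norm(np.asarray(right_pupil_center) - np.asarray(left_pupil_center)))

def vergence_angle_deg(left_gaze_dir, right_gaze_dir):
    # Angle between the two gaze-direction vectors, in degrees.
    l = np.asarray(left_gaze_dir, dtype=float); l /= np.linalg.norm(l)
    r = np.asarray(right_gaze_dir, dtype=float); r /= np.linalg.norm(r)
    return float(np.degrees(np.arccos(np.clip(np.dot(l, r), -1.0, 1.0))))
```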
The eye tracking unit 160 may include one or more illumination sources, one or more imaging devices, and an eye tracking controller. The illumination source (also referred to as an illuminator) illuminates a portion of the user's eye with light. The predetermined illumination power is kept below a threshold that would cause injury to the eye. The illumination source may be an infrared light source. Examples of infrared light sources include: a laser (e.g., a tunable laser, a continuous-wave laser, a pulsed laser, or another suitable laser that emits infrared light), a light emitting diode (LED), a fiber light source, any other suitable light source that emits infrared and/or visible light, or some combination thereof. In various embodiments, the illumination source may emit visible or near-infrared light. In some embodiments, the light emitted from the one or more illumination sources is a structured light pattern. In some embodiments, the portion of the eye illuminated by the illumination source is selected to be easy to detect, because of the significant variation between the signal from the illuminated portion and the surrounding signals during eye movement. For example, the illuminated portion may have maximum contrast (e.g., a position with the strongest retroreflection or backscatter from the edge of the user's sclera or the corneal surface). The illuminated portion may be located, for example, on the edge of the sclera, the surface of the cornea, or the limbus (e.g., the junction of the cornea and the sclera, the junction of the iris and the pupil, or any other suitable junction in the eye).
The imaging device detects light reflected and/or scattered from the illuminated portion of the eye and outputs a detection signal proportional to the detected light. The detection signal corresponds to the reflectivity of the illuminated portion of the eye, which correlates with an apparent change in contrast (e.g., a change in contrast of a corneal reflection) of the illuminated portion of the user's eye. In an embodiment, the imaging device includes a camera configured to capture images of the illuminated portion of the eye. In some embodiments, the detector may be based on single-point detection (e.g., a photodiode, balanced/matched photodiodes, an avalanche photodiode, or a photomultiplier tube) or on a one- or two-dimensional detector array (e.g., a camera, a linear photodiode array, a CCD array, or a CMOS array). In some embodiments, the eye tracking unit 160 may include multiple detectors to capture light reflected from one or more illuminated portions of the eye.
The eye tracking unit 160 determines eye tracking information based on light captured from one or more imaging devices (e.g., captured images). In some implementations, the eye tracking unit 160 may compare the captured light information (e.g., reflection of the eye, distortion of the structured light pattern projected onto the eye) to a predetermined look-up table or a predetermined eye model to estimate eye tracking information of the eye. The predetermined look-up table or the predetermined eye model describes a relationship between the captured light information and the eye tracking information. For example, in some embodiments, the eye tracking unit 160 identifies a reflected position of light from one or more illumination sources in a captured image of the user's eye and determines eye tracking information based on a comparison between the shape and/or position of the identified reflection and a predetermined look-up table (or predetermined eye model). Alternatively, if the eye is illuminated with a structured light pattern, the eye tracking unit 160 may detect distortions of the structured light pattern projected onto the eye and estimate eye tracking information based on a comparison between the detected distortions and a predetermined look-up table (or a predetermined eye model). The eye tracking unit 160 may use the eye tracking information to further estimate other eye tracking values, such as pupil axis, gaze angle (e.g., corresponding to a foveal axis), translation of the eye, torsion of the eye, and current shape of the eye. In alternative embodiments, instead of using light reflected from one or both eyes to determine eye tracking information, the eye tracking unit 160 may use some other method of determining the eye position, such as ultrasound or radar.
In some implementations, the eye tracking unit 160 stores a model of the user's eye and uses the model, in conjunction with one or more scans of the eye, to estimate the current orientation of the eye. The model may be a 3D model of the surface of the eye or a 3D volume of a portion of the eye. The model also includes the boundaries of different portions of the retina of the eye, including, for example, the fovea, the parafovea, and the perifovea. The boundaries of these portions of the eye may be determined, for example, by the calibration sequence described below. In embodiments in which both of the user's eyes are scanned, the display system 110 or the display device 105 may store a separate model for each eye.
Prior to determining the screen area, the eye tracking unit 160 may perform a calibration sequence to generate or train a model of the eye. In one embodiment, the eye tracking unit 160 repeatedly scans the eye with one or more transceivers during a calibration sequence. For example, the user is instructed to look at a virtual object or visual indicator displayed on the electronic display 115 of the display device 105. Scanning a portion of the eye while the user looks at the visual indicator allows the eye tracking unit 160 to capture a sampled scan of the eye at a known orientation of the eye. These sampling scans can be combined into a model. After the eye tracking unit 160 generates the model, the eye tracking unit 160 may then track the user's eyes. In some implementations, the eye tracking unit 160 updates the model during eye tracking.
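This disclosure describes building a model of the eye from scans taken while the user fixates known targets. As a much simpler illustrative stand-in (not the model described above), the sketch below fits a linear least-squares mapping from a tracked 2D pupil position to the known on-screen target positions and then uses it to predict a gaze point; all inputs and names are hypothetical.

```python
import numpy as np

def fit_gaze_mapping(pupil_positions, target_positions):
    """Fit a linear map from pupil position (camera coordinates) to gaze point (screen coordinates)."""
    P = np.hstack([np.asarray(pupil_positions, dtype=float),
                   np.ones((len(pupil_positions), 1))])       # homogeneous [x, y, 1]
    T = np.asarray(target_positions, dtype=float)             # known calibration targets on the screen
    W, *_ = np.linalg.lstsq(P, T, rcond=None)                 # least-squares fit
    return W

def estimate_gaze_point(W, pupil_position):
    p = np.append(np.asarray(pupil_position, dtype=float), 1.0)
    return p @ W                                              # predicted (x, y) on the screen
```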
The imaging device 135 generates slow calibration data based on calibration parameters received from the display system 110. The slow calibration data includes one or more images showing observed positions of the locators 120 that are detectable by the imaging device 135. The imaging device 135 may include one or more cameras, one or more video cameras, any other device capable of capturing images that include one or more of the locators 120, or some combination thereof. Further, the imaging device 135 may include one or more filters (e.g., to increase the signal-to-noise ratio). The imaging device 135 is configured to detect light emitted or reflected from the locators 120 in the field of view of the imaging device 135. In embodiments in which the locators 120 include passive elements (e.g., retroreflectors), the imaging device 135 may include a light source that illuminates some or all of the locators 120, which retroreflect the light toward the light source in the imaging device 135. Slow calibration data is communicated from the imaging device 135 to the display system 110, and the imaging device 135 receives one or more calibration parameters from the display system 110 to adjust one or more imaging parameters (e.g., focal length, focus, frame rate, ISO, sensor temperature, shutter speed, aperture, etc.).
The input interface 140 is a device that allows a user to send an action request to the display system 110. An action request is a request to perform a particular action. For example, the action request may be to start or end an application or to perform a particular action within an application. Input interface 140 may include one or more input devices. An exemplary input device includes: a keyboard, a mouse, a game controller, or any other suitable device for receiving action requests and communicating the received action requests to display system 110. The action request received by the input interface 140 is transmitted to the display system 110, which performs an action corresponding to the action request. In some implementations, the input interface 140 provides haptic feedback to the user in accordance with instructions received from the display system 110. For example, the haptic feedback is provided when a request for an action is received, or when the input interface 140 receives an instruction from the display system 110 to cause the input interface 140 to generate haptic feedback when the display system 110 performs an action.
The display system 110 provides content to the display device 105 for presentation to a user based on information received from one or more of the imaging device 135, the display device 105, and the input interface 140. In the example shown in fig. 1A, display system 110 includes application memory 145, tracking module 150, engine 155, and image processing engine 165. Some embodiments of the display system 110 have different modules than those described in connection with fig. 1A. Similarly, the functionality described further below may be distributed among the modules of display system 110 in a different manner than that described herein.
The application memory 145 stores one or more application programs executed by the display system 110. An application is a set of instructions that when executed by a processor generates content for presentation to a user. The content generated by the application may be responsive to input received from a user via movement of the display device 105 or the input interface 140. Examples of application programs include: a gaming application, a conferencing application, a video playback application, or other suitable application.
The tracking module 150 calibrates the system 100 using one or more calibration parameters and may adjust the calibration parameters to reduce error in determining the position of the display device 105 or of the input interface 140. For example, the tracking module 150 adjusts the focus of the imaging device 135 to obtain more accurate positions for the locators observed on the display device 105. The calibration performed by the tracking module 150 also accounts for information received from the IMU 130. Additionally, if tracking of the display device 105 is lost (e.g., the imaging device 135 loses line of sight to at least a threshold number of the locators 120 on the display device 105), the tracking module 150 recalibrates some or all of the system 100.
The tracking module 150 tracks movement of the display device 105 using slow calibration information from the imaging device 135. For example, the tracking module 150 uses the observed locator to determine the location of the reference point of the display device 105 based on the slow calibration information and the model of the display device 105. The tracking module 150 also uses the location information from the fast calibration information to determine the location of the reference point of the display device 105. Additionally, in some implementations, the tracking module 150 may use portions of the fast calibration information, the slow calibration information, or some combination thereof to predict the future position of the display device 105. The tracking module 150 provides the estimated or predicted future position of the display device 105 to the engine 155.
The engine 155 executes an application within the system 100 and receives position information, acceleration information, velocity information, predicted future positions, or some combination thereof, of the display device 105 from the tracking module 150. Based on the received information, engine 155 determines content to provide to display device 105 for presentation to a user. For example, if the received information indicates that the user is looking to the left, engine 155 generates content for display device 105 reflecting the user's movements in the virtual environment. In addition, engine 155 performs actions within applications executing on display system 110 in response to action requests received from input interface 140 and provides feedback to the user that the actions were performed. The feedback provided may be visual feedback or audible feedback via the display device 105 or tactile feedback via the input interface 140.
The image processing engine 165 receives eye tracking information (e.g., from the eye tracking unit 160), divides the screen into two or more screen regions representing portions of the screen seen by different portions of the user's retina, and processes the image for display on the screen by applying different sets of parameters to portions of the image to be displayed in each screen region. As referred to herein, processing an image for display on a screen may include rendering the image (e.g., rendering a scene in a game), encoding the image, or some combination of rendering and encoding the image. The components of the image processing engine 165 are described in detail below with reference to FIG. 1B.
In the embodiment shown in fig. 1A, the image processing engine 165 is implemented in a system 100 that may operate in a VR system environment, an AR system environment, an MR system environment, or some combination thereof. However, the image processing engine 165 as described herein may also be implemented in other types of systems that include eye tracking components. For example, in another embodiment, the engine 165 is implemented on a portable electronic device (e.g., a smartphone or tablet computer). In this embodiment, the engine 165 receives eye tracking information (e.g., an image captured by a front-facing camera of the portable electronic device) and applies adaptive parameters to images displayed on the screen of the portable electronic device. In yet another embodiment, the engine 165 is implemented on a laptop or desktop computer having components capable of performing eye tracking (e.g., a user-facing camera, such as a camera built into the bezel of a laptop display or mounted on top of a monitor), and applies adaptive parameters to images displayed on the screen of the computer.
Fig. 1B is a block diagram of an image processing engine 165 according to an embodiment. The image processing engine 165 shown in fig. 1B includes a screen region module 162, a rendering module 164, and an encoding module 166. In alternative configurations, different and/or additional components may be included in the image processing engine 165. Similarly, the functionality of one or more components may be distributed among the components in a different manner than described herein.
The screen region module 162 receives eye tracking information (e.g., from the eye tracking unit 160) and determines screen regions corresponding to different portions of the user's retina, such as the fovea, the parafovea, and the portion of the retina beyond the parafovea. As referred to herein, a screen region is a portion of the screen that is seen by a particular portion of the user's retina. For example, the screen region module 162 may determine a first screen region (the foveal region) corresponding to the portion of the screen seen by the fovea, a second screen region (the parafoveal region) corresponding to the portion of the screen seen by the parafovea, and a third screen region (the outer region) corresponding to the portion of the screen seen by the portion of the retina outside the parafovea.
The screen region module 162 receives eye tracking information from an eye tracking component, such as the eye tracking unit 160. In one embodiment, the eye tracking information includes the angular orientation of the user's eyes as determined by the eye tracking unit 160. In another embodiment, the eye tracking information includes additional or different information, such as one or more images of the user's eyes. The manner in which the screen region module 162 determines the screen regions based on the eye tracking information is described in detail below with reference to figs. 3A through 3C.
The rendering module 164 renders images based on information generated by an application executing on the display system 110. The application may be stored in the application memory 145 or received from another system via the input interface 140. For example, the display system 110 executes a game application that generates a scene including one or more virtual objects (such as player characters, non-player characters, environmental objects, and backgrounds) at various locations within the scene. The module 164 renders an image by dividing the image into two or more image regions and applying a corresponding set of rendering parameters to each region. In the game example above, the rendering module 164 renders an image of the scene by rendering the virtual objects in each image region with the corresponding set of rendering parameters. As referred to herein, an image may be a still picture or a frame in a sequence of frames (e.g., a frame of a video or of a game application).
The encoding module 166 receives an image (e.g., from the rendering module 164 or the input interface 140) and encodes the image for transmission. The module 166 encodes the image by dividing the image into two or more image regions and applying a corresponding set of encoding parameters to each region. In one embodiment, the encoding module 166 encodes the image according to the process described with reference to figs. 6A and 6B. After the images are encoded, the display system 110 may transmit them to the display device 105 for display to the user.
As referred to herein, an image region is a portion of an image that is seen by a particular portion of the user's retina when the image is displayed on the screen of the display device 105. For example, the modules 164, 166 apply different sets of parameters to a first image region seen by the fovea, a second image region seen by the parafovea, and a third image region seen by the portion of the retina outside of the parafovea. In some implementations, the image rendered by the rendering module 164 and/or encoded by the encoding module 166 is displayed so that it covers the entire screen (i.e., the image is displayed in full-screen mode). In these embodiments, the image regions described with reference to the rendering module 164 and the encoding module 166 have the same shape and location within the image as the screen regions described with reference to the screen region module 162. In other embodiments, the image is displayed on a portion of the screen (e.g., within a window that does not occupy the entire screen). In these embodiments, the modules 164, 166 determine the boundaries of the image regions by cropping the screen regions to include only the portion of the screen on which the image is displayed.
Fig. 2 is a diagram of a Head Mounted Display (HMD) 200 according to an embodiment. HMD 200 is an embodiment of display device 105 and includes a front rigid body 205 and a band 210. The front rigid body 205 includes electronic display elements of the electronic display 115 (not shown in fig. 2), the optical block 118 (not shown in fig. 2), the IMU 130, one or more position sensors 125, the eye tracking unit 160, and the locators 120. In the embodiment shown in fig. 2, the position sensors 125 are located within the IMU 130, and neither the IMU 130 nor the position sensors 125 are visible to the user.
The locators 120 are located in fixed positions on the front rigid body 205 relative to each other and relative to the reference point 215. In the example of fig. 2, the reference point 215 is located at the center of the IMU 130. Each locator 120 emits light that is detectable by the imaging device 135. In the example of fig. 2, the locators 120, or a portion of the locators 120, are located on the front side 220A, top side 220B, bottom side 220C, right side 220D, and left side 220E of the front rigid body 205.
Adaptive parameters in image regions
Fig. 3A and 3B illustrate a user's eye 302 gazing at a screen 304 in a direction represented by a gaze vector 306, according to an embodiment. Referring first to fig. 3A, shown are a user's eye 302, gaze vector 306, and screen 304. Fig. 3A further illustrates the foveal region 308, the parafoveal region 310, and the outer region 312 on the screen generated by the screen region module 162 based on, for example, an image of an eye captured by the eye tracking unit 160.
The screen region module 162 determines a gaze vector representing the direction in which the eye is looking. In some implementations, the screen region module 162 determines the gaze vector based on a plurality of eye characteristics associated with the eye, including an eyeball center (A), a cornea center (C), a pupil (E), and a distance (h) between the cornea center and the pupil center. In one embodiment, the eye tracking unit 160 calculates estimates of these eye characteristics and sends the estimates to the screen region module 162 as part of the eye tracking information. In another embodiment, the screen region module 162 receives the angular orientation of the eye from the eye tracking unit 160 and generates these eye characteristics by applying a rotation to a model of the eye based on the angular orientation. In still other embodiments, the screen region module 162 receives the foveal axis of the eye from the eye tracking unit 160 and uses the direction of the foveal axis as the gaze vector 306. After determining the gaze vector 306, the module 162 determines the gaze point 314 by calculating the intersection between the gaze vector 306 and the screen 304. In other embodiments, the gaze point 314 is calculated by other means.
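For illustration only, the following minimal Python sketch (not part of the patent disclosure) computes a gaze point by intersecting a gaze ray, starting at the cornea center and pointing along the gaze vector, with the screen plane; the coordinate frame, plane parameterization, and function names are assumptions made for this example.

```python
import numpy as np

def gaze_point_on_screen(cornea_center, gaze_vector, screen_origin, screen_normal):
    """Intersect the gaze ray with the screen plane (illustrative sketch).

    cornea_center: 3D origin of the gaze ray (point C).
    gaze_vector:   3D gaze direction (need not be normalized).
    screen_origin: any 3D point lying on the screen plane.
    screen_normal: 3D normal of the screen plane.
    Returns the 3D gaze point, or None if the gaze is parallel to the screen.
    """
    d = np.asarray(gaze_vector, dtype=float)
    n = np.asarray(screen_normal, dtype=float)
    denom = np.dot(n, d)
    if abs(denom) < 1e-9:          # gaze (nearly) parallel to the screen plane
        return None
    t = np.dot(n, np.asarray(screen_origin, dtype=float)
               - np.asarray(cornea_center, dtype=float)) / denom
    if t < 0:                       # screen lies behind the eye
        return None
    return np.asarray(cornea_center, dtype=float) + t * d
```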
The area of the screen depicted as 308 is the foveal region, where the eye is most sensitive to differences in resolution. The region depicted as 310, immediately surrounding region 308, is the parafoveal region, where the eye is less sensitive to differences in resolution. The area outside of regions 308 and 310 is the outer region 312, where the eye is least sensitive to differences in resolution.
In the embodiment shown in fig. 3A, the screen region module 162 determines the foveal region 308 by drawing a circle of a predetermined radius centered on the gaze point. Similarly, the screen region module 162 determines the parafoveal region 310 by drawing a second circle of a second predetermined radius centered on the gaze point and subtracting the foveal region, producing an annulus (ring-shaped region). The screen region module 162 may additionally determine the outer region 312 as the region of the screen that is not covered by the foveal region 308 and the parafoveal region 310.
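One way to realize this circle-and-annulus construction is a per-point classification by distance from the gaze point. The sketch below is a hypothetical illustration; the radii are placeholders in pixels, not values taken from the patent.

```python
def classify_screen_point(point, gaze_point, foveal_radius, parafoveal_radius):
    """Return which screen region a point falls in: 'foveal', 'parafoveal', or 'outer'.

    The foveal region is a disc of radius foveal_radius around the gaze point;
    the parafoveal region is the annulus between foveal_radius and
    parafoveal_radius; everything else belongs to the outer region.
    """
    dx = point[0] - gaze_point[0]
    dy = point[1] - gaze_point[1]
    dist_sq = dx * dx + dy * dy
    if dist_sq <= foveal_radius ** 2:
        return "foveal"
    if dist_sq <= parafoveal_radius ** 2:
        return "parafoveal"
    return "outer"
```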
This process of determining the screen regions 308, 310, 312 with predetermined radii is particularly advantageous in implementations where the screen 304 is kept at a known distance from the user's eye 302 during operation (e.g., implementations in which the display device 105 is a head-mounted display, such as the HMD 200 shown in fig. 2), because, for example, each predetermined radius may be selected to correspond to a particular visual angle, and the visual angles of the foveal region 308 and the parafoveal region 310 may be selected to represent estimates of the anatomical boundaries of the user's fovea and parafovea. For example, as described above, the predetermined radii may be selected such that the foveal region 308 corresponds to a visual angle of 2.5 degrees in radius and the parafoveal region 310 corresponds to a visual angle of 5.0 degrees in radius. In some embodiments, the predetermined radii are selected based on a model of the user's eyes as described above with reference to the eye tracking unit 160.
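The conversion from a visual angle to an on-screen radius follows directly from the known eye-to-screen distance. The sketch below is illustrative; the eye relief and pixel pitch values are assumed placeholders, not values from the patent.

```python
import math

def visual_angle_to_pixel_radius(angle_deg, eye_to_screen_mm, pixels_per_mm):
    """Convert a visual angle (degrees) to an on-screen radius in pixels.

    Assumes a fixed, known eye-to-screen distance (as in an HMD) and a uniform
    pixel pitch; both numbers used below are illustrative placeholders.
    """
    radius_mm = eye_to_screen_mm * math.tan(math.radians(angle_deg))
    return radius_mm * pixels_per_mm

# Example: 2.5-degree fovea and 5.0-degree parafovea at a hypothetical
# 50 mm eye relief and 15 pixels per millimeter.
foveal_radius_px = visual_angle_to_pixel_radius(2.5, 50.0, 15.0)
parafoveal_radius_px = visual_angle_to_pixel_radius(5.0, 50.0, 15.0)
```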
Fig. 3B shows the eye of fig. 3A and further depicts a foveal cone 316 and a parafoveal cone 318, according to an embodiment. In the embodiment shown in fig. 3B, the foveal region and the parafoveal region are defined by 3D cones. The conical shape depicted as 316 is the foveal cone, whereas the conical shape depicted as 318 is the parafoveal cone.
In the embodiment shown in fig. 3B, the screen region module 162 determines the screen regions 308, 310 based on the cones 316, 318. More specifically, the screen region module 162 determines the gaze vector based on the eye characteristics in the manner described above with reference to fig. 3A. After determining the gaze vector 306, the screen region module 162 defines the foveal cone 316 as a cone having an axis that matches the gaze vector 306, an apex at the center of the cornea (C), and a cone surface that is offset from the axis by a predetermined angle (e.g., 2.5 degrees). Similarly, the screen region module 162 defines the parafoveal cone 318 as a cone having an axis matching the gaze vector 306, an apex at the center of the cornea (C), and a cone surface offset from the axis by a different predetermined angle (e.g., 5 degrees). Similar to the predetermined radii described above with reference to fig. 3A, the predetermined angles defining the cones 316, 318 may be selected to represent estimates of the anatomical boundaries of the fovea and parafovea of a typical human user and may be based on a model of the eye. After defining the cones 316, 318, the screen region module 162 determines the foveal region 308 and the parafoveal region 310 by calculating the intersections between the corresponding cones 316, 318 and the screen 304. The screen region module 162 may further determine the outer region 312 as the region of the screen that is not covered by the foveal region 308 and the parafoveal region 310.
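The cone construction can equivalently be evaluated per screen point by comparing the angle between the gaze vector and the ray from the cornea center to that point against the cone half-angles. This is a minimal sketch under assumed 3D coordinates, not code from the patent; the default half-angles mirror the example angles above.

```python
import numpy as np

def classify_by_cone(screen_point_3d, cornea_center, gaze_vector,
                     foveal_half_angle_deg=2.5, parafoveal_half_angle_deg=5.0):
    """Classify a 3D screen point using the foveal and parafoveal cones.

    A point lies inside a cone if the angle between the gaze vector (cone axis)
    and the ray from the cornea center (cone apex) to the point is smaller than
    the cone's half-angle.
    """
    to_point = (np.asarray(screen_point_3d, dtype=float)
                - np.asarray(cornea_center, dtype=float))
    axis = np.asarray(gaze_vector, dtype=float)
    cos_angle = np.dot(to_point, axis) / (np.linalg.norm(to_point) * np.linalg.norm(axis))
    angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    if angle_deg <= foveal_half_angle_deg:
        return "foveal"
    if angle_deg <= parafoveal_half_angle_deg:
        return "parafoveal"
    return "outer"
```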
Fig. 3C shows the foveal region 308, the parafoveal region 310, and the outer region 312 as they appear on the screen 304 according to an embodiment. Although the foveal region 308 and the parafoveal region 310 are shown as circles in figs. 3A through 3C, in other embodiments these regions may have different shapes. In one embodiment, the foveal and parafoveal regions are squares centered on the gaze point 314 whose sides have the same length as the diameters of the circular foveal region 308 and parafoveal region 310 shown in figs. 3A through 3C. In another embodiment, the foveal and parafoveal regions are rectangular. Using square or rectangular foveal and parafoveal regions is advantageous because, for example, square or rectangular regions are easier to define on a screen whose pixels are indexed by a rectangular coordinate system.
In some implementations, the screen region module 162 generates additional regions to provide smoother transitions between layers. In some implementations, the size of each region is adaptive and may be updated based on noise in the eye characteristics, the accuracy of the estimates of the eye characteristics, the direction of the gaze vector, or any combination thereof. In some embodiments, the size of each screen region is increased when a saccade (a rapid eye movement between fixation points) is detected, and decreased again once the gaze stabilizes on a new fixation point.
Fig. 4 is a block diagram illustrating a process of applying adaptive parameters in an image region based on eye tracking information according to an embodiment. The display system 110 receives 402 eye tracking information. For example, the system 110 receives an angular orientation of the user's eyes from the eye tracking unit 160 of the display device 105. The system 110 additionally or alternatively receives other types of eye tracking information, such as an image of the user's eye or one or more eye characteristics described with reference to fig. 3A.
After receiving 402 the eye tracking information, the system 110 determines 404 two or more screen regions based on the eye tracking information. For example, the system 110 performs one or both of the processes described with reference to figs. 3A and 3B to determine 404 a foveal region, a parafoveal region, and an outer region. In other implementations, the system 110 may determine 404 a different number of regions. For example, the system 110 may determine two regions (e.g., a foveal region and an outer region, but no parafoveal region). As another example, the system may determine foveal, parafoveal, and outer regions as described above, and additionally determine a perifoveal region corresponding to the perifovea of the user's eye (the portion of the retina surrounding the parafovea).
The system 110 renders and/or encodes the image by applying 406 different sets of parameters to different image regions within the image. In embodiments where the image is to be displayed on the entire screen, the image regions are coextensive with the screen regions. In embodiments where the image is to be displayed on a portion of the screen, the system 110 determines the boundaries of the image regions by cropping the screen regions to the portion of the screen on which the image is to be displayed. The display system 110 transmits 408 the image to the display device 105 for display on the screen.
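When the image occupies only a window on the screen, each screen region can be clipped to that window to obtain the corresponding image region. The sketch below is illustrative only and assumes the regions are approximated by axis-aligned rectangles given as (left, top, right, bottom) in screen pixel coordinates.

```python
def crop_region_to_window(region_rect, window_rect):
    """Clip a rectangular screen region to the window in which the image is shown.

    Rectangles are (left, top, right, bottom) in screen pixel coordinates.
    Returns the clipped rectangle, or None if the region lies entirely outside
    the window.
    """
    left = max(region_rect[0], window_rect[0])
    top = max(region_rect[1], window_rect[1])
    right = min(region_rect[2], window_rect[2])
    bottom = min(region_rect[3], window_rect[3])
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)
```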
Fig. 5A-5C illustrate various embodiments of a process for rendering and encoding an image by applying different sets of parameters to different image regions. For example, one or more of the processes shown in fig. 5A-5C may be performed as the rendering/encoding step 406 described with reference to fig. 4. In one embodiment, each of the processes shown in fig. 5A-5C is performed by an image processing engine 165.
Referring first to fig. 5A, shown is a block diagram illustrating a process of rendering different portions of an image with different sets of rendering parameters according to an embodiment. The display system 110 renders 502 a first image region based on a first set of rendering parameters. The display system 110 also renders a second image region based on a second set of rendering parameters. For example, the first image region may be a foveal region corresponding to the portion of the image seen by the fovea, and the second image region may be an outer region corresponding to the portion of the image seen by the part of the retina outside the fovea.
As referred to herein, a set of rendering parameters specifies one or more factors that affect the operation of a rendering process performed on the display system 110 to render an image. Examples of rendering parameters include image resolution, frame rate, antialiasing settings, and texture quality. Some or all of the rendering parameters may be adjusted to achieve higher image quality at the expense of more computing resources consumed by the rendering process, or to achieve lower image quality while allowing the rendering process to be performed with fewer computing resources. In one embodiment, the second set of rendering parameters produces a lower image quality than the first set of rendering parameters.
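As one illustration of how per-region parameter sets might be organized, the sketch below defines a hypothetical rendering-parameter structure; the fields and values are assumptions for this example, not parameters specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class RenderParams:
    resolution_scale: float   # fraction of the full render resolution
    msaa_samples: int         # antialiasing quality
    texture_quality: str      # e.g., "high", "medium", "low"

# Hypothetical per-region settings: quality decreases with distance from the fovea.
REGION_RENDER_PARAMS = {
    "foveal":     RenderParams(resolution_scale=1.0,  msaa_samples=4, texture_quality="high"),
    "parafoveal": RenderParams(resolution_scale=0.5,  msaa_samples=2, texture_quality="medium"),
    "outer":      RenderParams(resolution_scale=0.25, msaa_samples=1, texture_quality="low"),
}
```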
After the image is rendered, the display system 110 encodes 506 the rendered image. In the embodiment shown in fig. 5A, the entire image is encoded 506 based on a single set of encoding parameters. However, in other embodiments, such as those shown in figs. 5B and 5C, the display system 110 encodes different regions of the rendered image based on different sets of encoding parameters.
Although not shown in fig. 5A, the display system 110 may also render a third image region based on a third set of rendering parameters. In one embodiment, the third image region is a parafoveal region corresponding to the portion of the image seen by the parafovea (in which case the outer region may correspond to the portion of the image seen by the parts of the retina outside both the fovea and the parafovea). In this embodiment, the third set of rendering parameters yields an intermediate level of image quality (i.e., lower image quality than the first set of rendering parameters and higher image quality than the second set of rendering parameters).
More broadly, the display system 110 may render the foveal region, the parafoveal region, and the outer region (referred to as the first region, the third region, and the second region, respectively, in the description above) with different sets of rendering parameters that provide decreasing levels of image quality. Because the sensitivity of the eye to image quality decreases with distance from the fovea, many users have difficulty perceiving the intermediate image quality in the parafoveal region and the lower image quality in the outer region. However, using lower quality rendering parameters for the parafoveal region and the outer region allows the rendering process to be performed with fewer computing resources, which advantageously allows the rendering process to be completed in less time and/or to consume less power. Thus, this process of rendering different image regions with different sets of rendering parameters allows the display system 110 to perform a rendering process that strikes a balance between sufficiently high image quality and low use of computing resources.
In some implementations, the rendered image depicts a scene generated by an application (e.g., a game) and includes a plurality of objects at various locations within the scene. In one embodiment, a set of rendering parameters may be applied to an object in the scene based on whether a portion of the object appears inside the corresponding region. For example, if the user looks at an object and a portion of the object extends outside of the foveal region, the display system 110 may render the entire object with the first set of rendering parameters so that the user does not perceive a change in quality across different portions of the object.
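A simple way to approximate this per-object decision is to test an object's bounding box against the region boundaries and give the whole object the highest-quality parameters of any region it touches. The sketch below is a hypothetical illustration; region_params is a mapping such as the hypothetical REGION_RENDER_PARAMS sketched earlier, and the rectangle format is assumed.

```python
def params_for_object(object_bbox, foveal_rect, parafoveal_rect, region_params):
    """Pick one rendering-parameter set for a whole object.

    If any part of the object's bounding box overlaps the foveal region, the
    whole object gets the foveal (highest quality) parameters so that no quality
    seam is visible across the object; otherwise the parafoveal region is
    checked, and the outer parameters are the fallback. Rectangles are
    (left, top, right, bottom).
    """
    def overlaps(a, b):
        return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

    if overlaps(object_bbox, foveal_rect):
        return region_params["foveal"]
    if overlaps(object_bbox, parafoveal_rect):
        return region_params["parafoveal"]
    return region_params["outer"]
```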
In some implementations, one or more objects in the scene have a depth value that represents a distance between the object and the user. The depth value of each object in the scene may be determined by an application generating the scene. For example, a depth value of an object in a scene generated by a game application may represent a distance between a position of the game object in the scene and a position of a viewpoint of a player.
In these implementations, the display system 110 may select the set of rendering parameters to apply to an object based at least in part on the depth of the object. In one embodiment, if the user looks at an object located nearby (e.g., if the gaze point is positioned on an object having a depth value below a first threshold), the display system 110 applies a lower quality set of rendering parameters to objects located in the background (e.g., objects having depth values above a second threshold), even if those background objects are within the foveal or parafoveal region.
In another embodiment, two gaze vectors are determined separately for the user's two eyes, and the display system 110 determines the depth of focus of the user based on the convergence of the two gaze vectors. In this embodiment, the display system 110 determines the difference in depth (hereinafter, the depth difference) between the depth of focus and the depth values of objects in the scene, and the system 110 applies a lower quality set of rendering parameters to objects whose depth difference is above a threshold, even if those objects are in the foveal or parafoveal region.
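For illustration, a vergence point can be estimated as the midpoint of the closest points between the two gaze rays, and the depth-difference rule then demotes out-of-focus objects. This is a minimal sketch under assumed coordinates and helper names; it is not code from the patent.

```python
import numpy as np

def vergence_point(eye_left, dir_left, eye_right, dir_right):
    """Estimate the 3D point the user is converging on from two gaze rays.

    Returns the midpoint of the closest points between the two rays, or None
    when the rays are (nearly) parallel. The depth of focus can then be taken,
    for example, as the distance from the midpoint of the eye positions to
    this vergence point.
    """
    p1, d1 = np.asarray(eye_left, float), np.asarray(dir_left, float)
    p2, d2 = np.asarray(eye_right, float), np.asarray(dir_right, float)
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-9:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))

def adjust_for_depth(region_choice, object_depth, focus_depth, depth_threshold):
    """Demote an object to the lower-quality (outer) parameter set when its depth
    difference from the focus depth exceeds the threshold, even if the object
    falls in the foveal or parafoveal region."""
    if abs(object_depth - focus_depth) > depth_threshold:
        return "outer"
    return region_choice
```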
In some implementations, the display system 110 performs the rendering process on multiple images that together make up a sequence of frames. For example, the images may be frames of a video or frames of a game application. In these embodiments, the display system 110 may render different regions of the images at different frame rates. In one embodiment, the display system 110 renders the foveal region in every frame and renders the outer region in every other frame. In some implementations, the display system 110 may apply temporal antialiasing to compensate for regions that were not re-rendered for a given frame.
In another embodiment, the display system 110 determines the frequency with which image regions in a frame are rendered based on scene content. For example, if an image region contains objects that remain somewhat static between successive frames (e.g., clouds in the background), the image region may be rendered at a lower frequency than the image region containing moving objects.
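A simple scheduling rule combining both ideas, a fixed per-region interval plus immediate re-rendering when a region contains motion, could look like the hypothetical sketch below; the interval values are placeholders, not values from the patent.

```python
# Hypothetical per-region render intervals, in frames (1 = every frame).
RENDER_INTERVAL = {"foveal": 1, "parafoveal": 1, "outer": 2}

def should_render(region, frame_index, region_has_motion):
    """Decide whether to re-render a region for this frame.

    A region is re-rendered on its regular interval, or immediately if it
    contains moving content; mostly static regions (e.g., background clouds)
    wait for their next scheduled frame.
    """
    if region_has_motion:
        return True
    return frame_index % RENDER_INTERVAL[region] == 0
```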
Fig. 5B is a block diagram illustrating a process for encoding different portions of an image with different sets of encoding parameters, according to an embodiment. A rendered image is provided 512 at the display system 110. In various implementations, the rendered image may be rendered on the display system 110 (e.g., as part of a game application or some other application executing on the display system), stored in a storage medium of the display system 110 (e.g., as an image file or video file), or received at the display system 110 through the input interface 140 (e.g., as a video stream from a streaming service such as YOUTUBE).
After providing 512 the rendered image, the display system 110 encodes 514 the first image region based on the first set of encoding parameters. The display system 110 also encodes 516 a second image region based on a second set of encoding parameters. For example, the first image region may be a foveal region corresponding to the portion of the image seen by the fovea, and the second image region may be an outer region corresponding to the portion of the image seen by the part of the retina outside the fovea.
As referred to herein, a set of encoding parameters specifies one or more factors that affect the operation of an encoding process performed on the display system 110 to encode an image. Examples of encoding parameters include image resolution, frame rate, and compression factors such as the type of transform used for transform coding, the number and range of quantization levels, chroma subsampling, and color space reduction. Similar to the rendering parameters described above, some or all of the encoding parameters may be adjusted to achieve higher image quality at the expense of more computing resources consumed by the encoding process, or to achieve lower image quality while allowing the encoding process to be performed with fewer computing resources. In addition, a higher quality set of encoding parameters may produce encoded images with larger file sizes, while a lower quality set of encoding parameters may produce encoded images with smaller file sizes. In one embodiment, the second set of encoding parameters yields lower image quality and a smaller file size than the first set of encoding parameters.
Although not shown in fig. 5B, the display system 110 may also encode a third image region based on a third set of encoding parameters. In one embodiment, the third image region is a parafoveal region corresponding to the portion of the image seen by the parafovea (in which case the outer region may correspond to the portion of the image seen by the parts of the retina outside both the fovea and the parafovea). In this embodiment, the third set of encoding parameters yields an intermediate level of image quality (i.e., lower image quality than the first set of encoding parameters and higher image quality than the second set of encoding parameters).
In summary, the display system 110 performing this encoding process may apply higher quality encoding parameters to the foveal region, intermediate quality encoding parameters to the parafoveal region, and lower quality encoding parameters to the outer region. As described above, the sensitivity of the eye to image quality decreases with distance from the fovea; thus, the intermediate quality in the parafoveal region and the lower quality in the outer region are less likely to give the user a negative impression of the overall image quality. However, using intermediate quality encoding parameters in the parafoveal region and lower quality encoding parameters in the outer region allows the encoding process to be performed with fewer computing resources, and thus to be completed in less time and/or to consume less power. Furthermore, using intermediate quality encoding parameters in the parafoveal region and lower quality encoding parameters in the outer region produces an encoded image with a smaller file size, which advantageously reduces the amount of bandwidth used when transmitting the image to the display device 105.
Fig. 5C is a block diagram illustrating a process for rendering and encoding different portions of an image with different sets of parameters, according to an embodiment. The display system 110 renders 522 the first image region based on the first set of rendering parameters and renders 524 the second image region based on the second set of rendering parameters. After rendering the image, the display system 110 encodes 526 the first image region based on the first set of encoding parameters and encodes 528 the second image region based on the second set of encoding parameters.
In one embodiment, the image areas used for rendering steps 522, 524 are the same as the image areas used for encoding steps 526, 528. In other words, the first set of encoding parameters is applied to an image region rendered with the first set of rendering parameters and the second set of encoding parameters is applied to an image region rendered with the second set of rendering parameters. In other embodiments, the image areas used for rendering steps 522, 524 are different from the image areas used for encoding steps 526, 528. For example, the image areas for rendering steps 522, 524 are circular, but the image areas for encoding steps 526, 528 are square (as shown in the examples described below with reference to fig. 6A and 6B).
Fig. 6A illustrates an example of a process of encoding different regions of an image with different sets of encoding parameters and then packing the encoded image regions into a single frame, according to an embodiment. The display system 110 encodes an image 601 that is divided into a foveal region 602, a parafoveal region 604, and an outer region 606. The display system 110 encodes the foveal region 602 based on a first set of encoding parameters, encodes the parafoveal region 604 based on a second set of encoding parameters, and encodes the outer region 606 based on a third set of encoding parameters. In the example shown in fig. 6A, the three sets of encoding parameters specify different resolutions. Specifically, the first set of encoding parameters specifies a first, relatively high resolution, the second set of encoding parameters specifies a lower resolution (e.g., one-half of the first resolution), and the third set of encoding parameters specifies an even lower resolution (e.g., one-fourth of the first resolution).
This encoding process generates three encoded image regions: an encoded foveal region 612, an encoded parafoveal region 614, and an encoded outer region 616. The display system 110 performs a packing process to pack the encoded image regions into a single packed image 622 and transmits the packed image 622 to the display device 105. After receiving the packed image 622, the display device 105 unpacks and reconstructs the image 601 by upscaling and compositing each encoded image region accordingly so that it can be displayed on the screen. Packing the encoded image regions in this manner is advantageous because, for example, the resulting packed image 622 contains significantly fewer pixels than the original full-resolution image. For example, if the image 601 has a resolution of 1920×1080 pixels, the packed image 622 includes approximately 380,000 pixels, which significantly reduces the bandwidth and time needed to transmit each image compared to the approximately 2,000,000 pixels of the original 1920×1080 image.
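The pixel-budget effect of packing downscaled regions can be illustrated with a short sketch. The region sizes and downscale factors below are hypothetical placeholders chosen for this example; they are not the sizes used in the patent and will not reproduce its exact figure.

```python
def packed_pixel_count(regions):
    """Total pixels in a packed frame, given (width, height, downscale) per region.

    Each region is stored at (width // downscale) x (height // downscale).
    """
    total = 0
    for width, height, downscale in regions:
        total += (width // downscale) * (height // downscale)
    return total

# Hypothetical regions cropped from a 1920x1080 frame:
#   foveal     500x500   at full resolution
#   parafoveal 900x900   at half resolution
#   outer      1920x1080 at quarter resolution
example = [(500, 500, 1), (900, 900, 2), (1920, 1080, 4)]
print(packed_pixel_count(example))   # ~0.58 M pixels vs ~2.07 M for the full frame
```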
Fig. 6B shows an example of a sequence of encoded and packed images according to an embodiment. In the example shown in fig. 6B, the series of images constitutes a sequence of frames (e.g., in a video or game application), and each set of encoding parameters specifies a different frequency (i.e., a different frame rate). More specifically, the foveal region 652 is encoded and transmitted at a first frequency (e.g., 120 Hz); the parafoveal region 654 is encoded and transmitted at a second frequency (e.g., 80 Hz) that is lower than the first frequency; and the outer region 656 is encoded and transmitted at a third frequency (e.g., 40 Hz) that is lower than the first and second frequencies.
For each frame, the display system 110 encodes and packs the image regions into a packed frame, which is then transmitted to the display device 105. When a frame does not contain all of the regions (e.g., frame n+1 660B, frame n+2 660C, and frame n+4 660E), the display system 110 generates the next frame by using the most recently available version of each image region. In some embodiments, post-processing techniques are applied in order to correct for any misalignment.
Fig. 6B shows a sequence of five frames, frames n 660A through n+4 660E. Frame n 660A includes the foveal region 652A, the parafoveal region 654A, and the outer region 656A. Frame n+1 660B includes the foveal region 652B and the parafoveal region 654B, but no outer region, thereby reducing the amount of data to be transmitted to the display device 105. Frame n+2 660C includes the most recently encoded foveal region 652C, but no parafoveal region or outer region, thereby further reducing the amount of data. The series then repeats itself with frames n+3 660D and n+4 660E. In this way, the image region to which the user's eyes are most sensitive (i.e., the foveal region 652) is re-encoded and transmitted in every frame. The image region to which the user's eyes are less sensitive (i.e., the parafoveal region 654) is re-encoded and transmitted in two out of every three frames. And the image region to which the user's eyes are even less sensitive (i.e., the outer region 656) is re-encoded and transmitted once every three frames. This significantly reduces the amount of data transmitted and thus reduces the bandwidth and time required to transmit each frame.
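On the receiving side, the display can keep the freshest copy of each region and reuse it whenever a region is skipped in a packed frame. The sketch below is a hypothetical illustration of that bookkeeping; the class, method, and region names are assumptions for this example.

```python
class RegionReassembler:
    """Reassemble display frames when not every region arrives in every frame.

    Keeps the most recently received version of each region and reuses it for
    frames in which that region was skipped, mirroring the frame sequence
    described above. The transmission schedule and region names are illustrative.
    """
    def __init__(self):
        self.latest = {}                       # region name -> last received data

    def on_packed_frame(self, regions_in_frame):
        """regions_in_frame: dict mapping region name -> decoded region data."""
        self.latest.update(regions_in_frame)   # newer regions replace older ones
        # Compose the displayed frame from the freshest copy of every region.
        return dict(self.latest)
```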
In some implementations, the display system 110 does not render areas for a given frame that are not encoded and transmitted to the display device 105. In some implementations, the display system 110 determines which regions to render based on activity in the scene. For example, if an object remains static in an image region, the rendering frequency for that image region may be lower than when a moving object is presented.
Additional configuration information
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Those skilled in the relevant art will appreciate from the foregoing disclosure that many modifications and variations are possible.
Some portions of this specification describe embodiments of the present disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. While described functionally, computationally, or logically, these operations are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.
Any of the steps, operations, or processes described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code executable by a computer processor for performing any or all of the steps, operations, or processes described.
Embodiments of the present disclosure may also relate to an apparatus for performing the operations herein. The apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer-readable storage medium, or any type of medium suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems mentioned in this specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Embodiments of the present disclosure may also relate to products produced by the computing processes described herein. Such an article of manufacture may comprise information generated by a computing process, wherein the information is stored in a non-transitory, tangible computer-readable storage medium and may comprise any implementation of the computer program product or other data combination described herein.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Thus, the scope of the present disclosure is not intended to be limited by the embodiments, but is to be defined by any claims issued in this application based on the embodiments. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.

Claims (20)

1. An image processing method, comprising:
receiving eye tracking information comprising at least one image of an eye of a user viewing a screen of a display device;
determining, based on the eye-tracking information, a first screen region corresponding to a region of the screen, the first screen region containing a gaze point of the user, the gaze point representing a point on the screen to which the user is looking when the eye-tracking information is captured;
determining a second screen region, separate from the first screen region, corresponding to a region of the screen based on the eye tracking information;
determining a depth of focus of the user based on the eye tracking information;
encoding an image for display on the screen, the image comprising an object, at least a portion of the object to be displayed within the first screen region, the encoding comprising:
encoding a first image region of the image based on a first set of encoding parameters, the first image region comprising a portion of the image to be displayed in the first screen region,
encoding a second image region of the image based on a second set of encoding parameters, the second image region being separate from the first image region and to be displayed in the second screen region, the second set of encoding parameters resulting in lower quality than the first set of encoding parameters, and
encoding the object using the second set of encoding parameters in response to a difference between a depth of the object in the image and a depth of focus of the user exceeding a threshold, the object including a portion to be displayed within the first screen region; and
transmitting the encoded image to the display device for display on the screen.
2. The method of claim 1, wherein the first screen region corresponds to a region of the screen seen by a fovea of the eye.
3. The method of claim 1, wherein the second screen region corresponds to a region of the screen seen by a portion of the retina of the eye that is outside of the fovea.
4. The method of claim 1, wherein the first set of encoding parameters comprises a first image resolution, the second set of encoding parameters comprises a second image resolution, and the second image resolution is lower than the first image resolution.
5. The method of claim 4, wherein encoding the first image region comprises generating a first encoded image region, wherein encoding the second image region comprises generating a second encoded image region, and further comprising:
a packed image is generated comprising the first encoded image region and the second encoded image region.
6. The method of claim 1, wherein the first set of encoding parameters comprises a first frame rate, the second set of encoding parameters comprises a second frame rate, and the second frame rate is lower than the first frame rate.
7. The method of claim 1, further comprising:
determining a third screen region, separate from the first screen region and the second screen region, corresponding to a region of the screen based on the eye tracking information,
wherein encoding the image for display on the screen further comprises:
a third image region of the image is encoded based on a third set of encoding parameters, the third image region being separate from the first image region and the second image region and the third image region to be displayed in the third screen region, the third set of encoding parameters resulting in lower quality than the first set of encoding parameters and higher quality than the second set of encoding parameters.
8. The method of claim 7, wherein the third screen region corresponds to a region of the screen seen by a parafovea of the eye.
9. The method of claim 1, wherein the display device is a virtual reality headset.
10. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving eye tracking information comprising at least one image of an eye of a user viewing a screen of a display device;
determining, based on the eye-tracking information, a first screen region corresponding to a region of the screen, the first screen region containing a gaze point of the user, the gaze point representing a point on the screen to which the user is looking when the eye-tracking information is captured;
determining a second screen region, separate from the first screen region, corresponding to a region of the screen based on the eye tracking information;
determining a depth of focus of the user based on the eye tracking information;
encoding an image for display on the screen, the image comprising an object, at least a portion of the object to be displayed within the first screen region, the encoding comprising:
encoding a first image region of the image based on a first set of encoding parameters, the first image region comprising a portion of the image to be displayed in the first screen region,
encoding a second image region of the image based on a second set of encoding parameters, the second image region being separate from the first image region and to be displayed in the second screen region, the second set of encoding parameters resulting in lower quality than the first set of encoding parameters, and
encoding the object using the second set of encoding parameters in response to a difference between a depth of the object in the image and a depth of focus of the user exceeding a threshold, the object including a portion to be displayed within the first screen region; and
transmitting the encoded image to the display device for display on the screen.
11. The computer-readable storage medium of claim 10, wherein the first screen region corresponds to a region of the screen seen by a fovea of the eye.
12. The computer-readable storage medium of claim 10, wherein the second screen region corresponds to a region of the screen seen by a portion of the retina of the eye that is outside of the fovea.
13. The computer-readable storage medium of claim 10, wherein the first set of encoding parameters comprises a first image resolution, the second set of encoding parameters comprises a second image resolution, and the second image resolution is lower than the first image resolution.
14. The computer-readable storage medium of claim 10, wherein encoding the first image region comprises generating a first encoded image region, wherein encoding the second image region comprises generating a second encoded image region, and further comprising:
A packed image is generated comprising the first encoded image region and the second encoded image region.
15. The computer-readable storage medium of claim 10, wherein the first set of encoding parameters comprises a first frame rate, the second set of encoding parameters comprises a second frame rate, and the second frame rate is lower than the first frame rate.
16. The computer-readable storage medium of claim 10, the operations further comprising:
determining a third screen region, separate from the first screen region and the second screen region, corresponding to a region of the screen based on the eye tracking information,
wherein encoding the image for display on the screen further comprises:
a third image region of the image is encoded based on a third set of encoding parameters, the third image region being separate from the first image region and the second image region and the third image region to be displayed in the third screen region, the third set of encoding parameters resulting in lower quality than the first set of encoding parameters and higher quality than the second set of encoding parameters.
17. The computer-readable storage medium of claim 16, wherein the third screen region corresponds to a region of the screen seen by a parafovea of the eye.
18. The computer-readable storage medium of claim 10, wherein the display device is a virtual reality headset.
19. An image processing system, comprising:
one or more processors; and
a non-transitory computer-readable storage medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
eye tracking information comprising at least one image of an eye of a user viewing a screen of the display device is received,
determining a first screen region corresponding to a region of a screen based on the eye tracking information, the first screen region including a gaze point of the user, the gaze point representing a point on the screen to which the user is looking when the eye tracking information is captured,
determining a second screen area separated from the first screen area corresponding to an area of the screen based on the eye tracking information,
determining a depth of focus of the user based on the eye tracking information;
encoding an image for display on the screen, the image comprising an object, at least a portion of the object to be displayed within the first screen region, the encoding comprising:
encoding a first image region of the image based on a first set of encoding parameters, the first image region comprising a portion of the image to be displayed in the first screen region,
encoding a second image region of the image based on a second set of encoding parameters, the second image region being separate from the first image region and to be displayed in the second screen region, the second set of encoding parameters resulting in lower quality than the first set of encoding parameters, and
encoding the object using the second set of encoding parameters in response to a difference between a depth of the object in the image and a depth of focus of the user exceeding a threshold, the object including a portion to be displayed within the first screen region; and
transmitting the encoded image to the display device for display on the screen.
20. The system of claim 19, the operations further comprising:
determining a third screen region, separate from the first screen region and the second screen region, corresponding to a region of the screen based on the eye tracking information,
wherein encoding the image for display on the screen further comprises:
a third image region of the image is encoded based on a third set of encoding parameters, the third image region being separate from the first image region and the second image region and the third image region to be displayed in the third screen region, the third set of encoding parameters resulting in lower quality than the first set of encoding parameters and higher quality than the second set of encoding parameters.
CN201780059271.6A 2016-08-01 2017-07-31 Adaptive parameters in image regions based on eye tracking information Active CN109791605B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201662369336P 2016-08-01 2016-08-01
US62/369,336 2016-08-01
US15/662,582 US10373592B2 (en) 2016-08-01 2017-07-28 Adaptive parameters in image regions based on eye tracking information
US15/662,582 2017-07-28
PCT/US2017/044749 WO2018026730A1 (en) 2016-08-01 2017-07-31 Adaptive parameters in image regions based on eye tracking information

Publications (2)

Publication Number Publication Date
CN109791605A CN109791605A (en) 2019-05-21
CN109791605B true CN109791605B (en) 2023-05-23

Family

ID=61010531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780059271.6A Active CN109791605B (en) 2016-08-01 2017-07-31 Adaptive parameters in image regions based on eye tracking information

Country Status (6)

Country Link
US (2) US10373592B2 (en)
EP (2) EP3345125A4 (en)
JP (2) JP6886510B2 (en)
KR (2) KR102543341B1 (en)
CN (1) CN109791605B (en)
WO (1) WO2018026730A1 (en)

Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11284109B2 (en) * 2016-01-29 2022-03-22 Cable Television Laboratories, Inc. Visual coding for sensitivities to light, color and spatial resolution in human visual system
CN108780223B (en) * 2016-03-11 2019-12-20 脸谱科技有限责任公司 Corneal sphere tracking for generating eye models
US10115205B2 (en) * 2016-03-11 2018-10-30 Facebook Technologies, Llc Eye tracking system with single point calibration
US10373592B2 (en) * 2016-08-01 2019-08-06 Facebook Technologies, Llc Adaptive parameters in image regions based on eye tracking information
KR20180025524A (en) * 2016-08-31 2018-03-09 엘지디스플레이 주식회사 Display device for personal immersion apparatus and driving method thereof
JP2020513253A (en) * 2016-11-10 2020-05-14 ニューロトラック テクノロジーズ,インク. Method and system for associating an image capture device with a human user for cognitive performance analysis
CN107168668A (en) * 2017-04-28 2017-09-15 北京七鑫易维信息技术有限公司 Image data transfer method, device and storage medium, processor
TWI646466B (en) * 2017-08-09 2019-01-01 宏碁股份有限公司 Vision range mapping method and related eyeball tracking device and system
US20190066368A1 (en) * 2017-08-30 2019-02-28 Texas Instruments Incorporated Three-Dimensional Cluster Simulation on GPU-Less Systems
EP3468187A1 (en) * 2017-10-03 2019-04-10 Axis AB Method and system for encoding video streams
GB2568261B (en) 2017-11-08 2022-01-26 Displaylink Uk Ltd System and method for presenting data at variable quality
US10551914B2 (en) * 2018-02-09 2020-02-04 Microsoft Technology Licensing, Llc Efficient MEMs-based eye tracking system with a silicon photomultiplier sensor
US10627899B2 (en) 2018-02-09 2020-04-21 Microsoft Technology Licensing, Llc Eye tracking system for use in a visible light display device
US10694170B2 (en) * 2018-03-05 2020-06-23 Valve Corporation Controlling image display via real-time compression in peripheral image regions
US11194389B2 (en) * 2018-06-21 2021-12-07 Qualcomm Incorporated Foveated rendering of graphics content using a rendering command and subsequently received eye position data
CN108833976B (en) * 2018-06-27 2020-01-24 深圳看到科技有限公司 Method and device for evaluating picture quality after dynamic cut-stream of panoramic video
CN112703464A (en) * 2018-07-20 2021-04-23 托比股份公司 Distributed point-of-regard rendering based on user gaze
CN109242943B (en) * 2018-08-21 2023-03-21 腾讯科技(深圳)有限公司 Image rendering method and device, image processing equipment and storage medium
GB2567553B (en) * 2018-09-06 2021-04-07 Sony Interactive Entertainment Inc Foveated rendering system and method
JP7256491B2 (en) * 2018-09-13 2023-04-12 凸版印刷株式会社 VIDEO TRANSMISSION SYSTEM, VIDEO TRANSMISSION DEVICE, AND VIDEO TRANSMISSION PROGRAM
US10915776B2 (en) * 2018-10-05 2021-02-09 Facebook, Inc. Modifying capture of video data by an image capture device based on identifying an object of interest within capturted video data to the image capture device
CN109377503A (en) * 2018-10-19 2019-02-22 珠海金山网络游戏科技有限公司 Image updating method and device calculate equipment and storage medium
US11513593B2 (en) * 2018-11-29 2022-11-29 Blink Technologies Inc. Systems and methods for anatomy-constrained gaze estimation
GB2583061B (en) * 2019-02-12 2023-03-15 Advanced Risc Mach Ltd Data processing systems
US10554940B1 (en) 2019-03-29 2020-02-04 Razmik Ghazaryan Method and apparatus for a variable-resolution screen
US11284053B2 (en) 2019-03-29 2022-03-22 Razmik Ghazaryan Head-mounted display and projection screen
US10466489B1 (en) 2019-03-29 2019-11-05 Razmik Ghazaryan Methods and apparatus for a variable-resolution screen
CN110602475B (en) * 2019-05-29 2021-06-01 珠海全志科技股份有限公司 Method and device for improving image quality, VR display equipment and control method
US10976816B2 (en) * 2019-06-25 2021-04-13 Microsoft Technology Licensing, Llc Using eye tracking to hide virtual reality scene changes in plain sight
CN110267025B (en) * 2019-07-03 2021-04-13 京东方科技集团股份有限公司 Rendering method and device for virtual 3D display and display method and system thereof
CN110347265A (en) * 2019-07-22 2019-10-18 北京七鑫易维科技有限公司 Render the method and device of image
BR112022001434A2 (en) * 2019-07-28 2022-06-07 Google Llc Methods, systems and media for rendering immersive video content with optimized meshes
US11100899B2 (en) * 2019-08-13 2021-08-24 Facebook Technologies, Llc Systems and methods for foveated rendering
US11106039B2 (en) * 2019-08-26 2021-08-31 Ati Technologies Ulc Single-stream foveal display transport
ES2809648A1 (en) * 2019-09-04 2021-03-04 Alias Antonio Jose Casado SYSTEM AND PROCEDURE FOR OPTIMIZING IMAGE RENDERING BASED ON PUPIL MONITORING (Machine-translation by Google Translate, not legally binding)
JP2021057769A (en) * 2019-09-30 2021-04-08 株式会社ソニー・インタラクティブエンタテインメント Image data transfer device, image display system, and image compression method
JP2021057768A (en) * 2019-09-30 2021-04-08 株式会社ソニー・インタラクティブエンタテインメント Image data transfer device and image compression method
JP2021057770A (en) * 2019-09-30 2021-04-08 株式会社ソニー・インタラクティブエンタテインメント Image data transfer device and image compression method
TWI827874B (en) * 2019-11-05 2024-01-01 宏達國際電子股份有限公司 Display system
US20210134250A1 (en) * 2019-11-05 2021-05-06 Htc Corporation Display system
US11614797B2 (en) * 2019-11-05 2023-03-28 Micron Technology, Inc. Rendering enhancement based in part on eye tracking
WO2021122902A1 (en) * 2019-12-17 2021-06-24 Tobii Ab Method and system for applying adapted colour rendering on a display
CN111142827B (en) * 2019-12-31 2021-04-13 镁佳(北京)科技有限公司 Application display method and device, storage medium and electronic equipment
EP4062571A4 (en) * 2020-03-27 2023-01-04 Samsung Electronics Co., Ltd. Image display apparatus and method for managing data packet loss
CN111553972B (en) * 2020-04-27 2023-06-30 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for rendering augmented reality data
US11940630B2 (en) * 2020-05-27 2024-03-26 Sony Group Corporation Display apparatus and display method
CN113810696A (en) * 2020-06-12 2021-12-17 华为技术有限公司 Information transmission method, related equipment and system
KR102384411B1 (en) * 2020-07-29 2022-04-06 한국항공대학교산학협력단 System and method for providing virtual reality video
US11297332B1 (en) 2020-10-30 2022-04-05 Capital One Services, Llc Gaze-tracking-based image downscaling for multi-party video communication
CN113177434A (en) * 2021-03-30 2021-07-27 青岛小鸟看看科技有限公司 Virtual reality system fixation rendering method and system based on monocular tracking
US11863786B2 (en) 2021-05-21 2024-01-02 Varjo Technologies Oy Method of transporting a framebuffer
CN115933952B (en) * 2021-08-28 2023-11-24 荣耀终端有限公司 Touch sampling rate adjusting method and related device
CN115236871A (en) * 2022-05-17 2022-10-25 北京邮电大学 Desktop type light field display system and method based on human eye tracking and bidirectional backlight
US20240007771A1 (en) * 2022-07-01 2024-01-04 Meta Platforms Technologies, Llc Readout methods for foveated sensing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6697506B1 (en) * 1999-03-17 2004-02-24 Siemens Corporate Research, Inc. Mark-free computer-assisted diagnosis method and system for assisting diagnosis of abnormalities in digital medical images using diagnosis based image enhancement
CN102034088A (en) * 2009-10-08 2011-04-27 托比技术股份公司 Eye tracking using a GPU
CN102591016A (en) * 2010-12-17 2012-07-18 微软公司 Optimized focal area for augmented reality displays
WO2016046514A1 (en) * 2014-09-26 2016-03-31 LOKOVIC, Kimberly, Sun Holographic waveguide opticaltracker
CN105814584A (en) * 2013-12-09 2016-07-27 传感运动器具创新传感技术有限公司 Method for operating an eye tracking device and eye tracking device for providing an active power management

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08313843A (en) * 1995-05-16 1996-11-29 Agency Of Ind Science & Technol Wide visual field and high resolution video presentation device in line of sight followup system
JP3263278B2 (en) * 1995-06-19 2002-03-04 株式会社東芝 Image compression communication device
EP1247401B1 (en) 1999-12-30 2003-07-02 Swisscom Mobile AG Image data transmission method
US20030067476A1 (en) 2001-10-04 2003-04-10 Eastman Kodak Company Method and system for displaying an image
JP2004056335A (en) * 2002-07-18 2004-02-19 Sony Corp Information processing apparatus and method, display apparatus and method, and program
US8020993B1 (en) * 2006-01-30 2011-09-20 Fram Evan K Viewing verification systems
JP2009009253A (en) 2007-06-27 2009-01-15 Renesas Technology Corp Program execution method, program, and program execution system
JP2009118072A (en) * 2007-11-05 2009-05-28 Ihi Corp Remote control device and remote control method
KR20110136040A (en) * 2010-06-14 2011-12-21 Biz Model Line Co., Ltd. Terminal with augmented reality based on customer's eye
US8902970B1 (en) 2010-12-01 2014-12-02 Amazon Technologies, Inc. Altering streaming video encoding based on user attention
US8493390B2 (en) * 2010-12-08 2013-07-23 Sony Computer Entertainment America, Inc. Adaptive displays using gaze tracking
JP2012124784A (en) 2010-12-09 2012-06-28 Canon Marketing Japan Inc Moving image reproduction system
US8885882B1 (en) 2011-07-14 2014-11-11 The Research Foundation For The State University Of New York Real time eye tracking for human computer interaction
WO2013033842A1 (en) * 2011-09-07 2013-03-14 Tandemlaunch Technologies Inc. System and method for using eye gaze information to enhance interactions
EP2805523B1 (en) 2012-01-19 2019-03-27 VID SCALE, Inc. Methods and systems for video delivery supporting adaption to viewing conditions
KR102001950B1 (en) * 2012-07-26 2019-07-29 LG Innotek Co., Ltd. Gaze Tracking Apparatus and Method
US10514541B2 (en) 2012-12-27 2019-12-24 Microsoft Technology Licensing, Llc Display update time reduction for a near-eye display
US9244529B2 (en) * 2013-01-27 2016-01-26 Dmitri Model Point-of-gaze estimation robust to head rotations and/or device rotations
US9064295B2 (en) * 2013-02-04 2015-06-23 Sony Corporation Enhanced video encoding using depth information
US9727991B2 (en) 2013-03-01 2017-08-08 Microsoft Technology Licensing, Llc Foveated image rendering
KR102271198B1 (en) * 2013-03-15 2021-06-29 Magic Leap, Inc. Display system and method
CN104427291B (en) * 2013-08-19 2018-09-28 Huawei Technologies Co., Ltd. Image processing method and device
JP6608137B2 (en) * 2014-01-03 2019-11-20 Harman International Industries, Inc. Eye vergence detection on a display
US10264211B2 (en) 2014-03-14 2019-04-16 Comcast Cable Communications, Llc Adaptive resolution in software applications based on dynamic eye tracking
US9791924B2 (en) 2014-12-23 2017-10-17 Mediatek Inc. Eye tracking with mobile device in a head-mounted display
US10210844B2 (en) * 2015-06-29 2019-02-19 Microsoft Technology Licensing, Llc Holographic near-eye display
US10475370B2 (en) * 2016-02-17 2019-11-12 Google Llc Foveally-rendered display
US10218968B2 (en) 2016-03-05 2019-02-26 Maximilian Ralph Peter von und zu Liechtenstein Gaze-contingent display technique
US10460704B2 (en) * 2016-04-01 2019-10-29 Movidius Limited Systems and methods for head-mounted display adapted to human visual mechanism
US10373592B2 (en) * 2016-08-01 2019-08-06 Facebook Technologies, Llc Adaptive parameters in image regions based on eye tracking information

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6697506B1 (en) * 1999-03-17 2004-02-24 Siemens Corporate Research, Inc. Mark-free computer-assisted diagnosis method and system for assisting diagnosis of abnormalities in digital medical images using diagnosis based image enhancement
CN102034088A (en) * 2009-10-08 2011-04-27 Tobii Technology AB Eye tracking using a GPU
CN102591016A (en) * 2010-12-17 2012-07-18 Microsoft Corporation Optimized focal area for augmented reality displays
CN105814584A (en) * 2013-12-09 2016-07-27 SensoMotoric Instruments Gesellschaft für Innovative Sensorik mbH Method for operating an eye tracking device and eye tracking device for providing an active power management
WO2016046514A1 (en) * 2014-09-26 2016-03-31 LOKOVIC, Kimberly, Sun Holographic waveguide optical tracker

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Arzu Coltekin. Foveation for 3D visualization and stereo imaging. TKK Institute of Photogrammetry and Remote Sensing, 2006. *
Arzu Coltekin. Foveation for 3D visualization and stereo imaging. TKK Institute of Photogrammetry and Remote Sensing, 2006-02-03, sections 2.5.3.2-2.5.3.4. *

Also Published As

Publication number Publication date
CN109791605A (en) 2019-05-21
EP3345125A4 (en) 2019-05-01
US20180033405A1 (en) 2018-02-01
US10984756B2 (en) 2021-04-20
KR20190026030A (en) 2019-03-12
WO2018026730A1 (en) 2018-02-08
US20190318708A1 (en) 2019-10-17
KR102403304B1 (en) 2022-06-02
JP2021144227A (en) 2021-09-24
JP7177213B2 (en) 2022-11-22
KR20220080000A (en) 2022-06-14
JP2019533325A (en) 2019-11-14
EP3345125A1 (en) 2018-07-11
US10373592B2 (en) 2019-08-06
EP4180920A1 (en) 2023-05-17
KR102543341B1 (en) 2023-06-16
JP6886510B2 (en) 2021-06-16

Similar Documents

Publication Publication Date Title
CN109791605B (en) Adaptive parameters in image regions based on eye tracking information
US10504207B1 (en) Rendering composite content on a head-mounted display including a high resolution inset
US11016301B1 (en) Accommodation based optical correction
US10257507B1 (en) Time-of-flight depth sensing for eye tracking
US10268290B2 (en) Eye tracking using structured light
US10140695B2 (en) Head-mounted compound display including a high resolution inset
CN108780223B (en) Corneal sphere tracking for generating eye models
CN108352075B (en) Eye tracking using optical flow
US9984507B2 (en) Eye tracking for mitigating vergence and accommodation conflicts
US10417784B1 (en) Boundary region glint tracking
US10120442B2 (en) Eye tracking using a light field camera on a head-mounted display
US10109067B2 (en) Corneal sphere tracking for generating an eye model
US10848753B1 (en) Eye-tracking system using a scanning laser assembly
CN117496098A (en) Sparse color reconstruction using deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: California, USA

Applicant after: Meta Platforms Technologies, LLC

Address before: California, USA

Applicant before: Facebook Technologies, LLC

CB03 Change of inventor or designer information

Inventor after: Martin Henrik Tall

Inventor after: Javier San Agustin Lopez

Inventor after: Rasmus Dahl

Inventor before: Martin Henrik Tall

Inventor before: Javier Sam Agustin Lopez

Inventor before: Rasmus Dahl

GR01 Patent grant