WO2021169766A1 - System and method for visualizing light rays in a scene - Google Patents

System and method for visualizing light rays in a scene

Info

Publication number
WO2021169766A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
mobile device
objects
light source
light rays
Prior art date
Application number
PCT/CN2021/075355
Other languages
French (fr)
Inventor
Zhong Li
Yi Xu
Shuxue Quan
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. filed Critical Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180014474.XA priority Critical patent/CN115088019A/en
Publication of WO2021169766A1 publication Critical patent/WO2021169766A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality

Definitions

  • Augmented Reality superimposes virtual content over a user’s view of the real world environment.
  • SDK: software development kit
  • 6DoF: six degrees of freedom
  • the AR environment may account for natural lighting sources and virtual lighting sources in generating a presentation of the virtual object within the environment.
  • it can be important in such systems to simulate visual light rays that can improve both an understanding of light rays within the real-world environment and the presentation of virtual content presented in the AR environment.
  • the present invention relates generally to augmented reality systems, and more specifically, and without limitation, to simulating visual light rays within an augmented reality scene.
  • aspects of the present disclosure include methods for visualizing light rays within environments.
  • the methods include receiving, by a camera of a mobile device, an image of an environment, the image being a frame of video captured by the camera, wherein the environment includes a light source; determining, by the mobile device, a geometric composition of one or more objects within the image; determining, by the mobile device, a material composition of each of the one or more objects using an object classifier; determining, by the mobile device and using the image of the environment, a position of the light source relative to the image of the environment; generating, by an augmented-reality application executing on the mobile device, one or more virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source; and presenting, on a display of the mobile device, the one or more virtual light rays superimposed on the image of the environment.
  • Another aspect of the present invention includes a system comprising one or more processors and a non-transitory computer-readable media that includes instructions that when executed by the one or more processors, cause the one or more processors to perform methods described above.
  • Another aspect of the present invention includes a non-transitory computer-readable media that includes instructions that when executed by one or more processors, cause the one or more processors to perform the methods described above.
  • embodiments of the present invention provide AR environments that can be modified to account for particular light source locations or types of light (point source, area source, etc. ) .
  • Further benefits can be educational in nature, such as improving a user’s understanding of optical physics through visualization of light rays within the user’s environment.
  • FIG. 1 illustrates an example of a computer system that includes a depth sensor and a red, green, and blue (RGB) optical sensor for AR applications according to an embodiment of the present invention.
  • RGB: red, green, and blue
  • FIG. 2 illustrates an example of a computer system for visualizing light rays with an AR system according to an embodiment of the present invention.
  • FIG. 3 illustrates an example graphical depiction of the visualization of light rays with an AR system according to an embodiment of the present invention.
  • FIG. 4 illustrates an example block diagram of light ray visualization with an AR system according to an embodiment of the present invention.
  • FIG. 5 is a simplified flowchart illustrating a method of visualizing light rays with an AR system according to an embodiment of the present invention.
  • FIG. 6 illustrates an example computer system, according to embodiments of the present invention.
  • Embodiments of the present invention are directed to, among other things, generating a visualization of light rays within an AR system.
  • the AR system includes a mobile device and a camera that captures images of an environment.
  • the environment includes at least one light source.
  • the AR system analyzes the images to determine a geometric composition of the environment.
  • a material analysis then assigns a material type to each surface identified by the geometric composition.
  • a light source analysis identifies the position of the light source within the environment.
  • the AR system generates a virtual light ray that is superimposed on at least one image.
  • the virtual light ray may include substantially a same trajectory as a light ray emitted by the light source.
  • a user may interact with the virtual light rays by modifying properties of the environment (e.g., adding/removing light sources, changing material types for one or more surfaces, changing a color of the light source or virtual light ray, etc. ) or by user input (e.g., a gesture, touch screen input, etc. ) .
  • the AR system may be operated within a room that includes a light source such as daylight passing through a window.
  • the AR system captures images of the room and determines a geometric composition that identifies surfaces within the room (e.g., such as the floor, ceiling, walls, windows, door, objects within the room, etc. ) .
  • a material composition then assigns a material type to the surfaces.
  • the material composition can be a material type (e.g., wood, glass, ceramic, etc. ) .
  • the material composition may correspond to the optical properties of the material (e.g., a variable indicating a degree of reflection between diffuse reflection and specular reflection) .
  • the AR system can determine an approximate position of the light source (e.g., as being outside the window) and direction (e.g., as entering the room through the window) .
  • the AR system then generates virtual light rays that are superimposed on the images to enable a user to visualize the trajectory of the light rays as they pass through the window from the light source and interact with the various material types of the surfaces of the room. For instance, a virtual light ray may travel through the window towards a cloth-covered sofa. The cloth causes a diffuse reflection in which the virtual light ray is depicted as scattering in many different directions. Another virtual light ray travels to a specular surface (e.g., the glass of a television, or the like) and is depicted as reflected in a particular direction. Any number of virtual light rays can be generated.
  • the user can interact with the virtual light rays by user selection via an interface of the AR system or by a gesture such as a hand gesture.
  • Different gestures can correspond to different input.
  • Examples of user input include input that alters the trajectory of the virtual light rays, alters the hue and/or intensity of the virtual light rays or the light source, adds or removes light sources, alters a material type of a surface (to determine a resulting effect on a virtual light ray) , displays angles of reflection, displays light ray distances, indicates changes in optical characteristics of the light rays as a result of an interaction with a material type (s) (e.g., such as changes in light intensity, hue, etc. ) , combinations thereof, or the like.
  • the trajectory of light rays can be determined to isolate the effect that one or more light sources have on the appearance of an environment based on the material composition of the surfaces within the environment.
  • environments can be modified to account for particular light source locations or types of light (point source, area source, etc. ) .
  • an environment can be rearranged to prevent a particular type of light interference (e.g., glare, lens flare, etc. ) .
  • Visualizing light rays may additionally improve the generation of virtual environments (e.g., such as virtual reality environments) such that the virtual environments appear like a real-world environment.
  • Further benefits can be educational in nature, such as improving a user’s understanding of optical physics through visualization of light rays within the user’s environment.
  • FIG. 1 illustrates an example of a computer system 110 that includes a depth sensor 112 and an RGB optical sensor 114 for AR applications, according to an embodiment of the present invention.
  • the AR applications can be implemented by an AR module 116 of the computer system 110.
  • the RGB optical sensor 114 generates an RGB image of a real-world environment that includes, for instance, real-world objects 130, 135.
  • the depth sensor 112 generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth (s) of the real-world objects 130, 135 (e.g., distance (s) between the depth sensor 112 and the real-world objects 130, 135) .
  • the AR module 116 renders an AR scene 120 of the real-world environment in the AR session, where this AR scene 120 can be presented at a graphical user interface (GUI) on a display of the computer system 110.
  • GUI: graphical user interface
  • the AR scene 120 depicts a real-world object representation 122 of the real-world objects 130, 135.
  • the AR scene 120 depicts a virtual object 124 not present in the real-world environment.
  • the AR module 116 can generate a red, green, blue, and depth (RGBD) image from the RGB image and the depth map to detect an occlusion of the virtual object 124 by at least a portion of the real-world object representation 122 or vice versa.
  • the AR module 116 can additionally or alternatively generate a 3D model of the real-world environment based on the depth map, where the 3D model includes multi-level voxels. Such voxels can be used, among other things, to detect collision between the virtual object 124 and at least a portion of the real-world object representation 122.
  • the 3D model can include a wireframe representation of the real-world environment captured by the RGB image.
  • the wireframe representation can provide an indication of the geometric shape and orientation of surfaces captured by the RGB image.
  • the AR scene 120 can be rendered using a 3D model of the real-world environment to properly show the occlusion and avoid the rendering of the collision.
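  • As a purely illustrative sketch of this kind of depth-based occlusion handling, the Python example below composites a rendered virtual object over an RGB frame by comparing, per pixel, the rendered virtual depth against the sensed depth map; the array names and resolutions are assumptions for the example only, not part of this disclosure.

      import numpy as np

      def composite_with_occlusion(rgb, real_depth, virtual_rgb, virtual_depth):
          """Overlay a rendered virtual object onto an RGB frame, hiding any pixel
          where the real-world surface is closer to the camera than the virtual one."""
          # A virtual pixel is visible only where the virtual object exists
          # (finite depth) and is nearer than the sensed real-world depth.
          visible = np.isfinite(virtual_depth) & (virtual_depth < real_depth)
          out = rgb.copy()
          out[visible] = virtual_rgb[visible]
          return out

      # Toy data: a 640x480 frame, real surfaces 2 m away, a virtual ball at 1.5 m.
      rgb = np.zeros((480, 640, 3), dtype=np.uint8)
      real_depth = np.full((480, 640), 2.0)
      virtual_depth = np.full((480, 640), np.inf)
      virtual_depth[200:280, 300:380] = 1.5
      virtual_rgb = np.zeros_like(rgb)
      virtual_rgb[..., 2] = 255
      composited = composite_with_occlusion(rgb, real_depth, virtual_rgb, virtual_depth)
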
  • the computer system 110 represents a suitable user device that includes, in addition to the depth sensor 112 and the RGB optical sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present invention.
  • the computer system 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
  • the depth sensor 112 has a known maximum depth range (e.g., a maximum working distance) and this maximum value may be stored locally and/or accessible to the AR module 116.
  • the depth sensor 112 can be a ToF camera.
  • the depth map generated by the depth sensor 112 includes a depth image.
  • the RGB optical sensor 114 can be a color camera.
  • the depth image and the RGB image can have different resolutions. Typically, the resolution of the depth image is smaller than that of the RGB image. For instance, the depth image has a 640x480 resolution, whereas the RGB image has a 1920x1280 resolution.
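  • Because the two images have different resolutions, the depth image is commonly resampled to the RGB resolution before the streams are fused. The nearest-neighbor sketch below assumes the 640x480 and 1920x1280 resolutions mentioned above and ignores the depth-to-color extrinsic calibration a real system would also apply.

      import numpy as np

      def upsample_depth(depth, rgb_hw):
          """Nearest-neighbor upsample of a depth image to the RGB resolution.
          rgb_hw is the (height, width) of the RGB image."""
          h, w = depth.shape
          H, W = rgb_hw
          rows = np.arange(H) * h // H          # source row for each target row
          cols = np.arange(W) * w // W          # source column for each target column
          return depth[np.ix_(rows, cols)]

      depth = np.random.uniform(0.5, 4.0, size=(480, 640))   # 640x480 depth image
      aligned = upsample_depth(depth, (1280, 1920))           # match 1920x1280 RGB
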
  • the AR module 116 may execute a visual-inertial odometry (VIO) process to track the pose (e.g., position and orientation) of the AR module 116.
  • VIO uses image analysis and an inertial measurement unit (sensor data) to determine changes in the camera’s (e.g., the AR module’s 116) position and/or orientation.
  • Visual odometry can use feature detection in images to identify and correlate features across successive images. The feature detection may be used to generate an optical flow field that estimates the motion of the camera relative to objects depicted in the successive images. The degree of motion between time intervals (e.g., the time interval between successive images) may be used to determine the distance and direction the camera has moved during the time interval. The distance and direction of the camera (and sensor data) may be used to track a position and an orientation of the camera at each time interval.
  • Visual odometry may be augmented using an inertial measurement unit that captures directional force values.
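  • A minimal sketch of the visual half of such a pipeline is shown below, using OpenCV to track features between two consecutive grayscale frames and recover the relative camera motion; the intrinsic matrix K is assumed to be known from calibration, and the inertial fusion and scale estimation are omitted.

      import cv2
      import numpy as np

      def relative_pose(prev_gray, curr_gray, K):
          """Estimate the rotation R and unit-scale translation t of the camera
          between two consecutive grayscale frames."""
          # Detect corners in the previous frame and track them into the current one.
          pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                             qualityLevel=0.01, minDistance=7)
          pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                         pts_prev, None)
          good = status.ravel() == 1
          p1, p2 = pts_prev[good], pts_curr[good]
          # The essential matrix encodes the relative motion for a calibrated camera.
          E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
          _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
          return R, t   # t is a direction only; metric scale comes from the IMU/depth
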
  • AR module 116 may execute an implementation of VIO called a simultaneous localization and mapping (SLAM) process.
  • the SLAM process may initiate with a calibration step in which an empty map of the environment may be initialized with the device positioned at the origin of the coordinate system.
  • the SLAM process receives input data such as, but not limited to, image data, control data ct, sensor data st, and time interval t.
  • the SLAM process then generates an output that may include an approximate location of the device xt for a given time interval (relative to one or more approximate locations at one or more previous time intervals) and a map of the environment mt.
  • the output can be augmented (or verified) using feature detection images captured at time t and time t+1 to identify and correlate features across the images.
  • the changes between images can be used to verify the movement of the AR module 116, populate the environment mt with objects detected in the images, etc.
  • the SLAM process may update xt and mt.
  • the SLAM process may be an iterative process that updates xt and mt in set time intervals or when new sensor data or image data is detected. For instance, if no sensor change occurs between time interval t and t+1, then the SLAM process may delay updating the position and map to preserve processing resources.
  • the SLAM process may compute the new position of device xt and update the map mt.
  • the AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor) .
  • a smartphone is used for an AR session that shows the real-world environment.
  • the AR session includes rendering an AR scene that includes a representation of a real-world table on top of which a vase (or some other real-world object) is placed.
  • a virtual ball (or some other virtual object) is to be shown in the AR scene.
  • the virtual ball is to be shown on top of the table too.
  • the virtual ball can occlude the virtual vase when the virtual vase is behind the virtual ball relative to a change in the pose of the smartphone. In the remaining parts of the AR scene, no occlusion is present.
  • a user of the smartphone can interact with the virtual ball to move the virtual ball on the top surface of the virtual table (that represents the real-world table) . By tracking possible collisions between the virtual ball and the other objects in the AR scene, any interaction that would cause a collision would not be rendered. In other words, the collision tracking can be used to control where the virtual ball can be moved in the AR scene.
  • FIG. 2 illustrates an example of a computer system for visualizing light rays with an AR system, according to an embodiment of the present invention.
  • the AR module 116 can generate virtual representations of characteristics of the real-world environment. For instance, the AR module 116 can generate virtual light rays superimposed onto an image captured by the computer system 110 that accurately represent the light rays emitted from the light source 235 within the real-world environment.
  • the AR module 116 can generate any number of virtual light rays within AR scene 120 to illustrate various interactions of the light rays with different surfaces within the AR scene 120 such as the real-world object representation 122 or the virtual object 124.
  • a smartphone is used for an AR session that shows the real-world environment.
  • the real-world environment includes real-world objects 130, 135, a light source 235, and light rays 226. While the light rays 226 illuminate the real-world environment, light rays 226 themselves (and the trajectory of the light rays) cannot be viewed by the user.
  • the AR session includes rendering an AR scene that includes a representation of a real-world table on top of which a vase (or some other real-world object) is placed. A virtual ball (or some other virtual object) can be shown in the AR scene.
  • the AR scene 120 presents virtual light rays 228 that represent light rays 226 emitted from the light source 235 within the real-world environment.
  • the virtual light rays 228 have approximately a same or similar trajectory as the light rays 226.
  • the virtual scene 120 optionally includes a representation of the light source 235 (e.g., either as captured within an image or as a virtual representation) .
  • the AR module can determine the entire (or a part of the) trajectory of the virtual light rays (that are within view of the sensors) , by analyzing the geometric composition of the AR scene 120, determining a material composition of surfaces within the AR scene 120, and determining the position and the type of light source.
  • the AR module uses a three-dimensional reconstruction process using the depth map to reconstruct the surfaces of the real-world environment.
  • the AR module may supplement the three-dimensional reconstruction process by adding depth information for any virtual objects within the AR scene.
  • the three-dimensional reconstruction process may output a wireframe representation of the AR scene.
  • the wireframe representation may represent surfaces of the AR scene as lines and planes.
  • the AR module may use a neural network to determine the geometric composition of the AR scene.
  • the AR module uses the geometric composition to identify the surfaces within the AR scene to determine the angle of incidence when the virtual light ray interacts with a surface.
  • the AR module may classify the material composition of the identified surfaces within the AR scene. For instance, some materials reflect light rays better than other materials, which may instead scatter the light rays.
  • the AR module classifies the surfaces of a captured image according to a material composition of the surface. Once classified, the AR module can accurately trace the trajectory of light rays and generate an accurate representation of the light rays via the virtual light rays.
  • material composition can be performed by semantic segmentation of an image (e.g., classifying pixels of the image as corresponding to a particular object or surface) , then classifying each segment. For instance, the segments can be classified by a trained neural network or the like. In other instances, the material composition can be performed via user input.
  • the material composition may include a particular material (e.g., wood, glass, etc. ) and/or the optical properties of the surface of the object such as specular, diffuse, or somewhere between specular and diffuse.
  • the AR module determines an approximate position of the light source and a light source type (e.g., point source, area source, etc. ) .
  • image processing may be used to detect the light source (e.g., if the light source is depicted in the image) .
  • the pixels of the image can be analyzed for brightness, with pixels that are bright relative to the surrounding pixels indicating a light source.
  • the number of pixels that represent the light source may be a further indication of a type of the light source, with point sources occupying fewer pixels than area light sources.
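  • A deliberately simple version of this brightness analysis is sketched below: the luminance of the image is thresholded, the bright pixels are treated as the light source, and the fraction of the frame they cover is used to guess between a point-like and an area-like source. The threshold and size cutoff are arbitrary assumptions for illustration.

      import numpy as np

      def detect_light_source(rgb, brightness_thresh=240, area_cutoff=0.01):
          """Return the (row, col) centroid of the bright region and a rough
          point/area classification based on how many pixels it covers."""
          luminance = rgb.astype(np.float32).mean(axis=2)
          bright = luminance >= brightness_thresh
          if not bright.any():
              return None, None                      # light source not in view
          rows, cols = np.nonzero(bright)
          centroid = (rows.mean(), cols.mean())
          fraction = bright.sum() / bright.size      # share of the frame that is bright
          source_type = "area" if fraction > area_cutoff else "point"
          return centroid, source_type
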
  • a trained neural network may be used to classify the position and type of light sources.
  • the neural network may be trained using images of similar real-world environments with labeled light sources (e.g., supervised learning) .
  • the geometric composition, material composition, and light source analysis can be performed in any particular order or in parallel.
  • the AR scene 120 depicts a first virtual light ray 228 extending from the light source and interacting with a first real-world object 135, such as the vase.
  • the AR module determines the geometric composition of the vase relative to the camera and the material composition (e.g., in this example being more specular than diffuse) , and analyzes the light source, depicting the virtual light ray as extending from the light source and reflecting from the surface of the vase in a different direction.
  • Another virtual light ray 230 is depicted interacting with a second real-world object 130 (e.g., a table) of a different material composition.
  • the table, being of a diffuse material composition, does not reflect light rays in a single direction, but instead reflects light at many different angles as illustrated by diffusely reflected virtual light rays 232.
  • the AR module thus is able to represent the complete trajectory of virtual light rays that interact with the table by representing the many different light rays (at varying angles) resulting from the diffuse reflection.
  • the AR module may generate virtual light rays that interact with virtual objects such as the virtual object 124.
  • the geometric and material composition of the virtual object may be selected by an AR application or by a user.
  • the geometric composition and material composition can be modified to simulate the effect on virtual light rays.
  • the AR module may present a single virtual light ray 234 that can be dynamically updated upon selection of a new geometric composition or material composition (e.g., updating the interaction with the virtual object 124 including any specular or diffuse reflection, changes in light intensity or hue, etc. ) .
  • the AR module may present each virtual light ray superimposed on the same image with each virtual light ray marked so as to inform the user which virtual light ray is interacting with which geometric composition, material composition, and/or virtual object such as virtual object 124.
  • FIG. 3 illustrates an example graphical depiction of the visualization of light rays with an AR system, according to an embodiment of the present invention.
  • the image captures a real-world environment that includes the inside of a room with varying surfaces (of various geometries and material compositions) .
  • the room is illuminated from light entering through a window 304.
  • the AR system generates multiple virtual light rays to represent the light rays interacting with different surfaces within the room.
  • a first virtual light ray 308 reflects off a surface of the television 312, which may be specular due to the glossy surface of the television, towards the sofa 316.
  • the first virtual light ray 308 may continue reflecting off the sofa towards the wall 320, off the wall towards the floor 324, until the light becomes too dim to detect or the virtual light ray is no longer within view of the sensors of the computer system.
  • the AR system determines each reflection along the trajectory of the virtual light ray semi-independently from other reflections.
  • the first virtual light ray 308 first reflects off the television 312.
  • the television, having a glossy surface, causes a specular reflection towards the sofa 316.
  • the AR system can determine how the reflected virtual light ray (from the television) interacts with the sofa.
  • the sofa may be covered in a diffuse material that causes a diffuse or somewhat diffuse reflection.
  • the AR system may display multiple virtual light rays reflecting from the sofa to represent the diffuse reflection scattering the light (not shown) .
  • the AR system may identify an angle of reflection of the strongest (e.g., highest luminosity) diffusely reflected light ray.
  • if the surface does not exhibit a Lambertian reflection (e.g., in which the luminosity of all diffusely reflected light rays is equal) , there exists at least one light ray that has a higher luminosity than the other light rays after the diffuse reflection.
  • the trajectory of the virtual light ray follows this highest luminosity light ray and may not represent the other lower luminosity light rays resulting from the diffuse reflection.
  • other virtual light rays interact with different surfaces but, like virtual light ray 308, reflect off of multiple surfaces before becoming undetectable or passing out of the frame.
  • the AR system continues to track the position of the light ray within the coordinate system of the AR system (e.g., via a SLAM process as described in connection with FIG. 1) such that if the AR system moves, capturing a new view of the environment, the AR system can continue to display the light rays in the new view. For instance, if a light ray exits the frame of view, the camera of the AR system can be moved to continue displaying the trajectory of the virtual light ray that was previously out of view.
  • by capturing and representing each reflection of the light rays entering the room through the window, the AR device enables the user to observe and interact with the optical properties of the surfaces of the room.
  • the virtual light rays illustrate the trajectory of light rays into and out of the room.
  • the virtual light rays enable the user to determine how repainting the walls or rearranging the furniture (e.g., altering the geometric composition or material composition) would increase or decrease the illumination by the light source.
  • the multiple reflections illustrate the inter-reflection of light that indirectly illuminates surfaces that are not within direct line of sight with the light source. Once identified, the user may alter the material composition or geometry to reduce or enhance the inter-reflection.
  • FIG. 4 illustrates an example block diagram of light ray visualization with an AR device according to an embodiment of the present invention.
  • the AR device (e.g., AR device 420) receives environmental input, which includes image data such as RGBD data 404 (e.g., images received from an RGB-D camera that captures red, green, and blue color data and depth, or image data that is combined with depth sensor data) .
  • the depth data included in the RGBD data 404 can indicate the distance between the camera and surfaces within an image (e.g., the depth) .
  • each pixel of an image may include a color value (e.g., a combination of red, green, blue) , an intensity value, and a depth value.
  • the RGBD data can include a single image, a sequence of images, or the RGBD data may include an image stream in which image (and depth) data is continuously received until the AR session is terminated (e.g., the AR application is terminated or otherwise shutdown) .
  • the AR device may generate a depth map using the depth data of each pixel of an image. If depth data for more than one image is available (e.g., a sequence of images or an image stream is received) , the depth map may be augmented using images (and depth data) of multiple views (e.g., different perspectives) of the environment. The resulting depth map may be more accurate than a depth map from a single image. In some instances, the AR device need not generate a depth map. Instead, the raw image data (with depth) may be passed to geometric analysis 412 and light source analysis 408.
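  • As one illustrative way of turning such per-pixel depth into scene geometry, every pixel can be back-projected into the camera coordinate frame with the (assumed known) pinhole intrinsics; the resulting point cloud can then feed the geometric analysis described below.

      import numpy as np

      def depth_to_points(depth, fx, fy, cx, cy):
          """Back-project a depth image (in meters) into an N x 3 point cloud
          expressed in the camera coordinate frame."""
          h, w = depth.shape
          u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
          z = depth
          x = (u - cx) * z / fx
          y = (v - cy) * z / fy
          points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
          valid = np.isfinite(points).all(axis=1) & (points[:, 2] > 0)
          return points[valid]
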
  • Light source analysis 408 receives the image data from RGBD data 404 and determines a location of the light source of the images of the image data. In some instances, the light source analysis 408 identifies a location of the light source within the environment from one or more images of the image data. In other instances, the light source analysis 408 identifies a location of the light source in each image individually.
  • the light source may appear within an image or may be out of view.
  • the light source may be a lamp that can be seen within an image captured by the camera.
  • a room can be illuminated by light (from an out of view light source) passing through a window.
  • the location of the light source indicates the direction of the light rays within the environment.
  • the light source analysis 408 may include a classification of the type of the light source. Different light sources emit light differently. For instance, a light source that is a point light type may be a light source at a discrete location that emits light in one or more directions. As an example, a light bulb may be a point light source. A light source that is an area light type may be a source composed of a geometric shape in which light is emitted uniformly across the surface area of a surface of the geometric shape. As an example, consider a room illuminated by indirect daylight through a window. The window may be treated as an area light type light source as light passes through the window approximately uniformly across the surface area of the window, enabling the window to illuminate the room.
  • light sources include spotlights (e.g., such as a light source that emits light in a particular direction) , directional lights (e.g., a distant light source that is emitted in a particular direction such as the sun) , and the like.
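  • One possible (purely illustrative) way of carrying this classification through the rest of the pipeline is a small data structure that records the light type together with its position, direction, or extent; none of these field names come from the disclosure itself.

      from dataclasses import dataclass
      from enum import Enum, auto
      from typing import Optional, Tuple

      class LightType(Enum):
          POINT = auto()        # emits from a discrete location
          AREA = auto()         # emits roughly uniformly over a surface (e.g., a window)
          SPOT = auto()         # emits within a cone in a particular direction
          DIRECTIONAL = auto()  # distant source with near-parallel rays (e.g., the sun)

      @dataclass
      class LightSource:
          light_type: LightType
          position: Optional[Tuple[float, float, float]] = None   # world coordinates
          direction: Optional[Tuple[float, float, float]] = None  # unit vector
          extent: Optional[Tuple[float, float]] = None            # area width/height (m)

      window = LightSource(LightType.AREA, position=(0.0, 1.2, 3.0), extent=(1.0, 1.5))
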
  • the light source analysis 408 may receive positional and classification data from a user. For instance, a user may be presented with a map of the environment based on the coordinate system of the AR device. The user may be directed to select a set of coordinates of the map that correspond to the light source and select a particular light source type. Alternatively, the user may enter in a set of coordinates (e.g., via alphanumeric text) or a single coordinate with a size indicator.
  • the light source position and classification may be determined using a trained deep neural network.
  • the deep neural network may be trained using a set of images with each image including a label that indicates: the presence or absence of a light source, the position of the light source within the image, and the classification. Once trained, the deep neural network may take as input an image from the image data and output a position of the light source and the classification of the type of light source.
  • the light source analysis 408 outputs the light source position and the light source type to visualization rendering 432, which uses the light source position and the light source type (in part) to generate a virtual light ray.
  • Geometric analysis 412 identifies the geometric properties of the surfaces represented in the images of the image data.
  • the AR device using geometric analysis 412, may determine how light rays interact with a particular surface.
  • Geometric analysis 412 may execute a three-dimensional (3D) reconstruction process using the depth information from the RGBD data (e.g., from one or more images) .
  • the 3D reconstruction process may define a 3D representation of the surfaces in an image that identifies, for each surface, a set of planar surfaces.
  • a table may include a single rectangular surface with one identified planar surface (e.g., the rectangular surface) .
  • a curved surface such as a vase may be represented as a set of planar surfaces, with each planar surface approximating a portion of the curved surface.
  • the 3D reconstruction process can identify the geometric normal for each surface (or planar surface) .
  • a wireframe representation can be generated in which each surface is represented by a set of lines and a set of planes.
  • deep learning may be used in addition or in place of the 3D reconstruction process to identify the geometric composition of the environment from the images of the image data.
  • a trained deep neural network may use the images as input and output geometric composition data (e.g., a wireframe representation, data identifying planar surfaces and each corresponding geometric normal, or the like) .
  • the geometric composition data can be used to determine how a particular incident light ray will interact with a particular surface.
  • the AR device can determine an angle of incidence (e.g., the angle formed from the incident light ray and the geometric normal) .
  • the angle of incidence being equal to the angle of reflection can indicate the trajectory of the reflected light ray.
  • the incident angle can also indicate the trajectory if the light ray is entirely (or partially) refracted through the surface.
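  • In vector form, the reflected direction follows directly from the incident direction and the surface normal, and the refracted direction follows from Snell's law; the short sketch below illustrates both, with the relative index of refraction eta as an assumed input.

      import numpy as np

      def reflect(d, n):
          """Reflect incident direction d about the unit surface normal n."""
          d, n = np.asarray(d, float), np.asarray(n, float)
          return d - 2.0 * np.dot(d, n) * n

      def refract(d, n, eta):
          """Refract incident direction d through a surface with unit normal n,
          where eta = n1 / n2. Returns None on total internal reflection."""
          d, n = np.asarray(d, float), np.asarray(n, float)
          cos_i = -np.dot(d, n)
          k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
          if k < 0.0:
              return None
          return eta * d + (eta * cos_i - np.sqrt(k)) * n

      # A ray hitting a horizontal surface at 45 degrees reflects at 45 degrees.
      incident = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
      normal = np.array([0.0, 1.0, 0.0])
      print(reflect(incident, normal))   # -> [0.7071..., 0.7071..., 0.]
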
  • the geometric composition data from geometric analysis 412 is passed into a material analysis 416 and into the visualization rendering 432.
  • the material analysis 416 uses the geometric composition data to determine how to classify the optical properties of the materials of each surface. For instance, a metallic surface will cause a specular reflection (e.g., where a single light ray is reflected into single outgoing direction) , while a non-metallic surface will cause diffuse reflection (e.g., where the reflected light ray is scattered at many different angles) . Some materials may cause a combination of specular and diffuse reflections.
  • Material composition can be identified using image segmentation and classifying each segment.
  • Image segmentation may group portions of an image (e.g., a single pixel, or a set of pixels) as being related (e.g., part of the same object in the image) . For instance, in a picture of a room, image segmentation identifies the pixels of the image that correspond to the walls, the pixels that correspond to the floor, the ceiling, the table, etc. Image segmentation approaches may include clustering, edge detection, motion and interactive segmentation, or the like.
  • the geometric composition data may be used to identify the segments (e.g., in addition to or in place of image segmentation) . Since the geometric composition represents the surfaces as individual planar surfaces, the material analysis 416 may treat the planar surfaces that correspond to the same surface or object as being part of a single segment. Using the geometric composition data may approximate the result from the image segmentation (e.g., with lower accuracy, including some pixels that should not be included and excluding some pixels that should be included) . Using the geometric composition may reduce the resource consumption (e.g., memory and processing cycles) since the data is already generated before the material analysis initiates. The AR device may determine to use image segmentation (for better accuracy) or the geometric composition data (for reduced resource consumption) at runtime.
  • Each segment identified by the image segmentations may then be classified according to its optical properties (and/or its material composition) .
  • the segments may be classified using a machine-learning model such as a neural network or deep neural network.
  • the machine-learning model may be trained using a set of labeled images.
  • the labels may correspond to a classification for each segment in the image.
  • the machine-learning model may identify the particular material of each segment (e.g., wood, drywall, metal, glass, etc. ) or the optical properties of the segment (e.g., specular, diffuse, or some combination of specular and diffuse) .
  • the segmented images are input into the machine-learning model and the model outputs an identification of the optical properties (and/or material) of each segment.
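  • The two-stage idea (segment, then classify each segment) is sketched below with hypothetical components: segment_image and material_classifier stand in for whatever segmentation routine and trained model an implementation actually uses and are not named by this disclosure; the toy versions shown only exercise the plumbing.

      import numpy as np

      def classify_materials(rgb, segment_image, material_classifier):
          """Assign an optical-property label to every segment of an image.

          segment_image(rgb)        -> integer label map with the same height/width
          material_classifier(crop) -> e.g., "specular", "diffuse", or "mixed"
          """
          labels = segment_image(rgb)
          materials = {}
          for seg_id in np.unique(labels):
              ys, xs = np.nonzero(labels == seg_id)
              crop = rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
              materials[int(seg_id)] = material_classifier(crop)
          return materials

      def halves(img):
          """Toy segmentation: label 0 for the left half, 1 for the right half."""
          labels = np.zeros(img.shape[:2], dtype=int)
          labels[:, img.shape[1] // 2:] = 1
          return labels

      def by_brightness(crop):
          """Toy classifier: bright segments count as specular, dark as diffuse."""
          return "specular" if crop.mean() > 128 else "diffuse"

      rgb = np.zeros((120, 160, 3), dtype=np.uint8)
      rgb[:, 80:] = 220
      print(classify_materials(rgb, halves, by_brightness))   # {0: 'diffuse', 1: 'specular'}
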
  • the output from the model is passed to visualization rendering 432.
  • the AR device 420 receives input from one or more sensors (e.g., such as cameras, depth sensors, and/or the like) and user input (e.g., through a physical interface such as a touchscreen, mouse, keyboard, and/or the like) .
  • the AR device executes a SLAM process to track the position of the AR device within a coordinate system of the AR device.
  • the SLAM process may execute once during an initialization process of the AR device (e.g., to define the coordinate system) .
  • the SLAM process may execute periodically (or continuously) thereafter in set intervals or upon detection from sensors that indicate the AR device has moved.
  • the SLAM process outputs pose data 424.
  • Pose data 424 includes a pose at a particular instant of time. Each pose includes a position (e.g., within the coordinate system) and an orientation (e.g., the direction AR device is facing) .
  • the user may interact with the AR device 420 using gesture input 428.
  • the AR device 420 includes a display that provides a visual representation of the environment with one or more superimposed virtual objects (including virtual light rays) .
  • the user can provide gesture input by physically interacting with the coordinate system location of a virtual object. For instance, if a virtual object is on a table in front of a user, the user can interact with the virtual object by physically interacting with the area above the table.
  • a user may interact with virtual objects using any number of gestures (e.g., grabbing, pushing, pulling, rotating, etc. ) .
  • the AR device may detect the gesture and modify the trajectory of the virtual light ray based on the gesture.
  • the AR device may use the gesture input 428, pose data 424, material analysis 416, geometric analysis 412, and light source analysis 408 to generate a visualization rendering 432 of the virtual light rays.
  • the AR device 420 uses the light source analysis to determine an origin location (e.g., the light source’s position) of a virtual light ray and an initial direction.
  • the AR device uses the geometric analysis to determine the output virtual light ray (e.g., the reflected virtual light ray) that results from an interaction with a surface and the material analysis 416 to determine the type of output virtual light ray (e.g., a single ray resulting from a specular reflection or a plurality of rays resulting from a diffuse reflection) .
  • the AR device may then determine the complete trajectory of the virtual light ray in the coordinate system of the AR device 420.
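  • A highly simplified trace of one virtual light ray that combines the three analyses might look like the sketch below. Here scene is a hypothetical object that wraps the geometric composition (intersection queries and surface normals) and the material composition (per-surface specular/diffuse labels), and the attenuation constants are arbitrary; none of these interfaces are defined by the disclosure.

      import numpy as np

      def reflect(d, n):
          return d - 2.0 * np.dot(d, n) * n

      def trace_virtual_ray(scene, origin, direction, intensity=1.0,
                            min_intensity=0.05, max_bounces=8, rng=None):
          """Trace one virtual light ray through the reconstructed scene and return
          the list of 3D points that define its polyline trajectory.

          scene.intersect(origin, direction) -> (hit_point, normal, material) or None,
          where material is "specular" or "diffuse" (a hypothetical interface)."""
          rng = rng or np.random.default_rng(0)
          points = [np.asarray(origin, float)]
          d = np.asarray(direction, float)
          d = d / np.linalg.norm(d)
          for _ in range(max_bounces):
              if intensity < min_intensity:
                  break
              # Offset slightly along d so the ray does not re-hit the same surface.
              hit = scene.intersect(points[-1] + 1e-4 * d, d)
              if hit is None:
                  break
              point, normal, material = hit
              points.append(np.asarray(point, float))
              if material == "specular":
                  d = reflect(d, normal)
                  intensity *= 0.9      # assumed small loss per specular bounce
              else:
                  # Diffuse bounce: continue along one random direction in the
                  # hemisphere around the normal; a fuller visualization would
                  # spawn several such rays to depict the scattering.
                  d = rng.normal(size=3)
                  if np.dot(d, normal) < 0:
                      d = -d
                  d = d / np.linalg.norm(d)
                  intensity *= 0.5      # assumed stronger loss for a diffuse bounce
          return points
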
  • the AR device 420 may use the pose data 424 to update the representation of the virtual light ray in response to a change in the pose (e.g., position and/or orientation) of the AR device. For instance, if the AR device moves, the AR device 420 and visualization rendering 432 updates the appearance of the virtual light rays as a result of the movement (e.g., the virtual light rays may appear larger if the AR device moves closer to the virtual light ray, the virtual light rays may appear smaller if the AR device moves further away, a portion of the virtual light rays that were previously out of view may become visible, etc. ) .
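  • Re-drawing a ray after the device moves amounts to re-projecting its 3D trajectory points into the new camera view with the updated pose. The pinhole projection sketch below assumes the pose is expressed as a world-to-camera rotation R and translation t and that the intrinsic matrix K is known; the example values are illustrative only.

      import numpy as np

      def project_polyline(points_world, R, t, K):
          """Project a 3D polyline (N x 3, world coordinates) into pixel coordinates
          for the current camera pose (world-to-camera rotation R, translation t)."""
          pts = np.asarray(points_world, float)
          cam = pts @ R.T + t                    # world -> camera coordinates
          cam = cam[cam[:, 2] > 1e-6]            # keep points in front of the camera
          pix = cam @ K.T                        # apply the intrinsics
          return pix[:, :2] / pix[:, 2:3]        # perspective divide -> (u, v)

      # Example: camera at the world origin looking down +Z, 1000 px focal length.
      K = np.array([[1000.0, 0.0, 960.0],
                    [0.0, 1000.0, 640.0],
                    [0.0, 0.0, 1.0]])
      ray_points = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 3.0]])
      print(project_polyline(ray_points, np.eye(3), np.zeros(3), K))
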
  • the AR device 420 may use the gesture input 428 to modify a virtual light ray in response to user (gesture) input.
  • the gesture input may modify an appearance of the virtual light rays (e.g., color, size, shape, etc. ) , modify a physical property of a surface (e.g., changing the geometric composition data, material composition, etc. ) , or modify the trajectory of a virtual light ray (e.g., moving virtual light ray) .
  • the gesture input 428 causes the visualization rendering 432 to generate an updated virtual light ray based on the gesture input.
  • Each of material analysis 416, geometric analysis 412, light source analysis 408, and visualization rendering 432 may include software instructions that execute on a processor to perform each respective function.
  • one or all of material analysis 416, geometric analysis 412, light source analysis 408, and visualization rendering 432 may be embodied on special purpose hardware (e.g., application-specific integrated circuit, field programmable gate array, or the like) configured to perform a corresponding function.
  • FIG. 5 is a simplified flowchart illustrating a method of visualizing light rays with an AR system according to an embodiment of the present invention.
  • a mobile device may receive an image of an environment from a camera of the mobile device.
  • the image may include one or more objects and a light source (e.g., either within view or outside the view of the image) .
  • the images may be received upon initialization of an augmented-reality application executing on the mobile device.
  • Initializing the augmented-reality application may include executing a SLAM process to define a coordinate system of the mobile device and to track the pose of the mobile device within the coordinate system.
  • the geometric composition of the environment can be determined.
  • the geometric composition includes data associated with the surfaces of the one or more objects such as geometric normal of each surface.
  • determining the geometric composition may include using depth data from the image to execute a 3D reconstruction process.
  • the process may identify, for each object, one or more surfaces that correspond to the object, and for each surface, one or more planar surfaces that make up the surface.
  • the geometric composition may identify the geometric normal of each surface (and/or planar surface) which can be used to determine reflection angles and subsequently the trajectory of a reflected light ray.
  • determining the geometric composition can include generating a wireframe representation of objects within the image (or the entire image) .
  • a material composition of the one or more objects can be determined.
  • the material composition may include a material of the object (e.g., wood, glass, ceramic, etc. ) or the optical properties of the object (e.g., specular, diffuse, or some combination of specular and diffuse) .
  • the material composition may be determined by performing image segmentation, which identifies the pixels of the image that correspond to each object. Alternatively (or additionally) , the segments may be determined by (or augmented by) the geometric composition (as described in block 508) . Each segment (e.g., that corresponds to an object) may then be classified using a classifier (e.g., such as a deep neural network, neural network, etc. ) .
  • a position of the light source in the coordinate system of the mobile device is determined using the image of the environment.
  • a category of the light source may also be determined. Examples of categories of light sources include point light, spot light, directional light, area light, and the like.
  • the position of the light source (and its category) can be determined using a trained machine-learning model (e.g., neural network, deep neural network, etc. ) .
  • the model may be trained using supervised or unsupervised learning.
  • the geometric composition, material composition, and the position of the light source can be used by the augmented-reality application to generate a virtual light ray that follows substantially a same trajectory as a light ray emitted from the light source.
  • the entire trajectory of the virtual light ray may be generated including any additional virtual light rays generated from a diffuse reflection. Any number of virtual light rays may be generated (each of which corresponding to a different light ray and following substantially a same trajectory as the corresponding light ray) .
  • the virtual light ray can be presented on a display of the mobile device.
  • the virtual light ray may be superimposed onto the image of the environment with the AR application.
  • a user may interact with the virtual light ray through user input that includes one or more physical interfaces (e.g., touchscreen, etc. ) or gestures captured by the camera of the mobile device. For instance, the user may use gestures to move a virtual light ray, modify a geometric composition or material composition of an object of the image, change a color of the virtual light ray, change a shape of the virtual light ray, change a size of the virtual light ray, and/or the like.
  • the AR application is configured to render the virtual light rays in real-time such that moving the mobile device can cause the AR application to automatically update the rendered virtual light ray.
  • the user may move the mobile device in the environment to trace the trajectory of the virtual light ray from the light source to a point in which the virtual light ray is no longer detectable or has passed out of view of the mobile device.
  • blocks 504-524 may be executed once or multiple times and in any particular order without departing from the spirit or the scope of this disclosure. Further, blocks 504-524 may execute once or continuously until such time as the AR application is terminated.
  • FIG. 5 provides a particular method of visualizing light rays with an AR system according to an embodiment of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 5 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
  • FIG. 6 illustrates examples of components of a computer system 600 according to certain embodiments.
  • the computer system 600 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 600, the computer system 600 can also be distributed.
  • the computer system 600 can be integrated into or implemented as a mobile device.
  • the computer system 600 includes at least a processor 602, a memory 604, a storage device 606, input/output peripherals (I/O) 608, communication peripherals 610, and an interface bus 612.
  • the interface bus 612 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 600.
  • the memory 604 and the storage device 606 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
  • the memory 604 and the storage device 606 also include computer readable signal media.
  • a computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
  • a computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 600.
  • the computer system 600 may further include a camera, a display, and a depth sensor.
  • the camera is used to capture an image of an environment, for example, the image of the environment referred to at block 504 of FIG. 5.
  • the display is for displaying content, such as presenting the virtual light ray at block 524 of FIG. 5.
  • the depth sensor is configured to detect depth data for each of the one or more objects within the image captured at block 504.
  • the memory 604 includes an operating system, programs, and applications.
  • the processor 602 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
  • the memory 604 and/or the processor 602 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
  • the I/O peripherals 608 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
  • the I/O peripherals 608 are connected to the processor 602 through any of the ports coupled to the interface bus 612.
  • the communication peripherals 610 are configured to facilitate communication between the computer system 600 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
  • a computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
  • Embodiments of the methods disclosed herein may be performed in the operation of such computing devices.
  • the order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
  • the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
  • the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Abstract

Methods and systems are disclosed for visualizing light rays within environments. An image of an environment that includes a light source is received by a mobile device. The mobile device determines a geometric composition of one or more objects within the image, a material composition of each of the one or more objects using an object classifier, and a position of the light source within the image of the environment. An augmented-reality application of the mobile device generates virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source. The virtual light rays are presented on a display of the mobile device superimposed on the image of the environment.

Description

SYSTEM AND METHOD FOR VISUALIZING LIGHT RAYS IN A SCENE
BACKGROUND OF THE INVENTION
Augmented Reality (AR) superimposes virtual content over a user’s view of the real world environment. With the development of AR software development kits (SDKs) , the mobile industry has brought smartphone AR to the mainstream. An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability. A user can scan the environment using a smartphone’s camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
The AR environment may account for natural lighting sources and virtual lighting sources in generating a presentation of the virtual object within the environment. Thus, it can be important in such systems to simulate visual light rays that can improve both an understanding of light rays within the real-world environment and the presentation of virtual content presented in the AR environment.
SUMMARY OF THE INVENTION
The present invention relates generally to augmented reality systems, and more specifically, and without limitation, to simulating visual light rays within an augmented reality scene.
Aspects of the present disclosure include methods for visualizing light rays within environments. The methods include receiving, by a camera of a mobile device, an image of an environment, the image being a frame of video captured by the camera, wherein the environment includes a light source; determining, by the mobile device, a geometric composition of one or more objects within the image; determining, by the mobile device, a material composition of each of the one or more objects using an object classifier; determining, by the mobile device and using the image of the environment, a position of the light source relative to the image of the environment; generating, by an augmented-reality application executing on the mobile device,  one or more virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source; and presenting, on a display of the mobile device, the one or more virtual light rays superimposed on the image of the environment.
Another aspect of the present invention includes a system comprising one or more processors and a non-transitory computer-readable media that includes instructions that when executed by the one or more processors, cause the one or more processors to perform methods described above.
Another aspect of the present invention includes a non-transitory computer-readable media that includes instructions that when executed by one or more processors, cause the one or more processors to perform the methods described above.
Numerous benefits are achieved by way of the present invention over conventional techniques. For example, embodiments of the present invention provide AR environments that can be modified to account for particular light source locations or types of light (point source, area source, etc. ) . Further benefits can be educational in nature, such as improving a user’s understanding of optical physics through visualization of light rays within the user’s environment. These and other embodiments of the invention along with many of its advantages and features are described in more detail in conjunction with the text below and attached figures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an example of a computer system that includes a depth sensor and a red, green, and blue (RGB) optical sensor for AR applications according to an embodiment of the present invention.
FIG. 2 illustrates an example of a computer system for visualizing light rays with an AR system according to an embodiment of the present invention.
FIG. 3 illustrates an example graphical depiction of the visualization of light rays with an AR system according to an embodiment of the present invention.
FIG. 4 illustrates an example block diagram of light ray visualization with an AR system according to an embodiment of the present invention.
FIG. 5 is a simplified flowchart illustrating a method of visualizing light rays with an AR system according to an embodiment of the present invention.
FIG. 6 illustrates an example computer system, according to embodiments of the present invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
Embodiments of the present invention are directed to, among other things, generating a visualization of light rays within an AR system. The AR system includes a mobile device and a camera that captures images of an environment. The environment includes at least one light source. The AR system analyzes the images to determine a geometric composition of the environment. A material analysis then assigns a material type to each surface identified by the geometric composition. A light source analysis identifies the position of the light source within the environment. The AR system generates a virtual light ray that is superimposed on at least one image. The virtual light ray may follow substantially the same trajectory as a light ray emitted by the light source. A user may interact with the virtual light rays by modifying properties of the environment (e.g., adding/removing light sources, changing material types for one or more surfaces, changing a color of the light source or virtual light ray, etc.) or by user input (e.g., a gesture, touch screen input, etc.).
As an example, the AR system may be operated within a room that includes a light source such as daylight passing through a window. The AR system captures images of the room and determines a geometric composition that identifies surfaces within the room (e.g., the floor, ceiling, walls, windows, door, objects within the room, etc.). A material analysis then assigns a material type to the surfaces. In some instances, the material composition can be a material type (e.g., wood, glass, ceramic, etc.). In other instances, the material composition may correspond to the optical properties of the material (e.g., a variable indicating a degree of reflection between diffuse reflection and specular reflection). The AR system can determine an approximate position of the light source (e.g., as being outside the window) and direction (e.g., as entering the room through the window).
The AR system then generates virtual light rays that are superimposed on the images to enable a user to visualize the trajectory of the light rays as they pass through the window from the light source and interact with the various material types of the surfaces of the room. For instance, a virtual light ray may travel through the window towards a cloth-covered sofa. The cloth causes a diffuse reflection in which the virtual light ray is depicted as scattering in many different directions. Another virtual light ray travels to a specular surface (e.g., the glass of a television, or the like) and is depicted as reflected in a particular direction. Any number of virtual light rays can be generated.
The user can interact with the virtual light rays by user selection via an interface of the AR system or by a gesture such as a hand gesture. Different gestures can correspond to different input. Examples of user input include input that alters the trajectory of the virtual light rays, alters the hue and/or intensity of the virtual light rays or the light source, adds or removes light sources, alters a material type of a surface (to determine a resulting effect on a virtual light ray) , displays angles of reflection, displays light ray distances, indicates changes in optical characteristics of the light rays as a result of an interaction with a material type (s) (e.g., such as changes in light intensity, hue, etc. ) , combinations thereof, or the like.
Numerous benefits can be achieved through the AR system described herein. The trajectory of light rays can be determined to isolate the effect that one or more light sources have on the appearance of an environment based on the material composition of the surfaces within the environment. As a result, environments can be modified to account for particular light source locations or types of light (point source, area source, etc.). For instance, an environment can be rearranged to prevent a particular type of light interference (e.g., glare, lens flare, etc.). Visualizing light rays may additionally improve the generation of virtual environments (e.g., such as virtual reality environments) such that the virtual environments appear like real-world environments. Further benefits can be educational in nature, such as improving a user’s understanding of optical physics through visualization of light rays within the user’s environment.
FIG. 1 illustrates an example of a computer system 110 that includes a depth sensor 112 and an RGB optical sensor 114 for AR applications, according to an embodiment of the present invention. The AR applications can be implemented by an AR module 116 of the computer system 110. Generally, the RGB optical sensor 114 generates an RGB image of a real-world environment that includes, for instance, real-world objects 130, 135. The depth sensor 112 generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth(s) of the real-world objects 130, 135 (e.g., distance(s) between the depth sensor 112 and the real-world objects 130, 135). Following an initialization of an AR session (where this initialization can include calibration and tracking), the AR module 116 renders an AR scene 120 of the real-world environment in the AR session, where this AR scene 120 can be presented at a graphical user interface (GUI) on a display of the computer system 110. The AR scene 120 depicts a real-world object representation 122 of the real-world objects 130, 135. In addition, the AR scene 120 depicts a virtual object 124 not present in the real-world environment.
The AR module 116 can generate a red, green, blue, and depth (RGBD) image from the RGB image and the depth map to detect an occlusion of the virtual object 124 by at least a portion of the real-world object representation 122 or vice versa. The AR module 116 can additionally or alternatively generate a 3D model of the real-world environment based on the depth map, where the 3D model includes multi-level voxels. Such voxels can be used, among other things, to detect collision between the virtual object 124 and at least a portion of the real-world object representation 122. In some instances, the 3D model can include a wireframe representation of the real-world environment captured by the RGB image. The wireframe representation can provide an indication of the geometric shape and orientation of surfaces captured by the RGB image. The AR scene 120 can be rendered using a 3D model of the real-world environment to properly show the occlusion and avoid rendering the collision.
In an example, the computer system 110 represents a suitable user device that includes, in addition to the depth sensor 112 and the RGB optical sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present invention. For  instance, the computer system 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
The depth sensor 112 has a known maximum depth range (e.g., a maximum working distance) and this maximum value may be stored locally and/or accessible to the AR module 116. The depth sensor 112 can be a ToF camera. In this case, the depth map generated by the depth sensor 112 includes a depth image. The RGB optical sensor 114 can be a color camera. The depth image and the RGB image can have different resolutions. Typically, the resolution of the depth image is smaller than that of the RGB image. For instance, the depth image has a 640x480 resolution, whereas the RGB image has a 1920x1280 resolution.
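By way of a non-limiting illustration of how such differing resolutions could be reconciled before fusing color and depth into an RGBD image, the following Python sketch upsamples a low-resolution depth image to the color resolution using nearest-neighbor sampling; the function name and the use of NumPy are assumptions made for this example rather than part of the disclosed system.

```python
import numpy as np

def align_depth_to_rgb(depth, rgb_shape):
    """Nearest-neighbor upsampling of a low-resolution depth image.

    depth: (h, w) array from the depth sensor, e.g. 480 x 640.
    rgb_shape: (H, W) of the color image, e.g. 1280 x 1920.
    Returns a depth map with one value per RGB pixel so that color and depth
    can be fused into a single RGBD image.
    """
    h, w = depth.shape
    H, W = rgb_shape
    row_idx = np.arange(H) * h // H   # source row for each destination row
    col_idx = np.arange(W) * w // W   # source column for each destination column
    return depth[row_idx[:, None], col_idx[None, :]]
```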
The AR module 116 may execute a visual-inertial odometry (VIO) process to track the pose (e.g., position and orientation) of the AR module 116. VIO uses image analysis and an inertial measurement unit (sensor data) to determine changes in the camera’s (e.g., the AR module’s 116) position and/or orientation. Visual odometry can use feature detection in images to identify and correlate features across successive images. The feature detection may be used to generate an optical flow field that estimates the motion of the camera relative to objects depicted in the successive images. The degree of motion between time intervals (e.g., the time interval between successive images) may be used to determine the distance and direction the camera has moved during the time interval. The distance and direction of the camera (and sensor data) may be used to track a position and an orientation of the camera at each time interval. Visual odometry may be augmented using an inertial measurement unit that captures directional force values.
For instance, the AR module 116 may execute an implementation of VIO called a simultaneous localization and mapping (SLAM) process. For instance, the SLAM process may initiate with a calibration step in which an empty map of the environment may be initialized with the device positioned at the origin of the coordinate system. The SLAM process receives input data such as, but not limited to, image data, control data c_t, sensor data s_t, and time interval t. The SLAM process then generates an output that may include an approximate location of the device x_t for a given time interval (relative to one or more approximate locations at one or more previous time intervals) and a map of the environment m_t. The output can be augmented (or verified) using feature detection on images captured at time t and time t+1 to identify and correlate features across the images. The changes between images can be used to verify the movement of the AR module 116, populate the environment m_t with objects detected in the images, etc.
As the device captures sensor data that indicates movement in a particular direction (and image data from the camera of the device), the SLAM process may update x_t and m_t. The SLAM process may be an iterative process that updates x_t and m_t in set time intervals or when new sensor data or image data is detected. For instance, if no sensor change occurs between time intervals t and t+1, then the SLAM process may delay updating the position and map to preserve processing resources. Upon detecting a change in sensor data indicating a high probability that the device has moved from its previous position x_t, the SLAM process may compute the new position of the device x_t and update the map m_t.
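A minimal sketch of such an iterative SLAM-style update is shown below, assuming simplified pose and map representations (dictionaries and a list of 3D landmarks) and an IMU-derived motion estimate; the function name, data structures, and motion threshold are hypothetical and the logic is illustrative only.

```python
import numpy as np

def slam_step(pose, world_map, imu_delta, image_features, motion_threshold=1e-3):
    """One iteration of a simplified SLAM-style update (illustrative only).

    pose: dict with 'position' (3,) and 'orientation' (3x3 rotation matrix).
    world_map: list of 3D landmark positions accumulated so far (the map m_t).
    imu_delta: dict with 'translation' (3,) and 'rotation' (3x3) measured over
        the last time interval by the inertial measurement unit.
    image_features: iterable of 3D points triangulated from the current frame,
        expressed in the camera frame.
    """
    # Skip the update when the sensors indicate essentially no movement,
    # preserving processing resources as described above.
    if np.linalg.norm(imu_delta["translation"]) < motion_threshold:
        return pose, world_map

    # Compose the sensed motion with the previous pose to obtain the new pose x_t.
    new_orientation = pose["orientation"] @ imu_delta["rotation"]
    new_position = pose["position"] + pose["orientation"] @ imu_delta["translation"]
    new_pose = {"position": new_position, "orientation": new_orientation}

    # Populate the map m_t with features detected in the current image,
    # transformed from the camera frame into the world coordinate system.
    for point_cam in image_features:
        world_map.append(new_orientation @ np.asarray(point_cam) + new_position)

    return new_pose, world_map
```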
The AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor) .
In an illustrative example of FIG. 1, a smartphone is used for an AR session that shows the real-world environment. In particular, the AR session includes rendering an AR scene that includes a representation of a real-world table on top of which a vase (or some other real-world object) is placed. A virtual ball (or some other virtual object) is to be shown in the AR scene. In particular, the virtual ball is to be shown on top of the table too. By tracking the occlusion between the virtual ball and a virtual vase (that represents the real-world vase), the virtual vase can occlude the virtual ball in parts of the AR scene when the virtual ball is behind the virtual vase relative to the pose of the smartphone. In other parts of the AR scene, the virtual ball can occlude the virtual vase when the virtual vase is behind the virtual ball relative to a change in the pose of the smartphone. And in remaining parts of the AR scene, no occlusion is present. In addition, a user of the smartphone can interact with the virtual ball to move the virtual ball on the top surface of the virtual table (that represents the real-world table). By tracking possible collision between the virtual ball and the virtual object, any interaction that would cause a collision would not be rendered. In other words, the collision tracking can be used to control where the virtual ball can be moved in the AR scene.
FIG. 2 illustrates an example of a computer system for visualizing light rays with an AR system, according to an embodiment of the present invention. In addition to rendering virtual objects within the real-world environment, the AR module 116 can generate virtual representations of characteristics of the real-world environment. For instance, the AR module 116 can generate virtual light rays superimposed onto an image captured by the computer system 110 that accurately represent the light rays emitted from the light source 235 within the real-world environment. The AR module 116 can generate any number of virtual light rays within the AR scene 120 to illustrate various interactions of the light rays with different surfaces within the AR scene 120, such as the real-world object representation 122 or the virtual object 124.
In an illustrative example of FIG. 2, a smartphone is used for an AR session that shows the real-world environment. The real-world environment includes real-world objects 130, 135, a light source 235, and light rays 226. While the light rays 226 illuminate the real-world environment, the light rays 226 themselves (and the trajectory of the light rays) cannot be viewed by the user. In particular, the AR session includes rendering an AR scene that includes a representation of a real-world table on top of which a vase (or some other real-world object) is placed. A virtual ball (or some other virtual object) can be shown in the AR scene. The AR scene 120 presents virtual light rays 228 that represent the light rays 226 emitted from the light source 235 within the real-world environment. The virtual light rays 228 have approximately a same or similar trajectory as the light rays 226. The AR scene 120 optionally includes a representation of the light source 235 (e.g., either as captured within an image or as a virtual representation).
The AR module can determine the entire (or a part of the) trajectory of the virtual light rays (that are within view of the sensors) , by analyzing the geometric composition of the AR scene 120, determining a material composition of surfaces within the AR scene 120, and determining the position and the type of light source. The AR module uses a three-dimensional reconstruction process using the depth map to reconstruct the surfaces of the real-world environment. The AR module may supplement the three-dimensional reconstruction process by adding depth information for any virtual objects within the AR scene. In some instances, the three-dimensional reconstruction process may output a wireframe representation of the AR scene. The wireframe representation may represent surfaces of the AR scene as lines and planes. In some instances, the AR module may use a neural network to determine the geometric  composition of the AR scene. The AR module uses the geometric composition to identify the surfaces within the AR scene to determine the angle of incidence when the virtual light ray interacts with a surface.
The AR module may classify the material composition of the identified surfaces within the AR scene. For instance, some materials reflect light rays better than other materials, which may scatter the light rays. The AR module classifies the surfaces of a captured image according to a material composition of the surface. Once classified, the AR module can accurately trace the trajectory of light rays and generate an accurate representation of the light rays via the virtual light rays. In some instances, the material classification can be performed by semantic segmentation of an image (e.g., classifying pixels of the image as corresponding to a particular object or surface), followed by classifying each segment. For instance, the segments can be classified by a trained neural network or the like. In other instances, the material composition can be specified via user input. The material composition may include a particular material (e.g., wood, glass, etc.) and/or the optical properties of the surface of the object, such as specular, diffuse, or somewhere between specular and diffuse.
The AR module determines an approximate position of the light source and a light source type (e.g., point source, area source, etc.). In some instances, image processing may be used to detect the light source (e.g., if the light source is depicted in the image). For instance, the pixels of the image can be analyzed for brightness relative to neighboring pixels (e.g., a region of high relative brightness indicating a light source). The number of pixels that represent the light source may be a further indication of the type of the light source, with point sources occupying fewer pixels than area light sources. In other instances, a trained neural network may be used to classify the position and type of light sources. For instance, the neural network may be trained using images of similar real-world environments with labeled light sources (e.g., supervised learning). Though described in a particular order, the geometric composition, material composition, and light source analysis can be performed in any particular order or in parallel.
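One possible brightness-based heuristic for the image-processing approach described above is sketched below in Python; the threshold values, function name, and returned fields are illustrative assumptions rather than a definitive implementation.

```python
import numpy as np

def estimate_light_source(gray_image, brightness_factor=2.5, area_fraction=0.05):
    """Rough light-source estimate from a grayscale frame (illustrative heuristic).

    Pixels much brighter than the frame average are treated as belonging to the
    light source; their centroid approximates its image position, and the size
    of the bright region suggests a point versus an area source.
    """
    mean_brightness = gray_image.mean()
    bright_mask = gray_image > brightness_factor * mean_brightness
    if not bright_mask.any():
        return None  # light source likely outside the camera's field of view

    rows, cols = np.nonzero(bright_mask)
    centroid = (rows.mean(), cols.mean())           # approximate image position
    fraction = bright_mask.sum() / gray_image.size  # portion of the frame that is bright
    light_type = "area" if fraction > area_fraction else "point"
    return {"pixel_position": centroid, "type": light_type}
```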
Returning to the example of FIG. 2, the AR scene 120 depicts a first virtual light ray 228 extending from the light source and interacting with a first real-world object 135, such as the vase. The AR module determines the geometric composition of the vase relative to the camera and the material composition (e.g., in this example being more specular than diffuse), and analyzes the light source, depicting the virtual light ray as extending from the light source and reflecting from the surface of the vase in a different direction. Another virtual light ray 230 is depicted interacting with a second real-world object 130 (e.g., a table) of a different material composition. The table, being of a diffuse material composition, does not reflect light rays in a single direction but instead reflects light at many different angles, as illustrated by diffusely reflected virtual light rays 232. The AR module is thus able to represent the complete trajectory of virtual light rays that interact with the table by representing the many different light rays (at varying angles) resulting from the diffuse reflection.
In some instances, the AR module may generate virtual light rays that interact with a virtual object such as the virtual object 124. The geometric and material composition of the virtual object may be selected by an AR application or by a user. In some instances, the geometric composition and material composition can be modified to simulate the effect on virtual light rays. The AR module may present a single virtual light ray 234 that can be dynamically updated upon selection of a new geometric composition or material composition (e.g., updating the interaction with the virtual object 124, including any specular or diffuse reflection, changes in light intensity or hue, etc.). In other instances, the AR module may present each virtual light ray superimposed on the same image with each virtual light ray marked so as to inform the user which virtual light ray is interacting with which geometric composition, material composition, and/or virtual object such as the virtual object 124.
FIG. 3 illustrates an example graphical depiction of the visualization of light rays with an AR system, according to an embodiment of the present invention. In this example, the image captures a real-world environment that includes the inside of a room with varying surfaces (of various geometries and material compositions). The room is illuminated by light entering through a window 304. The AR system generates multiple virtual light rays to represent the light rays interacting with different surfaces within the room. For instance, a first virtual light ray 308 reflects off a surface of the television 312, which may be specular due to the glossy surface of the television, towards the sofa 316. The first virtual light ray 308 may continue reflecting off the sofa towards the wall 320, and off the wall towards the floor 324, until the light becomes too dim to detect or the virtual light ray is no longer within view of the sensors of the computer system.
The AR system determines each reflection along the trajectory of the virtual light ray semi-independently from other reflections. For instance, the first virtual light ray 308 first reflects off the television 312. The television, having glossy surfaces, causes a specular reflection towards the sofa 316. The AR system can determine how the reflected virtual light ray (from the television) interacts with the sofa. For instance, the sofa may be covered in a diffuse material that causes a diffuse or somewhat diffuse reflection. The AR system may display multiple virtual light rays reflecting from the sofa to represent the diffuse reflection scattering the light (not shown). Alternatively, the AR system may identify the angle of reflection of the strongest (e.g., highest luminosity) diffusely reflected light ray. For instance, if the surface does not exhibit a Lambertian reflection (e.g., in which the luminosity of all diffusely reflected light rays is equal), there exists at least one light ray that has a higher luminosity than other light rays after the diffuse reflection. The trajectory of the virtual light ray 308 follows this highest luminosity light ray and may not represent the other lower luminosity light rays resulting from the diffuse reflection.
Other virtual light rays, such as virtual light ray 308, interact with different surfaces but similarly reflect off of multiple surfaces before becoming undetectable or passing out of the frame. The AR system continues to track the position of the light ray within the coordinate system of the AR system (e.g., via a SLAM process as described in connection with FIG. 1) such that if the AR system moves, capturing a new view of the environment, the AR system can continue to display the light rays in the new view. For instance, if a light ray exits the frame of view, the camera of the AR system can be moved to continue displaying the trajectory of the virtual light ray that was previously out of view.
By capturing and representing each reflection of the light rays entering the room through the window, the AR device enables the user to observe and interact with the optical properties of the surfaces of the room. For instance, the virtual light rays illustrate the trajectory of light rays into and out of the room. The virtual light rays enable the user to repaint the walls or rearrange the furniture (e.g., altering the geometric composition or material composition) to increase or decrease the illumination by the light source. As another instance, the multiple reflections illustrate the inter-reflection of light that indirectly illuminates surfaces that are not within a direct line of sight of the light source. Once identified, the user may alter the material composition or geometry to reduce or enhance the inter-reflection.
FIG. 4 illustrates an example block diagram of light ray visualization with an AR device according to an embodiment of the present invention. The AR device (e.g., AR device 420) can receive both user input and environmental input (e.g., from sensors of the AR device). The environmental input includes image data such as RGBD data 404 (e.g., images received from an RGB-D camera that captures red, green, blue color data and depth, or image data that is combined with depth sensor data). The depth data included in the RGBD data 404 can indicate the distance between the camera and surfaces within an image (e.g., the depth). For instance, each pixel of an image may include a color value (e.g., a combination of red, green, blue), an intensity value, and a depth value.
The RGBD data can include a single image, a sequence of images, or an image stream in which image (and depth) data is continuously received until the AR session is terminated (e.g., the AR application is terminated or otherwise shut down). The AR device may generate a depth map using the depth data of each pixel of an image. If depth data for more than one image is available (e.g., a sequence of images or an image stream is received), the depth map may be augmented using images (and depth data) of multiple views (e.g., different perspectives) of the environment. The resulting depth map may be more accurate than a depth map from a single image. In some instances, the AR device need not generate a depth map. Instead, the raw image data (with depth) may be passed to geometry analysis 412 and light source analysis 408.
Light source analysis 408 receives the image data from the RGBD data 404 and determines a location of the light source from the images of the image data. In some instances, the light source analysis 408 identifies a location of the light source within the environment from one or more images of the image data. In other instances, the light source analysis 408 identifies a location of the light source in each image individually. The light source may appear within an image or may be out of view. For example, the light source may be a lamp that can be seen within an image captured by the camera. In another example, a room can be illuminated by light (from an out-of-view light source) passing through a window. In another example, there may be two or more light sources, such as light sources within the environment (and within an image) and/or light sources that are not in view. The location of the light source indicates the direction of the light rays within the environment.
The light source analysis 408 may include a classification of the type of the light source. Different light sources emit light differently. For instance, a light source that is a point light type may be a light source at a discrete location that emits light in one or more directions. As an example, a light bulb may be a point light source. A light source that is an area light type may be a source comprised of a geometric shape in which light is emitted uniformly across the surface area of a surface of the geometric shape. As an example, consider a room illuminated by indirect daylight through a window. The window may be treated as an area light type light source, as light passes through the window approximately uniformly across the surface area of the window, enabling the window to illuminate the room. Other types of light sources (by way of example only) include spotlights (e.g., a light source that emits light in a particular direction), directional lights (e.g., a distant light source whose light is emitted in a particular direction, such as the sun), and the like.
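A simple data structure that could hold the position and type produced by the light source analysis 408 is sketched below; the class and field names are illustrative assumptions and not part of the disclosed system.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

class LightType(Enum):
    POINT = auto()        # discrete location emitting in one or more directions
    AREA = auto()         # approximately uniform emission across a surface (e.g., a window)
    SPOT = auto()         # emission in a particular direction
    DIRECTIONAL = auto()  # distant source with a single direction (e.g., the sun)

@dataclass
class LightSource:
    light_type: LightType
    position: Tuple[float, float, float]                     # in the AR device's coordinate system
    direction: Optional[Tuple[float, float, float]] = None   # primary emission direction, if any
    extent: Optional[Tuple[float, float]] = None             # width/height for area lights

# Example: a window treated as an area light source.
window_light = LightSource(LightType.AREA, (0.0, 1.2, 2.5),
                           direction=(0.0, 0.0, -1.0), extent=(1.0, 1.5))
```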
In some instances, the light source analysis 408 may receive positional and classification data from a user. For instance, a user may be presented with a map of the environment based on the coordinate system of the AR device. The user may be directed to select a set of coordinates of the map that correspond to the light source and select a particular light source type. Alternatively, the user may enter in a set of coordinates (e.g., via alphanumeric text) or a single coordinate with a size indicator.
In other instances, the light source position and classification may be determined using a trained deep neural network. The deep neural network may be trained using a set of images with each image including a label that indicates: the presence or absence of a light source, the position of the light source within the image, and the classification. Once trained, the deep neural network may take as input an image from the image data and output a position of the light source and the classification of the type of light source. The light source analysis 408 outputs the light source position and the light source type to visualization rendering 432, which uses the light source position and the light source type (in part) to generate a virtual light ray.
Geometric analysis 412 identifies the geometric properties of the surfaces represented in the images of the image data. The AR device, using geometric analysis 412, may determine how light rays interact with a particular surface. Geometric analysis 412 may execute a three-dimensional (3D) reconstruction process using the depth information from the RGBD data (e.g., from one or more images). The 3D reconstruction process may define a 3D representation of the surfaces in an image that identifies, for each surface, a set of planar surfaces. For instance, a table may include a single rectangular surface with one identified planar surface (e.g., the rectangular surface). A curved surface such as a vase may be represented as a set of planar surfaces, with each planar surface approximating a portion of the curved surface. The 3D reconstruction process can identify the geometric normal for each surface (or planar surface). In some instances, a wireframe representation can be generated in which each surface is represented by a set of lines and a set of planes.
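By way of illustration, the following sketch shows one way depth pixels could be back-projected into 3D points using pinhole camera intrinsics, with per-pixel geometric normals estimated from neighboring points; the intrinsic parameters (fx, fy, cx, cy), the finite-difference scheme, and the function names are assumptions made for the example.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Convert a depth map (in meters) into a grid of 3D points using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)           # shape (h, w, 3)

def estimate_normals(points):
    """Per-pixel geometric normals from the cross product of neighboring points."""
    dx = np.diff(points, axis=1)[:-1, :, :]           # horizontal neighbor differences
    dy = np.diff(points, axis=0)[:, :-1, :]           # vertical neighbor differences
    normals = np.cross(dx, dy)
    norms = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.clip(norms, 1e-9, None)       # unit normals, shape (h-1, w-1, 3)
```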
In some instances, deep learning may be used in addition or in place of the 3D reconstruction process to identify the geometric composition of the environment from the images of the image data. For instance, a trained deep neural network may use the images as input and output geometric composition data (e.g., a wireframe representation, data identifying planar surfaces and each corresponding geometric normal, or the like) .
The geometric composition data can be used to determine how a particular incident light ray will interact with a particular surface. For instance, the AR device can determine an angle of incidence (e.g., the angle formed between the incident light ray and the geometric normal). The angle of incidence being equal to the angle of reflection can indicate the trajectory of the reflected light ray. The incident angle can also indicate the trajectory if the light ray is entirely (or partially) refracted through the surface.
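The mirror-reflection relation described above can be written compactly as r = d - 2(d·n)n for an incident direction d and unit normal n; a minimal sketch follows, with the vector convention and example values being assumptions of the illustration.

```python
import numpy as np

def reflect(direction, normal):
    """Mirror-reflect an incident ray direction about a unit surface normal.

    Implements r = d - 2 (d . n) n, which encodes the equality of the angle of
    incidence and the angle of reflection about the geometric normal.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n

# Example: a ray traveling diagonally downward reflecting off a horizontal floor.
print(reflect([1.0, -1.0, 0.0], [0.0, 1.0, 0.0]))  # approximately [0.707, 0.707, 0.0]
```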
The geometric composition data from geometric analysis 412 is passed into a material analysis 416 and into the visualization rendering 432. The material analysis 416 uses the geometric composition data to determine how to classify the optical properties of the materials of each surface. For instance, a metallic surface will cause a specular reflection (e.g., where a single light ray is reflected into a single outgoing direction), while a non-metallic surface will cause diffuse reflection (e.g., where the reflected light ray is scattered at many different angles). Some materials may cause a combination of specular and diffuse reflections.
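One way to illustrate the difference between specular, diffuse, and mixed reflections is to sample outgoing directions as a blend controlled by a specularity value between 0 and 1, as in the sketch below; the blending scheme and sample count are illustrative assumptions, not the claimed material model.

```python
import numpy as np

def scatter_directions(incident, normal, specularity, num_samples=8, rng=None):
    """Sample outgoing ray directions for a surface with a given specularity.

    specularity in [0, 1]: 1.0 yields (nearly) a single mirror reflection, 0.0
    yields directions scattered across the hemisphere above the surface
    (diffuse), and intermediate values mix the two behaviors.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(incident, dtype=float)
    d = d / np.linalg.norm(d)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    mirror = d - 2.0 * np.dot(d, n) * n   # specular (mirror) component

    directions = []
    for _ in range(num_samples):
        # Random unit direction, flipped into the hemisphere above the surface.
        random_dir = rng.normal(size=3)
        random_dir /= np.linalg.norm(random_dir)
        if np.dot(random_dir, n) < 0.0:
            random_dir = -random_dir
        # Blend between the mirror direction and the random (diffuse) direction.
        blended = specularity * mirror + (1.0 - specularity) * random_dir
        directions.append(blended / np.linalg.norm(blended))
    return directions
```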
Material composition can be identified using image segmentation and then classifying each segment. Image segmentation may group portions of an image (e.g., a single pixel, or a set of pixels) as being related (e.g., part of the same object in the image). For instance, in a picture of a room, image segmentation identifies the pixels of the image that correspond to the walls, the pixels that correspond to the floor, the ceiling, the table, etc. Image segmentation methods may include clustering, edge detection, motion and interactive segmentation, or the like.
In some instances, the geometric composition data may be used to identify the segments (e.g., in addition to or in place of image segmentation). Since the geometric composition represents the surfaces as individual planar surfaces, the material analysis 416 may treat the planar surfaces that correspond to the same surface or object as being part of a single segment. Using the geometric composition data may approximate the result from the image segmentation (e.g., lower accuracy, with some pixels being included that should not be included and some pixels that should be included being excluded). Using the geometric composition may reduce the resource consumption (e.g., memory and processing cycles) since the data is already generated before the material analysis initiates. The AR device may determine to use image segmentation (for better accuracy) or the geometric composition data (for reduced resource consumption) at runtime.
Each segment identified by the image segmentation may then be classified according to its optical properties (and/or its material composition). The segments may be classified using a machine-learning model such as a neural network or deep neural network. For instance, the machine-learning model may be trained using a set of labeled images. The labels may correspond to a classification for each segment in the image. From the set of labeled images, the machine-learning model may learn to identify the particular material of each segment (e.g., wood, drywall, metal, glass, etc.) or the optical properties of the segment (e.g., specular, diffuse, or some combination of specular and diffuse). Once trained, the segmented images are input into the machine-learning model and the model outputs an identification of the optical properties (and/or material) of each segment. The output from the model is passed to visualization rendering 432.
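The sketch below illustrates how a segmented image could be routed through a trained classifier segment by segment; the material_model callable stands in for the trained machine-learning model and its interface, like the function name, is hypothetical.

```python
import numpy as np

def classify_segment_materials(image, segment_labels, material_model):
    """Assign an optical-property class to every segment of a segmented image.

    image: (H, W, 3) RGB array; segment_labels: (H, W) integer array in which
    pixels sharing a label belong to the same object or surface.
    material_model: assumed callable (e.g., a wrapper around a trained neural
    network) mapping a masked image crop to a label such as "specular" or
    "diffuse", or to a mixing coefficient.
    """
    results = {}
    for segment_id in np.unique(segment_labels):
        mask = segment_labels == segment_id
        rows, cols = np.nonzero(mask)
        # Crop a bounding box around the segment and zero out other pixels so
        # the classifier only sees the surface in question.
        crop = np.zeros_like(image)
        crop[rows, cols] = image[rows, cols]
        crop = crop[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
        results[int(segment_id)] = material_model(crop)
    return results
```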
The AR device 420 receives input from one or more sensors (e.g., such as cameras, depth sensors, and/or the like) and user input (e.g., through a physical interface such as a touchscreen, mouse, keyboard, and/or the like) . The AR device executes a SLAM process to track the position of the AR device within a coordinate system of the AR device. The SLAM process may execute once during an initialization process of the AR device (e.g., to define the coordinate system) . The SLAM process may execute periodically (or continuously) thereafter in set intervals or upon detection from sensors that indicate the AR device has moved. The SLAM process outputs pose data 424. Pose data 424 includes a pose at a particular instant of time. Each  pose includes a position (e.g., within the coordinate system) and an orientation (e.g., the direction AR device is facing) .
The user may interact with the AR device 420 using gesture input 428. The AR device 420 includes a display that provides a visual representation of the environment with one or more superimposed virtual objects (including virtual light rays). The user can provide gesture input by physically interacting with the coordinate system location of a virtual object. For instance, if a virtual object is on a table in front of a user, the user can interact with the virtual object by physically interacting with the area above the table. A user may interact with virtual objects using any number of gestures (e.g., grabbing, pushing, pulling, rotating, etc.). The AR device may detect the gesture and modify the trajectory of the virtual light ray based on the gesture.
The AR device may use the gesture input 428, pose data 424, material analysis 416, geometric analysis 412, and light source analysis 408 to generate a visualization rendering 432 of the virtual light rays. For instance, the AR device 420 uses the light source analysis to determine an origin location (e.g., the light source’s position) of a virtual light ray and an initial direction. The AR device uses the geometric analysis to determine the output virtual light ray (e.g., the reflected virtual light ray) that results from an interaction with a surface and the material analysis 416 to determine the type of output virtual light ray (e.g., a single ray resulting from a specular reflection or a plurality of rays resulting from a diffuse reflection) . The AR device may then determine the complete trajectory of the virtual light ray in the coordinate system of the AR device 420.
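A simplified sketch of how the complete trajectory of a virtual light ray could be traced from the light source position through successive surface interactions is shown below; the surface interface (intersect callables, normals, and specularity values) is a hypothetical stand-in for the outputs of the geometric analysis 412 and material analysis 416, and the attenuation constants are illustrative.

```python
import numpy as np

def trace_virtual_ray(origin, direction, surfaces, max_bounces=4, min_intensity=0.05):
    """Trace one virtual light ray through a reconstructed scene (illustrative).

    surfaces: list of dicts, each with an 'intersect' callable (returning the
    3D hit point for a ray origin/direction, or None), a unit 'normal' at the
    hit region, and a 'specularity' value from the material analysis.
    Returns the polyline of 3D points that a rendering stage could draw.
    """
    points = [np.asarray(origin, dtype=float)]
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    intensity = 1.0

    for _ in range(max_bounces):
        hits = [(s, s["intersect"](points[-1], d)) for s in surfaces]
        hits = [(s, p) for s, p in hits if p is not None]
        if not hits:
            break  # the ray leaves the reconstructed portion of the environment
        # Continue from the nearest intersection along the current direction.
        surface, hit_point = min(
            hits, key=lambda sp: np.linalg.norm(np.asarray(sp[1]) - points[-1]))
        points.append(np.asarray(hit_point, dtype=float))

        # Mirror-reflect about the surface normal; more diffuse surfaces are
        # assumed to attenuate the visualized ray more strongly.
        n = np.asarray(surface["normal"], dtype=float)
        d = d - 2.0 * np.dot(d, n) * n
        intensity *= 0.3 + 0.7 * surface["specularity"]
        if intensity < min_intensity:
            break  # the ray has become too dim to visualize

    return points
```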
The AR device 420 may use the pose data 424 to update the representation of the virtual light ray in response to a change in the pose (e.g., position and/or orientation) of the AR device. For instance, if the AR device moves, the AR device 420 and visualization rendering 432 updates the appearance of the virtual light rays as a result of the movement (e.g., the virtual light rays may appear larger if the AR device moves closer to the virtual light ray, the virtual light rays may appear smaller if the AR device moves further away, a portion of the virtual light rays that were previously out of view may become visible, etc. ) .
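As a sketch of how the pose data 424 could drive this on-screen update, the following example projects the world-space points of a virtual light ray into the current camera view using the tracked pose and assumed pinhole intrinsics; the pose representation and function name are hypothetical.

```python
import numpy as np

def project_ray_points(points_world, pose, fx, fy, cx, cy):
    """Project a virtual light ray's 3D points into the current camera frame.

    pose: dict with 'position' (3,) and 'orientation' (3x3 world-from-camera
    rotation) from the SLAM/VIO tracker. Points behind the camera are dropped,
    which is how a ray can move in and out of view as the device moves.
    """
    R = np.asarray(pose["orientation"], dtype=float)
    t = np.asarray(pose["position"], dtype=float)
    pixels = []
    for p in points_world:
        p_cam = R.T @ (np.asarray(p, dtype=float) - t)  # transform into the camera frame
        if p_cam[2] <= 0:
            continue                                    # behind the camera: not visible
        u = fx * p_cam[0] / p_cam[2] + cx               # pinhole projection
        v = fy * p_cam[1] / p_cam[2] + cy
        pixels.append((u, v))
    return pixels
```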
The AR device 420 may use the gesture input 428 to modify a virtual light ray in response to user (gesture) input. The gesture input may modify an appearance of the virtual light rays (e.g., color, size, shape, etc.), modify a physical property of a surface (e.g., changing the geometric composition data, material composition, etc.), or modify the trajectory of a virtual light ray (e.g., moving the virtual light ray). The gesture input 428 causes the visualization rendering 432 to generate an updated virtual light ray based on the gesture input.
Each of material analysis 416, geometric analysis 412, light source analysis 408, and visualization rendering 432 may include software instructions that execute on a processor to perform each respective function. Alternatively, one or all of material analysis 416, geometric analysis 412, light source analysis 408, and visualization rendering 432 may be embodied on special purpose hardware (e.g., application-specific integrated circuit, field programmable gate array, or the like) configured to perform a corresponding function.
FIG. 5 is a simplified flowchart illustrating a method of visualizing light rays with an AR system according to an embodiment of the present invention. At block 504, a mobile device may receive an image of an environment from a camera of a mobile device. The image may include one or more objects and a light source (e.g., either within view or outside the view of the image) . The images may be received upon initialization of an augmented-reality application executing on the mobile device. Initializing the augmented-reality application may include executing a SLAM process to define a coordinate system of the mobile device and to track the pose of the mobile device within the coordinate system.
At block 508, the geometric composition of the environment can be determined. The geometric composition includes data associated with the surfaces of the one or more objects, such as the geometric normal of each surface. In some instances, determining the geometric composition may use depth data from the image to execute a 3D reconstruction process. The process may identify, for each object, one or more surfaces that correspond to the object, and for each surface, one or more planar surfaces that make up the surface. The geometric composition may identify the geometric normal of each surface (and/or planar surface), which can be used to determine reflection angles and subsequently the trajectory of a reflected light ray. In some instances, determining the geometric composition can include generating a wireframe representation of objects within the image (or the entire image).
At block 512, a material composition of the one or more objects can be determined. The material composition may include a material of the object (e.g., wood, glass, ceramic, etc.) or the optical properties of the object (e.g., specular, diffuse, or some combination of specular and diffuse). The material composition may be determined by performing image segmentation, which identifies the pixels of the image that correspond to each object. Alternatively (or additionally), the segments may be determined by (or augmented by) the geometric composition (as described in block 508). Each segment (e.g., that corresponds to an object) may then be classified using a classifier (e.g., such as a deep neural network, neural network, etc.).
At block 516, a position of the light source in the coordinate system of the mobile device is determined using the image of the environment. A category of the light source may also be determined. Examples of categories of light sources include point light, spot light, directional light, area light, and the like. The position of the light source (and its category) can be determined using a trained machine-learning model (e.g., neural network, deep neural network, etc. ) . The model may be trained using supervised or unsupervised learning.
At block 520, the geometric composition, material composition, and the position of the light source can be used by the augmented-reality application to generate a virtual light ray that follows substantially the same trajectory as a light ray emitted from the light source. The entire trajectory of the virtual light ray may be generated, including any additional virtual light rays generated from a diffuse reflection. Any number of virtual light rays may be generated, each corresponding to a different light ray and following substantially the same trajectory as the corresponding light ray.
At block 524, the virtual light ray can be presented on a display of the mobile device. The virtual light ray may be superimposed onto the image of the environment with the AR application. A user may interact with the virtual light ray through user input that includes one or more physical interfaces (e.g., touchscreen, etc.) or gestures captured by the camera of the mobile device. For instance, the user may use gestures to move a virtual light ray, modify a geometric composition or material composition of an object of the image, change a color of the virtual light ray, change a shape of the virtual light ray, change a size of the virtual light ray, and/or the like. The AR application is configured to render the virtual light rays in real-time such that moving the mobile device can cause the AR application to automatically update the rendered virtual light ray. The user may move the mobile device in the environment to trace the trajectory of the virtual light ray from the light source to a point at which the virtual light ray is no longer detectable or has passed out of view of the mobile device.
The individual blocks 504-524 may be executed once or multiple times and in any particular order without departing from the spirit or the scope of this disclosure. Further, blocks 504-524 may execute once or continuously until such time as the AR application is terminated.
It should be appreciated that the specific steps illustrated in FIG. 5 provide a particular method of visualizing light rays with an AR system according to an embodiment of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 5 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 6 illustrates examples of components of a computer system 600 according to certain embodiments. The computer system 600 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 600, the computer system 600 can also be distributed. The computer system 600 can be integrated into or implemented as a mobile device.
The computer system 600 includes at least a processor 602, a memory 604, a storage device 606, input/output peripherals (I/O) 608, communication peripherals 610, and an interface bus 612. The interface bus 612 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 600. The memory 604 and the storage device 606 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example flash memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure. The memory 604 and the storage device 606 also include computer readable signal media. A computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 600. The computer system 600 may further include a camera, a display, and a depth sensor. The camera is used to capture an image of an environment, for example, the image of the environment referred to at block 504 of FIG. 5. The display is used for displaying content, such as presenting the virtual light ray at block 524 of FIG. 5. The depth sensor is configured to detect depth data for each of the one or more objects within the image captured at block 504.
Further, the memory 604 includes an operating system, programs, and applications. The processor 602 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 604 and/or the processor 602 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center. The I/O peripherals 608 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. The I/O peripherals 608 are connected to the processor 602 through any of the ports coupled to the interface bus 612. The communication peripherals 610 are configured to facilitate communication between the computer system 600 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present invention has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing  from the spirit of the present invention. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present invention.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing, ” “computing, ” “calculating, ” “determining, ” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
Conditional language used herein, such as, among others, “can, ” “could, ” “might, ” “may, ” “e.g., ” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.
The terms “including, ” “having, ” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present invention. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.
Further areas of applicability of the present invention will become apparent from the detailed description provided herein. It should be understood that the detailed description and specific examples, while indicating various embodiments, are intended for purposes of illustration only and are not intended to necessarily limit the scope of the invention.

Claims (21)

  1. A method for visualizing light rays in a scene, comprising:
    receiving, by a camera of a mobile device, an image of an environment, the image being a frame of video captured by the camera, wherein the environment includes a light source;
    determining, by the mobile device, a geometric composition of one or more objects within the image;
    determining, by the mobile device, a material composition of each of the one or more objects using an object classifier;
    determining, by the mobile device and using the image of the environment, a position of the light source relative to the image of the environment;
    generating, by an augmented-reality application executing on the mobile device, one or more virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source; and
    presenting, on a display of the mobile device, the one or more virtual light rays superimposed on the image of the environment.
  2. The method of claim 1, wherein the one or more virtual light rays have a same direction and a same orientation as corresponding light rays generated by the light source.
  3. The method of claim 1, wherein the virtual light rays approximate an interaction between the light rays generated by the light source and the one or more objects based on the material composition of the one or more objects.
  4. The method of any of claims 1 to 3, further comprising:
    detecting, by the mobile device, a user gesture; and
    modifying, by the mobile device, a direction or orientation of at least one virtual light ray of the one or more virtual light rays based on the user gesture.
  5. The method of any of claims 1 to 4, wherein determining the geometric composition of the one or more objects comprises:
    detecting, using a depth sensor, depth data for each of the one or more objects, wherein, for each object of the one or more objects the depth data includes a distance between the camera of the mobile device and a surface of the object; and
    generating, by the mobile device, a three-dimensional reconstruction of the one or more objects using the depth data, the three-dimensional reconstruction defining a wireframe representation of each object of the one or more objects.
  6. The method of any of claims 1 to 5, wherein determining the material composition of each of the one or more objects comprises:
    executing, by the mobile device using the object classifier, a semantic segmentation of the image, wherein the semantic segmentation defines one or more semantic segments within the image, each object of the one or more objects being encapsulated within a semantic segment; and
    classifying, using a neural network, each semantic segment of the one or more semantic segments as being specular, diffuse, or a combination thereof.
  7. The method of any of claims 1 to 6, wherein determining the position of the light source is performed using a neural network.
  8. The method of any of claims 1 to 7, further comprising:
    generating, by the mobile device, a particular virtual light ray of the one or more virtual light rays to trace a first light ray from the light source to a particular surface of a particular object of the one or more objects; and
    generating, by the mobile device, one or more additional virtual light rays as reflections of the first light ray off of the particular surface.
  9. The method of claim 8, wherein the one or more additional virtual light rays are represented using a different color or shape from the particular virtual light ray.
  10. A mobile device comprising:
    a camera;
    a display;
    one or more processors; and
    one or more memories storing computer-readable instructions that, upon execution by the one or more processors, configure the mobile device to:
    receive, by the camera, an image of an environment, the image being a frame of video captured by the camera, wherein the environment includes a light source;
    determine a geometric composition of one or more objects within the image;
    determine a material composition of each of the one or more objects using an object classifier;
    determine, using the image of the environment, a position of the light source relative to the image of the environment;
    generate, by an augmented-reality application executing on the mobile device, one or more virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source; and
    present, on the display, the one or more virtual light rays superimposed on the image of the environment.
  11. The mobile device of claim 10, wherein the one or more virtual light rays have a same direction and a same orientation as corresponding light rays generated by the light source.
  12. The mobile device of claim 10, wherein the virtual light rays approximate an interaction between the light rays generated by the light source and the one or more objects based on the material composition of the one or more objects.
  13. The mobile device of claim 10, further comprising a depth sensor, wherein, to determine the geometric composition of the one or more objects, the mobile device is configured to:
    detect, using the depth sensor, depth data for each of the one or more objects, wherein, for each object of the one or more objects, the depth data includes a distance between the camera of the mobile device and a surface of the object; and
    generate a three-dimensional reconstruction of the one or more objects using the depth data, the three-dimensional reconstruction defining a wireframe representation of each object of the one or more objects.
  14. The mobile device of claim 10, wherein, to determine the material composition of each of the one or more objects, the mobile device is configured to:
    execute, using the object classifier, a semantic segmentation of the image, wherein the semantic segmentation defines one or more semantic segments within the image, each object of the one or more objects being encapsulated within a semantic segment; and
    classify, using a neural network, each semantic segment of the one or more semantic segments as being specular, diffuse, or a combination thereof.
  15. The mobile device of claim 10, wherein determining the position of the light source is performed using a neural network.
  16. One or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations comprising:
    receiving, by a camera of a mobile device, an image of an environment, the image being a frame of video captured by the camera, wherein the environment includes a light source;
    determining, by the mobile device, a geometric composition of one or more objects within the image;
    determining, by the mobile device, a material composition of each of the one or more objects using an object classifier;
    determining, by the mobile device and using the image of the environment, a position of the light source relative to the image of the environment;
    generating, by an augmented-reality application executing on the mobile device, one or more virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source; and
    presenting, on a display of the mobile device, the one or more virtual light rays superimposed on the image of the environment.
  17. The one or more non-transitory computer-storage media of claim 16, wherein the one or more virtual light rays have a same direction and a same orientation as corresponding light rays generated by the light source.
  18. The one or more non-transitory computer-storage media of claim 16, wherein the virtual light rays approximate an interaction between the light rays generated by the light source and the one or more objects based on the material composition of the one or more objects.
  19. The one or more non-transitory computer-storage media of any of claims 16 to 18, wherein the operations further comprise:
    detecting, by the mobile device, a user gesture; and
    modifying, by the mobile device, a direction or orientation of at least one virtual light ray of the one or more virtual light rays based on the user gesture.
  20. The one or more non-transitory computer-storage media of any of claims 16 to 19, wherein determining the position of the light source is performed using a neural network.
  21. A computer system comprising:
    one or more processors; and
    one or more memories storing computer-readable instructions that, upon execution by the one or more processors, configure the one or more processors to:
    receive, by a camera of a mobile device, an image of an environment, the image being a frame of video captured by the camera, wherein the environment includes a light source;
    determine a geometric composition of one or more objects within the image;
    determine a material composition of each of the one or more objects using an object classifier;
    determine, using the image of the environment, a position of the light source relative to the image of the environment;
    generate, by an augmented-reality application executing on the mobile device, one or more virtual light rays using the geometric composition, the material composition of the one or more objects, and the position of the light source; and
    present, on a display of the mobile device, the one or more virtual light rays superimposed on the image of the environment.
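To make the per-frame flow of claims 1, 10, and 16 concrete, the following minimal Python sketch chains the five claimed steps for a single frame. Every function and value below (estimate_geometry, classify_materials, estimate_light_position, generate_virtual_rays, and the stand-in frame and depth arrays) is a hypothetical placeholder with dummy logic introduced purely for illustration; it is not the disclosed or claimed implementation.

# Illustrative sketch only: each helper is a dummy stand-in for one claim element.
import numpy as np

def estimate_geometry(depth):
    # "Geometric composition" placeholder: here, just the raw depth map.
    return depth

def classify_materials(frame):
    # "Material composition" placeholder: pretend every surface is diffuse.
    return "diffuse"

def estimate_light_position(frame):
    # Placeholder light-source position in camera coordinates (meters).
    return np.array([0.0, 2.0, -1.0])

def generate_virtual_rays(light_pos, geometry, material):
    # Placeholder: a single ray from the light toward the scene center.
    direction = np.array([0.0, 0.0, -2.0]) - light_pos
    return [(light_pos, direction / np.linalg.norm(direction))]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in camera frame
depth = np.ones((480, 640), dtype=np.float32)     # stand-in depth data (meters)
rays = generate_virtual_rays(estimate_light_position(frame),
                             estimate_geometry(depth),
                             classify_materials(frame))
print(rays)  # origin/direction pairs to be drawn over the camera image

In an actual application the returned origin/direction pairs would be projected into the camera view and drawn as overlay geometry, which corresponds to the final presenting step of the claim.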
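Claims 5 and 13 derive the geometric composition from per-object depth data. One common starting point for such a reconstruction, assuming a pinhole camera model with intrinsics fx, fy, cx, cy (the numeric values below are made up), is to back-project the depth map into camera-space 3-D points from which a mesh or wireframe can then be built; the sketch below shows only that back-projection step.

import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    # Back-project a depth map (meters, H x W) into camera-space 3-D points.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # H x W x 3 point map

# Example: a synthetic 4 x 4 depth map at 1 m with made-up intrinsics.
points = depth_to_points(np.ones((4, 4)), fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(points.shape)  # (4, 4, 3)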
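Claims 6 and 14 label each semantic segment as specular, diffuse, or a combination of both. As an illustration only, the toy PyTorch module below maps an RGB crop of one segment to one of those three labels; its architecture, the 64 x 64 crop size, the random input, and the label names are assumptions made here, and it stands in for, rather than reproduces, the claimed segmentation and classification networks.

import torch
import torch.nn as nn

MATERIAL_CLASSES = ["specular", "diffuse", "combined"]

class MaterialHead(nn.Module):
    # Toy per-segment material classifier (a stand-in, not the claimed network).
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

# Classify one 64 x 64 RGB crop of a segment (random data, untrained weights).
crop = torch.rand(1, 3, 64, 64)
logits = MaterialHead()(crop)
print(MATERIAL_CLASSES[int(logits.argmax(dim=1))])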
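Claims 8 and 9 trace a first ray from the light source to a surface and then add reflected rays. For a specular hit the reflected direction follows the mirror formula r = d - 2(d.n)n, where d is the incident direction and n the unit surface normal; the NumPy sketch below applies it to an assumed light position, hit point, and normal (all example values chosen here, not taken from the disclosure).

import numpy as np

def reflect(direction, normal):
    # Mirror-reflect an incident direction about a unit surface normal.
    d = direction / np.linalg.norm(direction)
    n = normal / np.linalg.norm(normal)
    return d - 2.0 * np.dot(d, n) * n

light_pos = np.array([0.0, 2.0, 0.0])   # assumed light-source position
hit_point = np.array([1.0, 0.0, 0.0])   # where the first ray meets a surface
incident = hit_point - light_pos        # direction of the first (primary) ray
normal = np.array([0.0, 1.0, 0.0])      # upward-facing surface normal
print(reflect(incident, normal))        # direction of the additional, reflected ray

Drawing the reflected ray in a different color or line style than the primary ray, as claim 9 recites, is then a presentation choice made when the overlay is rendered.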
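Claims 4 and 19 allow a user gesture to change the direction or orientation of a selected virtual ray. One simple way to sketch this, assuming a horizontal swipe is mapped to a yaw about the camera's up axis with a made-up sensitivity constant, is a Rodrigues rotation of the ray's direction vector:

import numpy as np

def rotate_about_axis(v, axis, angle_rad):
    # Rodrigues rotation of vector v about a unit axis by angle_rad.
    k = axis / np.linalg.norm(axis)
    return (v * np.cos(angle_rad)
            + np.cross(k, v) * np.sin(angle_rad)
            + k * np.dot(k, v) * (1.0 - np.cos(angle_rad)))

PIXELS_PER_DEGREE = 4.0                  # assumed gesture sensitivity
swipe_dx = 40.0                          # horizontal swipe length in pixels
yaw = np.radians(swipe_dx / PIXELS_PER_DEGREE)
ray_dir = np.array([0.0, 0.0, -1.0])     # current ray direction
print(rotate_about_axis(ray_dir, np.array([0.0, 1.0, 0.0]), yaw))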
PCT/CN2021/075355 2020-02-25 2021-02-04 System and method for visualizing light rays in a scene WO2021169766A1 (en)

Priority Applications (1)

Application Number: CN202180014474.XA (published as CN115088019A)
Priority Date: 2020-02-25
Filing Date: 2021-02-04
Title: System and method for visualizing rays in a scene

Applications Claiming Priority (2)

Application Number: US202062981353P
Priority Date: 2020-02-25
Filing Date: 2020-02-25

Application Number: US62/981,353
Priority Date: 2020-02-25

Publications (1)

Publication Number Publication Date
WO2021169766A1 true WO2021169766A1 (en) 2021-09-02

Family

ID=77490673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075355 WO2021169766A1 (en) 2020-02-25 2021-02-04 System and method for visualizing light rays in a scene

Country Status (2)

Country Link
CN (1) CN115088019A (en)
WO (1) WO2021169766A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party

US10109107B2 * (Apple Inc.; priority 2014-03-25; published 2018-10-23): Method and system for representing a virtual object in a view of a real environment
US9615009B1 * (Brian K. Buchheit; priority 2015-02-26; published 2017-04-04): Dynamically adjusting a light source within a real world scene via a light map visualization manipulation
JP2020005084A * (キヤノン株式会社; priority 2018-06-27; published 2020-01-09): Image processing apparatus, image processing method, and program

Also Published As

Publication Number: CN115088019A (en)
Publication Date: 2022-09-20

Similar Documents

Publication Title
US11928592B2 (en) Visual sign language translation training device and method
US11275480B2 (en) Dynamic interactive objects
US10803208B2 (en) Semantic understanding of 3D data
US9613454B2 (en) Automatic geometry and lighting inference for realistic image editing
US8994652B2 (en) Model-based multi-hypothesis target tracker
US10540812B1 (en) Handling real-world light sources in virtual, augmented, and mixed reality (xR) applications
US20200234033A1 (en) Processing uncertain content in a computer graphics system
CN101669146B (en) 3d object scanning using video camera and TV monitor
CN106558090B (en) 3D rendering and shadow information storage method and apparatus
EP2104905A2 (en) Manipulation of virtual objects using enhanced interactive system
US20140333585A1 (en) Electronic apparatus, information processing method, and storage medium
US20140292754A1 (en) Easy selection threshold
US11120629B2 (en) Method and device for providing augmented reality, and computer program
KR102174264B1 (en) Shadow rendering method and shadow rendering apparatus
WO2021169766A1 (en) System and method for visualizing light rays in a scene
WO2018202435A1 (en) Method and device for determining lighting information of a 3d scene
KR20160061066A (en) Image processing apparatus and method
US11836844B2 (en) Motion vector optimization for multiple refractive and reflective interfaces
EP3413270A1 (en) Device and method for editing a virtual reality scene represented in a curved shape form
KR102553340B1 (en) Light Source Recognition Algorithm in Image using Object Silhouette
JP6693069B2 (en) Image display device, method, and program
Mbatha Model-based 3D hand tracking using the lighting information as a visual cue.

Legal Events

Code 121 (EP): The EPO has been informed by WIPO that EP was designated in this application.
Ref document number: 21760932; Country of ref document: EP; Kind code of ref document: A1

Code NENP: Non-entry into the national phase.
Ref country code: DE

Code 122 (EP): PCT application non-entry in European phase.
Ref document number: 21760932; Country of ref document: EP; Kind code of ref document: A1