CN116325720A - Dynamic resolution of depth conflicts in telepresence - Google Patents

Dynamic resolution of depth conflicts in telepresence

Info

Publication number: CN116325720A
Application number: CN202080106226.3A
Authority: CN (China)
Language: Chinese (zh)
Prior art keywords: depth, image content, content, boundary, user
Legal status: Pending
Inventors: 埃里克·巴丘克, 丹尼尔·E·菲什
Assignee (original and current): Google LLC

Classifications

    • H04N13/122 Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
    • H04N13/128 Adjusting depth or disparity
    • H04N13/167 Synchronising or controlling image signals
    • H04N13/183 On-screen display [OSD] information, e.g. subtitles or menus
    • H04N13/305 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays using lenticular lenses, e.g. arrangements of cylindrical lenses
    • H04N13/366 Image reproducers using viewer tracking
    • H04N2013/0081 Depth or disparity estimation from stereoscopic image signals
    • H04N2013/0096 Synchronisation or controlling aspects
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G06T2200/24 Indexing scheme for image data processing or generation, in general, involving graphical user interfaces [GUIs]
    • G06T2207/10012 Stereo images
    • G06T2207/10016 Video; Image sequence
    • G06T2207/30196 Human being; Person

Abstract

Systems and methods are described for determining a capture volume associated with image content captured by at least one camera; determining a depth associated with the image content; defining a viewing range in which stereoscopic effects are depicted when viewing the image content; determining a depth conflict between the image content and a boundary associated with the viewing range, the determining including detecting that at least a portion of the image content extends beyond the boundary associated with the viewing range; resolving the depth conflict of the at least a portion using the viewing range and at least one user interface element; and generating modified image content for rendering using the resolved depth conflict.

Description

Dynamic resolution of depth conflicts in telepresence
Cross Reference to Related Applications
The present application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/198,473, entitled "DYNAMIC RESOLUTION OF DEPTH CONFLICTS IN TELEPRESENCE," filed on October 21, 2020, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present specification relates generally to methods, apparatus, and algorithms for resolving depth conflicts in three-dimensional (3D) telepresence systems.
Background
Stereoscopic display devices typically provide content to a viewer while conveying a perception of depth. Such displays may include a bezel around the display screen that can unnaturally cut off the view of a portion of the content rendered on the screen. Cutting off the view in this way can create conflicting visual cues for the viewer. Such conflicts can impair the 3D effect and lead to visual fatigue for the viewer.
Disclosure of Invention
A system having one or more computers may be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes the system to perform the actions. One or more computer programs may be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
In one general aspect, systems and methods are described that utilize at least one processing device to perform operations including: determining a capture volume associated with image content captured by at least one camera; determining a depth associated with the image content; defining, based on the depth, a viewing range within the capture volume in which stereoscopic effects are depicted when viewing the image content; determining a depth conflict between the image content and a boundary associated with the viewing range, the determining including detecting that at least a portion of the image content extends beyond the boundary associated with the viewing range; in response to determining the depth conflict, resolving the depth conflict of the at least a portion using the viewing range and at least one user interface element; and generating, with the resolved depth conflict, modified image content for rendering, the modified image content including a portion of the image content replaced by the at least one user interface element.
These and other aspects may include one or more of the following features, alone or in combination. For example, detecting the depth conflict between the at least a portion of the image content and the boundary associated with the viewing range may include generating, using at least some of the determined depths associated with the image content, a plurality of three-dimensional voxels representing locations in a plane of a display rendering the image content, and determining a distance from the at least a portion to the boundary, wherein the at least one user interface element is selected based on the distance.
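For illustration only, the following minimal sketch (not part of the claimed subject matter) shows how a distance-to-boundary test over a per-pixel depth map might drive the selection of a user interface element; the depth-map input, the edge-band width, and the element names are assumptions rather than details taken from this disclosure.

```python
import numpy as np

def select_ui_element(depth_map, viewing_range_z=(0.0, 1.0), band_px=32):
    """Hypothetical helper: find pixels whose depth violates the comfortable
    viewing range near a display edge and pick a UI treatment based on how
    close the conflicting region sits to that edge.

    depth_map: HxW array of per-pixel depths (meters) for the rendered frame.
    viewing_range_z: (near, far) depths inside which stereo viewing is comfortable.
    band_px: width of the band along each edge that is checked for conflicts.
    """
    near, far = viewing_range_z
    h, w = depth_map.shape
    conflict = (depth_map < near) | (depth_map > far)   # depth-rule violations
    edge_band = np.zeros_like(conflict)
    edge_band[:band_px, :] = True
    edge_band[-band_px:, :] = True
    edge_band[:, :band_px] = True
    edge_band[:, -band_px:] = True
    edge_conflict = conflict & edge_band                # violations touching an edge
    if not edge_conflict.any():
        return None                                     # no boundary depth conflict
    ys, xs = np.nonzero(edge_conflict)
    # Pixel distance from the conflicting region to the nearest display edge,
    # used as a stand-in for the "distance from the portion to the boundary".
    dist = int(np.minimum.reduce([ys, h - 1 - ys, xs, w - 1 - xs]).min())
    return "blur_overlay" if dist > band_px // 2 else "occluding_frame"
```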
In some implementations, the boundary is associated with at least one edge of a lenticular display device. In some implementations, the depth conflict is determined based on a tracked head position of a user viewing the image content at a remote lenticular display device, and resolving the depth conflict includes resizing the user interface element based on the tracked head position of the user.
In some implementations, resolving the depth conflict includes generating the at least one user interface element as a frame overlaid on at least some of the image content, the frame being adapted to accommodate movement depicted in the image content. In some implementations, a side of the frame corresponding to the at least a portion extending beyond the boundary is placed in a different plane, parallel to and in front of the remainder of the frame, to generate a visually perceived tilt of the frame from vertical to a non-zero angle relative to vertical.
In some implementations, the at least one user interface element depicts a user interface layer with thumbnail images of additional software programs executed in memory by the at least one processing device when accessing the image content. In some implementations, the user interface element includes a blur overlay that begins at the boundary and ends at a predefined location associated with a size of a display device depicting the image content, wherein a blur radius associated with the blur overlay is increased at a threshold distance from the boundary.
In some implementations, the blur overlay includes the user interface layer with thumbnail images of the additional software programs executed in memory by the at least one processing device when accessing the image content, and the blur overlay is elliptical. In some implementations, the blur overlay is a gradient blur that tapers from a left center portion of the overlay to a left edge of the image content and from a right center portion of the overlay to a right edge of the image content. In some implementations, the gradient blur is placed at a central location associated with the depth conflict and gradually blurs outward toward a first edge and a second edge associated with the depth conflict. In some implementations, resolving the depth conflict includes animating the at least one user interface element to hide the at least a portion of the image content with the modified image content.
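As a purely illustrative sketch of one such blur overlay (here a bottom-edge variant rather than the left/right taper described above), the rendered frame can be blended with a blurred copy using a weight that ramps up past a threshold distance from the boundary; the sigma, threshold, and function names are assumptions, not values from this disclosure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_overlay_bottom_edge(frame, boundary_row, max_sigma=6.0, threshold_px=20):
    """Hypothetical blur overlay for a bottom-edge depth conflict: below
    `boundary_row` the frame is blended with a blurred copy, and the blend
    weight (the effective blur strength) increases once the distance from
    the boundary exceeds `threshold_px`.

    frame: HxWx3 uint8 image; boundary_row: first row covered by the overlay.
    """
    img = frame.astype(np.float32)
    blurred = gaussian_filter(img, sigma=(max_sigma, max_sigma, 0))  # no cross-channel blur
    rows = np.arange(img.shape[0], dtype=np.float32)
    # Weight stays 0 for the first threshold_px rows below the boundary,
    # then ramps linearly to 1 over the next threshold_px rows.
    alpha = np.clip((rows - boundary_row - threshold_px) / threshold_px, 0.0, 1.0)
    alpha = alpha[:, None, None]                                     # broadcast over width/channels
    return (img * (1.0 - alpha) + blurred * alpha).clip(0, 255).astype(np.uint8)
```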
The systems described above may be configured to perform any combination of the aspects described above, and each aspect may be implemented with any suitable combination of the features listed above.
Implementations of the described technology may include hardware, methods, or processes on a computer-accessible medium, or computer software. The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example 3D content system for displaying image content on a display device according to an implementation described throughout this disclosure.
FIG. 2 is a block diagram of an example system for dynamically resolving depth conflicts for stereoscopic displays according to an implementation described throughout this disclosure.
FIG. 3 is an example display device illustrating depth conflicts for a user according to implementations described throughout this disclosure.
Fig. 4 is a block diagram illustrating an example of a local capture volume and a range of movement within the capture volume according to an implementation described throughout this disclosure.
Fig. 5 is a block diagram illustrating an example of a remote capture volume versus a local capture volume according to an implementation described throughout this disclosure.
Fig. 6 is a block diagram illustrating an example of display edge clipping of a capture volume according to an implementation described throughout this disclosure.
Fig. 7A to 7C are block diagrams illustrating examples of visually perceived tilting of a display device according to implementations described throughout this disclosure.
Fig. 8A-8B are block diagrams illustrating examples of resolving depth conflicts with composite image content according to implementations described throughout this disclosure.
Fig. 9A-9B are block diagrams illustrating an example of resolving depth conflicts by dynamically adjusting a display window of a display device according to implementations described throughout this disclosure.
Fig. 10A-10G are block diagrams illustrating an example of resolving depth conflicts by adjusting boundaries and/or edges according to implementations described throughout this disclosure.
Fig. 11 is a block diagram illustrating an example of resolving depth conflicts using segmented virtual content according to an implementation described throughout this disclosure.
Fig. 12 is a block diagram illustrating an example of application content placed on virtual content according to an implementation described throughout this disclosure.
Fig. 13 is a flowchart illustrating one example of a process of resolving depth conflicts in a 3D content system according to an implementation described throughout this disclosure.
FIG. 14 illustrates an example of a computer device and a mobile computer device that may be used with the techniques described here.
Like reference symbols in the various drawings indicate like elements.
Detailed Description
In general, this document describes examples related to detecting, analyzing, and correcting depth conflicts within three-dimensional (3D) video content. Depth conflicts may occur with respect to User Interface (UI) elements within the video content. For example, in 3D video content (e.g., stereoscopic video), the depth perceived by the user may vary across portions of the video. Such depth variations may result in depth conflicts between portions of the video, UI elements in the video, and/or at the edges associated with the content in the video and/or the display depicting the content.
Depth conflicts can cause discomfort for the user and can make it difficult to focus on or pan through portions of the image content and/or to focus within UI elements. For example, a depth conflict may occur if a portion of the user is cut off at the edge of a display screen depicting image content because the camera capturing that content has a maximum capture volume. The disappearance of a hand (or other object or portion of the user) beyond the edge of the display may be inconsistent with what the user's eyes (i.e., brain) expect to occur. In particular, movement beyond an edge (or other defined boundary) can create depth conflicts that include artifacts whose depth cues conflict with the expected result of the movement in the content (i.e., as perceived by the user's brain).
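For illustration, a minimal sketch (with hypothetical names and a made-up example) of how a system might detect that a tracked body part extends beyond the display boundary, using 2D bounding boxes in display pixel coordinates:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned 2D box in display pixel coordinates (illustrative only)."""
    x0: float
    y0: float
    x1: float
    y1: float

def fraction_beyond_boundary(part: Box, display: Box) -> float:
    """Hypothetical check: fraction of a tracked body part (e.g., a hand
    bounding box) that lies outside the display/capture boundary. A non-zero
    value indicates the cut-off-at-the-edge condition described above."""
    ix0, iy0 = max(part.x0, display.x0), max(part.y0, display.y0)
    ix1, iy1 = min(part.x1, display.x1), min(part.y1, display.y1)
    inside = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    total = (part.x1 - part.x0) * (part.y1 - part.y0)
    return 0.0 if total <= 0 else 1.0 - inside / total

# Example: a hand box hanging below the bottom edge of a 1920x1080 display.
display = Box(0, 0, 1920, 1080)
hand = Box(800, 1000, 1000, 1200)
print(fraction_beyond_boundary(hand, display))  # 0.6 -> most of the hand is cut off
```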
The systems and methods described herein are configured to maximize the comfort zone of a user by minimizing or eliminating depth conflicts. Maximizing the comfort zone may include evaluating the depth of the area in front of and behind the display screen in order to place UI elements within the 3D video (or other image content) such that the placed UI elements adhere to depth rules and/or minimize violations of such rules. In some implementations, the systems and methods described herein may reduce or eliminate depth conflicts by generating and rendering UI elements that are transparent, translucent, blurred, partially blurred, or blurred according to a gradient, and the like.
The systems and methods described herein may provide several advantages over conventional video rendering systems. For example, the systems and methods described herein may dynamically modify the depth of UI elements or objects depicted as image content and/or video content based on the depth of other objects within the 3D video. Unlike conventional systems that remove content from video when the content causes a depth conflict, the systems and methods described herein are used to improve the view of the content that may cause a depth conflict, as described throughout the examples of the present disclosure.
As used herein, depth may refer to a perceived distance from the location of content depicted on a display screen. As used herein, a depth cue may refer to an indication of distance, perceived through the eyes, that aids visual depth perception. Example depth cues may include any or all of vergence, monocular motion parallax, binocular parallax, linear perspective, texture gradients, blending, retinal image size, overlay, chromaticity, shading, and spatial perspective.
As used herein, a depth conflict may refer to user-perceived discomfort arising from conflicting depth cues. For example, a user may perceive depth based on any number of depth cues within the user's field of view. A depth conflict may occur when two or more such depth cues are inconsistent with each other. Example depth conflicts may include, but are not limited to, near conflicts, far conflicts, picture conflicts, and/or occlusion and stereoscopic conflicts.
Example corrections to a depth conflict may include, but are not limited to, eliminating the depth conflict, modifying pixels or voxels to change the depth conflict, reducing the depth conflict, and/or generating and/or moving content to reduce or eliminate the depth conflict. In some implementations, the systems and methods described herein select one or more depth conflict corrections from any number of depth conflict correction techniques. For example, the systems and methods described herein may combine two or more UI elements to correct a depth conflict.
In some implementations, for example, the techniques described herein may be used to synthesize depth-corrected images that appear accurate and realistic when displayed on the screen of a 2D or 3D display used in a multi-way video conference. The techniques described herein may be used to generate and display accurate and realistic views of users, objects, and UI content (e.g., image content, video content) and to correct for depth conflicts caused by user movement or by the 3D display.
Fig. 1 is a block diagram illustrating an example 3D content system 100 for displaying content on a stereoscopic display device according to implementations described throughout this disclosure. The 3D content system 100 may be used by one or more users for, e.g., video conferencing communication in 3D (e.g., a telepresence session) to view content on a single 3D display or other device. In general, the system of Fig. 1 may be used to capture video and/or images of users and/or objects during a video conference and to use the systems and techniques described herein to correct depth conflicts that may occur in the display of users, objects, and/or other additional UI content.
The system 100 may benefit from the techniques described herein because such techniques may generate, modify, update, and display depth-corrected views of the capture volume associated with a particular display screen device. In some implementations, the capture volume may be used to determine how to correct a depth conflict. For example, a depth-corrected view may be displayed to another user in 2D and/or 3D via the system 100.
As used herein, a capture volume may refer to a physical volume of space bounded by one or more boundaries imposed by one or more cameras capturing image/video content within those boundaries. In some implementations, the capture volume may refer to a viewing volume in which a user may be continuously tracked by multiple image sensors (e.g., cameras).
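A minimal sketch of one way such a capture volume could be modeled is shown below, assuming pinhole cameras with known 3x4 projection matrices and near/far limits; neither these parameters nor the frustum-intersection model are prescribed by this disclosure.

```python
import numpy as np

def in_capture_volume(point, cameras):
    """Hypothetical test of whether a 3D point lies inside the capture volume,
    modeled here as the intersection of the cameras' viewing frusta. Each
    camera is a dict with a 3x4 projection matrix 'P' (mapping world points to
    pixels), an image size '(w, h)', and 'near'/'far' depth limits in meters.
    """
    p = np.append(np.asarray(point, dtype=float), 1.0)   # homogeneous world point
    for cam in cameras:
        x = cam["P"] @ p                                  # project into the camera
        depth = x[2]
        if depth < cam["near"] or depth > cam["far"]:
            return False                                  # outside the depth limits
        u, v = x[0] / depth, x[1] / depth
        w, h = cam["size"]
        if not (0.0 <= u < w and 0.0 <= v < h):
            return False                                  # outside the image frame
    return True                                           # visible to every camera
```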
As shown in Fig. 1, the 3D content system 100 is being used by a first user 102 and a second user 104. For example, users 102 and 104 use the 3D content system 100 to participate in a 3D telepresence session. In such an example, the 3D content system 100 may allow each of users 102 and 104 to see a highly realistic and visually coordinated representation of the other user, thereby facilitating interaction in a manner similar to being physically present with each other. The system 100 may access a depth conflict resolver to improve, correct, reduce, or otherwise modify depth conflicts that may occur during a 3D telepresence session.
Each user 102, 104 may have a corresponding 3D system. Here, user 102 has 3D system 106 and user 104 has 3D system 108. The 3D systems 106, 108 may provide functionality related to 3D content including, but not limited to: capturing an image for 3D display, processing and presenting image information, and processing and presenting audio information. The 3D system 106 and/or the 3D system 108 may constitute: a collection of sensing devices integrated into one unit. The 3D system 106 and/or the 3D system 108 may include some or all of the components described with reference to fig. 2 and 14.
The 3D content system 100 may include one or more 2D or 3D displays. Here, 3D display 110 is provided for 3D system 106 and 3D display 112 is provided for 3D system 108. The 3D displays 110, 112 may use any of a variety of types of 3D display technology to provide an autostereoscopic view to a respective viewer (here, e.g., user 102 or user 104). In some implementations, the 3D displays 110, 112 may be stand-alone units (e.g., self-supporting or wall-mounted). In some implementations, the 3D displays 110, 112 may include or utilize wearable technology (e.g., controllers, head mounted displays, smart glasses, watches, etc.). In some implementations, the displays 110, 112 may be 2D displays.
In general, a display such as displays 110, 112 may provide an image that approximates the 3D optical characteristics of a physical object in the real world without the use of a Head Mounted Display (HMD) device. Generally, the displays described herein include flat panel displays, lenticular lenses (e.g., microlens arrays), and/or parallax barriers to redirect images to a plurality of different viewing regions associated with the display.
In some implementations, the displays 110, 112 may include high resolution and glasses-free lenticular 3D displays. For example, the displays 110, 112 may include a microlens array (not shown) that includes a plurality of lenses (e.g., microlenses), with glass spacers coupled (e.g., bonded) to the microlenses of the display. The microlenses may be designed such that, from a selected viewing position, a left eye of a user of the display can view a first set of pixels while a right eye of the user can view a second set of pixels (e.g., wherein the second set of pixels is mutually exclusive to the first set of pixels).
In some example displays, there may be a single location that provides a 3D view of the image content (e.g., user, object, content, etc.) provided by such displays. The user may sit in that single position to experience proper parallax, minimal distortion, and a realistic 3D image. If the user moves to a different physical location (or changes head position or eye gaze position), the image content (e.g., the user, objects worn by the user, and/or other objects) may begin to appear less realistic, 2D, and/or distorted. The systems and techniques described herein may reconfigure the image content projected from the display to ensure that the user can move around while still experiencing proper parallax, low distortion, minimal depth conflicts, and realistic 3D images in real time. Accordingly, the systems and techniques described herein provide the advantage of maintaining and providing 3D image content and objects for display to the user regardless of any user movement that occurs while the user is viewing the 3D display.
As shown in fig. 1, the 3D content system 100 may be connected to one or more networks. Here, network 114 is connected to 3D system 106 and to 3D system 108. The network 114 may be a publicly available network (e.g., the internet) or a private network, just to name a few examples. The network 114 may be wired or wireless, or a combination of both. The network 114 may include or utilize one or more other devices or systems, including but not limited to one or more servers (not shown).
The 3D systems 106, 108 may include a number of components related to the capture, processing, transmission, or reception of 3D information and/or to the rendering of 3D content. The 3D systems 106, 108 may include one or more cameras for capturing image content to be included in the 3D presentation. Here, 3D system 106 includes cameras 116 and 118. For example, the cameras 116 and/or 118 may be disposed substantially within the housing of the 3D system 106 such that the objective lens or lenses of the respective cameras 116 and/or 118 capture image content through one or more openings in the housing. In some implementations, the cameras 116 and/or 118 may be separate from the housing, such as in the form of stand-alone devices (e.g., with wired and/or wireless connections to the 3D system 106). Cameras 116 and 118 may be positioned and/or oriented to capture a sufficiently representative view of a user (e.g., user 102).
The placement of cameras 116 and 118 may generally be chosen arbitrarily, as long as cameras 116 and 118 do not obscure the user 102's view of the 3D display 110. For example, one of the cameras 116, 118 may be located somewhere above the face of the user 102, while the other camera may be located somewhere below the face. For example, one of the cameras 116, 118 may be located somewhere to the right of the face of the user 102, while the other camera may be located somewhere to the left of the face. The 3D system 108 may include cameras 120 and 122 in a similar manner. Additional cameras may be used. For example, a third camera may be placed near or behind the display 110.
In some implementations, the 3D systems 106, 108 may include one or more depth sensors to capture depth data to be used in the 3D presentation. Such a depth sensor may be considered part of a depth capture component in 3D content system 100 for characterizing a scene captured by 3D systems 106 and/or 108 in order to properly represent the scene on the 3D display. Further, the system may track the position and orientation of the viewer's head so that the 3D presentation may be rendered in an appearance corresponding to the current point of view of the viewer. Here, the 3D system 106 includes a depth sensor 124. In a similar manner, 3D system 108 may include depth sensor 126. Any of a variety of types of depth sensing or depth capturing may be used to generate and/or modify depth data.
In some implementations, assisted stereoscopic depth capture is performed. For example, the scene may be illuminated with points of light, and stereo matching may be performed between two respective cameras. The illumination may be performed using waves of a selected wavelength or range of wavelengths. For example, infrared (IR) light may be used. In some implementations, a depth sensor may not be utilized, for example, when generating views on a 2D device.
The depth data may include or be based on any information about the scene that reflects a distance between a depth sensor (e.g., depth sensor 124) and an object or UI element in the scene. For content in an image corresponding to an object in a scene, the depth data reflects the distance (or depth) to the object. For example, the spatial relationship between the camera and the depth sensor may be known and may be used to correlate the image from the camera with the signal from the depth sensor to generate depth data for the image.
As shown in fig. 1, the system 100 may include or have access to an image management system 140. Image management system 140 may obtain or otherwise access and/or store image content, video content, algorithms, and/or UI content for provision and rendering on a display screen. The image management system 140 includes a depth conflict resolver 142, a range detector 144, and a UI generator 146. Depth conflict resolver 142 may include any number of algorithms for generating and/or modifying UI elements, for example, to resolve depth conflicts for a user viewing content on 3D display 110 or 3D display 112. Depth conflict resolver 142 may use UI generator 146 to generate UI elements for mitigating, resolving, minimizing, or otherwise modifying perceived depth conflicts.
For example, the range detector 144 may determine a capture volume and a comfort range for a user associated with the local 3D system 106. Similarly, a remote range detector may determine a capture volume and comfort range for a user associated with the remote 3D system 108. For example, the range detector 144 is configured to determine a line of sight of a user viewing the content and to determine where particular content may be cropped by an edge of the display. This determination may be used to decide whether a depth conflict is likely to occur. A remote range detector (not shown) may perform similar functions for the remote device.
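For illustration, a line-of-sight test of the kind such a range detector might perform could look like the following sketch, which assumes a screen-centered coordinate frame (meters, z toward the viewer) and is not taken from this disclosure.

```python
import numpy as np

def sight_ray_hits_screen(head_pos, content_pos, screen_w_m, screen_h_m):
    """Hypothetical check: does the ray from the viewer's tracked head to a
    rendered content point pass through the display rectangle (centered at the
    origin in the z = 0 plane)? If not, the content is cropped by a display
    edge and is a candidate for a depth conflict."""
    head = np.asarray(head_pos, dtype=float)        # viewer side of the screen: z > 0
    content = np.asarray(content_pos, dtype=float)  # rendered remote content: z <= 0
    t = head[2] / (head[2] - content[2])            # parameter where the ray meets z = 0
    hit = head + t * (content - head)               # intersection with the display plane
    return abs(hit[0]) <= screen_w_m / 2 and abs(hit[1]) <= screen_h_m / 2

# A hand rendered 0.3 m behind the display and 0.6 m below its center is cut off
# for a viewer whose head is 1.25 m in front of a 1.2 m x 0.8 m screen.
print(sight_ray_hits_screen((0.0, 0.0, 1.25), (0.0, -0.6, -0.3), 1.2, 0.8))  # False
```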
The images captured by the 3D content system 100 may be processed and then displayed as a 3D presentation. As depicted in the example of fig. 1, the 3D image 104' is presented on the 3D display 110. In this way, the user 102 may perceive the 3D image 104' as a 3D presentation of the user 104, and the user 104 may be remote from the user 102. The 3D image 102' is presented on the 3D display 112. In this way, the user 104 may perceive the 3D image 102' as a 3D presentation of the user 102.
The 3D content system 100 may allow participants (e.g., users 102, 104) to communicate with each other and/or with others using audio. In some implementations, the 3D system 106 includes a speaker and a microphone (not shown). The 3D system 108 may similarly include a speaker and a microphone. In this way, the 3D content system 100 may allow users 102 and 104 to conduct 3D telepresence sessions with each other and/or with other people. In general, the systems and techniques described herein may work with system 100 to generate image content and/or video content for display between users of system 100.
FIG. 2 is a block diagram of an example system for dynamically resolving depth conflicts for stereoscopic displays according to an implementation described throughout this disclosure. The system 200 may be used as or included in one or more implementations described herein, and/or may be used to perform operations for one or more examples of composition, processing, modification, or presentation of image content described herein. The overall system 200 and/or one or more of its various components may be implemented according to one or more examples described herein.
The system 200 may include one or more 3D systems 202. In the depicted example, 3D systems 202A, 202B, through 202N are shown, where index N indicates any number. The 3D system 202 may provide for the capture of visual and audio information for 2D or 3D presentations and forward the 2D or 3D information for processing. Such information may include images (e.g., images and/or video) of the scene, depth data about the scene, and audio from the scene. For example, the 2D/3D system 202 may be used as or included within the system 106 and the 2D/3D display 110 (FIG. 1).
The system 200 may include a plurality of cameras, as indicated by camera 204. Any type of light sensing technology may be used to capture images, such as the type of image sensor used in conventional digital cameras. The cameras 204 may be of the same type or of different types. For example, the camera location may be placed anywhere on a 3D system such as system 106. In some implementations, the camera 204 may include a plurality of stereoscopic cameras.
The system 202A includes a depth sensor 206. In some implementations, the depth sensor 206 operates by propagating IR signals onto the scene and detecting the response signals. For example, depth sensor 206 may generate and/or detect light beams 128A-B and/or 130A-B. In some implementations, the depth sensor 206 is an optional component, for example, in 2D video conferencing applications that do not utilize depth sensing. In some implementations, the depth sensor 206 of any of the systems 202 can send depth data (e.g., depths 232) to the server 214 (e.g., executing the image management system 140). The system 202A also includes at least one microphone 208 and a speaker 210. In some implementations, the microphone 208 and speaker 210 may be part of the system 106.
The system 202 additionally includes a 3D display 212 that can present 3D images. In some implementations, the 3D display 212 may be a stand-alone display. In some implementations, the 3D display may be a lenticular display. In some implementations, the 3D display 212 operates using parallax barrier technology. For example, the parallax barrier may comprise parallel vertical strips of substantially non-transparent material (e.g. opaque film) placed between the screen and the viewer. Different portions of the screen (e.g., different pixels) are viewed by the respective left and right eyes due to parallax between the respective eyes of the viewer. In some implementations, the 3D display 212 operates using lenticular lenses. For example, alternating rows of lenses may be placed in front of the screen, which align the light from the screen toward the left and right eyes of the viewer, respectively.
The system 200 includes: the server 214 may perform particular tasks for data processing, data modeling, data coordination, and/or data transmission. Server 214 and/or its components may include some or all of the components described with reference to fig. 14. In general, the server 214 may receive information from the tracking module 216, which tracking module 216 may include a head/eye tracker 218, a hand tracker 220, and/or a movement detector 222, any of which may be received from any of the 2D/3D systems 202. Server 214 may receive such tracking information in order to correct, eliminate, reduce, or otherwise modify particular detected depth conflicts within image content captured by system 202.
As shown in fig. 2, server 214 includes image management system 140. The image management system 140 may generate 2D and/or 3D information in the form of image content, video content, and/or other UI content. This may include receiving such content (e.g., from 3D system 202A), processing the content, and/or forwarding the (processed and depth corrected) content to another participant (e.g., to another one of 3D systems 202). In some implementations, the image management system 140 may enable delivery of image and/or video content to a user via a display device of a computing device.
Image management system 140 includes a depth conflict resolver (e.g., such as depth conflict resolver 142), a UI generator (e.g., such as UI generator 146), and UI element data 226. Depth conflict resolver 142 may analyze the capture volume size using capture volume detector 228. For example, depth conflict resolver 142 may also analyze the range between UI elements, and/or the range between a user viewing the content and the depicted content using range detector 144.
Depth conflict resolver 142 may generate and modify particular image content 234 received from any of 2D/3D systems 202 to ensure that image content 234 is rendered for system 202 with appropriate depth perception. For example, system 202A may send image content (e.g., video of a user) during a telepresence session with a user of system 202B. The system 202B may evaluate (e.g., track) the position of the head or eyes of the user of system 202A and/or 202B to generate UI content 236, virtual content 238, and/or visual effects 240. Generating such content 236, 238, and/or 240 may include accessing image content 234 provided by one or more of the systems 202 and modifying the accessed image content 234 with content 236 to 240 to ensure a comfortable viewing environment for a user of any of the systems 202. Modifications to image content 234 with any of elements 236 to 240 (generated by UI generator 146) may take into account particular UI element ranges 242, UI element depths 244, and voxels 246, which are described in detail below.
In some implementations, depth conflict resolver 142 may analyze image content (e.g., stereoscopic video content) to determine depths 232 associated with image content 234. In some implementations, determining such depths 232 may include estimating the correspondence between the left-eye view and the right-eye view of each image frame of the image content 234 using an optical flow technique. Depths 232 may be determined relative to pixels associated with a particular image frame (and/or generated voxels 246). For example, the depth sensor 206 and/or camera 204 may detect a particular distance between the boundary 230 and an object, a portion of an object, a UI element, or other captured content in an image and/or video. Depths 232 may be estimated, calculated, or otherwise determined in real time or near real time.
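As an illustrative sketch only, horizontal correspondences between the left-eye and right-eye views (however they are estimated) can be converted to per-pixel depths with the standard stereo relation; the focal length and baseline below are placeholders, not values from this disclosure.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px=1400.0, baseline_m=0.065):
    """Hypothetical conversion of horizontal disparities between the left-eye
    and right-eye views (e.g., estimated with an optical-flow correspondence
    method) into per-pixel depths using depth = focal * baseline / disparity."""
    d = np.asarray(disparity_px, dtype=np.float32)
    depth = np.full_like(d, np.inf)          # zero disparity -> point at infinity
    valid = d > 1e-3
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# With these illustrative parameters, 30 px of disparity is roughly 3 m away.
print(disparity_to_depth(np.array([30.0, 90.0])))  # ~[3.03, 1.01]
```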
In some implementations, the system 200 may define a viewing range within a particular capture volume based on the determined depth associated with the captured image content. The viewing range defines the volume that depicts the stereoscopic effect when viewing the captured image content. In some implementations, the viewing range can refer to a portion of the display screen. In some implementations, if such a screen provides a stereoscopic effect throughout the screen, the viewing range may refer to the entire display screen.
In some implementations, the viewing range and depths may be used to generate voxels that define or model a particular environment associated with the displayed image content. For example, each voxel may represent a cube inside the 3D model containing a location inside a 3D mesh and a single color value. Each point in the environment may be represented as a voxel comprising volumetric signed distance data. Surfaces and boundaries associated with the environment and objects (e.g., UI elements, virtual content, etc.) may be rendered by extracting isosurfaces from the volumetric signed distance data. When image content (e.g., UI elements, objects, and/or surfaces) changes location within the environment, the content may be re-rendered in order to update the 3D model of the environment. For example, the image management system 140 may iteratively generate a surface mesh representing the volumetric signed distance data, and the surface mesh may be updated as that data is updated. Similar updates may trigger 3D model updates when new or updated depth information about the environment becomes available.
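For illustration, a voxel grid of signed-distance values can be updated with new depth observations and re-meshed by isosurface extraction; the sketch below assumes scikit-image's marching cubes, a sphere placeholder, and a simple averaging rule, none of which come from this disclosure.

```python
import numpy as np
from skimage.measure import marching_cubes

# Illustrative signed-distance volume: negative inside, positive outside a
# sphere that stands in for a tracked object. The 64^3 grid and the sphere
# are placeholders only.
coords = np.indices((64, 64, 64)).astype(np.float32)
sdf = np.linalg.norm(coords - 32.0, axis=0) - 20.0

def update_and_remesh(sdf, new_slab, slab_index):
    """Hypothetical update step: fold a new depth observation into one slab of
    the signed-distance volume, then re-extract the zero isosurface so the
    rendered 3D model stays in sync with the latest depth data."""
    sdf[slab_index] = 0.5 * sdf[slab_index] + 0.5 * new_slab  # simple running average
    verts, faces, _normals, _values = marching_cubes(sdf, level=0.0)
    return verts, faces

verts, faces = update_and_remesh(sdf, sdf[10] + 1.0, 10)
print(len(verts), len(faces))  # updated mesh size after the new observation
```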
At some point during playback of the image content (e.g., streaming, a video session, etc.), the image management system 140 may detect depth conflicts occurring in the image content. For example, depth conflict resolver 142 may detect a depth conflict between the image content and a boundary associated with the viewing range. Such detection may include determining that at least a portion of the image content (e.g., a user's hand) extends beyond a boundary associated with the viewing range (e.g., a bottom edge of the display screen). In some implementations, the boundary may refer to an edge of the capture volume.
In response to determining the depth conflict, the depth conflict for the at least a portion (e.g., the hand) may be resolved using the viewing range and at least one user interface element. For example, detecting a depth conflict between the at least a portion of the image content (e.g., the hand) and the boundary (the bottom edge of the display screen) may include using at least some of the determined depths associated with the image content to generate a plurality of three-dimensional voxels representing respective locations in a plane of the display depicting the at least a portion. The determined depths may relate to the portion of the hand that is cut off when the hand extends beyond the boundary edge of the display. The UI element may be a box element sized based on the determined distance from the hand to the boundary. Such a distance may be selected to ensure that the hand is hidden, thereby removing the depth conflict from the view of the user viewing the image content. The box may represent UI content 236, which may be generated via software and displayed within the depicted image content. UI content 236 (i.e., UI elements) may include virtual content 238 that overlays, underlies, or otherwise merges with other objects or UI content within the image content. In some implementations, the UI elements (e.g., UI content 236) may include virtual content, blur content, inserted content, and/or other user interface effects for mitigating, resolving, and/or reducing depth conflicts. Visual effects 240 may be applied to further reduce, correct, or eliminate depth conflicts.
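As a purely illustrative sketch, an occluding box/bar could be sized from how far the clipped portion extends past the bottom boundary; all sizes and the 0.4-meter render plane below are assumptions rather than values prescribed here.

```python
from dataclasses import dataclass

@dataclass
class OccludingBar:
    """Dark bar drawn along the bottom display edge (purely illustrative)."""
    height_px: int   # on-screen height of the bar
    depth_m: float   # plane at which the bar is rendered in front of the content

def size_occluding_bar(overshoot_px: float, padding_px: int = 24,
                       min_height_px: int = 48, bar_depth_m: float = 0.4) -> OccludingBar:
    """Hypothetical sizing rule: make the bar tall enough to cover the portion
    of content that extends past the bottom boundary (`overshoot_px`), plus
    padding so small movements do not immediately re-expose the conflict."""
    height = max(min_height_px, int(round(overshoot_px)) + padding_px)
    return OccludingBar(height_px=height, depth_m=bar_depth_m)

print(size_occluding_bar(overshoot_px=120))  # OccludingBar(height_px=144, depth_m=0.4)
```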
The image management system 140 may use the resolved depth conflict to generate modified image content. The modified image content may include a portion of the image content replaced by at least one user interface element. In the above example, the UI element may include a box. Other examples are of course possible, some of which are described in detail throughout this disclosure. The modified image content may be rendered and presented to the user as rendered content 250.
The above exemplary components are described herein as being implemented in server 214, which server 214 may communicate with one or more of 3D systems 202 over network 260 (network 260 may be similar or identical to network 114 in fig. 1). In some implementations, the image management system 140 and/or components thereof may alternatively or additionally be implemented in some or all of the 3D system 202. For example, the above-described depth conflict correction and/or related processing may be performed by the system creating the 3D information before forwarding the 3D information to one or more receiving systems. As another example, the authoring system may forward the image, modeling data, pixels, voxels, depth data, and/or corresponding information to one or more receiving systems that may perform the above-described processing. A combination of these methods may be used.
System 200 is an example of a system that includes a camera (e.g., camera 204), a depth sensor (e.g., depth sensor 206), and a 3D content generator (e.g., image management system 140) having a processor that executes instructions stored in memory. Such instructions may cause the processor to identify image content in an image of a scene included in the 3D information using depth data included in the 3D information (e.g., by a depth processing component). The processor may generate modified 3D information by detecting depth conflicts and correcting (e.g., resolving) or minimizing such depth conflicts. For example, the modified 3D information may be generated from the UI elements 226 and the image content 234 and may be provided to the UI generator 146 to appropriately generate the rendered content 250. Rendered content 250 may be provided to one or more systems 202.
The rendered content 250 represents a 3D stereoscopic image (or video portion) of a particular object (e.g., an image of a user) based at least in part on modifications generated by the image management system 140, with appropriate parallax, corrected or eliminated depth conflicts, and a viewing configuration matched to both eyes of a user accessing the display device, as described herein.
In some implementations, the server 214 and the processors (not shown) of the systems 202 may include (or be in communication with) a Graphics Processing Unit (GPU). In operation, the processors may include (or have access to) memory, storage, and other processors (e.g., CPUs). To facilitate graphics and image generation, the processors may communicate with the GPU to display images on a display device (e.g., display 212). The CPU and GPU may be connected via a high-speed bus such as PCI, AGP, or PCI-Express. The GPU may be connected to the display through another high-speed interface such as HDMI, DVI, or DisplayPort. In general, the GPU may render image content as pixels. The display 212 may receive image content from the GPU and may display the image content on a display screen.
FIG. 3 is an example display device illustrating depth conflicts for a user according to implementations described throughout this disclosure. Here, lenticular display 300 depicts user 302 during a 3D video conference session. The display 300 is associated with a particular capture volume within which a remote user can best view the 3D representation of user 302. In the 3D representation of the videoconference session, the view of user 302 may extend beyond the bezel of the display at edge 304, which may result in depth conflicts for users viewing the content shown in display 300. For example, a depth conflict may occur at edge 304 because the user's hand may appear to be cut off at edge 304 in a 2D representation, while the digital voxel representation of the hand may appear to extend beyond edge 304 in a 3D representation. Because the hand extends beyond the bounds of the lenticular display from the viewing user's current point of view, the system may not be able to generate and display the hand. Thus, the arm of user 302 appears to have no hand in 3D, as shown by depth conflicts 306, 308, 310, and 312. Such a view may interrupt or conflict with what the viewing user expects to see. Similar depth conflicts may occur at any edge of the display 300, for example, if the user 302 moves, stands, etc. during a conversation. Such movement may cause one or more portions of the user to extend beyond the bezel of the display 300. The system 200 may correct, minimize, or remove such depth conflicts.
Fig. 4 is a block diagram illustrating an example of a local capture volume and a range of movement within the capture volume according to implementations described throughout this disclosure. The depicted view shows a representative display screen volume 402 with a particular capture volume 404 in the y-z plane. The capture volume 404 represents a local capture volume generated based on triangulated views of captured data from at least three stereoscopic camera pods (e.g., cameras 204). The capture volume 404 includes an optimal viewing range 406. Such a range 406 may be determined using the camera 204, the depth sensor 206, and/or the range detector 144.
The user displayed within the display screen volume 402 is represented by a head 408 and a hand 410. In this example, the hand 410 is only partially captured within the volume 402. In this example, the capture volume is shown a distance 412 away from a remote user (not shown). The capture volume is shown with a height 414. Such measurements are configurable depending on the display screen size and the capture volume defined by the cameras capturing the image content.
Fig. 5 is a block diagram illustrating an example of a remote capture volume 502 relative to a local capture volume according to implementations described throughout this disclosure. For example, by flipping the capture volume 402 in the z-plane and translating the volume by about 1.25 meters to about 1.35 meters, the remote capture volume 502 may be approximated in order to calculate depth conflicts. The remote capture volume 502 includes an optimal viewing range 504. The remote user is approximated by a head 506 and a hand 508. Similar to hand 410, the remote user's hand 508 is also cropped at the bottom edge of the capture (approximated by capture volume 502). The cropping may result in depth conflicts for users viewing the remote user (i.e., represented by head 506 and hand 508). In this example, the two represented capture volumes 402 and 502 are placed a distance 412 from each other. Distance 412 may simulate a user standing about four to eight feet from another user to participate in a conversation as if the users were physically present with each other. Content captured and displayed in the representative display screen volume 402 may be displayed to a remote user to replicate such physical presence. Similarly, content captured and displayed in the representative display screen volume 502 may be displayed to the user of screen volume 402 to replicate the same physical presence.
Fig. 6 is a block diagram illustrating an example of display-edge cropping of a capture volume according to implementations described throughout this disclosure. In this example, a user line of sight 602 is determined (i.e., based on the user's head 408). The line of sight 602 indicates that the entire hand 508 of the remote user (when viewed from the user's head 408) may be cropped by the display edge of the display represented by volume 502. In this case, a depth conflict may occur at the bottom edge of the display represented by volume 502.
The image management system 140 may generate a solution to resolve or minimize the detected depth conflict. For example, the system 140 may generate a dark window or bar to hide the depth conflict. In some implementations, hiding the depth conflict includes resolving the depth conflict by animating at least one user interface element to hide at least a portion of the image content with the modified image content. In some implementations, a floating black bar 604 may be rendered at z = 0.4 meters along the bottom edge of the display screen, creating the perception that an object (e.g., bar 604) between the user's head 408 and the viewer is clipping the view; but because the bar 604 is dark and frames the view, the user does not perceive a depth conflict.
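For illustration, the geometry of such a floating bar can be sketched as follows: the bar's top edge must reach at least the point where the viewer's sight line toward the clipped region crosses the bar's plane. The screen-centered y/z coordinates and the example numbers are assumptions; only the 0.4-meter plane follows the example above.

```python
def required_bar_top(head, conflict_point, bar_z=0.4):
    """Hypothetical geometry for the floating dark bar: intersect the sight
    line from the viewer's eye `head` to the top of the clipped remote region
    `conflict_point` with the bar's plane z = bar_z (meters in front of the
    display, screen-centered y/z coordinates). The bar's top edge must sit at
    or above the returned y for the clipped region to stay hidden."""
    hy, hz = head             # (y, z) of the eye, with z > bar_z
    cy, cz = conflict_point   # (y, z) of the clipped content, with z <= bar_z
    t = (hz - bar_z) / (hz - cz)          # where the sight ray crosses z = bar_z
    return hy + t * (cy - hy)

# Eye 1.25 m from the screen at the screen's vertical center; clipped hand
# 0.5 m below center on the display plane: the bar must reach about y = -0.34 m.
print(round(required_bar_top((0.0, 1.25), (-0.5, 0.0)), 2))  # -0.34
```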
Fig. 7A to 7C are block diagrams illustrating examples of visually perceived tilting of a display device according to implementations described throughout this disclosure. In Fig. 7A, another perspective view of bar 604 is shown as bar 702. The top edge of the black bar 702 may be perceived as lying at z = 0.4 meters because of how the cropping changes as the head 408 moves. The bottom edge of the volume 502 may be ambiguous, and because there is no internal detail providing parallax cues within the black bar 702, the black bar 702 may be perceived at different depths. If the display bezel is black, the lower edge of the bar 702 may be perceived by the user as lying at z = 0 meters, and in this case the bar 702 may be perceived as being slanted.
Referring to Fig. 7B, a box frame solution for mitigating depth conflicts is shown. The solution includes using box 704 as a UI element overlaid around the image content. In this example, portion 706 of window frame 704 is placed in front of image content 708, while the remainder of frame 704 is located behind image content 708.
Although the box 704 and the portion 706 appear in different planes, the user 710 as shown in fig. 7C will perceive the box and the image content to be angled (e.g., tilted) when viewing the image content on a display device, for example, as shown by content 712. This illusion may correct previously perceived depth conflicts.
Fig. 8A-8B are block diagrams illustrating an example of resolving depth conflicts with composite image content according to implementations described throughout this disclosure. As shown in Fig. 8A, a user 802 is depicted as image content hosted in a 3D video conference session on a telepresence apparatus 804 having a display screen 806, as described with respect to Fig. 1. To correct depth conflicts that may occur at the bottom edge of the display screen 806, a portion 808 of the table 810 may be physically painted so that the table 810 appears to have a dark stripe.
As shown in Fig. 8B, to a user viewing content on the display 806, the dark bar 808 may appear to be tilted and floating at (or near) a right angle to the display screen 806. Such a configuration may correct or eliminate depth conflicts that occur as a result of cutting off portions of the displayed user 802.
Fig. 9A-9B are block diagrams illustrating an example of resolving depth conflicts by dynamically adjusting a display window of a display device according to implementations described throughout this disclosure. As shown in Fig. 9A, the capture volume 404 includes an optimal viewing range 406. Similarly, a user displayed in the display screen volume 402 is represented by a head 902. In this example, the wrist or upper arm is only partially captured within the volume 402, which results in part of the captured content being cropped by the display edge. In this example, the capture volume 402 is shown separated by a distance 412 from the remote capture volume 502, which contains a user represented by a head 506 and a wrist 508. In this example, the user (i.e., head 902) may be standing and therefore located in the top portion of the optimal viewing range 406. Because the head 902 is higher in the capture volume than when the user is seated (or at a lower elevation), depth conflicts may be perceived when viewing the remote content, or portions of the remote user, shown in the capture volume 502, as shown by the viewing angle 904. For example, such a viewing angle 904 may reveal depth conflicts associated with the wrist 508, because the wrist may be perceived as absent when the head 902 views remote content in the capture volume 502. For example, the occurrence of a depth conflict may be determined based on the tracked head position (of the head 902) of the local user viewing the remote content (in the capture volume 502). For example, the depth conflict may be determined by the depth sensor 206 of the system 202A viewing the content received from the system 202B. Both systems 202A and 202B may include a lenticular display device having four or more edge boundaries, where a depth conflict may be determined to occur if, for example, a user movement, including a head position change, occurs. In the depicted example, the frame 908a may include a windowed opening surrounding, enclosing, and/or overlaying content depicted in the capture volume 502 of Fig. 9A, which ensures that the line of sight 904 stays above content (e.g., the wrist 508) that may cause depth conflicts, as shown by the distance 910 from the bottom edge of the capture volume 502 to the inner edge of the frame 908a.
To correct for the detected depth conflict described above, the depth conflict resolver 142 may work with the UI generator 146 to provide a visual effect 240, as shown in fig. 9B, to correct for additional movement of the user. For example, the depth conflict resolver 142 may determine and trigger a visual effect that raises a frame portion extending from a bottom edge of the capture volume 502 to a higher fixed position to eliminate depth conflicts caused by viewing the wrist 508. The frame portion may include a frosted region or volume, or a blurred region or volume.
In some implementations, the depth conflict resolver 142 may trigger resizing of particular user interface elements. For example, UI element data 226 may include information about the scope 242, depth 244, and/or voxels 246 of a frame UI element. Such information may be used to resize the frame. Thus, the depth conflict resolver 142 may trigger resizing of a frame associated with content depicted in the capture volume 502 (e.g., a window surrounding, enclosing, and/or overlaying the content) upon determining that the tracked head position of the head 902 has moved to a position where one or more depth conflicts may be generated. Resizing the frame from 908a to 908b is indicated by the line of sight 912 (modified from the line of sight 904) and the distance 914 from the bottom edge of the capture volume 502 to the inner edge of the frame 908b. In this example, the distance 914 is increased relative to the distance 910 to avoid depth conflicts.
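A minimal sketch of this head-position-driven resizing follows. It assumes a display-centric coordinate frame in meters; the function name frame_bottom_inset and the seated/standing heights and inset limits are illustrative assumptions, not values from this disclosure.

```python
def frame_bottom_inset(head_height_m: float,
                       seated_head_height_m: float = 1.2,
                       standing_head_height_m: float = 1.7,
                       min_inset_m: float = 0.02,
                       max_inset_m: float = 0.12) -> float:
    """Return how far the frame's bottom edge should be raised (in meters).

    A higher viewing position steepens the line of sight over the lower
    display edge, so the inset grows with head height to keep potentially
    conflicting content (e.g., a wrist near the edge) covered.
    """
    # Normalize head height to [0, 1] between a seated and a standing pose.
    t = (head_height_m - seated_head_height_m) / (
        standing_head_height_m - seated_head_height_m)
    t = max(0.0, min(1.0, t))
    # Linearly interpolate between the minimum and maximum inset.
    return min_inset_m + t * (max_inset_m - min_inset_m)


# Example: a standing viewer gets a larger inset than a seated one.
print(frame_bottom_inset(1.25))  # close to min_inset_m
print(frame_bottom_inset(1.70))  # max_inset_m
```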
In some implementations, the depth conflict resolver 142 may work with the range detector 144 to determine a particular boundary 230 and/or a depth 232 associated with an object near or at the boundary 230. For example, the depth conflict resolver 142 may detect depth conflicts between portions of the image content and the boundary 230 associated with a viewing range (e.g., the wrist 508 at the boundary 906) by using at least some of the determined depths of the wrist 508 associated with the image content in the volume 502. For example, the depth conflict resolver 142 may use the depth of the wrist 508 to generate a plurality of three-dimensional voxels (as described above) representing the position of the wrist 508 in a plane of a display depicting the wrist. In some implementations, the user interface element may be a box, a blur wall, or other UI element selected based on a distance (e.g., the distance between the portion of the wrist outside the boundary and the boundary).
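The paragraph above implies a simple rule: measure how far the conflicting content extends past the boundary and pick a UI element accordingly. A sketch of such a rule is shown below; the 5 cm threshold and the element names are assumptions for illustration, not part of this disclosure.

```python
import numpy as np


def classify_conflict(points_xyz: np.ndarray,
                      bottom_boundary_y: float,
                      box_threshold_m: float = 0.05):
    """Given 3D points (display/view coordinates, meters, y up) for content
    near the lower display edge, report whether they extend past the boundary
    and by how far, so a UI element can be chosen."""
    below = points_xyz[points_xyz[:, 1] < bottom_boundary_y]
    if below.size == 0:
        return None, 0.0                       # no conflict detected
    overshoot = float(bottom_boundary_y - below[:, 1].min())
    # Small overshoots get a thin frame; larger ones a blur wall (assumed rule).
    element = "frame" if overshoot < box_threshold_m else "blur_wall"
    return element, overshoot


# Example: a wrist dipping 8 cm below the boundary selects the blur wall.
pts = np.array([[0.0, -0.08, 0.4], [0.1, 0.2, 0.5]])
print(classify_conflict(pts, bottom_boundary_y=0.0))
```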
In some implementations, the visual effect 240 may define a speed at which the frame is resized (e.g., from 908a to 908b). For example, the depth conflict resolver 142 may determine whether the user is seated, standing, actively moving, etc., and may select the speed at which the frame is resized accordingly. Similarly, the range detector 144 and the UI element data 226 may be used to select the actual amount (smaller or larger) by which the frame is resized.
Fig. 10A-10B are block diagrams illustrating an example of resolving depth conflicts by adjusting boundaries and/or edges of a capture volume according to implementations described throughout this disclosure. Fig. 10A depicts a portion of a room 1000 housing a telepresence display apparatus in which a user 1002 is depicted within a display screen 1004.
If the image management system 140 does not provide depth conflict management, the boundary edges 1006, 1008, 1010, and 1012 may cause depth conflicts for users viewing the user 1002 (or viewing content within the screen 1004) if portions of the user 1002 are cropped at an edge. Similarly, a user viewing the user 1002 (or viewing content within the screen 1004) may perceive a depth conflict where image content appears to float, such as at the boundary 1014.
In some implementations, the image management system 140 may trigger visual effects 240 and/or virtual content 238 and/or UI content 236 to mitigate or correct depth conflicts that occur at such edges and boundary edges. For example, the depth conflict resolver 142 may generate a grid-textured blur wall 1016 for a particular boundary of the screen 1004 to blur the boundary edges so that a user viewing content in the screen 1004 cannot see around or beyond the boundary edges of the screen 1004. Although the wall 1016 is shown on a single boundary of the screen 1004, any or all boundaries of the screen 1004 may include a blur wall, such as the wall 1016. In some implementations, the depth conflict resolver 142 may generate additional pixels (e.g., regions and/or volumes) to be blurred across the screen 1004 to prevent depth conflicts.
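One simplified way to realize such a blur wall in 2D image space is to blur a band of pixels along the affected edge, as sketched below using SciPy's Gaussian filter. The band width and sigma are illustrative assumptions, and a real lenticular pipeline would apply the effect per view (or in 3D) rather than to a single flat frame.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def blur_boundary_band(image: np.ndarray, band_px: int = 64,
                       sigma: float = 6.0) -> np.ndarray:
    """Blur a horizontal band along the bottom edge of `image` (H x W x C,
    float in [0, 1]) so viewers cannot resolve content that would otherwise
    be cut off at that edge."""
    out = image.copy()
    band = out[-band_px:]
    # Blur only the spatial axes; leave the color channels untouched.
    out[-band_px:] = gaussian_filter(band, sigma=(sigma, sigma, 0))
    return out


# Example with random content standing in for a rendered frame.
frame = np.random.rand(480, 640, 3)
softened = blur_boundary_band(frame)
```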
Referring to fig. 10B, the user 1002 is shown with the screen 1004 and a box 1006 defining a capture volume 1008. In this example, the depth conflict resolver 142 may generate a box surrounding any portion of the capture volume 1008 to ensure that depth conflicts are minimized or eliminated. In this example, the box 1006 may be generated by the image management system 140 via the UI generator 146, for example.
The depth conflict resolver 142 may also generate a blur shape 1010. The blur shape 1010 may be blurred with partial transparency so that the remote view of the user 1002 appears expanded, thereby increasing the sense of presence for the remote user viewing the user 1002. In this example, the radius 1013 of the blur shape 1010 is selected to cover a region where the contents of the volume 1008 are not depicted and would otherwise appear less realistic. Although the blur shape 1010 is shown as a partial ellipse, other shapes are possible. In some implementations, the surfaces of the blur shapes described herein are angled such that z=0 at the bottom edge of the blur shape. For example, fig. 10C shows the blur shape 1010 at such an angle with the same radius 1013.
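As a rough illustration, a partial-ellipse opacity mask anchored to the bottom of the frame can approximate such a blur shape in 2D. The radii and maximum opacity below are assumed values, and the z-angling described above is not modeled in this flat sketch.

```python
import numpy as np


def half_ellipse_alpha(width_px: int, height_px: int,
                       radius_x_px: int, radius_y_px: int,
                       max_alpha: float = 0.6) -> np.ndarray:
    """Build a partial-ellipse opacity mask anchored to the bottom-center of
    the display, a rough 2D analogue of the frosted blur shape described
    above. Opacity is strongest at the center and falls to zero at the rim."""
    ys, xs = np.mgrid[0:height_px, 0:width_px]
    cx, cy = width_px / 2.0, float(height_px)      # center on the bottom edge
    d = ((xs - cx) / radius_x_px) ** 2 + ((ys - cy) / radius_y_px) ** 2
    alpha = np.where(d <= 1.0, max_alpha * (1.0 - d), 0.0)
    return alpha.astype(np.float32)


mask = half_ellipse_alpha(640, 480, radius_x_px=220, radius_y_px=120)
```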
Referring to fig. 10D, another example UI element/shape for mitigating depth conflicts when viewing content within the volume 1008 includes a blur shape 1014. The blur shape 1014 is a frosted, partially transparent, semi-angled trapezoid. In particular, the surface of the shape 1014 may be angled and rounded at 1016 to provide lateral translation that prevents clipping at the left display boundary 1006 and the right display boundary 1010.
Referring to fig. 10E, another example UI element/shape for mitigating depth conflicts when viewing content within the volume 1008 includes a blur shape 1018. The blur shape 1018 is a frosted, partially transparent shell layer that leaves room for additional content to be displayed in the front central portion of the volume 1008 in the vicinity of the shape 1018, while bounding the bottom portion along the side boundaries of the volume 1008.
Referring to fig. 10F, another example UI element for mitigating (e.g., hiding, removing, correcting) depth conflicts when viewing content within a volume 1020 includes a shape 1022. The shape 1022 and its size may be determined using the depth sensor 206, which may detect possible depth conflicts while image and/or video content is being rendered in the volume 1020. In this example, the sensor 206 may provide the depth 232 to the depth conflict resolver 142. The depth conflict resolver 142 may determine a particular region in which depth conflicts may occur for a particular user viewing the content depicted in the volume 1020. For example, the depth conflict resolver 142 may work with depth sensors and cameras on another system 202 to determine head pose data associated with particular captured data. Such data may be used to dynamically calculate depth conflicts in order to determine whether the depth conflicts will be visible on a display device depicting the volume 1020.
Referring to fig. 10G, the depth conflict resolver 142 determines that portions of the arms of the user 1024 may cause a particular depth conflict. In response, depth conflicts can be mitigated as desired. For example, animations via visual effects 240 may fade in or out based on particular detected depth conflicts. In some implementations, a selected shape, such as shape 1026, may have a gradient blur. In this example, the image management system 140, using the UI generator 146, may use, for example, a distortion map to generate a blur radius that increases toward the bottom boundary of the capture volume 1020, as shown by shape 1026.
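The gradient-blur idea can be approximated in 2D by blending a sharp frame with a fully blurred copy using a vertical weight that grows toward the bottom boundary, as sketched below. A production shader would instead vary the kernel per pixel (e.g., via the distortion map mentioned above); the max_sigma and ramp start values are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gradient_blur_bottom(image: np.ndarray, max_sigma: float = 8.0,
                         start_frac: float = 0.7) -> np.ndarray:
    """Approximate a blur radius that grows toward the bottom boundary by
    blending the sharp frame with a fully blurred copy using a vertical
    gradient weight."""
    h = image.shape[0]
    blurred = gaussian_filter(image, sigma=(max_sigma, max_sigma, 0))
    rows = np.linspace(0.0, 1.0, h)
    # Weight is 0 above `start_frac` of the height, ramping to 1 at the bottom.
    w = np.clip((rows - start_frac) / (1.0 - start_frac), 0.0, 1.0)
    w = w[:, None, None]
    return (1.0 - w) * image + w * blurred


frame = np.random.rand(480, 640, 3)
out = gradient_blur_bottom(frame)
```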
The shape 1026 may be generated when forward movement of the user 1024 is detected. For example, as the user 1024 moves forward, the image management system 140 (e.g., using a shader) detects depth conflicts at the lower boundary of the volume 1020 and triggers the UI generator 146 to generate the visual effect 240 that fades in the blurred, frosted shape 1026. In some implementations, similar transitions may be used for side walls of the boundary system when limbs or objects move outside of the capture volume 1020.
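Fading such a shape in and out can be expressed as a small per-frame opacity update, sketched below. The 0.4 s fade time is an assumed comfort parameter, not a value from this disclosure.

```python
def fade_overlay(current_alpha: float, conflict_detected: bool,
                 dt_s: float, fade_time_s: float = 0.4) -> float:
    """Move the frosted shape's opacity toward 1.0 while a depth conflict is
    detected on the lower boundary, and back toward 0.0 once it clears."""
    target = 1.0 if conflict_detected else 0.0
    step = dt_s / fade_time_s
    if current_alpha < target:
        return min(target, current_alpha + step)
    return max(target, current_alpha - step)


# Example: a few ~30 ms frames while the user's hand crosses the boundary.
alpha = 0.0
for detected in (True, True, True, False, False):
    alpha = fade_overlay(alpha, detected, dt_s=0.03)
    print(round(alpha, 3))
```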
Fig. 11 is a block diagram illustrating an example of resolving depth conflicts using segmented virtual content according to an implementation described throughout this disclosure. Segmented front depth conflict mitigation is shown here. A display 1004 is shown with the capture volume 1008. If a large frosted wall appearing in front is visually too distracting when a hand briefly enters the boundary region, the image management system 140 may instead trigger the depth conflict resolver 142, via the UI generator 146, to generate the segmented grid element 1100 and fade the segments in and out in a manner that is comfortable for the user. Such a segmented grid element 1100 may include obscured portions 1102 and 1104 and a semi-opaque portion 1106. Such UI elements may also provide coverage for permanent objects depicted in the volume 1008, such as a laptop on a desk.
In some implementations, the grid element 1100 can be a blurred overlay with a gradient blur that tapers from a left central portion 1108 of the overlay (e.g., grid element 1100) to a left edge 1110 of the image content, and from a right central portion 1112 of the overlay to a right edge 1114 of the image content.
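A sketch of the horizontal taper implied by this description is shown below: the weight is strongest over the central portions of the overlay and falls to zero at the left and right edges. The center_frac value is an illustrative assumption.

```python
import numpy as np


def horizontal_taper(width_px: int, center_frac: float = 0.35) -> np.ndarray:
    """Build a 1D weight profile that is strongest across the central portion
    of an overlay and tapers to zero at the left and right edges of the image
    content, mirroring the gradient described above."""
    x = np.linspace(0.0, 1.0, width_px)
    left = np.clip(x / center_frac, 0.0, 1.0)           # ramp up from left edge
    right = np.clip((1.0 - x) / center_frac, 0.0, 1.0)  # ramp up from right edge
    return np.minimum(left, right)


profile = horizontal_taper(640)
```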
Fig. 12 is a block diagram illustrating example application content placed on virtual content according to an implementation described throughout this disclosure. In this example, a shaped, semi-transparent UI element 1202 is depicted to mitigate specific detected depth conflicts. For example, the UI element 1202 may be blurred, partially blurred, or translucent. Such elements may be shaped based on the display volume 1008, the content being overlaid, or a particular detected depth conflict. For example, while the UI element 1202 is depicted as rectangular, other shapes are possible.
The UI element 1202 may be used as a location to depict additional content to a user viewing the volume 1008. For example, if the users depicted in the volumes 1008 and 1020 (e.g., the user 1002 and the user 1024) are accessing systems 202A and 202B, respectively, the two users may wish to share application data, screen data, and the like. The image management system 140 may trigger the shapes generated for depth conflict mitigation to begin rendering application content, such as content 1204 and 1206. Although two application windows are depicted, any number of windows, content, applications, shortcuts, icons, etc. may be depicted. In some implementations, applications and/or content depicted within a UI element, such as the element 1202, may include additional UI elements that are determined to be open during a session of viewing image content. For example, the UI element 1202 may depict a user interface layer with thumbnail images (e.g., content 1204 and content 1206) of additional software programs executed in memory by at least one processing device while the image content is accessed. Such content 1204 and content 1206 may represent software applications that are open and executing in memory while the user is in a telepresence session with the user 1002.
In some implementations, to prevent the mitigations described herein from generating depth conflicts at the intersection of the mitigation and the display edge itself, a minimal parallax cue can be used to place the mitigation at a particular z-height. For example, the applied semi-transparent surfaces are perceived as overlaid on the rendered person, but the exact height at which they are perceived is perceptually flexible. In particular, a vertical frosted design ends up looking similar to a curved frosted design. Perceived pinning of the display edge in a vertical frosted design may make a flat vertical surface look like a curve.
In some implementations, the gradient blurring variants described herein provide the advantage of avoiding a sharp upper edge. Blurring may provide an improved amount of depth conflict reduction. In some implementations, gradient blurring may be applied at specific areas where depth conflicts are detected, but not outside of these areas. That is, rather than applying the gradient blur across the entire bottom edge, the gradient blur may be placed only at portions where content (e.g., a user portion, an object, etc.) at the display edge creates a depth conflict. For example, the systems described herein may use the detected head pose in combination with the rendered content to detect depth conflicts. Then, a gradient blurring effect may be added at the depth conflict portion. The blur type may vary based on the detected depth conflict level. The gradient may be adjustable to taper away from the edge of the display.
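A sketch of such localized blurring follows: given a per-column conflict flag along the bottom edge, it builds a weight mask that is nonzero only around the flagged columns, feathered horizontally and ramping upward from the display edge. The band and feather sizes are assumed values.

```python
import numpy as np


def local_gradient_mask(conflict_cols: np.ndarray, height_px: int,
                        band_px: int = 80, feather_px: int = 40) -> np.ndarray:
    """Build a per-pixel blur weight that is nonzero only above bottom-edge
    columns where a conflict was detected, feathered horizontally and ramping
    toward 1 at the bottom row."""
    w = conflict_cols.astype(np.float32)
    # Feather horizontally with a simple box average so the edges are soft.
    kernel = np.ones(feather_px) / feather_px
    w = np.convolve(w, kernel, mode="same")
    # Vertical ramp: ~1 at the bottom row, 0 above the band.
    ramp = np.clip(
        (np.arange(height_px) - (height_px - band_px)) / band_px, 0.0, 1.0)
    return ramp[:, None] * w[None, :]


cols = np.zeros(640, dtype=bool)
cols[200:320] = True          # conflict detected under these columns only
mask = local_gradient_mask(cols, height_px=480)
```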
Fig. 13 is a flowchart illustrating one example of a process 1300 of resolving depth conflicts in a 3D content system according to an implementation described throughout this disclosure. In some implementations, the process 1300 may utilize an image processing system having at least one processing device and a memory storing instructions that, when executed, cause the processing device to perform the various operations and computer-implemented steps described in the claims. In general, the systems 100, 200, and/or 1400 may be used for the description and execution of the process 1300. In some implementations, each of the systems 100, 200, and/or 1400 may represent a single system. In some implementations, the telepresence system described in system 202 may perform the operations recited in the claims. In some implementations, the server 214 accessed by the system 202 may alternatively perform the operations recited in the claims.
In general, the process 1300 utilizes the systems and algorithms described herein to detect and correct depth conflicts for 3D displays. In some implementations, depth conflict detection is based on head and/or eye tracking and captured depth image pixels. In some implementations, depth conflict detection is based on other user movements (e.g., hand movements or placements within the capture volume, and/or body movements or placements within the capture volume). In some implementations, UI elements are generated to mitigate detected depth conflicts. In general, the described process 1300 may be performed on image content, video content, virtual content, UI elements, application content, or other camera-captured content.
At block 1302, the process 1300 includes performing, with at least one processing device, operations including determining a capture volume associated with captured image content. For example, the capture volume detector 228 may determine the size of the capture volume 502 (fig. 6). The capture volume may be used to determine whether the image content resides within or beyond a particular boundary defined by the capture volume. Image content extending beyond the boundary may cause depth conflicts for users viewing the image content within the capture volume 502. Further, the UI element data 226 and range determinations from the range detector 144 may be considered when evaluating depth conflicts.
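A minimal sketch of one way to estimate such a capture volume from captured 3D samples is shown below, using an axis-aligned bounding box with a small margin. This is an illustrative stand-in; a deployed system would more likely derive the volume from calibrated camera frusta.

```python
import numpy as np


def capture_volume_bounds(points_xyz: np.ndarray, margin_m: float = 0.05):
    """Estimate an axis-aligned capture volume (lo, hi corners in meters)
    from captured 3D samples, padded by a small margin."""
    lo = points_xyz.min(axis=0) - margin_m
    hi = points_xyz.max(axis=0) + margin_m
    return lo, hi


samples = np.random.uniform(-0.6, 0.6, size=(5000, 3))
print(capture_volume_bounds(samples))
```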
At block 1304, the process 1300 includes determining a depth associated with the captured image content. For example, the range detector 144 may calculate the depth 232 based on images captured by the camera 204 and/or data captured by the depth sensor 206. The depth may relate to objects, users, portions of users, UI elements, or other content captured within the system 202.
At block 1306, the process 1300 includes defining, based on the depth, a viewing range within the capture volume in which stereoscopic effects are depicted when viewing the captured image content. For example, the range detector 144 may utilize the depth 232 and the size of the capture volume 502 to determine a viewing range, which may be a range (e.g., size, window, volume) within a 3D display (such as a lenticular display) in which 3D stereoscopic effects and 3D content are viewed for rendered image content. Determining such a viewing range may enable the system 200 to properly ascertain where a particular depth conflict may occur.
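One simple interpretation of this step is to clamp the span of captured depths to a comfortable stereo budget for the display, as sketched below. The comfort limits are assumed values, not parameters from this disclosure.

```python
import numpy as np


def define_viewing_range(depths_m: np.ndarray,
                         comfort_near_m: float = 0.35,
                         comfort_far_m: float = 1.50):
    """Clamp the span of captured depths to an assumed comfortable stereo
    budget for a lenticular display, yielding the (near, far) range inside
    which the stereoscopic effect is shown without strain."""
    valid = depths_m[np.isfinite(depths_m) & (depths_m > 0)]
    near = max(float(valid.min()), comfort_near_m)
    far = min(float(valid.max()), comfort_far_m)
    return near, far


depth_map = np.random.uniform(0.2, 2.0, size=(480, 640))
print(define_viewing_range(depth_map))
```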
At block 1308, the process 1300 includes determining a depth conflict between the captured image content and a boundary associated with the viewing range. For example, the depth conflict resolver 142 may detect that the hand 508 (fig. 6) is outside the capture volume 502 boundary at the bottom edge of the capture volume 502 (i.e., the bottom edge of the display device, as shown by boundary 1012 in fig. 10A). Here, in some implementations, the depth conflict resolver 142 may use the capture volume detector 228 to detect that at least a portion of the captured image content (i.e., the portion of the hand 508 in fig. 6) extends beyond the boundary 1012 associated with the viewing range defined by the capture volume 502 and/or the volume 504.
In some implementations, detecting a depth conflict between at least a portion of the image content (e.g., a portion of the hand 508) and a boundary associated with the viewing range (e.g., the boundary 1012) may include using at least some of the determined depths 232 associated with the image content (e.g., the head 506 and the hand 508) to generate 3D voxels representing locations in a plane of a display (e.g., the z-plane of the display) that renders the captured image content. In this example, the depth used to detect and/or correct the depth conflict may include the depth of the hand 508. The distance may include a distance from the boundary 1012 to the portion of the hand that is outside the boundary 1012. The UI element for correcting depth perception may be selected based on the distance. In some implementations, the boundary 1012 is associated with at least one edge of the lenticular display apparatus.
In some implementations, voxels (e.g., voxels 246) can be generated using the UI element data 226. The voxels 246 may be derived from a point cloud defined in 3D space. The voxels 246 may form a grid defining a plurality of cells having a fixed size and discrete coordinates. The voxels 246 can be used to determine which portions of particular image content can cause depth conflicts, and which of those portions should be corrected, resolved, blurred, or otherwise modified to avoid depth conflicts.
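A minimal sketch of turning a point cloud into such a fixed-size, discrete voxel grid is shown below; the 2 cm cell size is an assumption for illustration.

```python
import numpy as np


def voxelize(points_xyz: np.ndarray, cell_size_m: float = 0.02) -> np.ndarray:
    """Quantize a 3D point cloud into fixed-size voxel coordinates, i.e., the
    kind of discrete grid described above for reasoning about which content
    sits near or beyond a boundary. Returns unique integer voxel indices."""
    idx = np.floor(points_xyz / cell_size_m).astype(np.int64)
    return np.unique(idx, axis=0)


cloud = np.random.uniform(-0.5, 0.5, size=(1000, 3))
voxels = voxelize(cloud)
print(voxels.shape)
```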
In some implementations, depth conflicts may be determined based on a tracked head position of a user viewing the image content at a remote lenticular display device. For example, a depth conflict associated with the hand 508 may be determined based on the viewing angle (e.g., line of sight 602) of a remote user (the head 408 in fig. 6). For example, the tracked head position may be determined or provided via the tracking module 216 using the head/eye tracker 218, the hand tracker 220, and/or the movement detector 222.
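The geometric check implied here can be sketched as follows: project a content point from the tracked eye position onto the display plane and flag a conflict if the projection falls below the display's bottom edge. The coordinate convention (meters, +z pointing from the display toward the local viewer) and the example numbers are assumptions.

```python
import numpy as np


def edge_conflict(eye_xyz, point_xyz, display_bottom_y, display_z=0.0):
    """Return True if a content point rendered in front of the display plane
    projects (from the tracked eye position) below the display's bottom edge,
    i.e., it would be cut off by the bezel and create a depth conflict."""
    eye = np.asarray(eye_xyz, dtype=float)
    p = np.asarray(point_xyz, dtype=float)
    if p[2] <= display_z:        # at or behind the display plane: the physical
        return False             # edge acts like a natural window, no conflict
    # Intersect the eye->point ray with the display plane z = display_z.
    t = (display_z - eye[2]) / (p[2] - eye[2])
    y_on_plane = eye[1] + t * (p[1] - eye[1])
    return y_on_plane < display_bottom_y


# Example: a standing viewer (higher eye) looking down at a wrist that pops
# out in front of the display near its lower edge.
print(edge_conflict(eye_xyz=(0.0, 0.5, 0.8),
                    point_xyz=(0.0, -0.25, 0.10),
                    display_bottom_y=-0.3))
```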
At block 1310, the process 1300 includes, in response to determining the depth conflict, resolving the depth conflict for at least a portion of the image content using the viewing range and at least one UI element. For example, the depth conflict resolver may use the determined viewing range relative to the hand 508 in order to select a particular UI element to generate and/or modify. Such UI elements may be generated by the UI generator 146 and provided for rendering with the image content. In some implementations, resolving the depth conflict includes generating a UI element that represents a box surrounding the image content within the volume 502. The box (e.g., the box 1006 in fig. 10B) may be adapted to accommodate the movement depicted in the captured image content. For example, the size, shape, or other factors of the box 1006 may be modified to resolve (e.g., overlay) the depth conflict. In some implementations, the size and/or shape of other generated UI elements may be adjusted to resolve depth conflicts. In some implementations, resizing a particular UI element is based on the tracked head position of the user viewing the particular image content.
In some implementations, a side of a UI element, such as the box 1006, corresponding to the at least a portion (e.g., the hand 508) extending beyond the boundary of the capture volume 502, may be placed in a different plane parallel to and forward of the remainder of the box to generate a visually perceived tilt of the box from vertical to a non-zero angle relative to vertical, as shown in fig. 7A-7B.
At block 1312, process 1300 includes generating modified image content for rendering using the resolved depth conflict. The modified image content may include a portion of the image content replaced by at least one UI element. For example, the UI element may include a blur overlay. The blur overlay may be generated by the UI generator 146. The blur overlay may be 2D or 3D. For example, the blur overlay may begin at the boundary of the capture volume 502 and may end at a predefined location associated with the size of the display device depicting the image content. For example, the size of the display may include predefined minimum and maximum sizes of boxes, overlays, or UI elements.
In some implementations, a UI element, such as a blur overlay, may be defined by the depth conflict resolver 142 and the UI generator 146, where a blur radius associated with the UI element may increase at a threshold distance from the boundary. For example, the blur radius may be animated according to the movement of the image content. In such examples, animating the blur radius (or other UI element) may resolve and/or hide depth conflicts.
In some implementations, the fuzzy overlay may be shaped according to the determined depth conflict. In some implementations, the blur overlay can be shaped according to the size or shape of the image content depicted. Example shapes may include, but are not limited to, square, rectangular, oval, semi-circular, semi-oval, trapezoidal, and the like.
In some implementations, the blur overlay described herein may include additional UI elements that are determined to be open during a session of viewing image content. For example, the additional UI elements may include software programs that are accessed (i.e., executed in memory) by the at least one processing device when accessing image content on the display device. The software program/application may be displayed as selectable UI elements overlaid on the obscured overlay. The user may select a particular application to render the application in a larger form and begin using the application within, around, or near the rendered image and/or video content.
In some implementations, as shown in fig. 11, the blur overlay is a gradient blur that tapers from the left center portion of the overlay to the left edge of the image content and from the right center portion of the overlay to the right edge of the video content. Other variations of gradient blurring are possible, and fig. 11 shows only one example of gradient blurring.
In some implementations, the lower boundary of the capture volume may be used as an interaction zone in which gestures and the like often occur. For example, the image management system 140 may distinguish between a lower portion of the display screen (where depth conflict is more problematic) and an upper middle portion of the display screen where the interactive elements may reside.
Fig. 14 shows an example of a computer device 1400 and a mobile computer device 1450 that can be used with the described technology. Computing device 1400 may include a processor 1402, a memory 1404, a storage device 1406, a high-speed interface 1408 connected to memory 1404 and high-speed expansion ports 1410, and a low-speed interface 1412 connected to low-speed bus 1414 and storage device 1406. The components 1402, 1404, 1406, 1408, 1410, and 1412 are interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 1402 may process instructions for execution within the computing device 1400, including instructions stored in the memory 1404 or on the storage device 1406 to display graphical information for a GUI on an external input/output device, such as the display 1416 coupled to the high-speed interface 1408. In some embodiments, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. In addition, multiple computing devices 1400 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a blade server bank, or a multiprocessor system).
Memory 1404 stores information within computing device 1400. In one embodiment, memory 1404 is one or more volatile memory units. In another embodiment, memory 1404 is one or more nonvolatile memory cells. Memory 1404 may also be another form of computer-readable medium, such as a magnetic or optical disk.
The storage device 1406 is capable of providing mass storage for the computing device 1400. In one embodiment, storage device 1406 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including storage area networks or other configured devices. The computer program product may be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as the methods described herein. The information carrier is a computer-readable medium or a machine-readable medium, such as the memory 1404, the storage device 1406, or memory on processor 1402.
The high-speed controller 1408 manages bandwidth-intensive operations for the computing device 1400, while the low-speed controller 1412 manages lower bandwidth-intensive operations. Such allocation of functions is merely exemplary. In one embodiment, high-speed controller 1408 is coupled to memory 1404, display 1416 (e.g., by a graphics processor or accelerator), and to high-speed expansion port 1410, which can accept various expansion cards (not shown). Low-speed controller 1412 may be coupled to storage device 1406 and low-speed expansion port 1414. The low-speed expansion port, which may include various communication ports (e.g., USB, bluetooth, ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a network device, such as a switch or router, for example, through a network adapter.
Computing device 1400 may be implemented in a number of different forms, as shown. For example, it may be implemented as a standard server 1420, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 1424. Furthermore, it may be implemented in a personal computer such as a laptop computer 1422. Alternatively, components from computing device 1400 may be combined with other components in a mobile device (not shown), such as device 1450. Each of these devices may contain one or more of computing device 1400, 1450, and the entire system may be made up of multiple computing devices 1400 and 1450 communicating with each other.
The computing device 1450 includes a processor 1452, memory 1464, input/output devices such as a display 1454, a communication interface 1466 and a transceiver 1468, and other components. The device 1450 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the components 1450, 1452, 1464, 1454, 1466, and 1468 are interconnected using various buses, and several of these components may be mounted on a common motherboard or in other manners as appropriate.
Processor 1452 is capable of executing instructions within computing device 1450, including instructions stored in memory 1464. The processor may be implemented as a chipset of chips that include separate and multiple analog and digital processors. For example, the processor may provide for coordination of the other components of the device 1450, such as control of user interfaces, applications run by the device 1450, and wireless communication by the device 1450.
The processor 1452 may communicate with a user through a control interface 1458 and a display interface 1456 coupled to a display 1454. For example, the display 1454 may be a TFT LCD (thin film transistor liquid crystal display) or OLED (organic light emitting diode) display or other suitable display technology. The display interface 1456 may include appropriate circuitry for driving the display 1454 to present graphical and other information to a user. The control interface 1458 may receive commands from a user and convert the commands for submission to the processor 1452. In addition, an external interface 1462 may communicate with the processor 1452 to enable near area communication of the device 1450 with other devices. In some embodiments, external interface 1462 may provide multiple interfaces, for example, for wired or wireless communications.
Memory 1464 stores information within computing device 1450. Memory 1464 may be implemented as one or more of one or more computer-readable media, one or more volatile memory units, or one or more non-volatile memory units. Expansion memory 1484 may also be provided and connected to device 1450 through expansion interface 1482, which expansion interface 1482 may include, for example, a SIMM (Single In-line Memory Module) card interface. Such expansion memory 1484 may provide additional storage space for device 1450 or may also store applications or other information for device 1450. In particular, expansion memory 1484 may include instructions to carry out or supplement the processes described above, and may include secure information as well. Thus, for example, expansion memory 1484 may be a secure module of device 1450 and may be programmed with instructions that allow secure use of device 1450. Further, secure applications may be provided via the SIMM card, along with additional information, such as placing identifying information on the SIMM card in an indestructible manner.
The memory may include, for example, flash memory and/or NVRAM memory, as described below. In an embodiment, the computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that when executed perform one or more methods, such as the methods described above. The information carrier is a computer-readable medium or machine-readable medium, such as the memory 1464, expansion memory 1484, or memory on processor 1452, which may be received, for example, over transceiver 1468 or external interface 1462.
The device 1450 may communicate wirelessly through a communication interface 1466, which may include digital signal processing circuitry as necessary. The communication interface 1466 may provide communication under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, GPRS, or the like. Such communication may occur, for example, through radio frequency transceiver 1468. Further, short-range communications may occur, such as using bluetooth, wi-Fi, or other such transceivers (not shown). In addition, a GPS (global positioning system) receiver module 1480 may provide additional navigation-and location-related wireless data to the device 1450, which may be used as appropriate by applications running on the device 1450.
The device 1450 may also communicate audibly using an audio codec 1460, which audio codec 1460 may receive voice information from a user and convert the voice information into usable digital information. The audio codec 1460 may similarly generate audible sound for a user, such as through a speaker, e.g., in a handset of the device 1450. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.), and may also include sound generated by applications operating on device 1450.
The computing device 1450 may be implemented in a number of different forms, as illustrated. For example, computing device 1450 may be implemented as a cellular telephone 1480. The computing device 1450 may also be implemented as part of a smart phone 1482, personal digital assistant, or other similar mobile device.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementations in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server) or that includes a middleware component (e.g., an application server) or that includes a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an embodiment of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In some embodiments, the computing device depicted in fig. 14 may include a sensor that interfaces with a virtual reality or AR headset (VR headset/AR headset/HMD device 1490). For example, one or more sensors included on the computing device 1450 or other computing device depicted in fig. 14 may provide input to the VR headset 1490, or generally to the VR space. The sensors may include, but are not limited to, touch screens, accelerometers, gyroscopes, pressure sensors, biometric sensors, temperature sensors, humidity sensors, and ambient light sensors. The computing device 1450 can use the sensors to determine an absolute position of the computing device in the VR space and/or a detected rotation, which can then be used as input to the VR space. For example, the computing device 1450 may be incorporated into the VR space as a virtual object, such as a controller, laser pointer, keyboard, weapon, and the like. When incorporated into the VR space, the positioning of the computing device/virtual object by the user may allow the user to position the computing device to view the virtual object in a particular manner in the VR space.
In some embodiments, one or more input devices included on or connected to computing device 1450 may be used as input to the VR space. The input device may include, but is not limited to, a touch screen, keyboard, one or more buttons, track pad, touch pad, pointing device, mouse, track ball, joystick, camera, microphone, headphones or earphone cartridge with input capabilities, game controller, or other connectable input device. When a computing device is incorporated into the VR space, a user interacting with an input device included on computing device 1450 may cause certain actions to occur in the VR space.
In some embodiments, one or more output devices included on computing device 1450 can provide output and/or feedback to a user of VR headset 1490 in the VR space. The output and feedback may be visual, audible or audio. The output and/or feedback may include, but is not limited to, rendering a VR space or virtual environment, vibrating, turning on and off or flashing and/or flashing one or more lights or flashes, sounding an alarm, playing a chime, playing a song, and playing an audio file. Output devices may include, but are not limited to, vibration motors, vibration coils, piezoelectric devices, electrostatic devices, light Emitting Diodes (LEDs), flashlights, and speakers.
In some embodiments, the computing device 1450 may be placed within the VR headset 1490 to create a VR system. VR headset 1490 can include one or more positioning elements that allow a computing device 1450, such as a smart phone 1482, to be placed in position within VR headset 1490. In such an embodiment, the display of the smartphone 1482 is capable of rendering stereoscopic images representing a VR space or virtual environment.
In some embodiments, computing device 1450 may appear as other objects in a computer-generated 3D environment. User interaction with the computing device 1450 (e.g., rotating, panning, touching the touch screen, swiping a finger across the touch screen) can be interpreted as interaction with an object in the VR space. As just one example, the computing device may be a laser pointer. In such examples, computing device 1450 appears as a virtual laser pointer in a computer-generated 3D environment. As the user manipulates the computing device 1450, the user in the VR space sees the movement of the laser pointer. The user receives feedback from interactions with the computing device 1450 in a VR environment on the computing device 1450 or VR headset 1490.
In some embodiments, computing device 1450 may include a touch screen. For example, a user can interact with the touch screen in a particular manner that can utilize what happens in the VR space to simulate what happens on the touch screen. For example, a user may use a pinching motion to zoom in and out on content displayed on a touch screen. Such pinching motion on the touch screen may cause information provided in the VR space to be scaled. In another example, a computing device may be rendered as a virtual book in a computer-generated 3D environment. In the VR space, pages of the book may be displayed in the VR space, and swipes of the user's finger across the touch screen may be interpreted as turning/flipping pages of the virtual book. As each page is turned/flipped, audio feedback, such as the turning sound of the pages in the book, may be provided to the user in addition to seeing the page content changes.
In some embodiments, one or more input devices (e.g., mouse, keyboard) other than the computing device may be rendered in a computer-generated 3D environment. A rendered input device (e.g., a rendered mouse, a rendered keyboard) may be used as rendering in the VR space to control objects in the VR space.
Computing device 1400 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The computing device 1450 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit the disclosed embodiments.
Moreover, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Further, other steps may be provided from the described flows, or steps may be eliminated, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.

Claims (20)

1. A computer-implemented method of performing operations with at least one processing device, the operations comprising:
determining a capture volume associated with image content captured by at least one camera;
determining a depth associated with the image content;
defining a viewing range within the capture volume based on the depth in which stereoscopic effects are depicted when viewing the image content;
determining a depth conflict between the image content and a boundary associated with the viewing range, the determining comprising: detecting that at least a portion of the image content extends beyond the boundary associated with the viewing range;
in response to determining the depth conflict, resolving the depth conflict for the at least a portion using the viewing range and at least one user interface element; and
generating, with the resolved depth conflict, modified image content for rendering, the modified image content comprising a portion of the image content replaced by the at least one user interface element.
2. The method according to claim 1, wherein:
detecting the depth conflict between the at least a portion of the image content and the boundary associated with the viewing range includes: using at least some of the determined depths associated with the image content to generate a plurality of three-dimensional voxels representing positions in a plane of a display rendering the image content, and a distance from the at least a portion to the boundary; and
The at least one user interface element is selected based on the distance.
3. The method of claim 1 or claim 2, wherein:
the boundary being associated with at least one edge of the lenticular display apparatus;
the depth conflict is determined based on a tracked head position of a user viewing the image content at a remote lenticular display device; and
resolving the depth conflict includes: the user interface element is resized based on the tracked head position of the user.
4. The method of any of the preceding claims, wherein resolving the depth conflict comprises: generating the at least one user interface element as a box overlaying at least some of the image content, the at least one box adapted to accommodate movements depicted in the image content.
5. The method of claim 4, wherein a side of the frame corresponding to the at least a portion extending beyond the boundary is placed in a different plane parallel to and forward of the remainder of the frame to generate a visually perceived tilt of the frame from perpendicular to a non-zero angle relative to perpendicular.
6. The method of any of the preceding claims, wherein the at least one user interface element depicts a user interface layer of a thumbnail image with additional software programs that are executed in memory by the at least one processing device when accessing the image content.
7. The method of any of the preceding claims, wherein the user interface element comprises a blur overlay beginning at the boundary and ending at a predefined location associated with a size of a display device depicting the image content, wherein a blur radius associated with the blur overlay increases at a threshold distance from the boundary.
8. The method of claim 7, wherein the blur overlay comprises a user interface layer of a thumbnail image with an additional software program that is executed in memory by the at least one processing device when accessing the image content.
9. The method of claim 7 or claim 8, wherein the blur overlay is elliptical.
10. The method of any of claims 7 to 9, wherein the blur overlay is a gradient blur that tapers from a left central portion of the overlay to a left edge of the image content and from a right central portion of the overlay to a right edge of the video content.
11. The method of any of the preceding claims, wherein resolving the depth conflict comprises: the at least one user interface element is animated to conceal the at least a portion of image content with the modified image content.
12. An image processing system, comprising:
at least one processing device;
a plurality of stereoscopic cameras; and
a memory storing instructions that, when executed, cause the system to perform operations comprising:
determining a volume associated with image content captured by the plurality of stereoscopic cameras;
determining a depth associated with the image content;
defining a viewing range within the volume, within which a stereoscopic effect occurs, based on the depth;
determining a depth conflict between the image content and a boundary associated with the viewing range, the determining comprising: detecting that at least a portion of the image content extends beyond the boundary associated with the viewing range;
in response to determining the depth conflict, resolving the depth conflict for the at least a portion using the viewing range and at least one user interface element; and
generating, with the resolved depth conflict, modified image content for rendering, the modified image content comprising a portion of the image content replaced by the at least one user interface element.
13. The system of claim 12, wherein:
detecting the depth conflict between the at least a portion of the image content and the boundary associated with the viewing range includes: using at least some of the determined depths associated with the image content to generate a plurality of three-dimensional voxels representing positions in a plane of a display rendering the image content, and a distance from the at least a portion to the boundary; and
the at least one user interface element is selected based on the distance.
14. The system of claim 12 or claim 13, wherein:
the boundary being associated with at least one edge of the lenticular display apparatus;
the depth conflict is determined based on a tracked head position of a user viewing the image content at a remote lenticular display device; and
resolving the depth conflict includes: the user interface element is resized based on the tracked head position of the user.
15. The system of any of claims 12 to 14, wherein resolving the depth conflict comprises: the at least one user interface element is generated as a box around the image content, the at least one box being adapted to accommodate movements depicted in the image content.
16. The system of any of claims 12 to 14, wherein the user interface element comprises a blur overlay beginning at the boundary and ending at a predefined location associated with a size of a display device depicting the image content, wherein a blur radius associated with the blur overlay increases at a threshold distance from the boundary.
17. A non-transitory machine-readable medium having instructions stored thereon, which when executed by a processor, cause a computing device to:
when capturing video content with multiple stereo cameras:
determining a capture volume associated with the captured video content;
determining a depth associated with the captured video content;
defining a viewing range within the capture volume over which stereoscopic effects occur based on the depth;
determining a depth conflict between captured video content and a boundary associated with the viewing range, the determining comprising: detecting that at least a portion of the captured video content extends beyond the boundary associated with the viewing range;
in response to determining the depth conflict, resolving the depth conflict for the at least a portion using the viewing range and at least one user interface element; and
Generating, with the resolved depth conflict, modified video content for rendering, the modified video content comprising a portion of the video content replaced by the at least one user interface element.
18. The machine-readable medium of claim 17, wherein:
detecting the depth conflict between the at least a portion of the captured video content and the boundary associated with the viewing range includes: using at least some of the determined depths associated with the captured video content to generate a plurality of three-dimensional voxels representing locations in a plane of a display rendering the captured video content, and a distance from the at least a portion to the boundary; and
the at least one user interface element is selected based on the distance.
19. The machine readable medium of claim 17 or claim 18, wherein:
the boundary being associated with at least one edge of the lenticular display apparatus;
the depth conflict is determined based on a tracked head position of a user viewing the captured video content at a remote lenticular display device; and
resolving the depth conflict includes: the user interface element is resized based on the tracked head position of the user.
20. The machine readable medium of claim 17 or claim 18, wherein the user interface element comprises a blur overlay beginning at the boundary and ending at a predefined location associated with a size of a display device depicting the captured video content, wherein a blur radius associated with the blur overlay is increased at a threshold distance from the boundary.
CN202080106226.3A 2020-10-21 2020-12-16 Dynamic resolution of depth conflicts in telepresence Pending CN116325720A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063198473P 2020-10-21 2020-10-21
US63/198,473 2020-10-21
PCT/US2020/070912 WO2022086580A1 (en) 2020-10-21 2020-12-16 Dynamic resolution of depth conflicts in telepresence

Publications (1)

Publication Number Publication Date
CN116325720A true CN116325720A (en) 2023-06-23

Family

ID=74184967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080106226.3A Pending CN116325720A (en) 2020-10-21 2020-12-16 Dynamic resolution of depth conflicts in telepresence

Country Status (4)

Country Link
US (1) US20230396750A1 (en)
EP (1) EP4233310A1 (en)
CN (1) CN116325720A (en)
WO (1) WO2022086580A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4344196A1 (en) * 2022-09-23 2024-03-27 Apple Inc. Visual techniques for 3d content

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101813299B1 (en) * 2010-04-01 2017-12-28 톰슨 라이센싱 Method and system of using floating window in three-dimensional (3d) presentation
GB2499426A (en) * 2012-02-16 2013-08-21 Dimenco B V Autostereoscopic display device with viewer tracking system
KR101511315B1 (en) * 2012-10-30 2015-04-14 한국과학기술원 Method and system for creating dynamic floating window for stereoscopic contents
US10957063B2 (en) * 2018-03-26 2021-03-23 Adobe Inc. Dynamically modifying virtual and augmented reality content to reduce depth conflict between user interface elements and video content

Also Published As

Publication number Publication date
US20230396750A1 (en) 2023-12-07
WO2022086580A1 (en) 2022-04-28
EP4233310A1 (en) 2023-08-30

Similar Documents

Publication Publication Date Title
US10546364B2 (en) Smoothly varying foveated rendering
US11010958B2 (en) Method and system for generating an image of a subject in a scene
US9106906B2 (en) Image generation system, image generation method, and information storage medium
US11659150B2 (en) Augmented virtuality self view
CN114175097A (en) Generating potential texture proxies for object class modeling
US20220398705A1 (en) Neural blending for novel view synthesis
WO2019063976A1 (en) Head-mountable display system
US11720996B2 (en) Camera-based transparent display
US11212502B2 (en) Method of modifying an image on a computational device
US20230396750A1 (en) Dynamic resolution of depth conflicts in telepresence
US11128836B2 (en) Multi-camera display
CN113168228A (en) Systems and/or methods for parallax correction in large area transparent touch interfaces
US11386614B2 (en) Shading images in three-dimensional content system
WO2023040551A9 (en) Method for displaying image on display screen, electronic device, and apparatus
CN115661408A (en) Generating and modifying hand representations in an artificial reality environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination