WO2019111052A2

WO2019111052A2 - Inserting virtual objects in between two real objects in an augmented reality environment

Info

Publication number: WO2019111052A2
Application number: PCT/IB2018/001515
Authority: WO
Inventors: Pak Kit Lam; Peter Han Joo CHONG; Xiang Yu
Original assignee: Zyetric One Limited
Priority date: 2017-12-06
Filing date: 2018-12-06
Publication date: 2019-06-13
Also published as: WO2019111052A3

Abstract

Systems and methods for inserting a virtual object between two real objects in an augmented reality environment include, at an electronic device having a camera system, capturing, at the camera system, image content and depth information associated with the image content, distinguishing, based on the depth information, a first set of pixels from a second set of pixels in the image content, whereby the first set of pixels define a foreground layer and the second set of pixels define a background layer, and in accordance with a determination that the virtual object is available for display behind the first set of pixels, generating a first composite image by inserting a virtual object layer containing the virtual object between the foreground layer and the background layer.

Description

INSERTING VIRTUAL OBJECTS IN BETWEEN TWO REAL OBJECTS IN AN AUGMENTED REALITY ENVIRONMENT

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 62/595,507, entitled“INSERTING VIRTUAL OBJECTS IN BETWEEN TWO REAL OBJECTS IN AN AUGMENTED REALITY ENVIRONMENT,” filed December 6, 2017, the content of which is hereby incorporated by reference for all purposes.

FIELD

[0002] The present disclosure generally relates to augmented reality, and more specifically, to computer-generated virtual components in an augmented reality environment.

BACKGROUND

[0003] In an augmented reality experience, users can interact with computer generated components in a real-world setting. For example, a user may observe and interact with a computer-generated virtual object that exists among various real-world objects in image frames that are captured by a camera. Such experiences may be created at a computer by overlaying the virtual object on top of the image frame.

SUMMARY

[0004] Below, various embodiments of the present invention are described to provide virtual objects in an augmented reality environment.

[0005] Example methods are disclosed herein. An example method includes, at an electronic device having a camera system, capturing, at the camera system, image content and depth information associated with the image content, distinguishing, based on the depth information, a first set of pixels from a second set of pixels in the image content, whereby the first set of pixels define a foreground layer and the second set of pixels define a background layer, and in accordance with a determination that the virtual object is available for display behind the first set of pixels, generating a first composite image by inserting a virtual object layer containing the virtual object between the foreground layer and the background layer

[0006] In some examples, the method includes determining whether the virtual object is available for display behind the first set of pixels. In some examples, the method includes causing displaying of the first composite image at a display screen, wherein a portion of the virtual object is occluded by at least a portion of the first set of pixels in the displayed first composite image.

[0007] In some examples, the method includes, in accordance with a determination that the virtual object is not available for display behind the first set of pixels, forgoing generating the first composite image. In some examples, the method includes, in accordance with a determination that the virtual object is not available for display behind the first set of pixels: determining that the virtual object is available for display in front of the first set of pixels; and generating a second composite image by overlaying the virtual object layer over both the foreground layer and the background layer. In some examples, the method includes causing displaying of the second composite image at a display screen, wherein the virtual object appears in front of the first set of pixels and is not occluded by the first set of pixels in the second composite image.

[0008] Further, in some examples, the method includes, in accordance with a determination that the virtual object is not available for display behind the first set of pixels: altering a transparency level of at least a portion of the first set of pixels in the foreground layer that overlaps with a location of the virtual object in the virtual object layer; generating a third composite image by inserting the virtual object layer containing the virtual object between the altered foreground layer and the background layer; and causing displaying of the third composite image at a display screen, wherein the virtual object appears in front of the first set of pixels and is not occluded by the first set of pixels in the third composite image.

[0009] In some examples methods, distinguishing the first set of pixels from the second set of pixels further comprises identifying, based on the depth information, pixels in the image content that are within a predetermined distance from the camera system as the first set of pixels; extracting the first set of pixels from the image content; and defining pixels remaining in the image content after the first set of pixels is extracted as the second set of pixels.

[0010] In some example methods, distinguishing the first set of pixels from the second set of pixels further comprises detecting, based on the depth information, a closest object in the image content; extracting pixels in the image content

corresponding to the closest object as the first set of pixels; and defining pixels remaining in the image content after the first set of pixels is extracted as the second set of pixels.

[0011] Further, in some example methods, the first set of pixels corresponds to a foreground object detected in the image content; and the second set of pixels corresponds to one or more background objects. In some examples, the method further comprises determining a display size of the virtual object based on at least one of a size of the foreground object and a size of the one or more background objects.

In some examples, the electronic device further comprises the display screen, and the method further comprises displaying the combined virtual object, foreground layer, and background layer at the display screen.

[0012] In some embodiments, a computer readable storage medium stores one or more programs, and the one or more programs include instructions, which when executed by an electronic device with a camera system, cause the device to perform any of the methods described above and herein.

[0013] In some embodiments, an electronic device, includes a camera system, one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of described above and herein.

[0014] In some embodiments, an electronic device includes a camera system and means for performing any of the methods described above and herein. BRIEF DESCRIPTION OF THE FIGURES

[0015] The present application can be best understood by reference to the figures described below taken in conjunction with the accompanying drawing figures, in which like parts may be referred to by like numerals.

[0016] FIG. 1 A depicts a front view of an example electronic device that implements various embodiments of the present invention

[0017] FIG. 1B depicts a back view of an example electronic device that implements various embodiments of the present invention

[0018] FIG. 2A depicts an example of an augmented reality environment with the virtual object not inserted between real-world objects.

[0019] FIG. 2B depicts an example of an augmented reality environment with the virtual object inserted between two real-world objects, in accordance with various embodiments of the present invention.

[0020] FIG. 3 A depicts an example real-world foreground objects layer and an example real-world background objects layer, in accordance with various

embodiments of the present invention.

[0021] FIG. 3B depicts an example virtual objects layer inserted between the real- world foreground objects layer and the real-world background objects layer of FIG.

3 A, in accordance with various embodiments of the present invention.

[0022] FIG. 4 depicts an example method for inserting virtual objects between two real-world objects, in accordance with various embodiments of the present invention.

[0023] FIG. 5 depicts a computer system, such as a smart device, that may be used to implement various embodiments of the present invention.

[0024] FIG. 6A depicts a screenshot of an example of a view of an AR environment displayed on a display of a smart device in accordance with various embodiments of the present invention. [0025] FIG. 6B depicts a screenshot of an example of an updated view of the AR environment of Fig. 6A in accordance with various embodiments of the present invention.

[0026] FIG. 6C depicts a screenshot of an alternative example of an updated view of the AR environment of Fig. 6A in accordance with various embodiments of the present invention.

[0027] FIG. 6D depicts a screenshot of an alternative example of an updated view of the AR environment of Fig. 6A in accordance with various embodiments of the present invention.

[0028] FIG. 6E depicts a screenshot of a prior art of a view of an AR environment displayed on a display of an electronic device in accordance with various

embodiments of the present invention.

[0029] FIG. 7 A depicts an example of virtual object layers inserted between real- world object layers for displaying virtual objects within the AR environment of Fig. 6B or Fig. 6C in accordance with various embodiments of the present invention.

[0030] FIG. 7B depicts an example of virtual object layers inserted between real- world object layers for displaying virtual objects within the AR environment of Fig. 6D in accordance with various embodiments of the present invention.

[0031] FIG. 8A depicts an example flow chart showing a process of incorporating and displaying virtual objects in an AR environment in accordance with various embodiments of the present invention.

[0032] FIG. 8B depicts an alternative example flow chart showing a process of incorporating and displaying virtual objects in an AR environment in accordance with various embodiments of the present invention.

DETAILED DESCRIPTION

[0033] The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the present technology. Thus, the disclosed technology is not intended to be limited to the examples described herein and shown, but is to be accorded the scope consistent with the claims.

[0034] Referring to FIGS. 1 A-1B, a front view and a back view, respectively, of smart device 100 which can be utilized to implement various embodiments of the present technology is shown. In some examples, smart device 100 is a smart phone or tablet computing device. However, it is noted that that the embodiments described herein are not limited to performance on a smart device, and can be implemented on other types of electronic devices, such as wearable devices, computers, or laptop computers.

[0035] As shown in FIG. 1 A, a front side of the smart device 100 includes a display screen, such as a touch sensitive display 102, a speaker 122, and a front-facing camera 120. The touch-sensitive display 102 can detect user inputs received thereon, such as a number and/or location of finger contact(s) on the screen, contact duration, contact movement across the screen, contact coverage area, contact pressure, and so on. Such user inputs can generate various interactive effects and controls at the device 100. In some examples, the front-facing camera 120 faces the user and captures the user’s movements, such as hand or facial images or movements, which are registered and analyzed for gesture recognition as described herein. The touch-sensitive display 102 and speaker 122 further promote user interaction with various programs at the device, such as by detecting user inputs while displaying visual effects on the display screen and/or while generating verbal communications or sound effects from the speaker 122.

[0036] FIG. 1B shows an example back view of the smart device 100 having a back-facing camera 124. In some examples, the back-facing camera 124 captures images of an environment or surrounding, such as a room or location that the user is in or observing. In some examples, smart device 100 shows such captured image data as a background to an augmented reality experience displayed on the display screen.

In some examples, the back-facing camera 124 captures the user’s movements, such as hand or facial images or movements, which are registered and analyzed for gesture recognition as described herein. Optionally, smart device 100 includes a variety of other sensors and/or input mechanisms to receive user and environmental inputs, such as microphones (which is optionally integrated with speaker 122),

movement/orientation sensors (e.g., one or more accelerometers, gyroscopes, digital compasses), depth sensors (which are optionally part of front-facing camera 120 and/or back-facing camera 124), and so on. In some examples, smart device 100 is similar to and includes some or all of the components of computing system 500 described below in FIG. 5. In some examples, the present technology is performed at a smart device having image and depth sensors, such as a front-facing camera with depth-sensing capability (e.g., front-facing camera 120) and/or a back-facing camera with depth sensing capability (e.g., back-facing camera 124).

[0037] Various embodiments of the present invention provide systems and methods for inserting one or more virtual objects between two or more real-world objects in an augmented reality environment. In practice, the present invention enhances user interaction with virtual objects by allowing virtual objects to be hidden by real-world objects when desirable in an augmented reality environment. For example, in some augmented reality (“AR”) applications, virtual objects inside the AR environment are positioned to appear on top of real-world objects (hereinafter also referred to as“real objects”), even when such virtual objects should be hidden or otherwise appear behind such real objects. In an example illustration at FIG. 2A, augmented reality environment 200 includes virtual object 102 displayed on top of real object 204, such as a graphic character that appears to be held in the user’s hand. In some cases, the virtual object 202 should be hidden below the real object 204, but due to limitations of some AR displaying technologies, the virtual object 202 still appears on top of the real object 104 as shown at FIG. 2A. This creates a confusing image for the observer of the AR environment. Such limitations may be associated with AR technologies that treat the real object 204 and its surrounding real-world background as a single layer image that is captured by a camera.

[0038] In the present invention, systems and methods are described for rendering the virtual object to appear between real objects, such as between the real object 204 and one or more background objects that are behind the real object 204. As discussed below, the present disclosure is directed to inserting the virtual object between two real objects by employing Digital Depth Information (“DDF, hereinafter also referred to as“depth information”) that is captured by camera systems, such as cameras having depth sensors. Such camera systems may be found in smart devices, including smart phones and tablets. Capturing the DDI allows software applications to extract the foreground object from the background object, such as the rest of the captured image frame, such that the foreground and background of an image can be separated into two layers. As discussed below, the virtual object can be inserted as a middle layer between the two layers to achieve a more desirable display in situations where the virtual object is intended to be hidden by the foreground object. For example, as shown at FIG. 2B, augmented reality environment 210 provides virtual object 212 between foreground object 214 and background object 216, such that the graphic character appears behind the user’s hand and on top of the floor in the background. Rendering AR environments such as FIG. 2B by using the systems and methods discussed herein provides a more realistic perception to an observer or player of the AR application or game, and enhances user interaction with the virtual object and overall user experience.

[0039] Turning to FIGS. 3A-3B, examples illustrating the methods described herein are shown. As shown at FIG. 3 A, systems and methods described herein may be deployed at a smart device, such as device 300 having a display screen 302 and a camera system capable of providing depth information, such as a depth-sensing camera that captures a scene as an image frame along with depth data or information associated with the pixels in the image frame. Such camera systems may include a front-facing camera having a lens facing toward the user and/or a rear-facing camera having a lens facing away from the user and capturing a real-world scene including the user’s hand against a background, as shown at FIG. 3 A. The captured image frame with depth information, also referred to herein as“frame with DDI” or“image content,” may be displayed at the display screen 302. Additionally and/or

alternatively, the image content may be displayed at an external display screen or goggles in operative communication with the device 300. The captured image content can include still image frames, a plurality of image frames corresponding to a previously recorded video, a plurality of image frames captured in real-time and corresponding to a live video, and so on, for generating an augmented reality scene. [0040] As shown at FIG. 3 A, objects in the image content can be extracted, separated, or otherwise distinguished from one another based on depth information associated with their corresponding pixels. For instance, an object can be identified and separated from the background based on its corresponding pixels having depth information that is within a particular depth. In some examples, the particular depth is a predetermined threshold depth or distance from the camera system. In some examples, the particular depth is determined by detecting a closest object from the camera system and distinguishing the closest object from the rest of the frame.

Further in some examples, pixels within the particular depth represent one or more foreground objects, while remaining pixels in the rest of the frame are regarded as background objects. As illustrated at FIG. 3A, two sets of pixels are separated and displayed as two layers, such as background layer 304 and foreground objects layer 306. The foreground objects layer 306 includes foreground object 308, in this case the user’s hand, which is within the particular depth and therefore extracted from the background layer 304 as a foreground layer 306 overlaying the background layer 304. In some examples, portions of the foreground layer 306 surrounding the foreground object 308 are blank and/or transparent. In this way, device 300 programmatically treats foreground objects and background objects in the image content as two separate layers of pixels. In some examples, additional objects corresponding to the same or different threshold depths and/or layers can be identified and overlaid.

[0041] Turning to FIG. 3B, device 300 determines whether there is a virtual object available in the AR application or game and whether it should be displayed behind or in front of the foreground object 308 and/or the foreground objects layer 306. In some examples, the device 300 makes the determination in response to activating the camera system, capturing the image content, launching the AR application, and/or in response to a recognized object or gesture that is captured by the camera system. In some examples, when a virtual object is determined to be available for display behind the foreground object 308, device 300 programmatically displays a virtual objects layer 310 containing the virtual object 312 by inserting the layer 310 behind foreground layer 306 and in front of background layer 304. For instance, as demonstrated at FIG. 3B, the virtual objects layer 310 is inserted between the foreground objects layer 306 of pixels and background layer 304 of pixels. In FIG.

3B, the virtual object 312 is a monster in an AR environment of a gaming application and the space surrounding the virtual objects layer 310 is generally transparent so as not to interfere with display of objects in the background layer 304. Other virtual objects may be contemplated, such as animations, graphics, text, and so on.

[0042] In some examples, device 300 determines how the virtual object 312 should be positioned or displayed relative to the foreground object 308 based on a detected gesture or type of foreground object 308 that is captured in the image content. For example, device 300 discerns whether the user is picking up the monster or patting the monster and displays the monster in front of or behind the user’s hand, respectively. Merely by way of example, device 300 determines whether the foreground set of pixels corresponding to the foreground object 308 matches an object in a database, such as a gestures database, and based on the matching or lack thereof, determines placement of the virtual object relative to the foreground object.

[0043] Further in some examples, if device 300 determines that virtual object 312 is intended for positioning in front of the foreground object 308, device 300 displays the virtual objects layer 310 overlaying both the foreground objects layer 306 and background objects layer 304. In some examples, in accordance with a determination that no pixels in the image content satisfy the particular depth, device 300 forgoes splitting the image content into separate layers and simply overlays the virtual object 312 on top of the captured image content. In some examples, device 300 provides the virtual objects layer 310 between the foreground objects layer 306 and background layer 304 and alters a transparency level of certain pixels corresponding to the foreground object 308 to be opaque or transparent in accordance with whether the virtual object 312 should be displayed in front of or behind the foreground object 308. In some examples, device 300 tracks the foreground object 308 and determines in real-time whether the foreground object 308 should be in front of or behind the virtual object 312, and adjusts ordering and displaying of the layers accordingly.

[0044] Turning now to FIG. 4, an example method 400 is shown for inserting a virtual object between two real objects in an augmented reality environment, as discussed in the various examples described herein. Method 400 may be performed by an electronic device (e.g., device 300) having a camera system capable of obtaining depth information, such as a portable smart device having a camera and depth sensors. As shown at FIG. 4, method 400 includes capturing, at the camera system, image content and depth information associated with the image content (block 402).

[0045] Method 400 includes distinguishing, based on the depth information, a first set of pixels from a second set of pixels in the image content, wherein the first set of pixels define a foreground layer (e.g., foreground objects layer 306) and the second set of pixels define a background layer (e.g., background layer 304) (block 404). In some examples, method 400 includes identifying pixels in the image content that are within a predetermined distance from the camera system as the first set of pixels, extracting the first set of pixels, and defining pixels remaining in the image content as the second set of pixels (block 406). In some examples, method 400 includes detecting a closest object (e.g., foreground object 308) in the image content, extracting pixels corresponding to the closest object as the first set of pixels, and defining pixels remaining in the image content as the second set of pixels (e.g., background layer 304).

[0046] Method 400 includes, in accordance with a determination that a virtual object (e.g., virtual object 312) is available for display behind the first set of pixels, generating a first composite image by inserting a virtual object layer (e.g., virtual objects layer 310) containing the virtual object (e.g., virtual object 312) between the foreground layer (e.g., foreground objects layer 306) and the background layer (e.g., background layer 304) (block 410). In some examples, method 400 includes determining whether the virtual object (e.g., virtual object 312) is available for display behind the first set of pixels (e.g., foreground object 308) (block 412). In some examples, method 400 includes causing displaying of the first composite image at a display screen (block 414). In some examples, method 400 includes, in accordance with a determination that the virtual object (e.g., virtual object 312) is not available for display behind the first set of pixels (e.g., foreground object 308), forgoing generating the first composite image (block 416). In some examples, method 400 includes, in accordance with a determination that the virtual object (e.g., virtual object 312) is not available for display behind the first set of pixels (e.g., foreground object 308): determining that the virtual object is available for display in front of the first set of pixels, and generating a second composite image by overlaying the virtual object layer over both the foreground layer (e.g., foreground objects layer 306) and the background layer (e.g., background layer 304) (block 418). In some examples, method 400 includes causing displaying of the second composite image at a display screen (e.g., display screen 302) (block 420).

[0047] Additionally and/or alternatively, method 400 includes, in accordance with a determination that the virtual object (e.g., virtual object 312) is not

available for display behind the first set of pixels (e.g., foreground object 308):

altering a transparency level of at least a portion of the first set of pixels in the foreground layer (e.g., foreground objects layer 306) that overlaps with a

location of the virtual object in the virtual object layer (e.g., virtual objects layer 310); generating a third composite image by inserting the virtual object layer containing the virtual object between the altered foreground layer and the background layer (e.g., background layer 304); and causing displaying of the third composite image at a display screen (e.g., display screen 302), wherein the virtual object appears in front of the first set of pixels and is not occluded by the first set of pixels in the third composite image.

[0048] In some examples, the first set of pixels corresponds to a foreground object (e.g., foreground object 308) detected in the image content; and the

second set of pixels corresponds to one or more background objects. In some examples, method 400 includes determining a display size of the virtual object (e.g., virtual object 312) based on at least one of a size of the foreground object (e.g., foreground object 308) and a size of the one or more background objects in the background layer 304. In some examples, the electronic device (e.g., device 300) further comprises the display screen (e.g., display screen 302), and method 400 further comprises displaying the combined virtual object (e.g., virtual object 312), foreground layer (e.g., foreground objects layer 306), and background layer (e.g., background layer 304) at the display screen.

[0049] Turning now to FIG. 5, components of an exemplary computing system 500, configured to perform any of the above-described processes and/or operations are depicted. For example, computing system 500 may be used to implement the smart device described above that implements any combination of the above embodiments. Computing system 500 may include, for example, a processor, memory, storage, and input/output peripherals (e.g., display, keyboard, stylus, drawing device, disk drive, Internet connection, camera/scanner, microphone, speaker, etc.). However, computing system 500 may include circuitry or other specialized hardware for carrying out some or all aspects of the processes.

[0050] In computing system 500, the main system 502 may include a motherboard 504 with a bus that connects an input/output (I/O) section 506, one or more microprocessors 508, and a memory section 510, which may have a flash memory card 512 related to it. Memory section 510 may contain computer-executable instructions and/or data for carrying out the techniques and algorithms described above. The I/O section 506 may be connected to display 524, a keyboard 514, a cam era/ scanner 526 (e.g., to detect objects for recognition, depth information, and capture video/image frames), a microphone 528, a speaker 530, a disk storage unit 516, and a media drive unit 518. The media drive unit 518 can read/write a non- transitory computer-readable storage medium 520, which can contain programs 522 and/or data used to implement process 200 and/or process 400.

[0051] Additionally, a non-transitory computer-readable storage medium can be used to store (e.g., tangibly embody) one or more computer programs for performing any one of the above-described processes by means of a computer. The computer program may be written, for example, in a general-purpose programming language (e.g., Pascal, C, C++, Java, or the like) or some specialized application-specific language.

[0052] Computing system 500 may include various sensors, such as front facing camera 530, back facing camera 532, compass 534, accelerometer 536, gyroscope 538, and/or touch-sensitive surface 540. Other sensors may also be included, such as one or more depth sensors associated with cameras 530, 532.

[0053] While the various components of computing system 500 are depicted as separate in FIG. 5, various components may be combined together. For example, display 524 and touch sensitive surface 540 may be combined together into a touch- sensitive display.

[0054] Fig. 6A depicts a screenshot of an example of a view of an AR environment displayed on a display of an electronic device. In some examples, a view (view 670A) of an AR environment is displayed on touch sensitive display 602 of smart device 600. The AR environment has a background (AR background) based on image data captured by back facing camera 624. For instance, as depicted in Fig. 6A, a user puts his / her hand 628 on top of table 630 in real-world environment. Table 630 is positioned on carpet 634. Plant 636 is also positioned on carpet 634 and against wall 632. Hand 628, table 630, wall 632 and carpet 634 are in a field of view of back facing camera 624, all of which are captured by back facing camera 624 as image data. The captured image data includes image of hand 628 (hand 628’), image of table 630 (table 630’), image of carpet 634 (carpet 634’), image of plant 636 (plant 636’) and image of wall 632 (wall 632’).

[0055] View 670A includes hand 628’ positioned on top of table 630’. Table 630’is positioned on carpet 634’. Plant 636’ is also positioned on carpet 634’ and against wall 632’. The captured image data contains depth information associated with hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’ (pixels for each of hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’ have corresponding depth information). The depth information relates to a depth or distance between back- facing camera 624 and each of hand 628, table 630, wall 632, carpet 634 and plant 636.

[0056] In one embodiment, at least two virtual objects (first virtual monster 640 and second virtual monster 642) are configured to appear in the AR environment. First virtual monster 640 is intended to appear on top of table 630’ and second virtual monster 642 is intended to be partially or fully hidden by table 630’ in the AR environment. In some examples, due to limitations of some AR displaying

technologies, virtual monster 642 still appears on top of table 630’ as depicted in Fig. 6E (prior art). This creates a confusing image for a user or an observer of the AR environment.

[0057] Fig. 6B depicts a screenshot of an example of an updated view of the AR environment of Fig. 6A. In some examples, the present invention is embodied in an AR application. Smart device 600 runs the AR application. First virtual monster 640 and second virtual monster 642 are configured to appear in the AR environment. First virtual monster 640 is intended to appear on top of table 630’ and second virtual monster 642 is intended to be partially hidden by table 630’ in the AR environment. View 670A is updated to View 670B. View 670B, including first virtual monster 640, second virtual monster 642, hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’, is displayed on touch sensitive display 602.

[0058] Fig. 6C depicts a screenshot of an alternative example of an updated view of the AR environment of Fig. 6A. In some examples, first virtual monster 640 is intended to appear on top of table 630’ and second virtual monster 642 is intended to be fully hidden by table 630’ (second virtual monster 642 is positioned under table 630’) in the AR environment. View 670A is updated to View 670C. View 670C, including first virtual monster 640, hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’, is displayed on touch sensitive display 602. Second virtual monster 642 is not included in View 670C (is not viewed by the user) because it is hidden by table 630’.

[0059] In one variant, first virtual monster 640, second virtual monster 642 and third monster 644 are configured to appear in the AR environment as depicted in Fig. 6D. In some examples, first virtual monster 640 is intended to appear on top of table 630’, second virtual monster 642 is intended to be partially hidden by table 630’ and third virtual monster 644 is intended to be partially (or even fully) hidden by plant 636’ in the AR environment. View 670A of Fig. 6A is updated to View 670D. View 670D, including first virtual monster 640, second virtual monster 642, third virtual monster 644, hand 628’, table 630’, wall 632’ carpet 634’ and plant 636’, is displayed on touch sensitive display 602.

[0060] FIG. 7 A depicts an example of virtual object layers inserted between real- world object layers for displaying virtual objects within the AR environment of Fig. 6B or Fig. 6C. In one embodiment, at two least virtual objects (first virtual monster 640 and second virtual monster 642) are configured to appear in the AR environment, which contains hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’(hand 628’, table 630’, wall 632’ and carpet 634’ are corresponding to hand 628, table 630, wall 632, carpet 634 and plant 636 in the real-world environment respectively). First virtual monster 640 and second virtual monster 642 are configured to appear at predetermined positions in the AR environment respectively, relative to back-facing camera 624. [0061] For example, first virtual monster 640 is configured to appear at a first predetermined position which is 25 cm from back-facing camera 624 in the AR environment. Second virtual monster 642 is configured to appear at a second predetermined position which is 50 cm from back-facing camera 624 in the AR environment. In this case, first virtual monster 640 is overlaid on first virtual object layer 682 and second virtual monster 642 is overlaid on second virtual object layer 686. Both first virtual object layer 682 and second virtual object layer 686 are configured to be inserted in the AR environment.

[0062] In one example, hand 628 is positioned at 10 cm from back-facing camera 624. Table 630 is positioned at 40 cm from back-facing camera 624. Plant 636’ is positioned at 60 cm from back-facing camera 624. Wall 632’ and carpet 634’ are positioned at 80 cm from back-facing camera.

[0063] Based on the depth information included in the captured image data, smart device 600 determines whether one or more objects belong a group of foreground objects. If the one or more objects have a depth within a predetermined threshold depth or distance from back-facing camera 624, the one or more objects will belong to the group of foreground objects. For instance, the predetermined threshold depth or distance from back-facing camera 624 is 20 cm. Hand 628 is positioned at 10 cm from back-facing camera 624. Therefore, hand 628 belongs to the group of foreground objects. Hand 628’ is extracted from the captured image data and is overlaid onto foreground object layer 680. Foreground object layer 680 including hand 628’ is inserted before first virtual object layer 682, relative to back-facing camera 624. There is no limitation on the predetermined threshold depth or distance. The predetermine threshold depth or distance may be 12 cm, 15 cm or 18 cm from back-facing camera 624.

[0064] Smart device 600 will then determine whether one or more objects belong to a first group of intermediate objects. If one or more objects are positioned between first virtual monster 640 and second virtual monster 642, the one or more objects will belong to the first group of intermediate objects. According to the depth information, table 630 is positioned at 40 cm from back-facing camera 624. Table 630’ is positioned between first virtual monster 640 and second virtual monster 642, with the result that table 630’ belongs to the first group of intermediate objects. Table 630’ is extracted from the captured image data and is overlaid onto first intermediate object layer 684. First intermediate object layer 684 including table 630’ is inserted between first virtual object layer 682 and second virtual object layer 686.

[0065] Smart device 600 will then determine whether one or more objects belong a group of background objects. If one or more objects are positioned behind second virtual monster 642, the one or more objects will belong to the group of background objects. According to the depth information, plant 636’, carpet 634’ and wall 632’ are positioned behind second virtual monster 642, with the result that plant 636’, carpet 634’ and wall 632’ belong to as the background objects. Plant 636’, carpet 634’ and wall 632’ are extracted from the captured image data and are overlaid onto background object layer 692. Background object layer 692 including plant 636’, wall 632’ and carpet 634’ is inserted behind second virtual object layer 686.

[0066] As depicted in Fig. 7A, foreground object layer 680 is configured to be overlaid on first virtual object layer 682, which, in turn, is configured to be overlaid on first intermediate object layer 684. First intermediate object layer 684 is configured to be overlaid on second virtual object layer 686, which, in turn, configured to be overlaid on background object layer 692. Based on such configuration, view 670B of Fig. 6B or view 670C of Fig. 6C is displayed on touch sensitive display 602 of smart device 600.

[0067] In some examples, for layers mentioned above, portions of layers surrounding virtual objects and portions of layers surrounding images of. real-world objects are blank and/or transparent. As depicted in Fig. 6B, second virtual monster 642 is partially hidden by table 630’ in the AR environment. In this case, second virtual monster 642 is located at a first position on second virtual object layer 686, where at least one portion of second virtual monster 642 is occluded by table 630’ (located on first intermediate object layer 684), when first intermediate object layer 684 is overlaid on second virtual object layer 686. The remaining portion of second virtual monster 642 is able to be viewed by the user via touch sensitive display 602 because the remaining portion of second virtual monster 642 is positioned at the blank and/or transparent portions of first intermediate object layer 684, first virtual object layer 682 and foreground object layer 680. [0068] In other examples, as depicted in Fig. 6C, second virtual monster 642 is fully hidden by table 630’ in the AR environment. In this case, second virtual monster 642 is located at a second position on second virtual object layer 686, where second virtual monster 642 in full is blocked by table 630’ (located on first intermediate object layer 684), when first intermediate object layer 684 is overlaid on second virtual object layer 686. The user is not able to view second virtual monster 642 via touch sensitive display 602 because second virtual monster 642 is fully occluded by table 630’.

[0069] FIG. 7B depicts an example of virtual object layers inserted between real- world object layers for displaying virtual objects within the AR environment of Fig. 6D. In some examples, first virtual monster 640 is configured to appear at a first predetermined position which is 25 cm from back-facing camera 624 in the AR environment. Second virtual monster 642 is configured to appear at a second predetermined position which is 50 cm from back-facing camera 624 in the AR environment. Third virtual monster 644 is configured to appear at a third

predetermined position which is 70 cm from back-facing camera 624.

[0070] In this case, first virtual monster 640 is overlaid on first virtual object layer 682, second virtual monster 642 is overlaid on second virtual object layer 686 and third virtual monster 644 is overlaid on third virtual layer 690. First virtual object layer 682, second virtual object layer 686 and third virtual object layer 690 are configured to be inserted in the AR environment. As mentioned above, foreground object layer 680 including hand 628’ before first virtual object layer 682. First intermediate object layer 684 including table 630’ is inserted between first virtual object layer 682 and second virtual object layer 686.

[0071] Smart device 600 will then determine whether one or more objects belong as a second group of intermediate objects. If one or more objects are positioned between second virtual monster 642 and third virtual monster 644, the one or more objects will belong to the second group of intermediate objects. According to the depth information, plant 636 is positioned at 60 cm from back-facing camera 624. Plant 636’is positioned between second virtual monster 642 and third virtual monster 644, with the result that plant 636’ belong to the second group of intermediate objects.

Plant 636’ is extracted from the captured image data and is overlaid onto second intermediate object layer 688. Second intermediate object layer 688 including plant 636’ is inserted between second virtual object layer 686 and third virtual object layer 690.

[0072] Both carpet 634’ and wall 632’ are positioned behind third virtual monster 644, with the result that carpet 634’ and wall 632’ belong to the group of background objects. Carpet 634’ and wall 632’ are extracted from the captured image data and are overlaid onto background object layer 692. Background object layer 692 including both wall 632’ and carpet 634’ is inserted behind third virtual object layer 690.

[0073] The number of intermediate layers being inserted between the foreground layer and background layer depends on the number of virtual objects configured to appear in the AR environment. Merely by way of example, when there are four virtual objects (with different distances from back-facing camera 624) configured to appear in the AR environment, first, second and third intermediate layers may be inserted between the foreground layer and the background layer. According to the various embodiments of the present invention, one or more intermediate layers may be inserted between the foreground layer and the background layer.

[0074] Turning now to FIG. 8A, an example process 800A is shown for displaying virtual monsters in the AR environment of Figs 6A - 6C and Fig.7A on smart device 600. In some examples, Process 800A includes capturing image data via a camera (back-facing camera 624 of smart device 600). (Block 801). In the real-world environment, hand 628, table 630, wall 632, carpet 634 and plant 636 are within the field of view of back-facing camera 624 and are captured by back-facing camera 624 as image data. The captured image data contains depth information associated with each of hand 628, table 630, wall 632, carpet 634 and plant 636. Based on the depth information, smart device 600 is able to determine a depth or distance between back- facing camera 624 and each of hand 628, table 630, wall 632, carpet 634 and plant 636.

[0075] A view (View 670A) of an AR environment is displayed on touch sensitive display 602 of smart device 600 (Block 802). The AR environment has a background (AR background) based on the captured data. View 670A includes hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’. In some examples, at least two virtual objects are configured to appear in the AR environment. Smart device 600 identifies predetermined positions of the at least two virtual objects, relative to back-facing 624 (Block 803). For example, the at least two virtual objects include first virtual monster 640 and second virtual monster 642. First virtual monster 640 is configured to appear at a first predetermined position which is 25 cm from back-facing camera 624 in the AR environment. Second virtual monster 642 is configured to appear at a second predetermined position which is 50 cm from back-facing camera 624 in the AR environment. First virtual monster 640 is overlaid on first virtual object layer 682 and second virtual monster 642 is overlaid on second virtual object layer 686 (Block 804).

[0076] Based on the depth information, smart device 600 determines whether one or more objects belong to a group of foreground objects (Block 805). If one or more objects have a depth within a predetermined threshold depth or distance from back- facing camera 624, the one or more objects will belong to the group of foreground objects. For instance, the predetermined threshold depth or distance from back-facing camera 624 is 20 cm. Hand 628 is positioned at 10 cm from back-facing camera 624. Therefore, hand 628 belongs to the group of foreground objects. Hand 628’ is extracted from the captured image data and is overlaid onto foreground object layer 680 (Block 806). At Block 805, if no foreground objects are identified, Block 807 will be performed.

[0077] Smart device 600 will then determine whether one or more objects belong to a first group of intermediate objects (Block 808). If one or more objects are positioned between first virtual monster 640 and second virtual monster 642, the one or more objects will belong to the first group of intermediate objects. According to the depth information, table 630 is positioned at 40 cm from back-facing camera 624. Table 630’ is positioned between first virtual monster 640 and second virtual monster 642, with the result that table 630’ belongs to the first group of intermediate objects. Table 630’ is extracted from the captured image data and is overlaid on first intermediate object layer 684 (Block 809). At Block 808, if no intermediate objects are identified, Block 810 will be performed.

[0078] Smart device 600 will then determine whether one or more objects belong to a group of background objects (Block 811). If one or more objects are positioned behind second virtual monster 642, the one or more objects will belong to the group of background objects. According to the depth information, plant 636’ wall 632’ and carpet 634’ are positioned at more than 50cm from back-facing camera 624. Plant 636’, wall 632’ and carpet 634’ are positioned are behind second virtual monster 642, with the result that Plant 636’, wall 632’ and carpet 634’ belong to the group of background objects. Plant 636’, wall 632’ and carpet 634’ are extracted from the captured image data and are overlaid on background object layer 692 (Block 812). At Block 811, if no background objects are identified, Block 813 will be performed.

[0079] At Block 814, as depicted in Fig. 7A, foreground object layer 680 is inserted before first virtual object layer 682. First intermediate object layer 684 is inserted between first virtual object layer 682 and second virtual object layer 686. Background layer 692 is inserted behind second virtual object layer 686.

[0080] At Block 815, view 670A will be updated to view 670B of Fig. 6B or view 670C of Fig. 6C. For Blocks 807, 810 and 813, smart device 600 may quit the AR application, perform Block 801, perform Block 815 or turn off back-facing camera 624.

[0081] In one variant, as depicted in FIG. 8B, an alternative example process 800B is shown for displaying virtual monsters in the AR environment of Fig. 6D and Fig.7B on smart device 600. In some examples, Process 800B includes capturing image data via back-facing camera 624 (Block 821). In the real-world environment, hand 628, table 630, wall 632, carpet 634 and plant 636 are within the field of view of back- facing camera 624 and are captured by back-facing camera 624 as image data. The captured image data contains depth information associated with each of hand 628, table 630, wall 632, carpet 634 and plant 636. Based on the depth information, smart device 600 is able to determine a depth or distance between back-facing camera 624 and each of hand 628, table 630, wall 632, carpet 634 and plant 636.

[0082] A view (View 670A) of an AR environment is displayed on touch sensitive display 602 of smart device 600 (Block 822). View 670A includes hand 628’, table 630’, wall 632’, carpet 634’ and plant 636’. In some examples, three virtual objects are configured to appear in the AR environment. Smart device 600 identifies predetermined positions of first virtual monster 640, second virtual monster 642 and third virtual monster 644 respectively, relative to back-facing 624 (Block 823). For example, first virtual monster 640 is configured to appear at a first predetermined position which is 20 cm from back-facing camera 624 in the AR environment. Second virtual monster 642 is configured to appear at a second predetermined position which is 50 cm from back-facing camera 624 in the AR environment. Third virtual monster 644 is configured to appear at a third predetermined position which is 70 cm from back-facing camera 624 in the AR environment. First virtual monster 640 is overlaid on first virtual object layer 682, second virtual monster 642 is overlaid on second virtual object layer 686 and third virtual monster 644 is overlaid on third virtual object layer 690 (Block 824).

[0083] Based on the depth information, smart device 600 determines whether one or more objects belong to a group of foreground objects (Block 825). For instance, the predetermined threshold depth or distance from back-facing camera 624 is 20 cm. Hand 628 is positioned at 10 cm from back-facing camera 624. Therefore, hand 628 belongs to the group of foreground objects. Hand 628’ is extracted from the captured image data and is overlaid onto foreground object layer 680 (Block 826). At Block 825, if no foreground objects are identified, Block 827 will be performed.

[0084] Smart device 600 will then determine whether one or more objects belong to a first group of intermediate objects (Block 828). If one or more objects are positioned between first virtual monster 640 and second virtual monster 642, the one or more objects will belong to the first group of intermediate objects. According to the depth information, table 630 is positioned at 40 cm from back-facing camera 624. Table 630’ is positioned between first virtual monster 640 and second virtual monster 642, with the result that table 630’ belongs to the first group of intermediate objects. Table 630’ is extracted from the captured image data and is overlaid on first intermediate object layer 684 (Block 829). At Block 828, if no intermediate objects are identified, Block 830 will be performed.

[0085] Smart device 600 will then determine whether one or more objects belong to a second group of intermediate objects (Block 831). If one or more objects are positioned between second virtual monster 642 and third virtual monster 644, the one or more objects will belong to the second group of intermediate objects. According to the depth information, plant 636 is positioned at 60 cm from back-facing camera 624. Plant 636’ is positioned between second virtual monster 642 and third virtual monster 644, with the result that plant 636’ belongs to the second group of intermediate objects. Plant 636’ is extracted from the captured image data and is overlaid on second intermediate object layer 688 (Block 832). At Block 831 if no intermediate objects are identified, Block 833 will be performed.

[0086] Smart device 600 will then determine whether one or more objects belong to a group of background objects (Block 834). If one or more objects are positioned behind third virtual monster 644, the one or more objects belong to the group of background objects. According to the depth information, wall 632’ and carpet 634’ are positioned at 80 cm from back-facing camera 624. Wall 632’ and carpet 634’ are positioned are behind third virtual monster 644, with the result that wall 632’ and carpet 634’ belong to the group of background objects. Wall 632’ and carpet 634’ are extracted from the captured image data and are overlaid on background object layer 692 (Block 835). At Block 834 if no background objects are identified, Block 838 will be performed.

At Block 836, as depicted in Fig. 7B, foreground object layer 680 is inserted before first virtual object layer 682. First intermediate object layer 684 is inserted between first virtual object layer 682 and second virtual object layer 686. Second intermediate object layer 688 is inserted between second virtual object layer 686 and third virtual object layer 690. Background layer 692 is inserted behind third virtual object layer 690.

[0087] At Block 837, view 670A will be updated to view 670D of Fig. 6D. For Blocks 827, 830, 833 and 838, smart device 600 may quit the AR application, perform Block 821, perform Block 837 or turn off back-facing camera 624.

[0088] Some non-limiting feature combinations of the present technology are described in the aspects below.

[0089] 1. A method, comprising:

at an electronic device having a camera system:

capturing, at the camera system, image content and depth information associated with the image content;

distinguishing, based on the depth information, a first set of pixels from a second set of pixels in the image content, wherein the first set of pixels define a foreground layer and the second set of pixels define a background layer; and in accordance with a determination that a virtual object is available for display behind the first set of pixels, generating a first composite image by inserting a virtual object layer containing the virtual object between the foreground layer and the background layer.

[0090] 2. The method of aspect 1, further comprising:

determining whether the virtual object is available for display behind the first set of pixels.

[0091] 3. The method of any of aspects 1-2, further comprising:

causing displaying of the first composite image at a display screen, wherein a portion of the virtual object is occluded by at least a portion of the first set of pixels in the displayed first composite image.

[0092] 4. The method of any of aspects 1-3, further comprising:

in accordance with a determination that the virtual object is not available for display behind the first set of pixels, forgoing generating the first composite image.

[0093] 5. The method of any of aspects 1-4, further comprising:

in accordance with a determination that the virtual object is not available for display behind the first set of pixels:

determining that the virtual object is available for display in front of the first set of pixels; and

generating a second composite image by overlaying the virtual object layer over both the foreground layer and the background layer.

[0094] 6. The method of aspect 5, further comprising:

causing displaying of the second composite image at a display screen, wherein the virtual object appears in front of the first set of pixels and is not occluded by the first set of pixels in the second composite image.

[0095] 7. The method of any of aspects 1-4, further comprising:

altering a transparency level of at least a portion of the first set of pixels in the foreground layer that overlaps with a location of the virtual object in the virtual object layer; generating a third composite image by inserting the virtual object layer containing the virtual object between the altered foreground layer and the background layer; and

causing displaying of the third composite image at a display screen, wherein the virtual object appears in front of the first set of pixels and is not occluded by the first set of pixels in the third composite image.

[0096] 8. The method of any of aspects 1-7, wherein distinguishing the first set of pixels from the second set of pixels further comprises:

identifying, based on the depth information, pixels in the image content that are within a predetermined distance from the camera system as the first set of pixels;

extracting the first set of pixels from the image content; and

defining pixels remaining in the image content after the first set of pixels is extracted as the second set of pixels.

[0097] 9. The method of any of aspects 1-8, wherein distinguishing the first set of pixels from the second set of pixels further comprises:

detecting, based on the depth information, a closest object in the image content;

extracting pixels in the image content corresponding to the closest object as the first set of pixels; and

[0098] 10. The method of any of aspects 1-9, further wherein:

the first set of pixels corresponds to a foreground object detected in the image content; and

the second set of pixels corresponds to one or more background objects.

[0099] 11. The method of aspect 10, further comprising:

determining a display size of the virtual object based on at least one of a size of the foreground object and a size of the one or more background

objects.

[00100] 12. The method of any of aspects 3, 6, and 7, wherein the electronic device further comprises the display screen, the method further comprising:

displaying the combined virtual object, foreground layer, and

background layer at the display screen. [00101] 13. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device with a camera system, cause the device to perform any of the methods of aspects 1-12.

[00102] 14. An electronic device, comprising:

a camera system;

one or more processors;

memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of aspects 1-12.

[00103] 15. An electronic device, comprising:

a camera system; and

means for performing any of the methods of aspects 1-12.

[00104] Various exemplary embodiments are described herein. Reference is made to these examples in a non-limiting sense. They are provided to illustrate more broadly applicable aspects of the disclosed technology. Various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the various embodiments. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process act(s) or step(s) to the objective(s), spirit or scope of the various embodiments. Further, as will be appreciated by those with skill in the art, each of the individual variations described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the various embodiments.

Claims

CLAIMS What is claimed is:

1. A method, comprising:

at an electronic device having a camera system:

2. The method of claim 1, further comprising:

3. The method of claim 1, further comprising:

4. The method of claim 1, further comprising:

5. The method of claim 1, further comprising:

determining that the virtual object is available for display in front of the first set of pixels; and generating a second composite image by overlaying the virtual object layer over both the foreground layer and the background layer.

6. The method of claim 5, further comprising:

7. The method claim 1, further comprising:

altering a transparency level of at least a portion of the first set of pixels in the foreground layer that overlaps with a location of the virtual object in the virtual object layer;

generating a third composite image by inserting the virtual object layer containing the virtual object between the altered foreground layer and the background layer; and

8. The method of claim 1, wherein distinguishing the first set of pixels from the second set of pixels further comprises:

extracting the first set of pixels from the image content; and

9. The method of claim 1, wherein distinguishing the first set of pixels from the second set of pixels further comprises:

detecting, based on the depth information, a closest object in the image content; extracting pixels in the image content corresponding to the closest object as the first set of pixels; and

10. The method of claim 1, further wherein:

the second set of pixels corresponds to one or more background objects.

11. The method of claim 10, further comprising:

determining a display size of the virtual object based on at least one of a size of the foreground object and a size of the one or more background objects.

12. The method of claim 3, wherein the electronic device further comprises the display screen, the method further comprising:

displaying the combined virtual object, foreground layer, and background layer at the display screen.

13. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device with a camera system, cause the device to perform any of the methods of claims 1-12.

14. An electronic device, comprising:

a camera system;

one or more processors;

memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-12.

15. An electronic device, comprising:

a camera system; and

means for performing any of the methods of claims 1-12.