CN117333788A - Content conversion based on reflective object recognition - Google Patents

Content conversion based on reflective object recognition

Info

Publication number
CN117333788A
Authority
CN
China
Prior art keywords
user
electronic device
physical environment
virtual content
reflective object
Prior art date
Legal status
Pending
Application number
CN202310798444.3A
Other languages
Chinese (zh)
Inventor
横川裕
D. W. Chalmers
B. W. Temple
R. Nair
T. G. Salter
Current Assignee
Apple Inc
Original Assignee
Apple Inc
Priority date
Filing date
Publication date
Priority claimed from US 18/214,575 (published as US 2024/0005612 A1)
Application filed by Apple Inc filed Critical Apple Inc
Publication of CN117333788A


Classifications

    • G06V 20/20: Scenes; scene-specific elements in augmented reality scenes
    • A47G 1/02: Mirrors used as equipment
    • G02B 27/0172: Head-up displays; head mounted, characterised by optical features
    • G06F 3/012: Head tracking input arrangements
    • G06F 3/013: Eye tracking input arrangements
    • G06F 3/04815: Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G06F 3/04883: Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
    • G06T 19/006: Mixed reality
    • H04N 13/117: Transformation of image signals corresponding to virtual viewpoints, the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H04N 13/332: Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N 13/366: Image reproducers using viewer tracking
    • G02B 2027/0178: Head-up displays; head mounted; eyeglass type
    • G06F 2203/011: Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Abstract

The present invention relates to content conversion based on reflective object recognition. Various implementations disclosed herein include devices, systems, and methods for rendering virtual content based on detecting a reflective object and determining a three-dimensional (3D) position of the reflective object in a physical environment. For example, an exemplary process may include obtaining sensor data (e.g., images, sounds, motion, etc.) from a sensor of an electronic device in a physical environment that includes one or more objects. The process may also include detecting a reflective object among the one or more objects based on the sensor data. The process may also include determining a 3D position of the reflective object in the physical environment (e.g., where the plane of a mirror is located). The process may also include presenting the virtual content in a view of the physical environment. The virtual content may be positioned at a 3D location based on the 3D position of the reflective object.

Description

Content conversion based on reflective object recognition
Technical Field
The present disclosure relates generally to displaying content by an electronic device, and in particular, to systems and methods of presenting content in response to detecting reflective objects in a real-world physical environment.
Background
Electronic devices are often used to present views to a user that combine virtual content with content from the surrounding physical environment. It may be desirable to provide a means to effectively detect the appropriate times and locations at which to provide these views.
Disclosure of Invention
Various implementations disclosed herein include devices, systems, and methods for displaying virtual content based on detecting a mirror in a physical environment and determining a three-dimensional (3D) position of the mirror (e.g., where the plane of the mirror is located). A mirror may be detected among one or more objects in a physical environment based on sensor data (e.g., images, depth data, sonar, motion data, position data, etc.). Mirror detection may involve object recognition using machine learning techniques, sonar detection, image detection, and the like. Mirror detection may involve detecting that a user is wearing a device, such as a Head Mounted Display (HMD), and standing in front of the mirror (e.g., based on the reflection, detecting the shape of the HMD, detecting Light Emitting Diodes (LEDs) on the HMD, etc.). Mirror detection may involve detecting a light source (e.g., LEDs around the frame of the mirror, infrared (IR) LEDs behind the mirror) or some other type of marker on or near the mirror (e.g., an augmented reality marker). Mirror detection techniques may include detecting and/or tracking facial regions over time. Mirror detection may involve comparing the position and relative angle of facial boundaries. Face tracking may involve tracking the face/head of a user wearing the HMD, and thus may involve tracking facial features and/or tracking HMD features (e.g., eye gaze detection, etc.). Determining the 3D position of the mirror may be based on comparing the reflected HMD position over time with the HMD position from associated motion data. Determining the 3D position of the mirror may be based on plane detection from sampled midpoints between the reflected device position and the position of the real head-mounted device. Determining the 3D position of the mirror may be based on checking whether the reflected HMD rotation is a mirrored rotation of the real HMD through the computed plane. These techniques may further involve determining a mirror boundary (e.g., a boundary rectangle) based on detecting differences between the reflection and the surrounding surface.
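As an illustration of the midpoint-and-plane idea mentioned above, the following sketch estimates a mirror plane from the 3D positions of the device and its detected reflection. It is a minimal example assuming those positions are already available (e.g., from object detection plus depth data); the function names and sample coordinates are illustrative, not part of the disclosure.

```python
import numpy as np

def estimate_mirror_plane(device_pos: np.ndarray, reflected_pos: np.ndarray):
    """Return (point_on_plane, unit_normal) for the mirror plane.

    The plane bisects the segment between the real device position and the 3D
    position of its reflection, with the normal along that segment.
    """
    midpoint = (device_pos + reflected_pos) / 2.0
    normal = reflected_pos - device_pos
    norm = np.linalg.norm(normal)
    if norm < 1e-6:
        raise ValueError("Device and reflection positions coincide")
    return midpoint, normal / norm

def reflect_point(point: np.ndarray, plane_point: np.ndarray, normal: np.ndarray):
    """Mirror a 3D point across the plane; used here to validate the estimate."""
    d = np.dot(point - plane_point, normal)
    return point - 2.0 * d * normal

# Example: HMD 1 m in front of a mirror whose plane is z = 0.
hmd = np.array([0.0, 1.6, 1.0])
reflection = np.array([0.0, 1.6, -1.0])
plane_point, plane_normal = estimate_mirror_plane(hmd, reflection)
assert np.allclose(reflect_point(hmd, plane_point, plane_normal), reflection)
```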
In some implementations, a context associated with use of the electronic device in the physical environment can be determined, and virtual content can be displayed and/or modified based on the determined context. For example, the device may detect that the user is in a situation in which the user would benefit from virtual content assistance (e.g., a particular application). Exemplary contexts in which virtual content (e.g., a particular application) may be triggered include: (a) user activity at a particular time of day (e.g., looking at a mirror for the first time in the morning may trigger display of a calendar application and a news application), (b) an activity the user is performing (e.g., applying makeup may trigger an enhancement or beauty application, and trying on new clothing may trigger a clothing application), (c) the user acting in a particular manner or issuing a verbal command (e.g., dancing to record a social media video), and/or (d) user proximity to a particular location, object, or person (e.g., if a detected mirror is in a gym, display a gym application to track progress, etc.).
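One simple way to express such context-to-content triggering is a rule table, sketched below; the context fields, thresholds, and application names are assumptions made only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Context:
    hour_of_day: int
    activity: str | None = None         # e.g., "makeup", "trying_on_clothes", "dancing"
    nearby_location: str | None = None  # e.g., "gym"

def select_virtual_content(ctx: Context) -> list[str]:
    """Map a detected context to the applications that might be surfaced on the mirror."""
    content: list[str] = []
    if 5 <= ctx.hour_of_day < 9:
        content += ["calendar", "news"]           # first mirror glance of the morning
    if ctx.activity == "makeup":
        content.append("beauty")                  # enhancement/beauty application
    elif ctx.activity == "trying_on_clothes":
        content.append("clothing")
    elif ctx.activity == "dancing":
        content.append("social_video_recorder")
    if ctx.nearby_location == "gym":
        content.append("workout_tracker")
    return content

print(select_virtual_content(Context(hour_of_day=6)))                          # ['calendar', 'news']
print(select_virtual_content(Context(hour_of_day=18, nearby_location="gym")))  # ['workout_tracker']
```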
In some implementations, virtual content may be provided in one or more different sets of views to improve the user experience (e.g., while wearing a head-mounted display (HMD)). Some implementations allow interactions with virtual content (e.g., application gadgets). In some implementations, a device (e.g., a handheld device, laptop, desktop, or HMD) provides a view (e.g., a visual experience and/or an auditory experience) of a 3D environment to a user and uses sensors to obtain physiological data (e.g., gaze characteristics) and motion data (e.g., a controller moving an avatar, head movement, etc.) associated with the user's responses. Based on the obtained physiological data, the techniques described herein may determine vestibular cues of a user during viewing of a 3D environment (e.g., an extended reality (XR) environment) by tracking gaze characteristics and other interactions of the user (e.g., user movements in the physical environment). Based on the vestibular cues, the techniques may detect interactions with virtual content and provide different sets of views to improve the user experience when viewing the 3D environment. Virtual content that appears to be positioned on (or relative to) a mirror may be positioned at or based on the 3D location of the reflective object (e.g., the mirror). The virtual content may also be interactive such that a user may change the size of an application, move an application, select one or more selectable icons in an application, close an application, etc.
In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include, at an electronic device having a processor and a sensor, the acts of: obtaining sensor data from a sensor of an electronic device in a physical environment including one or more objects, detecting a reflective object of the one or more objects based on the sensor data, determining a three-dimensional (3D) position of the reflective object in the physical environment, and presenting virtual content in a view of the physical environment, wherein the virtual content is positioned at the 3D position based on the 3D position of the reflective object.
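The following is a high-level sketch of those four acts; every helper below is a hypothetical stand-in that only illustrates the flow of the method, not a disclosed implementation.

```python
from typing import Any, Optional

def detect_reflective_object(sensor_data: dict) -> Optional[dict]:
    # Stand-in: pretend detection succeeds when the frame is flagged as containing a mirror.
    return {"kind": "mirror"} if sensor_data.get("mirror_visible") else None

def determine_3d_position(obj: dict, sensor_data: dict) -> Any:
    # Stand-in: e.g., the mirror plane estimated from the device pose and its reflection.
    return sensor_data.get("estimated_plane")

def present(content: str, anchor: Any) -> None:
    # Stand-in for rendering the content at (or relative to) the anchor in the view.
    print(f"presenting {content!r} anchored at {anchor}")

def handle_frame(sensor_data: dict) -> None:
    obj = detect_reflective_object(sensor_data)        # act 2: detect a reflective object
    if obj is None:
        return                                         # nothing to anchor content to
    anchor = determine_3d_position(obj, sensor_data)   # act 3: determine its 3D position
    present("calendar widget", anchor)                 # act 4: present positioned virtual content

# act 1: sensor data obtained from the device's sensors (faked here)
handle_frame({"mirror_visible": True, "estimated_plane": ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))})
```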
These and other embodiments can each optionally include one or more of the following features.
In some aspects, determining the 3D position of the reflective object in the physical environment includes detecting a plane of the reflective object, and wherein the virtual content includes one or more interactable elements presented on the detected plane of the reflective object.
In some aspects, determining the 3D location of the reflective object in the physical environment comprises: detecting a reflection of the electronic device on the reflective object, determining a 3D position of the reflection of the electronic device, and determining the 3D position of the reflective object based on a midpoint position between the 3D position of the reflection of the electronic device and a 3D position of the electronic device.
In some aspects, in accordance with the electronic device rotating about a vertical axis during a first period of time, determining the 3D position of the reflective object in the physical environment includes: detecting a reflection of the rotation of the electronic device on the reflective object, determining 3D position data of the reflection of the rotation of the electronic device during the first time period, and determining the 3D position of the reflective object based on comparing 3D position data of the electronic device with the 3D position data of the reflection of the rotation of the electronic device during the first time period.
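One plausible form of the mirrored-rotation check described above compares the reflected device's forward direction with the Householder reflection of the real device's forward direction across the candidate plane. The sketch below is illustrative, with assumed function names and tolerances.

```python
import numpy as np

def householder(normal: np.ndarray) -> np.ndarray:
    """Reflection matrix across a plane with the given normal."""
    n = normal / np.linalg.norm(normal)
    return np.eye(3) - 2.0 * np.outer(n, n)

def is_mirrored_orientation(real_forward: np.ndarray,
                            reflected_forward: np.ndarray,
                            plane_normal: np.ndarray,
                            tol_deg: float = 5.0) -> bool:
    """True if the reflected forward direction matches the mirror image of the real one."""
    expected = householder(plane_normal) @ (real_forward / np.linalg.norm(real_forward))
    observed = reflected_forward / np.linalg.norm(reflected_forward)
    angle = np.degrees(np.arccos(np.clip(np.dot(expected, observed), -1.0, 1.0)))
    return angle <= tol_deg

# Example: candidate mirror plane z = 0 (normal along +z).
real_fwd = np.array([0.3, 0.0, -1.0])   # user slightly turned, facing the mirror
refl_fwd = np.array([0.3, 0.0, 1.0])    # its mirror image faces back out of the mirror
print(is_mirrored_orientation(real_fwd, refl_fwd, plane_normal=np.array([0.0, 0.0, 1.0])))  # True
```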
In some aspects, the 3D position of the reflective object is determined based on depth data from the sensor data, and the 3D position of the virtual content is based on the depth data associated with the 3D position of the reflective object.
In some aspects, the 3D location of the reflective object includes a 3D location of the reflective object at a first distance from the electronic device, the 3D location of the virtual content of the view of the physical environment is at a second distance from the electronic device that is greater than the first distance, and presenting the virtual content in the view of the physical environment includes presenting spatialized audio at a perceived distance to a sound source based on the 3D location of the virtual content.
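A minimal sketch of such spatialized-audio placement follows, assuming a simple inverse-distance gain model (an assumption, not part of the disclosure): the perceived source is placed at the listener's mirror image, roughly twice the mirror distance away.

```python
import numpy as np

def audio_source_for_reflection(listener_pos, plane_point, plane_normal):
    """Place the perceived source at the listener's mirror image (twice the mirror distance)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    signed_dist = np.dot(listener_pos - plane_point, n)   # signed distance to the mirror plane
    return listener_pos - 2.0 * signed_dist * n           # reflected source position

def inverse_distance_gain(listener_pos, source_pos, ref_distance=1.0):
    """Simple attenuation model: gain falls off with distance beyond ref_distance."""
    dist = max(np.linalg.norm(source_pos - listener_pos), ref_distance)
    return ref_distance / dist

listener = np.array([0.0, 1.6, 1.2])                      # 1.2 m in front of a mirror in the z = 0 plane
source = audio_source_for_reflection(listener,
                                     plane_point=np.array([0.0, 0.0, 0.0]),
                                     plane_normal=np.array([0.0, 0.0, 1.0]))
print(source, inverse_distance_gain(listener, source))    # source ~2.4 m away, reduced gain
```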
In some aspects, the sensor data includes an image of a head of a user of the electronic device, and detecting the reflective object is based on determining that: the head of the user, as seen in the image, rotates about a vertical axis by an amount that is twice the amount of rotation of the electronic device; or the head of the user, as seen in the image, does not rotate about a forward axis.
In some aspects, the method further comprises the acts of: a reflection in the reflective object of the one or more objects is detected based on the sensor data, wherein detecting the reflection is based on tracking facial features of a user of the electronic device or facial recognition of the user.
In some aspects, the method further comprises the acts of: in accordance with detecting the reflective object, a context associated with use of the electronic device in the physical environment is determined based on the sensor data, and the virtual content is presented based on the context.
In some aspects, the context includes a time of day and presenting the virtual content is based on the time of day, the context includes movement of a user of the electronic device relative to a reflection in the reflective object and presenting the virtual content is based on the movement of the user, or the context includes user interaction with the reflection and presenting the virtual content is based on the user interaction with the reflection.
In some aspects, determining the context includes: determining the use of the electronic device in a new location, determining the use of the electronic device during a certain type of activity, or determining that the electronic device is within a proximity threshold distance of a location, an object, another electronic device, or a person.
In some aspects, the method further comprises the acts of: determining a scene understanding of the physical environment based on the sensor data, determining, based on the scene understanding, that a user of the electronic device is the only user within an area associated with the view of the physical environment, and presenting the virtual content based on user preference settings associated with the user being the only user within the area associated with the view of the physical environment.
In some aspects, the virtual content includes a visual representation of another user based on a communication session with another electronic device.
In some aspects, a depth position of the 3D location of the virtual content is the same as a depth of a reflection detected in the reflective object.
In some aspects, the electronic device is a Head Mounted Device (HMD).
According to some implementations, a non-transitory computer readable storage medium has stored therein instructions that are computer executable to perform or cause to be performed any of the methods described herein. According to some implementations, an apparatus includes one or more processors, non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing or causing performance of any of the methods described herein.
Drawings
So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.
FIG. 1 illustrates a device that presents a visual environment and obtains physiological data from a user in a real-world physical environment that includes mirrors, according to some implementations.
Fig. 2A, 2B, and 2C illustrate a user wearing a Head Mounted Device (HMD) while the HMD is detecting a mirror, according to some implementations.
Fig. 3 illustrates an exemplary view of the electronic device of fig. 1, according to some implementations.
Fig. 4 illustrates an exemplary view of the electronic device of fig. 1, according to some implementations.
Fig. 5 illustrates an exemplary view of the electronic device of fig. 1, according to some implementations.
Fig. 6 illustrates exemplary electronic devices operating in different physical environments during a communication session, with a view of a first device having a user representation of a second user on a mirror, according to some implementations.
FIG. 7 is a flow chart representation of presenting virtual content at a three-dimensional (3D) location based on detecting a mirror in a physical environment and determining its 3D location, according to some implementations.
Fig. 8 illustrates device components of an exemplary device according to some implementations.
Fig. 9 illustrates an example of an HMD according to some implementations.
In accordance with common practice, the various features shown in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some figures may not depict all of the components of a given system, method, or apparatus. Finally, like reference numerals may be used to refer to like features throughout the specification and drawings.
Detailed Description
Numerous details are described to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings illustrate only some example aspects of the disclosure and therefore should not be considered limiting. It will be apparent to one of ordinary skill in the art that other effective aspects or variations do not include all of the specific details set forth herein. Moreover, well-known systems, methods, components, devices, and circuits have not been described in detail so as not to obscure the more pertinent aspects of the exemplary implementations described herein.
Fig. 1 shows a real world physical environment 100 comprising a device 10 with a display 15. In some implementations, the device 10 presents the content 20 to the user 25, as well as visual characteristics 30 associated with the content 20. In some examples, the content 20 may be buttons, user interface icons, text boxes, graphics, and the like. In some implementations, visual characteristics 30 associated with content 20 include visual characteristics such as hue, saturation, size, shape, spatial frequency, motion, highlighting, and the like. For example, the content 20 may be displayed with a green highlighting visual characteristic 30 that overlays or surrounds the content 20.
In addition, the physical environment 100 includes a door 150 and a mirror 160. The mirror 160 reflects a portion of the physical environment 100. For example, as shown, mirror 160 shows reflection 125 of user 25 and reflection 110 of device 10 when user 25 is looking at his or her own reflection 125. For illustrative purposes, the remaining environment (e.g., background) of the physical environment 100 behind the user 25 is not shown in fig. 1 or other figures, to make it easier to focus on the subject matter discussed herein (e.g., the reflections of the user 25 and the device 10).
In some implementations, the content 20 may represent a visual three-dimensional (3D) environment (e.g., an extended reality (XR) environment), and the visual characteristics 30 of the 3D environment may change continuously. The head pose measurements may be obtained by an Inertial Measurement Unit (IMU) or other tracking system. In one example, a user may perceive a real-world physical environment while holding, wearing, or approaching an electronic device that includes one or more sensors that obtain physiological data to evaluate eye characteristics indicative of the user's gaze characteristics and movement data of the user.
In some implementations, the visual characteristics 30 are feedback mechanisms of a user's view specific to the 3D environment (e.g., visual cues or audio cues presented during viewing). In some implementations, the view of the 3D environment (e.g., content 20) may occupy the entire display area of the display 15. For example, the content 20 may include a sequence of images as visual characteristics 30 and/or audio cues presented to the user (e.g., 360 degree video on a Head Mounted Device (HMD)).
The device 10 obtains physiological data (e.g., pupil data) from the user 25 via sensors 35 (e.g., one or more cameras facing the user to capture light intensity data and/or depth data of the user's facial features and/or eye gaze). For example, the device 10 obtains eye gaze characteristic data 40. While this example and other examples discussed herein illustrate a single device 10 in a real-world physical environment 100, the techniques disclosed herein are applicable to multiple devices and other real-world physical environments. For example, the functions of device 10 may be performed by a plurality of devices.
In some implementations, as shown in fig. 1, the device 10 is a handheld electronic device (e.g., a smart phone or tablet computer). In some implementations, the device 10 is a wearable HMD. In some implementations, the device 10 is a laptop computer or a desktop computer. In some implementations, the device 10 has a touch pad, and in some implementations, the device 10 has a touch sensitive display (also referred to as a "touch screen" or "touch screen display").
In some implementations, the device 10 includes sensors 60, 65 for acquiring image data of the physical environment. The image data may include light intensity image data and/or depth data. For example, sensor 60 may be one or more cameras for capturing RGB data, and sensor 65 may be one or more depth sensors (e.g., structured light, time of flight, etc.) for capturing depth data.
In some implementations, the device 10 includes an eye tracking system for detecting eye position and eye movement. For example, the eye tracking system may include one or more Infrared (IR) Light Emitting Diodes (LEDs), an eye tracking camera (e.g., a Near Infrared (NIR) camera), and an illumination source (e.g., an NIR light source) that emits light (e.g., NIR light) to the eyes of the user 25. Further, the illumination source of the device 10 may emit NIR light to illuminate the eyes of the user 25, and the NIR camera may capture images of the eyes of the user 25. In some implementations, images captured by the eye tracking system may be analyzed to detect the position and movement of the eyes of user 25, or to detect other information about the eyes, such as pupil dilation or pupil diameter. Further, gaze points estimated from eye-tracking images may enable gaze-based interactions with content shown on a display of the device 10. In some implementations, one or more IR LEDs may face outward (e.g., toward the physical environment, such as toward mirror 160), and their reflections in mirror 160 may be used by device 10, via sensors 60, 65, to detect the position of mirror 160, as discussed further herein.
In some implementations, the device 10 has a Graphical User Interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing a plurality of functions. In some implementations, the user 25 interacts with the GUI through finger contacts and gestures on the touch-sensitive surface. In some implementations, these functions include image editing, drawing, rendering, word processing, web page creation, disk editing, spreadsheet making, game playing, phone calls, video conferencing, email sending and receiving, instant messaging, fitness support, digital photography, digital video recording, web browsing, digital music playing, and/or digital video playing. Executable instructions for performing these functions may be included in a computer-readable storage medium or other computer program product configured for execution by one or more processors.
In some implementations, the device 10 employs various physiological sensors, detection or measurement systems. In an exemplary implementation, the detected physiological data includes head pose measurements determined by an IMU or other tracking system. In some implementations, the detected physiological data can include, but is not limited to: electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), functional near infrared spectrum signals (fNIRS), blood pressure, skin conductance or pupillary response. In addition, the device 10 may detect multiple forms of physiological data simultaneously to benefit from the simultaneous acquisition of physiological data. Furthermore, in some implementations, the physiological data represents involuntary data, i.e., responses that are not consciously controlled. For example, the pupillary response may be indicative of involuntary movement.
In some implementations, a machine learning model (e.g., a trained neural network) is applied to identify patterns in physiological data, including identifying physiological responses to viewing a 3D environment (e.g., content 20 of fig. 1). Further, a machine learning model may be used to match these patterns with learning patterns corresponding to indications of interests or intentions of the user 25 interactions. In some implementations, the techniques described herein may learn patterns specific to a particular user 25. For example, the technique may begin learning from determining that the peak pattern represents an indication of the user's 25 interest or intent in response to a particular visual characteristic 30 while viewing the 3D environment, and use that information to subsequently identify a similar peak pattern as another indication of the user's 25 interest or intent. Such learning may allow for relative interactions of the user with the plurality of visual features 30 in order to further adjust the visual features 30 and enhance the physiological response of the user to the 3D environment.
In some implementations, the position and features (e.g., edges of eyes, nose, or nostrils) of the head 27 of the user 25 are extracted by the device 10 and used to find coarse position coordinates of the eyes 45 of the user 25, thereby simplifying the determination of accurate eye 45 features (e.g., position, gaze direction, etc.) and making gaze characteristic measurements more reliable and robust. Furthermore, device 10 may easily combine the position of the 3D component of head 27 with gaze angle information obtained by eye component image analysis in order to identify a given screen object that user 25 views at any given time. In some implementations, the use of 3D mapping in combination with gaze tracking allows the user 25 to freely move his or her head 27 and eyes 45 while reducing or eliminating the need to actively track the head 27 using sensors or transmitters on the head 27.
By tracking the eyes 45, some implementations reduce the need to recalibrate the user 25 after the user 25 moves his or her head 27. In some implementations, the device 10 uses depth information to track movement of the pupil 50, thereby enabling calculation of a reliable current pupil diameter based on a single calibration by the user 25. Using techniques such as Pupil Center Corneal Reflection (PCCR), pupil tracking, and pupil shape, device 10 may calculate the pupil diameter and the gaze angle of eye 45 relative to a point of the head 27, and use the positional information of head 27 to recalculate the gaze angle and other gaze characteristic measurements. In addition to reduced recalibration, further benefits of tracking head 27 may include reducing the number of light projection sources and reducing the number of cameras used to track eye 45.
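As a purely geometric illustration of combining a tracked head pose with an eye-in-head gaze direction, the sketch below forms a world-space gaze ray and intersects it with a plane such as a display or mirror; the names and sample values are assumptions, not the disclosed implementation.

```python
import numpy as np

def world_gaze_ray(head_pos, head_rot, gaze_dir_in_head):
    """head_rot is a 3x3 rotation from the head frame to the world frame."""
    direction = head_rot @ (gaze_dir_in_head / np.linalg.norm(gaze_dir_in_head))
    return head_pos, direction

def intersect_plane(ray_origin, ray_dir, plane_point, plane_normal):
    """Point where the gaze ray meets the plane, or None if it looks away or is parallel."""
    denom = np.dot(ray_dir, plane_normal)
    if abs(denom) < 1e-6:
        return None
    t = np.dot(plane_point - ray_origin, plane_normal) / denom
    return ray_origin + t * ray_dir if t > 0 else None

head_pos = np.array([0.0, 1.6, 1.0])                 # head 1 m in front of a mirror in the z = 0 plane
head_rot = np.eye(3)                                 # head frame aligned with the world frame
origin, direction = world_gaze_ray(head_pos, head_rot, np.array([0.1, -0.05, -1.0]))
print(intersect_plane(origin, direction, np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])))
```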
In some implementations, the pupillary response may be responsive to auditory stimuli detected by one or both ears 70 of user 25. For example, device 10 may include a speaker 12 that projects sound via sound waves 14. The device 10 may include other audio sources such as a headphone jack for headphones, a wireless connection to an external speaker, and so forth.
Fig. 2A-2C illustrate a user wearing an HMD while the device detects a mirror in a physical environment, according to some implementations. Fig. 2A-2C illustrate the physical environment 100 of fig. 1, except that the device 10 is replaced by a device 210 (an HMD). The physical environment 100 includes a door 150 and a mirror 160 that reflects a portion of the physical environment 100. For example, as shown in fig. 2A-2C, mirror 160 shows a reflection 125 of user 25 and a reflection 220 of device 210 when user 25 is looking at his or her own reflection 125 while wearing device 210 on his or her head 27. For example, the device 210 (HMD) may be configured as a pass-through video device via a camera, or the device 210 may include a transparent element that allows the user 25 to view the physical environment.
Fig. 2A illustrates a user 25 wearing a device 210 according to some implementations, and the device 210 is detecting a mirror 160 in the physical environment 100. Specifically, the device 210 detects the mirror 160 by acquiring image data via the sensor 260 and determining that the reflection 220 of the device 210 depicts the HMD (e.g., based on shape) using one or more known techniques (e.g., object detection algorithms, etc.).
Fig. 2B illustrates a user 25 wearing a device 210 according to some implementations, and the device 210 is detecting a mirror 160 in the physical environment 100. Specifically, the device 210 detects the mirror 160 by emitting IR LED light 272 from the IR LED emitter 270 and acquiring data via the sensor 260 (e.g., an IR LED light detector), and determines that the acquired data is indicative of detection of IR LED reflection 276 using one or more known techniques for detecting IR LED reflection.
Fig. 2C illustrates a user 25 wearing a device 210 according to some implementations, and the device 210 is detecting a mirror 160 in the physical environment 100. Specifically, the device 210 detects the mirror 160 by acquiring data of light (e.g., IR LED light) from the light sources 280, 282, 284, and/or 286 (e.g., IR LED emitters) via the sensor 260 (e.g., IR LED light detector) using one or more known techniques for detecting light. Based on the detected light from one or more of the light sources 280, 282, 284, and 286, the device 210 may determine the plane and/or boundary of the mirror 160. For illustrative purposes, light sources 280, 282, 284, and 286 are shown in the corners. The location of each light source (e.g., IR LED light, etc.) may be placed at any location on the mirror 160 or may be placed on an edge or frame of the mirror 160.
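A sketch of recovering the mirror plane and an approximate boundary from the detected corner light sources is shown below; it assumes the 3D positions of the light sources are already available from the device's sensing, and is illustrative rather than the disclosed algorithm.

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Least-squares plane through N >= 3 points: returns (centroid, unit_normal)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]                          # singular vector with smallest singular value

def boundary_extent(points: np.ndarray, centroid: np.ndarray, normal: np.ndarray):
    """Width/height of the detected corners in in-plane axes.

    Assumes a roughly vertical mirror, i.e., the normal is not parallel to world up.
    """
    u = np.cross(normal, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    offsets = points - centroid
    coords = np.stack([offsets @ u, offsets @ v], axis=1)
    return coords.max(axis=0) - coords.min(axis=0)

corners = np.array([[-0.4, 0.5, 0.0], [0.4, 0.5, 0.0], [0.4, 1.7, 0.0], [-0.4, 1.7, 0.0]])
centroid, normal = fit_plane(corners)
print(normal, boundary_extent(corners, centroid, normal))   # normal ~[0 0 1], extent ~[0.8 1.2]
```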
Additionally or alternatively, in some implementations, a marker 290 may be utilized to detect the mirror 160. For example, in a physical environment, a physical element (e.g., a QR code, bar code, decal, etc.) may be placed on mirror 160 such that device 210 may detect marker 290 via image data from sensor 260 and determine that the object (mirror 160) is a mirror based on the data acquired from marker 290. Alternatively, in a virtual environment (such as an augmented reality environment), the marker 290 may simply be a virtual or augmented sticker viewable by the device 210. For example, a user or system may "mark" or designate the mirror 160 as a mirror and virtually place the marker 290 at those particular coordinates, such that during later use, once the device 210 detects the marker 290, the system may quickly and automatically determine that the object is the mirror 160 without having to make other calculations or determinations (e.g., based on reflection data or IR LED light data, etc.). In some implementations, the data obtained from the marker 290 may also be used by the device 10, 210, etc. to pull data about any type of object on which the marker is placed (e.g., a bar code of a product, such as the mirror 160). The data obtained from the marker 290 may include information, such as the mirror's boundaries, that may be used with the techniques described herein to determine the size of the display area (if applicable) for the virtual content to be displayed thereon.
With respect to the different methods for detecting mirrors, referring back to fig. 2A-2C, using the techniques described herein, the device 210 or device 10 may determine the 3D position of the mirror 160 in the physical environment 100. For example, in accordance with these techniques, the system may determine the position of the plane of the mirror 160. Additionally or alternatively, the techniques may define or determine the boundaries of the plane of the mirror within a 3D coordinate system of the physical environment. For example, determining the boundaries may include determining a defined length and width of the physical mirror 160. For mirror surfaces that are too large for a particular view, the system may define a smaller boundary edge (e.g., a smaller area for displaying virtual content). The 3D mirror position may be determined based on comparing the reflected headset position over time with the headset position from the motion data. In addition, the 3D mirror position may be determined based on plane detection from sampled midpoints between the reflected position and the position of the real head-mounted device. Furthermore, the 3D mirror position may be determined based on checking whether the reflected headset rotation is a mirrored rotation of the real headset through the computed plane. This may further involve determining mirror boundaries (e.g., a boundary rectangle) based on detecting differences between the reflection and the surrounding surface.
Fig. 3 illustrates an exemplary view 300 of the physical environment 100 provided by the electronic device 10 of fig. 1. View 300 may be a live camera view of physical environment 100, a view of physical environment 100 through a see-through display, or a view generated based on a 3D model corresponding to physical environment 100. View 300 includes depictions of aspects of physical environment 100, such as representation 360 of mirror 160 and representation 375 of clock 370 (not shown in fig. 1 based on the perspective of user 25 in fig. 1). Within view 300 of representation 360 of mirror 160 is representation 325 of reflection 125 of user 25 and representation 310 of reflection 110 of device 10.
In addition, the view of representation 360 (e.g., mirror 160) includes virtual content 314, virtual content 316, and audio content 318, which may be presented (e.g., displayed and/or sounded) based on detecting, from the obtained sensor data, a reflection or the mirror 160 among the one or more objects in the physical environment 100 (e.g., detecting a reflection from an object surface). Virtual content 314 and virtual content 316 may be selected for presentation based on context and positioned at a 3D location based on the 3D location of mirror 160 (e.g., the reflective surface of the object). For example, the determined context of the current environment of FIG. 3 may be that user 25 is looking at himself or herself in mirror 160 early in the morning (e.g., clock 370 shows a time of 6:15 am). Thus, the context includes the time of day, and presenting the virtual content 314, 316 (e.g., virtual applications) is based on the time of day. In other words, upon detecting that user 25 is in a situation in which the user would benefit from a particular user interface, such as in the morning, device 10 may provide a calendar application (virtual content 316), a news application (virtual content 314), and the like. In addition, virtual content 314, 316 may be presented such that it appears to be located on the surface of representation 360 of mirror 160 in view 300. In other examples, virtual content 314, 316 may be presented at a depth corresponding to the depth of reflection 125 of user 25 (e.g., twice the distance between user 25 and mirror 160). Positioning the virtual content 314, 316 at this depth may advantageously prevent the user 25 from having to change the focal plane of his eyes when looking from his reflection 125 to the virtual content 314, 316, or vice versa. In still other examples, the virtual content 314, 316 may be presented such that it appears to be located elsewhere in the view 300.
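Illustrative math for the depth choice discussed above, placing content at the depth of the reflection (roughly twice the user-to-mirror distance); the coordinates and offsets below are assumptions used only for the example.

```python
import numpy as np

def reflection_depth_anchor(user_pos, plane_point, plane_normal, lateral_offset=np.zeros(3)):
    """Return (anchor_position, perceived_depth): the user's mirror image plus an optional offset."""
    n = plane_normal / np.linalg.norm(plane_normal)
    signed_dist = np.dot(user_pos - plane_point, n)
    anchor = user_pos - 2.0 * signed_dist * n        # reflect the user across the mirror plane
    return anchor + lateral_offset, 2.0 * abs(signed_dist)

user = np.array([0.0, 1.6, 1.5])                     # user 1.5 m from a mirror in the z = 0 plane
anchor, depth = reflection_depth_anchor(user,
                                        plane_point=np.array([0.0, 0.0, 0.0]),
                                        plane_normal=np.array([0.0, 0.0, 1.0]),
                                        lateral_offset=np.array([0.4, 0.0, 0.0]))
print(anchor, depth)                                 # content beside the reflection, ~3.0 m deep
```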
In some implementations, the virtual content 314, 316 and/or the audio content 318 may be provided automatically (e.g., in response to a context determined based on time of day). Additionally or alternatively, virtual content 314, 316 and/or audio content 318 may be provided through user interaction with device 10 (e.g., selecting an application) or through a verbal request (e.g., "please show my calendar application on the mirror"). The virtual content 314, 316 is shown for illustrative purposes. Additionally or alternatively, the virtual content 314, 316 may include spatialized audio and/or video. For example, instead of showing a news feed application, virtual content 314 may include a live news channel displayed on representation 360 of mirror 160 (e.g., a news television channel that is virtually displayed within representation 360 of mirror 160). In an exemplary implementation, the virtual content includes audio cues that are played using spatial audio so that they are heard from a 3D location on the representation 360 of the mirror 160, where the 3D location is determined based on context, the location of the reflective object (e.g., the mirror 160), or both. For example, for a particular use case in which an application desires to direct the user's attention to a particular location (and a particular depth), a spatialized audio prompt may be presented to the user 25 to direct him or her to look at a reflection of an object that may be at a distance (but within the field of view of the mirror 160).
The audio content 318 is represented by an icon in fig. 3, but the audio content 318 may or may not be visible to the user; it is shown in fig. 3 to represent the location of the spatialized audio, which the user will hear as being "behind" the mirror. For example, based on the detected plane of the mirror 160, a perceived distance to the sound source may be achieved. In some implementations, the spatialized audio may be perceived as originating within the virtual reflection. For example, a virtual application such as an avatar may appear to speak from the reflection's 3D location rather than from the mirror's 3D location (e.g., the reflection and associated sound may appear to be at a distance twice as great as the distance to the actual surface of the mirror). Alternatively, if the sound associated with the icon 318 does not have a particular source (e.g., is not from a particular application or an avatar of another user, etc.), the user may also be shown an icon for the audio content 318.
The electronic device 10 determines whether an object in the physical environment 100 is a reflective object (such as mirror 160) using one or more mirror detection techniques discussed further herein. For example, a mirror or other reflective surface is detected among one or more objects in a physical environment. For example, a user may wake up in the morning and look at a mirror through his or her device (e.g., device 10), and the system may detect the mirror based on one or more mirror detection techniques. Mirror detection techniques may include detecting that a user is wearing a device (such as an HMD) and standing in front of a mirror (e.g., based on reflection, detecting a shape of the HMD, detecting LEDs on the HMD, etc.). Mirror detection may involve detecting a light source (e.g., LEDs around the frame of the mirror, IR LEDs behind the mirror) or some other type of marker on or near the mirror (e.g., an augmented reality marker). Mirror detection techniques may include using face detection areas over time; a mirror may be detected by comparing the position and relative angle of face boundaries while the HMD is worn. The mirror may also be detected based on object recognition using machine learning (ML) techniques. Once it has been detected that the user is looking at the mirror, the techniques described herein may determine whether the reflective surface is a mirror and detect the boundary of the mirror based on shape analysis, edge detection, and/or loss of face-recognition tracking in the area.
Fig. 4 illustrates an exemplary view 400 of the physical environment 100 provided by the electronic device 10. View 400 may be a live camera view of physical environment 100, a view of physical environment 100 through a see-through display, or a view generated based on a 3D model corresponding to physical environment 100. View 400 includes depictions of aspects of physical environment 100, such as representation 460 of mirror 160. Within view 400 of representation 460 of mirror 160 is representation 425 of reflection 125 of user 25 and representation 410 of reflection 110 of device 10.
In addition, the view of representation 460 (e.g., mirror 160) includes virtual content 414 and virtual content 416, which may be presented (e.g., displayed and/or emitted as spatialized sound) based on detecting, from the obtained sensor data, a reflection or the mirror 160 among the one or more objects in the physical environment 100 (e.g., detecting a reflection from an object surface). Virtual content 414 and virtual content 416 may be selected for presentation based on the context and positioned at a 3D location based on the 3D location of mirror 160 (e.g., the reflective surface of the object). For example, the determined context of the current environment of fig. 4 may be that the user 25 is looking at himself or herself in the mirror 160 and is not happy and/or shows incorrect posture while standing (e.g., not standing up straight). Thus, the context includes physiological data about the user 25, and presenting the virtual content 414, 416 (e.g., virtual applications) is based on physiological tracking. In some implementations, physiologically based tracking (such as skeletal tracking) may be used for guidance, assessment, and feedback related to posture, exercise, sports, clothing, and the like. In an exemplary implementation, the sensor data obtained by the device 10 includes physiological data of a user of the electronic device, and the techniques described herein include detecting movement of the user based on the physiological data, and modifying virtual content in a view of the physical environment based on the detected movement of the user. In other words, upon detecting that user 25 is in a situation in which the user would benefit from a particular user interface, device 10 may provide a posture analyzer application (virtual content 414) and/or an exercise application (virtual content 416), etc. In addition, the virtual content 414, 416 may be presented such that it appears to be located on the surface of the representation 460 of the mirror 160 in the view 400. In other examples, the virtual content 414, 416 may be presented at a depth corresponding to the depth of the reflection 125 of the user 25 (e.g., twice the distance between the user 25 and the mirror 160). Positioning the virtual content 414, 416 at this depth may advantageously prevent the user 25 from having to change the focal plane of his eyes when looking from his reflection 125 to the virtual content 414, 416, or vice versa. In still other examples, the virtual content 414, 416 may be presented such that it appears to be located elsewhere in the view 400.
Fig. 5 illustrates an exemplary view 500 of the physical environment 100 provided by the electronic device 10. View 500 may be a live camera view of physical environment 100, a view of physical environment 100 through a see-through display, or a view generated based on a 3D model corresponding to physical environment 100. View 500 includes depictions of aspects of physical environment 100, such as representation 560 of mirror 160 and user representation 525 of reflection 125 of user 25. For example, as shown in fig. 5, the user representation 525 is presented as virtual content (e.g., memoji or animoji, such as a wolf's face and associated body) over the body and face of the reflection 125 of the user 25. Other techniques may be used to present virtual content only over the user's face (e.g., only a wolf face avatar is presented, but the user's hands, body, etc. are still shown in reflection by a mirror).
In some implementations, the virtual content (e.g., virtual content 314, 316, 414, 416, 525, and audio content 318) of figs. 3-5 may be modified over time based on the proximity of the electronic device to the anchor location (e.g., mirror 160). For example, as the user 25 moves closer, a spatialized audio notification (e.g., audio content 318) may indicate the closer proximity. Additionally or alternatively, for visual icons, if the user begins to walk away from the mirror 160 in a different direction, the virtual content may increase in size or begin to blink. In addition, the virtual content may include a text widget that informs the user of the location of objects displayed within the reflection of the mirror (or of any reflective object surface).
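A toy sketch of such proximity-based modification follows; the distance thresholds, gain curve, and blink behavior are assumptions chosen only to illustrate the idea.

```python
def content_state_for_distance(distance_m: float) -> dict:
    """Modulate audio gain and visual emphasis by the user-to-mirror distance."""
    state = {
        "audio_gain": min(1.0, 1.0 / max(distance_m, 0.25)),  # louder as the user approaches
        "scale": 1.0,
        "blinking": False,
    }
    if distance_m > 3.0:          # user walking away: make the content more noticeable
        state["scale"] = 1.5
        state["blinking"] = True
    return state

for d in (0.5, 1.5, 4.0):
    print(d, content_state_for_distance(d))
```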
In some implementations, visual transition effects (e.g., fading, blurring, etc.) may be applied to virtual content to provide a more pleasing XR experience to the user. For example, a visual transition effect may be applied to virtual content when a user turns away from the virtual content by more than a threshold amount (e.g., outside of an activation zone). Defining the activation zone based on the anchored content object encourages the user to remain relatively stationary and provides a target object to focus on. As the user moves, the visual transition effect applied to the virtual content may indicate to the user that the virtual content is about to be deactivated (e.g., faded out). Thus, the user can dismiss the virtual content by turning away from it. In some implementations, transitioning away or fading out the virtual content may be based on the rate at which the user's head or the electronic device 10 is turned exceeding a threshold, or the amount by which the head or the electronic device 10 is turned exceeding a threshold, such that the virtual content remains at the 3D position it occupied just before the user turned his or her head or the electronic device 10.
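Such transition behavior might be expressed as an opacity function of the head/device turn rate and turn angle, as in the sketch below; the thresholds and fade curve are illustrative assumptions.

```python
def fade_opacity(turn_rate_dps: float, turn_angle_deg: float,
                 rate_threshold: float = 120.0, angle_threshold: float = 45.0,
                 fade_range_deg: float = 30.0) -> float:
    """Return content opacity in [0, 1]; 1 = fully shown, 0 = dismissed."""
    if turn_rate_dps > rate_threshold:
        return 0.0                                      # quick turn: dismiss immediately
    if turn_angle_deg <= angle_threshold:
        return 1.0                                      # still within the activation zone
    overshoot = turn_angle_deg - angle_threshold
    return max(0.0, 1.0 - overshoot / fade_range_deg)   # gradual fade outside the zone

print(fade_opacity(turn_rate_dps=40.0, turn_angle_deg=50.0))   # partially faded
print(fade_opacity(turn_rate_dps=200.0, turn_angle_deg=10.0))  # dismissed by a fast turn
```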
The views 300, 400, 500, etc. display the virtual content as being in the first world-locked presentation mode such that the virtual content appears to remain at its location within the environment 100 (e.g., a window application anchored to a particular location of the reflection 325 of the user 25 in the view 300) despite translational and/or rotational movement of the electronic device 10. The view of the virtual content may remain world-locked until the user satisfies a certain condition and may transition to a different presentation mode, such as a head/display locked presentation mode, in which the virtual content remains displayed on the display 15 or at the same location relative to the electronic device 10 when translational and/or rotational movement is applied to the electronic device 10. For example, a window application anchored to mirror 160 may transition to a gadget anchored to the upper left corner of display 15, which the user may later activate to view the application window. Selectively anchoring virtual content to locations on the display when not in use, rather than locations in the environment, may save power by requiring localization of content only when necessary. In some implementations, better legibility of content (e.g., text messages within virtual content) may also be achieved by locating the virtual content at a fixed distance from the point of view of the electronic device 10 or user. For example, holding a book at a particular distance may make it easier to read and understand. Similarly, locating virtual content at a desired distance may also make it easier to read and understand.
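A sketch contrasting the two presentation modes: world-locked content keeps its pose in the environment, while head/display-locked content keeps a fixed offset from the device. The transform math is standard; the mode names follow the text above, and the sample poses are assumptions.

```python
import numpy as np

def content_world_position(mode: str, device_pos, device_rot, world_anchor, display_offset):
    """device_rot is a 3x3 rotation from the device frame to the world frame."""
    if mode == "world_locked":
        return np.asarray(world_anchor)                           # unaffected by device motion
    if mode == "head_locked":
        return np.asarray(device_pos) + device_rot @ np.asarray(display_offset)
    raise ValueError(f"unknown presentation mode: {mode}")

anchor = np.array([0.0, 1.5, 0.0])          # e.g., a point on the mirror plane
offset = np.array([-0.15, 0.1, -0.5])       # e.g., upper-left of the display, 0.5 m ahead
device_pos = np.array([0.2, 1.6, 1.0])
device_rot = np.eye(3)
print(content_world_position("world_locked", device_pos, device_rot, anchor, offset))
print(content_world_position("head_locked", device_pos, device_rot, anchor, offset))
```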
In some implementations, the system can detect user interactions with the virtual content (e.g., reaching out to "touch" the virtual content) and can generate and display an application window. For example, the user in FIG. 4 may reach out to interact with interactable element 416, and the system may then display a new application window (e.g., an exercise video). In some implementations, the system can detect that the user has temporarily moved his or her viewing direction to another location outside of the activation region (e.g., the activation region containing the view of the virtual content application window). For example, the user may look away from the initial activation zone in response to having his or her attention briefly diverted by some event in the physical environment (e.g., another person in the room). Based on the user "looking away" from the initial activation area, the system may begin to fade out and/or shrink the virtual content. However, because the virtual content and associated application window were initially active (e.g., within the activation zone), once the user has returned to a point of view similar to or the same as the original view, the system may return to displaying the virtual content and associated application window as they were displayed when the user activated the application by interacting with the virtual content, before the user was briefly distracted.
In some implementations, the system may locate the mirror 160 based on a notification or request from an application on the device 10 or from another device. For example, if user 25 receives a 3D model of an article of apparel to be viewed from another user (e.g., a friend provides a 3D model of a hat for user 25 to virtually try on), the system may automatically attempt to locate a mirror (or a reflection on the surface of an object). Additionally or alternatively, if user 25 communicates with another user via another device (e.g., in a communication session, such as a video chat room as discussed herein with reference to fig. 6), and the other user sends a notification to find a mirror and try on the 3D model of the hat, the system may also automatically attempt to locate the mirror if user 25 accepts the notification/request to try on the 3D model of the hat.
Fig. 6 illustrates exemplary electronic devices operating in different physical environments during a communication session, with a view of a first device having a user representation of a second user on a mirror, according to some implementations. In particular, fig. 6 illustrates an exemplary operating environment 600 of electronic devices 10, 615 operating in different physical environments 102, 602, respectively, during a communication session (e.g., while the electronic devices 10, 615 share information with each other or with an intermediary device, such as a communication session server). In this example of fig. 6, the physical environment 102 (e.g., the physical environment 100 of fig. 1) is a room that includes a mirror 160. The electronic device 10 includes one or more cameras, microphones, depth sensors, or other sensors that may be used to capture information about the physical environment 102 and objects therein, to capture information about the user 25 of the electronic device 10, and to evaluate the physical environment and objects therein. Information about physical environment 102 and/or user 25 may be used to provide visual content (e.g., for user presentation) and audio content (e.g., for text transcription) during a communication session. For example, a communication session may provide one or more participants (e.g., users 25, 604) with the following views: a 3D environment generated based on camera images and/or depth camera images of the physical environment 102, a representation of the user 25 based on camera images and/or depth camera images of the user 25, and/or a textual transcription of audio spoken by the user (e.g., transcription bubbles). As shown in fig. 6, user 25 is speaking with user 604.
In this example, the physical environment 602 is a room that includes a wall-mounted ornament 652, a sofa 654, and a coffee table 656. The electronic device 615 includes one or more cameras, microphones, depth sensors, or other sensors that may be used to capture and evaluate information about the physical environment 602 and objects therein, as well as information about the user 604 of the electronic device 615. Information about the physical environment 602 and/or the user 604 may be used to provide visual and audio content during the communication session. For example, a communication session may provide a view of: a 3D environment generated based on camera images and/or depth camera images of the physical environment 102 (from the electronic device 10) and a representation 625 of the user 25 based on camera images and/or depth camera images of the user 25 (from the electronic device 10). For example, the 3D environment may be transmitted by device 10 via communication session instruction set 680 in communication with communication session instruction set 682 of device 615 (e.g., via network connection 684). As shown in fig. 6, audio spoken by user 25 is transcribed at device 615 (e.g., via communication session instruction set 682, or via a remote server), and view 620 provides user 604 with a textual transcription of the audio spoken by the speaker (user 25) via transcription bubble 690 (e.g., "Nice wolf face avatar!"). Fig. 6 shows an example of a virtual environment (e.g., 3D environment 650) in which each user's preferred user representation (avatar) is provided (e.g., the wolf avatar of user representation 606 of user 604 displayed on representation 660 of mirror 160), provided that each user has agreed to have his or her user representation viewed during the particular communication session. For example, as shown in FIG. 6, the electronic device 615 within the physical environment 602 provides a view 620 that enables the user 604 to view a representation 625 (e.g., an avatar) of at least a portion of the user 25 (e.g., from the middle of the torso upward) and a transcription of words spoken by the user 25 via transcription bubble 690 (e.g., "Nice wolf face avatar!").
In the examples of fig. 1 and 3-6, the electronic devices 10 and 615 are shown as handheld devices, and in fig. 2A and 2C, the electronic device 210 is shown as a Head Mounted Device (HMD). The electronic devices 10, 210, and 615 may be mobile phones, tablets, laptops, etc. In some implementations, the electronic devices 10, 210, 615 may be worn by a user. For example, the electronic devices 10, 210, and 615 may be watches, HMDs, head mounted devices (eyeglasses), headphones, ear-mounted devices, and the like. In some implementations, the functionality of the devices 10, 210, and 615 is implemented via two or more devices, such as a mobile device and a base station or a head-mounted device and an ear-mounted device. Various functions may be distributed among multiple devices including, but not limited to, a power function, a CPU function, a GPU function, a storage function, a memory function, a visual content display function, an audio content production function, and the like. Multiple devices that may be used to implement the functionality of electronic devices 10, 210, and 615 may communicate with each other via wired or wireless communications. In some implementations, each device communicates with a separate controller or server to manage and coordinate the user's experience (e.g., a communication session server). Such controllers or servers may be located in the physical environment 100 or may be remote with respect to the physical environment.
In addition, in the examples of figs. 3-6, the 3D environments 300, 400, 500, 650, and 670 are XR environments based on a common coordinate system that may be shared with other users (e.g., a virtual room for avatars in a multi-person communication session). In other words, the common coordinate system of the 3D environments 300, 400, 500, 650, and 670 is different from the coordinate systems of the physical environments 100, 602, and the like. For example, a common reference point may be used to align the coordinate systems. In some implementations, the common reference point may be a virtual object within the 3D environment that each user can visualize within his or her respective view. For example, the user representations (e.g., the users' avatars) may be positioned around a common centerpiece table within the 3D environment. Alternatively, the common reference point is not visible within each view. For example, the common coordinate system of the 3D environment may use the common reference point to locate each respective user representation (e.g., around a table/desk). Thus, if the common reference point is visible, each device's view will be able to visualize the "center" of the 3D environment for perspective when viewing the other user representations. Visualizing the common reference point may become more relevant in a multi-user communication session, such that each user's view may add perspective on the location of every other user during the communication session.
In some implementations, the representation of each user may be realistic or unrealistic and/or may represent a current and/or previous appearance of the user. For example, a photorealistic representation of the user 25 may be generated based on a combination of live images and previous images of the user. The previous images may be used to generate portions of the representation for which actual image data is not available (e.g., portions of the user's face that are not in the field of view of a camera or sensor of the electronic device 10 or that may be obscured, for example, by a headset or otherwise). In one example, the electronic device 10, 210, or 615 is an HMD, and live image data of the user's face includes images of the user's cheeks and mouth obtained by downward-facing cameras and images of the user's eyes obtained by inward-facing cameras, which may be combined with previous image data of other portions of the user's face, head, and torso that are not currently observable from the device's sensors. The prior data regarding the user's appearance may be obtained at an earlier time during the communication session, during a prior use of the electronic device, during an enrollment process for obtaining sensor data of the user's appearance from multiple perspectives and/or conditions, or otherwise.
Some implementations provide a representation of at least a portion of a user within a 3D environment other than the user's physical environment during a communication session, and, based on detecting a condition, provide a representation of another object of the user's physical environment to provide context. For example, during a communication session, representations of one or more other objects of the physical environment may be displayed in the view. For example, based on determining that user 25 is interacting with a physical object in physical environment 100, a representation (e.g., realistic or proxy) of that object may be displayed in the view to provide context for the interaction of user 25. For example, if the first user 25 picks up an object such as a home photo frame to show to another user, the view may include a realistic view of the photo frame (e.g., live video). Thus, in displaying an XR environment, the view may present a virtual object representing the user picking up a generic object, display a virtual object similar to the photo frame, display a previously acquired image of the actual photo frame from an obtained 3D scan, and so forth.
Fig. 7 is a flow chart illustrating an exemplary method 700. In some implementations, a device such as device 10 (fig. 1) performs the techniques of method 700 of presenting virtual content (e.g., an application) at a 3D location based on detecting and determining the 3D location of a mirror in a physical environment. In some implementations, the techniques of method 700 are performed on a mobile device, desktop, laptop, HMD, or server device. In some implementations, the method 700 is performed by processing logic (including hardware, firmware, software, or a combination thereof). In some implementations, the method 700 is performed by a processor executing code stored in a non-transitory computer-readable medium (e.g., a memory).
At block 702, the method 700 obtains sensor data (e.g., image data, sound data, motion data, etc.) from a sensor of an electronic device in a physical environment including one or more objects. For example, the electronic device may capture one or more images, depth data, etc. of the user's current room. In some implementations, the sensor data includes depth data and light intensity image data obtained during image capture.
In some implementations, obtaining sensor data (e.g., images, sounds, motion, etc.) from sensors of an electronic device in a physical environment includes tracking a gaze direction and detecting that the gaze direction corresponds to a detected mirror in the physical environment. In some implementations, tracking the user's gaze may include tracking the pixel on which the user's gaze is currently focused. For example, obtaining physiological data associated with the user's gaze (e.g., eye gaze characteristic data 40) may involve obtaining images of the eyes or electrooculography (EOG) data from which gaze direction and/or movement may be determined. In some implementations, the 3D environment may be an XR environment provided while a user wears a device such as an HMD. In addition, the XR environment may be presented to the user, where virtual images may be overlaid onto a live view of the physical environment.
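For illustration only, the following non-limiting Python sketch shows one way a gaze ray could be tested against a detected, bounded mirror plane; the function name, parameters, and thresholds are assumptions and not part of the disclosure.

```python
# Hypothetical sketch: test whether a tracked gaze ray falls on a detected
# mirror plane. Inputs are numpy arrays; names are illustrative only.
import numpy as np

def gaze_hits_mirror(gaze_origin, gaze_dir, mirror_center, mirror_normal,
                     mirror_half_extents, mirror_axes):
    """Return True if the gaze ray intersects the bounded mirror plane."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    denom = np.dot(gaze_dir, mirror_normal)
    if abs(denom) < 1e-6:          # gaze ray is parallel to the mirror plane
        return False
    t = np.dot(mirror_center - gaze_origin, mirror_normal) / denom
    if t <= 0:                     # mirror plane is behind the user
        return False
    hit = gaze_origin + t * gaze_dir
    local = hit - mirror_center
    # Project the hit point onto the mirror's in-plane axes (u, v) and compare
    # against the detected mirror boundary.
    u = abs(np.dot(local, mirror_axes[0]))
    v = abs(np.dot(local, mirror_axes[1]))
    return u <= mirror_half_extents[0] and v <= mirror_half_extents[1]
```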
At block 704, the method 700 detects a reflective object of the one or more objects based on the sensor data. For example, block 704 may include detecting a mirror or other reflective surface among the one or more objects in the physical environment. For example, when a user wakes up in the morning and looks at a mirror through his or her device (e.g., device 10), the system may detect the mirror based on one or more mirror detection techniques. Mirror detection techniques may include detecting that a user wearing a device (such as an HMD) is standing in front of a mirror (e.g., based on the reflection, detecting the shape of the HMD, detecting LEDs on the HMD, etc.). Mirror detection may involve detecting a light source (e.g., LEDs around the frame of the mirror, IR LEDs behind the mirror) or some other type of marker on or near the mirror (e.g., an augmented reality marker). Mirror detection techniques may include using face detection areas over time, where mirrors or reflections may be detected by comparing the position and relative angle of face boundaries while wearing an HMD. The mirror may also be detected based on object recognition using machine learning (ML) techniques. Once it has been detected that the user is looking at a reflection, the method 700 may determine whether the reflective surface is a mirror and detect the boundary of the mirror based on shape analysis, edge detection, and/or loss of face tracking/recognition in the area.
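As a hedged sketch of the face-boundary-over-time heuristic mentioned above, the following Python fragment flags a likely reflection when the wearer's face box stays stable in the image even though the device itself is moving; the data structure and thresholds are assumptions for illustration.

```python
# Illustrative sketch of comparing face-boundary position/angle over time
# against device motion. Not from the source; thresholds are arbitrary.
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceObservation:
    center: np.ndarray      # 2D face-box center in image coordinates
    roll_deg: float         # in-plane rotation of the face box
    device_yaw_deg: float   # device yaw from the IMU at the same timestamp

def looks_like_reflection(observations, pos_tol=20.0, roll_tol=3.0):
    """The wearer's reflected face tends to stay within a small image region
    and shows little roll, even while the head-mounted device itself moves."""
    if len(observations) < 10:
        return False
    centers = np.stack([o.center for o in observations])
    rolls = np.array([o.roll_deg for o in observations])
    yaws = np.array([o.device_yaw_deg for o in observations])
    device_moved = (yaws.max() - yaws.min()) > 5.0
    box_stable = np.linalg.norm(centers.max(0) - centers.min(0)) < pos_tol
    roll_stable = (rolls.max() - rolls.min()) < roll_tol
    return device_moved and box_stable and roll_stable
```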
In some implementations, detecting the reflective object is based on detecting a mirror among the one or more objects in the physical environment. To detect the physical mirror object, the system may utilize time-dependent techniques, such as a visual inertial odometry (VIO) system, in addition to purely visual or static techniques. For example, a mirror may appear similar to a window into another space, and the edges of that "window" may be detected using the relative parallax between the surface on which the mirror is located and the reflected image within it. Additionally or alternatively, in some implementations, a depth sensor (e.g., an active time-of-flight sensor) may be used to detect the physical mirror object by interpreting the relative parallax between the surface on which the mirror is located and the reflected image within it.
In some implementations, detecting the mirror may be based on non-geometric correlations. For example, a reflection of the electronic device may be detected, and the reflected device may be determined to be visually identical or nearly identical to the sensing device. For example, a tablet detecting an HMD would not infer a mirror. Additionally or alternatively, a sensed face of the user may be determined to be visually similar to the face of the person holding or wearing the electronic device (e.g., wearing the HMD). Additionally or alternatively, in some implementations, the likelihood of detecting a mirror may be informed by other sensed parameters of the general environment (such as plane detection techniques). For example, a mirror may be located on or near a larger plane (such as a wall) and parallel thereto.
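One way such cues could be combined is a simple weighted score, as in the following minimal sketch; the cue names, weights, and threshold are purely illustrative assumptions and not taken from the source.

```python
# Hedged sketch of fusing the non-geometric and plane-based cues discussed
# above into a single mirror likelihood. Weights are arbitrary placeholders.
def mirror_likelihood(cues):
    weights = {
        "reflected_device_matches_sensing_device": 0.4,
        "reflected_face_matches_wearer": 0.3,
        "surface_parallel_and_near_wall_plane": 0.2,
        "marker_or_led_detected_near_surface": 0.1,
    }
    return sum(weights[name] for name, present in cues.items() if present)

# Example: an HMD that sees its own reflection and the wearer's face on a
# surface lying in a detected wall plane.
cues = {
    "reflected_device_matches_sensing_device": True,
    "reflected_face_matches_wearer": True,
    "surface_parallel_and_near_wall_plane": True,
    "marker_or_led_detected_near_surface": False,
}
print(mirror_likelihood(cues))  # ~0.9 -> treat as a mirror above some threshold
```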
In some implementations, detecting the mirror is based on object detection techniques using machine learning (e.g., neural networks, decision trees, support vector machines, Bayesian networks, etc.).
At block 706, the method 700 determines a 3D location of the reflective object in the physical environment. For example, the system determines the plane of the mirror and the boundary of the mirror. The 3D mirror position may be determined based on comparing the reflected head-mounted device position over time with the head-mounted device position from motion data. In addition, the 3D mirror position may be determined based on plane detection and on sampling the midpoint between the reflected position and the position of the real head-mounted device. Furthermore, the 3D mirror position may be determined based on checking whether the reflected head-mounted device rotation is a mirror rotation of the real head-mounted device through the computed plane. This may further involve determining mirror boundaries (e.g., a bounding rectangle) based on techniques that detect differences between the reflection and the surrounding surface.
In some implementations, determining the 3D position of the mirror in the physical environment includes detecting a plane (and associated boundaries) of the mirror. For example, as shown in fig. 1, the device 10 may obtain image data (light intensity data and depth data) of the mirror 160 via the sensors 60 and 65, and determine the plane of the mirror 160 based on determining the outer edge of the mirror 160 and the 2D plane of the mirror 160.
In some implementations, determining the 3D position of the mirror in the physical environment includes detecting a reflection of an electronic device (e.g., an HMD) on the mirror, determining 3D position data of the reflection of the electronic device during a first period of time, and determining the 3D position of the mirror based on comparing the 3D position data of the electronic device and the 3D position data of the reflection of the electronic device during the first period of time. For example, the techniques described herein may compare the reflected HMD position over time with the HMD position from motion data (e.g., acquired via an IMU, etc.).
In some implementations, determining the 3D position of the mirror in the physical environment includes detecting a reflection of the electronic device (e.g., an HMD) on the mirror, determining a 3D position of the reflection of the electronic device, and determining the 3D position of the mirror based on a midpoint position between the 3D position of the reflection of the electronic device and the 3D position of the electronic device. For example, the techniques described herein may sample position data for the midpoint between the reflected position of the device and the position of the real device relative to the detected plane of the mirror 160, and determine whether the reflected rotation of the device (e.g., HMD) is a mirrored rotation of the real device through the calculated plane of the mirror.
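A minimal sketch of the midpoint idea, assuming paired samples of the device position and its reflected position are available, is shown below: the mirror plane passes through the midpoints and its normal lies along the line joining each device/reflection pair. Function and variable names are illustrative only.

```python
# Sketch: estimate the mirror plane (point, normal) from paired samples of the
# real device position and the reflected device position over time.
import numpy as np

def estimate_mirror_plane(device_positions, reflection_positions):
    device = np.asarray(device_positions, dtype=float)       # shape (N, 3)
    reflect = np.asarray(reflection_positions, dtype=float)  # shape (N, 3)
    midpoints = 0.5 * (device + reflect)        # points lying on the plane
    normals = device - reflect                  # normal direction per sample
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    # Average the per-sample estimates; a robust fit could replace the mean.
    # Assumes the device stays on the same side of the mirror while sampling.
    point_on_plane = midpoints.mean(axis=0)
    normal = normals.mean(axis=0)
    normal /= np.linalg.norm(normal)
    return point_on_plane, normal
```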
In some implementations, in accordance with the electronic device rotating about a vertical axis during a second period of time, determining the 3D position of the mirror in the physical environment includes detecting a reflection of the rotation of the electronic device (e.g., HMD) on the mirror, determining 3D position data of the reflection of the rotation of the electronic device during the second period of time, and determining the 3D position of the mirror based on comparing the 3D position data of the electronic device and the 3D position data of the reflection of the rotation of the electronic device during the second period of time. For example, the techniques described herein may determine whether the reflected rotation of a device (e.g., HMD) is a specular rotation of the real device through the calculated plane of the mirror. This may involve determining mirror boundaries (e.g., a bounding rectangle) based on techniques that detect differences between the reflection and the surrounding surface.
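As a hedged sketch of the consistency check just described, the reflected device's orientation can be compared against the mirror image of the real device's orientation through the estimated plane (reflection matrix M = I - 2nnᵀ); the tolerance and names below are assumptions.

```python
# Sketch: check that the observed reflected rotation matches the real device
# rotation mirrored through the estimated plane normal.
import numpy as np

def rotation_is_mirrored(r_real, r_reflected, plane_normal, tol=0.1):
    n = plane_normal / np.linalg.norm(plane_normal)
    mirror = np.eye(3) - 2.0 * np.outer(n, n)   # Householder reflection matrix
    expected = mirror @ r_real @ mirror         # mirror image of the rotation
    return np.linalg.norm(expected - r_reflected) < tol
```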
In some implementations, the 3D position of the mirror is determined based on depth data from the sensor data, and the 3D position of the virtual content is based on the depth data associated with the reflected 3D position. For example, the virtual content may be placed at the same depth as the mirror. In some examples, depth may be detected by taking the distance of the user reflection as determined by computer vision techniques and dividing by two. Additionally or alternatively, the 3D position may also be at a virtual depth (e.g., a depth determined by computer vision) of the user in the mirror.
In some implementations, determining the 3D position of the mirror is based on tracking a pose of the electronic device relative to the physical environment, and detecting, based on the pose of the electronic device, that a view of a display of the electronic device is oriented toward the virtual content. For example, a position sensor may be used to obtain positioning information for a device (e.g., device 10). For positioning information, some implementations include a VIO system that determines equivalent odometry information from sequential camera images (e.g., light intensity data such as RGB data) to estimate the distance traveled. Alternatively, some implementations of the present disclosure may include a SLAM system (e.g., a position sensor). The SLAM system may include a multi-dimensional (e.g., 3D) laser scanning and range measurement system that is GPS independent and provides real-time simultaneous localization and mapping. The SLAM system can generate and manage very accurate point cloud data generated from reflections of laser scans off objects in the environment. Over time, the movement of any point in the point cloud is accurately tracked, so that the SLAM system can use points in the point cloud as reference points for position, maintaining a precise understanding of its position and orientation as it travels through the environment. The SLAM system may also be a visual SLAM system that relies on light intensity image data to estimate the position and orientation of the camera and/or device. In addition, knowing that a mirror is present allows, for example, a 3D reconstruction algorithm or the SLAM system to be updated accordingly.
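For illustration, a pose-based check of whether the display view is oriented toward placed content might look like the following sketch; the angular field of view and argument names are assumptions, not values from the source.

```python
# Sketch: decide whether the device's view is oriented toward the virtual
# content by comparing the device's forward axis (from its tracked pose) with
# the direction to the content.
import numpy as np

def view_is_toward_content(device_position, device_forward, content_position,
                           half_fov_deg=45.0):
    device_position = np.asarray(device_position, dtype=float)
    device_forward = np.asarray(device_forward, dtype=float)
    content_position = np.asarray(content_position, dtype=float)
    to_content = content_position - device_position
    to_content = to_content / np.linalg.norm(to_content)
    forward = device_forward / np.linalg.norm(device_forward)
    angle = np.degrees(np.arccos(np.clip(np.dot(forward, to_content), -1.0, 1.0)))
    return angle <= half_fov_deg
```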
At block 708, the method 700 presents virtual content in a view of the physical environment, where the virtual content is positioned at a 3D location based on a 3D location of a reflective object (e.g., a mirror). For example, virtual content (such as calendar applications, news, etc.) may be provided in response to detecting that the user is in a situation where the user would benefit from a particular UI (such as in the morning). In other examples, the clothing application may be presented in response to determining that the user is looking at himself/herself or turning around in a particular manner while wearing new clothing/caps. In other examples, the social media video recording application may be presented if the user is acting in some way (e.g., making a particular gesture) or providing a verbal command. In other examples, if the user is determined to be in a gym, the exercise application may be displayed to track progress, provide exercise techniques, and so forth. In other examples, if the user is determined to be making up or the like, an enlarged enhanced view of the user's face may be provided. In addition, the virtual content may be positioned on a reflected 3D location or based on the 3D location (e.g., a location of an object having a reflective surface such as a mirror).
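The following non-limiting sketch illustrates one way virtual content could be positioned relative to the detected mirror plane, either on the plane itself (panel-style UI) or at the reflection's apparent depth behind it; the offsets, axis conventions, and the wall-mounted-mirror assumption are illustrative only.

```python
# Hedged sketch: compute a 3D anchor for virtual content based on the mirror
# plane (point + normal) and the device position. Assumes a wall-mounted
# mirror (normal not vertical) and a y-up world frame.
import numpy as np

def place_content(mirror_point, mirror_normal, device_position,
                  lateral_offset=0.3, behind_plane=False):
    n = mirror_normal / np.linalg.norm(mirror_normal)
    to_device = device_position - mirror_point
    d = np.dot(to_device, n)                    # signed device-to-plane distance
    # Project the device position onto the plane so the content sits roughly in
    # front of the user, then shift sideways so it does not cover the reflection.
    on_plane = device_position - d * n
    right = np.cross(n, np.array([0.0, 1.0, 0.0]))
    right /= np.linalg.norm(right)
    anchor = on_plane + lateral_offset * right
    if behind_plane:
        # Place the content at the reflection's apparent depth behind the plane.
        anchor = anchor - d * n
    return anchor
```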
In some implementations, the virtual content includes one or more interactable elements presented on a detection plane of the mirror. For example, after detecting the plane of the mirror 160, the mirror plane and the 3D position of the mirror plane may be used as a touch screen. For example, interactable elements are displayed such as augmented reality icons and/or applications (e.g., virtual content 314, 316 of FIG. 3) that a user may select and/or interact with.
In some implementations, the depth position of the 3D position of the virtual content is the same as the depth of a reflection detected in the mirror. For example, as shown in fig. 4, the virtual content is an interactable element 416 (e.g., a button for selecting an exercise application or video) that appears to be at the same depth as the user's reflection 425 (e.g., on the same detected plane of mirror 160). Presenting the virtual content on the detected plane or reflective surface of the mirror allows content such as text to be viewed and read more easily, for a better visual experience. In particular, locating the virtual content at the same depth as the user's reflection 425 may advantageously prevent the user from having to change the focal plane of their eyes when looking from their reflection to the virtual content, or vice versa.
In some implementations, spatialized audio may be used in conjunction with, or in lieu of, presenting the virtual content. In an exemplary implementation, the 3D position of the mirror is at a first distance from the electronic device, the 3D position of the virtual content in the view of the physical environment is at a second distance from the electronic device that is greater than the first distance, and presenting the virtual content in the view of the physical environment includes presenting spatialized audio with a perceived distance to the sound source based on the 3D position of the virtual content. For example, as shown in fig. 3, audio content 318 is spatialized audio that the user perceives as coming from "behind" the mirror. For example, the perceived distance to the sound source may be derived from the detected plane of the mirror 160. In some implementations, the spatialized audio source may be placed within the virtual reflection. For example, a virtual application such as an avatar may appear to speak from the 3D location of the reflection rather than from the 3D location of the mirror (e.g., the reflection and associated sound may appear to be at twice the distance to the actual surface of the mirror).
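A minimal sketch of placing the audio source at the reflection's apparent location (twice the device-to-mirror distance, along the line through the plane) follows; the function name is an assumption, and feeding the result to an actual spatial audio engine is left out.

```python
# Sketch: reflect the device position through the mirror plane to get the
# apparent 3D position of a sound source "inside" the mirror.
import numpy as np

def reflected_audio_position(device_position, mirror_point, mirror_normal):
    n = mirror_normal / np.linalg.norm(mirror_normal)
    d = np.dot(device_position - mirror_point, n)   # signed distance to plane
    # Mirror the device position: the sound appears to come from behind the
    # surface, at twice the distance from the device to the mirror.
    return device_position - 2.0 * d * n
```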
In some implementations, physiology-based tracking (such as skeletal tracking) may be used for guidance, assessment, and feedback related to posture, exercise, sports, clothing, and the like. In an exemplary implementation, the sensor data includes physiological data of a user of the electronic device, and the method 700 further includes detecting movement of the user based on the physiological data, and modifying the virtual content in the view of the physical environment based on the detected movement of the user. For example, as shown in fig. 4, after detecting that the user has an incorrect posture (e.g., is slouching or not standing up straight), a virtual icon 414 is presented to the user 25.
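As a hedged illustration of such posture feedback on skeletal-tracking output, the sketch below flags a torso lean beyond a threshold; joint names, the threshold, and the returned action string are assumptions for illustration.

```python
# Sketch: simple posture check from tracked 3D joint positions; if the torso
# leans too far from vertical, trigger a corrective virtual icon.
import numpy as np

def torso_lean_degrees(joints):
    """joints: dict of 3D positions, e.g. {'neck': ..., 'pelvis': ...}."""
    torso = joints["neck"] - joints["pelvis"]
    torso = torso / np.linalg.norm(torso)
    up = np.array([0.0, 1.0, 0.0])
    return np.degrees(np.arccos(np.clip(np.dot(torso, up), -1.0, 1.0)))

def posture_feedback(joints, lean_threshold_deg=15.0):
    if torso_lean_degrees(joints) > lean_threshold_deg:
        return "show_posture_icon"   # e.g., display an icon like icon 414
    return None
```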
In some implementations, the method 700 further includes updating the virtual content based on detecting a change in the 3D position of the mirror. For example, the techniques described herein may detect a mirror position change and use the mirror position change as an input. For example, during a virtual chat session, if the mirror is a hand-held mirror type or a movable mirror, the user may place the mirror face down to hang up.
In some implementations, the method 700 further includes detecting a reflection in the reflective object of the one or more objects based on the sensor data. In some implementations, detecting the reflection is based on how the movement and pose of the visually sensed face of the user (or of another object) relate to the movement/pose of the sensing device (e.g., based on facial recognition of the user, face tracking/eye gaze detection, object detection via ML, etc.). For example, when the user is viewing his or her own reflection, the face bounding rectangle may remain within a specific area and may not exhibit rotation, even as the camera moves within the space. For example, because the image sensor may be worn on the user's head, the user's face in the captured images will not show a change in orientation about the front-facing axis, even though the electronic device is rotated about that same axis. The detected rotation of the sensing device about an axis may include rotation about a vertical axis, a horizontal axis, or both. Detecting a reflection based on either axis may also rely on the orientation relationship between the visually sensed face of the user (or other object) and the movement/pose of the sensing device.
In some implementations, detecting the reflection is based on tracking facial features of a user of the electronic device. In some implementations, detecting the reflection is based on facial recognition of a user of the electronic device. In some implementations, the sensor data includes an image of a head of a user of the electronic device, and detecting the mirror is based on determining that the head of the user is rotating in a yaw direction (e.g., about a vertical axis). For example, the mirror may be detected in response to seeing a face that does not rotate about a forward-facing axis and rotates about a vertical axis by twice the amount determined by an on-board sensor (e.g., from a gyroscope) for the device 10.
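The yaw-doubling cue described above can be expressed as a small consistency test, sketched below under assumed tolerances; in a mirror, the observed reflected head yaw change is roughly twice the device's own yaw change, while roll about the forward axis stays near zero.

```python
# Sketch of the yaw-doubling heuristic: compare the observed change in the
# reflected face's yaw against twice the device's gyroscope-measured yaw
# change, with near-zero roll. Tolerances are illustrative assumptions.
def consistent_with_mirror(observed_face_yaw_delta_deg,
                           device_yaw_delta_deg,
                           observed_face_roll_delta_deg,
                           yaw_tol_deg=5.0, roll_tol_deg=2.0):
    yaw_error = abs(observed_face_yaw_delta_deg - 2.0 * device_yaw_delta_deg)
    return yaw_error < yaw_tol_deg and abs(observed_face_roll_delta_deg) < roll_tol_deg
```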
In some implementations, the sensor data includes an image of a user's head of the electronic device, and presenting virtual content in the view of the physical environment includes presenting a virtual representation (e.g., an avatar) of the user at a 3D location associated with a reflection of the user's head on a mirror. For example, as shown in fig. 5, the user representation 525 is presented as virtual content (e.g., emoji, memoji, or animoji, such as a wolf's face and associated body) on the user's body and face. Other techniques may be used to present virtual content only over the user's face (e.g., only a wolf face avatar is presented, but the user's hands, body, etc. are still shown in reflection by a mirror).
In some implementations, if the reflective surface is not a completely reflective surface (e.g., a window), additional vision techniques may be utilized to help detect the reflection. For example, a machine learning model may be trained to identify characteristic dual images of reflections superimposed on the real world.
In some implementations, the method 700 further includes determining a context associated with use of the electronic device in the physical environment based on the sensor data, and presenting the virtual content based on the context. For example, determining the context may include detecting that the user is in a situation in which the user would benefit from presenting virtual content (e.g., a particular application), such as a particular time of day, trying on new clothing, applying make-up, and so forth. The method 700 may use various ways of detecting the context of the physical environment. In some implementations, detecting the context includes determining use of the electronic device in a new location (e.g., using a mirror in a hotel room that the user has not previously visited). In some implementations, detecting the context includes determining use of the electronic device during a certain type of activity (e.g., exercising, applying make-up, trying on new clothing, etc.). In some implementations, detecting the context includes determining that the electronic device is within a proximity threshold distance of a location, an object, another electronic device, or a person.
In some implementations, the context includes a time of day, and presenting the virtual content is based on the time of day. For example, as shown in fig. 3, the context includes a situation where the user would benefit from a particular UI (such as in the morning), and virtual content (such as calendar applications, news applications, etc.) may be presented such that it appears as an application window on the surface of the mirror.
In some implementations, the context includes movement of a user of the electronic device relative to the mirror, and presenting the virtual content is based on the movement of the user. For example, as shown in FIG. 4, if the user is looking at himself or herself, or is turning in a particular way with an incorrect gesture, the virtual content may include an exercise application.
In some implementations, the context includes user interaction with the mirror, and presenting the virtual content is based on the user interaction with the mirror. For example, the context may include touching a mirror at the location of the virtual content, a sliding action on the mirror to close a window/application, verbal commands, etc. to activate and/or interact with a particular application.
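One way to summarize the context-to-content examples above is a simple mapping, as in the following hedged sketch; the context fields, activity labels, and application names are illustrative assumptions rather than elements defined by the disclosure.

```python
# Sketch: map a detected context (time of day, activity, proximity to a
# mirror) to the kinds of virtual content discussed above.
from dataclasses import dataclass

@dataclass
class Context:
    hour_of_day: int
    activity: str          # e.g. "exercise", "make_up", "try_on", "idle"
    near_mirror: bool

def select_virtual_content(ctx: Context):
    if not ctx.near_mirror:
        return []
    if ctx.activity == "exercise":
        return ["exercise_tracker", "form_feedback_overlay"]
    if ctx.activity == "make_up":
        return ["magnified_face_view"]
    if ctx.activity == "try_on":
        return ["clothing_try_on_panel"]
    if 5 <= ctx.hour_of_day < 10:
        return ["calendar_panel", "news_panel"]
    return []

print(select_virtual_content(Context(hour_of_day=7, activity="idle", near_mirror=True)))
# ['calendar_panel', 'news_panel']
```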
In some implementations, the virtual content (e.g., virtual content 314, 316 of fig. 3) may be modified over time based on the proximity of the electronic device to the anchor location (e.g., mirror 160). For example, when the user 25 becomes closer, the spatialized audio notification (e.g., audio content 318) may indicate a closer proximity. Additionally or alternatively, for visual icons, if the user begins to walk away from the mirror 160 in a different direction, the virtual content may increase in size or begin to blink. In addition, the virtual content may include a text widget that informs the user of the location of objects displayed within the reflection of the mirror (or any reflective surface of the object).
In some examples, method 700 may further include capturing an image of the user in response to detecting the mirror. For example, images may be captured once a day to allow a user to track their appearance over time. In other examples, the user's activity data may be recorded in response to detecting the reflection. For example, if it is determined that the user is performing an exercise in a gym or in front of a mirror, the electronic device 10 may record the duration of the exercise or the number of calories burned.
In some implementations, the method 700 may further include using a mirror for a multi-user communication session. In an exemplary implementation, the virtual content includes a visual representation of another user based on a communication session with another electronic device. For example, as shown in FIG. 6, a representation 606 of the user 604 (e.g., a wolf avatar) is presented in the reflection of the representation 660 of the mirror 160. In some examples, method 700 may further include a self-presentation mode that renders virtual content so that a user can use the mirror to see how his or her representation may appear to others in an XR environment (e.g., during a communication session as shown in fig. 6). For example, if a user wants to disclose presentation information (e.g., a "name tag" or other identifying information) to other XR users, the user may, after a reflective surface is detected, initiate the self-presentation mode to view the self-identifying virtual content.
In some examples, method 700 may further include a privacy setting mode based on determining that no other person is present (or the mode may be triggered by the user himself or herself). In an exemplary implementation, the techniques may include determining a scene understanding of the physical environment based on the sensor data, determining, based on the scene understanding, that the user of the electronic device is the only user within an area associated with the view of the physical environment, and presenting the virtual content based on user preference settings associated with being the only user within the area associated with the view of the physical environment. For example, the user may wish for a morning routine application (e.g., virtual content 314 and virtual content 316 of FIG. 3) to open only when he or she is the only person currently in the bathroom (e.g., when a partner of user 25 is not also in the same room).
Fig. 8 is a block diagram of an exemplary device 800. Device 800 illustrates an exemplary device configuration of device 10. While certain specific features are shown, those of ordinary skill in the art will appreciate from the disclosure that various other features are not shown for brevity and so as not to obscure more pertinent aspects of the implementations disclosed herein. To this end, as a non-limiting example, in some implementations, the device 10 includes one or more processing units 802 (e.g., microprocessors, ASIC, FPGA, GPU, CPU, processing cores, and the like), one or more input/output (I/O) devices and sensors 806, one or more communication interfaces 808 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, SPI, I2C, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 810, one or more displays 812, one or more inwardly and/or outwardly facing image sensor systems 814, a memory 820, and one or more communication buses 804 for interconnecting these components and various other components.
In some implementations, one or more communication buses 804 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices and sensors 806 include at least one of: an Inertial Measurement Unit (IMU), accelerometer, magnetometer, gyroscope, thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptic engine, or one or more depth sensors (e.g., structured light, time of flight, etc.), and the like.
In some implementations, the one or more displays 812 are configured to present a view of the physical environment or the graphical environment to the user. In some implementations, the one or more displays 812 correspond to holographic, digital Light Processing (DLP), liquid Crystal Displays (LCD), liquid crystal on silicon (LCoS), organic light emitting field effect transistors (OLET), organic Light Emitting Diodes (OLED), surface conduction electron emitter displays (SED), field Emission Displays (FED), quantum dot light emitting diodes (QD-LED), microelectromechanical systems (MEMS), and/or similar display types. In some implementations, the one or more displays 812 correspond to diffractive, reflective, polarizing, holographic, etc. waveguide displays. For example, the device 10 includes a single display. As another example, the device 10 includes a display for each eye of the user.
In some implementations, the one or more image sensor systems 814 are configured to obtain image data corresponding to at least a portion of the physical environment 100. For example, the one or more image sensor systems 814 include one or more RGB cameras (e.g., with Complementary Metal Oxide Semiconductor (CMOS) image sensors or Charge Coupled Device (CCD) image sensors), monochrome cameras, IR cameras, depth cameras, event-based cameras, and the like. In various implementations, the one or more image sensor systems 814 also include an illumination source, such as a flash, that emits light. In various implementations, the one or more image sensor systems 814 further include an on-camera Image Signal Processor (ISP) configured to perform a plurality of processing operations on the image data.
Memory 820 includes high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices. In some implementations, the memory 820 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 820 optionally includes one or more storage devices remotely located from the one or more processing units 802. Memory 820 includes a non-transitory computer-readable storage medium.
In some implementations, memory 820 or a non-transitory computer-readable storage medium of memory 820 stores an optional operating system 830 and one or more instruction sets 840. Operating system 830 includes procedures for handling various basic system services and for performing hardware related tasks. In some implementations, the instruction set 840 includes executable software defined by binary information stored in the form of electrical charges. In some implementations, the instruction set 840 is software that is executable by the one or more processing units 802 to implement one or more of the techniques described herein.
Instruction set 840 includes mirror detect instruction set 842, content instruction set 844, context instruction set 846, and communication session instruction set 848. The instruction set 840 may be embodied as a single software executable or as a plurality of software executable files.
In some implementations, the mirror detection instruction set 842 is executable by the processing unit 802 to detect mirrors in a physical environment in one or more objects based on sensor data. The mirror detection instruction set 842 may be configured to determine a 3D position of a mirror in a physical environment (e.g., a position where a plane of the mirror is located). For these purposes, in various implementations, the instructions include instructions and/or logic for the instructions as well as heuristics and metadata for the heuristics.
In some implementations, the content instruction set 844 is executable by the processing unit 802 to provide and/or track content for display on a device. The content instruction set 844 may be configured to monitor and track content over time (e.g., while viewing an XR environment), and to generate and display virtual content (e.g., applications associated with the determined 3D position of the mirror). For these purposes, in various implementations, the instructions include instructions and/or logic for the instructions as well as heuristics and metadata for the heuristics.
In some implementations, the set of contextual instructions 846 may be executable by the processing unit 802 to determine a context associated with use of an electronic device (e.g., device 10) in a physical environment (e.g., physical environment 100) using one or more of the techniques discussed herein or as otherwise may be appropriate. For these purposes, in various implementations, the instructions include instructions and/or logic for the instructions as well as heuristics and metadata for the heuristics.
In some implementations, the communication session instruction set 848 may be executed by the processing unit 802 to facilitate a communication session between two or more electronic devices (e.g., device 10 and device 615 as shown in fig. 6) using one or more of the techniques discussed herein or which may otherwise be appropriate. For these purposes, in various implementations, the instructions include instructions and/or logic for the instructions as well as heuristics and metadata for the heuristics.
While the instruction set 840 is shown as residing on a single device, it should be understood that in other implementations, any combination of elements may reside on separate computing devices. Moreover, FIG. 8 is a functional description of various features that are more fully utilized in a particular implementation, as opposed to a structural schematic of the implementations described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. The actual number of instruction sets, and how features are distributed among them, will vary depending upon the particular implementation, and may depend in part on the particular combination of hardware, software, and/or firmware selected for the particular implementation.
Fig. 9 illustrates a block diagram of an exemplary headset 900, according to some implementations. The head mounted device 900 includes a housing 901 (or shell) that houses the various components of the head mounted device 900. The housing 901 includes (or is coupled to) an eye pad (not shown) disposed at a proximal end (relative to the user 25) of the housing 901. In various implementations, the eye pad is a plastic or rubber piece that comfortably and snugly holds the headset 900 in place on the face of the user 25 (e.g., around the eyes 45 of the user 25).
The housing 901 houses a display 910 that displays an image, emitting light toward or onto the eyes of the user 25. In various implementations, the display 910 emits light through an eyepiece having one or more optical elements 905 that refract the light emitted by the display 910, causing the display to appear to the user 25 to be at a virtual distance greater than the actual distance from the eye to the display 910. For example, the optical element 905 may include one or more lenses, waveguides, other diffractive optical elements (DOEs), and the like. For the user 25 to be able to focus on the display 910, in various implementations, the virtual distance is at least greater than the minimum focal distance of the eye (e.g., 6 cm). Furthermore, to provide a better user experience, in various implementations, the virtual distance is greater than 1 meter.
The housing 901 also houses an eye/gaze tracking system that includes one or more light sources 922, a camera 924, and a controller 980. The one or more light sources 922 emit light onto the eye of the user 25 that reflects as a light pattern (e.g., one or more glints) detectable by the camera 924. Based on the light pattern, the controller 980 may determine eye-tracking characteristics of the user 25. For example, the controller 980 may determine a gaze direction and/or a blink state (eyes open or closed) of the user 25. As another example, the controller 980 may determine a pupil center, a pupil size, or a point of regard. Thus, in various implementations, light is emitted by the one or more light sources 922, reflects off the eye 45 of the user 25, and is detected by the camera 924. In various implementations, the light from the eye 45 of the user 25 is reflected off a hot mirror or passes through an eyepiece before reaching the camera 924.
The housing 901 also houses an audio system that includes one or more audio sources 926 that the controller 980 may utilize to provide audio to the user's ear 70 via the sound waves 14 in accordance with the techniques described herein. For example, the audio source 926 may provide sound for both background sound and auditory stimuli that may be spatially presented in a 3D coordinate system. Audio source 926 may include a speaker, a connection to an external speaker system (such as a headset), or an external speaker connected via a wireless connection.
The display 910 emits light in a first wavelength range and the one or more light sources 922 emit light in a second wavelength range. Similarly, camera 924 detects light in the second wavelength range. In various implementations, the first wavelength range is a visible wavelength range (e.g., a wavelength range of approximately 400nm-700nm in the visible spectrum) and the second wavelength range is a near infrared wavelength range (e.g., a wavelength range of approximately 700nm-1400nm in the near infrared spectrum).
In various implementations, eye tracking (or, in particular, a determined gaze direction) is used to enable user interaction (e.g., the user 25 selects an option on the display 910 by looking at it), to provide foveated rendering (e.g., presenting a higher resolution in the area of the display 910 at which the user 25 is looking and a lower resolution elsewhere on the display 910), or to correct distortion (e.g., for images to be provided on the display 910).
In various implementations, the one or more light sources 922 emit light toward the eye of the user 25, which reflects in the form of a plurality of glints.
In various implementations, the camera 924 is a frame/shutter based camera that generates images of the eyes of the user 25 at a particular point in time or points in time at a frame rate. Each image comprises a matrix of pixel values corresponding to pixels of the image, which pixels correspond to the positions of the photo sensor matrix of the camera. In implementations, each image is used to measure or track pupil dilation by measuring changes in pixel intensities associated with one or both of the user's pupils.
In various implementations, the camera 924 is an event camera that includes a plurality of light sensors (e.g., a matrix of light sensors) at a plurality of respective locations that generates an event message indicating a particular location of a particular light sensor in response to the particular light sensor detecting a light intensity change.
It should be understood that the implementations described above are cited by way of example, and that the present disclosure is not limited to what has been particularly shown and described hereinabove. Rather, the scope includes both combinations and subcombinations of the various features described hereinabove as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
As described above, one aspect of the present technology is to collect and use physiological data to improve the user's electronic device experience in interacting with electronic content. The present disclosure contemplates that in some cases, the collected data may include personal information data that uniquely identifies a particular person or that may be used to identify an interest, characteristic, or predisposition of a particular person. Such personal information data may include physiological data, demographic data, location-based data, telephone numbers, email addresses, home addresses, device characteristics of personal devices, or any other personal information.
The present disclosure recognizes that the use of such personal information data in the present technology may be used to benefit users. For example, personal information data may be used to improve the interaction and control capabilities of the electronic device. Thus, the use of such personal information data enables planned control of the electronic device. In addition, the present disclosure contemplates other uses for personal information data that are beneficial to the user.
The present disclosure also contemplates that entities responsible for the collection, analysis, disclosure, transmission, storage, or other use of such personal information and/or physiological data will adhere to established privacy policies and/or privacy practices. In particular, such entities should exercise and adhere to privacy policies and practices that are recognized as meeting or exceeding industry or government requirements for maintaining the privacy and security of personal information data. For example, personal information from a user should be collected for legal and legitimate uses of an entity and not shared or sold outside of those legal uses. In addition, such collection should be done only after the user's informed consent. In addition, such entities should take any required steps to secure and protect access to such personal information data and to ensure that other people who are able to access the personal information data adhere to their privacy policies and procedures. In addition, such entities may subject themselves to third party evaluations to prove compliance with widely accepted privacy policies and practices.
Regardless of the foregoing, the present disclosure also contemplates implementations in which a user selectively prevents use of, or access to, personal information data. That is, the present disclosure contemplates that hardware elements or software elements may be provided to prevent or block access to such personal information data. For example, with respect to content delivery services customized for a user, the techniques of the present invention may be configured to allow the user to "opt in" or "opt out" of participating in the collection of personal information data during registration for the service. In another example, the user may choose not to provide personal information data for targeted content delivery services. In yet another example, the user may choose not to provide personal information, but allow anonymous information to be transmitted to improve the functionality of the device.
Thus, while the present disclosure broadly covers the use of personal information data to implement one or more of the various disclosed embodiments, the present disclosure also contemplates that the various embodiments may be implemented without accessing such personal information data. That is, various embodiments of the present technology do not fail to function properly due to the lack of all or a portion of such personal information data. For example, the content may be selected and delivered to the user by inferring preferences or settings based on non-personal information data or absolute minimum personal information such as content requested by a device associated with the user, other non-personal information available to the content delivery service, or publicly available information.
In some embodiments, the data is stored using a public/private key system that only allows the owner of the data to decrypt the stored data. In some other implementations, the data may be stored anonymously (e.g., without identifying and/or personal information about the user, such as legal name, user name, time and location data, etc.). Thus, other users, hackers, or third parties cannot determine the identity of the user associated with the stored data. In some implementations, a user may access stored data from a user device other than the user device used to upload the stored data. In these cases, the user may need to provide login credentials to access their stored data.
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, it will be understood by those skilled in the art that the claimed subject matter may be practiced without these specific details. In other instances, methods, devices, or systems known by those of ordinary skill have not been described in detail so as not to obscure the claimed subject matter.
Unless specifically stated otherwise, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," or "identifying" or the like, refer to the action or processes of a computing device, such as one or more computers or similar electronic computing devices, that manipulate or transform data represented as physical, electronic, or magnetic quantities within the computing platform's memory, registers, or other information storage device, transmission device, or display device.
The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. The computing device may include any suitable arrangement of components that provide results conditioned on one or more inputs. Suitable computing devices include a multi-purpose microprocessor-based computer system that accesses stored software that programs or configures the computing system from a general-purpose computing device to a special-purpose computing device that implements one or more implementations of the subject invention. The teachings contained herein may be implemented in software for programming or configuring a computing device using any suitable programming, scripting, or other type of language or combination of languages.
Implementations of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the above examples may be varied, e.g., the blocks may be reordered, combined, or divided into sub-blocks. Some blocks or processes may be performed in parallel.
The use of "adapted" or "configured to" herein is meant to be an open and inclusive language that does not exclude devices adapted or configured to perform additional tasks or steps. In addition, the use of "based on" is intended to be open and inclusive in that a process, step, calculation, or other action "based on" one or more of the stated conditions or values may be based on additional conditions or beyond the stated values in practice. Headings, lists, and numbers included herein are for ease of explanation only and are not intended to be limiting.
It will also be understood that, although the terms "first," "second," etc. may be used herein to describe various objects, these objects should not be limited by these terms. These terms are only used to distinguish one object from another. For example, a first node could be termed a second node, and, similarly, a second node could be termed a first node, without changing the meaning of the description, so long as all occurrences of the "first node" are renamed consistently and all occurrences of the "second node" are renamed consistently. The first node and the second node are both nodes, but they are not the same node.
The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the claims. As used in the description of this specification and the appended claims, the singular forms "a," "an," and "the" are intended to cover the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof.
As used herein, the term "if" may be interpreted to mean "when" the stated prerequisite is true, or "in response to determining," "upon determining," or "in response to detecting" that the stated prerequisite is true, depending on the context. Similarly, the phrase "if it is determined [that the stated prerequisite is true]," "if [the stated prerequisite is true]," or "when [the stated prerequisite is true]" may be interpreted to mean "upon determining," "in response to determining," "upon detecting," or "in response to detecting" that the stated prerequisite is true, depending on the context.
The foregoing description and summary of the invention should be understood to be in every respect illustrative and exemplary, but not limiting, and the scope of the invention disclosed herein is to be determined not by the detailed description of illustrative implementations, but by the full breadth permitted by the patent laws. It is to be understood that the specific implementations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

Claims (20)

1. A method, the method comprising:
at an electronic device having a processor and a sensor:
obtaining sensor data from the sensors of the electronic device in a physical environment comprising one or more objects;
detecting a reflective object of the one or more objects based on the sensor data;
determining a three-dimensional (3D) position of the reflective object in the physical environment; and
presenting virtual content in a view of the physical environment, wherein the virtual content is positioned at a 3D position that is based on the 3D position of the reflective object.
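For illustration only, the following is a minimal Python sketch of the overall flow recited in claim 1: obtain sensor data, detect a reflective object, estimate its 3D position, and position virtual content based on that estimate. The data classes, function names, and the use of simple world-space coordinate tuples are assumptions made for this sketch and are not recited in the claim.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) in a shared world frame (assumed)

@dataclass
class SensorFrame:
    image: bytes              # camera image data from the sensor
    depth: Sequence[float]    # per-pixel depth samples, if available
    device_position: Vec3     # device position in world coordinates

@dataclass
class ReflectiveObject:
    position: Vec3            # estimated 3D position of the reflective surface

def detect_reflective_object(frame: SensorFrame) -> Optional[ReflectiveObject]:
    """Placeholder detector; a real system could use reflection cues such as
    the device's own mirror image (compare claims 3, 4, and 7)."""
    return None

def content_position(frame: SensorFrame) -> Optional[Vec3]:
    """Return a 3D position for virtual content, based on the mirror's position."""
    mirror = detect_reflective_object(frame)
    if mirror is None:
        return None
    return mirror.position
```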
2. The method of claim 1, wherein determining the 3D position of the reflective object in the physical environment comprises detecting a plane of the reflective object, and wherein the virtual content comprises one or more interactable elements presented on the detected plane of the reflective object.
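As a sketch of the plane-based presentation in claim 2, the snippet below lays interactable elements out at 2D offsets on a detected plane described by an origin point and two in-plane axes. The plane representation and the function name place_on_plane are illustrative assumptions, not part of the claim.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def place_on_plane(origin: Vec3, u_axis: Vec3, v_axis: Vec3,
                   offsets_2d: List[Tuple[float, float]]) -> List[Vec3]:
    """Map 2D layout offsets (in meters) onto 3D positions on the detected plane."""
    return [tuple(o + du * u + dv * v for o, u, v in zip(origin, u_axis, v_axis))
            for du, dv in offsets_2d]

# Example: three interactable elements laid out horizontally on a vertical mirror
# plane whose center is 1.5 m above the floor and 1.0 m in front of the user.
elements = place_on_plane(origin=(0.0, 1.5, 1.0),
                          u_axis=(1.0, 0.0, 0.0), v_axis=(0.0, 1.0, 0.0),
                          offsets_2d=[(-0.2, 0.0), (0.0, 0.0), (0.2, 0.0)])
```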
3. The method of claim 1, wherein determining the 3D position of the reflective object in the physical environment comprises:
detecting a reflection of the electronic device on the reflective object;
determining a 3D position of the reflection of the electronic device; and
determining the 3D position of the reflective object based on a midpoint position between the 3D position of the reflection of the electronic device and a 3D position of the electronic device.
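A planar mirror lies halfway between an object and its mirror image, which is the geometric fact behind claim 3. The sketch below computes that midpoint, assuming the device and its detected reflection have been localized in the same coordinate frame; the function name and sample coordinates are illustrative.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

def mirror_position_from_reflection(device_pos: Vec3, reflection_pos: Vec3) -> Vec3:
    """Estimate the 3D position of the reflective object as the midpoint between
    the device and the 3D position of its reflection."""
    return tuple((d + r) / 2.0 for d, r in zip(device_pos, reflection_pos))

# Example: the device is at z = 0 and its reflection appears at z = 2.0 m,
# so the mirror plane is estimated to be at z = 1.0 m.
assert mirror_position_from_reflection((0.0, 1.6, 0.0), (0.0, 1.6, 2.0)) == (0.0, 1.6, 1.0)
```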
4. The method of claim 1, wherein, in accordance with the electronic device rotating about a vertical axis during a first period of time, determining the 3D position of the reflective object in the physical environment comprises:
detecting a reflection of a rotation of the electronic device on the reflective object;
determining 3D position data of the reflection of the rotation of the electronic device during the first period of time; and
determining the 3D position of the reflective object based on comparing 3D position data of the electronic device with the 3D position data of the reflection of the rotation of the electronic device during the first period of time.
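The sketch below illustrates the comparison in claim 4, assuming synchronized samples of the device's 3D position and the apparent 3D position of its reflection collected while the device rotates about a vertical axis; averaging the per-sample midpoints gives one simple estimate of the mirror's position. The track format is an assumption for illustration.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def estimate_mirror_position(device_track: List[Vec3],
                             reflection_track: List[Vec3]) -> Vec3:
    """Compare the device's positions with its reflection's positions over the
    first time period and return the average midpoint as the mirror estimate."""
    if not device_track or len(device_track) != len(reflection_track):
        raise ValueError("tracks must be non-empty and of equal length")
    midpoints = [tuple((d + r) / 2.0 for d, r in zip(dev, ref))
                 for dev, ref in zip(device_track, reflection_track)]
    n = len(midpoints)
    return tuple(sum(m[i] for m in midpoints) / n for i in range(3))
```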
5. The method of claim 1, wherein the 3D position of the reflective object is determined based on depth data from the sensor data, and the 3D position of the virtual content is based on the depth data associated with the 3D position of the reflective object.
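For claim 5, one simple way to derive a depth for the reflective object from sensor depth data is to take the median of the depth samples covering the detected object, which tolerates a few bad readings; the sampling scheme and function name below are assumptions for illustration.

```python
import statistics
from typing import Sequence

def object_depth_from_samples(depth_samples_m: Sequence[float]) -> float:
    """Return a representative depth (meters) for the detected reflective object."""
    valid = [d for d in depth_samples_m if d > 0.0]  # drop invalid (zero/negative) readings
    if not valid:
        raise ValueError("no valid depth samples for the detected object")
    return statistics.median(valid)
```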
6. The method according to claim 1, wherein:
the 3D position of the reflective object includes a 3D position of the reflective object at a first distance from the electronic device;
the 3D position of the virtual content in the view of the physical environment is at a second distance from the electronic device that is greater than the first distance; and
presenting the virtual content in the view of the physical environment includes presenting spatialized audio at a perceived distance to a sound source based on the 3D position of the virtual content.
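Claim 6 places the virtual content farther from the device than the mirror itself and renders spatialized audio as if the sound source were at that farther position. The snippet below sketches the distance relationship with a simple inverse-distance gain; the attenuation model and the numeric values are assumptions, not part of the claim.

```python
def content_distance(mirror_distance_m: float, offset_behind_m: float) -> float:
    """Second distance: virtual content positioned beyond the mirror's first distance."""
    return mirror_distance_m + offset_behind_m

def distance_gain(source_distance_m: float, reference_m: float = 1.0) -> float:
    """Simple 1/d attenuation relative to a reference distance (an assumed model)."""
    return min(1.0, reference_m / max(source_distance_m, 1e-6))

mirror_d = 1.5                                # mirror detected 1.5 m from the device
content_d = content_distance(mirror_d, 1.0)   # content perceived at 2.5 m
gain = distance_gain(content_d)               # audio rendered at ~0.4x reference loudness
```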
7. The method of claim 1, wherein the sensor data comprises an image of a head of a user of the electronic device, and wherein detecting the reflective object is based on determining:
the head of the user as seen in the image is rotating about a vertical axis by twice the amount of rotation of the electronic device; or
the head of the user as seen in the image is not rotating about a forward axis.
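The two cues in claim 7 can be checked directly once the device's own rotation and the observed head's rotation are available, for example from motion sensors and face tracking. The sketch below applies both checks; the tolerance value and the function name are illustrative assumptions.

```python
def is_likely_reflection(device_yaw_delta_deg: float,
                         observed_head_yaw_delta_deg: float,
                         observed_head_roll_delta_deg: float,
                         tolerance_deg: float = 5.0) -> bool:
    # Cue 1: the head seen in the image rotates about the vertical axis by
    # roughly twice the device's own rotation.
    doubled_yaw = abs(observed_head_yaw_delta_deg
                      - 2.0 * device_yaw_delta_deg) <= tolerance_deg
    # Cue 2: the head seen in the image shows no rotation about the forward axis.
    no_roll = abs(observed_head_roll_delta_deg) <= tolerance_deg
    return doubled_yaw or no_roll
```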
8. The method of claim 1, further comprising:
detecting a reflection in the reflective object of the one or more objects based on the sensor data, wherein detecting the reflection is based on tracking facial features of a user of the electronic device or facial recognition of the user.
9. The method of claim 1, further comprising:
in accordance with the detection of the reflective object:
determining a context associated with use of the electronic device in the physical environment based on the sensor data; and
presenting the virtual content based on the context.
10. The method according to claim 9, wherein:
the context includes a time of day, and presenting the virtual content is based on the time of day;
the context includes movement of a user of the electronic device relative to a reflection in the reflective object, and presenting the virtual content is based on the movement of the user; or
the context includes a user interaction with the reflection, and presenting the virtual content is based on the user interaction with the reflection.
11. The method of claim 9, wherein determining the context comprises:
determining use of the electronic device in a new location;
determining use of the electronic device during a certain type of activity; or
determining that the electronic device is within a proximity threshold distance of a location, an object, another electronic device, or a person.
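The proximity condition in claim 11 reduces to a distance test once the device and the other entity are localized in a common frame; the 2-meter threshold below is an illustrative assumption.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def within_proximity(device_pos: Vec3, entity_pos: Vec3,
                     threshold_m: float = 2.0) -> bool:
    """True if the device is within the proximity threshold of a location,
    object, other device, or person at entity_pos."""
    return math.dist(device_pos, entity_pos) <= threshold_m
```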
12. The method of claim 1, further comprising:
determining a scene understanding of the physical environment based on the sensor data;
determining, based on the scene understanding, that a user of the electronic device is the only user within an area associated with the view of the physical environment; and
presenting the virtual content based on user preference settings associated with the user being the only user within the area associated with the view of the physical environment.
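A sketch of the gating in claim 12, assuming a scene-understanding result that lists the persons detected in the viewed area and a per-user preference flag; the data shapes and names are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SceneUnderstanding:
    persons_in_view: List[str] = field(default_factory=list)  # identifiers of detected people

@dataclass
class UserPreferences:
    show_private_content_when_alone: bool = True

def should_present(scene: SceneUnderstanding, user_id: str,
                   prefs: UserPreferences) -> bool:
    """Present the virtual content only if the user is the sole person in the area
    and the user's preference settings allow it."""
    alone = scene.persons_in_view == [user_id]
    return alone and prefs.show_private_content_when_alone
```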
13. The method of claim 1, wherein the virtual content comprises a visual representation of another user based on a communication session with another electronic device.
14. The method of claim 1, wherein a depth position of the 3D position of the virtual content is the same as a depth of a reflective object detected in the reflection.
15. The method of claim 1, wherein the electronic device is a Head Mounted Device (HMD).
16. An apparatus, the apparatus comprising:
a sensor;
a non-transitory computer readable storage medium; and
one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium includes program instructions that, when executed on the one or more processors, cause the apparatus to perform operations comprising:
obtaining sensor data from the sensor of the apparatus in a physical environment comprising one or more objects;
detecting a reflective object of the one or more objects based on the sensor data;
determining a three-dimensional (3D) position of the reflective object in the physical environment; and
presenting virtual content in a view of the physical environment, wherein the virtual content is positioned at a 3D position that is based on the 3D position of the reflective object.
17. The apparatus of claim 16, wherein determining the 3D position of the reflective object in the physical environment comprises detecting a plane of the reflective object, and wherein the virtual content comprises one or more interactable elements presented on the detected plane of the reflective object.
18. The apparatus of claim 16, wherein determining the 3D position of the reflective object in the physical environment comprises:
detecting a reflection of the electronic device on the reflective object;
determining 3D position data of the reflection of the electronic device during a first period of time; and
determining the 3D position of the reflective object based on comparing 3D position data of the electronic device with the 3D position data of the reflection of the electronic device during the first period of time.
19. The apparatus of claim 16, wherein the sensor data comprises an image of a head of a user of the electronic device, and wherein detecting the reflective object is based on determining:
the head of the user as seen in the image is rotating about a vertical axis by twice the amount of rotation of the electronic device; or
the head of the user as seen in the image is not rotating about a forward axis.
20. A non-transitory computer-readable storage medium storing program instructions that are executable on a device to perform operations comprising:
obtaining sensor data from sensors of an electronic device in a physical environment comprising one or more objects;
detecting a reflective object of the one or more objects based on the sensor data;
determining a three-dimensional (3D) position of the reflective object in the physical environment; and
presenting virtual content in a view of the physical environment, wherein the virtual content is positioned at a 3D position that is based on the 3D position of the reflective object.
CN202310798444.3A 2022-06-30 2023-06-29 Content conversion based on reflective object recognition Pending CN117333788A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/357,503 2022-06-30
US18/214,575 US20240005612A1 (en) 2022-06-30 2023-06-27 Content transformations based on reflective object recognition
US18/214,575 2023-06-27

Publications (1)

Publication Number Publication Date
CN117333788A true CN117333788A (en) 2024-01-02

Family

ID=89289157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310798444.3A Pending CN117333788A (en) 2022-06-30 2023-06-29 Content conversion based on reflective object recognition

Country Status (1)

Country Link
CN (1) CN117333788A (en)

Similar Documents

Publication Publication Date Title
CN110968189B (en) Pupil modulation as cognitive control signal
US20220253136A1 (en) Methods for presenting and sharing content in an environment
US11733769B2 (en) Presenting avatars in three-dimensional environments
US20220269333A1 (en) User interfaces and device settings based on user identification
US20220262080A1 (en) Interfaces for presenting avatars in three-dimensional environments
US20230384907A1 (en) Methods for relative manipulation of a three-dimensional environment
US20240077937A1 (en) Devices, methods, and graphical user interfaces for controlling avatars within three-dimensional environments
CN116368529A (en) Representation of a user based on the appearance of the current user
US20230343049A1 (en) Obstructed objects in a three-dimensional environment
US20230316674A1 (en) Devices, methods, and graphical user interfaces for modifying avatars in three-dimensional environments
US20240005612A1 (en) Content transformations based on reflective object recognition
CN117333788A (en) Content conversion based on reflective object recognition
US20230351676A1 (en) Transitioning content in views of three-dimensional environments using alternative positional constraints
US20240005623A1 (en) Positioning content within 3d environments
CN117957513A (en) Interaction based on mirror detection and context awareness
US20230171484A1 (en) Devices, methods, and graphical user interfaces for generating and displaying a representation of a user
WO2023043647A1 (en) Interactions based on mirror detection and context awareness
US20230103161A1 (en) Devices, methods, and graphical user interfaces for tracking mitigation in three-dimensional environments
US20230288985A1 (en) Adjusting image content to improve user experience
US20230152935A1 (en) Devices, methods, and graphical user interfaces for presenting virtual objects in virtual environments
US20230384860A1 (en) Devices, methods, and graphical user interfaces for generating and displaying a representation of a user
CN117331434A (en) Locating content within a 3D environment
US20230418372A1 (en) Gaze behavior detection
US20240104859A1 (en) User interfaces for managing live communication sessions
US20230350539A1 (en) Representations of messages in a three-dimensional environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination