WO2023220071A2 - Systems, methods, and graphical user interfaces for scanning and modeling environments - Google Patents

Systems, methods, and graphical user interfaces for scanning and modeling environments

Info

Publication number
WO2023220071A2
Authority
WO
WIPO (PCT)
Prior art keywords
physical environment
representation
cameras
view
user interface
Prior art date
Application number
PCT/US2023/021563
Other languages
English (en)
Other versions
WO2023220071A3 (fr)
Inventor
Allison W. DRYER
Giancarlo Yerkes
Praveen Sharma
Grant R. PAUL
Joseph A. MALIA
Original Assignee
Apple Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/144,746 (published as US20230368458A1)
Application filed by Apple Inc. filed Critical Apple Inc.
Publication of WO2023220071A2
Publication of WO2023220071A3

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour

Definitions

  • This relates generally to computer systems for augmented and/or virtual reality, including but not limited to electronic devices for scanning and modeling environments, such as physical environments, and/or objects therein using augmented and/or virtual reality environments.
  • Augmented reality environments are useful for annotating and modeling physical environments and objects therein.
  • Before a model of a physical environment is generated, a user needs to scan the physical environment using depth and/or image sensing devices.
  • Conventional methods of scanning and modeling using augmented and/or virtual reality are cumbersome, inefficient, and limited.
  • conventional methods of scanning and modeling using augmented reality are limited in functionality, by not providing sufficient feedback and requiring the user to specify what type of features are being scanned.
  • conventional methods of scanning using augmented reality do not provide sufficient guidance to help the user scan the environment successfully and efficiently.
  • the computer system includes a desktop computer.
  • the computer system is portable (e.g., a notebook computer, tablet computer, or handheld device).
  • the computer system includes a personal electronic device (e.g., a wearable electronic device, such as a watch).
  • the computer system has (and/or is in communication with) a touchpad.
  • the computer system has (and/or is in communication with) a touch-sensitive display (also known as a “touch screen” or “touch-screen display”).
  • the computer system has a graphical user interface (GUI), one or more processors, memory and one or more modules, programs or sets of instructions stored in the memory for performing multiple functions.
  • the user interacts with the GUI in part through stylus and/or finger contacts and gestures on the touch-sensitive surface.
  • the functions optionally include game playing, image editing, drawing, presenting, word processing, spreadsheet making, telephoning, video conferencing, e-mailing, instant messaging, workout support, digital photographing, digital videoing, web browsing, digital music playing, note taking, and/or digital video playing. Executable instructions for performing these functions are, optionally, included in a non-transitory computer readable storage medium or other computer program product configured for execution by one or more processors.
  • a method is performed at a computer system that is in communication with a display generation component, one or more input devices, and one or more cameras.
  • the method includes displaying, via the display generation component, a first user interface, wherein the first user interface concurrently includes: a representation of a field of view of one or more cameras, the representation of the field of view including a first view of a physical environment that corresponds to a first viewpoint of a user in the physical environment, and a preview of a three-dimensional model of the physical environment.
  • the preview includes a partially completed three-dimensional model of the physical environment that is displayed with a first orientation that corresponds to the first viewpoint of the user.
  • the method includes, while displaying the first user interface, detecting first movement of the one or more cameras in the physical environment that changes a current viewpoint of the user in the physical environment from the first viewpoint to a second viewpoint.
  • the method further includes, in response to detecting the first movement of the one or more cameras: updating the preview of the three-dimensional model in the first user interface in accordance with the first movement of the one or more cameras, including adding additional information to the partially completed three-dimensional model and rotating the partially completed three-dimensional model from the first orientation that corresponds to the first viewpoint of the user to a second orientation that corresponds to the second viewpoint of the user.
  • the method includes, while displaying the first user interface, with the representation of the field of view including a second view of the physical environment that corresponds to the second viewpoint of the user, and with the preview of the three-dimensional model including the partially completed model with the second orientation, detecting first input directed to the preview of the three-dimensional model in the first user interface.
  • the method includes, in response to detecting the first input directed to the preview of the three-dimensional model in the first user interface: updating the preview of the three-dimensional model in the first user interface in accordance with the first input, including, in accordance with a determination that the first input meets first criteria, rotating the partially completed three-dimensional model from the second orientation that corresponds to the second viewpoint of the user to a third orientation that does not correspond to the second viewpoint of the user.
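The following Swift sketch illustrates the preview behavior described in the preceding paragraphs: the partially completed model tracks the user's viewpoint as the cameras move, and a qualifying input directed at the preview rotates it to an orientation that no longer corresponds to the viewpoint. The class and method names are hypothetical and are not taken from the application or any shipping framework.

```swift
import simd

/// A minimal sketch (not the claimed implementation) of a model preview that
/// follows the user's viewpoint until an input directed at the preview rotates
/// it, after which it is decoupled from the viewpoint.
final class ModelPreviewController {
    /// Orientation currently applied to the preview of the three-dimensional model.
    private(set) var previewOrientation = simd_quatf(angle: 0, axis: SIMD3<Float>(0, 1, 0))
    /// Whether the preview still follows the user's current viewpoint.
    private(set) var followsViewpoint = true

    /// Called as the one or more cameras move; rotates the preview to the
    /// orientation that corresponds to the new viewpoint.
    func viewpointDidChange(cameraTransform: simd_float4x4) {
        guard followsViewpoint else { return }
        previewOrientation = Self.rotation(of: cameraTransform)
    }

    /// Called when an input directed at the preview meets the rotation criteria
    /// (e.g., a drag); the preview then stops tracking the viewpoint.
    func userDidRotatePreview(byRadians angle: Float) {
        followsViewpoint = false
        previewOrientation = simd_quatf(angle: angle, axis: SIMD3<Float>(0, 1, 0)) * previewOrientation
    }

    /// Extracts the rotation part of a rigid camera transform.
    private static func rotation(of transform: simd_float4x4) -> simd_quatf {
        let c0 = transform.columns.0, c1 = transform.columns.1, c2 = transform.columns.2
        let rotation = simd_float3x3([SIMD3(c0.x, c0.y, c0.z),
                                      SIMD3(c1.x, c1.y, c1.z),
                                      SIMD3(c2.x, c2.y, c2.z)])
        return simd_quatf(rotation)
    }
}
```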
  • a method is performed at a computer system that is in communication with a display generation component, one or more input devices, and one or more cameras.
  • the method includes displaying, via the display generation component, a first user interface.
  • the first user interface includes a representation of a field of view of one or more cameras, and the representation of the field of view includes a respective view of a physical environment that corresponds to a current viewpoint of a user in the physical environment.
  • the method includes, while displaying the first user interface, in accordance with a determination that a first object has been detected in the field of view of the one or more cameras, displaying, at a first time, a first representation of the first object at a position in the representation of the field of view that corresponds to a location of the first object in the physical environment.
  • One or more spatial properties of the first representation of the first object have values that correspond to one or more spatial dimensions of the first object in the physical environment.
  • the method includes, at a second time later than the first time, replacing display of the first representation of the first object with display of a second representation of the first object in the representation of the field of view.
  • the second representation of the first object does not spatially indicate the one or more spatial dimensions of the first object in the physical environment.
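As a rough illustration of the two representations just described, the sketch below models a first, dimension-accurate representation (a bounding box sized to the object) that is later replaced by a representation that does not spatially indicate the object's dimensions (a fixed-size icon with a label). All names are hypothetical.

```swift
/// Sketch only: two ways of representing a detected object, one whose spatial
/// properties match the object's estimated dimensions and one that does not
/// spatially indicate them. Names are illustrative, not from any real framework.
enum DetectedObjectRepresentation {
    /// Spatial: a box whose width/height/depth correspond to the object's dimensions.
    case boundingBox(width: Float, height: Float, depth: Float)
    /// Non-spatial: a fixed-size glyph and label placed at the object's location.
    case icon(label: String)
}

struct DetectedObject {
    let label: String
    let estimatedSize: (width: Float, height: Float, depth: Float)
    var representation: DetectedObjectRepresentation

    /// At a first time, when the object is detected, show the spatial representation.
    init(label: String, estimatedSize: (width: Float, height: Float, depth: Float)) {
        self.label = label
        self.estimatedSize = estimatedSize
        self.representation = .boundingBox(width: estimatedSize.width,
                                           height: estimatedSize.height,
                                           depth: estimatedSize.depth)
    }

    /// At a second, later time, replace it with the non-spatial representation.
    mutating func collapseToIcon() {
        representation = .icon(label: label)
    }
}
```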
  • a method is performed at a computer system that is in communication with a display generation component, one or more input devices, and one or more cameras.
  • the method includes, during a scan of a physical environment to obtain depth information of at least a portion of the physical environment: displaying, via the display generation component, a first user interface.
  • the first user interface includes a representation of a field of view of one or more cameras, and the representation of the field of view includes a respective view of a physical environment that corresponds to a current viewpoint of a user in the physical environment.
  • the method includes, while displaying the first user interface, detecting movement of the one or more cameras in the physical environment, including detecting first movement that changes the current viewpoint of the user from a first viewpoint in the physical environment to a second viewpoint in the physical environment.
  • the method further includes, in response to detecting the movement of the one or more cameras in the physical environment that includes the first movement that changes the current viewpoint of the user from the first viewpoint in the physical environment to the second viewpoint in the physical environment, in accordance with a determination that there is a respective portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned, displaying, in the first user interface, a first visual indication overlaying the representation of the field of view of the one or more cameras, wherein the first visual indication indicates a location of the respective portion of the physical environment in the field of view of the one or more cameras, while the respective portion of the physical environment is not visible in the representation of the field of view of the one or more cameras.
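A simplified way to think about the guidance described above is sketched below: scan coverage is reduced to azimuth intervals around the user, a gap between two scanned intervals is located, and a directional hint is produced when that gap is outside the current field of view. The angular model, thresholds, and names are assumptions made for illustration only.

```swift
import Foundation

/// Minimal sketch, under simplifying assumptions, of gap detection between
/// scanned portions. Purely illustrative.
struct ScanCoverage {
    /// Scanned azimuth intervals in degrees, e.g. [0...80, 130...200] leaves a gap at 80...130.
    var scannedIntervals: [ClosedRange<Double>]

    /// Returns the azimuth of a missed portion between two scanned portions, if any.
    func gapBetweenScannedPortions() -> Double? {
        let sorted = scannedIntervals.sorted { $0.lowerBound < $1.lowerBound }
        for (a, b) in zip(sorted, sorted.dropFirst()) where a.upperBound < b.lowerBound {
            return (a.upperBound + b.lowerBound) / 2   // center of the unscanned gap
        }
        return nil
    }
}

enum GuidanceHint { case none, turnLeft, turnRight }

/// Decide whether to show an off-screen indication and in which direction,
/// given the current camera azimuth and horizontal field of view (degrees).
func guidanceHint(coverage: ScanCoverage,
                  cameraAzimuth: Double,
                  horizontalFOV: Double = 60) -> GuidanceHint {
    guard let gapAzimuth = coverage.gapBetweenScannedPortions() else { return .none }
    // Signed angular offset of the gap relative to where the camera points.
    var offset = (gapAzimuth - cameraAzimuth).truncatingRemainder(dividingBy: 360)
    if offset > 180 { offset -= 360 }
    if offset < -180 { offset += 360 }
    // If the gap is already visible in the representation of the field of view,
    // no off-screen indication is needed in this sketch.
    if abs(offset) <= horizontalFOV / 2 { return .none }
    return offset < 0 ? .turnLeft : .turnRight
}
```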
  • a method is performed at a computer system that is in communication with a display generation component, one or more input devices, and one or more cameras.
  • the method includes, during a scan of a physical environment to obtain depth information of at least a portion of the physical environment, displaying, via the display generation component, a first user interface, wherein the first user interface includes a representation of a field of view of one or more cameras.
  • the method includes displaying a plurality of graphical objects overlaying the representation of the field of view of the one or more cameras, including displaying at least a first graphical object at a first location that represents one or more estimated spatial properties of a first physical feature that has been detected in a respective portion of the physical environment in the field of view of the one or more cameras, and a second graphical object at a second location that represents one or more estimated spatial properties of a second physical feature that has been detected in the respective portion of the physical environment in the field of view of the one or more cameras.
  • the method includes, while displaying the plurality of graphical objects overlaying the representation of the field of view of the one or more cameras, changing one or more visual properties of the first graphical object in accordance with variations in a respective predicted accuracy of the estimated spatial properties of the first physical feature, and changing the one or more visual properties of the second graphical object in accordance with variations in a respective predicted accuracy of the estimated spatial properties of the second physical feature.
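One plausible mapping from predicted accuracy to visual properties is sketched below: as the confidence in an estimated spatial property rises, the corresponding overlay becomes more opaque and its stroke heavier. The specific mapping and names are assumptions, not the behavior claimed here.

```swift
import CoreGraphics

/// Sketch of one way to vary a graphical object's visual properties with the
/// predicted accuracy of the underlying spatial estimate.
struct FeatureOverlayStyle {
    var opacity: CGFloat
    var strokeWidth: CGFloat
}

func overlayStyle(forPredictedAccuracy accuracy: Double) -> FeatureOverlayStyle {
    // Clamp to [0, 1] so out-of-range estimates still produce a valid style.
    let confidence = CGFloat(min(max(accuracy, 0), 1))
    return FeatureOverlayStyle(
        opacity: 0.3 + 0.7 * confidence,      // low accuracy -> faint, high -> solid
        strokeWidth: 1 + 3 * confidence       // low accuracy -> thin, high -> bold
    )
}

// Example: as the scan refines the estimate for one detected wall edge, the
// overlay for that edge is restyled independently of other features.
let roughEstimate = overlayStyle(forPredictedAccuracy: 0.2)   // faint, thin
let refinedEstimate = overlayStyle(forPredictedAccuracy: 0.9) // nearly opaque, bold
```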
  • a computer system includes (and/or is in communication with) a display generation component (also called a display device, e.g., a display, a projector, a head-mounted display, a heads-up display, or the like), one or more cameras (e.g., video cameras that continuously, or repeatedly at regular intervals, provide a live preview of at least a portion of the contents that are within the field of view of the cameras and optionally generate video outputs including one or more streams of image frames capturing the contents within the field of view of the cameras), one or more input devices (e.g., a touch-sensitive surface, such as a touch-sensitive remote control, or a touchscreen display that also serves as the display generation component, a mouse, a joystick, a wand controller, and/or one or more cameras tracking the position of one or more features of the user such as the user’s hands), optionally one or more depth sensors, optionally one or more pose sensors, optionally one or more sensors to detect intensities of contacts with the touch-sensitive surface, optionally one or more tactile output generators, one or more processors, and memory storing one or more programs; the one or more programs are configured to be executed by the one or more processors and include instructions for performing or causing performance of the operations of any of the methods described herein.
  • a computer readable storage medium has stored therein instructions that, when executed by a computer system that includes (and/or is in communication with) a display generation component, one or more cameras, one or more input devices, optionally one or more pose sensors, optionally one or more sensors to detect intensities of contacts with the touch-sensitive surface, and optionally one or more tactile output generators, cause the computer system to perform or cause performance of the operations of any of the methods described herein.
  • a graphical user interface on a computer system that includes (and/or is in communication with) a display generation component, one or more cameras, one or more input devices, optionally one or more pose sensors, optionally one or more sensors to detect intensities of contacts with the touch-sensitive surface, optionally one or more tactile output generators, a memory, and one or more processors to execute one or more programs stored in the memory includes one or more of the elements displayed in any of the methods described herein, which are updated in response to inputs, in accordance with any of the methods described herein.
  • a computer system includes (and/or is in communication with) a display generation component, one or more cameras, one or more input devices, optionally one or more pose sensors, optionally one or more sensors to detect intensities of contacts with the touch-sensitive surface, optionally one or more tactile output generators, and means for performing or causing performance of the operations of any of the methods described herein.
  • an information processing apparatus for use in a computer system that includes (and/or is in communication with) a display generation component, one or more cameras, one or more input devices, optionally one or more pose sensors, optionally one or more sensors to detect intensities of contacts with the touch-sensitive surface, and optionally one or more tactile output generators, includes means for performing or causing performance of the operations of any of the methods described herein.
  • computer systems that have (and/or are in communication with) a display generation component, one or more cameras, one or more input devices, optionally one or more pose sensors, optionally one or more sensors to detect intensities of contacts with the touch-sensitive surface, and optionally one or more tactile output generators, are provided with improved methods and interfaces for annotating, measuring, and modeling environments, such as physical environments, and/or objects therein using augmented and/or virtual reality environments, thereby increasing the effectiveness, efficiency, and user satisfaction with such computer systems.
  • Such methods and interfaces may complement or replace conventional methods for annotating, measuring, and modeling environments, such as physical environments, and/or objects therein using augmented and/or virtual reality environments.
  • Figure 1A is a block diagram illustrating a portable multifunction device with a touch-sensitive display in accordance with some embodiments.
  • Figure 1B is a block diagram illustrating example components for event handling in accordance with some embodiments.
  • Figure 2A illustrates a portable multifunction device having a touch screen in accordance with some embodiments.
  • Figure 2B illustrates a portable multifunction device having optical sensors and a time-of-flight sensor in accordance with some embodiments.
  • Figure 3A is a block diagram of an example multifunction device with a display and a touch-sensitive surface in accordance with some embodiments.
  • Figures 3B-3C are block diagrams of example computer systems in accordance with some embodiments.
  • Figure 4A illustrates an example user interface for presenting a menu of applications on a portable multifunction device in accordance with some embodiments.
  • Figure 4B illustrates an example user interface for a multifunction device with a touch-sensitive surface that is separate from the display in accordance with some embodiments.
  • Figures 5A-5AD illustrate example user interfaces for scanning and modeling an environment and interacting with a generated schematic representation thereof in accordance with some embodiments.
  • Figures 6A-6F are flow diagrams of a method of displaying a preview of a three-dimensional model of an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Figures 7A-7D are flow diagrams of a method of displaying representations of objects identified in an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Figures 8A-8D are flow diagrams of a method of providing guidance indicating the location of a missed portion of a presumably completed portion of an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Figures 9A-9E are flow diagrams of a method of displaying scan progress indication during scanning and modeling of an environment, in accordance with some embodiments.
  • augmented reality environments are useful for facilitating scanning and modeling physical environments and objects therein, by providing different views of the physical environments and objects therein and guiding the user to move through the physical environments to capture the data necessary to generate the models of the physical environments.
  • Conventional methods of scanning and modeling using augmented and/or virtual reality environments are often limited in functionality.
  • conventional methods of scanning and modeling physical environments using augmented reality do not provide a preview of a three-dimensional model that is generated based on the scan until the scan is fully completed.
  • conventional methods of scanning and modeling physical environments using augmented reality display a three-dimensional representation of the physical environment during the scan of the physical environment, but do not allow the user to manipulate or view the three-dimensional representation from a different angle during the scan of the physical environment.
  • conventional methods of scanning and modeling physical environments do not scan and model structural and nonstructural elements of the physical environment simultaneously during the same scan and do not display annotations based on recognition of the structural elements and nonstructural elements in the augmented reality environment and the preview of the three-dimensional model of the physical environment.
  • the embodiments disclosed herein provide an intuitive way for a user to scan and model an environment using augmented and/or virtual reality environments (e.g., by providing more intelligent and sophisticated functionality, by enabling the user to perform different operations in the augmented reality environment with fewer inputs, and/or by simplifying the user interface). Additionally, the embodiments herein provide improved feedback that provide additional information to the user about the physical objects being scanned or modeled and about the operations being performed in the virtual/augmented reality environment.
  • the systems, methods, and GUIs described herein improve user interface interactions with augmented and/or virtual reality environments in multiple ways. For example, they make it easier to scan and model a physical environment by automatically detecting features in the physical space and annotating different types of detected features, by providing improved guidance, ... and by providing the user with improved feedback about the progress of the modeling process while modeling an environment.
  • Figures 1A-1B, 2A-2B, and 3A-3C provide a description of example devices.
  • Figures 4A-4B and 5A-5AD illustrate example user interfaces for interacting with, annotating, scanning, and modeling environments, such as augmented reality environments.
  • Figures 6A-6F are flow diagrams of a method of displaying a preview of a three-dimensional model of an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Figures 7A-7D are flow diagrams of a method of displaying representations of objects identified in an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Figures 8A-8D are flow diagrams of a method of providing guidance indicating the location of a missed portion of a presumably completed portion of an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Figures 9A-9E are flow diagrams of a method of displaying scan progress indication during scanning and modeling of an environment, in accordance with some embodiments.
  • the user interfaces in Figures 5A-5AD are used to illustrate the processes in Figures 6A-6F, 7A-7D, 8A-8D, and 9A-9E.
  • first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the various described embodiments. The first contact and the second contact are both contacts, but they are not the same contact, unless the context clearly indicates otherwise.
  • the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context.
  • the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.
  • Computer systems for augmented and/or virtual reality include electronic devices that produce augmented and/or virtual reality environments. Embodiments of electronic devices, user interfaces for such devices, and associated processes for using such devices are described.
  • the device is a portable communications device, such as a mobile telephone, that also contains other functions, such as PDA and/or music player functions.
  • portable multifunction devices include, without limitation, the iPhone®, iPod Touch®, and iPad® devices from Apple Inc. of Cupertino, California.
  • Other portable electronic devices, such as laptops or tablet computers with touch-sensitive surfaces (e.g., touch-screen displays and/or touchpads), are, optionally, used.
  • the device is not a portable communications device, but is a desktop computer with a touch-sensitive surface (e.g., a touch-screen display and/or a touchpad) that also includes, or is in communication with, one or more cameras.
  • a computer system that includes an electronic device that has (and/or is in communication with) a display and a touch-sensitive surface is described. It should be understood, however, that the computer system optionally includes one or more other physical user-interface devices, such as a physical keyboard, a mouse, a joystick, a wand controller, and/or cameras tracking the position of one or more features of the user such as the user’s hands.
  • the device typically supports a variety of applications, such as one or more of the following: a gaming application, a note taking application, a drawing application, a presentation application, a word processing application, a spreadsheet application, a telephone application, a video conferencing application, an e-mail application, an instant messaging application, a workout support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, and/or a digital video player application.
  • the various applications that are executed on the device optionally use at least one common physical user-interface device, such as the touch-sensitive surface.
  • One or more functions of the touch-sensitive surface as well as corresponding information displayed by the device are, optionally, adjusted and/or varied from one application to the next and/or within a respective application.
  • a common physical architecture (such as the touch- sensitive surface) of the device optionally supports the variety of applications with user interfaces that are intuitive and transparent to the user.
  • Figure 1A is a block diagram illustrating portable multifunction device 100 with touch-sensitive display system 112 in accordance with some embodiments.
  • Touch-sensitive display system 112 is sometimes called a “touch screen” for convenience, and is sometimes simply called a touch-sensitive display.
  • Device 100 includes memory 102 (which optionally includes one or more computer readable storage mediums), memory controller 122, one or more processing units (CPUs) 120, peripherals interface 118, RF circuitry 108, audio circuitry 110, speaker 111, microphone 113, input/output (I/O) subsystem 106, other input or control devices 116, and external port 124.
  • Device 100 optionally includes one or more optical sensors 164 (e.g., as part of one or more cameras).
  • Device 100 optionally includes one or more intensity sensors 165 for detecting intensities of contacts on device 100 (e.g., a touch-sensitive surface such as touch-sensitive display system 112 of device 100).
  • Device 100 optionally includes one or more tactile output generators 163 for generating tactile outputs on device 100 (e.g., generating tactile outputs on a touch-sensitive surface such as touch-sensitive display system 112 of device 100 or touchpad 355 of device 300). These components optionally communicate over one or more communication buses or signal lines 103.
  • the term “tactile output” refers to physical displacement of a device relative to a previous position of the device, physical displacement of a component (e.g., a touch-sensitive surface) of a device relative to another component (e.g., housing) of the device, or displacement of the component relative to a center of mass of the device that will be detected by a user with the user’s sense of touch.
  • the tactile output generated by the physical displacement will be interpreted by the user as a tactile sensation corresponding to a perceived change in physical characteristics of the device or the component of the device.
  • For example, movement of a touch-sensitive surface (e.g., a touch-sensitive display or trackpad) is, optionally, interpreted by the user as a “down click” or “up click” of a physical actuator button.
  • a user will feel a tactile sensation such as a “down click” or “up click” even when there is no movement of a physical actuator button associated with the touch-sensitive surface that is physically pressed (e.g., displaced) by the user’s movements.
  • movement of the touch-sensitive surface is, optionally, interpreted or sensed by the user as “roughness” of the touch-sensitive surface, even when there is no change in smoothness of the touch-sensitive surface. While such interpretations of touch by a user will be subject to the individualized sensory perceptions of the user, there are many sensory perceptions of touch that are common to a large majority of users.
  • a tactile output when a tactile output is described as corresponding to a particular sensory perception of a user (e.g., an “up click,” a “down click,” “roughness”), unless otherwise stated, the generated tactile output corresponds to physical displacement of the device or a component thereof that will generate the described sensory perception for a typical (or average) user.
  • Using tactile outputs to provide haptic feedback to a user enhances the operability of the device and makes the user-device interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the device) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently.
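For context, on iOS devices tactile outputs of the kind described above (simulated clicks and selection ticks) are commonly produced with UIKit's feedback generators. The snippet below shows ordinary UIFeedbackGenerator usage and is not the particular tactile output mechanism of device 100.

```swift
import UIKit

/// Illustrative only: standard system haptics for simulated "down click"/"up click"
/// sensations and selection changes.
final class ClickFeedback {
    private let impact = UIImpactFeedbackGenerator(style: .medium)
    private let selection = UISelectionFeedbackGenerator()

    func prepareForInteraction() {
        // Reduces latency between the user's input and the tactile output.
        impact.prepare()
        selection.prepare()
    }

    func simulatedButtonClick() {
        impact.impactOccurred()       // feels like pressing a physical button
    }

    func selectionChanged() {
        selection.selectionChanged()  // subtle tick as a value or item changes
    }
}
```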
  • device 100 is only one example of a portable multifunction device, and device 100 optionally has more or fewer components than shown, optionally combines two or more components, or optionally has a different configuration or arrangement of the components.
  • the various components shown in Figure 1A are implemented in hardware, software, firmware, or a combination thereof, including one or more signal processing and/or application specific integrated circuits.
  • Memory 102 optionally includes high-speed random access memory and optionally also includes non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to memory 102 by other components of device 100, such as CPU(s) 120 and the peripherals interface 118, is, optionally, controlled by memory controller 122.
  • Peripherals interface 118 can be used to couple input and output peripherals of the device to CPU(s) 120 and memory 102.
  • the one or more processors 120 run or execute various software programs and/or sets of instructions stored in memory 102 to perform various functions for device 100 and to process data.
  • peripherals interface 118, CPU(s) 120, and memory controller 122 are, optionally, implemented on a single chip, such as chip 104. In some other embodiments, they are, optionally, implemented on separate chips.
  • RF (radio frequency) circuitry 108 receives and sends RF signals, also called electromagnetic signals. RF circuitry 108 converts electrical signals to/from electromagnetic signals and communicates with communications networks and other communications devices via the electromagnetic signals.
  • RF circuitry 108 optionally includes well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a subscriber identity module (SIM) card, memory, and so forth.
  • RF circuitry 108 optionally communicates with networks, such as the Internet, also referred to as the World Wide Web (WWW), an intranet and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN), and other devices by wireless communication.
  • the wireless communication optionally uses any of a plurality of communications standards, protocols and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), high-speed downlink packet access (HSDPA), high-speed uplink packet access (HSUPA), Evolution, Data-Only (EV-DO), HSPA, HSPA+, Dual-Cell HSPA (DC-HSPA), long term evolution (LTE), near field communication (NFC), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11ac, IEEE 802.11ax, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), voice over Internet Protocol (VoIP), Wi-MAX, a protocol for e-mail (e.g., Internet message access protocol (IMAP) and/or post office protocol (POP)), instant messaging (e.g., extensible messaging and presence protocol (XMPP), Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE), Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.
  • Audio circuitry 110, speaker 111, and microphone 113 provide an audio interface between a user and device 100.
  • Audio circuitry 110 receives audio data from peripherals interface 118, converts the audio data to an electrical signal, and transmits the electrical signal to speaker 111.
  • Speaker 111 converts the electrical signal to human-audible sound waves.
  • Audio circuitry 110 also receives electrical signals converted by microphone 113 from sound waves.
  • Audio circuitry 110 converts the electrical signal to audio data and transmits the audio data to peripherals interface 118 for processing. Audio data is, optionally, retrieved from and/or transmitted to memory 102 and/or RF circuitry 108 by peripherals interface 118.
  • audio circuitry 110 also includes a headset jack (e.g., 212, Figure 2A). The headset jack provides an interface between audio circuitry 110 and removable audio input/output peripherals, such as output-only headphones or a headset with both output (e.g., a headphone for one or both ears) and input (e.g., a microphone).
  • I/O subsystem 106 couples input/output peripherals on device 100, such as touch-sensitive display system 112 and other input or control devices 116, with peripherals interface 118.
  • I/O subsystem 106 optionally includes display controller 156, optical sensor controller 158, intensity sensor controller 159, haptic feedback controller 161, and one or more input controllers 160 for other input or control devices.
  • the one or more input controllers 160 receive/send electrical signals from/to other input or control devices 116.
  • the other input or control devices 116 optionally include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slider switches, joysticks, click wheels, and so forth.
  • input controller(s) 160 are, optionally, coupled with any (or none) of the following: a keyboard, infrared port, USB port, stylus, and/or a pointer device such as a mouse.
  • the one or more buttons optionally include an up/down button for volume control of speaker 111 and/or microphone 113.
  • the one or more buttons optionally include a push button (e.g., 206, Figure 2A).
  • Touch-sensitive display system 112 provides an input interface and an output interface between the device and a user.
  • Display controller 156 receives and/or sends electrical signals from/to touch-sensitive display system 112.
  • Touch-sensitive display system 112 displays visual output to the user.
  • the visual output optionally includes graphics, text, icons, video, and any combination thereof (collectively termed “graphics”).
  • some or all of the visual output corresponds to user interface objects.
  • the term “affordance” refers to a user-interactive graphical user interface object (e.g., a graphical user interface object that is configured to respond to inputs directed toward the graphical user interface object). Examples of user-interactive graphical user interface objects include, without limitation, a button, slider, icon, selectable menu item, switch, hyperlink, or other user interface control.
  • Touch-sensitive display system 112 has a touch-sensitive surface, sensor or set of sensors that accepts input from the user based on haptic and/or tactile contact.
  • Touch-sensitive display system 112 and display controller 156 (along with any associated modules and/or sets of instructions in memory 102) detect contact (and any movement or breaking of the contact) on touch-sensitive display system 112 and converts the detected contact into interaction with user-interface objects (e.g., one or more soft keys, icons, web pages or images) that are displayed on touch-sensitive display system 112.
  • a point of contact between touch-sensitive display system 112 and the user corresponds to a finger of the user or a stylus.
  • Touch-sensitive display system 112 optionally uses LCD (liquid crystal display) technology, LPD (light emitting polymer display) technology, or LED (light emitting diode) technology, although other display technologies are used in other embodiments.
  • Touch-sensitive display system 112 and display controller 156 optionally detect contact and any movement or breaking thereof using any of a plurality of touch sensing technologies now known or later developed, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch-sensitive display system 112.
  • projected mutual capacitance sensing technology is used, such as that found in the iPhone®, iPod Touch®, and iPad® from Apple Inc. of Cupertino, California.
  • Touch-sensitive display system 112 optionally has a video resolution in excess of 100 dpi. In some embodiments, the touch screen video resolution is in excess of 400 dpi (e.g., 500 dpi, 800 dpi, or greater).
  • the user optionally makes contact with touch-sensitive display system 112 using any suitable object or appendage, such as a stylus, a finger, and so forth.
  • the user interface is designed to work with finger-based contacts and gestures, which can be less precise than stylus-based input due to the larger area of contact of a finger on the touch screen.
  • the device translates the rough finger-based input into a precise pointer/cursor position or command for performing the actions desired by the user.
  • in addition to the touch screen, device 100 optionally includes a touchpad for activating or deactivating particular functions.
  • the touchpad is a touch-sensitive area of the device that, unlike the touch screen, does not display visual output.
  • the touchpad is, optionally, a touch-sensitive surface that is separate from touch-sensitive display system 112 or an extension of the touch-sensitive surface formed by the touch screen.
  • Device 100 also includes power system 162 for powering the various components.
  • Power system 162 optionally includes a power management system, one or more power sources (e.g., battery, alternating current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator (e.g., a light-emitting diode (LED)), and any other components associated with the generation, management and distribution of power in portable devices.
  • Device 100 optionally also includes one or more optical sensors 164 (e.g., as part of one or more cameras).
  • Figure 1A shows an optical sensor coupled with optical sensor controller 158 in I/O subsystem 106.
  • Optical sensor(s) 164 optionally include charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) phototransistors.
  • Optical sensor(s) 164 receive light from the environment, projected through one or more lenses, and convert the light to data representing an image.
  • In conjunction with imaging module 143 (also called a camera module), optical sensor(s) 164 optionally capture still images and/or video.
  • an optical sensor is located on the back of device 100, opposite touch-sensitive display system 112 on the front of the device, so that the touch screen is enabled for use as a viewfinder for still and/or video image acquisition.
  • another optical sensor is located on the front of the device so that the user's image is obtained (e.g., for selfies, for videoconferencing while the user views the other video conference participants on the touch screen, etc.).
  • Device 100 optionally also includes one or more contact intensity sensors 165.
  • Figure 1A shows a contact intensity sensor coupled with intensity sensor controller 159 in I/O subsystem 106.
  • Contact intensity sensor(s) 165 optionally include one or more piezoresistive strain gauges, capacitive force sensors, electric force sensors, piezoelectric force sensors, optical force sensors, capacitive touch-sensitive surfaces, or other intensity sensors (e.g., sensors used to measure the force (or pressure) of a contact on a touch-sensitive surface).
  • Contact intensity sensor(s) 165 receive contact intensity information (e.g., pressure information or a proxy for pressure information) from the environment.
  • At least one contact intensity sensor is collocated with, or proximate to, a touch-sensitive surface (e.g., touch-sensitive display system 112). In some embodiments, at least one contact intensity sensor is located on the back of device 100, opposite touch-screen display system 112, which is located on the front of device 100.
  • Device 100 optionally also includes one or more proximity sensors 166.
  • Figure 1A shows proximity sensor 166 coupled with peripherals interface 118. Alternately, proximity sensor 166 is coupled with input controller 160 in I/O subsystem 106. In some embodiments, the proximity sensor turns off and disables touch-sensitive display system 112 when the multifunction device is placed near the user's ear (e.g., when the user is making a phone call).
  • Device 100 optionally also includes one or more tactile output generators 163.
  • Figure 1A shows a tactile output generator coupled with haptic feedback controller 161 in I/O subsystem 106.
  • tactile output generator(s) 163 include one or more electroacoustic devices such as speakers or other audio components and/or electromechanical devices that convert energy into linear motion such as a motor, solenoid, electroactive polymer, piezoelectric actuator, electrostatic actuator, or other tactile output generating component (e.g., a component that converts electrical signals into tactile outputs on the device).
  • Tactile output generator(s) 163 receive tactile feedback generation instructions from haptic feedback module 133 and generate tactile outputs on device 100 that are capable of being sensed by a user of device 100.
  • At least one tactile output generator is collocated with, or proximate to, a touch-sensitive surface (e.g., touch-sensitive display system 112) and, optionally, generates a tactile output by moving the touch-sensitive surface vertically (e.g., in/out of a surface of device 100) or laterally (e.g., back and forth in the same plane as a surface of device 100).
  • at least one tactile output generator sensor is located on the back of device 100, opposite touch-sensitive display system 112, which is located on the front of device 100.
  • Device 100 optionally also includes one or more accelerometers 167, gyroscopes 168, and/or magnetometers 169 (e.g., as part of an inertial measurement unit (IMU)) for obtaining information concerning the pose (e.g., position and orientation or attitude) of the device.
  • Figure 1A shows sensors 167, 168, and 169 coupled with peripherals interface 118.
  • sensors 167, 168, and 169 are, optionally, coupled with an input controller 160 in I/O subsystem 106.
  • information is displayed on the touch-screen display in a portrait view or a landscape view based on an analysis of data received from the one or more accelerometers.
  • Device 100 optionally includes a GPS (or GLONASS or other global navigation system) receiver for obtaining information concerning the location of device 100.
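For reference, device attitude of the kind provided by accelerometers 167, gyroscopes 168, and magnetometers 169 is exposed on iOS through Core Motion. The snippet below shows standard CMMotionManager usage; it is illustrative and not specific to this application.

```swift
import CoreMotion

/// Illustrative Core Motion usage for obtaining device attitude from the IMU.
final class DevicePoseMonitor {
    private let motionManager = CMMotionManager()

    func start() {
        guard motionManager.isDeviceMotionAvailable else { return }
        motionManager.deviceMotionUpdateInterval = 1.0 / 60.0
        motionManager.startDeviceMotionUpdates(to: .main) { motion, _ in
            guard let attitude = motion?.attitude else { return }
            // Roll/pitch/yaw (radians) describe the device's orientation; an app
            // could use these, for example, to choose portrait vs. landscape layout.
            print("roll: \(attitude.roll), pitch: \(attitude.pitch), yaw: \(attitude.yaw)")
        }
    }

    func stop() {
        motionManager.stopDeviceMotionUpdates()
    }
}
```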
  • the software components stored in memory 102 include operating system 126, communication module (or set of instructions) 128, contact/motion module (or set of instructions) 130, graphics module (or set of instructions) 132, haptic feedback module (or set of instructions) 133, text input module (or set of instructions) 134, Global Positioning System (GPS) module (or set of instructions) 135, and applications (or sets of instructions) 136.
  • memory 102 stores device/global internal state 157, as shown in Figures 1A and 3.
  • Device/global internal state 157 includes one or more of: active application state, indicating which applications, if any, are currently active; display state, indicating what applications, views or other information occupy various regions of touch-sensitive display system 112; sensor state, including information obtained from the device’s various sensors and other input or control devices 116; and location and/or positional information concerning the device’s pose (e.g., location and/or attitude).
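The sketch below is one hypothetical way to model the device/global internal state just described as a simple record; the field names and types are placeholders chosen for illustration.

```swift
import CoreGraphics
import Foundation

/// Sketch of the kind of record described above for device/global internal
/// state 157. Field names and types are illustrative placeholders.
struct DeviceGlobalInternalState {
    /// Which applications, if any, are currently active.
    var activeApplications: [String]
    /// What occupies various regions of the touch-sensitive display.
    var displayState: [String: CGRect]
    /// Latest readings from the device's sensors and other input or control devices.
    var sensorState: [String: Double]
    /// Location and/or attitude of the device.
    var pose: (latitude: Double, longitude: Double, yaw: Double, pitch: Double, roll: Double)?
}
```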
  • Operating system 126 (e.g., iOS, Android, Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components.
  • Communication module 128 facilitates communication with other devices over one or more external ports 124 and also includes various software components for handling data received by RF circuitry 108 and/or external port 124.
  • External port 124 (e.g., Universal Serial Bus (USB), FIREWIRE, etc.) is adapted for coupling directly to other devices or indirectly over a network (e.g., the Internet, wireless LAN, etc.).
  • the external port is a multi-pin (e.g., 30-pin) connector that is the same as, or similar to and/or compatible with the 30-pin connector used in some iPhone®, iPod Touch®, and iPad® devices from Apple Inc. of Cupertino, California.
  • the external port is a Lightning connector that is the same as, or similar to and/or compatible with the Lightning connector used in some iPhone®, iPod Touch®, and iPad® devices from Apple Inc. of Cupertino, California.
  • the external port is a USB Type-C connector that is the same as, or similar to and/or compatible with the USB Type-C connector used in some electronic devices from Apple Inc. of Cupertino, California.
  • Contact/motion module 130 optionally detects contact with touch-sensitive display system 112 (in conjunction with display controller 156) and other touch-sensitive devices (e.g., a touchpad or physical click wheel).
  • Contact/motion module 130 includes various software components for performing various operations related to detection of contact (e.g., by a finger or by a stylus), such as determining if contact has occurred (e.g., detecting a finger-down event), determining an intensity of the contact (e.g., the force or pressure of the contact or a substitute for the force or pressure of the contact), determining if there is movement of the contact and tracking the movement across the touch-sensitive surface (e.g., detecting one or more finger-dragging events), and determining if the contact has ceased (e.g., detecting a finger-up event or a break in contact).
  • Contact/motion module 130 receives contact data from the touch-sensitive surface.
  • Determining movement of the point of contact, which is represented by a series of contact data, optionally includes determining speed (magnitude), velocity (magnitude and direction), and/or an acceleration (a change in magnitude and/or direction) of the point of contact. These operations are, optionally, applied to single contacts (e.g., one finger contacts or stylus contacts) or to multiple simultaneous contacts (e.g., “multitouch”/multiple finger contacts).
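The speed, velocity, and acceleration computations mentioned above can be illustrated with simple finite differences over timestamped contact samples, as in the sketch below. The types are illustrative, not the contact/motion module's actual implementation.

```swift
import CoreGraphics
import Foundation

/// Sketch: estimating speed, velocity, and acceleration of a point of contact
/// from a series of timestamped contact samples using finite differences.
struct ContactSample {
    let position: CGPoint
    let timestamp: TimeInterval
}

func velocity(from a: ContactSample, to b: ContactSample) -> CGVector {
    let dt = b.timestamp - a.timestamp
    guard dt > 0 else { return .zero }
    return CGVector(dx: (b.position.x - a.position.x) / CGFloat(dt),
                    dy: (b.position.y - a.position.y) / CGFloat(dt))
}

func speed(of v: CGVector) -> CGFloat {
    (v.dx * v.dx + v.dy * v.dy).squareRoot()   // magnitude of the velocity
}

/// Acceleration as the change in velocity across three consecutive samples.
func acceleration(_ s0: ContactSample, _ s1: ContactSample, _ s2: ContactSample) -> CGVector {
    let v0 = velocity(from: s0, to: s1)
    let v1 = velocity(from: s1, to: s2)
    let dt = s2.timestamp - s0.timestamp
    guard dt > 0 else { return .zero }
    // Velocities are centered on the two interval midpoints, half the total span apart.
    return CGVector(dx: (v1.dx - v0.dx) / CGFloat(dt * 0.5),
                    dy: (v1.dy - v0.dy) / CGFloat(dt * 0.5))
}
```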
  • contact/motion module 130 and display controller 156 detect contact on a touchpad.
  • Contact/motion module 130 optionally detects a gesture input by a user. Different gestures on the touch-sensitive surface have different contact patterns (e.g., different motions, timings, and/or intensities of detected contacts). Thus, a gesture is, optionally, detected by detecting a particular contact pattern. For example, detecting a finger tap gesture includes detecting a finger-down event followed by detecting a finger-up (lift off) event at the same position (or substantially the same position) as the finger-down event (e.g., at the position of an icon).
  • detecting a finger swipe gesture on the touch-sensitive surface includes detecting a finger-down event followed by detecting one or more finger-dragging events, and subsequently followed by detecting a finger-up (lift off) event.
  • tap, swipe, drag, and other gestures are optionally detected for a stylus by detecting a particular contact pattern for the stylus.
  • detecting a finger tap gesture depends on the length of time between detecting the finger-down event and the finger-up event, but is independent of the intensity of the finger contact between detecting the finger-down event and the finger-up event.
  • a tap gesture is detected in accordance with a determination that the length of time between the finger-down event and the finger-up event is less than a predetermined value (e.g., less than 0.1, 0.2, 0.3, 0.4 or 0.5 seconds), independent of whether the intensity of the finger contact during the tap meets a given intensity threshold (greater than a nominal contact-detection intensity threshold), such as a light press or deep press intensity threshold.
  • a finger tap gesture can satisfy particular input criteria that do not require that the characteristic intensity of a contact satisfy a given intensity threshold in order for the particular input criteria to be met.
  • the finger contact in a tap gesture typically needs to satisfy a nominal contact-detection intensity threshold, below which the contact is not detected, in order for the finger-down event to be detected.
  • a similar analysis applies to detecting a tap gesture by a stylus or other contact.
  • the nominal contact-detection intensity threshold optionally does not correspond to physical contact between the finger or stylus and the touch sensitive surface.
  • a swipe gesture, a pinch gesture, a depinch gesture, and/or a long press gesture are optionally detected based on the satisfaction of criteria that are either independent of intensities of contacts included in the gesture, or do not require that contact(s) that perform the gesture reach intensity thresholds in order to be recognized.
  • a swipe gesture is detected based on an amount of movement of one or more contacts;
  • a pinch gesture is detected based on movement of two or more contacts towards each other;
  • a depinch gesture is detected based on movement of two or more contacts away from each other;
  • a long press gesture is detected based on a duration of the contact on the touch-sensitive surface with less than a threshold amount of movement.
  • the statement that particular gesture recognition criteria do not require that the intensity of the contact(s) meet a respective intensity threshold in order for the particular gesture recognition criteria to be met means that the particular gesture recognition criteria are capable of being satisfied if the contact(s) in the gesture do not reach the respective intensity threshold, and are also capable of being satisfied in circumstances where one or more of the contacts in the gesture do reach or exceed the respective intensity threshold.
  • a tap gesture is detected based on a determination that the finger-down and finger-up event are detected within a predefined time period, without regard to whether the contact is above or below the respective intensity threshold during the predefined time period, and a swipe gesture is detected based on a determination that the contact movement is greater than a predefined magnitude, even if the contact is above the respective intensity threshold at the end of the contact movement.
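As an illustrative sketch of the intensity-independent criteria above, the following Swift snippet classifies a single completed contact using only its duration and total movement; the threshold names and values are hypothetical, not the values used by any particular device.

```swift
import Foundation

enum RecognizedGesture { case tap, swipe, longPress, none }

/// Hypothetical thresholds; actual devices may use different values.
let maxTapDuration: TimeInterval = 0.3      // within the 0.1–0.5 s range discussed above
let minSwipeMovement: Double = 10.0         // points
let minLongPressDuration: TimeInterval = 0.5

/// Classifies a completed contact from its duration and total movement,
/// without consulting the contact's intensity at all.
func classify(duration: TimeInterval, movement: Double) -> RecognizedGesture {
    if movement >= minSwipeMovement {
        return .swipe          // movement-based, regardless of intensity
    }
    if duration < maxTapDuration {
        return .tap            // finger-down followed quickly by finger-up
    }
    if duration >= minLongPressDuration {
        return .longPress      // little movement, held long enough
    }
    return .none
}
```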
  • detection of a gesture is influenced by the intensity of contacts performing the gesture (e.g., the device detects a long press more quickly when the intensity of the contact is above an intensity threshold or delays detection of a tap input when the intensity of the contact is higher), the detection of those gestures does not require that the contacts reach a particular intensity threshold so long as the criteria for recognizing the gesture can be met in circumstances where the contact does not reach the particular intensity threshold (e.g., even if the amount of time that it takes to recognize the gesture changes).
  • Contact intensity thresholds, duration thresholds, and movement thresholds are, in some circumstances, combined in a variety of different combinations in order to create heuristics for distinguishing two or more different gestures directed to the same input element or region so that multiple different interactions with the same input element are enabled to provide a richer set of user interactions and responses.
  • the statement that a particular set of gesture recognition criteria do not require that the intensity of the contact(s) meet a respective intensity threshold in order for the particular gesture recognition criteria to be met does not preclude the concurrent evaluation of other intensity-dependent gesture recognition criteria to identify other gestures that do have criteria that are met when a gesture includes a contact with an intensity above the respective intensity threshold.
  • first gesture recognition criteria for a first gesture - which do not require that the intensity of the contact(s) meet a respective intensity threshold in order for the first gesture recognition criteria to be met - are in competition with second gesture recognition criteria for a second gesture - which are dependent on the contact(s) reaching the respective intensity threshold.
  • the gesture is, optionally, not recognized as meeting the first gesture recognition criteria for the first gesture if the second gesture recognition criteria for the second gesture are met first. For example, if a contact reaches the respective intensity threshold before the contact moves by a predefined amount of movement, a deep press gesture is detected rather than a swipe gesture.
  • Conversely, if the contact moves by the predefined amount of movement before the contact reaches the respective intensity threshold, a swipe gesture is detected rather than a deep press gesture.
  • the first gesture recognition criteria for the first gesture still do not require that the intensity of the contact(s) meet a respective intensity threshold in order for the first gesture recognition criteria to be met because if the contact stayed below the respective intensity threshold until an end of the gesture (e.g., a swipe gesture with a contact that does not increase to an intensity above the respective intensity threshold), the gesture would have been recognized by the first gesture recognition criteria as a swipe gesture.
  • particular gesture recognition criteria that do not require that the intensity of the contact(s) meet a respective intensity threshold in order for the particular gesture recognition criteria to be met will (A) in some circumstances ignore the intensity of the contact with respect to the intensity threshold (e.g., for a tap gesture) and/or (B) in some circumstances still be dependent on the intensity of the contact with respect to the intensity threshold in the sense that the particular gesture recognition criteria (e.g., for a long press gesture) will fail if a competing set of intensity-dependent gesture recognition criteria (e.g., for a deep press gesture) recognize an input as corresponding to an intensity-dependent gesture before the particular gesture recognition criteria recognize a gesture corresponding to the input (e.g., for a long press gesture that is competing with a deep press gesture for recognition).
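The "whichever criteria are met first wins" competition described above can be sketched as follows; the update stream, threshold names, and values are hypothetical, and real recognizers are state machines rather than a single function.

```swift
enum CompetitionOutcome { case deepPress, swipe, undecided }

/// Hypothetical per-update snapshot of a single contact.
struct ContactUpdate {
    let totalMovement: Double   // points moved since finger-down
    let intensity: Double       // normalized contact intensity
}

let deepPressIntensityThreshold = 0.8   // hypothetical
let swipeMovementThreshold = 10.0       // hypothetical, points

/// Processes updates in arrival order; the first set of criteria satisfied wins.
/// A contact that crosses the intensity threshold before moving far enough is
/// therefore recognized as a deep press rather than a swipe, and vice versa.
func resolve(updates: [ContactUpdate]) -> CompetitionOutcome {
    for update in updates {
        if update.intensity >= deepPressIntensityThreshold { return .deepPress }
        if update.totalMovement >= swipeMovementThreshold { return .swipe }
    }
    return .undecided
}
```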
  • Pose module 131 in conjunction with accelerometers 167, gyroscopes 168, and/or magnetometers 169, optionally detects pose information concerning the device, such as the device’s pose (e.g., roll, pitch, yaw and/or position) in a particular frame of reference.
  • Pose module 131 includes software components for performing various operations related to detecting the position of the device and detecting changes to the pose of the device.
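As a rough sketch of the kind of computation pose module 131 performs, the snippet below derives roll and pitch from a single accelerometer (gravity) reading; the axis conventions are assumptions, and a full implementation would also fuse gyroscope 168 and magnetometer 169 data to recover yaw and track changes over time.

```swift
import Foundation

/// Hypothetical device pose, in radians; axis conventions are an assumption.
struct DevicePose { var roll: Double; var pitch: Double }

/// Estimates roll and pitch from one accelerometer reading taken while the
/// device is roughly static, so the measured acceleration is dominated by
/// gravity. Yaw cannot be recovered from gravity alone; a magnetometer or
/// integrated gyroscope data would be needed for that.
func poseFromGravity(ax: Double, ay: Double, az: Double) -> DevicePose {
    let roll = atan2(ay, az)
    let pitch = atan2(-ax, (ay * ay + az * az).squareRoot())
    return DevicePose(roll: roll, pitch: pitch)
}
```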
  • Graphics module 132 includes various known software components for rendering and displaying graphics on touch-sensitive display system 112 or other display, including components for changing the visual impact (e.g., brightness, transparency, saturation, contrast or other visual property) of graphics that are displayed.
  • graphics includes any object that can be displayed to a user, including without limitation text, web pages, icons (such as user-interface objects including soft keys), digital images, videos, animations and the like.
  • graphics module 132 stores data representing graphics to be used. Each graphic is, optionally, assigned a corresponding code. Graphics module 132 receives, from applications etc., one or more codes specifying graphics to be displayed along with, if necessary, coordinate data and other graphic property data, and then generates screen image data to output to display controller 156.
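The code-to-graphic lookup described above can be sketched as a registry keyed by graphic codes; the registry, request, and command types below are hypothetical stand-ins for the data graphics module 132 actually exchanges with display controller 156.

```swift
/// Hypothetical drawing command handed to the display controller.
struct DrawCommand { let graphicCode: Int; let origin: SIMD2<Double>; let opacity: Double }

/// Hypothetical registry mapping codes to stored graphic data (here, just names).
var graphicsRegistry: [Int: String] = [1: "homeIcon", 2: "cameraIcon"]

/// Builds drawing commands from (code, coordinate, property) requests,
/// skipping codes that have no registered graphic.
func buildCommands(requests: [(code: Int, origin: SIMD2<Double>, opacity: Double)]) -> [DrawCommand] {
    requests.compactMap { request -> DrawCommand? in
        guard graphicsRegistry[request.code] != nil else { return nil }
        return DrawCommand(graphicCode: request.code, origin: request.origin, opacity: request.opacity)
    }
}
```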
  • Haptic feedback module 133 includes various software components for generating instructions (e.g., instructions used by haptic feedback controller 161) to produce tactile outputs using tactile output generator(s) 163 at one or more locations on device 100 in response to user interactions with device 100.
  • Text input module 134 which is, optionally, a component of graphics module 132, provides soft keyboards for entering text in various applications (e.g., contacts 137, e-mail 140, IM 141, browser 147, and any other application that needs text input).
  • GPS module 135 determines the location of the device and provides this information for use in various applications (e.g., to telephone 138 for use in location-based dialing, to camera 143 as picture/video metadata, and to applications that provide locationbased services such as weather widgets, local yellow page widgets, and map/navigation widgets).
  • Virtual/augmented reality module 145 provides virtual and/or augmented reality logic to applications 136 that implement augmented reality, and in some embodiments virtual reality, features. Virtual/augmented reality module 145 facilitates superposition of virtual content, such as a virtual user interface object, on a representation of at least a portion of a field of view of the one or more cameras.
  • the representation of at least a portion of a field of view of the one or more cameras may include a respective physical object and the virtual user interface object may be displayed at a location, in a displayed augmented reality environment, that is determined based on the respective physical object in the field of view of the one or more cameras or a virtual reality environment that is determined based on the pose of at least a portion of a computer system (e.g., a pose of a display device that is used to display the user interface to a user of the computer system).
  • Applications 136 optionally include the following modules (or sets of instructions), or a subset or superset thereof:
  • contacts module 137 (sometimes called an address book or contact list);
  • calendar module 148;
  • widget modules 149 which optionally include one or more of: weather widget 149-1, stocks widget 149-2, calculator widget 149-3, alarm clock widget 149-4, dictionary widget 149-5, and other widgets obtained by the user, as well as user-created widgets 149-6;
  • widget creator module 150 for making user-created widgets 149-6;
  • search module 151;
  • video and music player module 152 which is, optionally, made up of a video player module and a music player module;
  • map module 154;
  • Examples of other applications 136 that are, optionally, stored in memory 102 include other word processing applications, other image editing applications, drawing applications, presentation applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.
  • contacts module 137 includes executable instructions to manage an address book or contact list (e.g., stored in application internal state 192 of contacts module 137 in memory 102 or memory 370), including: adding name(s) to the address book; deleting name(s) from the address book; associating telephone number(s), e-mail address(es), physical address(es) or other information with a name; associating an image with a name; categorizing and sorting names; providing telephone numbers and/or e-mail addresses to initiate and/or facilitate communications by telephone 138, video conference 139, e-mail 140, or IM 141; and so forth.
  • telephone module 138 includes executable instructions to enter a sequence of characters corresponding to a telephone number, access one or more telephone numbers in address book 137, modify a telephone number that has been entered, dial a respective telephone number, conduct a conversation and disconnect or hang up when the conversation is completed.
  • the wireless communication optionally uses any of a plurality of communications standards, protocols and technologies.
  • videoconferencing module 139 includes executable instructions to initiate, conduct, and terminate a video conference between a user and one or more other participants in accordance with user instructions.
  • e-mail client module 140 includes executable instructions to create, send, receive, and manage e-mail in response to user instructions.
  • e-mail client module 140 makes it very easy to create and send e-mails with still or video images taken with camera module 143.
  • the instant messaging module 141 includes executable instructions to enter a sequence of characters corresponding to an instant message, to modify previously entered characters, to transmit a respective instant message (for example, using a Short Message Service (SMS) or Multimedia Message Service (MMS) protocol for telephony-based instant messages or using XMPP, SIMPLE, Apple Push Notification Service (APNs) or IMPS for Internet-based instant messages), to receive instant messages, and to view received instant messages.
  • transmitted and/or received instant messages optionally include graphics, photos, audio files, video files and/or other attachments as are supported in a MMS and/or an Enhanced Messaging Service (EMS).
  • instant messaging refers to both telephony-based messages (e.g., messages sent using SMS or MMS) and Internet-based messages (e.g., messages sent using XMPP, SIMPLE, APNs, or IMPS).
  • workout support module 142 includes executable instructions to create workouts (e.g., with time, distance, and/or calorie burning goals); communicate with workout sensors (in sports devices and smart watches); receive workout sensor data; calibrate sensors used to monitor a workout; select and play music for a workout; and display, store and transmit workout data.
  • camera module 143 includes executable instructions to capture still images or video (including a video stream) and store them into memory 102, modify characteristics of a still image or video, and/or delete a still image or video from memory 102.
  • image management module 144 includes executable instructions to arrange, modify (e.g., edit), or otherwise manipulate, label, delete, present (e.g., in a digital slide show or album), and store still and/or video images.
  • browser module 147 includes executable instructions to browse the Internet in accordance with user instructions, including searching, linking to, receiving, and displaying web pages or portions thereof, as well as attachments and other files linked to web pages.
  • calendar module 148 includes executable instructions to create, display, modify, and store calendars and data associated with calendars (e.g., calendar entries, to do lists, etc.) in accordance with user instructions.
  • widget modules 149 are mini-applications that are, optionally, downloaded and used by a user (e.g., weather widget 149-1, stocks widget 149-2, calculator widget 149-3, alarm clock widget 149-4, and dictionary widget 149-5) or created by the user (e.g., user-created widget 149-6).
  • a widget includes an HTML (Hypertext Markup Language) file, a CSS (Cascading Style Sheets) file, and a JavaScript file.
  • a widget includes an XML (Extensible Markup Language) file and a JavaScript file (e.g., Yahoo! Widgets).
  • the widget creator module 150 includes executable instructions to create widgets (e.g., turning a user-specified portion of a web page into a widget).
  • search module 151 includes executable instructions to search for text, music, sound, image, video, and/or other files in memory 102 that match one or more search criteria (e.g., one or more user-specified search terms) in accordance with user instructions.
  • video and music player module 152 includes executable instructions that allow the user to download and play back recorded music and other sound files stored in one or more file formats, such as MP3 or AAC files, and executable instructions to display, present or otherwise play back videos (e.g., on touch-sensitive display system 112, or on an external display connected wirelessly or via external port 124).
  • device 100 optionally includes the functionality of an MP3 player, such as an iPod (trademark of Apple Inc.).
  • notes module 153 includes executable instructions to create and manage notes, to do lists, and the like in accordance with user instructions.
  • map module 154 includes executable instructions to receive, display, modify, and store maps and data associated with maps (e.g., driving directions; data on stores and other points of interest at or near a particular location; and other location-based data) in accordance with user instructions.
  • online video module 155 includes executable instructions that allow the user to access, browse, receive (e.g., by streaming and/or download), play back (e.g., on the touch screen 112, or on an external display connected wirelessly or via external port 124), send an e-mail with a link to a particular online video, and otherwise manage online videos in one or more file formats, such as H.264.
  • instant messaging module 141 rather than e-mail client module 140, is used to send a link to a particular online video.
  • annotation and modeling module 195 includes executable instructions that allow the user to model physical environments and/or physical objects therein and to annotate (e.g., measure, draw on, and/or add virtual objects to and manipulate virtual objects within) a representation (e.g., live or previously-captured) of a physical environment and/or physical objects therein in an augmented and/or virtual reality environment, as described in more detail herein.
  • ToF sensor module 196 includes executable instructions for capturing depth information of a physical environment.
  • ToF sensor module 196 operates in conjunction with camera module 143 to provide depth information of a physical environment.
  • modules and applications correspond to a set of executable instructions for performing one or more functions described above and the methods described in this application (e.g., the computer-implemented methods and other information processing methods described herein).
  • These modules (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules are, optionally, combined or otherwise rearranged in various embodiments.
  • memory 102 optionally stores a subset of the modules and data structures identified above.
  • memory 102 optionally stores additional modules and data structures not described above.
  • device 100 is a device where operation of a predefined set of functions on the device is performed exclusively through a touch screen and/or a touchpad.
  • By using a touch screen and/or a touchpad as the primary input control device for operation of device 100, the number of physical input control devices (such as push buttons, dials, and the like) on device 100 is, optionally, reduced.
  • the predefined set of functions that are performed exclusively through a touch screen and/or a touchpad optionally include navigation between user interfaces.
  • the touchpad, when touched by the user, navigates device 100 to a main, home, or root menu from any user interface that is displayed on device 100.
  • a “menu button” is implemented using a touch-sensitive surface.
  • the menu button is a physical push button or other physical input control device instead of a touch-sensitive surface.
  • Figure 1B is a block diagram illustrating example components for event handling in accordance with some embodiments.
  • memory 102 (Figure 1A) or memory 370 (Figure 3A) includes event sorter 170 (e.g., in operating system 126) and a respective application 136-1 (e.g., any of the aforementioned applications 136, 137-155, 380-390).
  • Event sorter 170 receives event information and determines the application 136-1 and application view 191 of application 136-1 to which to deliver the event information.
  • Event sorter 170 includes event monitor 171 and event dispatcher module 174.
  • application 136-1 includes application internal state 192, which indicates the current application view(s) displayed on touch- sensitive display system 112 when the application is active or executing.
  • device/global internal state 157 is used by event sorter 170 to determine which application(s) is (are) currently active, and application internal state 192 is used by event sorter 170 to determine application views 191 to which to deliver event information.
  • application internal state 192 includes additional information, such as one or more of: resume information to be used when application 136-1 resumes execution, user interface state information that indicates information being displayed or that is ready for display by application 136-1, a state queue for enabling the user to go back to a prior state or view of application 136-1, and a redo/undo queue of previous actions taken by the user.
  • Event monitor 171 receives event information from peripherals interface 118.
  • Event information includes information about a sub-event (e.g., a user touch on touch- sensitive display system 112, as part of a multi-touch gesture).
  • Peripherals interface 118 transmits information it receives from I/O subsystem 106 or a sensor, such as proximity sensor 166, accelerometer(s) 167, and/or microphone 113 (through audio circuitry 110).
  • Information that peripherals interface 118 receives from I/O subsystem 106 includes information from touch-sensitive display system 112 or a touch-sensitive surface.
  • event monitor 171 sends requests to the peripherals interface 118 at predetermined intervals.
  • peripherals interface 118 transmits event information.
  • peripheral interface 118 transmits event information only when there is a significant event (e.g., receiving an input above a predetermined noise threshold and/or for more than a predetermined duration).
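One possible reading of the "significant event" filter above is sketched below; the threshold names and values are hypothetical, and requiring both conditions is just one interpretation of "and/or".

```swift
import Foundation

let noiseThreshold = 0.05                 // hypothetical normalized input magnitude
let minimumDuration: TimeInterval = 0.02  // hypothetical seconds

/// Returns true only for inputs that exceed the noise threshold and last long
/// enough to be worth forwarding to event monitor 171. (The text allows either
/// or both conditions; this sketch requires both.)
func isSignificant(magnitude: Double, duration: TimeInterval) -> Bool {
    magnitude > noiseThreshold && duration > minimumDuration
}
```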
  • event sorter 170 also includes a hit view determination module 172 and/or an active event recognizer determination module 173.
  • Hit view determination module 172 provides software procedures for determining where a sub-event has taken place within one or more views, when touch- sensitive display system 112 displays more than one view. Views are made up of controls and other elements that a user can see on the display.
  • Another aspect of the user interface associated with an application is a set of views, sometimes herein called application views or user interface windows, in which information is displayed and touch-based gestures occur.
  • the application views (of a respective application) in which a touch is detected optionally correspond to programmatic levels within a programmatic or view hierarchy of the application. For example, the lowest level view in which a touch is detected is, optionally, called the hit view, and the set of events that are recognized as proper inputs are, optionally, determined based, at least in part, on the hit view of the initial touch that begins a touch-based gesture.
  • Hit view determination module 172 receives information related to sub-events of a touch-based gesture. When an application has multiple views organized in a hierarchy, hit view determination module 172 identifies a hit view as the lowest view in the hierarchy which should handle the sub-event. In most circumstances, the hit view is the lowest level view in which an initiating sub-event occurs (i.e., the first sub-event in the sequence of subevents that form an event or potential event). Once the hit view is identified by the hit view determination module, the hit view typically receives all sub-events related to the same touch or input source for which it was identified as the hit view.
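The hit-view rule above (the lowest view in the hierarchy containing the initiating sub-event) can be sketched with a toy view tree; the `ViewNode` and `Rect` types are hypothetical, and all frames are assumed to be expressed in window coordinates for simplicity.

```swift
/// Hypothetical rectangle in window coordinates.
struct Rect {
    var x, y, width, height: Double
    func contains(x px: Double, y py: Double) -> Bool {
        px >= x && px < x + width && py >= y && py < y + height
    }
}

/// Hypothetical view node; subviews are ordered from back to front.
final class ViewNode {
    let frame: Rect
    let subviews: [ViewNode]
    init(frame: Rect, subviews: [ViewNode] = []) {
        self.frame = frame
        self.subviews = subviews
    }
}

/// Returns the deepest view containing the point, mirroring the
/// "lowest view in the hierarchy in which the initiating sub-event occurs" rule.
func hitView(in root: ViewNode, x: Double, y: Double) -> ViewNode? {
    guard root.frame.contains(x: x, y: y) else { return nil }
    for subview in root.subviews.reversed() {               // front-most subviews first
        if let hit = hitView(in: subview, x: x, y: y) { return hit }
    }
    return root                                              // no deeper view was hit
}
```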
  • Active event recognizer determination module 173 determines which view or views within a view hierarchy should receive a particular sequence of sub-events. In some embodiments, active event recognizer determination module 173 determines that only the hit view should receive a particular sequence of sub-events. In other embodiments, active event recognizer determination module 173 determines that all views that include the physical location of a sub-event are actively involved views, and therefore determines that all actively involved views should receive a particular sequence of sub-events. In other embodiments, even if touch sub-events were entirely confined to the area associated with one particular view, views higher in the hierarchy would still remain as actively involved views.
  • Event dispatcher module 174 dispatches the event information to an event recognizer (e.g., event recognizer 180). In embodiments including active event recognizer determination module 173, event dispatcher module 174 delivers the event information to an event recognizer determined by active event recognizer determination module 173. In some embodiments, event dispatcher module 174 stores in an event queue the event information, which is retrieved by a respective event receiver module 182.
  • operating system 126 includes event sorter 170.
  • application 136-1 includes event sorter 170.
  • event sorter 170 is a stand-alone module, or a part of another module stored in memory 102, such as contact/motion module 130.
  • application 136-1 includes a plurality of event handlers 190 and one or more application views 191, each of which includes instructions for handling touch events that occur within a respective view of the application’s user interface.
  • Each application view 191 of the application 136-1 includes one or more event recognizers 180.
  • a respective application view 191 includes a plurality of event recognizers 180.
  • one or more of event recognizers 180 are part of a separate module, such as a user interface kit or a higher level object from which application 136-1 inherits methods and other properties.
  • a respective event handler 190 includes one or more of: data updater 176, object updater 177, GUI updater 178, and/or event data 179 received from event sorter 170.
  • Event handler 190 optionally utilizes or calls data updater 176, object updater 177, or GUI updater 178 to update the application internal state 192.
  • one or more of the application views 191 includes one or more respective event handlers 190. Also, in some embodiments, one or more of data updater 176, object updater 177, and GUI updater 178 are included in a respective application view 191.
  • a respective event recognizer 180 receives event information (e.g., event data 179) from event sorter 170, and identifies an event from the event information.
  • Event recognizer 180 includes event receiver 182 and event comparator 184.
  • event recognizer 180 also includes at least a subset of: metadata 183, and event delivery instructions 188 (which optionally include sub-event delivery instructions).
  • Event receiver 182 receives event information from event sorter 170.
  • the event information includes information about a sub-event, for example, a touch or a touch movement.
  • the event information also includes additional information, such as location of the sub-event.
  • the event information optionally also includes speed and direction of the sub-event.
  • events include rotation of the device from one orientation to another (e.g., from a portrait orientation to a landscape orientation, or vice versa), and the event information includes corresponding information about the current pose (e.g., position and orientation) of the device.
  • Event comparator 184 compares the event information to predefined event or sub-event definitions and, based on the comparison, determines an event or sub-event, or determines or updates the state of an event or sub-event.
  • event comparator 184 includes event definitions 186.
  • Event definitions 186 contain definitions of events (e.g., predefined sequences of sub-events), for example, event 1 (187-1), event 2 (187- 2), and others.
  • sub-events in an event 187 include, for example, touch begin, touch end, touch movement, touch cancellation, and multiple touching.
  • the definition for event 1 (187-1) is a double tap on a displayed object.
  • the double tap for example, comprises a first touch (touch begin) on the displayed object for a predetermined phase, a first lift-off (touch end) for a predetermined phase, a second touch (touch begin) on the displayed object for a predetermined phase, and a second lift-off (touch end) for a predetermined phase.
  • the definition for event 2 (187-2) is a dragging on a displayed object.
  • the dragging for example, comprises a touch (or contact) on the displayed object for a predetermined phase, a movement of the touch across touch- sensitive display system 112, and lift-off of the touch (touch end).
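The sub-event sequences for event 1 (double tap) and event 2 (drag) described above can be sketched as simple sequence matchers; the `SubEvent` enum is hypothetical, and the per-phase duration checks are omitted for brevity.

```swift
/// Hypothetical sub-event stream elements.
enum SubEvent: Equatable { case touchBegin, touchMove, touchEnd, touchCancel }

/// Event 1 in the text: double tap = begin, end, begin, end on the same object
/// (ignoring the predetermined-phase timing for simplicity).
func matchesDoubleTap(_ subEvents: [SubEvent]) -> Bool {
    subEvents == [.touchBegin, .touchEnd, .touchBegin, .touchEnd]
}

/// Event 2 in the text: dragging = begin, one or more moves, then end.
func matchesDrag(_ subEvents: [SubEvent]) -> Bool {
    guard subEvents.first == .touchBegin,
          subEvents.last == .touchEnd,
          subEvents.count >= 3 else { return false }
    return subEvents.dropFirst().dropLast().allSatisfy { $0 == .touchMove }
}
```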
  • the event also includes information for one or more associated event handlers 190.
  • event definition 187 includes a definition of an event for a respective user-interface object.
  • event comparator 184 performs a hit test to determine which user-interface object is associated with a sub-event. For example, in an application view in which three user-interface objects are displayed on touch- sensitive display system 112, when a touch is detected on touch-sensitive display system 112, event comparator 184 performs a hit test to determine which of the three user-interface objects is associated with the touch (sub-event). If each displayed object is associated with a respective event handler 190, the event comparator uses the result of the hit test to determine which event handler 190 should be activated. For example, event comparator 184 selects an event handler associated with the sub-event and the object triggering the hit test.
  • the definition for a respective event 187 also includes delayed actions that delay delivery of the event information until after it has been determined whether the sequence of sub-events does or does not correspond to the event recognizer’s event type.
  • When a respective event recognizer 180 determines that the series of sub-events do not match any of the events in event definitions 186, the respective event recognizer 180 enters an event impossible, event failed, or event ended state, after which it disregards subsequent sub-events of the touch-based gesture. In this situation, other event recognizers, if any, that remain active for the hit view continue to track and process sub-events of an ongoing touch-based gesture.
  • a respective event recognizer 180 includes metadata 183 with configurable properties, flags, and/or lists that indicate how the event delivery system should perform sub-event delivery to actively involved event recognizers.
  • metadata 183 includes configurable properties, flags, and/or lists that indicate how event recognizers interact, or are enabled to interact, with one another.
  • metadata 183 includes configurable properties, flags, and/or lists that indicate whether sub-events are delivered to varying levels in the view or programmatic hierarchy.
  • a respective event recognizer 180 activates event handler 190 associated with an event when one or more particular sub-events of an event are recognized.
  • a respective event recognizer 180 delivers event information associated with the event to event handler 190. Activating an event handler 190 is distinct from sending (and deferred sending) sub-events to a respective hit view. In some embodiments, event recognizer 180 throws a flag associated with the recognized event, and event handler 190 associated with the flag catches the flag and performs a predefined process.
  • event delivery instructions 188 include sub-event delivery instructions that deliver event information about a sub-event without activating an event handler. Instead, the sub-event delivery instructions deliver event information to event handlers associated with the series of sub-events or to actively involved views. Event handlers associated with the series of sub-events or with actively involved views receive the event information and perform a predetermined process.
  • data updater 176 creates and updates data used in application 136-1. For example, data updater 176 updates the telephone number used in contacts module 137, or stores a video file used in video and music player module 152.
  • object updater 177 creates and updates objects used in application 136-1. For example, object updater 177 creates a new user-interface object or updates the position of a user-interface object.
  • GUI updater 178 updates the GUI. For example, GUI updater 178 prepares display information and sends it to graphics module 132 for display on a touch- sensitive display.
  • event handler(s) 190 includes or has access to data updater 176, object updater 177, and GUI updater 178.
  • data updater 176, object updater 177, and GUI updater 178 are included in a single module of a respective application 136-1 or application view 191. In other embodiments, they are included in two or more software modules.
  • event handling of user touches on touch-sensitive displays also applies to other forms of user inputs to operate multifunction devices 100 with input-devices, not all of which are initiated on touch screens.
  • mouse movement and mouse button presses optionally coordinated with single or multiple keyboard presses or holds; contact movements such as taps, drags, scrolls, etc., on touch-pads; pen stylus inputs; inputs based on real-time analysis of video images obtained by one or more cameras; movement of the device; oral instructions; detected eye movements; biometric inputs; and/or any combination thereof are optionally utilized as inputs corresponding to sub-events which define an event to be recognized.
  • FIG. 2A illustrates a portable multifunction device 100 (e.g., a view of the front of device 100) having a touch screen (e.g., touch-sensitive display system 112, Figure 1 A) in accordance with some embodiments.
  • the touch screen optionally displays one or more graphics within user interface (UI) 200.
  • a user is enabled to select one or more of the graphics by making a gesture on the graphics, for example, with one or more fingers 202 (not drawn to scale in the figure) or one or more styluses 203 (not drawn to scale in the figure).
  • selection of one or more graphics occurs when the user breaks contact with the one or more graphics.
  • the gesture optionally includes one or more taps, one or more swipes (from left to right, right to left, upward and/or downward) and/or a rolling of a finger (from right to left, left to right, upward and/or downward) that has made contact with device 100.
  • inadvertent contact with a graphic does not select the graphic.
  • a swipe gesture that sweeps over an application icon optionally does not select the corresponding application when the gesture corresponding to selection is a tap.
  • Device 100 optionally also includes one or more physical buttons, such as “home” or menu button 204.
  • menu button 204 is, optionally, used to navigate to any application 136 in a set of applications that are, optionally executed on device 100.
  • the menu button is implemented as a soft key in a GUI displayed on the touch-screen display.
  • device 100 includes the touch-screen display, menu button 204 (sometimes called home button 204), push button 206 for powering the device on/off and locking the device, volume adjustment button(s) 208, Subscriber Identity Module (SIM) card slot 210, head set jack 212, and docking/charging external port 124.
  • Push button 206 is, optionally, used to turn the power on/off on the device by depressing the button and holding the button in the depressed state for a predefined time interval; to lock the device by depressing the button and releasing the button before the predefined time interval has elapsed; and/or to unlock the device or initiate an unlock process.
  • device 100 also accepts verbal input for activation or deactivation of some functions through microphone 113.
  • Device 100 also, optionally, includes one or more contact intensity sensors 165 for detecting intensities of contacts on touch-sensitive display system 112 and/or one or more tactile output generators 163 for generating tactile outputs for a user of device 100.
  • FIG. 2B illustrates a portable multifunction device 100 (e.g., a view of the back of device 100) that optionally includes optical sensors 164-1 and 164-2, and time-of- flight (“ToF”) sensor 220.
  • the portable multifunction device can determine depth information from the disparity between the information concurrently captured by the optical sensors (e.g., disparities between the captured images).
  • Depth information provided by (e.g., image) disparities determined using optical sensors 164-1 and 164-2 may lack accuracy, but typically provides high resolution.
  • time- of-flight sensor 220 is optionally used in conjunction with optical sensors 164-1 and 164-2.
  • ToF sensor 220 emits a waveform (e.g., light from a light emitting diode (LED) or a laser), and measures the time it takes for the reflection(s) of the waveform (e.g., light) to return back to ToF sensor 220.
  • Depth information is determined from the measured time it takes for the light to return back to ToF sensor 220.
  • a ToF sensor typically provides high accuracy (e.g., accuracy of 1 cm or better with respect to measured distances or depths), but may lack high resolution (e.g., ToF sensor 220 optionally has a resolution that is one quarter of the resolution of optical sensors 164, or less than one quarter of the resolution of optical sensors 164, or one sixteenth of the resolution of optical sensors 164, or less than one sixteenth of the resolution of optical sensors 164). Therefore, combining depth information from a ToF sensor with depth information provided by (e.g., image) disparities determined using optical sensors (e.g., cameras) provides a depth map that is both accurate and has high resolution.
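A rough sketch of the two depth sources and a naive fusion of them follows; the round-trip and pinhole-disparity formulas are standard, but the grid layout, the nearest-sample lookup, and the mean-matching fusion below are illustrative assumptions rather than the fusion actually performed by the device.

```swift
/// Depth from a time-of-flight measurement: the emitted light travels to the
/// surface and back, so distance is half the round trip.
func tofDepth(roundTripSeconds: Double) -> Double {
    let speedOfLight = 299_792_458.0   // m/s
    return speedOfLight * roundTripSeconds / 2
}

/// Depth from stereo disparity between two cameras (pinhole model):
/// depth = focalLength * baseline / disparity.
func disparityDepth(disparityPixels: Double, focalLengthPixels: Double, baselineMeters: Double) -> Double {
    guard disparityPixels > 0 else { return .infinity }
    return focalLengthPixels * baselineMeters / disparityPixels
}

/// Naive fusion of a high-resolution (but less accurate) disparity depth map
/// with a low-resolution (but accurate) ToF depth map: each ToF cell rescales
/// the disparity pixels it covers so their mean matches the ToF depth, keeping
/// the fine detail of the disparity map. Both maps are assumed non-empty.
func fuse(disparityDepthMap: [[Double]], tofDepthMap: [[Double]]) -> [[Double]] {
    let rows = disparityDepthMap.count, cols = disparityDepthMap[0].count
    let tofRows = tofDepthMap.count, tofCols = tofDepthMap[0].count

    // Average the high-resolution depth over each low-resolution ToF cell.
    var cellSum = Array(repeating: Array(repeating: 0.0, count: tofCols), count: tofRows)
    var cellCount = Array(repeating: Array(repeating: 0, count: tofCols), count: tofRows)
    for r in 0..<rows {
        for c in 0..<cols {
            let tr = min(tofRows - 1, r * tofRows / rows)
            let tc = min(tofCols - 1, c * tofCols / cols)
            cellSum[tr][tc] += disparityDepthMap[r][c]
            cellCount[tr][tc] += 1
        }
    }

    // Rescale each pixel so its cell's mean matches the accurate ToF depth.
    var fused = disparityDepthMap
    for r in 0..<rows {
        for c in 0..<cols {
            let tr = min(tofRows - 1, r * tofRows / rows)
            let tc = min(tofCols - 1, c * tofCols / cols)
            let mean = cellSum[tr][tc] / Double(max(cellCount[tr][tc], 1))
            let scale = tofDepthMap[tr][tc] / max(mean, 1e-6)
            fused[r][c] = disparityDepthMap[r][c] * scale
        }
    }
    return fused
}
```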
  • Figure 3 A is a block diagram of an example multifunction device with a display and a touch-sensitive surface in accordance with some embodiments.
  • Device 300 need not be portable.
  • device 300 is a laptop computer, a desktop computer, a tablet computer, a multimedia player device, a navigation device, an educational device (such as a child’s learning toy), a gaming system, or a control device (e.g., a home or industrial controller).
  • Device 300 typically includes one or more processing units (CPU’s) 310, one or more network or other communications interfaces 360, memory 370, and one or more communication buses 320 for interconnecting these components.
  • Communication buses 320 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • Device 300 includes input/output (I/O) interface 330 comprising display 340, which is optionally a touch-screen display.
  • I/O interface 330 also optionally includes a keyboard and/or mouse (or other pointing device) 350 and touchpad 355, tactile output generator 357 for generating tactile outputs on device 300 (e.g., similar to tactile output generator(s) 163 described above with reference to Figure 1A), sensors 359 (e.g., optical, acceleration, proximity, touch-sensitive, and/or contact intensity sensors analogous to those described above with reference to Figure 1A, and optionally a time-of-flight sensor 220 described above with reference to Figure 2B).
  • Memory 370 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and optionally includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 370 optionally includes one or more storage devices remotely located from CPU(s) 310. In some embodiments, memory 370 stores programs, modules, and data structures analogous to the programs, modules, and data structures stored in memory 102 of portable multifunction device 100 ( Figure 1A), or a subset thereof. Furthermore, memory 370 optionally stores additional programs, modules, and data structures not present in memory 102 of portable multifunction device 100.
  • memory 370 of device 300 optionally stores drawing module 380, presentation module 382, word processing module 384, website creation module 386, disk authoring module 388, and/or spreadsheet module 390, while memory 102 of portable multifunction device 100 ( Figure 1A) optionally does not store these modules.
  • Each of the above identified elements in Figure 3A is, optionally, stored in one or more of the previously mentioned memory devices.
  • Each of the above identified modules corresponds to a set of instructions for performing a function described above.
  • the above identified modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules are, optionally, combined or otherwise re-arranged in various embodiments.
  • memory 370 optionally stores a subset of the modules and data structures identified above. Furthermore, memory 370 optionally stores additional modules and data structures not described above.
  • Figures 3B-3C are block diagrams of example computer systems 301 in accordance with some embodiments.
  • computer system 301 includes and/or is in communication with:
  • input device(s) (302 and/or 307, e.g., a touch-sensitive surface, such as a touch- sensitive remote control, or a touch-screen display that also serves as the display generation component, a mouse, a joystick, a wand controller, and/or cameras tracking the position of one or more features of the user such as the user’s hands);
  • virtual/augmented reality logic 303 (e.g., virtual/augmented reality module 145);
  • display generation component(s) (e.g., a display, a projector, a head-mounted display, a heads-up display, or the like) for displaying virtual user interface elements to the user;
  • camera(s) (e.g., 305 and/or 311) for capturing images of a field of view of the device, e.g., images that are used to determine placement of virtual user interface elements, determine a pose of the device, and/or display a portion of the physical environment in which the camera(s) are located;
  • pose sensor(s) (e.g., 306 and/or 311) for determining a pose of the device relative to the physical environment and/or changes in pose of the device.
  • computer system 301 includes and/or is in communication with a time-of-flight sensor (e.g., time-of-flight sensor 220, Figure 2B) for capturing depth information as described above with reference to Figure 2B.
  • input device(s) 302, virtual/augmented reality logic 303, display generation component(s) 304, camera(s) 305, and pose sensor(s) 306 are all integrated into the computer system (e.g., portable multifunction device 100 in Figures 1A-1B or device 300 in Figure 3, such as a smartphone or tablet).
  • in addition to integrated input device(s) 302, virtual/augmented reality logic 303, display generation component(s) 304, camera(s) 305, and pose sensor(s) 306, the computer system is also in communication with additional devices that are separate from the computer system, such as separate input device(s) 307 (such as a touch-sensitive surface, a wand, a remote control, or the like) and/or separate display generation component(s) 308 (such as a virtual reality headset or augmented reality glasses that overlay virtual objects on a physical environment).
  • the input device(s) 307, display generation component(s) 309, camera(s) 311; and/or pose sensor(s) 312 are separate from the computer system and are in communication with the computer system.
  • other combinations of components in computer system 301 and in communication with the computer system are used.
  • display generation component(s) 309, camera(s) 311, and pose sensor(s) 312 are incorporated in a headset that is either integrated with or in communication with the computer system.
  • all of the operations described below with reference to Figures 5A-5AD are performed on a single computing device with virtual/augmented reality logic 303 (e.g., computer system 301-a described below with reference to Figure 3B).
  • a computing device with virtual/augmented reality logic 303 communicates with a separate computing device with a display 450 and/or a separate computing device with a touch-sensitive surface 451.
  • the computing device that is described below with reference to Figures 5A-5AD is the computing device (or devices) that contain(s) the virtual/augmented reality logic 303.
  • the virtual/augmented reality logic 303 could be divided between a plurality of distinct modules or computing devices in various embodiments; however, for the purposes of the description herein, the virtual/augmented reality logic 303 will be primarily referred to as residing in a single computing device so as not to unnecessarily obscure other aspects of the embodiments.
  • the virtual/augmented reality logic 303 includes one or more modules (e.g., one or more event handlers 190, including one or more object updaters 177 and one or more GUI updaters 178 as described in greater detail above with reference to Figure 1B) that receive interpreted inputs and, in response to these interpreted inputs, generate instructions for updating a graphical user interface in accordance with the interpreted inputs which are subsequently used to update the graphical user interface on a display.
  • an interpreted input is an input that has been detected (e.g., by contact/motion module 130 in Figures 1A and 3), recognized (e.g., by an event recognizer 180 in Figure 1B), and/or distributed (e.g., by event sorter 170 in Figure 1B).
  • the interpreted inputs are generated by modules at the computing device (e.g., the computing device receives raw contact input data so as to identify gestures from the raw contact input data).
  • some or all of the interpreted inputs are received by the computing device as interpreted inputs (e.g., a computing device that includes the touch-sensitive surface 451 processes raw contact input data so as to identify gestures from the raw contact input data and sends information indicative of the gestures to the computing device that includes the virtual/augmented reality logic 303).
  • both a display and a touch-sensitive surface are integrated with the computer system (e.g., 301-a in Figure 3B) that contains the virtual/augmented reality logic 303.
  • the computer system may be a desktop computer or laptop computer with an integrated display (e.g., 340 in Figure 3) and touchpad (e.g., 355 in Figure 3).
  • the computing device may be a portable multifunction device 100 (e.g., a smartphone, PDA, tablet computer, etc.) with a touch screen (e.g., 112 in Figure 2 A).
  • a touch-sensitive surface is integrated with the computer system while a display is not integrated with the computer system that contains the virtual/augmented reality logic 303.
  • the computer system may be a device 300 (e.g., a desktop computer or laptop computer) with an integrated touchpad (e.g., 355 in Figure 3) connected (via wired or wireless connection) to a separate display (e.g., a computer monitor, television, etc.).
  • the computer system may be a portable multifunction device 100 (e.g., a smartphone, PDA, tablet computer, etc.) with a touch screen (e.g., 112 in Figure 2A) connected (via wired or wireless connection) to a separate display (e.g., a computer monitor, television, etc.).
  • a display is integrated with the computer system while a touch-sensitive surface is not integrated with the computer system that contains the virtual/augmented reality logic 303.
  • the computer system may be a device 300 (e.g., a desktop computer, laptop computer, television with integrated set-top box) with an integrated display (e.g., 340 in Figure 3) connected (via wired or wireless connection) to a separate touch-sensitive surface (e.g., a remote touchpad, a portable multifunction device, etc.).
  • the computer system may be a portable multifunction device 100 (e.g., a smartphone, PDA, tablet computer, etc.) with a touch screen (e.g., 112 in Figure 2A) connected (via wired or wireless connection) to a separate touch-sensitive surface (e.g., a remote touchpad, another portable multifunction device with a touch screen serving as a remote touchpad, etc.).
  • neither a display nor a touch-sensitive surface is integrated with the computer system (e.g., 301-c in Figure 3C) that contains the virtual/augmented reality logic 303.
  • the computer system may be a stand-alone computing device 300 (e.g., a set-top box, gaming console, etc.) connected (via wired or wireless connection) to a separate touch-sensitive surface (e.g., a remote touchpad, a portable multifunction device, etc.) and a separate display (e.g., a computer monitor, television, etc.).
  • the computer system has an integrated audio system (e.g., audio circuitry 110 and speaker 111 in portable multifunction device 100).
  • the computing device is in communication with an audio system that is separate from the computing device.
  • the audio system (e.g., an audio system integrated in a television unit) is integrated with a separate display.
  • the audio system (e.g., a stereo system) is a stand-alone system that is separate from the computer system and the display.
  • Figure 4A illustrates an example user interface for a menu of applications on portable multifunction device 100 in accordance with some embodiments. Similar user interfaces are, optionally, implemented on device 300.
  • user interface 400 includes the following elements, or a subset or superset thereof:
  • Tray 408 with icons for frequently used applications such as:
  o Icon 416 for telephone module 138, labeled “Phone,” which optionally includes an indicator 414 of the number of missed calls or voicemail messages;
  o Icon 418 for e-mail client module 140, labeled “Mail,” which optionally includes an indicator 410 of the number of unread e-mails;
  o Icon 420 for browser module 147, labeled “Browser”; and
  o Icon 422 for video and music player module 152, labeled “Music”; and
  • Icons for other applications such as:
  o Icon 424 for IM module 141, labeled “Messages”;
  o Icon 426 for calendar module 148, labeled “Calendar”;
  o Icon 428 for image management module 144, labeled “Photos”;
  o Icon 430 for camera module 143, labeled “Camera”;
  o Icon 432 for online video module 155, labeled “Online Video”;
  o Icon 434 for stocks widget 149-2, labeled “Stocks”;
  o Icon 436 for map module 154, labeled “Maps”;
  o Icon 438 for weather widget 149-1, labeled “Weather”;
  o Icon 440 for alarm clock widget 149-4, labeled “Clock”;
  o Icon 442 for workout support module 142, labeled “Workout Support”;
  o Icon 444 for notes module 153, labeled “Notes”; and
  o Icon 446 for a settings application or module, labeled “Settings,” which provides access to settings for device 100 and its various applications 136.
  • a label for a respective application icon includes a name of an application corresponding to the respective application icon.
  • a label for a particular application icon is distinct from a name of an application corresponding to the particular application icon.
  • Figure 4B illustrates an example user interface on a device (e.g., device 300, Figure 3 A) with a touch-sensitive surface 451 (e.g., a tablet or touchpad 355, Figure 3 A) that is separate from the display 450.
  • the device detects inputs on a touch- sensitive surface that is separate from the display, as shown in FIG. 4B.
  • the touch-sensitive surface (e.g., 451 in Figure 4B) has a primary axis (e.g., 452 in Figure 4B) that corresponds to a primary axis (e.g., 453 in Figure 4B) on the display (e.g., 450).
  • the device detects contacts (e.g., 460 and 462 in Figure 4B) with the touch-sensitive surface 451 at locations that correspond to respective locations on the display (e.g., in Figure 4B, 460 corresponds to 468 and 462 corresponds to 470).
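The correspondence between locations on the separate touch-sensitive surface 451 and locations on display 450 can be sketched as a per-axis scaling along their primary axes; the `Size` type and the example dimensions below are hypothetical.

```swift
/// Hypothetical sizes, in points, of the separate touch-sensitive surface and display.
struct Size { var width: Double; var height: Double }

/// Maps a contact location on the touch-sensitive surface (e.g., 451) to the
/// corresponding location on the display (e.g., 450) by scaling each primary axis.
func displayLocation(for touch: (x: Double, y: Double),
                     surface: Size, display: Size) -> (x: Double, y: Double) {
    (x: touch.x / surface.width * display.width,
     y: touch.y / surface.height * display.height)
}

// Example: a touch at (100, 50) on a 400x300 surface maps to (320, 180) on a 1280x1080 display.
let mapped = displayLocation(for: (x: 100, y: 50),
                             surface: Size(width: 400, height: 300),
                             display: Size(width: 1280, height: 1080))
```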
  • In this way, user inputs (e.g., contacts 460 and 462, and movements thereof) detected by the device on the touch-sensitive surface (e.g., 451 in Figure 4B) are used by the device to manipulate the user interface on the display (e.g., 450 in Figure 4B) of the multifunction device when the touch-sensitive surface is separate from the display.
  • Similar methods are, optionally, used for other user interfaces described herein.
  • one or more of the finger inputs are replaced with input from another input device (e.g., a mouse based input or a stylus input, movement of the device or of one or more cameras of the device relative to a surrounding physical environment, and/or user movement relative to the device that is tracked using one or more cameras).
  • a swipe gesture is, optionally, replaced with a mouse click (e.g., instead of a contact) followed by movement of the cursor along the path of the swipe (e.g., instead of movement of the contact), or by a hand gesture involving a user moving his or her hand in a particular direction.
  • a tap gesture is, optionally, replaced with a mouse click while the cursor is located over the location of the tap gesture (e.g., instead of detection of the contact followed by ceasing to detect the contact) or by a corresponding hand gesture that is representative of a tap gesture.
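One way to picture these substitutions is as a reduction of heterogeneous raw inputs to a small set of abstract interaction events before they reach the user interface. The Swift sketch below illustrates that idea; the `RawInput` and `InteractionEvent` types are invented for illustration and do not represent the actual event-handling architecture.

```swift
// Illustrative reduction of different raw inputs to common interaction events.
// All type and case names here are hypothetical.

enum RawInput {
    case fingerTap(x: Double, y: Double)
    case fingerSwipe(fromX: Double, fromY: Double, toX: Double, toY: Double)
    case mouseClick(x: Double, y: Double)
    case mouseDrag(fromX: Double, fromY: Double, toX: Double, toY: Double)
    case handGesture(direction: (dx: Double, dy: Double), atX: Double, atY: Double)
}

enum InteractionEvent {
    case activate(x: Double, y: Double)                              // e.g., tap or click
    case move(fromX: Double, fromY: Double, dx: Double, dy: Double)  // e.g., swipe, drag, or hand motion
}

/// Maps a raw input to the abstract event that the user interface responds to,
/// so a swipe, a mouse drag, and a directional hand gesture are treated alike.
func normalize(_ input: RawInput) -> InteractionEvent {
    switch input {
    case let .fingerTap(x, y), let .mouseClick(x, y):
        return .activate(x: x, y: y)
    case let .fingerSwipe(fromX, fromY, toX, toY), let .mouseDrag(fromX, fromY, toX, toY):
        return .move(fromX: fromX, fromY: fromY, dx: toX - fromX, dy: toY - fromY)
    case let .handGesture(direction, atX, atY):
        return .move(fromX: atX, fromY: atY, dx: direction.dx, dy: direction.dy)
    }
}

print(normalize(.mouseClick(x: 10, y: 20)))                          // prints the activate event at (10, 20)
print(normalize(.fingerSwipe(fromX: 0, fromY: 0, toX: 50, toY: 0)))  // prints a rightward move event
```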
• When multiple inputs are simultaneously detected, it should be understood that multiple input devices of a particular type are, optionally, used simultaneously, or multiple input devices of different types are, optionally, used simultaneously.
  • the term “focus selector” refers to an input element that indicates a current part of a user interface with which a user is interacting.
• In some implementations that include a cursor or other location marker, the cursor acts as a “focus selector,” so that when an input (e.g., a press input) is detected on a touch-sensitive surface (e.g., touchpad 355 in Figure 3A or touch-sensitive surface 451 in Figure 4B) while the cursor is over a particular user interface element (e.g., a button, window, slider or other user interface element), the particular user interface element is adjusted in accordance with the detected input.
  • a detected contact on the touch-screen acts as a “focus selector,” so that when an input (e.g., a press input by the contact) is detected on the touch-screen display at a location of a particular user interface element (e.g., a button, window, slider or other user interface element), the particular user interface element is adjusted in accordance with the detected input.
  • focus is moved from one region of a user interface to another region of the user interface without corresponding movement of a cursor or movement of a contact on a touch-screen display (e.g., by using a tab key or arrow keys to move focus from one button to another button); in these implementations, the focus selector moves in accordance with movement of focus between different regions of the user interface.
  • the focus selector is generally the user interface element (or contact on a touch-screen display) that is controlled by the user so as to communicate the user’s intended interaction with the user interface (e.g., by indicating, to the device, the element of the user interface with which the user is intending to interact).
• For example, the location of a focus selector (e.g., a cursor, a contact, or a selection box) over a respective button while a press input is detected on the touch-sensitive surface (e.g., a touchpad or touch screen) will indicate that the user is intending to activate the respective button (as opposed to other user interface elements shown on a display of the device).
• a focus indicator (e.g., a cursor or selection indicator)
• Attention is now directed towards embodiments of user interfaces (“UI”) and associated processes that are implemented on a computer system (e.g., portable multifunction device 100 (Figure 1A), device 300 (Figure 3A), or computer system 301 (Figure 3B)) that includes a display generation component (e.g., a display device, such as a display, a projector, a head-mounted display, a heads-up display, or the like), one or more cameras (e.g., video cameras that continuously provide a live preview of at least a portion of the contents that are within the field of view of the cameras and optionally generate video outputs including one or more streams of image frames capturing the contents within the field of view of the cameras), and one or more input devices (e.g., a touch-sensitive surface, such as a touch-sensitive remote control, or a touchscreen display that also serves as the display generation component, a mouse, a joystick, a wand controller, and/or cameras tracking the position of one or more features of the user, such as the user’s hand(s)).
  • Figures 5A-5AD illustrate example user interfaces for scanning and modeling environments such as physical environments in accordance with some embodiments.
  • the user interfaces in these figures are used to illustrate the processes described below, including the processes in Figures 6A-6F, 7A-7D, 8A-8D, and 9A-9E.
  • the focus selector is, optionally: a respective finger or stylus contact, a representative point corresponding to a finger or stylus contact (e.g., a centroid of a respective contact or a point associated with a respective contact), or a centroid of two or more contacts detected on the touch-sensitive display system 112.
  • analogous operations are, optionally, performed on a device with a display 450 and a separate touch-sensitive surface 451 in response to detecting the contacts on the touch-sensitive surface 451 while displaying the user interfaces shown in the figures on the display 450, along with a focus selector.
  • Figures 5A-5AD illustrate example user interfaces for scanning and modeling a physical environment using augmented reality in accordance with some embodiments.
  • Figure 5A shows an example home screen user interface (e.g., home screen 502) that includes a plurality of application icons corresponding to different applications, including at least application icon 420 for a browser application and application icon 504 for a paint design application.
  • the browser application and the paint design application are illustrative examples of applications published by different application vendors that utilize an application programming interface (API) or developer tool kit that provides some or all of the scanning and modeling functions described herein.
  • the different applications provided by the different application vendors may have different functionality and/or user interfaces in addition to the scanning and modeling functionality and user interfaces described herein.
  • the different applications provided by the different application vendors may provide additional user interfaces for interacting with various representations (e.g., a two-dimensional map, a three-dimensional model, and/or image and depth data) of a physical environment that have been obtained using the scanning and modeling user interfaces described herein.
  • a respective input that meets selection criteria is detected on an application icon of a respective application in the home screen user interface (e.g., tap input 506 is detected on application icon 420, tap input 508 is detected on application icon 504, an in-air gesture directed to an application icon in a virtual or augmented reality environment, or another selection input that activates a corresponding application).
  • a user interface of the respective application is displayed. For example, in response to tap input 506 on application icon 420, a user interface of the browser application is displayed (e.g., as shown in Figure 5B).
• Similarly, in response to tap input 508 on application icon 504, a user interface of the paint design application is displayed (e.g., as shown in Figure 5C).
  • a user may interact with the user interface of the respective application to cause changes in the user interface of the respective application.
  • user interface 510 of the browser application displays a webpage (e.g., with a URL of “www://example.com”) corresponding to a seller of audio/visual equipment (e.g., an online store called “Example Store”) that provides functions for selecting the type(s) and quantities of different audio/visual equipment (e.g., speakers, subwoofers, cameras, and/or displays) for purchase.
• user interface 514 of the paint design application displays user selected interior surfaces (e.g., accent wall, wall with windows, wall behind TV, and/or wall behind couch) and corresponding paint/wallpaper selections.
• Figures 5B and 5C illustrate examples of how the scanning and modeling user interfaces described herein may be utilized through the application programming interface or developer tool kit.
• For example, the webpage shown in user interface 510 of the browser application has an embedded user interface object 512 that, when selected, causes display of the scanning and modeling user interfaces described herein.
• user interface 514 of the paint design application also includes user interface object 512 that, when selected, causes display of the scanning and modeling user interfaces described herein.
• the appearance of user interface object 512 does not have to be identical in the user interfaces of different applications, as long as it is configured to trigger the same application programming interface and/or developer tool kit for the same scanning and modeling function (e.g., “start scan”).
• different applications may utilize different application programming interfaces or developer tool kits to trigger different scanning and modeling user interfaces that share some or all of the features described herein.
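As a loose illustration of how differently branded applications could reach the same scanning flow, the sketch below defines a hypothetical delegate-based entry point that a control such as user interface object 512 might invoke. Every name here (ScanSession, ScanSessionDelegate, RoomModel, HostApp) is invented for illustration; this is not an actual application programming interface.

```swift
// Hypothetical sketch of a shared "start scan" entry point that different
// applications (e.g., a browser page or a paint design app) could invoke from
// their own user interface object. All names are illustrative.

protocol ScanSessionDelegate: AnyObject {
    func scanSession(_ session: ScanSession, didUpdateProgress progress: Double)
    func scanSession(_ session: ScanSession, didFinishWithModel model: RoomModel)
}

struct RoomModel {
    var walls: [String]      // placeholder for detected structural elements
    var objects: [String]    // placeholder for detected non-structural elements
}

final class ScanSession {
    weak var delegate: ScanSessionDelegate?

    /// Called when the embedded "start scan" control is activated; the same
    /// scanning and modeling flow runs regardless of which host app triggered it.
    func start() {
        delegate?.scanSession(self, didUpdateProgress: 0.0)
        // ... camera and depth capture would run here ...
        delegate?.scanSession(self, didFinishWithModel: RoomModel(walls: ["wall"], objects: ["cabinet"]))
    }
}

final class HostApp: ScanSessionDelegate {
    func scanSession(_ session: ScanSession, didUpdateProgress progress: Double) {
        print("scan progress:", progress)
    }
    func scanSession(_ session: ScanSession, didFinishWithModel model: RoomModel) {
        print("received model with \(model.walls.count) wall(s) and \(model.objects.count) object(s)")
    }
}

let host = HostApp()
let session = ScanSession()
session.delegate = host
session.start()
```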
• In response to detecting a respective input that activates user interface object 512 (e.g., tap input 516 on user interface object 512 in Figure 5B, or tap input 518 on user interface object 512 in Figure 5C), device 100 displays, as shown in Figure 5D, an initial state of the scanning and modeling user interfaces described herein.
• device 100 is located in a physical environment (e.g., room 520 or another three-dimensional environment) that includes structural elements (e.g., walls, ceiling, floor, windows, and/or doors) and nonstructural elements (e.g., pieces of furniture, appliances, physical objects, pets, and/or people).
  • the camera(s) of device 100 are facing toward a first portion of room 520, and the field of view of the camera(s) includes the first portion of the room 520 that corresponds to the current viewpoint of the camera(s) (e.g., the current viewpoint is determined based on the current location and the current pan/tilt/yaw angles of the camera(s) relative to the physical environment).
• As device 100 and its camera(s) move in the physical environment, the viewpoint and the field of view of the camera(s) change accordingly, and user interface 522 would show a different portion of the physical environment corresponding to the updated viewpoint and updated field of view.
  • the initial state of user interface 522 includes camera view 524, and user interface object 526 that is overlaid on camera view 524.
  • user interface object 526 is optionally animated to indicate movement executed by the camera(s) relative to the physical environment 520.
  • user interface object 526 is animated in a respective manner to prompt the user to start moving the camera(s) in the physical environment in a corresponding manner (e.g., executing back and forth sideways motion, or figure-8 motion) that helps device 100 to identify one or more cardinal directions (e.g., horizontal direction, and/or vertical direction) and/or one or more planes (e.g., horizontal planes, and/or vertical planes) in the physical environment.
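A hedged sketch of why this initial motion helps: averaging per-frame gravity estimates gathered while the device moves yields a stable vertical direction, from which horizontal planes (and, by perpendicularity, candidate vertical planes) can be hypothesized. The Swift code below is only an illustration under that assumption; the sample values and function names are made up and this is not the device's actual calibration routine.

```swift
// Illustrative estimation of the vertical (gravity) direction from a series of
// per-frame gravity readings gathered while the device is moved around.
// Names and sample values are hypothetical.

struct Vec3 { var x: Double; var y: Double; var z: Double }

func length(_ v: Vec3) -> Double { (v.x * v.x + v.y * v.y + v.z * v.z).squareRoot() }

func normalized(_ v: Vec3) -> Vec3 {
    let len = length(v)
    return Vec3(x: v.x / len, y: v.y / len, z: v.z / len)
}

/// Averages noisy gravity estimates into a single unit "down" vector.
/// A horizontal plane is any plane whose normal is parallel to this vector;
/// vertical planes (candidate walls) have normals perpendicular to it.
func estimateDownDirection(from gravitySamples: [Vec3]) -> Vec3 {
    var sum = Vec3(x: 0, y: 0, z: 0)
    for g in gravitySamples {
        let u = normalized(g)
        sum = Vec3(x: sum.x + u.x, y: sum.y + u.y, z: sum.z + u.z)
    }
    return normalized(sum)
}

// Example: slightly noisy readings while the user sweeps the device side to side.
let samples = [
    Vec3(x: 0.02, y: -0.99, z: 0.05),
    Vec3(x: -0.03, y: -1.01, z: -0.02),
    Vec3(x: 0.01, y: -0.98, z: 0.01),
]
let down = estimateDownDirection(from: samples)
print(down) // approximately (0, -1, 0)
```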
  • the initial state of user interface 522 further includes a prompt (e.g., banner 528, or another type of alert or guide) that provides textual instruction (e.g., “Find a wall to scan”, or another instruction) and/or graphical guidance (e.g., animated illustration of how to move the device, or another type of illustrative guide) to the user regarding how to start the scanning process.
  • room 520 includes a number of structural elements, including four walls (e.g., wall 530, 532, 534, and 536), a ceiling (e.g., ceiling 538), a floor (e.g., floor 540), a window (e.g., window 542), and an entryway (e.g., entryway 544).
• Room 520 further includes a number of non-structural elements, including various pieces of furniture (e.g., stool 546, cabinet 548, TV stand 550, couch 552, and side table 554), physical objects (e.g., floor lamp 556, and table lamp 558), and other physical objects (e.g., TV 560, and boxes 562).
• Figure 5C includes a top view 564 of room 520 that illustrates relative positions of the structural elements and non-structural elements of room 520, as well as a respective position (as indicated by the circular pointy end of object 566) and facing direction of the camera(s) (e.g., as represented by the arc side of the object 566) of device 100.
• camera view 524 included in user interface 522 includes a representation of a first portion of the physical environment that includes representation 530’ of wall 530, representation 532’ of wall 532, representation 538’ of ceiling 538, representation 540’ of floor 540, representation 548’ of cabinet 548, and representation 542’ of window 542.
  • the representation of the first portion of the physical environment corresponds to the current viewpoint of the user, as indicated by the position and facing direction of object 566 in the top view 564 of room 520.
• Although user interface 522 in this example includes a camera view of the physical environment as the representation of the field of view of the one or more cameras, in some embodiments, the representation of the field of view included in user interface 522 is, optionally, a pass-through view of the physical environment as seen through a transparent or semi-transparent display generation component that displays the user interface 522.
  • the touch-screen display of device 100 in this example is optionally replaced with another type of display generation component, such as a head-mounted display, a projector, or a heads-up display, that displays the user interface 522.
• In some embodiments, the touch inputs described in these examples are replaced with in-air gestures or other types of user inputs.
• the representations of objects, structural elements, and nonstructural elements that appear in the representation of the field of view are sometimes referred to using the same reference numbers as their counterparts in the physical environment, rather than the primed versions of the reference numbers.
  • Figures 5E-5W illustrate changes in user interface 522 during the scanning and modeling of room 520, in accordance with some embodiments.
• Figures 5E-5W show device 100 displaying an augmented reality view of room 520, including a representation of the field of view of the one or more camera(s) (e.g., camera view 524 or a view of the environment through a transparent or semi-transparent display generation component) and a preview of a three-dimensional model of room 520 that is being generated based on the scan of room 520 (e.g., preview 568, or another preview that includes a partially completed three-dimensional model of the physical environment).
  • the preview of the three-dimensional model of room 520 is overlaid on the representation of the field of view of the one or more camera(s) in user interface 522, e.g., as shown in Figures 5E-5W. In some embodiments, the preview of the three-dimensional model of room 520 is optionally displayed in a separate region of user interface 522 from the representation of the field of view.
• the augmented reality view of room 520 further includes various prompts, alerts, annotations, and/or visual guides (e.g., textual and/or graphical objects for prompting and guiding the user to change the viewpoint, moving slowly, moving faster, going back to rescan a missed spot, and/or performing another action to facilitate the scan) that are overlaid on and/or separately displayed from the representation of the field of view.
  • Figure 5E illustrates the changes in user interface 522 at the beginning of the scan of the first portion of the physical environment.
  • user interface object 526 is transformed into preview 568 of the three-dimensional model that is being generated based on the captured image and depth data.
• At the beginning of the scan, the data is limited, and the progress of scanning and model generation is illustrated by an expanding graphical indication (e.g., indication 570, or another graphical indication) within preview 568.
  • preview 568 has a three-dimensional shape that is typical of the physical environment (e.g., a cubic shape for a room, or a rectangular cuboid for a house).
  • the three-dimensional shape is modified (e.g., expanded, and/or adjusted) as the shape of the physical environment is explored and ascertained based on the captured image and/or depth data during the scan.
  • device 100 performs edge detection and surface detection (e.g., plane detection and/or detection of curved surfaces) in the first portion of the physical environment based on the captured image and/or depth data; and as edge(s) and surfaces are detected and characterized in the first portion of the physical environment, device 100 displays respective graphical representations of the detected edges and/or surfaces in user interface 522.
  • graphical object 571 (e.g., a line, and/or a linear graphical object) is displayed at a location that corresponds to a detected edge between wall 530 and floor 540;
  • graphical object 572 (e.g., a line, and/or a linear graphical object) is displayed at a location that corresponds to a detected edge between wall 530 and ceiling 538;
  • graphical object 574 (e.g., a line, and/or a linear graphical object) is displayed at a location that corresponds to a detected edge between wall 530 and wall 532;
  • graphical object 576 (e.g., a line, and/or a linear graphical object) is displayed at a location that corresponds to a detected edge between wall 532 and floor 540.
• the respective graphical representations of the detected edges are extended in length and/or thickness as additional portions of the detected edges are detected and/or ascertained based on the progress of the scan and model generation.
  • the positions of the respective graphical representations are adjusted (e.g., shift and/or dither) as the precise locations of the detected edges are adjusted based on the progress of scan and model generation.
  • graphical object 576 is extended in length along the edge between wall 532 and floor 540.
• In some embodiments, the visual characteristics (e.g., lengths, shapes, thicknesses, amount of feathering, luminance, translucency, opacity, and/or sharpness) of the respective graphical representations of the detected edges are updated in accordance with the progress of the scan and model generation. For example, as the scan of the first portion of the physical environment continues, the graphical objects (e.g., graphical object 571 and graphical object 574) change accordingly (e.g., extended in length, more details or more crisp in shape, reduced in thickness, reduced feathering on the boundaries, increasing opacity, increasing luminance, reduced translucency, and/or increasing sharpness).
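One plausible way to realize this coupling between scan progress and appearance is to interpolate each visual property from an "uncertain" value toward a "final" value as the predicted accuracy rises. The Swift sketch below assumes a simple linear ramp and invented property names and constants; it illustrates the idea rather than the claimed implementation.

```swift
import Foundation

// Illustrative mapping from a predicted-accuracy value (0...1) to the visual
// properties of an edge's graphical object. The specific properties and the
// linear ramp are assumptions for illustration.

struct EdgeStyle {
    var opacity: Double        // low while uncertain, high when confident
    var featherRadius: Double  // wide feathering while uncertain, none when confident
    var thickness: Double      // thick and fuzzy early, thin and crisp later
}

func lerp(_ a: Double, _ b: Double, _ t: Double) -> Double { a + (b - a) * t }

/// Produces the style for a detected edge given the current predicted accuracy
/// of its estimated spatial properties.
func edgeStyle(forPredictedAccuracy accuracy: Double) -> EdgeStyle {
    let t = min(max(accuracy, 0), 1)
    return EdgeStyle(
        opacity: lerp(0.35, 1.0, t),
        featherRadius: lerp(6.0, 0.0, t),
        thickness: lerp(4.0, 1.5, t)
    )
}

// Example: as the scan progresses, accuracy improves and the line sharpens.
for accuracy in [0.2, 0.6, 0.95] {
    let style = edgeStyle(forPredictedAccuracy: accuracy)
    print(String(format: "accuracy %.2f -> opacity %.2f, feather %.1f, thickness %.2f",
                 accuracy, style.opacity, style.featherRadius, style.thickness))
}
```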
• As the scan progresses, additional graphical objects (e.g., graphical object 578 and graphical object 580) are displayed at the respective locations of the detected edges and/or surfaces.
• In some embodiments, an overlay (e.g., a color overlay, and/or a texture overlay), other types of graphical objects (e.g., point cloud, wireframe, and/or texture), and/or visual effects (e.g., blur, change in saturation, change in opacity, and/or change in luminance) are displayed at the locations of the detected surfaces, and the area covered by the overlay, other types of graphical objects, and/or visual effects is expanded as the scan and model generation progress and more of the surfaces are detected and characterized. For example, an overlay, point cloud, wireframe, texture, and/or visual effect gradually expands across the detected surfaces corresponding to walls 530 and 532 in Figures 5E and 5F.
• In some embodiments, as the scan progresses, the visual properties (e.g., intensity, saturation, luminance, density, opacity, fill material type, and/or sharpness) of the overlay, point cloud, wireframe, texture, and/or visual effect applied to the locations of the detected surfaces also change accordingly (e.g., increase, decrease, or change in other manners).
• In addition to detecting edges and surfaces of structural elements (e.g., walls, ceiling, floor, windows, entryway, and/or doors), device 100 also detects non-structural elements (e.g., furniture, fixtures, physical objects, and/or other types of non-structural elements) at the same time during the scan.
  • device 100 displays graphical object 580 at the location of the detected cabinet 548 (e.g., including displaying segments 580-1, 580-2, 580-3, and 580-4 at the locations of the detected edges) to convey the spatial characteristics that have been estimated for the detected edges and/or surfaces of cabinet 548.
  • the degrees of progress and predicted accuracies for the spatial properties of edges, surfaces, and/or objects that are detected in different sub-portions of the first portion of the physical environment may be different.
  • the predicted accuracy for the spatial properties of the edge between wall 530 and floor 540 is greater than the predicted accuracy for the spatial properties of the edge between wall 532 and ceiling 538, and greater than the predicted accuracy for the spatial properties of the detected edges of cabinet 548.
• In some embodiments, different portions of a graphical object that are displayed for different portions of a detected physical feature, optionally, have different values for one or more visual properties (e.g., lengths, shapes, thicknesses, amount of feathering, luminance, translucency, opacity, and/or sharpness) at a given moment, where the values of the one or more visual properties are determined based on the respective predicted accuracies for the spatial properties of the different portions of the detected physical feature.
  • different portions of the graphical object 580 for different portions of the detected edges and/or surfaces of cabinet 548 have different values for one or more visual properties (e.g., thickness, sharpness, amount of feathering, and/or luminance) depending on the respective predicted accuracies of the spatial properties of the different portions of the detected edges and/or surfaces of the cabinet.
  • preview 568 of the three-dimensional model of room 520 is updated to show portions of wall 530, wall 532, and floor 540 that have been detected based on the scanned image and/or depth data.
• the spatial relationship between detected wall 530, wall 532, and floor 540 is shown in preview 568 by the spatial relationship between their corresponding representations 530”, 532”, and 540”.
• In some embodiments, a graphical object (e.g., overlay 570, or another graphical object) is displayed in preview 568 to indicate real-time progress of the scan and model generation (e.g., overlay 570 expands across the surfaces of the representations 530”, 532”, and 540” as the spatial properties of their corresponding physical features are estimated with better and better accuracy).
  • preview 568 includes a partially completed three-dimensional model of room 520, and the partially completed three-dimensional model of room 520 is oriented relative to the viewpoint of the cameras in accordance with the orientation of room 520 relative to the viewpoint of the camera(s).
• For example, the portion of the physical environment in the field of view of the cameras (e.g., the camera view of the physical environment, the augmented reality view of the physical environment, and/or the pass-through view of the physical environment) corresponds to the portion of the partially completed three-dimensional model that faces toward the viewpoint of the user.
• As the viewpoint of the user changes, camera view 524 and the orientation of the partially completed three-dimensional model in preview 568 are updated accordingly to reflect the movement of the viewpoint of the user.
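The relationship between the camera's facing direction and the orientation of the miniature model can be pictured as applying the opposite of the camera's yaw to the preview. A minimal sketch, assuming a yaw-only camera pose and invented helper names, follows; it is an illustration rather than the actual rendering logic.

```swift
import Foundation

// Minimal sketch: keep the partially completed miniature model in preview 568
// oriented to match the camera's current viewpoint by counter-rotating it
// around the vertical axis. Names and the yaw-only simplification are assumptions.

/// Yaw (rotation about the vertical axis) of a horizontal forward direction,
/// measured in radians from the +z axis.
func yaw(ofForward forwardX: Double, _ forwardZ: Double) -> Double {
    atan2(forwardX, forwardZ)
}

/// The preview model is rotated opposite to the camera so that the portion of
/// the model corresponding to what the cameras see faces toward the viewer.
func previewYaw(forCameraYaw cameraYaw: Double) -> Double {
    -cameraYaw
}

// Example: the cameras turn about 30 degrees to the right (from facing one wall
// toward the next); the miniature model turns about 30 degrees to the left.
let cameraYaw = yaw(ofForward: 0.5, 0.866)          // roughly +30 degrees
let modelYaw = previewYaw(forCameraYaw: cameraYaw)  // roughly -30 degrees
print(cameraYaw * 180 / .pi, modelYaw * 180 / .pi)
```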
• As the scan and model generation continue over time, more edges and/or surfaces are detected in the first portion of the physical environment, as shown in Figure 5H. Respective graphical objects are displayed at locations of the detected edges and/or surfaces to represent their spatial properties (e.g., a new surface corresponding to the front surface of cabinet 548 is detected, and/or a new surface corresponding to the left side surface of cabinet 548 is detected), and/or existing graphical objects are expanded and/or extended along newly detected portions of previously detected edges and/or surfaces (e.g., graphical object 582 is extended along newly detected edges of window 542).
• As the scan and model generation continue, detection and characterization of one or more edges and/or surfaces of one or more structural elements and non-structural elements of room 520 are completed. As illustrated in Figure 5H, in response to detecting that detection and characterization of the edge between wall 530 and floor 540 are completed (e.g., in accordance with a determination that the predicted accuracy of one or more spatial properties of the edge is above a completion threshold, and/or in accordance with a determination that an entire extent of the edge has been detected), a final state of graphical object 571 is displayed.
• the final state of a graphical object that is displayed in response to detecting completion of the detection and characterization of its corresponding edge or surface in the physical environment has a set of predetermined values for one or more visual properties (e.g., shape, thickness, amount of feathering, luminance, translucency, opacity, and/or sharpness) of the graphical object.
• For example, before completion, graphical object 571 is, optionally, a line that is broken in places, has a higher luminance, has a higher degree of feathering along its boundaries, and/or is semitransparent; and in response to detecting the completion of the detection and characterization of the edge between wall 530 and floor 540, the final state of graphical object 571 is displayed, which is, optionally, a solid line without broken pieces, has a lower luminance, has no feathering or a reduced degree of feathering along its boundaries, and/or is opaque.
• Similarly, before completion, graphical object 580 is, optionally, multiple broken or dashed lines, has multiple levels of luminance along different edges and/or different portions of the same edge, has multiple degrees of feathering along the boundaries of different edges and/or different portions of the same edge, and/or has different levels of translucency along different edges and/or different portions of the same edge; and in response to detecting the completion of the detection and characterization of the edges of cabinet 548, the final state of graphical object 580 is displayed, which is, optionally, a set of solid lines (e.g., a two-dimensional bounding box, a three-dimensional bounding box, or other types of outlines), has a uniform and lower luminance, has no feathering or a reduced degree of feathering along all edges, and/or is uniformly opaque.
• Similarly, for a detected surface, an overlay or other type of graphical object (e.g., wireframe, point cloud, and/or texture) is displayed while the scan is in progress; and in response to detecting the completion of the detection and characterization of the surface, the final state of the graphical object is displayed, which is, optionally, of uniform luminance, is of a continuous shape, has a stable appearance with no flickering, and/or is more opaque.
  • completion of detection and characterization of a surface is visually indicated by an animation (e.g., a sudden increase of luminance followed by a decrease of luminance of the overlay and/or graphical object displayed at the location of the surface) and/or a quick change in a set of visual properties of the overlay and/or graphical object displayed at the location of the detected surface.
• completion of detection and characterization of an edge is visually indicated by a change from a line with varying visual characteristics (e.g., varying luminance, varying thickness, varying lengths, varying amount of feathering, and/or varying levels of sharpness) (e.g., based on changes and variations of predicted accuracies of the estimated spatial properties of the detected edge or different portions of the detected edge) to a line that is stable, uniform, and solid, has a preset luminance, thickness, and/or sharpness, and/or has no feathering.
  • completion of detecting and characterizing the edge between wall 530 and floor 540 is indicated by display of animation and/or visual effect 584 that is different from the changes in the appearance of graphical object 571 that were displayed in accordance with progress of the scan at or near the wall 530, floor 540, and the edge therebetween and/or in accordance with the changes in the predicted accuracies of the estimated spatial properties of the detected edge.
  • the speed by which graphical object 571 is extended along the detected edge is based on the predicted accuracy of the estimated spatial properties of the detected edge, e.g., graphical object 571 extends along the edge with a slower speed initially, and graphical object 571 extends with a faster speed as the scan progresses and the predicted accuracy of the estimated spatial properties of the detected edge improves over time.
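Read literally, this means the animation speed of the growing line is itself a function of the current predicted accuracy. The snippet below sketches one monotone speed curve; the particular curve and constants are assumptions for illustration, not the claimed behavior.

```swift
// Illustrative speed curve for extending a graphical edge (such as graphical
// object 571) along a detected edge: slow while the estimate is uncertain,
// faster as the predicted accuracy improves. Constants are assumptions.

/// Extension speed in points per second for a given predicted accuracy (0...1).
func extensionSpeed(forPredictedAccuracy accuracy: Double,
                    minSpeed: Double = 20,
                    maxSpeed: Double = 200) -> Double {
    let t = min(max(accuracy, 0), 1)
    // Ease-in curve: speed grows slowly at first, then ramps up.
    return minSpeed + (maxSpeed - minSpeed) * t * t
}

// Example: how far the line advances in one 1/60 s frame at different accuracies.
for accuracy in [0.1, 0.5, 0.9] {
    let perFrame = extensionSpeed(forPredictedAccuracy: accuracy) / 60
    print("accuracy \(accuracy): advance \(perFrame) points per frame")
}
```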
  • completion of detecting and characterizing the surfaces of cabinet 548 is indicated by display of animation and/or visual effect 586 (e.g., animations and/or visual effects 586-1, 586-2, and 586-3 shown on different surfaces of cabinet 548) that is different from the expansion and changes in the appearance of the overlay on the cabinet 548 that were displayed in accordance with progress of the scan of cabinet 548 and/or in accordance with the changes in the predicted accuracies of the estimated spatial properties of the detected surfaces and edges of cabinet 548.
• In some embodiments, the scanning progress indication for a detected surface (e.g., an overlay and/or visual effect) is displayed with an enhanced visual property (e.g., higher luminance, higher opacity, and/or higher color saturation), and completion of the scan and modeling of the detected surface is visually indicated by an animated change that shows an accelerated enhancement of the visual property followed by a decrease of the enhancement (e.g., an increase in luminance followed by a decrease in luminance, an increase in opacity followed by a decrease in opacity, and/or an increase in color saturation followed by a decrease in color saturation).
• Similarly, the scanning progress indication for a detected edge (e.g., a linear graphical object, and/or a bounding box) is displayed with an enhanced visual property (e.g., higher luminance, higher opacity, and/or higher color saturation), and completion of the scan and modeling of the detected edge is visually indicated by an animated change that shows an accelerated enhancement of the visual property followed by a decrease of the enhancement (e.g., an increase in luminance followed by a decrease in luminance, an increase in opacity followed by a decrease in opacity, and/or an increase in color saturation followed by a decrease in color saturation).
• As the corner region of room 520 is scanned, the predicted accuracies of the spatial properties of the three edges that meet at the corner are improved; and consequently, an amount of feathering and/or other visual effect that is applied to the graphical objects displayed at the locations of the detected edges (e.g., graphical objects 572, 574, and 578 in Figure 5H) to indicate the predicted accuracies of the detected edges (e.g., animated flickering, and/or shifting of textures) is reduced (e.g., as shown by visual effect 588 in Figure 5H).
• the predicted accuracies of three detected edges meet a preset threshold accuracy; and if the detected edges do not intersect at the same corner, the predicted accuracies of the detected edges will be reduced.
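The consistency condition described here can be sketched as a geometric check: the three detected edges should terminate near a single shared corner point, and a large disagreement lowers their accuracies. The Swift code below is a simplified illustration using estimated corner endpoints and an invented tolerance; it is not the actual reconciliation algorithm.

```swift
// Simplified consistency check for three detected edges that should meet at a
// common corner: if their estimated corner endpoints do not coincide within a
// tolerance, their predicted accuracies are reduced. Names and the tolerance
// value are illustrative assumptions.

struct Point3 { var x: Double; var y: Double; var z: Double }

func distance(_ a: Point3, _ b: Point3) -> Double {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

struct DetectedEdge {
    var cornerEstimate: Point3    // where this edge is estimated to terminate
    var predictedAccuracy: Double // 0...1
}

/// Checks whether three edges agree on a shared corner. If not, their
/// accuracies are scaled down in proportion to the disagreement.
func reconcileCorner(_ edges: inout [DetectedEdge], tolerance: Double = 0.05) {
    precondition(edges.count == 3, "a corner is formed by exactly three edges")
    var worstMiss = 0.0
    for i in 0..<edges.count {
        for j in (i + 1)..<edges.count {
            worstMiss = max(worstMiss, distance(edges[i].cornerEstimate, edges[j].cornerEstimate))
        }
    }
    guard worstMiss > tolerance else { return }   // edges agree: nothing to do
    let penalty = tolerance / worstMiss           // less than 1 when edges disagree
    for i in 0..<edges.count {
        edges[i].predictedAccuracy *= penalty
    }
}

var edges = [
    DetectedEdge(cornerEstimate: Point3(x: 0.00, y: 2.40, z: 0.00), predictedAccuracy: 0.9),
    DetectedEdge(cornerEstimate: Point3(x: 0.02, y: 2.41, z: 0.01), predictedAccuracy: 0.8),
    DetectedEdge(cornerEstimate: Point3(x: 0.30, y: 2.40, z: 0.00), predictedAccuracy: 0.7), // off by 30 cm
]
reconcileCorner(&edges)
print(edges.map { $0.predictedAccuracy }) // the disagreement lowers all three accuracies
```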
  • the edge between wall 530 and floor 540 is partially behind cabinet 548, and optionally, graphical object 571 is extended along the predicted location of the edge behind cabinet 548 based on the imaging and depth data captured of wall 530 and floor 540.
  • the portion of graphical object 571 that is supposedly behind cabinet 548 is optionally displayed with reduced visual prominence as compared to other portions of graphical object 571 that is displayed along an unobscured portion of the edge.
• In some embodiments, the reduced visual prominence (e.g., reduced luminance, reduced opacity, increased feathering, and/or reduced sharpness) corresponds to a reduced predicted accuracy of the spatial properties of the portion of the edge behind cabinet 548.
  • graphical object 580 that is displayed along the detected edges and/or surfaces of cabinet 548 gradually forms a three-dimensional bounding box around the view of cabinet 548 in user interface 522.
  • the spatial characteristics (e.g., size, length, height, thickness, spatial extent, dimensions, and/or shape) of graphical object 580 correspond to the spatial characteristics (e.g., size, length, height, thickness, spatial extent, dimensions, and/or shape) of cabinet 548.
• object 548” is a three-dimensional object that is simplified relative to cabinet 548 (e.g., detailed surface textures and decorative patterns on the surface of cabinet 548 are not represented in object 548”).
  • the surface and edges of wall 530, the surface and edges of wall 532, and the surface and edges of floor 540 are also represented by their corresponding representations 530”, 532”, and 540” in preview 568.
  • the orientation of the partially completed model of room 520 in preview 568 and the camera view 524 of the first portion of room 520 in user interface 522 correspond to the same viewpoint of the user (e.g., the viewpoint represented by the position and facing direction of object 566 in top view 564 of room 520).
• After detection and modeling of the edges and surfaces in the first portion of the physical environment have been completed (e.g., including at least three edges and surface of wall 530, edges and surface of window 542, and edges and surfaces of cabinet 548), device 100, optionally, displays a prompt that guides the user to continue to move the one or more cameras to scan a new portion of the environment.
  • the user After scanning the first portion of room 520, the user turns the cameras to face a second portion of room 520 adjacent to the first portion of room 520.
• the current viewpoint of the user is indicated by the position and facing direction of object 566 in top view 564 of room 520 in Figure 5I.
  • camera view 524 of the physical environment included in user interface 522 is updated to include the second portion of room 520, including wall 532 and furniture and physical objects in front of wall 532 (e.g., stool 546, TV stand 550, TV 560, and floor lamp 556).
• For example, window 542 has been shifted out of the current field of view of the cameras, and cabinet 548 has been shifted to the left side of the field of view of the cameras.
  • graphical object 576 for the edge between wall 532 and floor 540 is extended along the edge between wall 532 and floor 540 based on newly captured image and depth data from the second portion of the physical environment.
• In some embodiments, the earlier displayed portion of graphical object 576 (e.g., the left portion) is optionally displayed with less visual enhancement (e.g., lower luminance, lower color saturation, and/or less opacity) but more definiteness (e.g., more stable, more solid, more sharpness, less flickering, and/or less feathering) to indicate a greater predicted accuracy for the spatial characteristics of the left portion of the edge between wall 532 and floor 540; while the later displayed portion of graphical object 576 (e.g., the right portion) is optionally displayed with more visual enhancement (e.g., greater luminance, greater color saturation, and/or greater opacity) but less definiteness (e.g., more patchy, more broken, less sharpness, more flickering, and/or more feathering) to indicate a lower predicted accuracy for the spatial characteristics of the right portion of the edge between wall 532 and floor 540.
  • graphical object 590 is displayed at a location of stool 546 to indicate an outline of stool 546
  • graphical object 592 is displayed at a location of TV 560 to indicate edges and surface of TV 560
  • graphical object 594 is displayed at a location of TV stand 550 to indicate an outline of TV stand 550.
  • graphical objects 590, 592, 594 are displayed with different values or sets of values for one or more visual properties (e.g., luminance, thickness, texture, feathering, blur, sharpness, density, and/or opacity) in accordance with respective predicted accuracies of the estimated spatial properties of the edges and surfaces of stool 546, TV 560, and TV stand 550.
  • the appearances of graphical objects 590, 592, 594 are continuously updated (e.g., expanded and/or updated in values for the one or more visual properties) in accordance with detection of new portions of the edges and surfaces of stool 546, TV 560, and TV stand 550 and in accordance with updates to the respective predicted accuracies of the estimated spatial properties of the edges and surfaces of stool 546, TV 560, and TV stand 550.
  • preview 568 is also updated to show the partially completed model with a different orientation that corresponds to the current viewpoint of the user.
  • the partially completed model of room 520 is rotated around a vertical axis to the left by a first angular amount, in response to a rotation of the camera’s field of view around a vertical axis to the right by the first angular amount.
• For example, object 548” that represents cabinet 548, representation 530” for wall 530, and representation 542” (e.g., a hollowed out area, a transparent area, or another type of representation) for window 542 are rotated to the left side of preview 568, while cabinet 548 and wall 530 are shifted to the left side of the camera view 524 in user interface 522.
  • representation 530” of wall 530, representation 532” of wall 532, and representation 540” of floor 540 in the partially completed three-dimensional model of room 520 displayed in preview 568 are expanded as more image and depth data of wall 530, wall 532, and floor 540 are captured by the one or more cameras and processed by device 100.
• As shown in Figure 5I, after cabinet 548 is identified (e.g., recognized to be of a known type of object, recognized to have a respective label or name, recognized to belong to a known group, and/or can otherwise be identified with a label, icon, or another similar representation) (e.g., based on the scanned data, and/or the spatial characteristics of the cabinet), the previously displayed graphical object 580 at the location of cabinet 548 is gradually replaced by another representation 596 of cabinet 548 (e.g., a label, an icon, an avatar, a textual object, and/or a graphical object) that does not spatially indicate the one or more spatial characteristics (e.g., size, length, height, thickness, dimensions, and/or shape) of cabinet 548.
  • graphical object 580 spatially indicates the one or more spatial characteristics of cabinet 548 (e.g., the size, length, height, thickness, dimensions, and/or shape of graphical object 580 corresponds to the size, length, height, thickness, dimensions, and/or shape of cabinet 548, and/or graphical object 580 is a bounding box or outline of cabinet 548).
  • Graphical object 580 is gradually faded out from the location of cabinet 548, when another representation 596 is displayed at the location of cabinet 548.
• the spatial characteristics of representation 596 are independent of the spatial characteristics of cabinet 548 (e.g., the size, length, height, thickness, dimensions, and/or shape of representation 596 of cabinet 548 do not correspond to the size, length, height, thickness, dimensions, and/or shape of cabinet 548).
  • representation 596 is smaller (e.g., occupies less area, and/or has a smaller spatial extent) than graphical object 580.
  • representation 596 indicates a type of object that has been identified (e.g., representation 596 includes a name of cabinet 548, a model number of cabinet 548, a type of furniture that cabinet 548 is, a brand name of cabinet 548, and/or an owner or maker of cabinet 548).
  • representation 596 is an icon or image that indicates the object type of cabinet 548.
• In some embodiments, after representation 596 is displayed, graphical object 580 is no longer displayed (e.g., as shown in Figure 5J).
• In some embodiments, graphical object 580 is, instead, displayed in a translucent and/or dimmed state, or another state with reduced visual prominence.
• the spatial relationship between graphical object 580 and cabinet 548 is fixed after scanning and modeling of cabinet 548 is completed, regardless of the orientation of cabinet 548 relative to the current viewpoint of the user (e.g., when the viewpoint changes, graphical object 580 and cabinet 548 move and turn in the same manner in the camera view 524).
  • the spatial relationship between representation 596 and cabinet 548 is not fixed and may change depending on the current viewpoint of the user (e.g., when the viewpoint changes, representation 596 and cabinet 548 may translate together (e.g., representation 596 is attached to a detected front surface of cabinet 548), but representation 596 will turn to face toward the current viewpoint irrespective of the facing direction of cabinet 548 relative to the viewpoint).
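The contrast between the fixed bounding box and the label that always turns toward the viewer corresponds to what rendering code commonly calls billboarding. The sketch below shows the yaw computation for a label anchored at a point on the object's front surface; the names, coordinates, and the two-dimensional simplification are assumptions for illustration, not the actual rendering code.

```swift
import Foundation

// Illustrative billboarding: a non-spatial label (such as representation 596 or
// 612) stays anchored at a point on the object but yaws to face the viewer,
// while the spatial bounding box keeps the object's own orientation.

struct FloorPosition { var x: Double; var z: Double } // positions on the floor plane

/// Yaw (about the vertical axis) that turns the label at `anchor` so its front
/// faces the viewer at `viewpoint`, measured from the +z axis.
func billboardYaw(anchor: FloorPosition, viewpoint: FloorPosition) -> Double {
    atan2(viewpoint.x - anchor.x, viewpoint.z - anchor.z)
}

// Example: the label is attached to the front surface of a piece of furniture;
// as the user walks to a different position, only the label's yaw is recomputed.
let labelAnchor = FloorPosition(x: 1.0, z: 4.0)
for userPosition in [FloorPosition(x: 1.0, z: 1.0), FloorPosition(x: 3.0, z: 2.0)] {
    let yawDegrees = billboardYaw(anchor: labelAnchor, viewpoint: userPosition) * 180 / .pi
    print(String(format: "viewer at (%.1f, %.1f) -> label yaw %.1f degrees",
                 userPosition.x, userPosition.z, yawDegrees))
}
```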
• In Figure 5J, as the scan of the second portion of the physical environment continues, graphical object 580 ceases to be displayed at the location of cabinet 548 in camera view 524, and representation 596 remains displayed at the location of cabinet 548 (e.g., representation 596 is attached to the front surface of cabinet 548 and is turned to face toward the viewpoint of the user).
  • graphical objects corresponding to the newly detected edges and/or surfaces are displayed at the respective locations of these newly detected edges and/or surfaces in camera view 524 (e.g., graphical object 598 is displayed at the location of floor lamp 556).
  • one or more display properties of the graphical objects corresponding to the detected edges and/or surfaces are updated according to the changes in the predicted accuracies of the spatial properties of their corresponding edges and surfaces (e.g., the display properties of graphical object 590 corresponding to stool 546, of graphical object 594 corresponding to TV stand 550, and of graphical object 576 for the edge between wall 532 and floor 540, are updated based on the changes in the predicted accuracies of the spatial characteristics of their corresponding structural and/or nonstructural elements).
• When detection and characterization of an edge and/or surface are completed, a final state of the graphical object representing the edge and/or surface is displayed (e.g., the final state of graphical object 592 for TV 560 is displayed), and optionally, an animated change in the appearance of the graphical object is displayed to indicate the completion of the scan and modeling of the edge and/or surface (e.g., visual effect 598 is displayed for the completion of the scan of the edge between wall 532 and floor 540, and visual effect 600 is displayed for the completion of the scan of the surface of TV 560).
  • device 100 determines that an unscanned portion of room 520 exists between the first portion of the physical environment that has been modeled and the second portion of the physical environment that has been modeled. In some embodiments, device 100 determines that an unscanned portion of the physical environment exists between two scanned portions of the physical environment based on a determination that the models of the two scanned portions of the physical environment cannot be joined together satisfactorily.
• For example, when the first portion of room 520 is being scanned (e.g., as shown in Figures 5F-5H), cabinet 548 is in a position that blocks a portion of wall 530 from being captured by the cameras; and when the viewpoint changes and the second portion of the room is being scanned, cabinet 548 still blocks the view of the missed portion of wall 530, and the missed portion of wall 530 is almost completely moved out of the field of view of the cameras when the second portion of room 520 is in the field of view of the cameras.
  • the missed portion of wall 530 that has not been scanned refers to the portion of wall 530 that includes entryway 544 which is visually obscured by cabinet 548 from certain viewing angles, and not the portion of wall 530 that is directly behind the back surface of cabinet 548 which would not be visible from any viewing angle.
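One way to picture how the device could conclude that a spot was missed is as a coverage check along the wall: the captured spans are compared against the wall's full extent and any sufficiently wide uncovered interval (here, the region around entryway 544) is flagged. The Swift sketch below uses one-dimensional intervals and invented names and thresholds purely for illustration.

```swift
// Simplified coverage check along one wall: given the spans of the wall (in
// meters along its length) that were actually captured, find uncovered gaps
// wide enough to be worth rescanning. Names and thresholds are illustrative.

struct Span { var start: Double; var end: Double }

/// Returns the gaps in coverage over `wallLength`, ignoring gaps narrower than
/// `minimumGapWidth` (tiny slivers are not worth prompting the user about).
func uncoveredSpans(wallLength: Double, scanned: [Span], minimumGapWidth: Double = 0.3) -> [Span] {
    let sorted = scanned.sorted { $0.start < $1.start }
    var gaps: [Span] = []
    var cursor = 0.0
    for span in sorted {
        if span.start - cursor >= minimumGapWidth {
            gaps.append(Span(start: cursor, end: span.start))
        }
        cursor = max(cursor, span.end)
    }
    if wallLength - cursor >= minimumGapWidth {
        gaps.append(Span(start: cursor, end: wallLength))
    }
    return gaps
}

// Example: a 5 m wall scanned on both sides of a cabinet that hid part of it.
let gaps = uncoveredSpans(wallLength: 5.0,
                          scanned: [Span(start: 0.0, end: 1.8), Span(start: 3.1, end: 5.0)])
for gap in gaps {
    print("missed portion from \(gap.start) m to \(gap.end) m") // 1.8 m to 3.1 m
}
```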
  • Device 100 determines, e.g., based on the above information, that the user may have presumed that scan and modeling of the first wall 530 of the physical environment has been completed and that the user has moved on to scan the second portion of the physical environment. Based on the above determination, device 100 displays a prompt (e.g., banner 602, and/or another alert or notification) for the user to scan a missed spot in the presumably completed portion of the physical environment.
  • the prompt is updated to provide more detailed and up-to-date guidance about how the user may move to scan the missed portion of the presumably completed portion of the physical environment (e.g., updated banner that reads “move forward,” “move left,” “turn to face the camera to the left,” and/or other appropriate instructions).
• In addition to the prompt, device 100 also displays one or more visual guides to help the user to find the location of the missed portion of the already scanned portion of the physical environment.
  • a visual indication (e.g., arrow 604, and/or another type of visual indication or graphical object) is displayed to indicate the location of the missed portion of wall 530 hidden behind cabinet 548 (e.g., arrow 604 points toward the location of the missed portion of wall 530 that is behind cabinet 548 from the current viewing angle).
  • the visual indication is an animated object (e.g., animated arrow, and/or animated icon), and the animation (e.g., movement direction of the animated object, and/or movement pattern of the animated object) indicates the location of the missed portion of wall 530 that is behind cabinet 548 as viewed from the current viewing angle.
  • device 100 displays the visual indication at a location that is on the side of the camera view that is closest to the missed portion of wall 530.
  • the visual indication is optionally updated depending on the relative spatial positions of the missed portion of wall 530 and the currently displayed portion of the physical environment.
  • the visual indication is displayed at a visual depth that corresponds to the missed portion of the presumably completed portion of the physical environment (e.g., arrow 604 is displayed at a depth corresponding to the depth of the missed portion of wall 530 hidden behind cabinet 548 from the current viewing angle).
  • device 100 further displays a visual indication (e.g., dot 606 or another type of visual indication) at a location in the camera view that corresponds to a location from where the missed portion of wall 530 can be captured by the cameras.
  • dot 606 is displayed overlaying camera view 524 at a location on floor 540 to indicate that if the user were to stand close to stool 546 and point the cameras in the direction indicated by arrow 604, image and depth data for the missed portion of wall 530 would be captured.
  • the visual indication is an animated object (e.g., a bouncing ball, another type of animated object or visual effect).
  • the visual indication is displayed at a visual depth that corresponds to the location from which the missed portion of the presumably completed portion of the physical environment can be scanned (e.g., dot 606 is displayed at a depth corresponding to the depth of the location from which the missed portion of wall 530 behind cabinet 548 can be scanned).
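The suggested standing spot marked by dot 606 can be thought of as a point offset from the missed portion of the wall along the wall's into-the-room normal, far enough back that the cameras can see past the occluding cabinet. The sketch below computes such a point on the floor plane; the offset distance, coordinates, and all names are assumptions for illustration only.

```swift
// Illustrative computation of a capture position (the spot marked by dot 606):
// stand a fixed distance away from the missed portion of the wall, along the
// wall's into-the-room normal, so the cameras can see past the occluder.
// Names, the 2D simplification, and the standoff distance are assumptions.

struct FloorPoint { var x: Double; var z: Double }

/// Midpoint of the missed wall segment, pushed `standoff` meters into the room
/// along the wall's into-the-room normal.
func suggestedCapturePosition(missedSegmentStart a: FloorPoint,
                              missedSegmentEnd b: FloorPoint,
                              intoRoomNormal n: FloorPoint,
                              standoff: Double = 1.5) -> FloorPoint {
    let mid = FloorPoint(x: (a.x + b.x) / 2, z: (a.z + b.z) / 2)
    let len = (n.x * n.x + n.z * n.z).squareRoot()
    return FloorPoint(x: mid.x + standoff * n.x / len,
                      z: mid.z + standoff * n.z / len)
}

// Example: the missed portion of the wall runs from x = 1.8 m to x = 3.1 m along
// a wall at z = 0, and the room lies in the +z direction.
let spot = suggestedCapturePosition(missedSegmentStart: FloorPoint(x: 1.8, z: 0),
                                    missedSegmentEnd: FloorPoint(x: 3.1, z: 0),
                                    intoRoomNormal: FloorPoint(x: 0, z: 1))
print("stand near (\(spot.x), \(spot.z)) and face the wall") // (2.45, 1.5)
```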
  • a visual indication that indicates the location of the missed portion of wall 530 is displayed in preview 568 of the three-dimensional model of room 520.
  • arrow 608 is displayed in the partially completed model of room 520 at a location next to representation 548” for cabinet 548 and pointing toward a portion of representation 530” for wall 530 that has not been scanned and modeled (e.g., the unscanned portion of wall 530 is shown as a flat portion, irrespective of what structural and/or nonstructural elements exist in the unscanned portion of wall 530 and the space in front of it).
  • the appearance of visual indication 608 corresponds to the appearance of visual indication 604.
  • the appearance of visual indication 608 is different from visual indication 604, where the respective appearances of visual indication 608 and visual indication 604 are, optionally, tailored to their respective surrounding environments to enhance visibility of the visual indications.
• In some embodiments, a visual indication that indicates the location at which a user can place the cameras to capture the missed portion of wall 530 is displayed in preview 568 of the three-dimensional model of room 520.
  • dot 610 is displayed in the partially completed model of room 520 at a location on the representation 540” of floor 540, next to representation 548” for cabinet 548.
  • the appearance of visual indication 610 corresponds to the appearance of visual indication 606.
  • the appearance of visual indication 610 is different from visual indication 606, where the respective appearances of visual indication 610 and visual indication 606 are, optionally, tailored to their respective surrounding environments to enhance visibility of the visual indications.
  • visual indication 608 and/or visual indication 610 are animated.
  • visual indication 608 and/or visual indication 610 are stationary relative to preview 568 of the three-dimensional model of room 520.
  • representations of newly detected objects are added to the partially completed three-dimensional model of room 520 in preview 568.
• For example, representation 560” for TV 560 is added to a location in the partially completed three-dimensional model that corresponds to the location of TV 560 in the physical environment.
• In some embodiments, until detection and characterization of a nonstructural element are completed, a representation of the nonstructural element is not added into the partially completed model in preview 568 (e.g., representations for stool 546, TV stand 550, and floor lamp 556 are not added to the model yet).
  • Figures 5K-5P illustrate interaction with the partially completed three- dimensional model of room 520 in preview 568, while the scan of the second portion of the physical environment is ongoing and progressing. For example, during the scan of the second portion of the physical environment, more objects are identified and their corresponding spatial representations (e.g., bounding boxes, or other graphical objects that spatially indicate the spatial dimensions of the objects) are replaced by their corresponding nonspatial representations (e.g., icons, labels, and/or other graphical objects that do not spatially indicate the spatial dimensions of the objects).
• As another example, spatial characteristics and/or predicted accuracies of spatial characteristics of one or more edges and/or surfaces have changed, and the spatial characteristics and the visual properties of their spatial representations have been updated accordingly.
• As detection and modeling of additional edges and/or surfaces are completed, corresponding visual effects are displayed to indicate the completion of the detection and modeling of these edges and/or surfaces.
• When scanning and modeling of an object is completed, its corresponding representation (e.g., a three-dimensional representation, or a two-dimensional representation) is added to the partially completed three-dimensional model of room 520 in preview 568.
  • graphical object 590 of stool 546 is updated to its final state that spatially represents the spatial characteristics of stool 546 (e.g., graphical object 590 is displayed as a bounding box, or another shape that represents the spatial extent of stool 546), and a corresponding three-dimensional representation 546” of stool 546 (e.g., a cylinder that represents the shape and spatial extent of stool 546) is added to the partially completed model of room 520 at a location left of representation 560” for TV 560.
  • graphical object 594 of TV stand 550 is updated to its final state that spatially represents the spatial characteristics of TV stand 550 (e.g., graphical object 594 is displayed as a bounding box, or another shape that represents the spatial extent of TV stand 550), and a corresponding three-dimensional representation 550” of TV stand 550 (e.g., a cuboid that represents the shape and spatial extent of TV stand 550) is added to the partially completed model of room 520 at a location below representation 560” for TV 560.
  • graphical object 592 spatially indicates the one or more spatial characteristics of TV 560 (e.g., the size, length, height, thickness, dimensions, and/or shape of graphical object 592 corresponds to the size, length, height, thickness, dimensions, and/or shape of TV 560, and/or graphical object 592 is a bounding box or outline of TV 560).
  • Graphical object 592 gradually fades out from the location of TV 560 when another representation 612 is displayed at the location of TV 560.
  • the spatial characteristics of representation 612 are independent of the spatial characteristics of TV 560 (e.g., the size, length, height, thickness, dimensions, and/or shape of graphical object 612 do not correspond to the size, length, height, thickness, dimensions, and/or shape of TV 560).
  • representation 612 is smaller (e.g., occupies less area, and/or has a smaller spatial extent) than graphical object 592 and smaller than TV 560.
  • representation 612 indicates a type of object that has been identified (e.g., representation 612 includes a name of TV 560, a model number of TV 560, a type of appliance that TV 560 is, a brand name of TV 560, and/or an owner or maker of TV 560).
  • representation 612 is an icon or image that indicates the object type of TV 560.
  • graphical object 592 is no longer displayed (e.g., as shown in Figure 5L).
  • graphical object 592 is displayed in a translucent and/or dimmed state, or another state with reduced visual prominence.
  • the spatial relationship between graphical object 592 and TV 560 is fixed after scanning and modeling of TV 560 is completed, regardless of the orientation of TV 560 relative to the current viewpoint of the user (e.g., when the viewpoint changes, graphical object 592 and TV 560 move and turn in the same manner in the camera view 524).
  • the spatial relationship between representation 612 and TV 560 is not fixed and may change depending on the current viewpoint of the user (e.g., when the viewpoint changes, representation 612 and TV 560 may translate together (e.g., representation 612 is attached to a detected front surface of TV 560), but representation 612 will turn to face toward the current viewpoint irrespective of the facing direction of TV 560).
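  • The contrast between a spatial representation that stays fixed to its object and a non-spatial representation that always turns toward the viewpoint is essentially yaw-only billboarding. The Swift sketch below shows one way to compute such a yaw under the assumption of a simplified top-down coordinate frame; the names are illustrative, not from this disclosure.

```swift
import Foundation

/// Yaw (rotation about the vertical axis) that turns a label at `labelPosition`
/// so its front faces a camera at `viewpoint`. Top-down coordinates; this is an
/// illustrative sketch, not the disclosed implementation.
func billboardYaw(labelPosition: (x: Double, z: Double),
                  viewpoint: (x: Double, z: Double)) -> Double {
    return atan2(viewpoint.x - labelPosition.x, viewpoint.z - labelPosition.z)
}

// The spatial representation (e.g., a bounding box) keeps the object's own yaw,
// while the non-spatial representation re-billboards whenever the viewpoint moves.
let objectYaw = 0.0                                            // bounding box stays put
let labelYaw1 = billboardYaw(labelPosition: (x: 0, z: 0), viewpoint: (x: 1, z: 1))
let labelYaw2 = billboardYaw(labelPosition: (x: 0, z: 0), viewpoint: (x: -1, z: 1))
print(objectYaw, labelYaw1, labelYaw2)                         // label yaw follows the camera
```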
  • the non-spatial representation of a large chair and the non-spatial representation of a small chair are optionally the same (e.g., both are a label with a stylized chair icon, or a textual label “chair”), even though their spatial representations are different (e.g., one is a bigger bounding box and the other is a small bounding box, or one is a large cylinder for a big round chair, and one is a small cube for a small desk chair).
  • non-spatial representations of different smart home devices (e.g., a smart speaker, a smart home device, and/or a smart lamp) optionally differ in visual properties other than spatial properties (e.g., visual properties such as colors and/or textual or graphical content).
  • the non-spatial representation 596 of cabinet 548 and the non-spatial representation 612 of TV 560 are respectively displayed at locations of their corresponding objects, but both are turned to face toward the current viewpoint of the user.
  • the positions and perspectives of cabinet 548 and TV 560 would change in camera view 524 according to the movement of the viewpoint (e.g., non-spatial representation 596 of cabinet 548 would translate with the front surface of cabinet 548 while turning to continue to face toward the viewpoint, and non-spatial representation 612 of TV 560 would translate with the front surface of TV 560 while turning to continue to face toward the viewpoint (e.g., optionally turning by a different amount and/or toward a different direction from the amount and/or direction executed by the non-spatial representation 596)).
  • In Figure 5L, as the scan and modeling of the second portion of the physical environment continue, the scan and modeling of floor lamp 556 are completed, and a final state of graphical object 598 is displayed to indicate the spatial characteristics of floor lamp 556.
  • representation 556” of floor lamp 556 is added to the partially completed model of room 520 in preview 568 at a position to the right of representation 550” of TV stand 550.
  • a non-spatial representation 614 of stool 546 (e.g., a label, an icon, an avatar, a textual object, and/or a graphical object) is displayed at the location of stool 546 in camera view 524 and indicates an identity of stool 546 (e.g., object type, model number, name, owner, maker, and/or textual description).
  • the spatial representation 590 of stool 546 ceases to be displayed or is reduced in visual prominence (e.g., displayed with less luminance and/or color saturation, and/or more translucency).
  • the spatial representation of the object remains displayed without being replaced by a non-spatial representation (e.g., the spatial representation 594 of TV stand 550 remains displayed and is not replaced with a corresponding non-spatial representation because the TV stand 550 has not been identified by device 100).
  • the spatial representation of the object fades out after the period of time even if no non-spatial representation replaces it.
  • the spatial representation 590 of stool 546 is replaced by the non-spatial representation 614 of stool 546 and ceases to be displayed in camera view 524.
  • non-spatial representation 616 of floor lamp 556 is displayed at a location of the floor lamp 556 in camera view 524 facing toward the viewpoint.
  • Non-spatial representation 616 identifies floor lamp 556 (e.g., identifies the name, object type, owner, group, maker, and/or model number of floor lamp 556).
  • the spatial representation 598 of floor lamp 556 is reduced in visual prominence or ceases to be displayed when the non-spatial representation 616 of floor lamp 556 is displayed at the location of floor lamp 556 in camera view 524.
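  • One way to summarize the behavior above is as a small display-state transition for each scanned object: a spatial representation while the object is being characterized, replaced by a non-spatial label only if the object is identified, otherwise kept (or faded after a timeout). The Swift sketch below is an illustrative assumption about how such a transition could be expressed, not the disclosed implementation.

```swift
import Foundation

/// Display state of a scanned object's overlay in the camera view: a spatial
/// representation appears while edges/surfaces are characterized, and is
/// replaced by a non-spatial label only if the object is identified.
/// Enum and function names are illustrative assumptions.
enum ObjectOverlay {
    case spatial(boundingBoxVisible: Bool)       // e.g., graphical objects 590, 594, 598
    case nonSpatial(label: String)               // e.g., representations 614, 616
}

func updatedOverlay(current: ObjectOverlay,
                    identifiedLabel: String?,
                    fadeUnidentifiedAfterTimeout: Bool) -> ObjectOverlay {
    if let label = identifiedLabel {
        return .nonSpatial(label: label)                       // box is replaced by a label
    }
    if case .spatial = current, fadeUnidentifiedAfterTimeout {
        return .spatial(boundingBoxVisible: false)             // box fades but is not replaced
    }
    return current                                             // unidentified: box remains
}

let stool = updatedOverlay(current: .spatial(boundingBoxVisible: true),
                           identifiedLabel: "Stool",
                           fadeUnidentifiedAfterTimeout: false)
let tvStand = updatedOverlay(current: .spatial(boundingBoxVisible: true),
                             identifiedLabel: nil,
                             fadeUnidentifiedAfterTimeout: false)
print(stool, tvStand)
```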
  • detecting the start of the input includes detecting contact 616 at a location on touch screen 220 that corresponds to a portion of the partially completed three-dimensional model in preview 568.
  • device 100 further detects movement of contact 616 in a first direction across touch screen 220 (e.g., a swipe input or a drag input on the partially completed model in preview 568 to the right).
  • device 100 in response to detecting the input that includes the movement in the first direction (e.g., in response to detecting the swipe input or drag input on the partially completed model in preview 568 in the first direction), moves the partially completed model in preview 568 in a first manner in accordance with the first input (e.g., rotating and/or translating the partially completed model in the first direction).
  • device 100 in response to a rightward swipe on the partially completed model, rotates the partially completed model around a vertical axis (e.g., an axis in the direction of gravity, and/or an axis that points in a downward direction of the preview 568 and/or user interface 522).
  • the amount and/or speed of rotation of the partially completed model is based on the distance and/or speed of the swipe input detected on the partially completed model.
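  • A minimal sketch of the swipe-to-rotate mapping described above, assuming a simple linear relation between horizontal drag distance and yaw about the model's vertical axis; the sensitivity constant is an assumption.

```swift
import Foundation

/// Maps a horizontal drag distance (in points) to a yaw increment (in radians)
/// about the preview model's vertical axis; the sensitivity is an assumed constant.
func yawDelta(forDragDistance dx: Double,
              radiansPerPoint: Double = .pi / 360) -> Double {
    dx * radiansPerPoint
}

var previewYaw = 0.0
previewYaw += yawDelta(forDragDistance: 180)   // rightward swipe → +90 degrees of rotation
previewYaw += yawDelta(forDragDistance: -90)   // leftward swipe → −45 degrees
print(previewYaw)                              // ≈ 0.785 rad
```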
  • objects and/or surfaces within the partially completed model may become visually occluded by other objects and/or surfaces in the partially completed model (e.g., representation 550” of TV stand 550 occludes representation 546” of stool 546, and representation 556” of floor lamp 556 occludes representation 550” of TV stand 550) as a result of the rotation.
  • visual indications for guiding the user to rescan a missed spot in a presumably completed portion of the physical environment may become visually occluded by other objects and/or surfaces in the partially completed model as a result of the rotation (e.g., arrow 608 becomes occluded by representation 548” of cabinet 548), and/or may visually occlude other objects and/or surfaces in the partially completed model as a result of the rotation.
  • In Figure 5L, after the partially completed model of room 520 in preview 568 is rotated in accordance with the drag input by contact 616, and before termination of the drag input (e.g., before liftoff of contact 616, or before detecting other types of termination depending on the input type), the partially completed model of room 520 in preview 568 is shown with an orientation that is different from the orientation of the physical environment relative to the viewpoint of the user.
  • device 100 restores the orientation of the partially completed model in preview 568, such that the orientation of the partially completed model again matches the orientation of the physical environment relative to the current viewpoint.
  • device 100 updates the camera view 524 such that the view of the physical environment in user interface 522 continues to correspond to the current viewpoint, where the orientation of the partially completed model after the rotation and/or movement of the partially completed model by the user input is not based on the current viewpoint as long as the termination of the input has not been detected.
  • device 100 displays the partially completed model with an orientation that corresponds to the current viewpoint, e.g., the same orientation as the physical environment in the camera view 524.
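  • The preview's snap-back behavior (and, by contrast, the persistent orientation described later for the completed model 634) can be captured with a single flag, as in this illustrative Swift sketch; the type and property names are assumptions, not the disclosed implementation.

```swift
import Foundation

/// Minimal model-orientation state for the drag interaction: while a drag is in
/// progress the user's yaw offset is applied; on gesture end the preview snaps
/// back to the viewpoint-matched orientation, while a completed model keeps the
/// offset. Names and structure are illustrative assumptions.
struct ModelOrientation {
    var viewpointYaw: Double = 0      // follows the cameras' current viewpoint
    var userYawOffset: Double = 0     // accumulated from drag gestures
    var restoresOnGestureEnd: Bool    // true for the in-scan preview, false for model 634

    var displayedYaw: Double { viewpointYaw + userYawOffset }

    mutating func dragChanged(by deltaYaw: Double) { userYawOffset += deltaYaw }

    mutating func dragEnded() {
        if restoresOnGestureEnd { userYawOffset = 0 }   // preview re-aligns to the viewpoint
    }
}

var preview = ModelOrientation(restoresOnGestureEnd: true)
preview.dragChanged(by: Double.pi / 2)
preview.dragEnded()
print(preview.displayedYaw)   // 0 — back in sync with the viewpoint

var completed = ModelOrientation(restoresOnGestureEnd: false)
completed.dragChanged(by: Double.pi / 2)
completed.dragEnded()
print(completed.displayedYaw) // π/2 — orientation chosen by the user persists
```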
  • In Figure 5N, another user input (e.g., a depinch gesture by two contacts 618-1 and 618-2 moving away from each other after touching down on the partially completed model in preview 568, or another scaling input of a different input type) is detected at the location of the partially completed model in preview 568.
  • device 100 rescales the partially completed model in preview 568 in accordance with the user input (e.g., increases the scale of the partially completed model in accordance with the movement of the contacts in the depinch gesture, and/or decreases the scale of the partially completed model in accordance with the movement of the contacts in a pinch gesture).
  • the direction and magnitude of the rescaling of the partially completed model is based on the direction and magnitude of the relative movement of the user input (e.g., contacts moving apart causes enlargement of the model, contacts moving together causes shrinking of the model, and/or center of contacts moving in a respective direction causes translation of the model while the model is being rescaled).
  • the partially completed model of room 520 is enlarged.
  • the changed scale of the partially completed model in preview 568 is maintained, e.g., obscuring a larger portion of the camera view 524 than before the input was detected.
  • device 100 displays the partially completed model with the original scale that was used before the user input was detected.
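  • A sketch of the pinch/depinch handling, assuming the displayed scale is the committed scale multiplied by the live gesture ratio, and that the scale is either kept or restored when the gesture ends (covering both of the behaviors described above); the names are illustrative assumptions.

```swift
import Foundation

/// Pinch/depinch handling for the model: the displayed scale is the committed
/// scale multiplied by the live gesture ratio; on gesture end the scale is either
/// kept or restored, matching the two alternatives described above. Illustrative only.
struct ModelScale {
    var committedScale: Double = 1.0
    var liveGestureRatio: Double = 1.0     // currentPinchDistance / initialPinchDistance
    var keepsScaleOnGestureEnd: Bool

    var displayedScale: Double { committedScale * liveGestureRatio }

    mutating func pinchChanged(ratio: Double) { liveGestureRatio = ratio }

    mutating func pinchEnded() {
        if keepsScaleOnGestureEnd { committedScale *= liveGestureRatio }
        liveGestureRatio = 1.0
    }
}

var previewScale = ModelScale(keepsScaleOnGestureEnd: true)
previewScale.pinchChanged(ratio: 1.8)      // depinch: contacts move apart
previewScale.pinchEnded()
print(previewScale.displayedScale)         // 1.8 — the enlargement is kept

var restoringScale = ModelScale(keepsScaleOnGestureEnd: false)
restoringScale.pinchChanged(ratio: 1.8)
restoringScale.pinchEnded()
print(restoringScale.displayedScale)       // 1.0 — the original scale is restored
```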
  • Figures 5Q-5R illustrate rescan of the missed portion of wall 530 and the region in front of it (e.g., visually occluded by cabinet 548 and/or behind cabinet 548 along the line of sight from the viewpoints of the user, when scanning the first portion and the second portion of room 520), in accordance with the guidance provided by objects 604 and 606, in accordance with some embodiments.
  • at the prompt of banner 602, and in accordance with the guidance provided by arrow 604 and dot 606 in the camera view 524 (and/or in accordance with the guidance provided by arrow 608 and dot 610 in preview 568), the user moves the cameras toward the location indicated by dot 606 and/or dot 610.
  • banner 602 is optionally updated to show updated instructions to guide the user to move the cameras to the desired location and/or face the desired direction to scan the missed portion of the physical environment.
  • the updated viewpoint of the user is indicated by the position and facing direction of object 566 in top view 564 of room 520.
  • camera view 524 is updated to show a closer view of cabinet 548, as the cameras are moving toward the location in the physical environment that is marked by dot 606 in the camera view 524.
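  • The guidance loop above can be read as a repeated check of whether the cameras are close enough to the suggested capture spot and roughly facing the missed portion. The Swift sketch below illustrates such a check; the distance and angle thresholds, as well as the names, are assumptions rather than values from this disclosure.

```swift
import Foundation

struct Point2D { var x: Double; var y: Double }

/// Returns true when the camera is near the suggested capture spot and roughly
/// facing the missed portion of the wall; thresholds are assumed, not from the text.
func isReadyToRescan(cameraPosition: Point2D,
                     cameraHeading: Double,          // radians, world frame
                     suggestedSpot: Point2D,
                     missedPortion: Point2D,
                     maxDistance: Double = 0.5,      // meters (assumption)
                     maxAngle: Double = Double.pi / 6) -> Bool {
    let dx = cameraPosition.x - suggestedSpot.x
    let dy = cameraPosition.y - suggestedSpot.y
    let closeEnough = (dx * dx + dy * dy).squareRoot() <= maxDistance

    let desiredHeading = atan2(missedPortion.y - cameraPosition.y,
                               missedPortion.x - cameraPosition.x)
    var delta = abs(desiredHeading - cameraHeading)
    if delta > Double.pi { delta = 2 * Double.pi - delta }      // wrap to [0, pi]
    return closeEnough && delta <= maxAngle
}

// Example: standing essentially on the suggested spot, looking toward the missed wall section.
let ready = isReadyToRescan(cameraPosition: Point2D(x: 1.0, y: 2.0),
                            cameraHeading: Double.pi / 2,
                            suggestedSpot: Point2D(x: 1.1, y: 2.1),
                            missedPortion: Point2D(x: 1.0, y: 4.0))
print(ready) // true
```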
  • arrow 604 is shown to be visually occluded (e.g., the tip of the arrow 604 is not drawn, or shown as translucent) by the cabinet 548, if the location of the arrow 604 in the physical environment would be visually occluded by the cabinet 548 from the current viewpoint of the user.
  • the non-spatial representations of identified objects (e.g., representation 596 for cabinet 548 and representation 614 for stool 546) are shown at the locations of their corresponding objects and are respectively turned to face toward the current viewpoint.
  • In Figure 5R, the user has moved to a location indicated by dot 606 and/or dot 610 and pointed the cameras toward the location indicated by arrow 604 and/or arrow 608 (e.g., the current location and facing direction of the user is indicated by object 566 in top view 564 of room 520), and camera view 524 is updated to show the missed portion of wall 530 and the region in front of it.
  • image and/or depth data for the missed portion of wall 530 and the region in front of it are captured by the cameras and processed by device 100, and edges, surfaces, and/or objects in this portion of the physical environment are detected and modeled and are optionally identified.
  • a structural element (e.g., entryway 544, and/or another structural element) is detected in this portion of the physical environment.
  • graphical object 620 is displayed at the location of the structural element in camera view 524 to spatially represent the spatial characteristics of the structural element (e.g., graphical object 620 is an outline and/or an overlay that indicates a shape, size, and/or an outline of entryway 544).
  • the spatial representation of the structural element may optionally be replaced with a non-spatial representation (e.g., an icon, a label, or another type of non-spatial representation) that does not spatially represent the spatial characteristics of the identified structural element and that indicates an identity of the structural element (e.g., a type of the structural element, a name of the structural element, and/or a style of the structural element).
  • the scanning and modeling of a missed portion of the physical environment are analogous to the scanning and modeling of an unscanned, new portion of the physical environment described with respect to Figures 5F- 5P above.
  • the camera view is updated to show the physical environment from a different perspective and position, while the partially completed model of room 520 in preview 568 is rotated to correspond to the current viewpoint.
  • the entryway 544 is represented by a hollowed out area or a transparent region 544” in the representation 530” of wall 530 that has a size corresponding to the size of entryway 544 in the physical environment.
  • the portion of the camera view 524 that is located behind representation 542” for window 542 and representation 544” for entryway 544 is visible through representation 542” for window 542 and representation 544” for entryway 544 in the partially completed three-dimensional model in preview 568.
  • graphical objects corresponding to the edges, surfaces, and/or objects in the third portion of the physical environment are displayed.
  • graphical object 622 is displayed at the location of couch 552 overlaying camera view 524 in response to detection of one or more edges and/or surfaces of couch 552.
  • Graphical object 622 is a spatial representation that spatially indicates the spatial characteristics of couch 552 in camera view 524.
  • graphical object 622 is expanded as additional edges and/or surfaces or additional portions of detected edges and/or surfaces are detected and characterized; and the values of one or more visual properties of graphical object 622 are updated in real-time in accordance with changes in the predicted accuracies of the spatial characteristics of the corresponding edges and/or surfaces represented by graphical object 622.
  • a final state of graphical object 622 is displayed in response to determining that detection and spatial characterization of couch 552 is completed, where the final state of graphical object 622 is a solid three-dimensional outline and/or bounding box of couch 552.
  • completion of the scan and spatial characterization of couch 552 is indicated by an animated change (e.g., a sudden increase in luminance followed by a reduction in luminance of an overlay on the edges and/or surfaces of couch 552, and/or cessation of applied visual effect (e.g., feathering, and/or flickering) on the edges and/or surfaces of couch 552).
  • representation 530” of wall 530 is rotated to a position that would visually obscure more than a threshold portion of the representations of other objects and/or surfaces inside the three-dimensional model (e.g., representation 532” of wall 532, representation 534” of wall 534, representation 550” of TV stand 550, representation 560” of TV 560, and/or other representations of structural elements and/or nonstructural elements).
  • representation 530” of wall 530 is made more translucent or removed completely, so that all or part of the representations of other portions of the partially completed three-dimensional model that would otherwise be visually obscured by representation 530” of wall 530 become visible in preview 568.
  • representation 532” of wall 532 and representation 534” of wall 534 are visible, while representation 530” of wall 530 is removed or is made transparent, partially transparent or translucent.
  • the outlines of representation 544” of entryway 544 and representation 542” of window 542 remain displayed as a transparent, partially transparent, or hollowed out area in the partially completed three-dimensional model of room 520 (e.g., objects inside the partially completed model are visible through the transparent, partially transparent, or hollowed out area), even though representation 530” of wall 530 has been removed (e.g., optionally with an outline remaining) or has been made transparent, partially transparent or translucent in preview 568.
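  • The wall-fading behavior can be summarized as an occlusion test: if a wall representation would hide more than a threshold fraction of the other representations from the current preview orientation, it is made translucent or removed with its outlines kept. The threshold and names in the sketch below are illustrative assumptions, not details from this disclosure.

```swift
import Foundation

/// Decide how a wall representation should be treated in the preview: if the
/// fraction of interior representations it would occlude from the current preview
/// orientation exceeds a threshold, it is hidden. The 0.3 threshold and the
/// occlusion measure are illustrative assumptions.
struct WallVisibility {
    var occludedFraction: Double        // fraction of other representations it hides
    var occlusionThreshold: Double = 0.3

    enum Treatment { case opaque, translucent, removedWithOutline }

    func treatment(preferRemoval: Bool) -> Treatment {
        guard occludedFraction > occlusionThreshold else { return .opaque }
        return preferRemoval ? .removedWithOutline : .translucent
    }
}

// Wall 530” ends up between the camera and most of the model after the rotation,
// so it is made translucent (or removed with its window/entryway outlines kept).
let wall530 = WallVisibility(occludedFraction: 0.8)
print(wall530.treatment(preferRemoval: false))   // translucent
let wall536 = WallVisibility(occludedFraction: 0.1)
print(wall536.treatment(preferRemoval: false))   // opaque
```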
  • device 100 in response to detecting the completion of the scanning and modeling of couch 552, device 100 displays representation 552” of couch 552 in the partially completed model in preview 568, where the location of representation 552” of couch 552 in the partially completed model of room 520 corresponds to the location of couch 552 in room 520.
  • after scanning the third portion of the physical environment, the cameras are turned to face toward side table 554 in room 520.
  • device 100 updates the camera view 524 to include a fourth portion of the physical environment that corresponds to the current viewpoint of the user, the fourth portion of the physical environment including wall 534, couch 552, side table 554, and table lamp 558.
  • device 100 also rotates the partially completed three-dimensional model in preview 568 to a new orientation that corresponds to the current viewpoint of the user.
  • graphical objects corresponding to the edges, surfaces, and/or objects in the fourth portion of the physical environment are displayed.
  • graphical object 624 is displayed at the location of side table 554 overlaying camera view 524 in response to detection of one or more edges and/or surfaces of side table 554.
  • Graphical object 624 is a spatial representation that spatially indicates the spatial characteristics of side table 554 in camera view 524.
  • graphical object 624 is expanded as additional edges and/or surfaces or additional portions of detected edges and/or surfaces are detected and characterized; and the values of one or more visual properties of graphical object 624 are updated in real-time in accordance with changes in the predicted accuracies of the spatial characteristics of the corresponding edges and/or surfaces represented by graphical object 624.
  • a final state of graphical object 624 is displayed in response to determining that detection and spatial characterization of side table 554 is completed, where the final state of graphical object 624 is a solid three-dimensional outline and/or bounding box of side table 554.
  • completion of the scan and spatial characterization of side table 554 is indicated by an animated change (e.g., a sudden increase in luminance followed by a reduction in luminance of an overlay on the edges and/or surfaces of side table 554, and/or cessation of applied visual effect (e.g., feathering, and/or flickering) on the edges and/or surfaces of side table 554).
  • graphical object 626 is displayed at the location of table lamp 558 overlaying camera view 524 in response to detection of one or more edges and/or surfaces of table lamp 558.
  • Graphical object 626 is a spatial representation that spatially indicates the spatial characteristics of table lamp 558 in camera view 524.
  • graphical object 626 is expanded as additional edges and/or surfaces or additional portions of detected edges and/or surfaces are detected and characterized; and the values of one or more visual properties of graphical object 626 are updated in real-time in accordance with changes in the predicted accuracies of the spatial characteristics of the corresponding edges and/or surfaces represented by graphical object 626.
  • a final state of graphical object 626 is displayed in response to determining that detection and spatial characterization of table lamp 558 is completed, where the final state of graphical object 626 is a solid three-dimensional outline and/or bounding box of table lamp 558.
  • completion of the scan and spatial characterization of table lamp 558 is indicated by an animated change (e.g., a sudden increase in luminance followed by a reduction in luminance of an overlay on the edges and/or surfaces of table lamp 558, and/or cessation of applied visual effect (e.g., feathering, and/or flickering) on the edges and/or surfaces of table lamp 558).
  • In Figure 5U, after the spatial representation of couch 552, e.g., graphical object 622 or another graphical object, is displayed at the location of couch 552 in camera view 524, device 100 identifies couch 552, e.g., determines an object type, a model number, a style, an owner, and/or a category of couch 552. In response to identifying couch 552, device 100 replaces the spatial representation of couch 552 (e.g., graphical object 622, or another spatial representation that spatially indicates spatial dimensions of couch 552) with a non-spatial representation of couch 552 (e.g., object 628, or another object that does not spatially indicate spatial dimensions of couch 552).
  • graphical object 632 that is displayed at a location of an edge between wall 534 and floor 540 includes a portion that is behind couch 552 and side table 554; and in accordance with lower predicted accuracies of the spatial characteristics of the portion of the edge behind couch 552 and side table 554, the portion of graphical object 632 corresponding to the portion of the edge behind couch 552 and side table 554 is displayed with reduced visibility (e.g., has a higher translucency, reduced luminance, reduced sharpness, more feathering, and/or has a greater blur radius) as compared to the portion of the edge that is not occluded by couch 552 and side table 554.
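  • One plausible reading of the accuracy-dependent rendering described above is a simple mapping from predicted accuracy to overlay opacity and blur; the linear mapping and ranges in this sketch are assumptions, not values from the disclosure.

```swift
import Foundation

/// Map the predicted accuracy of a detected edge segment to the opacity and blur
/// of its overlay, so occluded portions (with lower confidence) render with reduced
/// visibility. The linear mapping and the ranges are illustrative assumptions.
func overlayStyle(predictedAccuracy: Double) -> (opacity: Double, blurRadius: Double) {
    let clamped = min(max(predictedAccuracy, 0), 1)
    let opacity = 0.2 + 0.8 * clamped          // 0.2...1.0
    let blurRadius = 6.0 * (1.0 - clamped)     // sharper as confidence grows
    return (opacity, blurRadius)
}

// The part of the wall/floor edge hidden behind couch 552 has lower predicted
// accuracy, so its segment of graphical object 632 is drawn fainter and blurrier.
print(overlayStyle(predictedAccuracy: 0.95))   // ≈ (opacity: 0.96, blurRadius: 0.3)
print(overlayStyle(predictedAccuracy: 0.35))   // ≈ (opacity: 0.48, blurRadius: 3.9)
```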
  • representation 530” of wall 530 is still in a position that would visually obscure more than a threshold portion of the representations of other objects and/or surfaces inside the three-dimensional model (e.g., representation 532” of wall 532, representation 534” of wall 534, representation 546” of stool 546, representation 550” of TV stand 550, representation 560” of TV 560, representation 556” of floor lamp 556, representation 552” of couch 552, representation 554” of side table 554, representation 558” of table lamp 558, and/or other representations of structural elements and/or nonstructural elements).
  • representation 530” of wall 530 is made more translucent or removed completely, so that all or part of the representations of other portions of the partially completed three-dimensional model that would otherwise be visually obscured by representation 530” of wall 530 become visible in preview 568.
  • representation 532” of wall 532 and representation 534” of wall 534 are visible, while representation 530” of wall 530 is removed (optionally with an outline remaining) or is made transparent, partially transparent or translucent.
  • the outlines of representation 544” of entryway 544 and representation 542” of window 542 remain displayed as a transparent, partially transparent, or hollowed out area in the partially completed three-dimensional model of room 520 (e.g., objects inside the partially completed model are visible through the transparent, partially transparent, or hollowed out area), even though representation 530” of wall 530 has been removed or has been made transparent, partially transparent or translucent in preview 568.
  • device 100 in response to detecting the completion of the scanning and modeling of side table 554 and table lamp 558, device 100 displays representation 554” of side table 554 and representation 558” of table lamp 558 in the partially completed model in preview 568, where the locations of representation 554” of side table 554 and representation 558” of table lamp 558 in the partially completed model of room 520 correspond respectively to the location of side table 554 and table lamp 558 in room 520.
  • In Figure 5V, after scanning the fourth portion of the physical environment, the cameras are moved and turned to face toward the last unscanned wall of room 520, namely, wall 536.
  • the current position and facing direction of the cameras are indicated by the position and facing direction of object 566 in top view 564 of room 520.
  • device 100 updates the camera view 524 to include a fifth portion of the physical environment that corresponds to the current viewpoint of the user, the fifth portion of the physical environment including wall 536 and boxes 562.
  • device 100 also rotates the partially completed three-dimensional model in preview 568 to a new orientation that corresponds to the current viewpoint of the user.
  • As shown in Figure 5V, as image and/or depth data of the fifth portion of the physical environment are captured and processed, graphical objects corresponding to the edges, surfaces, and/or objects in the fifth portion of the physical environment are displayed. For example, graphical objects 630 are displayed at the location of boxes 562 overlaying camera view 524 in response to detection of one or more edges and/or surfaces of boxes 562. Graphical objects 630 are spatial representations that spatially indicate the spatial characteristics of boxes 562 in camera view 524.
  • graphical objects 630 are expanded as additional edges and/or surfaces or additional portions of detected edges and/or surfaces are detected and characterized; and the values of one or more visual properties of graphical objects 630 are updated in real-time in accordance with changes in the predicted accuracies of the spatial characteristics of the corresponding edges and/or surfaces represented by graphical objects 630.
  • a final state of graphical objects 630 is displayed in response to determining that detection and spatial characterization of boxes 562 is completed, where the final state of graphical objects 630 includes solid three-dimensional outlines and/or bounding boxes of boxes 562.
  • completion of the scan and spatial characterization of boxes 562 is indicated by an animated change (e.g., a sudden increase in luminance followed by a reduction in luminance of an overlay on the edges and/or surfaces of boxes 562, and/or cessation of applied visual effect (e.g., feathering, and/or flickering) on the edges and/or surfaces of boxes 562).
  • In Figure 5V, after the spatial representation of boxes 562 (e.g., graphical objects 630 or another graphical object) is displayed at the location of boxes 562 in camera view 524, device 100 is not able to identify boxes 562 (e.g., is unable to determine an object type, a model number, a style, an owner, and/or a category of boxes 562). Consequently, the spatial representation of boxes 562 remains displayed in camera view 524, as long as boxes 562 are still in the field of view.
  • graphical objects 630 cease to be displayed after a period of time or become less visible (e.g., become more translucent and/or reduced in luminance) even if boxes 562 are not identified.
  • representation 532” of wall 532 is moved into a position that would visually obscure more than a threshold portion of the representations of other objects and/or surfaces inside the three-dimensional model (e.g., representation 530” of wall 530, representation 534” of wall 534, representation 550” of TV stand 550, representation 560” of TV 560, representation 546” of stool 546, representation 556” of floor lamp 556, representation 552” of couch 552, representation 554” of side table 554, representation 558” of table lamp 558, representation 562” of boxes 562, representation 548” of cabinet 548, representation 544” of entryway 544, and/or other representations of structural elements and/or nonstructural elements).
  • representation 532” of wall 532 is made more translucent or removed completely, so that all or part of the representations of other portions of the partially completed three-dimensional model that would otherwise be visually obscured by representation 532” of wall 532 become visible in preview 568.
  • representation 530” of wall 530 and representation 534” of wall 534 are visible, while representation 532” of wall 532 is removed (optionally with an outline remaining) or is made transparent, partially transparent or translucent.
  • an outline of representation 532” of wall 532 remains displayed, while representation 532” of wall 532 is removed or made more translucent.
  • device 100 in response to detecting the completion of the scanning and modeling of boxes 562, device 100 displays representations 562” of boxes 562 in the partially completed model in preview 568, where the locations of representations 562” of boxes 562 in the partially completed model of room 520 correspond respectively to the locations of boxes 562 in room 520.
  • After scanning and modeling the fifth portion of the physical environment, the user turns the cameras to a sixth portion of the physical environment, where the sixth portion of the physical environment includes at least a portion of the previously scanned and modeled first portion of the physical environment.
  • device 100 scans and models the sixth portion of the physical environment and determines that the user has completed a loop to capture all walls of the room 520, and an edge between wall 530 and wall 536 has been detected and modeled.
  • device 100 also detects and models the edge between wall 536 and floor 540, as well as the edge between wall 530 and floor 540.
  • In Figure 5W, the partially completed three-dimensional model of room 520 in preview 568 is rotated to an orientation that corresponds to the current viewpoint of the user and that corresponds to the currently displayed portion of the physical environment.
  • Representation 532” of wall 532 and representation 534” of wall 534 are moved into a position that would visually obscure more than a threshold portion of the representations of other objects and/or surfaces inside the three-dimensional model (e.g., representation 530” of wall 530, representation 536” of wall 536, and/or other representations of structural elements and/or nonstructural elements in room 520).
  • representation 532” of wall 532 and representation 534” of wall 534 are made more translucent or removed completely, so that all or part of the representations of other portions of the partially completed three-dimensional model that would otherwise be visually obscured by representation 532” of wall 532 and representation 534” of wall 534 become visible in preview 568.
  • outlines of representation 532” of wall 532 and representation 534” of wall 534 remain displayed, while representations 532” and 534” are removed or made more translucent.
  • the edge between representations 532” and 534” remains displayed in preview 568 to indicate the position of the edge between walls 532 and 534 in the physical environment.
  • device 100 in response to detecting the completion of the scanning and modeling of the entire room (e.g., all four walls and its interior, or another set of required structural elements and/or nonstructural elements), ceases to display the partially completed three-dimensional model of room 520 and displays an enlarged three-dimensional model 634 of room 520 that has been generated based on the completed scan and modeling of room 520.
  • the completed three-dimensional model 634 of room 520 is displayed in a user interface 636 that does not include camera view 524.
  • in embodiments in which user interface 522 includes a passthrough view of the physical environment as seen through a transparent or semi-transparent display, device 100 optionally displays an opaque or semi-transparent background layer that blocks and/or blurs the view of the physical environment when displaying the three-dimensional model 634 in user interface 636.
  • user interface 522 includes an affordance (e.g., “exit” button 638, or another user interface object that can be selected to terminate or pause the scanning and modeling process) that, when selected, causes display of user interface 636 before device 100 determines, based on predetermined rules and criteria, that scanning and modeling of room 520 is completed.
  • device 100 stops the scan and modeling process and displays an enlarged version of the partially completed three-dimensional model available at that time in user interface 636, and device 100 stores and displays the partially completed three-dimensional model as the completed three-dimensional model of room 520 at that point.
  • device 100 displays the completed three-dimensional model 634 (or the partially completed three-dimensional model, if the scan is terminated early by the user) in an orientation that does not necessarily correspond to the current position and facing direction of the cameras (e.g., as indicated by the position and facing direction of object 566 in top view 564 of room 520).
  • the orientation of the three-dimensional model 634 is chosen by device 100 to enable better viewing of the objects detected in the physical environment.
  • the orientation of the three-dimensional model 634 is chosen based on the initial viewpoint of the user when the scan is first started or based on the final viewpoint of the user when the scan is ended (e.g., the representation of the first wall that is scanned by the user faces toward the user in user interface 636, or the representation of the last wall that is scanned by the user faces toward the user in user interface 636).
  • device 100 detects the start of a user input directed to the three-dimensional model 634.
  • detecting the start of the input includes detecting contact 638 at a location on touch screen 220 that corresponds to a portion of the three-dimensional model 634 in user interface 636.
  • device 100 further detects movement of contact 638 in a first direction across touch screen 220 (e.g., a swipe input or a drag input on the completed model 634 in user interface 636).
  • device 100 in response to detecting the input that includes the movement in the first direction (e.g., in response to detecting the swipe input or drag input on the completed model 634 in user interface 636 in the first direction), moves the completed three-dimensional model 634 in a first manner in accordance with the input (e.g., rotating and/or translating the completed model 634 in the first direction).
  • device 100 in response to a rightward swipe on the completed model 634, rotates the completed model around a vertical axis (e.g., an axis in the direction of gravity, and/or an axis that points in a downward direction of the model 634 and/or user interface 636).
  • the amount and/or speed of rotation of the completed model is based on the distance and/or speed of the swipe input detected on the completed model.
  • objects and/or surfaces within the completed model may become visually occluded by other objects and/or surfaces in the completed model as a result of the rotation.
  • the completed model 634 of room 520 in user interface 636 is shown with the orientation that is specified by the user input.
  • device 100 when termination of the drag input is detected, device 100 does not restore the orientation of the completed model 634 in user interface 636, such that the orientation of the completed model continues to be displayed with the orientation that was specified by the user input (e.g., different from that shown in Figure 5X). This is in contrast to the behavior of the partially completed model in preview 568, as described with respect to Figures 5K-5M above.
  • device 100 rescales the completed model in user interface 636 in accordance with the user input (e.g., increases the scale of the completed model in accordance with the movement of the contacts in the depinch gesture, and/or decreases the scale of the completed model in accordance with the movement of the contacts in a pinch gesture).
  • another user input (e.g., a depinch gesture by two contacts moving away from each other after touching down on the completed model 634 in user interface 636, or another scaling input of a different input type) is detected at the location of the completed model 634 in user interface 636.
  • the direction and magnitude of the rescaling of the completed model 634 is based on the direction and magnitude of the relative movement of the user input (e.g., contacts moving apart causes enlargement of the model, contacts moving together causes shrinking of the model, and/or center of contacts moving in a respective direction causes translation of the model while being rescaled).
  • the completed model 634 of room 520 is rescaled in response to detecting the user input that corresponds to a request to rescale the completed model 634 in user interface 636.
  • the changed scale of the completed model 634 in user interface 636 is maintained (e.g., the rescaled model may even be partially out of the display area of the display generation component).
  • device 100 displays the completed model with the last scale that was used before the user input was terminated.
  • user interface 636 optionally includes a plurality of selectable user interface objects corresponding to different operations related to the scanning process and/or operations related to the model and/or data that has been generated.
  • user interface 636 includes an affordance (e.g., “Done” button 638, or another type of user interface object) that, when selected by a user input (e.g., a tap input, an air tap gesture, and/or another type of selection input), causes device 100 to terminate the scanning and modeling process described herein, and return to the application from which the scanning and modeling process was initiated.
  • device 100 in response to activation of button 638, device 100 ceases to display user interface 636, and displays user interface 644 of the browser application (e.g., as shown in Figure 5AB) if the scanning and modeling process was started from user interface 510 of the browser application (e.g., in response to selection of button 516 in Figure 5B).
  • device 100 in response to activation of button 638, device 100 ceases to display user interface 636, and displays user interface 646 of the paint design application (e.g., as shown in Figure 5AC) if the scanning and modeling process was started from user interface 514 of the paint design application (e.g., in response to selection of button 516 in Figure 5C).
  • user interface 636 includes an affordance (e.g., “Rescan” button 640, or another type of user interface object) that, when selected by a user input (e.g., a tap input, an air tap gesture, and/or another type of selection input), causes device 100 to return to user interface 522, and allow the user to restart the scanning and modeling process and/or rescan one or more portions of the physical environment.
  • device 100 in response to activation of button 640, ceases to display user interface 636, and displays user interface 522 with preview 568 (e.g., including the currently completed three-dimensional model of room 520 to be updated further, or including a brand new partially completed three-dimensional model to be built from scratch) and camera view 524 (e.g., updated based on the current viewpoint).
  • the redisplayed user interface 522 includes one or more user interface objects for the user to specify which portion of the model needs to be updated and/or rescanned.
  • the redisplayed user interface 522 includes one or more visual guides to indicate which portion of the model has lower predicted accuracies.
  • user interface 636 includes an affordance (e.g., “Share” button 642, or another type of user interface object) that, when selected by a user input (e.g., a tap input, an air tap gesture, and/or another type of selection input), causes device 100 to display a user interface with selectable options to interact with the generated model and corresponding data, such as sharing, storing, and/or opening using one or more applications (e.g., an application from which the scanning and modeling process was initiated, and/or applications that are different from the application from which the scanning and modeling process was first initiated).
  • device 100 in response to activation of button 642, ceases to display user interface 636 and displays user interface 648 (e.g., as shown in Figure 5AD), where the user may interact with one or more selectable user interface objects to review the model and/or corresponding data and perform one or more operations with respect to the model and/or corresponding data.
  • user interface 644 of the browser application includes a representation of the completed three-dimensional model 634 that is optionally augmented with other information and graphical objects.
  • the three-dimensional model 634 of room 520 is used to show how user-selected AV equipment can be placed inside room 520.
  • user interface 644 allows the user to drag the three-dimensional model and rescale the three-dimensional model using various inputs (e.g., using a drag input, a pinch input, and/or a depinch input).
  • user interface 644 includes an affordance 645-1 (e.g., “Go back” button, or other analogous user interface object) that, when selected, causes device 100 to cease to display user interface 644 and redisplay user interface 636.
  • user interface 644 includes an affordance 645-2 (e.g., “Share” button, or other analogous user interface object) that, when selected, causes device 100 to display a plurality of selectable options for sharing the model 634, corresponding data of model 634, the layout of the AV equipment (e.g., selected, and/or recommended) that is generated based on model 634, a listing of AV equipment that has been selected by the user as well as their placement locations in room 520, a listing of recommended AV equipment generated based on the model of room 520, scanned data of room 520, and/or a listing of objects identified in room 520.
  • device 100 also provides different options for sharing the above data and information, such as options for choosing one or more recipients, and/or using one or more applications for sharing the above data and information (e.g., examples are provided with respect to Figure 5AD).
  • user interface 644 of the browser application includes an affordance (e.g., “Print” button 645-3, or another analogous user interface object) that, when selected, causes the current view of the three-dimensional model 634 (optionally including the augmentations applied to the model) to be printed to a file or a printer.
  • the device optionally displays a plurality of selectable options to configure the printing of the model 634 (e.g., choosing a printer, choosing the subject matter and data for printing, and/or choosing the format for printing).
  • user interface 644 includes an affordance (e.g., “Rescan” button 645-4, or another analogous user interface object) that, when selected, causes device 100 to cease to display user interface 644 and display user interface 522 (e.g., as shown in Figures 5D and 5E, or 5W) or user interface 636 (e.g., as shown in Figure 5X) for the user to rescan room 520 (e.g., to improve the model 634 or to build a new model of room 520 from scratch).
  • user interface 644 includes an affordance (e.g., “Checkout”, or another analogous user interface object) that, when selected, causes device 100 to generate a payment interface to pay for the AV equipment and services provided through the user interface of the browser application (e.g., the scanning and modeling services, and/or the layout and recommendation services).
  • user interface 646 of the paint design application includes a representation of the completed three-dimensional model 634 that is optionally augmented with other information and graphical objects.
  • the three-dimensional model 634 of room 520 is used to show how room 520 would look if paint and/or wallpaper selected by the user are applied.
  • user interface 646 allows the user to drag and rotate the three-dimensional model and rescale the three- dimensional model using various inputs (e.g., using a drag input, a pinch input, and/or a depinch input).
  • user interface 646 includes an affordance 647-1 (e.g., “Back” button, or other analogous user interface object) that, when selected, causes device 100 to cease to display user interface 646 and redisplay user interface 636.
  • user interface 646 includes an affordance 647-2 (e.g., “Share” button, or other analogous user interface object) that, when selected, causes device 100 to display a plurality of selectable options for sharing the model 634, corresponding data of model 634, the rendered views of room 520 with selected or recommended paint and wallpaper that are generated based on model 634, a listing of selected paint and wallpaper that have been selected by the user as well as their placement locations in room 520, a listing of recommended paint and/or wallpaper generated based on the model of room 520, scanned data of room 520, and/or a listing of objects identified in room 520.
  • device 100 also provides different options for sharing the above data and information, such as options for choosing one or more recipients, and/or using one or more applications for sharing the above data and information (e.g., examples are provided with respect to Figure 5AD).
  • user interface 646 includes an affordance (e.g., “Print” button 647-3, or another analogous user interface object) that, when selected, causes the current view of the three-dimensional model 634 (optionally including the augmentations applied to the model) to be printed to a file or a printer.
  • the device optionally displays a plurality of selectable options to configure the printing of the model 634 (e.g., choosing a printer, choosing the subject matter and data for printing, and/or choosing the format for printing).
  • user interface 646 includes an affordance (e.g., “New Room” button 647-4, or another analogous user interface object) that, when selected, causes device 100 to cease to display user interface 646 and display user interface 522 (e.g., as shown in Figures 5D and 5E) for the user to scan another room or rescan room 520 from scratch.
  • user interface 646 includes a paint selection summary for the different walls of room 520, and includes affordances 647-5 for changing the paint and/or wallpaper selections for the different walls.
  • model 634 in user interface 646 is automatically updated by device 100 to show the newly selected paint/wallpaper on their respective surfaces.
  • Figure 5AD shows an example user interface 648 that is associated with the “Sharing” function of user interface 636, user interface 644, and/or user interface 646.
  • user interface 648 is optionally a user interface of an operating system and/or a native application that provides the scanning and modeling functions described herein (e.g., an application of a vendor that provides the API or developer tool kit of the scanning and modeling function).
  • user interface 648 provides a listing of subject matter that can be shared.
  • a representation of model 634 of room 520, a representation of a top view 564 of room 520, and/or a listing of identified objects in room 520 are displayed in user interface 648, along with corresponding selection controls (e.g., checkboxes, radial buttons, and/or other selection controls).
  • subsequent sharing functions are applied to one or more of the model 634, top view 564, and listing 649-1, based on their respective selection state as specified by the selection controls.
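  • The selection controls and the subsequent sharing actions can be modeled as a small request structure: a set of selected items plus a chosen destination, with later actions (send, copy, publish, save) operating only on the selected items. The Swift types below are illustrative assumptions, not part of this disclosure.

```swift
import Foundation

/// Selected shareable items from user interface 648 plus a chosen destination;
/// subsequent actions operate only on the selected items. Types are assumptions.
struct ShareRequest {
    enum Item: Hashable { case model3D, topView, objectListing }
    enum Destination { case recipient(String), application(String), clipboard, publish(URL) }

    var selectedItems: Set<Item>
    var destination: Destination
}

let request = ShareRequest(selectedItems: [.model3D, .objectListing],
                           destination: .recipient("contact-from-649-3"))
// Only the checked items travel to the chosen recipient, application, or platform.
print(request.selectedItems.count)   // 2
```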
  • user interface 648 includes an affordance (e.g., “Go Back” button 649-2, or other analogous user interface object) that, when selected, causes device 100 to cease to display user interface 648 and redisplay the user interface from which user interface 648 was triggered (e.g., user interface 636 in Figure 5AA, user interface 644 in Figure 5AB, or user interface 646 in Figure 5AC).
  • user interface 648 displays a plurality of selectable representations of contacts or potential recipients 649-3 for sending the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1).
  • selection of one or more of the representations of contacts or potential recipients 649-3 causes display of a communication user interface (e.g., instant messaging user interface, email user interface, network communication user interface (e.g., WiFi, P2P, and/or Bluetooth transmission interface), and/or a shared network device user interface) for sending and/or sharing the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1).
  • user interface 648 displays a plurality of selectable representations of applications 649-4 for opening and/or sending the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1).
  • selection of one or more of the representations of applications 649-4 causes device 100 to display respective user interfaces of the selected applications in which the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1) can be viewed, stored, and/or shared with another user of the selected applications.
  • user interface 648 includes an affordance (e.g., “copy” button 649-5, or another analogous user interface object) that, when selected, causes device 100 to make a copy of the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1) in a clipboard or memory, so that it can be pasted into another application and/or user interface that is opened later.
  • user interface 648 includes an affordance (e.g., “Publish” button 649-6, or another analogous user interface object) that, when selected, causes device 100 to display a user interface for publishing the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1) to an online location (e.g., a website, an online bulletin board, a social network platform, and/or a public and/or private sharing platform) so other users can see the selected subject matter remotely from another device.
  • user interface 648 includes an affordance (e.g., “Add to” button 649-8, or another analogous user interface object) that, when selected, causes device 100 to display a user interface for inserting the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1) into an existing model (e.g., a model of a house including room 520 and other rooms, and/or an existing collection of models) of a physical environment.
  • user interface 648 includes an affordance (e.g., “Save As” button 649-9, or another analogous user interface object) that, when selected, causes device 100 to display a user interface for saving the selected subject matter (e.g., model 634, top view 564, and/or listing 649-1) in a different format that is more suitable for sharing with another user or platform.
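A minimal sketch of how the selection controls in user interface 648 might gate which items a subsequent sharing function receives. The type and property names below are hypothetical, introduced only for illustration; they are not taken from the disclosure.

```swift
// Hypothetical identifiers for the shareable artifacts described above.
enum ShareableItem: String, CaseIterable {
    case model      // e.g., three-dimensional model 634
    case topView    // e.g., top view 564
    case listing    // e.g., listing 649-1 of detected objects
}

// Tracks the selection state driven by the selection controls in the share UI.
struct ShareSelection {
    private(set) var selected: Set<ShareableItem> = [.model, .topView, .listing]

    mutating func toggle(_ item: ShareableItem) {
        if selected.contains(item) { selected.remove(item) } else { selected.insert(item) }
    }

    // Only the currently selected items are handed to a sharing function
    // (copy, publish, add-to, save-as, or send to a selected contact).
    func itemsToShare() -> [ShareableItem] {
        ShareableItem.allCases.filter { selected.contains($0) }
    }
}

var selection = ShareSelection()
selection.toggle(.listing)          // user deselects the listing
print(selection.itemsToShare())     // [.model, .topView]
```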
  • Figures 6A-6F are flow diagrams illustrating a method 650 of displaying a preview of a three-dimensional model of an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Method 650 is performed at a computer system (e.g., portable multifunction device 100 (Figure 1A), device 300 (Figure 3A), or computer system 301 (Figure 3B)) with a display device (e.g., a display, optionally touch-sensitive, a projector, a head-mounted display, a heads-up display, or the like, such as touch screen 112 (Figure 1A), display 340 (Figure 3A), or other display generation component(s)) and one or more cameras (e.g., optical sensor(s) 164 (Figure 1A) or other camera(s)).
  • Some operations in method 650 are, optionally, combined and/or the order of some operations is, optionally, changed.
  • the method 650 is a method for displaying a preview of a three-dimensional model of an environment during scanning and modeling of the environment, and adding additional information to the preview of the three-dimensional model as the scan progresses.
  • the preview of the three-dimensional model can be manipulated (e.g., rotated, or otherwise reoriented) independently of the field of view of one or more cameras of the computer system. Displaying the preview of the three-dimensional model, and allowing manipulation independent of the field of view of the computer system’s cameras, increases the efficiency of the computer system by reducing the number of inputs needed for the user to interact with the preview of the three-dimensional model.
  • the user can freely rotate the preview of the three-dimensional model to a desired orientation, without having to constantly readjust the orientation of the preview (e.g., as would be required if the preview always attempted to re-align the orientation to match the field of view of the one or more cameras of the computer system).
  • This also provides improved visual feedback to the user (e.g., improved visual feedback regarding the progress of the scan), as the preview of the three-dimensional environment can be updated with additional information as the scan progresses.
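One way to picture this decoupling is a preview orientation that tracks the camera viewpoint by default and is overridden only while a manual rotation is in progress. The sketch below is purely illustrative (hypothetical names, yaw-only rotation) and is not the disclosed implementation.

```swift
// Hypothetical state for the model preview: when the user has not manually rotated
// the preview, its orientation tracks the camera viewpoint; a manual rotation
// overrides that tracking until the gesture ends.
struct PreviewOrientationState {
    var viewpointYaw: Float = 0          // heading of the one or more cameras, in radians
    var manualOffsetYaw: Float? = nil    // set while a rotation gesture is in progress

    // Orientation actually applied to the partially completed model.
    var effectiveYaw: Float {
        viewpointYaw + (manualOffsetYaw ?? 0)
    }

    mutating func cameraMoved(toYaw yaw: Float) { viewpointYaw = yaw }
    mutating func userRotated(by delta: Float) { manualOffsetYaw = (manualOffsetYaw ?? 0) + delta }
    mutating func gestureEnded() { manualOffsetYaw = nil }  // snap back to a viewpoint-aligned orientation
}
```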
  • the computer system displays (652), via the display generation component, a first user interface (e.g., a scan user interface that is displayed to show progress of an initial scan of a physical environment to build a three-dimensional model of the physical environment, a camera user interface, and/or a user interface that is displayed in response to a user’s request to perform a scan of a physical environment or to start an augmented reality session in a physical environment), wherein the first user interface concurrently includes (e.g., in an overlaying manner, or an adjacent manner): a representation of a field of view of one or more cameras (e.g., images or video of a live feed from the camera(s), or a view of the physical environment through a transparent or semitransparent display), the representation of the field of view including a first view of a physical environment that corresponds to a first viewpoint of a user in the physical environment (e.g., the first viewpoint of the user corresponds to a direction, position and/or vantage point from which the physical environment is being viewed by the user); and a preview of a three-dimensional model of the physical environment, the preview including a partially completed three-dimensional model with a first orientation that corresponds to the first viewpoint of the user.
  • the partially completed model is oriented so that the model and the physical environment have the same or substantially similar orientations relative to the first viewpoint of the user.
  • the first user interface (e.g., user interface 522) includes camera view 524 capturing a first view of room 520 that corresponds to a first viewpoint of a user (e.g., as represented by object 566 in the top view 564 of room 520), and a preview of a three-dimensional model of room 520 (e.g., preview 568 that includes a partially completed three-dimensional model of a first portion of room 520).
  • the computer system detects (654) first movement of the one or more cameras in the physical environment that changes a current viewpoint of the user in the physical environment from the first viewpoint to a second viewpoint (e.g., the movement of the one or more cameras includes translation and/or rotation in three dimensions in the physical environment) (e.g., the movement of the one or more cameras includes panning movements and/or tilting movements that change the direction that the camera faces; horizontal movements and/or vertical movements that change the x, y, z positions of the camera relative to the physical environment; and/or various combinations of the above).
  • the one or more cameras of device 100 are moved and turned (e.g., as represented by the movement and rotation of object 566 in top view 564 in Figure 5I relative to Figure 5J).
  • in response to detecting the first movement of the one or more cameras, the computer system updates (656) the preview of the three-dimensional model (and, optionally, updates the representation of the field of view of the cameras) in the first user interface in accordance with the first movement of the one or more cameras, including adding additional information to the partially completed three-dimensional model (e.g., based on depth information captured by the one or more cameras) and rotating the partially completed three-dimensional model from the first orientation that corresponds to the first viewpoint of the user to a second orientation that corresponds to the second viewpoint of the user.
  • the preview includes a view of the updated, partially completed three- dimensional model of the physical environment from the perspective of a virtual user located at or close to the second viewpoint relative to the three-dimensional model.
  • the updated, partially completed model is oriented so that the model and the physical environment have the same or substantially similar orientations relative to the second viewpoint of the user.
  • updating the preview of the three- dimensional model includes scaling the view of the three-dimensional model to accommodate more portions of the model in the same display region as the portions are added to the model.
  • in response to detecting the movement of the one or more cameras of device 100 (as indicated by the movement and rotation of object 566 in top view 564 of room 520), device 100 updates the camera view 524 to show a second portion of the physical environment and rotates preview 568 to a second orientation that corresponds to the updated viewpoint of the user.
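A simplified sketch of this update step, under stated assumptions (hypothetical names, yaw-only re-orientation, a crude point-based stand-in for the reconstructed geometry): newly captured information is merged into the partially completed model, the model is re-oriented to match the camera heading, and the display scale shrinks as the model grows so it keeps fitting in the same preview region.

```swift
// Coarse stand-in for reconstructed geometry; not the disclosed data model.
struct ScannedPoint { var x, y, z: Float }

struct ModelPreview {
    var points: [ScannedPoint] = []
    var yaw: Float = 0            // orientation of the partially completed model, in radians
    var displayScale: Float = 1.0

    mutating func update(newPoints: [ScannedPoint], cameraYaw: Float, previewRadius: Float) {
        points.append(contentsOf: newPoints)   // add newly captured information to the model
        yaw = cameraYaw                        // rotate the model to correspond to the new viewpoint

        // Scale the view down as portions are added, so the model stays in the same display region.
        let extent = points.map { ($0.x * $0.x + $0.y * $0.y + $0.z * $0.z).squareRoot() }.max() ?? 1
        displayScale = min(1.0, previewRadius / max(extent, 0.001))
    }
}
```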
  • the computer system detects (658) first input directed to the preview of the three-dimensional model in the first user interface (e.g., a swipe input on a touch-sensitive surface, and/or in the air; and/or an air gesture that specifies a direction of movement or rotation) (e.g., the first input is determined to be directed to the preview because the preview has input focus, and/or the location of the first input corresponds to the position of the preview in the first user interface).
  • while displaying the user interface 522 with the camera view 524 showing the second portion of room 520 and the preview 568 of the three-dimensional model of room 520, device 100 detects a swipe input by a contact 616 in a first direction on the partially completed three-dimensional model in preview 568, where the partially completed three-dimensional model in preview 568 is shown with the second orientation that corresponds to the orientation of room 520 relative to the current viewpoint (e.g., the viewpoint as indicated by object 566 in top view 564 of room 520 in Figure 5K, same as that shown in Figures 5I-5J).
  • in response to detecting the first input directed to the preview of the three-dimensional model in the first user interface, the computer system updates (660) the preview of the three-dimensional model in the first user interface in accordance with the first input, including, in accordance with a determination that the first input meets first criteria (e.g., the first input includes a swipe input in a first direction, a pinch and drag air gesture, or another analogous input of other input types, while the preview of the three-dimensional model has input focus), rotating the partially completed three-dimensional model from the second orientation that corresponds to the second viewpoint of the user to a third orientation that does not correspond to the second viewpoint of the user (e.g., while the representation of the field of view continues to show the second view of the physical environment that corresponds to the second viewpoint of the user, or while the representation of the field of view continues to be updated in accordance with movement of the one or more cameras that is executed during the first input).
  • the computer system rotates the partially completed three-dimensional model of room 520 in preview 568 to a new orientation (as shown in Figure 5L) that is different from the second orientation (shown in Figure 5K) that corresponds to the orientation of room 520 relative to the current viewpoint of the user.
  • in response to detecting the first input directed to the preview of the three-dimensional model in the first user interface, the computer system updates the partially completed three-dimensional model based on depth information of a respective portion of the physical environment that is in the current field of view of the one or more cameras (e.g., the field of view is continuously updated based on the movement of the one or more cameras, and the model is continuously updated based on newly acquired depth information of the portion of the physical environment in the field of view).
  • the three-dimensional model is generated using at least first depth information of a first portion of the physical environment that corresponds to the first viewpoint and second depth information of a second portion of the physical environment that corresponds to the second viewpoint.
  • depth information includes data that is needed to detect and/or determine respective distances to various objects and/or surfaces in a portion of the physical environment that is in the field of view of the cameras.
  • depth information is used to determine spatial relationships and spatial characteristics of physical features (e.g., objects, surfaces, edges, and/or lines) in the physical environment.
  • the movement of cameras that change the viewpoint of the user is not a required condition for enabling the manual rotation of the preview of the three-dimensional model set forth above.
  • the computer system detects another input directed to the preview of the three-dimensional model; and in response to detecting the new input directed to the preview of the three-dimensional model in the first user interface, the computer system updates the three-dimensional model based on the depth information and updates the preview of the three-dimensional model in the first user interface in accordance with the new input, wherein updating the preview includes, in accordance with a determination that the new input meets the first criteria (e.g., the new input includes a swipe input in the first direction, a pinch and drag air gesture, or another analogous input of a different input type, while the preview of the three-dimensional model has input focus), rotating the partially completed three-dimensional model from the respective orientation that corresponds to the current viewpoint of the user to a new orientation that does not correspond to the current viewpoint of the user.
  • the orientation of the partially completed three-dimensional model is changed in a direction and/or by an amount that is determined based on one or more characteristics (e.g., direction, duration, distance, speed, and/or velocity) of the input that meets the first criteria, as illustrated in the sketch below.
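A hedged, minimal sketch of one way a swipe's characteristics could map to a rotation of the preview. The conversion factor and names are assumptions introduced for illustration, not values from the disclosure.

```swift
// Illustrative mapping from a horizontal swipe to a rotation of the preview:
// direction determines the sign, distance determines the amount.
struct SwipeRotation {
    static let radiansPerPoint: Float = 0.01   // assumed conversion factor

    static func yawDelta(forHorizontalTranslation dx: Float) -> Float {
        dx * radiansPerPoint
    }
}

// Example: a 120-point swipe to the left rotates the model by about -1.2 radians.
let delta = SwipeRotation.yawDelta(forHorizontalTranslation: -120)
print(delta)
```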
  • while displaying the first user interface, including the representation of the field of view and the preview of the three-dimensional model, the computer system adds (662), to the representation of the field of view, respective graphical objects at positions (e.g., overlaying the positions on the representation of the field of view) that correspond to one or more physical features (e.g., physical objects, physical surfaces, physical planes, physical boundaries, and/or physical edges) that have been detected in a respective portion of the physical environment that is visible in the representation of the field of view.
  • graphical objects 572, 578, 576, and 571 are added to locations of various structural elements such as edges between wall 530, wall 532, ceiling 538, and floor 540 in camera view 524.
  • graphical object 580 is added to a location of a nonstructural element, such as cabinet 548, in camera view 524 in Figure 5F.
  • graphical object 576 is added to a location of an edge between wall 532 and floor 540 in camera view 524
  • graphical objects 598 and 594 are added to locations of floor lamp 556 and TV stand 550 in camera view.
  • as the scan progresses, the computer system estimates various spatial characteristics (e.g., lengths, sizes, widths, shapes, boundaries, surfaces, and/or a combination of two or more of the above) and identity information (e.g., object type, category, grouping, and/or ownership, or a combination of two or more of the above) of each detected object, and displays visual feedback to visually indicate the progress of the scan in the form of outlines or overlays that convey the estimated spatial characteristics and identity information of the detected object and, optionally, the predicted accuracy of the estimated spatial characteristics and identity information of the detected object.
  • the visual feedback is dynamically updated based on the changes in the predicted accuracy of the estimated spatial characteristics of the detected objects (more details of this visual feedback are described with respect to Figures 9A-9E and accompanying descriptions). Adding respective graphical objects at positions that correspond to one or more physical features that have been detected in respective portions of the physical environment provides improved visual feedback to the user (e.g., improved visual feedback regarding locations of physical features in the physical environment, and/or improved visual feedback regarding which physical features in the physical environment the computer system has detected).
  • the one or more physical features include (664) at least a first physical object (e.g., a piece of furniture, an appliance, a piece of equipment, a piece of home decor, a person, a pet, and so on), and the respective graphical objects include at least a first graphical object that is displayed at a first position on the representation of the field of view that corresponds to the first physical object.
  • graphical object 580 is displayed at a location of cabinet 548 in camera view 524, once one or more edges and surfaces of cabinet 548 have been detected.
  • graphical object 592 is displayed at the location of TV 560 in camera view 524, once one or more edges and surfaces of TV 560 have been detected.
  • the first graphical object is of a first type that includes an outline, a bounding box, and/or an overlay with the shape of the first physical object, where the first graphical object of the first type has spatial characteristics that indicate the spatial characteristics of the first physical object.
  • the first graphical object is of a second type that includes a label, an icon, and/or an avatar of the first physical object that indicates the type, nature, grouping, and/or category of the first physical object, but the spatial characteristics of the first graphical object (other than the displayed position of the first graphical object) do not necessarily correspond to the spatial characteristics of the first physical object.
  • the first graphical object transforms from the first type to the second type during the scan as more information is determined about the physical object and the object type is recognized from the physical characteristics of the physical object.
  • the computer system concurrently displays both the graphical object of the first type and the graphical object of the second type, e.g., at least for a period of time, for a respective physical object, during the scan.
  • Adding, to the representation of the field of view, at least a first graphical object that is displayed at a first position on the representation of the field of view that corresponds to the first physical object provides improved visual feedback to the user (e.g., improved visual feedback regarding a location of the first physical object, and/or improved visual feedback that the computer system has detected the first physical object).
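A minimal sketch of how the two kinds of graphical objects described above might be chosen for a detected feature: a shape-conforming bounding-box overlay while only spatial extent is known, then an identity label once the object type is recognized. The threshold and all names are assumptions for illustration only.

```swift
// Hypothetical overlay chooser for a detected physical feature.
struct DetectedFeature {
    var boundingBoxSize: (width: Float, height: Float, depth: Float)
    var recognizedType: String?          // e.g., "cabinet" or "tv", once classified
    var classificationConfidence: Float  // 0...1
}

enum OverlayKind {
    case boundingBox        // first type: conveys spatial characteristics
    case label(String)      // second type: conveys identity information
    case both(String)       // optionally shown together for a period of time
}

func overlay(for feature: DetectedFeature, labelThreshold: Float = 0.8) -> OverlayKind {
    guard let type = feature.recognizedType, feature.classificationConfidence >= labelThreshold else {
        return .boundingBox
    }
    // Keep the spatial outline while introducing the identity label.
    return .both(type)
}

let cabinet = DetectedFeature(boundingBoxSize: (width: 0.9, height: 0.9, depth: 0.5),
                              recognizedType: "cabinet",
                              classificationConfidence: 0.9)
let kind = overlay(for: cabinet)   // .both("cabinet")
_ = kind
```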
  • the one or more physical features include (666) at least a first physical surface (e.g., a curved surface, and/or a plane) (e.g., a wall, a window, a door, an entryway, a floor, a ceiling, and/or a tabletop), and the respective graphical objects include at least a second graphical object (e.g., an outline, a bounding box, a filled area, an overlay, a color filter, and/or a transparency filter) that is displayed at a second position on the representation of the field of view that corresponds to the first physical surface.
  • an overlay is optionally displayed on the surface of wall 530 and wall 532 in camera view 524, once the surfaces of wall 530 and wall 532 are detected and characterized.
  • an overlay is optionally displayed on the surfaces of cabinet 548 in camera view 524, once the surfaces of cabinet 548 are detected and characterized.
  • the second graphical object is of a first type that includes an outline, a bounding box, and/or an overlay with the shape of the first physical surface, where the second graphical object of the first type has spatial characteristics that indicate the spatial characteristics of the first physical surface.
  • the second graphical object is of a second type that includes a label, an icon, and/or an avatar of the first physical surface that indicates the type, nature, grouping, and/or category of the first physical surface, but the spatial characteristics of the second graphical object (other than the displayed position of the second graphical object) do not necessarily correspond to the spatial characteristics of the first physical surface.
  • the second graphical object transforms from the first type to the second type during the scan as more information is determined about the physical surface and the surface type is recognized from the physical characteristics of the physical surface.
  • the computer system concurrently displays both the graphical object of the first type and the graphical object of the second type, e.g., at least for a period of time, for a respective physical surface, during the scan.
  • Adding, to the representation of the field of view, at least a second graphical object that is displayed at a second position on the representation of the field of view that corresponds to the first physical surface provides improved visual feedback to the user (e.g., improved visual feedback regarding a location of the first physical surface, and/or improved visual feedback that the computer system has detected the first physical surface).
  • the computer system detects (668) a termination of the first input.
  • in response to detecting the termination of the first input, the computer system updates the preview of the three-dimensional model in the first user interface, including rotating the partially completed three-dimensional model from the third orientation to a fourth orientation that corresponds to a current viewpoint of the user (e.g., the three-dimensional model is rotated so that the view of the three-dimensional model from the viewpoint of the user is the same as or similar to a view of the physical environment from the viewpoint of the user relative to the physical environment) (e.g., the partially completed three-dimensional model automatically rotates to an orientation that corresponds to an orientation of the physical environment relative to the current viewpoint of the user, after the influence of the first input is terminated) (e.g., the current viewpoint of the user is still the second viewpoint of the user and the representation of the field of view continues to show the second view of the physical environment that corresponds to the second viewpoint of the user, or the current viewpoint is a continuously updated viewpoint of the user while the representation of the field of view continues to be updated in accordance with movement of the one or more cameras that is executed during the first input).
  • device 100 detects termination of the swipe input.
  • device 100 rotates the partially completed three-dimensional model back to its original orientation that corresponds to the orientation of room 520 relative to the current viewpoint (e.g., the viewpoint as indicated by object 566 in top view 564 of room 520) (as shown in Figure 5M).
  • Rotating the partially completed three-dimensional model from the third orientation to a fourth orientation that corresponds to a current viewpoint of the user reduces the number of inputs needed to display the partially completed three-dimensional model with the appropriate orientation (e.g., the user does not need to perform additional user inputs to re-align the partially completed three-dimensional model with the current viewpoint of the user).
  • while displaying the first user interface (e.g., while the scan is ongoing and/or not completed), with the representation of the field of view including the second view of the physical environment that corresponds to the second viewpoint of the user (e.g., the second viewpoint of the user corresponds to a direction, position and/or vantage point from which the physical environment is being viewed by the user), and with the preview of the three-dimensional model including the partially completed model with the second orientation, the computer system detects (670) second input directed to the preview of the three-dimensional model in the first user interface (e.g., a pinch input or reverse pinch input on a touch-sensitive surface, or in the air; and/or an air gesture that specifies a type and magnitude of scaling) (e.g., the second input is determined to be directed to the preview because the preview has input focus, and/or the location of the second input corresponds to the position of the preview in the first user interface).
  • in response to detecting the second input directed to the preview of the three-dimensional model in the first user interface, the computer system updates the preview of the three-dimensional model in the first user interface in accordance with the second input, including, in accordance with a determination that the second input meets second criteria different from the first criteria (e.g., the second input includes a pinch or reverse pinch input on a touch-sensitive surface, a pinch and flick air gesture, or another analogous input of a different input type, while the preview of the three-dimensional model has input focus), changing a scale of the partially completed three-dimensional model (e.g., enlarging or shrinking the partially completed three-dimensional model) relative to the representation of the field of view in accordance with the second input (e.g., based on direction and/or magnitude of the second input).
  • in response to detecting the depinch gesture by contacts 618-1 and 618-2, device 100 enlarges the partially completed three-dimensional model in preview 568 relative to the camera view 524 in user interface 522.
  • in accordance with a determination that the second input includes a movement in a first direction (e.g., a movement opposite the second direction described next, such as movement to decrease a gap between two fingers), the computer system reduces the scale of the partially completed three-dimensional model (e.g., in an amount that corresponds to a magnitude of the movement of the second input); and in accordance with a determination that the second input includes a movement in a second direction (e.g., movement to the left, movement in the counter-clockwise direction, and/or movement to increase a gap between two fingers), the computer system increases the scale of the partially completed three-dimensional model (e.g., in an amount that corresponds to a magnitude of the movement of the second input).
  • a first input that rotates the partially completed three-dimensional model and a second input that scales the partially completed three-dimensional model, relative to the representation of the field of view, are optionally detected as parts of the same gesture (e.g., a pinch or depinch gesture that also includes a translational movement of the whole hand), and as a result, the rotation and scaling of the partially completed three-dimensional model are executed concurrently in accordance with the gesture (see the sketch below).
  • Changing a scale of the partially completed three-dimensional model relative to the representation of the field of view in accordance with a second input that meets second criteria provides additional control options without cluttering the UI with additional display controls (e.g., additional displayed controls for rotating the partially completed three-dimensional model and/or additional displayed controls for changing a scale of the partially completed three-dimensional model).
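A hedged sketch of one way a single combined gesture could drive both operations at once: the pinch component changes the preview's scale while the translation component rotates it. The clamping range and conversion factor are illustrative assumptions.

```swift
// Hypothetical handler for a combined pinch-and-drag gesture applied to the preview.
struct PreviewTransform {
    var scale: Float = 1.0
    var yaw: Float = 0

    mutating func apply(pinchScale: Float, horizontalTranslation dx: Float) {
        // Clamp the scale to an illustrative range so the model stays usable on screen.
        scale = min(3.0, max(0.25, scale * pinchScale))
        // Rotate concurrently, using the same illustrative points-to-radians factor as before.
        yaw += dx * 0.01
    }
}

var transform = PreviewTransform()
transform.apply(pinchScale: 1.4, horizontalTranslation: 30)   // depinch enlarges, drag rotates
print(transform.scale, transform.yaw)
```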
  • the preview of the three-dimensional model of the physical environment includes (672) respective three-dimensional representations of one or more surfaces that have been detected in the physical environment (e.g., the respective three-dimensional representations of the one or more surfaces include representations of a floor, one or more walls, surfaces of one or more pieces of furniture laid out in three-dimensional space with spatial relationships and spatial characteristics corresponding to their spatial relationships and spatial characteristics).
  • preview 568 of the three-dimensional model of room 520 includes three-dimensional representations 530”, 532”, and 540” for wall 530, wall 532, and floor 540, and three-dimensional representation 548” for cabinet 548 that includes multiple surfaces corresponding to the surfaces of cabinet 548.
  • representation 534” of wall 534 is added to the partially completed model in the preview 568.
  • the respective representations of the one or more surfaces that have been detected in the physical environment include virtual surfaces, bounding boxes, and/or wireframes in the three-dimensional model that have spatial characteristics (e.g., size, orientation, shape, and/or spatial relationships) that correspond to (e.g., are reduced in scale relative to) the spatial characteristics (e.g., size, orientation, shape, and/or spatial relationships) of the one or more surfaces that have been detected in the physical environment.
  • Displaying a preview of the three-dimensional model, including respective three-dimensional representations of one or more surfaces that have been detected in the physical environment, provides improved visual feedback to the user (e.g., improved visual feedback regarding the detected surfaces in the physical environment).
  • the preview of the three-dimensional model of the physical environment includes (674) respective representations of one or more physical objects that have been detected in the physical environment (e.g., the respective representations of the one or more objects include representations of one or more pieces of furniture, physical objects, people, pets, windows, and/or doors that are in the physical environment).
  • preview 568 of the three-dimensional model of room 520 includes respective representations 548” for cabinet 548, representation 546” for stool 546, and/or representation 552” for couch 552, and other representations for other objects detected in room 520.
  • the representations of the objects are three-dimensional representations.
  • the respective representations of the one or more objects that have been detected in the physical environment include outlines, wireframes, and/or virtual surfaces in the three-dimensional preview that have spatial characteristics (e.g., size, orientation, shape, and/or spatial relationships) that correspond to (e.g., are reduced in scale relative to) the spatial characteristics (e.g., size, orientation, shape, and/or spatial relationships) of the one or more objects that have been detected in the physical environment.
  • the representations of the objects have reduced structural and visual details in the three-dimensional model as compared to their corresponding objects in the physical environment. Displaying a preview of the three-dimensional model of the physical environment, including respective representations of one or more physical objects that have been detected in the physical environment, provides improved visual feedback to the user (e.g., improved visual feedback regarding the detected physical objects in the physical environment).
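A minimal sketch of how detected surfaces and objects might be carried into the preview as reduced-scale representations that preserve their spatial characteristics. The types, the tuple layout, and the 1/50 factor are hypothetical and only illustrative.

```swift
// Hypothetical, reduced-detail representations used inside the preview model.
struct SurfaceRepresentation {
    var corners: [(x: Float, y: Float, z: Float)]   // e.g., the four corners of a wall or floor
}

struct ObjectRepresentation {
    var center: (x: Float, y: Float, z: Float)
    var size: (width: Float, height: Float, depth: Float)   // bounding box of the detected object
}

// Uniformly scale a representation down so proportions and spatial relationships are preserved.
func scaled(_ object: ObjectRepresentation, by factor: Float) -> ObjectRepresentation {
    ObjectRepresentation(
        center: (object.center.x * factor, object.center.y * factor, object.center.z * factor),
        size: (object.size.width * factor, object.size.height * factor, object.size.depth * factor)
    )
}

// Example: represent a detected cabinet at an assumed 1/50 scale inside the preview.
let cabinet = ObjectRepresentation(center: (x: 1.2, y: 0.45, z: -2.0),
                                   size: (width: 0.9, height: 0.9, depth: 0.5))
let previewCabinet = scaled(cabinet, by: 1.0 / 50.0)
_ = previewCabinet
```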
  • in accordance with a determination that the partially completed three-dimensional model meets preset criteria (e.g., criteria for determining when the scan of the physical environment is completed, for example because sufficient information has been obtained from the scan and preset conditions regarding detecting surfaces and objects in the physical environment are fulfilled, or because the user has requested that the scan be completed right away), the computer system replaces (676) display of the partially completed three-dimensional model in the preview of the three-dimensional model with display of a first view of a completed three-dimensional model of the physical environment, wherein the first view of the completed three-dimensional model includes an enlarged copy (optionally rotated to a preset orientation that does not correspond to the current viewpoint of the user) of the partially completed three-dimensional model that meets the preset criteria.
  • device 100 replaces display of user interface 522 with user interface 636 (as shown in Figure 5X), where user interface 636 includes an enlarged version of the completed three-dimensional model 634 of room 520.
  • when the computer system determines that the scan is completed and the model of the physical environment meets the preset criteria, the computer system replaces the preview of the three-dimensional model with a view of the completed three-dimensional model, where the view of the completed three-dimensional model is larger than the partially completed model shown in the preview.
  • the view of the completed three-dimensional model shows the three-dimensional model with a preset orientation (e.g., the orientation of the partially completed model shown at the time that the scan is completed, or a preset orientation that is independent of the orientation of the partially completed model shown at the time that the scan is completed and independent of the current viewpoint).
  • Replacing display of the partially completed three-dimensional model in the preview of the three-dimensional model with display of a first view of a completed three-dimensional model of the physical environment that includes an enlarged copy of the partially completed three-dimensional model, after adding the additional information to the partially completed three-dimensional model in the preview of the three-dimensional model, reduces the number of inputs needed to display the completed three-dimensional model of the physical environment at the appropriate size (e.g., the user does not need to perform additional user inputs to enlarge the completed three-dimensional model of the physical environment after the computer system adds the additional information to the partially completed three-dimensional model in the preview of the three-dimensional model).
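A hedged sketch of a completion check and the resulting transition from the small in-scan preview to an enlarged completed-model view. The coverage metric, the 0.95 threshold, and the 3x enlargement are assumptions, not values from the disclosure.

```swift
// Illustrative completion check: the scan is treated as complete when an assumed
// coverage estimate passes a threshold or the user explicitly ends the scan.
struct ScanState {
    var estimatedCoverage: Float     // 0...1, assumed fraction of the environment captured
    var userRequestedFinish: Bool
}

enum ModelPresentation {
    case preview(scale: Float)       // small preview shown during the scan
    case completed(scale: Float)     // enlarged view of the completed model
}

func presentation(for state: ScanState, previewScale: Float) -> ModelPresentation {
    let complete = state.estimatedCoverage >= 0.95 || state.userRequestedFinish
    // The completed model is shown larger than the in-scan preview.
    return complete ? .completed(scale: previewScale * 3.0) : .preview(scale: previewScale)
}
```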
  • while displaying the first view of the completed three-dimensional model in the first user interface (e.g., after the scan is just completed, or has been completed for some time) (e.g., optionally, with the representation of the field of view including a respective view of the physical environment that corresponds to the current viewpoint of the user (e.g., the first viewpoint, the second viewpoint of the user, or another viewpoint different from the first and second viewpoints)), the computer system detects (678) third input directed to the first view of the completed three-dimensional model in the first user interface (e.g., a swipe input on a touch-sensitive surface or in the air; or an air gesture that specifies a direction of movement or rotation) (e.g., the third input is determined to be directed to the completed three-dimensional model because the view of the three-dimensional model has input focus, or the location of the third input corresponds to the position of the view of the three-dimensional model in the first user interface).
  • in response to detecting the third input, the computer system updates the first view of the completed three-dimensional model in the first user interface in accordance with the third input, including, in accordance with a determination that the third input meets the first criteria (e.g., the third input includes a swipe input in a first direction, a pinch and drag air gesture, or another analogous input of a different input type, while the view of the completed three-dimensional model has input focus), rotating the completed three-dimensional model from a fourth orientation (e.g., a respective orientation that corresponds to a current viewpoint of the user, and/or a preset orientation) to a fifth orientation different from the fourth orientation in accordance with the third input.
  • device 100 detects a swipe input by contact 638 that is directed to the completed three-dimensional model 634 (as shown in Figure 5X).
  • device 100 rotates the completed three-dimensional model 634 in user interface 636 to a new orientation in accordance with the swipe input (as shown in Figure 5Y).
  • the computer system allows the user to rotate (e.g., freely, or under preset angular constraints) the model around one or more rotational axes (e.g., rotate around the x-, y-, and/or z-axis, and/or tilt, yaw, or pan the view of the model) to view the three-dimensional model from different angles.
  • Rotating the completed three-dimensional model from a fourth orientation to a fifth orientation different from the fourth orientation in accordance with the third input in response to detecting the third input directed to the first view of the completed three-dimensional model in the first user interface, provides improved visual feedback to the user (e.g., improved visual feedback regarding the appearance of the three-dimensional model, as viewed with different orientations).
  • the computer system detects (680) a termination of the third input.
  • in response to detecting the termination of the third input, the computer system forgoes updating the first view of the completed three-dimensional model in the first user interface, including maintaining the completed three-dimensional model in the fifth orientation (e.g., irrespective of the current viewpoint, movement of the display generation component, and/or the movement of the one or more cameras).
  • device 100 detects termination of the swipe input.
  • device 100 does not rotate the three-dimensional model 634 further, does not rotate the three-dimensional model 634 back to the orientation shown before the swipe input (e.g., the orientations of three-dimensional model 634 shown in Figure 5X and 5Y), and maintains the three-dimensional model 634 at the current orientation (as shown in Figures 5Z and 5AA).
  • maintaining a changed orientation of the completed three-dimensional model after detecting termination of the third input that rotated the model gives the user time to inspect the model from a desired viewing angle, to decide whether to rotate the model further to inspect it from another viewing angle, and to provide the proper input to do so as desired.
  • the computer system does not change the orientation of the completed three-dimensional model (e.g., to reflect a current viewpoint of the user), and so the user does not need to perform additional user inputs to constantly readjust the orientation of the completed three-dimensional model back to the fifth orientation.
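A small sketch contrasting the two rotation behaviors discussed above, using hypothetical names: the in-scan preview returns to a viewpoint-aligned orientation when the rotation input ends, while the completed model keeps whatever orientation the user last gave it.

```swift
// Illustrative gesture-end handling for the two model views.
struct RotatableModelView {
    var isCompletedModel: Bool
    var viewpointYaw: Float = 0    // orientation corresponding to the current viewpoint
    var displayedYaw: Float = 0    // orientation currently shown

    mutating func rotationGestureEnded() {
        if !isCompletedModel {
            displayedYaw = viewpointYaw   // preview: re-align with the current viewpoint
        }
        // Completed model: leave displayedYaw unchanged so the user can keep inspecting it.
    }
}
```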
  • the completed three-dimensional model includes (682) a respective graphical representation of a first structural element that is detected in the physical environment and respective graphical representations of one or more physical objects that are detected in the physical environment.
  • Displaying the first view of the completed three-dimensional model includes: in accordance with a determination that a current orientation of the completed three-dimensional model in the first user interface (e.g., when the model is stationary and/or is being rotated according to user input) would cause the respective graphical representation of the first structural element (e.g., a wall, a floor, or another structural element in the physical environment) to occlude a view of the respective graphical representations of the one or more objects (e.g., physical objects that are in the interior portion of the physical environment, such as furniture, physical objects, smart home appliances, people, and/or pets), reducing an opacity of (e.g., while still displaying at least a portion of the graphical representation of the first structural element) or ceasing to display the graphical representation of the first structural element.
  • the three-dimensional model 634 in user interface 636 includes representations of multiple structural elements, such as wall 530, wall 532, wall 534, wall 536, and floor 540.
  • the representation 534” of wall 534 is not displayed in the view of the three-dimensional model 634 in Figure 5X (e.g., optionally, an outline of the representation is displayed while the fill material of the representation is made transparent) because it would occlude representations of physical objects detected in the interior of room 520, such as representation 560” of TV 560, representation 556” of floor lamp 556, representation 552” of couch 552, representation 554” of side table 554, and representations of one or more other objects (e.g., boxes 562, and table lamp 558) that have been detected in room 520.
  • representation 536” of wall 536 and representation 534” of wall 534 are removed or made transparent or partially transparent (optionally leaving an outline without a fill material), because they would have occluded the representations of the objects that have been detected in room 520 (e.g., as representation 560” of TV 560, representation 556” of floor lamp 556, representation 552” of couch 552, representation 554” of side table 554, and representations of one or more other objects (e.g., boxes 562, and table lamp 558)).
  • representation 530” of wall 530 is displayed concurrently with representations of objects detected in room 520 because representation 530” would not occlude any of the objects with the current orientation of the completed three-dimensional model 634 in user interface 636.
  • representation 532” of wall 532 is displayed concurrently with representations of objects detected in room 520 because representation 532” would not occlude any of the objects with the current orientation of the completed three-dimensional model 634 in user interface 636.
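One plausible heuristic for this cutaway behavior, offered only as an illustrative assumption and not as the disclosed algorithm: a wall whose inward-facing normal points away from the virtual viewer lies between the viewer and the interior, so its opacity is reduced, while far-side walls stay opaque.

```swift
// Assumed cutaway heuristic based on the wall's inward-facing unit normal.
struct WallRepresentation {
    var name: String
    var inwardNormal: (x: Float, y: Float, z: Float)   // unit normal pointing into the room
}

func opacity(for wall: WallRepresentation,
             viewDirection: (x: Float, y: Float, z: Float),   // from the viewer toward the model
             fadedOpacity: Float = 0.0) -> Float {
    let dot = wall.inwardNormal.x * viewDirection.x
            + wall.inwardNormal.y * viewDirection.y
            + wall.inwardNormal.z * viewDirection.z
    // If the inward normal points back toward the viewer, the wall is on the far side of the
    // room and stays opaque; otherwise it would occlude the interior objects and is faded.
    return dot < 0 ? 1.0 : fadedOpacity
}

// Example: a far wall facing the viewer keeps full opacity.
let farWall = WallRepresentation(name: "far wall", inwardNormal: (x: 0, y: 0, z: 1))
let alpha = opacity(for: farWall, viewDirection: (x: 0, y: 0, z: -1))   // 1.0
_ = alpha
```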
  • the computer system displays (684) a respective user interface of a third-party application (e.g., any of a plurality of third-party applications that implements an application program interface for the room scanning capability described herein). While displaying the respective user interface of the third-party application, the computer system detects a respective input that is directed to the respective user interface of the third-party application, wherein the first user interface is displayed in response to detecting the respective input that is directed to the respective user interface of the third-party application and in accordance with a determination that the respective input corresponds to a request to scan the physical environment (e.g., meets the requirements of a system application programming interface (API) for the scanning of the physical environment).
  • the user interface 522 for scanning and modeling a physical environment can be displayed in response to activation of the “start scan” button 512 in either of the user interfaces of the browser application and the paint design application.
  • the same scanning process described herein is triggered in response to a user input directed to a respective user interface of another, different third-party application, wherein the user input corresponds to the request to scan the physical environment (e.g., meets the requirements of the system application programming interface (API) for the scanning of the physical environment).
  • Displaying the first user interface in response to detecting the respective input directed to the respective user interface of the third-party application provides improved visual feedback to the user (e.g., improved visual feedback regarding the progress of the partially completed three-dimensional model, and/or improved visual feedback regarding the appearance of the three-dimensional model).
  • in accordance with a determination that generation of the three-dimensional model meets preset criteria, the computer system redisplays (686) the third-party application (e.g., displaying the completed three-dimensional model in a user interface of the third-party application, and/or displaying content from the third-party application (e.g., a respective set of user interface objects corresponding to a respective plurality of actions in the third-party application) with at least a portion of the three-dimensional model, based on spatial information contained in the three-dimensional model).
  • a “Done” button 638, when selected, causes device 100 to redisplay the user interface of the application (e.g., user interface 644 of the browser application, or user interface 646 of the paint design application) from which the scan and modeling process was initiated.
  • multiple different third-party applications may utilize the scanning user interface and process described herein to obtain a three-dimensional model of the physical environment, and at the end of the scan, the computer system redisplays the third-party application from which the scanning process was initiated, and optionally, displays a user interface of the third-party application that provides one or more options to interact with the model and utilize the model to accomplish one or more tasks of the third-party application.
  • the user interfaces and the functions provided by different third-party applications are different from one another.
  • Redisplaying the third-party application in accordance with a determination that the generation of the three-dimensional model meets preset criteria, reduces the number of user inputs needed to redisplay the third-party application (e.g., the user does not need to perform additional user inputs to redisplay the third-party application).
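A hedged sketch of how a third-party application might request a scan through a system-provided interface and receive the completed model back, at which point the calling application is redisplayed. The protocol, types, and method names below are invented for illustration; they are not the platform's actual scanning API.

```swift
// Hypothetical result type describing the completed model handed back to the caller.
struct EnvironmentModel {
    var surfaceCount: Int
    var objectCount: Int
}

// Hypothetical scanning interface a third-party application might call into.
protocol EnvironmentScanning {
    // Presents the scanning user interface and calls back once the model meets the
    // completion criteria, after which the calling application is redisplayed.
    func startScan(completion: @escaping (EnvironmentModel) -> Void)
}

final class PaintDesignFeature {
    let scanner: EnvironmentScanning

    init(scanner: EnvironmentScanning) { self.scanner = scanner }

    func chooseWallColor() {
        scanner.startScan { model in
            // Back in the third-party app: use the returned model, e.g. to compute
            // paintable wall area or to place virtual paint swatches.
            print("Received model with \(model.surfaceCount) surfaces and \(model.objectCount) objects")
        }
    }
}
```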
  • displaying the preview of the three-dimensional model including the partially completed three-dimensional model includes (688) displaying a graphical representation of a first structural element (e.g., a wall, a floor, an entryway, a window, a door, or a ceiling) that is detected in the physical environment in a first direction relative to respective graphical representations of one or more objects (e.g., physical objects that are in the interior portion of the physical environment, such as furniture, physical objects, people, and/or pets) that have been detected in the physical environment; and rotating the partially completed three-dimensional model (e.g., from the second orientation to the third orientation, or from the third orientation to another orientation) includes: in accordance with a determination that a respective rotation of the partially completed three-dimensional model (e.g., a rotation caused by the movement of the camera that changes the viewpoint of the user, and/or caused by user input) to be executed by the partially completed three-dimensional model would cause the graphical representation of the first structural element to occlude a view of the respective graphical representations of the one or more objects in the preview of the three-dimensional model, reducing an opacity of or ceasing to display the graphical representation of the first structural element while displaying the respective representations of the one or more objects in the preview of the three-dimensional model when executing the respective rotation of the partially completed three-dimensional model.
  • device 100 would reduce the opacity or cease to display the representation of wall 534 when the partially completed model is rotated by the user input (e.g., from the orientation shown in Figure 5K to the orientation shown in Figure 5L) because the representation of wall 534 would occlude representations of physical objects that have been detected in the interior of room 520, such as representation 560” of TV 560, and representation 548” of cabinet 548.
  • rotating the partially completed three-dimensional model includes: in accordance with a determination that the respective rotation of the partially completed three-dimensional model (e.g., the respective rotation is caused by the movement of the camera that changes the viewpoint of the user, and/or caused by user input) to be executed by the partially completed three-dimensional model would not cause the graphical representation of the first structural element to occlude the view of the respective graphical representations of the one or more objects in the preview of the three-dimensional model (e.g., the representation of a wall, floor, doorway, door, window, or ceiling would not block the view of one or more interior objects in the model from the current viewpoint of the user), displaying the graphical representation of the first structural element while displaying the respective representations of the one or more objects in the preview of the three-dimensional model when executing the respective rotation of the partially completed three-dimensional model.
  • ceasing to display the graphical representation of the first structural element while displaying the respective representations of the one or more objects in the preview of the three-dimensional model when executing the respective rotation of the partially completed three-dimensional model includes (690), replacing display of the graphical representation of the first structural element with display of a first visual indication at a location of the graphical representation of the first structural element, wherein the first visual indication causes less visual occlusion of the respective graphical representations of the one or more objects in the preview of the three-dimensional model during the respective rotation of the partially completed three-dimensional model, as compared to an amount of visual occlusion that would have been caused by the graphical representation of the first structural element.
  • an indication of an outline or top edge of the representation 530” of wall 530 would remain displayed after device 100 ceased to display the representation 530” of wall 530, if the partially completed model were rotated to the orientation shown in Figure 5S in accordance with a swipe input directed to the partially completed model, while the camera view 524 showed a different portion of the physical environment than that shown in Figure 5S in accordance with a current viewpoint of the user.
  • the first visual indication is a more translucent version of the graphical representation of the first structural element through which the representations of the interior objects can be visible to the user from the current viewpoint of the user.
  • the first visual indication is an outline of the graphical representation of the first structural element without a fill material, or with a more transparent fill material of the graphical representation of the first structural element.
  • Replacing display of the graphical representation of the first structural element with display of a first visual indication at a location of the graphical representation of the first structural element, wherein the first visual indication causes less visual occlusion of the respective graphical representations of the one or more objects in the preview of the three-dimensional model during the respective rotation of the partially completed three-dimensional model, as compared to an amount of visual occlusion that would have been caused by the graphical representation of the first structural element, reduces the number of inputs needed to display an appropriate view of the three-dimensional model (e.g., the user does not need to perform additional user inputs to adjust an opacity of, or to cease to display, the first structural element if the first structural element occludes one or more objects, and/or the user does not need to perform additional user inputs to adjust an orientation of the completed three-dimensional model).
  • ceasing to display the graphical representation of the first structural element while displaying the respective representations of the one or more objects in the preview of the three-dimensional model when executing the respective rotation of the partially completed three-dimensional model includes (692), in accordance with a determination that the first structural element includes one or more openings (e.g., windows, doors, and/or entryways), ceasing to display respective graphical representations of the one or more openings in the first structural element (e.g., along with the graphical representation of the first structural element), while displaying the respective representations of the one or more objects in the preview of the three-dimensional model when executing the respective rotation of the partially completed three-dimensional model.
  • graphical representations of the one or more openings are replaced with more transparent versions thereof or with outlines of the graphical representations, rather than being completely removed from view.
  • if device 100 ceases to display the representation 530” of wall 530 in response to a rotation of the partially completed model shown in Figure 5S in accordance with a swipe input directed to the partially completed model (e.g., while the camera view 524 showed a different portion of the physical environment than that shown in Figure 5S in accordance with a current viewpoint of the user), representations of window 542 and entryway 544 are optionally removed from view as well.
  • the computer system displays (694) the preview of the three-dimensional model with virtual lighting (e.g., direction, position, and/or brightness of virtual lighting) that is generated based on detected (e.g., actual and/or physical) lighting (e.g., direction, position, brightness of detected lighting) in the physical environment.
  • the partially completed model is shown with virtual lighting effects that are generated based on the detected lighting in room 520 during the scanning process.
  • the computer system displays virtual shadows, virtual highlights, and/or virtual hues on surfaces in the model that have shapes and directions that are generated based on the direction, intensity, and/or positions of physical lighting in the physical environment.
  • the computer system changes shapes, intensities, and/or directions of the virtual shadows, virtual highlights, and/or virtual hues on the surfaces in the model according to the characteristics of the physical lighting (e.g., location, intensity, color, and/or direction) in the physical environment.
  • in accordance with changes in the detected lighting in the physical environment, the computer system changes the virtual lighting in the model (e.g., by changing the virtual shadows, virtual highlights, and/or virtual hues) on the surfaces in the model.
  • Displaying the preview of the three-dimensional model with virtual lighting that is generated based on detected lighting in the physical environment provides improved visual feedback to the user (e.g., improved visual feedback regarding the appearance of the three-dimensional model under the detected lighting).
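A short Swift sketch of the idea, under stated assumptions: the `DetectedLightEstimate` and `VirtualLight` types and their properties are illustrative stand-ins for whatever lighting estimate a scanning session exposes, and the mapping simply keeps the virtual light aligned with the detected light so that virtual shadows and highlights in the preview fall in plausible directions.

```swift
import simd

// Illustrative only: a detected-lighting estimate and the virtual light derived from it.
struct DetectedLightEstimate {
    var direction: simd_float3        // dominant physical light direction
    var intensityLumens: Float
    var colorTemperatureKelvin: Float
}

struct VirtualLight {
    var direction: simd_float3
    var intensity: Float              // normalized 0...1 for the renderer
    var colorTemperatureKelvin: Float
    var castsShadows: Bool
}

func virtualLight(from estimate: DetectedLightEstimate) -> VirtualLight {
    // Normalize the detected intensity into a renderer-friendly range (threshold is an assumption).
    let normalized = min(max(estimate.intensityLumens / 2000.0, 0.0), 1.0)
    return VirtualLight(direction: simd_normalize(estimate.direction),
                        intensity: normalized,
                        colorTemperatureKelvin: estimate.colorTemperatureKelvin,
                        castsShadows: normalized > 0.2)
}
```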
  • the computer system displays (696) the preview of the three-dimensional model with preset virtual lighting (e.g., direction, position, color, color temperature, brightness and/or other optical property) that is different from (e.g., independent of) detected (e.g., actual and/or physical) lighting (e.g., direction, position, color, color temperature, brightness and/or other optical properties) in the physical environment.
  • the computer system displays virtual shadows, virtual highlights, and/or virtual hues on surfaces in the model that have shapes and directions that are generated based on the direction, intensity, and/or positions of a predetermined virtual light source that is independent of physical lighting in the physical environment.
  • the computer system optionally maintains the shapes, intensities, and/or directions of the virtual shadows, virtual highlights, and/or virtual hues on the surfaces in the model according to the characteristics of the predetermined virtual light source.
  • in accordance with changes in the predetermined virtual lighting, the computer system changes the virtual lighting in the model (e.g., by changing the virtual shadows, virtual highlights, and/or virtual hues) on the surfaces in the model. Displaying the preview of the three-dimensional model with preset virtual lighting that is different from detected lighting in the physical environment provides improved visual feedback to the user (e.g., improved visual feedback regarding the appearance of the three-dimensional model under different lighting).
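By contrast, a preset-lighting mode can be sketched as a fixed light that ignores the detected room lighting entirely; the type name and the particular values below are illustrative assumptions, not values from the disclosure.

```swift
import simd

// Minimal sketch of a preset lighting mode: the preview is lit by a fixed,
// predetermined virtual light source rather than by the detected room lighting.
struct PresetVirtualLight {
    var direction: simd_float3
    var intensity: Float
    var colorTemperatureKelvin: Float
}

// A neutral "studio" preset that stays constant regardless of how the physical
// lighting in the scanned room changes.
let studioPreset = PresetVirtualLight(direction: simd_normalize(simd_float3(-0.5, -1.0, -0.3)),
                                      intensity: 0.8,
                                      colorTemperatureKelvin: 5500)
```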
  • in response to detecting the first movement of the one or more cameras, the computer system updates (698) the representation of the field of view in the first user interface in accordance with the first movement of the one or more cameras, including augmenting the representation of the field of view with respective graphical objects that correspond to the additional information that is added to the partially completed three-dimensional model.
  • camera view 524 is continually updated with new graphical objects corresponding to newly detected objects (e.g., graphical object 592 corresponding to TV 560, and graphical object 598 corresponding to floor lamp 556), while representations of newly detected objects (e.g., representation 560” for TV 560 and representation 556” for floor lamp 556) are added to the partially completed three-dimensional model in preview 568.
  • as depth information of more objects and/or surfaces in the physical environment is obtained by the one or more cameras, and the computer system gains more knowledge of the spatial and identity information of the structural elements (e.g., walls, ceiling, windows, doors, entryways, and/or floors) and non-structural elements (e.g., furniture, appliances, household items, home decor, smart home appliances, and/or people and pets) in the physical environment, graphical representations of these structural elements and non-structural elements are added to the representation of the field of view as well as to the partially completed three-dimensional model in the first user interface in a substantially synchronous manner.
  • Updating the representation of the field of view in the first user interface in accordance with the first movement of the one or more cameras, including augmenting the representation of the field of view with respective graphical objects that correspond to the additional information that is added to the partially completed three-dimensional model, provides improved visual feedback to the user (e.g., by adding the additional information to the partially completed three-dimensional model (e.g., as additional information is received from the one or more cameras)).
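The substantially synchronous update can be pictured with the following Swift sketch; the `DetectedObject` and `ScanUserInterface` types are illustrative assumptions used only to show one newly detected object driving both the camera-view overlay and the model preview from a single call.

```swift
import Foundation
import simd

// Illustrative only: a single detection updates both the camera-view overlay
// and the partially completed model preview at roughly the same time.
struct DetectedObject {
    let identifier: UUID
    let label: String                 // e.g. "TV", "floor lamp"
    let boundingBoxCenter: simd_float3
    let boundingBoxExtent: simd_float3
}

final class ScanUserInterface {
    private(set) var cameraViewOverlays: [UUID: String] = [:]   // object id -> overlay description
    private(set) var previewModelEntries: [UUID: String] = [:]  // object id -> model entry description

    // Called once per newly detected object so both views stay in sync.
    func add(_ object: DetectedObject) {
        cameraViewOverlays[object.identifier] =
            "bounding box + label '\(object.label)' at \(object.boundingBoxCenter)"
        previewModelEntries[object.identifier] =
            "scaled 3D representation of '\(object.label)', extent \(object.boundingBoxExtent)"
    }
}
```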
  • the user interfaces, user interface elements, physical environments and features and objects therein, feature types, annotations, representations of measurements, measurement types, and scale markers described above with reference to method 650 optionally have one or more of the characteristics of the user interfaces, user interface elements, physical environments and features and objects therein, feature types, annotations, representations of measurements, measurement types, and scale markers described herein with reference to other methods described herein (e.g., methods 700, 800, and 900). For brevity, these details are not repeated here.
  • Figures 7A-7D are flow diagrams of a method of displaying representations of objects identified in an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Method 700 is performed at a computer system (e.g., portable multifunction device 100 (Figure 1A), device 300 (Figure 3A), or computer system 301 (Figure 3B)) with a display device (e.g., a display, optionally touch-sensitive, a projector, a head-mounted display, a heads-up display, or the like, such as touch screen 112 (Figure 1A), display 340 (Figure 3A), or display generation component(s) 304 (Figure 3B)), one or more cameras (e.g., optical sensor(s) 164 (Figure 1A) or camera(s) 305 (Figure 3B)), and optionally one or more depth sensing devices, such as depth sensors (e.g., one or more depth sensors such as time-of-flight sensor 220 (Figure 2B)).
  • the method 700 is a method of displaying representations of objects identified in an environment during scanning and modeling of the environment.
  • the computer system makes it easy for the user of the computer system to identify which object in the environment the computer system has identified while scanning, and to avoid cluttering a representation of a field of view of one or more cameras of the computer system with a full-size representation of each identified object.
  • This provides improved visual feedback to the user (e.g., improved visual feedback regarding the location and/or type of objects identified by the computer system), and minimizes the number of user inputs needed to display an appropriate representation of the field of view (e.g., the user does not need to constantly adjust a rotation and/or orientation of the field of view to view portions of the representation of the field of view that may be occluded or otherwise obstructed by a full-size representation of each identified object).
  • the computer system displays (702), via the display generation component, a first user interface (e.g., a scan user interface that is displayed to show progress of an initial scan of a physical environment to build a three-dimensional model of the physical environment, a camera user interface, and/or a user interface that is displayed in response to a user’s request to perform a scan of a physical environment or to start an augmented reality session in a physical environment), wherein the first user interface includes a representation of a field of view of one or more cameras (e.g., images or video of a live feed from the camera(s), or a view of the physical environment through a transparent or semitransparent display), the representation of the field of view including a respective view of a physical environment that corresponds to a current viewpoint of a user in the physical environment (e.g., the current viewpoint of the user corresponds to a direction, position and/or vantage point from which the physical environment is being viewed by the user), and
  • device 100 displays user interface 522 which includes camera view 524 capturing a first portion of room 520.
  • the computer system displays (706), at a first time (e.g., immediately after the first object is detected, and/or before the first object is recognized as an instance of a particular object type), a first representation of the first object at a position in the representation of the field of view that corresponds to a location of the first object in the physical environment, wherein one or more spatial properties (e.g., size, length, height, and/or thickness) of the first representation of the first object have values that correspond to one or more spatial dimensions (e.g., size, length, height, and/or thickness) of the first object in the physical environment.
  • while displaying user interface 522 including camera view 524, device 100 detects cabinet 548 in the portion of room 520 that is currently in camera view 524; and in response to detecting cabinet 548, device 100 displays graphical object 580 at a location of cabinet 548 in camera view 524, wherein graphical object 580 is a bounding box with spatial properties that have values corresponding to the spatial dimensions of cabinet 548 in room 520.
  • the computer system replaces (708) display of the first representation of the first object with display of a second representation of the first object (e.g., a label, an icon, a token, and/or a short textual description) in the representation of the field of view, wherein the second representation of the first object does not spatially indicate (e.g., does not use spatial properties of the first representation of the first object to indicate) the one or more spatial dimensions (e.g., size, length, height, and/or thickness) of the first object in the physical environment.
  • after detecting cabinet 548 in room 520, device 100 further identifies cabinet 548 and displays representation 596 that identifies cabinet 548 but does not spatially indicate the spatial dimensions of cabinet 548.
  • when representation 596 is displayed at the location of cabinet 548 in camera view 524, device 100 ceases to display graphical object 580 at the location of cabinet 548 in the camera view.
  • the second representation of the first object is an icon that graphically and/or schematically specifies the object type of the first object.
  • the second representation of the first object is a textual label specifying the object type, name, and/or model number of the first object.
  • the first object is an object (e.g., a non-structural element, such as a lamp, furniture, and/or smart home devices) that is distinct from any of the structural elements (e.g., walls, ceiling, floor, door, window) in the physical environment.
  • the second representation of the first object occupies a much smaller region in the representation of the field of view than the first object and the first representation of the first object.
  • the second representation of the first object creates less visual clutter in the field of view of the one or more cameras as compared to the first representation of the first object.
  • the second representation of the first object indicates one or more spatial dimensions of the first object using non-spatial properties of the representation, such as textual content (e.g., Table-medium, or Bed-King), numerical values (e.g., 32x22x50 inches, or 20cm dia.), descriptors (e.g., largest, smallest, medium, large, and/or XXL) that do not spatially indicate the one or more spatial dimensions of the first object in the physical environment.
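The two-stage representation described above (a spatial bounding box whose properties match the object's dimensions, later replaced by a compact, non-spatial label once the object is identified) can be sketched in Swift as follows; the types and the `markIdentified` helper are illustrative assumptions, not taken from the disclosure.

```swift
import simd

// Stage 1: a spatial representation (bounding box) whose size matches the object.
// Stage 2: once the object is identified, a compact non-spatial label replaces it.
enum ObjectRepresentation {
    case boundingBox(center: simd_float3, extent: simd_float3)   // spatially indicates dimensions
    case label(text: String)                                      // does not indicate dimensions
}

struct TrackedObject {
    var representation: ObjectRepresentation
    var identifiedType: String?
}

// When identification completes, swap the spatial representation for a label.
func markIdentified(_ object: inout TrackedObject, objectType: String) {
    object.identifiedType = objectType
    object.representation = .label(text: objectType)   // e.g. "cab." for a cabinet
}
```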
  • the first representation of the first object includes (710) an outline that is displayed around a boundary of the first object in the representation of the field of view of the one or more cameras.
  • graphical object 580 that is displayed at the location of cabinet 548 that spatially represents the spatial dimensions of cabinet 548 is a three-dimensional bounding box that outlines the boundaries of cabinet 548.
  • the first representation of the first object includes a virtual outline of the first object, a two-dimensional or three-dimensional bounding box of the first object, and/or a translucent mask of the first object overlaid on a pass-through view of the first object in the representation of the field of view of the cameras (e.g., camera view, and/or a view through a transparent or semi-transparent display generation component).
  • Displaying, at the first time, the first representation of the first object that includes an outline that is displayed around a boundary of the first object in the representation of the field of view of the one or more cameras provides improved visual feedback to the user (e.g., improved visual feedback regarding the spatial dimensions of the first object).
  • the first time and the second time are (712) different time points during a scan of the physical environment that obtains depth information in the physical environment using the one or more cameras (e.g., during the scan of the physical environment, the computer system automatically updates the representation of the field of view based on movement of the one or more cameras that changes the current viewpoint of the user, augments the representation of the field of view with representations of objects that have spatial characteristics that spatially indicate the spatial dimensions of the objects, and then replaces those representations with non-spatial representations (e.g., icons, labels, or other types of non-spatial representations) as objects are gradually identified), and in the method 700: prior to the first time (e.g., after the scan has been started, and when the first object first enters the field of view of the one or more cameras): the computer system displays a first portion, less than all, of the first representation of the first object (e.g., a partial outline, and/or a partial mask or overlay displayed on the pass-through view of the first object in the representation of the field of view).
  • graphical object 580 that is initially displayed at the location of cabinet 548 includes segments 580-2 and 580-3 (e.g., as shown in Figure 5F) that extend partially along the edges of cabinet 548; and as the scan continues, graphical object 580 is updated to include segments 580-2 and 580-3 (e.g., as shown in Figure 5G) that extend along the entirety of the two front edges of cabinet 548.
  • replacing display of the first representation of the first object with display of the second representation of the first object in the representation of the field of view includes (714) fading out (e.g., reducing visual prominence, increasing translucency, and/or reducing line thickness) the first representation of the first object after the second representation of the first object is displayed, wherein the second representation of the first object identifies the first object (e.g., identifies the object type, name, model no., and/or product serial number of the first object in the representation of the field of view).
  • graphical object 580 is displayed at the location of cabinet 548 to spatially indicate the spatial dimensions of cabinet 548 (e.g., as shown in Figure 5H); after cabinet 548 is identified by device 100, graphical object 580 starts to fade out (e.g., as shown in Figure 5I) while representation 596 that does not spatially indicate the spatial dimensions of cabinet 548 is displayed at the location of cabinet 548; and later, graphical object 580 ceases to be displayed while representation 596 remains displayed at the location of cabinet 548 (e.g., as shown in Figure 5J).
  • the first representation of the first object and the second representation of the first object are concurrently displayed for a brief period of time before the first representation of the first object is removed from the representation of the field of view in the first user interface. Fading out the first representation of the first object after the second representation of the first object is displayed, wherein the second representation of the first object identifies the first object, reduces the number of inputs needed to display an appropriate representation of the first object (e.g., the user does not need to perform additional user inputs to cease displaying the first representation of the first object).
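A minimal sketch of the fade-out, assuming a hypothetical `FadeOutTransition` helper and an illustrative half-second duration: the label appears first, and the outgoing bounding box then fades while both are briefly visible.

```swift
import Foundation

// Opacity of the outgoing bounding-box representation over time (duration is an assumption).
struct FadeOutTransition {
    let duration: TimeInterval = 0.5
    let startTime: TimeInterval

    func boundingBoxOpacity(at time: TimeInterval) -> Double {
        let progress = (time - startTime) / duration
        return min(1.0, max(0.0, 1.0 - progress))   // 1.0 at start, 0.0 once the fade completes
    }
}

// Usage: while the fade is in progress, both representations are briefly visible.
let transition = FadeOutTransition(startTime: 10.0)
let opacityMidway = transition.boundingBoxOpacity(at: 10.25)   // 0.5
```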
  • the first representation of the first object is (716) displayed while the representation of the field of view in the first user interface includes a first view of the physical environment that corresponds to a first viewpoint of the user in the physical environment
  • the second representation of the first object is displayed while the representation of the field of view in the first user interface includes a second view of the physical environment that corresponds to a second viewpoint of the user in the physical environment
  • the first object is identified based, at least partially, on depth information obtained during movement of the one or more cameras that changed the current viewpoint of the user from the first viewpoint to the second viewpoint.
  • graphical object 580 is first displayed at the location of cabinet 548 before cabinet 548 is identified and while the camera view 524 includes a first portion of room 520 corresponding to a first viewpoint (e.g., as shown in Figure 5H); and after the viewpoint changes and a second portion of room 520 is included in camera view 524 (e.g., as shown in Figure 5I), more image and depth information is captured from the second portion of room 520 and cabinet 548 is identified.
  • representation 596 is displayed at the location of cabinet 548, as shown in Figure 5I.
  • the scan of the physical environment is not instant, and detection and identification of objects within a current field of view of the one or more cameras may take a finite amount of time, within which the first representation of the first object is gradually completed over time and eventually replaced by the second representation of the first object.
  • the first representation of the first object is gradually completed over time and replaced by the second representation of the first object while the field of view is continuously updated with the movement of the one or more cameras in the physical environment. Identifying the first object, at least partially, based on depth information obtained during movement of the one or more cameras, provides improved visual feedback to the user (e.g., improved visual feedback identifying the first object, as the one or more cameras are moved).
  • the second representation of the first object indicates (718) an object type (e.g., the type of furniture, the type of art piece, the style of furniture, the type of appliance, the type of smart home device, a model number of the first object, the name of the first object, and/or the type of physical object) of the first object.
  • representation 596 displayed at the location of cabinet 548 indicates an object type of cabinet using text “cab.”
  • representation 612 that is displayed at the location of TV 560 indicates the object type of TV 560 using text “TV.”
  • the representations optionally include graphics, icons, serial numbers, model number, names and/or text descriptions to indicate the object type of the identified objects. Replacing display of the first representation of the first object with display of a second representation of the first object that indicates an object type of the first object, provides improved visual feedback to the user (e.g., improved visual feedback regarding the object type of the first object).
  • the second representation of the first object includes (720) an icon or image that does not spatially indicate the one or more spatial dimensions (e.g., does not spatially indicate any of the spatial dimensions, or does not spatially indicate at least one of the spatial dimensions) of the first object.
  • the icon or image is a schematic representation that identifies the object type of the first object but does not spatially indicate the spatial dimensions of the first object.
  • representation 596 displayed at the location of cabinet 548 indicates an object type of cabinet using text “cab.”
  • the representations optionally include graphics or icons that include a schematic or stylized image of the identified object type of the identified objects (e.g., a stylized image of a cabinet, a box, or another simplified graphic that conveys the object type of cabinet 548).
  • Replacing display of the first representation of the first object with display of a second representation of the first object that includes an icon or image that does not spatially indicate the one or more spatial dimensions of the first object provides improved visual feedback to the user (e.g., improved visual feedback, conveyed through the icon or image, regarding information other than spatial dimensions (e.g., an object type)).
  • the second representation of the first object is (722) smaller than the first object (e.g., a footprint of the second representation of the first object is smaller in the first user interface than the footprint of the first object in the first user interface in all dimensions or is smaller in at least one dimension and is no larger than the first object in any other dimension).
  • representation 596 that indicates the object type of cabinet 548 is smaller than cabinet 548 and smaller than its corresponding spatial representation, graphical object 580.
  • Replacing display of the first representation of the first object with display of a second representation of the first object that is smaller than the first object provides improved visual feedback to the user (e.g., improved visual feedback, that occupies less virtual space than a full-sized representation of the first object, regarding the location and/or object type of the first object).
  • while displaying the first user interface including the representation of the field of view of the one or more cameras and including the second representation of the first object, the computer system detects (724) first movement of the one or more cameras that changes the current viewpoint of the user from a first viewpoint to a second viewpoint.
  • the computer system moves the second representation of the first object from a first position to a second position relative to the representation of the field of view, wherein the first position relative to the field of view and the second position relative to the field of view correspond to substantially the same location in the physical environment (e.g., the location of the first object, and/or the surface or plane that supports the first object).
  • the second representation of the first object is optionally turned to face toward the current viewpoint, as the current viewpoint is changed due to the movement of the one or more cameras in the physical environment.
  • representation 596 that identifies the object type of cabinet 548 is displayed at a location of cabinet 548 in camera view 524, and moves with the cabinet 548 relative to the camera view 524 as the representation of the cabinet 548 moves in accordance with the movement of the viewpoint of the user (e.g., representation 596 is moved from the left side of the camera view 524 in Figure 5P to the right side of the camera view 524 in Figure 5Q, and then to the middle of camera view 524 in Figure 5R, as the cameras move in the physical environment and change the viewpoint of the user).
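Keeping the label anchored to the same physical location as the viewpoint changes amounts to re-projecting a fixed world-space point into the camera view each frame. The following Swift sketch shows one standard way to do this (assuming a typical view/projection matrix convention); the function and parameter names are illustrative.

```swift
import simd

// Project a fixed world-space anchor (e.g., the object's location) into the
// camera view so an attached label tracks the object as the viewpoint changes.
func screenPosition(ofWorldPoint worldPoint: simd_float3,
                    viewMatrix: simd_float4x4,        // world -> camera
                    projectionMatrix: simd_float4x4,  // camera -> clip
                    viewportSize: simd_float2) -> simd_float2? {
    let world = simd_float4(worldPoint.x, worldPoint.y, worldPoint.z, 1)
    let clip = projectionMatrix * viewMatrix * world
    guard clip.w > 0 else { return nil }              // behind the camera (typical convention): not visible
    let ndc = simd_float2(clip.x, clip.y) / clip.w    // normalized device coordinates
    return simd_float2((ndc.x * 0.5 + 0.5) * viewportSize.x,
                       (1 - (ndc.y * 0.5 + 0.5)) * viewportSize.y)
}
```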
  • the computer system displays (726), at a third time (e.g., immediately after the second object is detected, and/or before the second object is recognized as an instance of a particular object type) (e.g., the third time is the same as the first time, the same as the second time, or different from the first and second times), a third representation of the second object at a position in the representation of the field of view that corresponds to a location of the second object in the physical environment, wherein one or more spatial properties (e.g., size, length, height, and/or thickness) of the third representation of the second object have values that correspond to one or more spatial dimensions (e.g., size, length, height, and/or thickness) of the second object in the physical environment.
  • the computer system replaces display of the third representation of the second object with display of a fourth representation of the second object in the representation of the field of view, wherein the fourth representation of the second object does not spatially indicate (e.g., does not use spatial properties of the fourth representation of the second object to indicate) the one or more spatial dimensions (e.g., size, length, height, and/or thickness) of the second object in the physical environment.
  • device 100 detects stool 546 in the field of view of the cameras and displays graphical object 590 at a location of stool 546 to spatially indicate spatial dimensions of stool 546 (e.g., as shown in Figures 5J and 5K); and later stool 546 is identified by device 100, and device 100 displays representation 614 at the location of stool 546 to replace graphical object 590 (as shown in Figures 5L-5M).
  • the fourth representation of the second object is an icon that graphically and/or schematically specifies the object type of the second object.
  • the fourth representation of the second object is a textual label specifying the object type, name, and/or model number of the second object.
  • the second object is an object (e.g., a non-structural element, such as a lamp, furniture, and/or smart home devices) that is distinct from any of the structural elements (e.g., walls, ceiling, floor, door, and/or window) in the physical environment.
  • the fourth representation of the second object occupies a much smaller region in the representation of the field of view than the second object and the third representation of the second object.
  • the fourth representation of the second object creates less visual clutter in the field of view of the one or more cameras as compared to the third representation of the second object.
  • the fourth representation of the second object indicates one or more spatial dimensions of the second object using non-spatial properties of the representation, such as textual content (e.g., Table-medium, or Bed-King), numerical values (e.g., 32x22x50 inches, or 20cm dia.), descriptors (e.g., largest, smallest, medium, large, and/or XXL) that do not spatially indicate the one or more spatial dimensions of the second object in the physical environment.
  • the first representation of the first object and the third representation of the second object are concurrently displayed in the first user interface.
  • the first representation of the first object and the fourth representation of the second object are concurrently displayed in the first user interface.
  • the second representation of the first object and the third representation of the second object are concurrently displayed in the first user interface. In some embodiments, the second representation of the first object and the fourth representation of the second object are concurrently displayed in the first user interface. In some embodiments, at a given moment in time, the representation of the field of view of the cameras is optionally concurrently overlaid with detailed graphical objects that spatially indicate spatial dimensions of one or more detected objects and schematic representations that do not spatially indicate spatial dimensions of one or more identified objects.
  • the representation of the field of view of the cameras is overlaid with one or more first detailed graphical objects that spatially indicate spatial dimensions of one or more detected objects and one or more first schematic objects that do not spatially indicate spatial dimensions of one or more identified objects, where at least one of the first detailed graphical objects was initially displayed earlier than at least one of the first schematic objects, and/or at least one of the first detailed graphical objects was initially displayed later than at least one of the first schematic objects.
  • the second representation of the first object and the fourth representation of the second object have the same appearance (e.g., the same icon or label is used by the computer system) if the first object and the second object are of the same object type (e.g., are different instances of the same object type). For example, if there is another cabinet in room 520, after both cabinets in room 520 are detected and identified, a representation that has the same appearance as representation 596 would be displayed at the location of the second cabinet in camera view 524.
  • representation 614 displayed in Figure 5M would have the same appearance as representation 596, because both would be representing cabinets and indicating the object type of the detected objects as “cabinet.”
  • the second representation of the first object and the fourth representation of the second object are concurrently displayed in the representation of the field of view (e.g., both objects are identified and both objects are in the field of view at the same time). Displaying the second representation of the first object and the fourth representation of the second object with the same appearance, if the first object and the second object are of the same object type, provides improved visual feedback (e.g., objects of the same type are displayed with the same appearance, making it easier to identify object of that object type).
  • the second representation of the first object and the fourth representation of the second object have (730) different appearances (e.g., different icons or labels are used by the computer system) if the first object and the second object are of different object types (e.g., are not different instances of the same object type).
  • representation 596 indicating the object type of cabinet 548 and representation 614 indicating the object type of stool 546 have different appearances in camera view 524.
  • the second representation of the first object and the fourth representation of the second object are concurrently displayed in the representation of the field of view (e.g., both objects are identified and both objects are in the field of view at the same time). Displaying the second representation of the first object and the fourth representation of the second object with different appearances, if the first object and the second object are of different object types, provides improved visual feedback (e.g., improved visual feedback regarding the object type of the first object and the second object).
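One simple way to get the "same type, same appearance; different type, different appearance" behavior is to key the label appearance on the identified object type, as in the Swift sketch below. The "cab." and "TV" strings mirror the examples in the description; the stool and lamp labels, and the enum itself, are illustrative assumptions.

```swift
// Labels are keyed by object type, so two objects of the same type get the same
// label appearance, and objects of different types get different appearances.
enum IdentifiedObjectType: Hashable {
    case cabinet, stool, television, floorLamp
}

func labelText(for type: IdentifiedObjectType) -> String {
    switch type {
    case .cabinet:    return "cab."
    case .stool:      return "stool"
    case .television: return "TV"
    case .floorLamp:  return "lamp"
    }
}
```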
  • while displaying the first user interface including the representation of the field of view of the one or more cameras and including the fourth representation of the second object, the computer system detects (732) second movement of the one or more cameras that changes the current viewpoint of the user from a third viewpoint to a fourth viewpoint (e.g., the third viewpoint is the same as the first viewpoint and the fourth viewpoint is the same as the second viewpoint, or the third viewpoint is different from the first viewpoint and the fourth viewpoint is different from the second viewpoint).
  • the computer system moves the fourth representation of the second object from a third position to a fourth position relative to the representation of the field of view, wherein the third position relative to the field of view and the fourth position relative to the field of view correspond to substantially the same location in the physical environment (e.g., the location of the second object, and/or the surface or plane that supports the second object).
  • the fourth representation of the second object is optionally turned to face toward the current viewpoint, as the current viewpoint is changed due to the movement of the one or more cameras in the physical environment.
  • representation 614 that identifies the object type of stool 546 is displayed at a location of stool 546 in camera view 524, and moves with the stool 546 relative to the camera view 524 as the representation of the stool 546 moves in accordance with the movement of the viewpoint of the user (e.g., representation 614 is moved from the left side of the camera view 524 in Figure 5P to the right side of the camera view 524 in Figure 5Q, and then to the bottom right of camera view 524 in Figure 5R, as the cameras move in the physical environment and change the viewpoint of the user).
  • the second representation of the first object changes (734) its orientation during movement of the one or more cameras that changes the current viewpoint of the user (e.g., the second representation of the first object turns and/or translates relative to the representation of the field of view during the movement of the cameras that cause the pass-through view of the first object to shift in the representation of the field of view, so that the second representation of the first object is anchored to the pass-through view of the first object and continues to face toward the current viewpoint as the current viewpoint of the user changes in the physical environment).
  • representation 614 that identifies the object type of stool 546 is displayed at a location of stool 546 in camera view 524, and moves with the stool 546 relative to the camera view 524 as the representation of the stool 546 moves in accordance with the movement of the viewpoint of the user (e.g., representation 614 is moved from the left side of the camera view 524 in Figure 5P to the right side of the camera view 524 in Figure 5Q, and then to the bottom right of camera view 524 in Figure 5R, as the cameras move in the physical environment and change the viewpoint of the user), and the orientation of representation 614 is continuously updated such that it continues to face toward the viewpoint of the user (as shown in Figures 5P-5Q).
  • Changing the orientation of the second representation of the first object during movement of the one or more cameras that changes the current viewpoint of the user reduces the number of inputs needed to display the second representation of the first object with the appropriate orientation (e.g., the user does not need to perform additional user inputs to adjust the orientation of the second representation of the first object each time the user’s current viewpoint changes (e.g., due to movement of the user and/or the one or more cameras)).
  • the first user interface concurrently includes (736) the representation of the field of view and respective representations of a plurality of objects that are detected in the physical environment, the respective representations of the plurality of objects do not spatially indicate respective physical dimensions of the plurality of objects, and the respective representations of the plurality of objects change their respective orientations to face toward the current viewpoint of the user during movement of the one or more cameras that changes the current viewpoint of the user.
  • representation 614 that identifies the object type of stool 546 is displayed at a location of stool 546 in camera view 524, and moves with the stool 546 relative to the camera view 524 as the representation of the stool 546 moves in accordance with the movement of the viewpoint of the user (e.g., representation 614 is moved from the left side of the camera view 524 in Figure 5P to the right side of the camera view 524 in Figure 5Q, and then to the bottom right of camera view 524 in Figure 5R, as the cameras move in the physical environment and change the viewpoint of the user), and the orientation of representation 614 is continuously updated such that it continues to face toward the viewpoint of the user (as shown in Figures 5P-5Q).
  • representation 596 that identifies the object type of cabinet 548 is displayed at a location of cabinet 548 in camera view 524, and moves with the cabinet 548 relative to the camera view 524 as the representation of the cabinet 548 moves in accordance with the movement of the viewpoint of the user (e.g., representation 596 is moved from the left side of the camera view 524 in Figure 5P to the right side of the camera view 524 in Figure 5Q, and then to the middle of camera view 524 in Figure 5R, as the cameras move in the physical environment and change the viewpoint of the user).
  • both representation 614 and representation 596 turn to face toward the current viewpoint of the user, as the cameras move to change the viewpoint of the user.
  • the respective representations of the objects in the field of view rotate and translate by different amounts due to the movement of the current viewpoint, so that the respective representations of the objects are respectively anchored to the pass-through view of their corresponding objects and continue to face toward the current viewpoint as the current viewpoint of the user changes in the physical environment.
  • Changing the respective orientations of the plurality of objects during movement of the one or more cameras that changes the current viewpoint of the user reduces the number of inputs needed to display the representation of the plurality of objects with the appropriate orientations (e.g., the user does not need to perform additional user inputs to adjust the orientation of each representation of the respective representations of the plurality of objects each time the user’s current viewpoint changes (e.g., due to movement of the user and/or the one or more cameras)).
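The "continues to face toward the current viewpoint" behavior is a standard billboarding technique; the Swift sketch below shows one way to compute a per-label yaw about the vertical axis so every label keeps facing the camera as the viewpoint moves. Function and parameter names are illustrative assumptions.

```swift
import Foundation
import simd

// Rotate a label about the vertical (y) axis so it faces the camera.
func billboardYaw(labelPosition: simd_float3, cameraPosition: simd_float3) -> Float {
    let toCamera = cameraPosition - labelPosition
    return atan2(toCamera.x, toCamera.z)
}

// Each frame, recompute every label's orientation from the current camera position.
func updateLabelOrientations(labelPositions: [simd_float3],
                             cameraPosition: simd_float3) -> [Float] {
    labelPositions.map { billboardYaw(labelPosition: $0, cameraPosition: cameraPosition) }
}
```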
  • the user interfaces, user interface elements, physical environments and features and objects therein, feature types, guides, animations, and annotations described above with reference to method 700 optionally have one or more of the characteristics of the user interfaces, user interface elements, physical environments and features and objects therein, feature types, guides, animations, and annotations described herein with reference to other methods described herein (e.g., methods 650, 800, and 900). For brevity, these details are not repeated here.
  • Figures 8A-8D are flow diagrams of a method 800 of providing guidance indicating the location of a missed portion of a presumably completed portion of an environment during scanning and modeling of the environment, in accordance with some embodiments.
  • Method 800 is performed at a computer system (e.g., portable multifunction device 100 (Figure 1A), device 300 (Figure 3A), or computer system 301 (Figure 3B)) with a display device (e.g., a display, optionally touch-sensitive, a projector, a head-mounted display, a heads-up display, or the like, such as touch screen 112 (Figure 1A), display 340 (Figure 3A), or display generation component(s) 304 (Figure 3B)), one or more cameras (e.g., optical sensor(s) 164 (Figure 1A) or camera(s) 305 (Figure 3B)), and optionally one or more depth sensing devices, such as depth sensors (e.g., one or more depth sensors such as time-of-flight sensor 220 (Figure 2B)).
  • the method 800 is a method of providing guidance indicating location of a missed portion of a presumably completed portion of an environment during scanning and modeling of the environment.
  • the computer system improves the efficiency of the scan. For example, the computer system can alert the user as soon as the missed portion is detected, so the user can scan the missed portion. This prevents the user from changing locations (e.g., moving away from the missed portion to scan further portions of the environment), and later having to return to the original location to scan the missed portion.
  • the computer system displays (804), via the display generation component, a first user interface (e.g., a scan user interface that is displayed to show progress of an initial scan of a physical environment to build a three-dimensional model of the physical environment, a camera user interface, and/or a user interface that is displayed in response to a user’s request to perform a scan of a physical environment or to start an augmented reality session in a physical environment), wherein the first user interface includes a representation of a field of view of one or more cameras (e.g., images or video of a live feed from the camera(s), or a view of the physical environment through a transparent or semitransparent display), the representation of the field of view including a respective view of the physical environment that corresponds to a current viewpoint of a user in the physical environment.
  • the first user interface further includes a preview of a three-dimensional model of the physical environment that is being generated based on the depth information captured by the one or more cameras.
  • user interface 522 is displayed and includes camera view 524 of room 520.
  • the computer system detects (806) movement of the one or more cameras in the physical environment, including detecting first movement that changes the current viewpoint of the user from a first viewpoint in the physical environment to a second viewpoint (e.g., the first movement includes translation from a first location to a second location distinct from the first location, away from the first location, and/or not on a looped path that starts from or passes the first location; and/or the first movement includes panning left and/or right at a fixed location) in the physical environment (e.g., including back and forth movement between the first location and the second location, including a single pass movement between the first location and the second location).
  • the cameras moved in room 520, causing the viewpoint to change from a first viewpoint to a second viewpoint.
  • the first movement is not required for triggering display of the first visual indication (described below) that prompts the user to rescan a missed portion of the physical environment between two portions of the physical environment that have been scanned (e.g., the cameras moved past the portion but did not obtain sufficient depth information for that portion of the environment).
  • in response to detecting the movement of the one or more cameras in the physical environment that includes the first movement that changes the current viewpoint of the user from the first viewpoint in the physical environment to the second viewpoint in the physical environment, in accordance with a determination that there is a respective portion of the physical environment that has not been scanned (e.g., depth information is not sufficiently obtained by the cameras for the respective portion of the physical environment) that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned (808) (e.g., the respective portion of the physical environment is passed by the cameras and presumably scanned by the user from the current viewpoint of the user, but the obtained depth information is not sufficient to generate a model of the respective portion of the environment due to occlusion by another object or structural element in the first portion, the second portion, and/or the respective portion of the physical environment from the current viewpoint(s) of the user during the scan), the computer system displays (810), in the first user interface (e.g., while the scan is ongoing, not completed), a first visual indication overlaying the representation of the field of view of the one or more cameras, wherein the first visual indication indicates a location of the respective portion of the physical environment.
  • device 100 determines that a portion of wall 530 that is visually occluded by cabinet 548 (e.g., as shown in Figures 5H and 5I) has not been scanned during the scanning of a first portion of room 520 (e.g., as shown in Figures 5E-5H) and the scanning of the second portion of room 520 (e.g., as shown in Figure 5I); and in response, device 100 displays object 604 to indicate the location of the missed portion of wall 530 and the region in front of it, so that the user can locate and scan the missed portion of room 520.
  • the determination that there is a respective portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned includes (812): a determination that first depth information that has been obtained during the first movement of the one or more cameras meets first criteria with respect to the first portion of the physical environment (e.g., a portion of the environment in the field of view corresponding to the first viewpoint) and the second portion of the physical environment (e.g., a portion of the environment corresponding to the second viewpoint).
  • the first criteria include requirements for the amount and accuracy of depth information obtained in order to generate a three-dimensional model of a scanned portion of the physical environment.
  • the scan for that portion of the physical environment is considered completed.
  • the determination that there is a respective portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned further includes (812): a determination that the first depth information indicates existence of a third portion of the physical environment between the first portion and the second portion of the physical environment.
  • the existence of a gap in the scan is predicted when there is sufficient data to generate a first portion of the model for the first portion of the physical environment and a second portion of the model for the second portion of the physical environment, but the first portion and the second portion of the model cannot be joined correctly, smoothly, and/or logically based on the scanned depth data.
  • the existence of a gap in the scan is predicted in accordance with a determination that the third portion of the physical environment entered the field of view after the first portion of the physical environment had entered the field of view and that the third portion of the physical environment had exited the field of view before the second portion of the physical environment exited the field of view.
  • the existence of a gap in the scan between the first portion of the physical environment and the second portion of the physical environment is determined in accordance with a determination that the cameras moved past the first portion of the physical environment, followed by the third portion of the physical environment, and then followed by the second portion of the physical environment.
  • the determination that there is a respective portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned further includes (812): a determination that depth information that has been obtained during the scan (e.g., during the first movement of the one or more cameras, or during all prior movement of the cameras during the scan) does not meet the first criteria with respect to the third portion of the physical environment.
  • the existence of a gap in the scan is predicted when there is insufficient data to generate a third portion of the model for the third portion of the physical environment that can join the first portion and the second portion of the model correctly, smoothly, and/or logically.
  • the existence of the gap in the scan is predicted when there is insufficient data to generate the third portion of the model for the third portion of the physical environment to a preset degree of accuracy, particularly when some depth data for the third portion of the physical environment has been obtained when the one or more cameras moved past the third portion of the physical environment during the scan (e.g., during the first movement of the one or more cameras, or during all prior movements of the cameras during the scan).
  • device 100 first scans a first portion of room 520 (e.g., as shown in Figure 5H) and generates a model for the first portion of room 520; after the scan of the first portion of room 520 is completed, the user moves the cameras to scan a second portion of room 520 (e.g., as shown in Figure 5I); after at least some of the second portion of room 520 has been modeled, device 100 determines that there is a missing portion between the first portion of room 520 and the second portion of room 520 (e.g., because the two portions of room 520 as modeled by device 100 cannot be joined satisfactorily); and as a result of these determinations, device 100 displays object 604 and object 606 to indicate the location of the missed portion of room 520 in the already scanned first portion of room 520. When objects 604 and 606 are displayed, as shown in Figure 5J, the missed portion of room 520 is not visible in the camera view 524.
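The gap heuristic described above (two bounded portions meet the completeness criteria, but the portion between them does not) can be sketched in Swift as follows; the `ScannedRegion` type, the coverage measure, and the threshold value are all illustrative assumptions rather than the disclosed criteria.

```swift
// A region counts as "missed" when it lies between two scanned regions whose
// depth coverage met the completeness criteria, while its own coverage did not.
struct ScannedRegion {
    let range: ClosedRange<Float>   // e.g. horizontal extent along a wall, in meters
    let coverage: Float             // fraction of the region with usable depth data (0...1)
}

func missedRegions(orderedRegions: [ScannedRegion],
                   completenessThreshold: Float = 0.9) -> [ScannedRegion] {
    var missed: [ScannedRegion] = []
    for index in 1..<max(orderedRegions.count - 1, 1) {
        let region = orderedRegions[index]
        let previousComplete = orderedRegions[index - 1].coverage >= completenessThreshold
        let nextComplete = orderedRegions[index + 1].coverage >= completenessThreshold
        // Flag only gaps bounded by completed portions on both sides.
        if previousComplete && nextComplete && region.coverage < completenessThreshold {
            missed.append(region)
        }
    }
    return missed
}
```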
  • the first visual indication overlaying the representation of the field of view of the one or more cameras includes (814) a graphical object (e.g., an arrow, a pointer, or another analogous user interface object) that points out the direction of the location of the respective portion of the physical environment relative to other objects in the field of view of the one or more cameras (e.g., the respective portion of the physical environment is not visible in the field of view, being hidden behind other objects in the field of view).
  • the object 604 that points out the direction of the missed portion of room 520 is an arrow that points toward the portion of wall 530 that is visually obscured by cabinet 548 in camera view 524.
  • the first visual indication includes an arrow that points toward the location of the respective portion of the physical environment. In some embodiments, the first visual indication is a pointer (e.g., a finger, a moving triangle, or another analogous user interface object) that points toward the location of the respective portion of the physical environment.
  • in response to detecting the movement of the one or more cameras in the physical environment that includes the first movement that changes the current viewpoint of the user from the first viewpoint in the physical environment to the second viewpoint in the physical environment, in accordance with the determination that there is a respective portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned, the computer system displays (816), in the first user interface (e.g., while the scan is ongoing, not completed), a second visual indication in a preview of a three-dimensional model of the physical environment (e.g., next to a model of a detected wall, a detected object, and/or a detected doorway in the preview of the three-dimensional model of the physical environment that is concurrently displayed with the representation of the field of view in the first user interface), wherein the second visual indication indicates the location of the respective portion of the physical environment in the preview of the three-dimensional model (e.g., the second visual indication is displayed at a position in the preview that corresponds to the location of the respective portion of the physical environment).
  • in addition to displaying object 604 in camera view 524 in response to determining that there is a missed portion of room 520 in the already scanned first portion of room 520, device 100 also displays object 608 in the partially completed model of room 520 in preview 568, to indicate the location of the missed portion of room 520 that needs to be scanned.
  • Displaying a second visual indication in a preview of a three-dimensional model of the physical environment, wherein the second visual indication indicates the location of the respective portion of the physical environment in the preview of the three-dimensional model, in accordance with the determination that the respective portion of the physical environment has not been scanned, provides improved visual feedback to the user (e.g., improved visual feedback regarding the location of the respective portion of the physical environment, and/or improved visual feedback that the respective portion of the physical environment has not been scanned).
  • the first visual indication and the second visual indication are (818) concurrently displayed in the first user interface.
  • object 604 and object 608 respectively indicate the location of the missed portion of room 520 in camera view 524 and in preview 568, where camera view 524 and preview 568 are both included in user interface 522.
  • displaying the first visual indication overlaying the representation of the field of view of the one or more cameras includes (820) animating the first visual indication with movements that are independent of movement of the field of view of the one or more cameras.
  • object 604 as displayed in Figures 5J and 5Q is animated to move in a manner that points out the location of the missed portion of wall 530, while the camera view 524 is updated based on movement of the one or more cameras.
  • displaying the second visual indication overlaying the preview of the three-dimensional model of the physical environment includes animating the second visual indication with movements that are independent of movement of the partially completed three-dimensional model and of the one or more cameras.
  • the animation of the first and/or second visual indication(s) draws the attention of the user toward the visual indication(s) and the location of the respective portion of the physical environment in the representation of the field of view and/or in the preview of the three-dimensional model.
  • Displaying a first visual indication that indicates a location of a third portion of the physical environment in the field of view of one or more cameras, and that is animated with movements that are independent of movement of the field of view of the one or more cameras, overlaying a representation of the field of view of the one or more cameras, in accordance with a determination that the respective portion of the physical environment has not been scanned, provides improved visual feedback to the user (e.g., improved visual feedback regarding the location of the respective portion of the physical environment, that draws the user's attention with animated movement).
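As an illustration of animating a visual indication independently of camera movement, the sketch below drives an indicator offset purely from elapsed time, so the indicator keeps moving regardless of how the field of view changes. The function name and parameter values are assumptions for illustration only.

```swift
import Foundation

/// A purely time-driven offset for an indicator anchored in world space.
/// Because the offset depends only on elapsed time, the indicator keeps
/// "beckoning" even while the camera view underneath it changes.
func indicatorBobOffset(elapsed: TimeInterval,
                        amplitude: Double = 0.05,   // meters, illustrative
                        period: TimeInterval = 1.2) -> Double {
    let phase = elapsed.truncatingRemainder(dividingBy: period) / period
    return amplitude * sin(2 * .pi * phase)
}

// Each frame, a renderer could add this offset along the arrow's pointing
// direction, independent of any camera motion in the same frame.
for frame in 0..<5 {
    print(indicatorBobOffset(elapsed: Double(frame) / 60.0))
}
```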
  • displaying the first visual indication overlaying the representation of the field of view of the one or more cameras includes (822) displaying the first visual indication at a respective position overlaying the representation of the field of view, wherein the respective position corresponds to a respective depth of the respective portion of the physical environment from the second viewpoint in the physical environment.
  • object 604 is displayed at a first depth relative to camera view 524 to indicate that the depth of the missed portion of wall 530 is the first depth in camera view 524 (e.g., as shown in Figure 5P); and object 604 is displayed at a second depth relative to camera view 524 to indicate that the depth of the missed portion of wall 530 is the second depth in camera view 524 (e.g., as shown in Figure 5Q), wherein the depth of the missed portion of wall 530 changed due to the movement of the one or more cameras.
  • the respective position of the first visual indication corresponds to a location in the physical environment that is substantially the same depth/distance away from the user as the respective portion of the physical environment that needs to be rescanned.
  • Displaying the first visual indication overlaying the representation of the field of view of the one or more cameras, including displaying the first visual indication at a respective position overlaying the representation of the field of view that corresponds to a respective depth of the respective portion of the physical environment from the second viewpoint in the physical environment, provides improved visual feedback to the user (e.g., improved visual feedback conveying depth information to the user).
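One way the depth-dependent placement described above could be realized is to scale the indicator by a simple pinhole-style factor based on the distance of the missed portion from the current viewpoint. The following Swift sketch assumes a hypothetical `indicatorScale` helper and illustrative constants; it is not the disclosed implementation.

```swift
import Foundation

/// Apparent on-screen scale of an indicator that should read as being at
/// `depth` meters from the current viewpoint (simple pinhole model).
/// `referenceDepth` is the distance at which the indicator is drawn at
/// its nominal size; both values are illustrative.
func indicatorScale(forDepth depth: Double, referenceDepth: Double = 1.0) -> Double {
    guard depth > 0.1 else { return referenceDepth / 0.1 }  // clamp very near depths
    return referenceDepth / depth
}

// As the user moves and the missed portion of the wall gets closer,
// the indicator is redrawn larger, conveying the updated depth.
print(indicatorScale(forDepth: 3.0))  // far  -> smaller
print(indicatorScale(forDepth: 0.8))  // near -> larger
```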
  • the computer system scans (824) the first portion of the physical environment during a first period of time to obtain respective depth information corresponding to the first portion of the physical environment; and the computer system scans the second portion of the physical environment during a second period of time after the first period of time to obtain respective depth information corresponding to the second portion of the physical environment, wherein the first visual indication overlaying the representation of the field of view is displayed after scanning the first portion of the physical environment and scanning the second portion of the physical environment.
  • object 604 is displayed to indicate the missed (e.g., unscanned) portion of wall 530 in a first portion of room 520 (e.g., as shown in Figure 5J), after the first portion of room 520 is scanned (e.g., as shown in Figure 5H) and after the second portion of room 520 is scanned (e.g., as shown in Figure 5I) (e.g., at least the left portion of the wall 532 is scanned in Figure 5I).
  • the first visual indication is displayed after the user has presumably finished scanning the first portion of the physical environment, the respective portion of the physical environment, and the second portion of the physical environment.
  • the first visual indication is displayed after the computer system determines that the user has finished scanning the respective portion of the physical environment and moved on to the next portion of the physical environment and requests to the user to rescan the respective portion of the physical environment, as opposed to prompting the user to keep going forward to scan a new, unscanned portion of the physical environment, or to return to an origin of the scan after scanning additional portions of the physical environment to complete a scan loop around the whole physical environment.
  • Displaying a first visual indication that indicates a location of a third portion of the physical environment in the field of view of one or more cameras, overlaying the representation of the field of view of the one or more cameras, in accordance with the determination that the respective portion of the physical environment that is between the first portion of the physical environment and the second portion of the physical environment has not been scanned, and after scanning the first portion of the physical environment and scanning the second portion of the physical environment, reduces the amount of time needed to accurately scan the physical environment and/or the amount of user movement needed to completely scan the physical environment (e.g., the computer system displays the first visual indication after the first and second portions of the physical environment are scanned, so the user is immediately alerted to re-scan the third portion of the physical environment (e.g., without proceeding to scan new, unscanned portions of the physical environment, different from the first, second, and third portions of the physical environment, which would require the user to later return to an earlier position where the user scanned the first and second portions of the physical environment)).
  • displaying the first visual indication includes (826): displaying the first graphical object at a first position relative to the representation of the field of view, wherein the first position corresponds to a first spatial region at a first depth from a current viewpoint of the user in the physical environment; and forgoing display of a respective portion of the first graphical object in accordance with a determination that a respective portion of the first spatial region is behind a first physical feature (e.g., a wall, a corner of a wall, a structural element, or a non-structural element of the physical environment) that is currently visible in the representation of the field of view of the one or more cameras, relative to the current viewpoint of the user in the physical environment.
  • if the display location of object 604 would not be visually occluded by other objects in the camera view 524, object 604 is fully displayed (e.g., as shown in Figure 5J); and if, due to the movement of the cameras and the change in the viewpoint of the user, the intended display location of object 604 would be at least partially occluded by other objects in the camera view, object 604 is displayed in a manner as if it is visually occluded by the object(s) (e.g., as shown in Figure 5Q, the tip of object 604 is not shown, and appears to be blocked by cabinet 548 in camera view 524).
  • the arrow that points out the missed portion of the physical environment may be visually occluded by one or more objects and/or structural features that are visible in the current field of view and may become visible again when the field of view continues to move.
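The occlusion behavior described above can be approximated with a per-sample depth test against the scanned scene, as in the hedged sketch below. `IndicatorSample`, `visibleSamples`, and the `sceneDepthAt` closure are hypothetical stand-ins for a depth-buffer or mesh lookup and are not part of the disclosure.

```swift
import Foundation

/// One projected sample of the arrow: its screen position (pixels) and its
/// distance from the viewpoint (meters).
struct IndicatorSample {
    var screenX: Int
    var screenY: Int
    var depth: Double
}

/// Keeps only the samples of the indicator that are not behind scanned
/// geometry. `sceneDepthAt` stands in for a depth lookup (e.g. from a depth
/// sensor or a reconstructed mesh); it is an assumption here.
func visibleSamples(of samples: [IndicatorSample],
                    sceneDepthAt: (Int, Int) -> Double) -> [IndicatorSample] {
    samples.filter { sample in
        // Draw the sample only if nothing already-visible sits in front of it.
        sample.depth <= sceneDepthAt(sample.screenX, sample.screenY)
    }
}

// Example: a cabinet at 1.5 m occludes the arrow tip placed at 2.0 m.
let tip = IndicatorSample(screenX: 100, screenY: 200, depth: 2.0)
let tail = IndicatorSample(screenX: 60, screenY: 200, depth: 1.0)
let drawn = visibleSamples(of: [tip, tail]) { _, _ in 1.5 }
print(drawn.count) // 1 -> only the tail is drawn; the tip appears blocked
```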
  • in response to detecting the movement of the one or more cameras in the physical environment that includes the first movement that changes the current viewpoint of the user from the first viewpoint in the physical environment to the second viewpoint in the physical environment, in accordance with the determination that there is a respective portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned, the computer system displays (828), in the first user interface (e.g., while the scan is ongoing, and/or not completed), a third visual indication overlaying the representation of the field of view of the one or more cameras (e.g., next to a detected wall, a detected object, and/or a detected doorway in the field of view), wherein the third visual indication indicates a location from which the respective portion of the physical environment will become visible in the field of view of the one or more cameras (e.g., the visual indication is a dot overlaying a representation of a location on the floor).
  • in response to determining that there is a missing portion of wall 530 in the already scanned first portion of room 520, device 100 displays object 606 at a location in camera view 524 to indicate the location in the physical environment from which the missed portion of wall 530 would become visible in the camera view 524.
  • Displaying a third visual indication that indicates a location from which the respective portion of the physical environment will become visible in the field of view of the one or more cameras, overlaying the representation of the field of view of the one or more cameras, in accordance with the determination that the respective portion of the physical environment has not been scanned, provides improved visual feedback to the user (e.g., improved visual feedback regarding the location where the respective portion of the physical environment can be scanned) and reduces the amount of movement needed to complete the scan of the physical environment (e.g., the user does not need to move to different locations in the physical environment to first determine if the respective portion of the physical environment can be scanned from a particular location).
  • the computer system displays (830), in the first user interface (e.g., while the scan is ongoing, and/or not completed), a fourth visual indication in a preview of a three-dimensional model of the physical environment (e.g., next to a model of a detected wall, a detected object, and/or a detected doorway in the preview of the three-dimensional model of the physical environment that is concurrently displayed with the representation of the field of view in the first user interface), wherein the fourth visual indication indicates, in the preview of the three-dimensional model, a location from which the respective portion of the physical environment will become visible in the field of view of the one or more cameras.
  • in response to determining that there is a missing portion of wall 530 in the already scanned first portion of room 520, device 100 displays object 610 at a location in the partially completed model in preview 568 to indicate the location in the physical environment from which the missed portion of wall 530 would become visible in the camera view 524.
  • Displaying a fourth visual indication that indicates a location from which the respective portion of the physical environment will become visible in the field of view of the one or more cameras, in the preview of a three-dimensional model of the physical environment, in accordance with the determination that the respective portion of the physical environment has not been scanned, provides improved visual feedback to the user (e.g., improved visual feedback regarding the location where the respective portion of the physical environment can be scanned) and reduces the amount of movement needed to complete the scan of the physical environment (e.g., the user does not need to move to different locations in the physical environment to first determine if the respective portion of the physical environment can be scanned from a particular location).
  • the third visual indication and the fourth visual indication are (832) concurrently displayed in the first user interface.
  • objects 608 and 610 are concurrently displayed in the partially completed model of room 520 in preview 568.
  • the computer system displays, via the display generation component, a dot overlaying the representation of the field of view and a dot overlaying the preview of the three-dimensional model of the physical environment, where the dot overlaying the representation of the field of view and the dot overlaying the preview of the three-dimensional model are both displayed at respective positions (e.g., in the field of view, and in the preview of the model, respectively) that correspond to the physical location from which the respective portion of the physical environment that needs to be rescanned would become visible in the field of view of the one or more cameras.
  • in response to detecting the movement of the one or more cameras in the physical environment that includes the first movement that changes the current viewpoint of the user from the first viewpoint in the physical environment to the second viewpoint in the physical environment, in accordance with the determination that there is a portion of the physical environment that has not been scanned that is between a first portion of the physical environment that has been scanned and a second portion of the physical environment that has been scanned, the computer system displays (834), in the first user interface (e.g., while the scan is ongoing, and/or not completed), one or more prompts (e.g., textual banners, pop-up windows, and/or another analogous user interface object) that guide a user to move to a location from which the respective portion of the physical environment will become visible in the field of view of the one or more cameras (e.g., the location that is indicated by the dots shown in the field of view and the preview of the three-dimensional model).
  • device 100 displays banner 602 that includes prompts (e.g., “Scan the missed spot” and “Move forward and face left”) to guide the user to move to a location from which the missed portion of wall 530 would become visible in camera view 524, so that the user can scan that missed portion of wall 530 and the region in front of it.
  • the third visual indication displayed in the preview of the three-dimensional model remains displayed when the first visual indication displayed in the representation of the field of view is no longer displayed (e.g., due to the movement of the field of view), and the prompts help to guide the user to the location from which to scan the missed portion of the physical environment without the aid of the first visual indication.
  • the one or more prompts include a prompt for the user to move farther away from the current portion of the physical environment that is in the field of view of the one or more cameras, a prompt for the user to keep moving the one or more cameras in a current direction of the movement of the one or more cameras, a prompt for the user to search for and include a plane in the physical environment in the field of view of the one or more cameras, a prompt for the user to bring the floor into the field of view of the one or more cameras, a prompt for the user to bring the ceiling into the field of view of the one or more cameras, and/or a prompt to move closer toward the current portion of the physical environment that is in the field of view of the one or more cameras.
  • these prompts are displayed to guide the user to scan new, unscanned portions of the physical environment, as well as missed and/or previously scanned portions of the physical environment that need to be rescanned.
  • Displaying one or more prompts that guide a user to move to a location from which a portion of the physical environment will become visible in the field of view of the one or more cameras, in accordance with the determination that the portion of the physical environment has not been scanned, provides improved visual feedback to the user (e.g., improved visual feedback regarding the location where the respective portion of the physical environment can be scanned).
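For illustration, a prompt such as the one in banner 602 could be chosen from the user's offset to the suggested vantage point, as in the following sketch. The thresholds, function name, and exact prompt strings are assumptions; only the "Scan the missed spot" / "Move forward and face left" wording echoes the example above.

```swift
import Foundation

/// Picks a guidance prompt from the user's offset to the suggested vantage
/// point and the turn needed once there. Thresholds and wording are
/// illustrative, not taken from the disclosure.
func guidancePrompt(forwardOffset: Double,       // meters; + means move forward
                    turnAngle: Double) -> String { // radians; + right, - left (assumed)
    var parts: [String] = ["Scan the missed spot."]
    if forwardOffset > 0.5 {
        parts.append("Move forward")
    } else if forwardOffset < -0.5 {
        parts.append("Move back")
    }
    if turnAngle < -0.3 {
        parts.append("and face left")
    } else if turnAngle > 0.3 {
        parts.append("and face right")
    }
    return parts.joined(separator: " ")
}

print(guidancePrompt(forwardOffset: 1.2, turnAngle: -0.8))
// "Scan the missed spot. Move forward and face left"
```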
  • the user interfaces, user interface elements, physical environments and features and objects therein, feature types, guides, animations, and annotations described above with reference to method 800 optionally have one or more of the characteristics of those described herein with reference to other methods described herein (e.g., methods 650, 700, and 900). For brevity, these details are not repeated here.
  • Figures 9A-9E are flow diagrams of a method 900 of displaying scan progress indication during scanning and modeling of an environment, in accordance with some embodiments.
  • Method 900 is performed at a computer system (e.g., portable multifunction device 100 (Figure 1A), device 300 (Figure 3A), or computer system 301 (Figure 3B)) with a display device (e.g., a display, optionally touch-sensitive, a projector, a head-mounted display, a heads-up display, or the like, such as touch screen 112 (Figure 1A), display 340 (Figure 3A), or display generation component(s) 304 (Figure 3B)), one or more cameras (e.g., optical sensor(s) 164 (Figure 1A) or camera(s) 305 (Figure 3B)), and optionally one or more depth sensing devices, such as depth sensors (e.g., one or more depth sensors such as time-of-flight sensor 220 (Figure 2B)).
  • the method 900 is a method of displaying scan progress indication during scanning and modeling of an environment, and indicating one or more estimated spatial properties, along with a predicted accuracy of the estimated spatial properties, of a detected physical feature.
  • the computer system makes it easy for a user of the computer system to identify when a scan is complete or incomplete. This helps reduce mistakes made during scanning that result from a user changing the field of view of one or more cameras of the computer system (e.g., away from an object being scanned) before the scan finishes.
  • This also provides improved visual feedback to the user (e.g., improved visual feedback regarding detected physical features, scan progress of the detected physical features, and predicted accuracy of estimated spatial properties of the detected physical features).
  • the computer system displays (904), via the display generation component, a first user interface (e.g., a scan user interface that is displayed to show progress of an initial scan of a physical environment to build a three-dimensional model of the physical environment, a camera user interface, or a user interface that is displayed in response to a user’s request to perform a scan of a physical environment or to start an augmented reality session in a physical environment), wherein the first user interface includes a representation of a field of view of one or more cameras (e.g., images or video of a live feed from the camera(s), or a view of the physical environment through a transparent or semitransparent display).
  • the representation of the field of view including a respective view of a physical environment that corresponds to a current viewpoint of a user in the physical environment (e.g., the current viewpoint of the user corresponds to a direction, position and/or vantage point from which the physical environment is being viewed by the user).
  • the first user interface further includes a preview of a three-dimensional model of the physical environment that is being generated based on the depth information captured by the one or more cameras.
  • the computer system displays (906) a plurality of graphical objects overlaying the representation of the field of view of the one or more cameras, including displaying at least a first graphical object at a first location that represents (e.g., spatially represents) one or more estimated spatial properties (e.g., position, orientation, and/or size estimated based on one or more sensor measurements) of a first physical feature (e.g., a first object and/or surface) that has been detected in a respective portion of the physical environment in the field of view of the one or more cameras, and a second graphical object at a second location that represents (e.g., spatially represents) one or more estimated spatial properties (e.g., position, orientation, and/or size estimated based on one or more sensor measurements) of a second physical feature (e.g., a second object and/or surface) that has been detected in the respective portion of the physical environment in the field of view of the one or more cameras.
  • graphical object 590 is displayed at a location of stool 546, where graphical object 590 represents one or more estimated spatial properties of stool 546 that have been estimated based on the captured depth information, and object 592 is displayed at a location of TV 560, where graphical object 592 represents one or more estimated spatial properties of TV 560.
  • the computer system changes (908) one or more visual properties (e.g., opacity, sharpness, and/or amount of feathering) of the first graphical object in accordance with variations in a respective predicted accuracy of the estimated spatial properties of the first physical feature, and the computer system changes the one or more visual properties (e.g., opacity, sharpness, and/or amount of feathering) of the second graphical object in accordance with variations in a respective predicted accuracy of the estimated spatial properties of the second physical feature.
  • the display properties of graphical object 590 representing the estimated spatial properties of stool 546 and the display properties of graphical object 592 representing the estimated spatial properties of TV 560 are respectively changed (e.g., extended, and/or made more solid, opaque, and/or with less feathering) in accordance with the respective changing predicted accuracies of the estimated spatial properties of stool 546 and TV 560.
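A minimal sketch of the accuracy-to-appearance mapping described above: a single predicted-accuracy value in [0, 1] is mapped monotonically to opacity, edge feathering, and sharpness. The `IndicatorStyle` type and the specific curves are illustrative assumptions, not the disclosed implementation.

```swift
import Foundation

/// Visual properties of a scan-progress graphical object (e.g. object 590 or 592).
struct IndicatorStyle {
    var opacity: Double          // 0...1
    var featherRadius: Double    // points of blur applied to edges
    var sharpness: Double        // 0...1
}

/// Maps the predicted accuracy (confidence) of a feature's estimated spatial
/// properties to a style. Higher accuracy -> more opaque, less feathered,
/// sharper. The exact curves are illustrative assumptions.
func style(forPredictedAccuracy accuracy: Double) -> IndicatorStyle {
    let a = min(max(accuracy, 0), 1)
    return IndicatorStyle(opacity: 0.3 + 0.7 * a,
                          featherRadius: 8.0 * (1 - a),
                          sharpness: a)
}

// Each feature is restyled independently as its own accuracy evolves.
print(style(forPredictedAccuracy: 0.2))  // faint, heavily feathered
print(style(forPredictedAccuracy: 0.9))  // nearly opaque, crisp
```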
  • the first graphical object includes (910) a first set of one or more lines that represents (e.g., spatially represents) the one or more estimated spatial properties (e.g., position, orientation, and/or size estimated based on one or more sensor measurements) of the first physical feature (e.g., the first object and/or surface), and the second graphical object includes a second set of one or more lines that represents (e.g., spatially represents) the one or more estimated spatial properties (e.g., position, orientation, and/or size estimated based on one or more sensor measurements) of the second physical feature (e.g., the second object and/or surface).
  • graphical object 590 includes a first set of lines that represents the estimated spatial properties of stool 546 (e.g., height and width of stool 546) and graphical object 592 includes a second set of lines that represents the estimated spatial properties of TV 560 (e.g., height and width of TV 560).
  • a first set of lines is added to the field of view and extends along the edges and/or surfaces of a first physical feature or surface when the first physical feature is within the field of view of the cameras; and a second set of lines is added to the field of view and extends along the edges and/or surfaces of a second physical feature when the second physical feature is within the field of view of the cameras.
  • the first object and the second object may enter the field of view at different points in time, remain concurrently visible in the field of view for some time, and/or may exit the field of view at different points in time.
  • Displaying a first graphical object that includes a first set of one or more lines that represents one or more estimated spatial properties of the first physical feature, and displaying a second graphical object that includes a second set of one or more lines that represents one or more estimated spatial properties of the second physical feature provides improved visual feedback to the user (e.g., improves visual feedback regarding the estimated spatial properties of the first physical feature and second physical feature).
  • displaying the first graphical object includes extending respective lengths of the first set of one or more lines (e.g., with speed(s) that are selected) in accordance with the respective predicted accuracy (e.g., an initial predicted accuracy, and/or an average predicted accuracy) of the one or more estimated spatial properties of the first physical feature.
  • graphical object 580 is displayed at the location of cabinet 548 to represent the estimated spatial properties of cabinet 548, and segments of graphical object 580 are extended at a faster speed when the predicted accuracies of the estimated spatial properties are low (e.g., as shown in Figure 5F, faster line drawing around cabinet 548 in the beginning of the scan), and are extended at a lower speed when the predicted accuracies of the estimated spatial properties are high (e.g., as shown in Figure 5G, slower line drawing around cabinet 548 as scan continues).
  • displaying the second graphical object includes extending respective lengths of the second set of one or more lines with speed(s) that are selected in accordance with the respective predicted accuracy (e.g., an initial predicted accuracy, and/or an average predicted accuracy) of the one or more estimated spatial properties of the second physical feature.
  • the predicted accuracies of the one or more estimated spatial properties of the different physical features are not the same, do not change with the same rate, and/or do not change at the same time; and as a result, the speeds with which the first set of lines and the second set of lines, and/or the respective lines within the first set and/or second set of lines are drawn are not the same at a given moment in time.
  • the rates of extending the respective lengths of the first set of one or more lines are based on (e.g., proportional to, and/or positively correlated to) the predicted accuracy of the one or more estimated spatial properties of the first physical feature.
  • the rates of extending the respective lengths of the second set of one or more lines is based on (e.g., proportional to, and/or positively correlated to) the predicted accuracy of the one or more estimated spatial properties of the second physical feature.
  • the speed(s) with which the first set of lines are extended increase over time, as the respective predicted accuracy of the one or more estimated spatial properties of the first physical feature increases over time as the scan progresses and more depth information is obtained and processed.
  • the speed(s) with which the second set of lines are extended increase over time, as the respective predicted accuracy of the one or more estimated spatial properties of the second physical feature increases over time as the scan progresses and more depth information is obtained and processed.
  • Displaying a first graphical object, including extending respective lengths of a first set of one or more lines that represent one or more estimated spatial properties of the first physical feature, in accordance with the respective predicted accuracy of the one or more estimated spatial properties of the first physical feature, provides improved visual feedback to the user (e.g., improves visual feedback regarding the predicted accuracy of estimated spatial properties of the first physical feature).
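The line-growth behavior can be sketched as advancing a drawn fraction of each line per frame at a rate tied to the current predicted accuracy. The sketch below follows the variant described above in which the rate is positively correlated with accuracy; the rate constants and function name are assumptions.

```swift
import Foundation

/// Advances the drawn fraction (0...1) of an edge line by an amount that is
/// positively correlated with the current predicted accuracy, so lines grow
/// faster as confidence in the underlying edge increases. Rates are illustrative.
func advanceDrawnFraction(current: Double,
                          predictedAccuracy: Double,
                          dt: TimeInterval,
                          baseRate: Double = 0.2,   // fraction per second
                          maxRate: Double = 1.0) -> Double {
    let rate = baseRate + (maxRate - baseRate) * min(max(predictedAccuracy, 0), 1)
    return min(1.0, current + rate * dt)
}

// A low-confidence edge creeps; a high-confidence edge completes quickly.
var slow = 0.0, fast = 0.0
for _ in 0..<30 {   // ~0.5 s at 60 fps
    slow = advanceDrawnFraction(current: slow, predictedAccuracy: 0.1, dt: 1.0 / 60.0)
    fast = advanceDrawnFraction(current: fast, predictedAccuracy: 0.9, dt: 1.0 / 60.0)
}
print(slow, fast)
```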
  • the first graphical object includes (914) a first filled area that represents (e.g., spatially represents) the one or more estimated spatial properties (e.g., position, orientation, and/or size estimated based on one or more sensor measurements) of the first physical feature (e.g., the first object and/or surface), and the second graphical object includes a second filled area that represents (e.g., spatially represents) the one or more estimated spatial properties (e.g., position, orientation, and/or size estimated based on one or more sensor measurements) of the second physical feature (e.g., the second object and/or surface).
  • a first overlay is displayed at the location of wall 530 in Figure 5F to represent the estimated spatial properties of wall 530
  • a second overlay is displayed at the location of the surfaces of cabinet 548 to represent the estimated spatial properties of the surfaces of cabinet 548.
  • displaying the first graphical object includes expanding the first fill area in accordance with the respective predicted accuracy (e.g., an initial predicted accuracy, and/or an average predicted accuracy) of the one or more estimated spatial properties of the first physical feature.
  • displaying the second graphical object includes expanding the second fill area in accordance with the respective predicted accuracy (e.g., an initial predicted accuracy, and/or an average predicted accuracy) of the one or more estimated spatial properties of the second physical feature.
  • the respective predicted accuracies of the one or more estimated spatial properties of the different physical features are not the same, do not change with the same rate, and/or do not change at the same time; and as a result, the speeds with which the first fill area and the second fill area are expanded are not the same at a given moment in time.
  • the rates of expanding the first fill area are based on (e.g., proportional to, and/or positively correlated to) the predicted accuracy of the one or more estimated spatial properties of the first physical feature.
  • the rates of expanding the second fill area are based on (e.g., proportional to, and/or positively correlated to) the predicted accuracy of the one or more estimated spatial properties of the second physical feature.
  • Displaying a first graphical object that includes a first filled area that represents the one or more estimated spatial properties of the first physical feature, and displaying a second graphical object that includes a second filled area that represents the one or more estimated spatial properties of the second physical feature provides improved visual feedback to the user (e.g., improves visual feedback regarding the estimated spatial properties of the first physical feature and second physical feature).
  • changing the one or more visual properties of the first graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the first physical feature includes changing a respective opacity of the first graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature.
  • an overlay is displayed at a location of wall 530 to represent the estimated spatial properties of wall 530 in Figure 5F, and as the predicted accuracies of the estimated spatial properties of wall 530 change during the scan of wall 530, device 100 changes the opacity of the overlay that is displayed at the location of wall 530.
  • changing the one or more visual properties of the second graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the second physical feature includes changing a respective opacity of the second graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the second physical feature.
  • the predicted accuracies of the one or more estimated spatial properties of the different physical features do not change with the same rate, and/or do not change at the same time; and as a result, the respective opacities and/or the rate of changes in the respective opacities of the first graphical object and the second graphical object are not the same at a given moment in time.
  • the rate of changing the opacity of the first graphical object is based on (e.g., proportional to, and/or positively correlated to) the predicted accuracy of the one or more estimated spatial properties of the first physical feature.
  • the rate of changing the opacity of the second graphical object is based on (e.g., proportional to, and/or positively correlated to) the respective predicted accuracy of the one or more estimated spatial properties of the second physical feature.
  • Changing a respective opacity of the first graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature provides improved visual feedback to the user (e.g., improved visual feedback regarding changes to the predicted accuracy of the estimated spatial properties of the first physical feature).
  • changing the one or more visual properties of the first graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the first physical feature includes (918) changing a respective amount of feathering (e.g., computer-generated smoothing and/or blur) applied to edges of the first graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature.
  • graphical object 580 that is displayed to represent the estimated spatial properties of cabinet 548 is displayed with different amounts of feathering along different segments of graphical object 580 and/or as the scan of cabinet 548 progresses, where the amount of feathering that is applied is based on the predicted accuracies of the estimated spatial properties of different portions of the cabinet 548 and/or at different times during the scan.
  • changing the one or more visual properties of the second graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the second physical feature includes changing a respective amount of feathering applied to edges of the second graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the second physical feature.
  • the respective predicted accuracies of the one or more estimated spatial properties of the different physical features do not change with the same rate, and/or do not change at the same time; and as a result, the respective amounts of feathering and/or the rate of changes in the respective amounts of feathering applied to the edges of the first graphical object and the second graphical object are not the same at a given moment in time.
  • the rate of changing the amount of feathering applied to the first graphical object is based on (e.g., proportional to, and/or positively correlated to) the respective predicted accuracy of the one or more estimated spatial properties of the first physical feature. In some embodiments, the rate of changing the amount of feathering applied to the edges of the second graphical object is based on (e.g., proportional to, and/or positively correlated to) the respective predicted accuracy of the one or more estimated spatial properties of the second physical feature.
  • Changing a respective amount of feathering applied to edges of the first graphical object, in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature, provides improved visual feedback to the user (e.g., improved visual feedback regarding changes to the predicted accuracy of the estimated spatial properties of the first physical feature).
  • changing the respective amount (e.g., magnitude and/or radius) of feathering (e.g., computer-generated smoothing and/or blur) applied to edges of the first graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature includes (920): in accordance with a determination that scanning of a corner corresponding to the first graphical object meets first criteria (e.g., two or more detected edges of the first physical feature meet at a corner, detected edges of two or more physical features meet at a corner, indicating consistencies and higher confidence in the detected edges), the computer system decreases the respective amount of feathering applied to the edges of the first graphical object (e.g., reducing the amount of feathering due to increased predicted accuracy in the estimated spatial properties of the first physical feature); and in accordance with a determination that scanning of the corner corresponding to the first graphical object has not met the first criteria (e.g., two or more detected edges of the first physical feature failed to meet at a corner), the computer system increases the respective amount of feathering applied to the edges of the first graphical object.
  • graphical objects 572, 578, and 574 are displayed respectively along those detected edges. Initially, graphical objects 572, 578, and 574 are displayed with a greater amount of feathering due to the lower predicted accuracies of the estimated spatial properties of the edges (e.g., as shown in Figures 5E and 5F).
  • increasing the respective amount of feathering and decreasing the respective amount of feathering in accordance with the first criteria are (922) executed in accordance with a determination that the first graphical object includes a structural object (e.g., a wall, a floor, and/or a ceiling) and not a non-structural object (e.g., not furniture, not an appliance, and not other types of non-structural elements of the physical environment).
  • the change in the amount of feathering when a corner is detected applies to the edges between wall 530, wall 532, and ceiling 538 in Figure 5G, but does not apply to the detection of a corner between different faces of cabinet 548.
  • Changing a respective amount of feathering applied to edges of the first graphical object, in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature, and in accordance with a determination that the first graphical object includes a structural object and not a non-structural object, provides improved visual feedback to the user (e.g., improved visual feedback regarding changes to the predicted accuracy of the estimated spatial properties of the first physical feature).
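A hedged sketch of the corner-driven feathering update described above: feathering is reduced when detected edges meet at a corner and is otherwise maintained or increased, and the corner rule is applied only to structural features. The `FeatureKind` enum and the numeric factors are illustrative assumptions.

```swift
import Foundation

enum FeatureKind { case structural, nonStructural }   // e.g. wall/floor vs. furniture

/// Updates the feathering radius applied to a graphical object's edges when
/// a corner check runs. The corner-driven decrease is applied only to
/// structural features (e.g. walls and ceilings); values are illustrative.
func updatedFeatherRadius(current: Double,
                          cornerDetected: Bool,
                          kind: FeatureKind) -> Double {
    guard kind == .structural else { return current }     // rule not applied
    if cornerDetected {
        return max(0, current * 0.5)    // edges agree at a corner -> crisper lines
    } else {
        return min(12, current * 1.25)  // no corner yet -> keep/raise uncertainty blur
    }
}

print(updatedFeatherRadius(current: 8, cornerDetected: true, kind: .structural))     // 4
print(updatedFeatherRadius(current: 8, cornerDetected: true, kind: .nonStructural))  // 8
```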
  • changing the one or more visual properties of the first graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the first physical feature includes (924) changing a respective sharpness (e.g., resolution, contrast, focus, and/or acutance) of the first graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature.
  • graphical object 580 that is displayed to represent the estimated spatial properties of cabinet 548 is displayed with different levels of sharpness along different segments of graphical object 580 and/or as the scan of cabinet 548 progresses, where the levels of sharpness that are used are based on the predicted accuracies of the estimated spatial properties of different portions of the cabinet 548 and/or at different times during the scan.
  • changing the one or more visual properties of the second graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the second physical feature includes changing a respective sharpness (e.g., resolution, contrast, focus, and/or acutance) of the second graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the second physical feature.
  • the respective predicted accuracies of the one or more estimated spatial properties of the different physical features do not change with the same rate, and/or do not change at the same time; and as a result, the respective sharpness and/or the rate of changes in the respective sharpness of the first graphical object and the second graphical object are not the same at a given moment in time.
  • the rate of changing the sharpness of the first graphical object is based on (e.g., proportional to, and/or positively correlated to) the respective predicted accuracy of the one or more estimated spatial properties of the first physical feature.
  • the rate of changing the sharpness of the second graphical object is based on (e.g., proportional to, and/or positively correlated to) the respective predicted accuracy of the one or more estimated spatial properties of the second physical feature.
  • Changing a respective sharpness of the first graphical object in accordance with changes in the respective predicted accuracy of the estimated spatial properties of the first physical feature provides improved visual feedback to the user (e.g., improved visual feedback regarding changes to the predicted accuracy of the estimated spatial properties of the first physical feature).
  • changing the one or more visual properties of the first graphical object in accordance with variations in the respective predicted accuracy of the estimated spatial properties of the first physical feature includes (926): at a first time: in accordance with a determination that the respective predicted accuracy of the estimated spatial properties of the first physical feature is a first accuracy value (e.g., 30% confidence, and/or x error range) for a first portion of the first physical feature, displaying a first portion of the first graphical object (e.g., the portion of the first graphical object that corresponds to the first portion of the first physical feature) with a first property value for a first visual property of the one or more visual properties (e.g., a first opacity value, a first amount of feathering, and/or a first line thickness), and at a second time later than the first time: in accordance with a determination that the respective predicted accuracy of the estimated spatial properties of the first physical feature is a second accuracy value (e.g., 50% confidence, and/or 0.5x error range) for the first portion of the first physical feature, displaying the first portion of the first graphical object with a second property value, different from the first property value, for the first visual property of the one or more visual properties.
  • graphical object 580 that is displayed to represent the estimated spatial properties of cabinet 548 is displayed with different values for a set of display properties, where the values change over time as the scan of cabinet 548 progresses and the predicted accuracies of the estimated spatial properties of cabinet 548 change over time during the scan.
  • the computer system displays (928) a third portion of the first graphical object (e.g., the portion of the first graphical object that corresponds to the third portion of the first physical feature) with a third property value for a second visual property of the one or more visual properties, and the computer system displays a fourth portion of the first graphical object (e.g., the portion of the first graphical object that corresponds to the fourth portion of the first physical feature) with a fourth property value for the second visual property of the one or more visual properties, wherein the fourth portion of the first physical feature is different from the third portion of the first physical feature, the fourth accuracy value is different from the third accuracy value, and the fourth property value is different from the third property value.
  • graphical object 580 that is displayed to represent the estimated spatial properties of cabinet 548 is displayed with different values for a set of display properties along different segments of graphical object 580, where different values are selected based on the predicted accuracies of the estimated spatial properties of different portions of the cabinet 548 at a given time.
  • the values of a respective visual property are not uniform across the entirety of the first graphical objects, because the values of the predicted accuracy for an estimated spatial property of the first physical feature are not uniform across the entirety of the first physical feature at any given time during the scan.
  • the values of the respective visual property for different portions of the first graphical object continue to change in accordance with the values of the predicted accuracy of the respective spatial property for the different portions of the first physical feature.
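To illustrate how a visual property can vary across a single graphical object, the sketch below assigns each segment of the object its own opacity derived from its own predicted accuracy, so the values are non-uniform across the object and are updated independently over time. The `Segment` type and linear mapping are assumptions for illustration only.

```swift
import Foundation

/// One segment of a graphical object (e.g. one stretch of the outline of
/// cabinet 548) with its own predicted accuracy.
struct Segment {
    var predictedAccuracy: Double   // 0...1, may differ per segment
}

/// Per-segment opacities: the visual property is deliberately non-uniform
/// across the object because accuracy is non-uniform across the feature.
/// The linear mapping is an illustrative assumption.
func segmentOpacities(for segments: [Segment]) -> [Double] {
    segments.map { 0.3 + 0.7 * min(max($0.predictedAccuracy, 0), 1) }
}

// Later in the scan the same segments are re-evaluated with new accuracies,
// and their opacities are updated independently.
let early = [Segment(predictedAccuracy: 0.2), Segment(predictedAccuracy: 0.6)]
let later = [Segment(predictedAccuracy: 0.7), Segment(predictedAccuracy: 0.9)]
print(segmentOpacities(for: early))
print(segmentOpacities(for: later))
```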
  • the first physical feature includes (930) a fifth portion of the first physical feature and a sixth portion of the first physical feature, the fifth portion of the first physical feature is not visually occluded by another object in the field of view of the one or more cameras, and the sixth portion of the first physical feature is visually occluded by another object in the field of view of the one or more cameras, and displaying the first graphical object includes: displaying a fifth portion of the first graphical object corresponding to the fifth portion of the first physical feature with a fifth property value that corresponds to a fifth accuracy value of the respective predicted accuracy of the one or more estimated spatial properties of the first physical feature, and displaying a sixth portion of the first graphical object corresponding to the sixth portion of the first physical feature with a sixth property value corresponding to a sixth accuracy value of the respective predicted accuracy of the one or more estimated spatial properties of the first physical feature, wherein the sixth property value corresponds to a lower visibility than the fifth property value does in the first user interface.
  • the portion of the graphical object corresponding to the portion of the physical feature that is behind other physical objects is displayed with visual property values for the one or more visual properties that correspond to lower predicted accuracies for the one or more estimated spatial properties.
  • graphical object 632 is displayed to represent the spatial properties of the edge between wall 534 and floor 540, and the two end portions of graphical object 632 that correspond to portions of the edge that are not obscured by couch 552 and side table 554 are displayed with higher visibility, as compared to the middle portion of graphical object 632 that corresponds to a portion of the edge that is obscured by couch 552 and side table 554.
  • in accordance with a determination that scanning of the first physical feature is completed (e.g., the respective predicted accuracy of the estimated spatial properties of the first physical feature meets a preset threshold estimated accuracy, and/or the amount of information that has been obtained for the first physical feature exceeds a threshold amount of information), the computer system displays a respective change in the one or more visual properties of the first graphical object (e.g., displaying a respective animation such as a sudden increase followed by a decrease of luminance, and/or an increase followed by a decrease of opacity) to indicate completion of the scan for the first physical feature, and ceases to change the one or more visual properties of the first graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the first physical feature.
  • graphical object 580 is displayed to represent the estimated spatial properties of cabinet 548, and values of one or more display properties of graphical object 580 change based on the changes in the predicted accuracies of the estimated spatial properties of cabinet 548; and in Figure 5H, after the scan of cabinet 548 is completed, a visual effect or animated change 586 is displayed to indicate the completion of the scan for cabinet 548 and a final state of graphical object 580 is displayed.
  • Displaying a respective change in the one or more visual properties of the first graphical object to indicate completion of the scan for the first physical feature, and ceasing to change the one or more visual properties of the first graphical object in accordance with the variations in the respective predicted accuracy of the estimated spatial properties of the first physical feature provides improved visual feedback to the user (e.g., improved visual feedback that the computer system has completed the scan for the first physical feature).
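The completion behavior described above can be sketched as a small state machine: styling tracks predicted accuracy while scanning, and once a completion threshold is crossed a brief highlight pulse is scheduled and further accuracy-driven changes stop. The threshold, pulse duration, and type names are illustrative assumptions.

```swift
import Foundation

/// Minimal state for one scanned feature's progress indicator. Once the
/// predicted accuracy crosses `completionThreshold`, a short highlight pulse
/// is scheduled and the accuracy-driven styling is frozen. Threshold and
/// pulse duration are illustrative.
struct FeatureIndicatorState {
    var opacity: Double = 0.3
    var isComplete = false
    var pulseRemaining: TimeInterval = 0

    mutating func update(predictedAccuracy: Double,
                         dt: TimeInterval,
                         completionThreshold: Double = 0.95) {
        if isComplete {
            pulseRemaining = max(0, pulseRemaining - dt)   // play out the pulse, then stop
            return                                          // no further accuracy-driven changes
        }
        opacity = 0.3 + 0.7 * min(max(predictedAccuracy, 0), 1)
        if predictedAccuracy >= completionThreshold {
            isComplete = true
            pulseRemaining = 0.4   // seconds of highlight pulse (cf. animated change 586)
        }
    }
}

var state = FeatureIndicatorState()
state.update(predictedAccuracy: 0.6, dt: 1.0 / 60.0)
state.update(predictedAccuracy: 0.97, dt: 1.0 / 60.0)   // triggers the completion pulse
print(state.isComplete, state.pulseRemaining)
```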
  • displaying the respective change in the one or more visual properties of the first graphical object includes (934): in accordance with a determination that the first physical feature is of a first feature type (e.g., a line, an edge, or another feature type), displaying a first type of change in the one or more visual properties of the first graphical object (e.g., changing from a line with feathering to a solid line) to indicate completion of scanning of the first physical feature; and in accordance with a determination that the first physical feature is of a second feature type (e.g., a surface, a plane, or another feature type) different from the first feature type, displaying a second type of change, different from the first type of change, in the one or more visual properties of the first graphical object (e.g., displaying a sudden increase of intensity followed by a decrease of intensity) to indicate completion of scanning of the first physical feature.
  • in Figure 5H, when the scan of cabinet 548 is completed, a first animated change is applied to the lines of graphical object 580; and when the scan of wall 530 is completed, a second animated change is applied to an overlay applied to wall 530, because cabinet 548 and wall 530 are two different types of physical features (e.g., edges vs. surface).
  • the first graphical object includes (936) a set of one or more lines and displaying the first type of change in the one or more visual properties of the first graphical object to indicate completion of the scan of the first physical feature includes reducing an amount (e.g., magnitude and/or radius) of feathering (e.g., switching from displaying the set of one or more lines with feathering to displaying a set of solid lines).
  • Displaying a first type of change, including reducing an amount of feathering, in the one or more visual properties of the first graphical object to indicate completion of the scan of the first physical feature, in accordance with a determination that the first physical feature is of a first feature type, and displaying a second type of change, different from the first type of change, in the one or more visual properties of the first graphical object to indicate completion of the scan of the first physical feature, in accordance with a determination that the first physical feature is of a second feature type different from the first feature type, provides improved visual feedback to the user (e.g., improved visual feedback regarding the feature type of the first physical feature, and improved visual feedback that the computer system has completed the scan of the first physical feature).
  • the first graphical object includes (938) a surface, and displaying the second type of change in the one or more visual properties of the first graphical object to indicate completion of the scan of the first physical feature includes displaying a preset change sequence in one or more visual properties (e.g., intensity, luminance, brightness, opacity, and/or color) in the surface.
  • an overlay applied to the detected surfaces of cabinet 548 is changed by increasing luminance and then decreasing luminance of the overlay.
  • an overlay applied to the detected surface of wall 530 is changed by increasing luminance and then decreasing luminance of the overlay.
  • the computer system detects (940) that scanning of the first physical feature is completed (e.g., the predicted accuracy of the estimated spatial properties of the first physical feature meets a preset threshold accuracy, and/or the amount of information that has been obtained for the first physical feature exceeds a threshold amount of information).
  • the computer system reduces visual prominence of the first graphical object from a first visibility level to a second visibility level lower than the first visibility level. For example, in Figure 5H, after the detection of cabinet 548 is completed, graphical object 580 that indicates the estimated spatial properties of cabinet 548 is displayed with reduced visibility as compared to graphical object 580 as initially displayed (e.g., in Figure 5G).
  • when a graphical object is initially displayed in the first user interface to show the progress of the scan of a corresponding physical feature, the graphical object is displayed with an enhanced visibility (e.g., greater luminance, and/or with a greater line thickness) to alert the user to which region in the physical environment is being scanned (e.g., to guide the user to focus the field of view on that region of the physical environment); and as the scan continues, the graphical object is displayed with reduced visibility as compared to its initial appearance to guide the user to move the field of view onto newer portions of the physical environment (e.g., glowing lines around the object fade after the object has been detected).
  • Reducing visual prominence of the first graphical object from a first visibility level to a second visibility level lower than the first visibility level in response to detecting that scanning of the first physical feature is completed, provides improved visual feedback to the user (e.g., improved visual feedback that the computer system has completed the scan of the first physical feature).
  • the user interfaces, user interface elements, physical environments and features and objects therein, feature types, annotation modes, and mode indications described above with reference to method 900 optionally have one or more of the characteristics of the user interfaces, user interface elements, physical environments and features and objects therein, feature types, annotation modes, and mode indications described herein with reference to other methods described herein (e.g., methods 650, 700, and 800). For brevity, these details are not repeated here.
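
The type-dependent completion feedback described in the list above reduces to a simple branch: the visual change is chosen from the detected feature's type, and for surface-bearing features a preset rise-then-fall luminance sequence is played on the overlay. The following Swift sketch is illustrative only; the enum cases, the ScanOverlay model, and the keyframe values are assumptions and are not taken from the patent or from any Apple framework.

```swift
import Foundation

// Hypothetical feature categories; the patent only distinguishes a
// "first feature type" from a "second feature type" (the latter being
// surface-bearing features such as wall 530 or cabinet 548).
enum FeatureType {
    case firstType    // completion indicated by reducing feathering
    case secondType   // completion indicated by a preset luminance sequence
}

// Hypothetical model of the graphical object overlaid on a detected feature.
struct ScanOverlay {
    var feathering: Double = 1.0   // softness of the overlay outline, 0...1
    var luminance: Double = 0.5    // brightness of the overlay fill, 0...1
}

// Keyframes of the preset change sequence: luminance rises, then falls.
let luminancePulse: [Double] = [0.5, 0.75, 1.0, 0.75, 0.5]

// Applies the type-dependent change that indicates completion of the scan
// of a physical feature, returning the sequence of visual states to animate.
func indicateScanCompletion(for featureType: FeatureType,
                            overlay: inout ScanOverlay) -> [ScanOverlay] {
    var keyframes: [ScanOverlay] = []
    switch featureType {
    case .firstType:
        // First type of change: reduce the amount of feathering so the
        // outline of the overlay becomes crisp.
        overlay.feathering = 0.0
        keyframes.append(overlay)
    case .secondType:
        // Second type of change: preset sequence in a visual property of the
        // surface — increase luminance, then decrease it.
        for level in luminancePulse {
            overlay.luminance = level
            keyframes.append(overlay)
        }
    }
    return keyframes
}
```

In a real user interface the returned keyframes would drive an animation; here they simply record the sequence of visual states that the overlay passes through.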
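
A second sketch covers the completion test and the subsequent reduction in visual prominence. The threshold values and the target visibility level are placeholders; the patent only requires a preset threshold accuracy, a threshold amount of information, and a second visibility level lower than the first.

```swift
import Foundation

// Hypothetical per-feature scan state.
struct FeatureScanState {
    var predictedAccuracy: Double        // predicted accuracy of the estimated spatial properties, 0...1
    var informationCollected: Double     // amount of information obtained so far (arbitrary units)
    var overlayVisibility: Double = 1.0  // current visibility level of the graphical object
}

// Placeholder thresholds for illustration.
let accuracyThreshold = 0.9
let informationThreshold = 100.0

// Scanning of the feature is treated as completed when either condition holds.
func scanIsCompleted(_ state: FeatureScanState) -> Bool {
    return state.predictedAccuracy >= accuracyThreshold ||
        state.informationCollected >= informationThreshold
}

// On completion, drop the graphical object from its initial, enhanced
// visibility (used to show which region is being scanned) to a lower
// visibility level, nudging the user toward not-yet-scanned regions.
func updateOverlayVisibility(_ state: inout FeatureScanState) {
    if scanIsCompleted(state) {
        state.overlayVisibility = min(state.overlayVisibility, 0.35)
    }
}
```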

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A computer system displays a preview of a three-dimensional model of a physical environment that includes a partially completed three-dimensional model of the physical environment displayed with a first orientation that corresponds to a first viewpoint of a user. The computer system detects first movement that changes a current viewpoint of the user in the physical environment to a second viewpoint and updates the preview of the three-dimensional model, including adding additional information to the partially completed three-dimensional model and rotating the partially completed three-dimensional model to a second orientation. While displaying a second view of the physical environment that corresponds to the second viewpoint, the computer system, in response to detecting a first input, updates the preview of the three-dimensional model in the first user interface, including rotating the partially completed three-dimensional model to a third orientation that does not correspond to the second viewpoint of the user.
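
The orientation behavior in the abstract can be illustrated with a short sketch: the preview's orientation tracks the user's viewpoint while the user moves through the physical environment, but a direct input on the preview rotates the partially completed model to an orientation that no longer corresponds to the current viewpoint. Reducing orientation to a single yaw angle and the function names used here are simplifying assumptions for illustration, not part of the claimed system.

```swift
import Foundation

// Simplified preview state: orientation is reduced to a single yaw angle (radians).
struct ModelPreview {
    var orientation: Double            // current orientation of the partially completed 3D model
    var followsViewpoint: Bool = true  // whether the orientation tracks the user's viewpoint
}

// Called when the user's current viewpoint changes (e.g., the user moves to a
// second viewpoint): additional scan information is added elsewhere, and while
// the preview follows the viewpoint, the model rotates to the matching orientation.
func viewpointChanged(to viewpointYaw: Double, preview: inout ModelPreview) {
    if preview.followsViewpoint {
        preview.orientation = viewpointYaw   // second orientation corresponds to the second viewpoint
    }
}

// Called when a first input (e.g., a rotation gesture on the preview) is detected:
// the model rotates to a third orientation that does not correspond to the viewpoint.
func previewRotationInput(by deltaYaw: Double, preview: inout ModelPreview) {
    preview.followsViewpoint = false
    preview.orientation += deltaYaw
}
```
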
PCT/US2023/021563 2022-05-10 2023-05-09 Systèmes, procédés et interfaces utilisateur graphiques pour balayage et modélisation d'environnements WO2023220071A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263340444P 2022-05-10 2022-05-10
US63/340,444 2022-05-10
US18/144,746 2023-05-08
US18/144,746 US20230368458A1 (en) 2022-05-10 2023-05-08 Systems, Methods, and Graphical User Interfaces for Scanning and Modeling Environments

Publications (2)

Publication Number Publication Date
WO2023220071A2 true WO2023220071A2 (fr) 2023-11-16
WO2023220071A3 WO2023220071A3 (fr) 2024-01-11

Family

ID=86732306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/021563 WO2023220071A2 (fr) 2022-05-10 2023-05-09 Systèmes, procédés et interfaces utilisateur graphiques pour balayage et modélisation d'environnements

Country Status (1)

Country Link
WO (1) WO2023220071A2 (fr)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10559086B1 (en) * 2015-05-15 2020-02-11 4DMobile, LLC System for volume dimensioning via holographic sensor fusion
US20190221035A1 (en) * 2018-01-12 2019-07-18 International Business Machines Corporation Physical obstacle avoidance in a virtual reality environment
US11227446B2 (en) * 2019-09-27 2022-01-18 Apple Inc. Systems, methods, and graphical user interfaces for modeling, measuring, and drawing using augmented reality
EP4254147A3 (fr) * 2020-02-03 2023-12-06 Apple Inc. Systèmes, procédés et interfaces utilisateur graphiques pour annoter, mesurer et modéliser des environnements
US11003308B1 (en) * 2020-02-03 2021-05-11 Apple Inc. Systems, methods, and graphical user interfaces for annotating, measuring, and modeling environments
US11727650B2 (en) * 2020-03-17 2023-08-15 Apple Inc. Systems, methods, and graphical user interfaces for displaying and manipulating virtual objects in augmented reality environments
US20220091722A1 (en) * 2020-09-23 2022-03-24 Apple Inc. Devices, methods, and graphical user interfaces for interacting with three-dimensional environments

Also Published As

Publication number Publication date
WO2023220071A3 (fr) 2024-01-11

Similar Documents

Publication Publication Date Title
US11727650B2 (en) Systems, methods, and graphical user interfaces for displaying and manipulating virtual objects in augmented reality environments
JP7324813B2 (ja) 拡張現実環境及び仮想現実環境と相互作用するためのシステム、方法、及びグラフィカルユーザインタフェース
US11227446B2 (en) Systems, methods, and graphical user interfaces for modeling, measuring, and drawing using augmented reality
US11797146B2 (en) Systems, methods, and graphical user interfaces for annotating, measuring, and modeling environments
US11941764B2 (en) Systems, methods, and graphical user interfaces for adding effects in augmented reality environments
US20230368458A1 (en) Systems, Methods, and Graphical User Interfaces for Scanning and Modeling Environments
WO2021158427A1 (fr) Systèmes, procédés et interfaces graphiques utilisateur pour annoter, mesurer et modéliser des environnements
US20240153219A1 (en) Systems, Methods, and Graphical User Interfaces for Adding Effects in Augmented Reality Environments
KR102397481B1 (ko) 3d 모델들에 대한 시스템 전체 거동을 위한 디바이스들, 방법들, 및 그래픽 사용자 인터페이스들
WO2023220071A2 (fr) Systèmes, procédés et interfaces utilisateur graphiques pour balayage et modélisation d'environnements
KR102666508B1 (ko) 3d 모델들에 대한 시스템 전체 거동을 위한 디바이스들, 방법들, 및 그래픽 사용자 인터페이스들
KR20240075927A (ko) 3d 모델들에 대한 시스템 전체 거동을 위한 디바이스들, 방법들, 및 그래픽 사용자 인터페이스들

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23729571

Country of ref document: EP

Kind code of ref document: A2