WO2023219612A1 - Adaptive resizing of manipulable and readable objects - Google Patents

Adaptive resizing of manipulable and readable objects

Info

Publication number
WO2023219612A1
Authority
WO
WIPO (PCT)
Prior art keywords
size
virtual object
angular
displayed
camera
Prior art date
Application number
PCT/US2022/028746
Other languages
English (en)
Inventor
Qi XIONG
Shuang Liang
Jinbin HUANG
Yi Xu
Yu Gao
Original Assignee
Innopeak Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology, Inc. filed Critical Innopeak Technology, Inc.
Priority to PCT/US2022/028746 priority Critical patent/WO2023219612A1/fr
Publication of WO2023219612A1 publication Critical patent/WO2023219612A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2016Rotation, translation, scaling

Definitions

  • This application relates generally to image processing technology including, but not limited to, methods, systems, devices, and non-transitory computer-readable media for rendering objects selectively and adaptively in a mixed, virtual, or augmented reality environment.
  • Objects have been detected, tracked, or rendered in various mixed, virtual, or augmented reality applications.
  • objects are rendered and integrated in real time based on natural features in augmented reality (AR) applications executed on mobile phones.
  • Virtual object rendering, tracking and operation have become a basic function of many video applications implemented in many technology fields, e.g., human computer interaction, virtual reality (VR), gaming, and video surveillance.
  • Most visual content applications follow the basic principles of perspective drawing to present an object with rendered size that decreases with an increase of the object’s distance from a camera position.
  • the object presented by perspective drawing can occupy an excessively large or small portion of a screen if the object is too close to or too far away from a camera. It would be beneficial to have a more efficient object rendering mechanism than the current practice.
  • Various embodiments of this application are directed to rendering a virtual object having a displayed object size that is adaptively controlled based on a distance between the virtual object and a camera in an extended reality environment.
  • the visual object is associated with a bounding box that includes an interactive box surrounding the visual object, and a user can manipulate the visual object through the interactive box of the bounding box.
  • the interactive box includes a plurality of bounding box elements to facilitate user manipulation of the visual object.
  • the bounding box elements are adaptively expanded or contracted within a predefined size range based on the distance between the virtual object and a camera position.
  • a method is implemented at an electronic device to render visual content.
  • the method includes generating information of a virtual object to be displayed with contextual content from a perspective of a camera (i.e., from a point of view of the camera).
  • the virtual object has an object location and an object size, and the camera has a camera position that enables the perspective of the camera.
  • the method further includes determining a distance of the virtual object from the camera based on the object location and the camera position and adjusting the object size of the virtual object based on the distance of the virtual object from the camera.
  • the method further includes rendering the virtual object with the contextual content, and the virtual object is displayed at the object location with the adjusted object size in a scene associated with the contextual content.
  • some implementations include an electronic device that includes one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods.
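  • The flow summarized above can be sketched in a few lines of C++; the types, names, and the simple angular-size clamp below are illustrative assumptions rather than the application's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Distance of the virtual object from the camera, computed from the object
// location and the camera position.
float distanceBetween(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

int main() {
    Vec3  objectLocation{0.0f, 0.0f, 8.0f};  // object location in the scene
    Vec3  cameraPosition{0.0f, 0.0f, 0.0f};  // camera position enabling the perspective
    float objectSize = 1.0f;                 // nominal object size (e.g., metres)

    float d = distanceBetween(objectLocation, cameraPosition);

    // Placeholder adjustment policy: keep the approximate angular size
    // (objectSize / d, in radians) inside an assumed range by rescaling the
    // object; the detailed per-field policies are described later.
    const float kMinAngular = 0.05f, kMaxAngular = 0.5f;  // assumed limits
    float adjustedSize = std::clamp(objectSize / d, kMinAngular, kMaxAngular) * d;

    std::printf("distance = %.2f, adjusted object size = %.2f\n", d, adjustedSize);
    return 0;  // the object would then be rendered at objectLocation with adjustedSize
}
```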
  • Figure 1 is an example data processing environment having one or more servers communicatively coupled to one or more client devices, in accordance with some embodiments.
  • Figure 2 is a block diagram illustrating an electronic system, in accordance with some embodiments.
  • FIG. 3 is a flowchart of a process for processing inertial sensor data and image data of an electronic system using a SLAM module, in accordance with some embodiments.
  • Figure 4A is an image in which an object (e.g., a moving vehicle) is rendered at different distances in a one-point perspective, in accordance with some embodiments.
  • Figure 4B is another image in which an object is displayed with a displayed object size that is controlled in a displayed size range, in accordance with some embodiments.
  • Figure 5 is a perspective view of a bounding box of an object, in accordance with some embodiments.
  • Figure 7 is a flowchart of a method for rendering visual content, in accordance with some embodiments.
  • Extended reality includes augmented reality (AR) in which virtual objects are overlaid on a view of a real physical world, virtual reality (VR) that includes only virtual content, and mixed reality (MR) that combines both AR and VR and in which a user is allowed to interact with real-world and virtual objects.
  • AR is an interactive experience of a real-world environment where the objects that reside in the real world are enhanced by computer-generated perceptual information, e.g., across multiple sensory modalities including visual, auditory, haptic, somatosensory, and olfactory.
  • a virtual object is rendered in AR and associated with a bounding box that includes an interactive box surrounding the virtual object, and a user can manipulate the virtual object through the interactive box of the bounding box.
  • the interactive box includes a plurality of bounding box elements to facilitate user manipulation of the object.
  • the bounding box elements are adaptively expanded or contracted within a predefined range of object dimension based on a distance between the object and a camera position. By these means, the bounding box elements are guaranteed to be displayed in a field of view with a reasonable dimension that facilitates user review and selection of the corresponding object.
  • FIG. 1 is an example data processing environment 100 having one or more servers 102 communicatively coupled to one or more client devices 104, in accordance with some embodiments.
  • the one or more client devices 104 may be, for example, laptop computers 104A, tablet computers 104B, mobile phones 104C, or intelligent, multi-sensing, network-connected home devices (e.g., a surveillance camera 104E, a smart television device, a drone).
  • the one or more client devices 104 include a head-mounted display 104D configured to render extended reality content.
  • Each client device 104 can collect data or user inputs, execute user applications, and present outputs on its user interface.
  • the collected data or user inputs can be processed locally at the client device 104 and/or remotely by the server(s) 102.
  • the one or more servers 102 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 104, and in some embodiments, process the data and user inputs received from the client device(s) 104 when the user applications are executed on the client devices 104.
  • the data processing environment 100 further includes a storage 106 for storing data related to the servers 102, client devices 104, and applications executed on the client devices 104.
  • storage 106 may store video content (including visual and audio content), static visual content, and/or inertial sensor data.
  • the one or more servers 102 can enable real-time data communication with the client devices 104 that are remote from each other or from the one or more servers 102. Further, in some embodiments, the one or more servers 102 can implement data processing tasks that cannot be or are preferably not completed locally by the client devices 104.
  • the client devices 104 include a game console (e.g., formed by the head-mounted display 104D) that executes an interactive online gaming application.
  • the game console receives a user instruction and sends it to a game server 102 with user data.
  • the game server 102 generates a stream of video data based on the user instruction and user data and provides the stream of video data for display on the game console and other client devices that are engaged in the same game session with the game console.
  • the one or more communication networks 108 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.
  • a connection to the one or more communication networks 108 may be established either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface 110 (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
  • the one or more communication networks 108 can represent the Internet of a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another.
  • At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other electronic systems that route data and messages.
  • the head-mounted display 104D (also called AR glasses 104D) includes one or more cameras (e.g., a visible light camera, a depth camera), a microphone, a speaker, one or more inertial sensors (e.g., gyroscope, accelerometer), and a display.
  • the camera(s) and microphone are configured to capture video and audio data from a scene of the AR glasses 104D, while the one or more inertial sensors are configured to capture inertial sensor data.
  • the camera captures hand gestures of a user wearing the AR glasses 104D.
  • the microphone records ambient sound, including user’s voice commands.
  • the depth and inertial sensor data captured by the AR glasses 104D are processed by the AR glasses 104D, server(s) 102, or both to recognize the device poses.
  • the device poses are used to control the AR glasses 104D itself or interact with an application (e.g., a gaming application) executed by the AR glasses 104D.
  • the display of the AR glasses 104D displays a user interface, and the recognized or predicted device poses are used to render virtual objects with high fidelity or interact with user selectable display items on the user interface.
  • SLAM techniques are applied in the data processing environment 100 to process video data, static image data, or depth data, captured by the AR glasses 104D with inertial sensor data. Device poses are recognized and predicted, and a scene in which the AR glasses 104D is located is mapped and updated.
  • the SLAM techniques are optionally implemented by AR glasses 104D independently or by both of the server 102 and AR glasses 104D jointly.
  • FIG. 2 is a block diagram illustrating an electronic system 200 configured to process content data (e.g., image data), in accordance with some embodiments.
  • the electronic system 200 includes a server 102, a client device 104 (e.g., AR glasses 104D in Figure 1), a storage 106, or a combination thereof.
  • the electronic system 200 typically includes one or more processing units (CPUs) 202, one or more network interfaces 204, memory 206, and one or more communication buses 208 for interconnecting these components (sometimes called a chipset).
  • the electronic system 200 includes one or more input devices 210 that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls.
  • the client device 104 of the electronic system 200 uses a microphone for voice recognition or a camera 260 for gesture recognition to supplement or replace the keyboard.
  • the client device 104 includes one or more optical cameras 260 (e.g., an RGB camera), scanners, or photo sensor units for capturing images, for example, of graphic serial codes printed on the electronic devices.
  • the electronic system 200 also includes one or more output devices 212 that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
  • the client device 104 includes a location detection device, such as a GPS (global positioning system) or other geolocation receiver, for determining the location of the client device 104.
  • the client device 104 includes an inertial measurement unit (IMU) 280 integrating sensor data captured by multi-axes inertial sensors to provide estimation of a location and an orientation of the client device 104 in space.
  • the one or more inertial sensors of the IMU 280 include, but are not limited to, a gyroscope, an accelerometer, a magnetometer, and an inclinometer.
  • Memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices.
  • Memory 206, optionally, includes one or more storage devices remotely located from one or more processing units 202.
  • Memory 206, or alternatively the non-volatile memory within memory 206, includes a non-transitory computer readable storage medium.
  • memory 206, or the non- transitory computer readable storage medium of memory 206 stores the following programs, modules, and data structures, or a subset or superset thereof:
  • Network communication module 216 for connecting each server 102 or client device 104 to other devices (e.g., server 102, client device 104, or storage 106) via one or more network interfaces 204 (wired or wireless) and one or more communication networks 108, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
  • User interface module 218 for enabling presentation of information (e.g., a graphical user interface for application(s) 224, widgets, websites and web pages thereof, and/or games, audio and/or video content, text, etc.) at each client device 104 via one or more output devices 212 (e.g., displays, speakers, etc.);
  • Input processing module 220 for detecting one or more user inputs or interactions from one of the one or more input devices 210 and interpreting the detected input or interaction;
  • Web browser module 222 for navigating, requesting (e.g., via HTTP), and displaying websites and web pages thereof, including a web interface for logging into a user account associated with a client device 104 or another electronic device, controlling the client or electronic device if associated with the user account, and editing and reviewing settings and data that are associated with the user account;
  • One or more user applications 224 for execution by the electronic system 200 (e.g., games, social network applications, smart home applications, and/or other web or non-web based applications) for controlling another electronic device and reviewing data captured by such devices;
  • Model training module 226 for receiving training data and establishing a data processing model for processing content data (e.g., video, image, audio, or textual data) to be collected or obtained by a client device 104;
  • Data processing module 228 for processing content data using data processing models 250, thereby identifying information contained in the content data, matching the content data with other data, categorizing the content data, or synthesizing related content data, where in some embodiments, the data processing module 228 is associated with one of the user applications 224 to process the content data in response to a user instruction received from the user application 224;
  • Pose determination and prediction module 230 for determining and predicting a pose of the client device 104 (e.g., AR glasses 104D), where in some embodiments, the pose is determined and predicted jointly by the pose determination and prediction module 230 and data processing module 228, and the module 230 further includes an SLAM module 232 for mapping a scene where a client device 104 is located and identifying a pose of the client device 104 within the scene using image and IMU sensor data;
  • the one or more databases 240 are stored in one of the server 102, client device 104, and storage 106 of the electronic system 200.
  • the one or more databases 240 are distributed in more than one of the server 102, client device 104, and storage 106 of the electronic system 200.
  • more than one copy of the above data is stored at distinct devices, e.g., two copies of the data processing models 250 are stored at the server 102 and storage 106, respectively.
  • Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
  • memory 206, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 206, optionally, stores additional modules and data structures not described above.
  • FIG. 3 is a flowchart of a process 300 for processing inertial sensor data and image data of an electronic system (e.g., a server 102, a client device 104, or a combination of both) using a visual-inertial SLAM module 232, in accordance with some embodiments.
  • the process 300 includes measurement preprocessing 302, initialization 304, local visual- inertial odometry (VIO) with relocation 306, and global pose graph optimization 308.
  • an RGB camera 260 captures image data of a scene at an image rate (e.g., 30 FPS), and features are detected and tracked (310) from the image data.
  • An IMU 280 measures inertial sensor data at a sampling frequency (e.g., 1000 Hz) concurrently with the RGB camera 260 capturing the image data, and the inertial sensor data are pre-integrated (312) to provide data of a variation of device poses 340.
  • the image data captured by the RGB camera 260 and the inertial sensor data measured by the IMU 280 are temporally aligned (314).
  • a vision-only structure-from-motion (SfM) technique 314 is applied (316) to couple the image data and inertial sensor data, estimate three-dimensional structures, and map the scene of the RGB camera 260.
  • a sliding window 318 and associated states from a loop closure 320 are used to optimize (322) a VIO.
  • when the VIO corresponds (324) to a keyframe of a smooth video transition and a corresponding loop is detected (326), features are retrieved (328) and used to generate the associated states from the loop closure 320.
  • during global pose graph optimization 308, a multi-degree-of-freedom (multi-DOF) pose graph is optimized (330) based on the states from the loop closure 320, and a keyframe database 332 is updated with the keyframe associated with the VIO.
  • the features that are detected and tracked (310) are used to monitor (334) motion of an object in the image data and estimate image-based poses 336, e.g., according to the image rate.
  • the inertial sensor data that are pre-integrated (234) may be propagated (338) based on the motion of the object and used to estimate inertial-based poses 340, e.g., according to a sampling frequency of the IMU 280.
  • the image-based poses 336 and the inertial-based poses 340 are stored in the database 240 and used by the module 230 to estimate and predict poses that are used by the real time video rendering system 234.
  • the module 232 receives the inertial sensor data measured by the IMU 280 and obtains image-based poses 336 to estimate and predict more poses 340 that are further used by the real-time video rendering system 234.
  • high frequency pose estimation is enabled by sensor fusion, which relies on data synchronization between imaging sensors (e.g., the RGB camera 260, a LiDAR scanner) and the IMU 280.
  • the IMU 280 can measure inertial sensor data and operate at a very high frequency (e.g., 1000 samples per second) and with a negligible latency (e.g., less than 0.1 millisecond).
  • Asynchronous time warping (ATW) is often applied in an AR system to warp an image before it is sent to a display to correct for head movement and pose variation that occurs after the image is rendered.
  • ATW algorithms reduce a latency of the image, increase or maintain a frame rate, or reduce judders caused by missing images.
  • relevant image data and inertial sensor data are stored locally, such that they can be synchronized and used for pose estimation/prediction.
  • the image and inertial sensor data are stored in one of multiple Standard Template Library (STL) containers, e.g., std::vector, std::queue, std::list, etc., or other self-defined containers. These containers are generally convenient for use.
  • the image and inertial sensor data are stored in the STL containers with their timestamps, and the timestamps are used for data search, data insertion, and data organization.
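  • As an illustration of this kind of timestamp-keyed storage and search (a sketch with assumed type and function names, not the actual implementation), the following keeps a 1 kHz IMU stream in a std::vector sorted by timestamp and finds the sample nearest to a camera frame:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <iterator>
#include <vector>

struct ImuSample {
    int64_t timestampUs;  // microsecond timestamp used for search and insertion
    float   gyro[3];
    float   accel[3];
};

// Samples are appended in time order, so the vector stays sorted by timestamp
// and std::lower_bound can be used for the data search.
const ImuSample* findNearest(const std::vector<ImuSample>& samples, int64_t tUs) {
    if (samples.empty()) return nullptr;
    auto it = std::lower_bound(samples.begin(), samples.end(), tUs,
        [](const ImuSample& s, int64_t t) { return s.timestampUs < t; });
    if (it == samples.begin()) return &*it;
    if (it == samples.end())   return &samples.back();
    auto prev = std::prev(it);
    return (tUs - prev->timestampUs) <= (it->timestampUs - tUs) ? &*prev : &*it;
}

int main() {
    std::vector<ImuSample> imu;
    for (int64_t t = 0; t < 10000; t += 1000)  // 1 kHz IMU stream (timestamps in microseconds)
        imu.push_back({t, {0.0f, 0.0f, 0.0f}, {0.0f, 0.0f, -9.8f}});

    int64_t imageTimestampUs = 3400;  // a camera frame captured at roughly 30 FPS
    if (const ImuSample* s = findNearest(imu, imageTimestampUs))
        std::printf("frame at %lld us pairs with IMU sample at %lld us\n",
                    (long long)imageTimestampUs, (long long)s->timestampUs);
    return 0;
}
```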
  • Figure 4A is an image 400 in which a virtual object 402 (e.g., a moving vehicle) is rendered at different distances according to a one-point perspective, in accordance with some embodiments.
  • the image 400 is captured by a camera 260 and displayed on a screen of an electronic device containing the camera 260. Examples of the electronic device include a mobile phone 104C, a vehicle, AR glasses 104D, and video see-through VR glasses.
  • the image 400 is captured by a camera 260 of a first electronic device (e.g., a surveillance camera 104E) and displayed on a screen of a second electronic device (e.g., a mobile phone 104C, a laptop 104A).
  • the virtual object 402 is rendered on the image 400 based on the basic principle of perspective drawing, i.e., with different rendered sizes based on a distance of the virtual object 402 that is measured from a camera position of the camera 260.
  • the image 400 is captured with a normal perspective by the camera 260 and shows the field of view of the camera 260 in the one-point perspective.
  • the image 400 has a horizon line 404 and a vanishing point 406.
  • the horizon line 404 passes through the vanishing point 406 and parallel lines in real life (e.g., road edges, tip of trees) merge and meet at the vanishing point 406.
  • as the virtual object 402 moves along a road 408 and gets closer to the camera position 510, the virtual object 402 grows larger in size on the image 400.
  • the virtual object 402A is far away in the field of view and rendered too small in size, and cannot act as a selectable affordance item.
  • the virtual object 402 is immediately in front of the camera 260, and is rendered so large in size that it blocks a view of other objects behind this object and limits visibility of the field of view of the camera 260.
  • Figure 4B is another image 420 in which a virtual object 402 is displayed with a displayed object size that is controlled in a displayed size range, in accordance with some embodiments.
  • the field of view of the camera 260 is divided into three portions based on a distance of the virtual object 402 measured from the camera position of the camera 260 that captures the image 420.
  • in a first portion of the field of view (also called a medium range field 422), the distance of the virtual object 402 is between a first object distance 424 and a second object distance 426 greater than the first object distance 424, and the virtual object 402 is rendered in the image 420 based on the basic principle of perspective drawing.
  • the displayed object size of the virtual object 402 decreases with an increase of the distance of the virtual object 402, e.g., at a first rate determined by the basic principles of perspective drawing. Specifically, on the image 420, the virtual object 402 has a first displayed size at the first object distance 424, and a second displayed size at the second object distance 426. The second displayed size is smaller than the first displayed size.
  • in a second portion of the field of view (also called a remote field 428), the distance of the virtual object 402 is greater than the second object distance 426, and the virtual object 402 is rendered in the image 420 without following the basic principle of perspective drawing.
  • the size of the virtual object 402 remains constant at the second displayed size or a third displayed size in the second portion of the field of view 428.
  • the third displayed size is distinct from (e.g., greater or less than) the second displayed size.
  • the size of the virtual object 402 decreases with an increase of the distance of the virtual object 402, e.g., at a second rate that is less than the first rate, such that the size of the virtual object 402 can keep a reasonably manageable size to guarantee visibility and selectability of the virtual object 402. Further, in some embodiments, the size of the virtual object 402 decreases at the second rate until it reaches a lower displayed size limit, allowing the size of the virtual object 402 to remain above or equal to the lower displayed size limit.
  • in a third portion of the field of view (also called a close field 430), the distance of the virtual object 402 is less than the first object distance 424, and the virtual object 402 is also rendered in the image 420 without following the basic principle of perspective drawing.
  • the size of the virtual object 402 remains constant at the first displayed size or a fourth displayed size in the close field 430.
  • the fourth displayed size is distinct from (e.g., greater or less than) the first displayed size.
  • the size of the virtual object 402 increases with a decrease of the distance of the virtual object 402, e.g., at a third rate that is less than the first rate, such that the size of the virtual object 402 can keep a reasonably manageable size to guarantee visibility of the virtual object 402 and other objects behind the virtual object 402 in the field of view. Further, in some embodiments, the size of the virtual object 402 increases at the third rate until it reaches an upper displayed size limit, allowing the size of the virtual object 402 to remain below or equal to the upper displayed size limit.
  • the displayed object size of the virtual object 402 is controlled within the displayed size range.
  • the displayed size range is defined by the first and second displayed sizes corresponding to the first and second object distances 424 and 426, respectively.
  • the first and second object distances 424 and 426 are not measured. Rather, the displayed size of the virtual object 402 keeps increasing with a decrease of the distance of the virtual object 402 at the first rate. If it is determined that the displayed object size of the virtual object 402 is greater than the first displayed size, the displayed object size of the virtual object 402 is controlled to make the displayed object size equal to the first displayed size.
  • if it is determined that the displayed object size of the virtual object 402 is less than the second displayed size, the displayed object size of the virtual object 402 is controlled to make the displayed object size equal to the second displayed size. Further, in some embodiments, the displayed size range is modified by the third displayed size, fourth displayed size, lower displayed size limit, or upper displayed size limit. In some embodiments, if it is determined that the displayed object size of the virtual object 402 is greater than the upper displayed size limit, the displayed object size of the virtual object 402 is reduced to make the displayed object size equal to the upper displayed size limit.
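  • One way to realize the displayed-size control described above is to compute the perspective-projected size first and then bound it by the displayed sizes at the first and second object distances. The sketch below follows the variant in which the size stays constant outside the medium range field; the helper names and parameters are assumptions for illustration.

```cpp
#include <algorithm>

// Displayed size under simple perspective projection: the size shrinks in
// proportion to the object's distance from the camera position.
float perspectiveDisplayedSize(float objectSize, float distance, float focalLengthPx) {
    return focalLengthPx * objectSize / distance;
}

// Bound the displayed size by the sizes at the first (near, 424) and second
// (far, 426) object distances. Closer than 424 the size is held at the first
// displayed size; farther than 426 it is held at the second displayed size.
float controlledDisplayedSize(float objectSize, float distance, float focalLengthPx,
                              float firstObjectDistance, float secondObjectDistance) {
    float displayed = perspectiveDisplayedSize(objectSize, distance, focalLengthPx);
    float firstDisplayedSize  = perspectiveDisplayedSize(objectSize, firstObjectDistance,  focalLengthPx);
    float secondDisplayedSize = perspectiveDisplayedSize(objectSize, secondObjectDistance, focalLengthPx);
    return std::clamp(displayed, secondDisplayedSize, firstDisplayedSize);
}
```
  • The variants in which the size keeps shrinking or growing at a slower second or third rate would replace the clamp with a piecewise function of the distance, and a hard lower or upper displayed size limit would simply add another clamp.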
  • the image 420 is displayed in a screen having a display size.
  • the image 420 is captured by a camera and scaled for display on the screen having the display size.
  • an electronic device includes an at least partially transparent and see-through screen via which a physical world is observed.
  • the virtual object 402 is rendered on the screen and overlaid on the physical world.
  • a ratio between the displayed size of the virtual object 402 and the display size of the screen is controlled within a display ratio range. As the displayed size of the virtual object 402 is between the first and second displayed sizes in the medium range field 422, the display ratio range is between a first display ratio threshold and a second display ratio threshold less than the first display ratio threshold.
  • the first display ratio threshold is a ratio of the first displayed size and the display size
  • the second display ratio threshold is a ratio of the second displayed size and the display size.
  • the display ratio range is less than an upper display ratio limit
  • the displayed object size is less than an upper displayed size.
  • the display ratio range is greater than a lower display ratio limit, and the displayed object size is greater than or equal to a lower displayed size.
  • FIG. 5 is a perspective view 500 of a bounding box 502 of a virtual object 402, in accordance with some embodiments.
  • the virtual object 402 is rendered on an image 420 captured by a camera 260.
  • the virtual object 402 has an object size and is shown with a displayed object size on the image 420.
  • the object size remains unchanged, while the displayed object size changes with a distance between the virtual object 402 and a camera position 510.
  • the displayed object size of the virtual object 402 is associated with an angular size 504 (i.e., a degree of a visual angle) of the virtual object 402 with respect to a camera position 510 and camera field of view (FOV).
  • a unit of the angular size 504 is degrees of arc, and the size of the virtual object 402 shown on the image 420 is also measured in degrees.
  • the size of the virtual object 402 on the image 420 is determined based on a projection of the object in a user’s view. The projected size of the virtual object 402 appears to be excessively large in the close field 430 and excessively small in the remote field 428.
  • the projected size of the virtual object 402 is monitored as the virtual object 402 is rendered in the close field 430, remote field 428, and medium range field 422 of the field of view of the camera 260, and the angular size 504 is controlled in an angular size range and changes adaptively according to a position of the virtual object 402 with respect to the camera position 510 and the projected size of the virtual object 402.
  • the virtual object 402 is rendered in the bounding box 502, and the bounding box 502 includes a plurality of bounding box elements, e.g., box corners 502C and box edges 502E. If the virtual object 402 is selectable, the bounding box 502 defines a region corresponding to a selectable affordance item, and a user action on the selectable affordance item initiates a selection of the virtual object 402 optionally followed by more user actions. If the size of the virtual object 402 can be manually adjusted, a user action on any of the bounding box elements initiates an expanding or collapsing operation on the virtual object 402.
  • the angular size of the virtual object 402 is measured by a horizontal angular size 504H and a vertical angular size 504V.
  • a horizontal line 506H and a vertical line 506V are orthogonal to, and intersect with, each other at the camera position 510.
  • the horizontal line 506H and vertical line 506V define a plane parallel to an image sensor array of the camera 260.
  • a center of the virtual object 402 is projected to a first node 510H and a second node 510V on the horizontal line 506H and the vertical line 506V, respectively.
  • Nodes on a surface of the virtual object 402 are connected to the first node 510H, and lines connecting the object 402 to the first node 510H form a horizontal angle having the horizontal angular size 504H.
  • the nodes on the surface of the virtual object 402 are also connected to the second node 510V, and lines connecting the object 402 to the second node 510V form a vertical angle having the vertical angular size 504V.
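  • As a rough sketch of how such horizontal and vertical angular sizes might be computed for an object of a given width and height centered at a distance in front of the camera position (simplified geometry and assumed names, not the exact construction of Figure 5):

```cpp
#include <cmath>

struct AngularSize {
    float horizontal;  // horizontal angular size, radians
    float vertical;    // vertical angular size, radians
};

// Angles subtended by an object of width w and height h whose center lies at
// distance d in front of the camera position; the horizontal and vertical
// reference lines intersect at the camera position, as in Figure 5.
AngularSize angularSizeOf(float w, float h, float d) {
    AngularSize a;
    a.horizontal = 2.0f * std::atan2(0.5f * w, d);
    a.vertical   = 2.0f * std::atan2(0.5f * h, d);
    return a;
}
```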
  • FIG. 6 is a process 600 of adjusting an object size of a virtual object 402 adaptively, in accordance with some embodiments.
  • the virtual object 402 is rendered on the image 420 captured by a camera 260.
  • An actual size of the virtual object 402, if it exists, is fixed independently of where the virtual object 402 is positioned in the physical world.
  • the rendered size of the virtual object 402 varies based on a distance of the virtual object 402 from the camera 260.
  • the virtual object 402 is perceived through rays of light 602 that enter our eyes. For example, the virtual object 402A is located closer to the user or the camera 260 than the virtual object 402C.
  • the rays 602A corresponding to the virtual object 402A form a larger angle than the rays 602C corresponding to the virtual object 402C.
  • a displayed object size of the virtual object 402A on the image 420 is greater than a displayed object size of the virtual object 402C.
  • the virtual object 402 includes a bounding box 604 having a plurality of bounding box elements 606.
  • the bounding box 604 and bounding box elements 606 are not rendered with the virtual object 402, i.e., invisible on the image 420.
  • the bounding box 604 and bounding box elements 606 are rendered with the virtual object 402, i.e., visible on the image 420.
  • the object size of the virtual object 402 includes a first size S1 of the virtual object 402, a second size S2 of the bounding box 604, and a third size S3 of a subset of the plurality of bounding box elements 606.
  • the third size S3 of the subset of the plurality of bounding box elements 606 is adjusted based on the distance of the virtual object 402 from the camera 510, while the first size S1 of the virtual object 402 and the second size S2 of the bounding box 604 are kept unchanged and constant with the distance of the virtual object 402 from the camera 510.
  • the virtual object 402 and bounding box 604 appear normal and obey the principle of perspective drawing in an observer's eye, while the third size S3 of the bounding box elements is still controlled to facilitate user interaction.
  • the object 402 and the bounding box 604 appear closer to the user and may take up an increasing portion of display space in the field of view of the image 420.
  • the basic principle of perspective drawing applies to the bounding box elements 606 in the medium range field 422, but not in the near field 430 and far field 428.
  • the displayed third size S3' of these bounding box elements 606 is scaled up as the distance between the object 402 and the camera position 510 decreases.
  • the displayed third size S3' of the bounding box elements 606 is scaled down on the image 420 as the object 402 moves away from the camera 260.
  • the distance between the user and the virtual object 402 exceeds the second object distance 426, and the displayed third size S3' of the bounding box elements 606 ceases to be scaled down.
  • This maintains the visibility of the bounding box elements 606 in the image 420 and facilitates usability of the selectable bounding box elements 606.
  • adaptive resizing of the bounding box elements 606 with respect to the image 420 guarantees the ease of selecting and manipulating these bounding box elements 606 regardless of the distance between the target object and the viewer. This ensures the usability of the bounding box elements 606 and stability of all interactions depending on them.
  • An angular size (AS) of the virtual object 402 rendered on the image 420 is defined (with AS in radians, under a small-angle approximation) as AS ≈ AOS / D_OBJ = DOS / D_OL, where AOS, DOS, D_OBJ, and D_OL represent an actual object size that is fixed, a displayed object size, a distance between the object 402 and a camera position 510, and a fixed distance between the lens and the image sensor array inside the camera 260, respectively.
  • the angular size AS is proportional to the displayed object size DOS. Both the angular size AS and displayed object size DOS are inversely proportional to the distance D_OBJ between the object 402 and the camera position 510.
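  • A small numeric check of these relationships, using assumed values for AOS, D_OL, and a few object distances:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const float AOS  = 1.0f;    // actual object size (fixed), metres
    const float D_OL = 0.004f;  // lens-to-sensor distance, metres (assumed)

    const float distances[] = {2.0f, 4.0f, 8.0f};  // object-to-camera distances D_OBJ
    for (float D_OBJ : distances) {
        float AS  = 2.0f * std::atan(AOS / (2.0f * D_OBJ));  // angular size, radians
        float DOS = AOS * D_OL / D_OBJ;                      // displayed size on the sensor
        std::printf("D_OBJ = %.1f m  AS = %.3f rad  DOS = %.5f m\n", D_OBJ, AS, DOS);
    }
    // Doubling D_OBJ roughly halves both AS and DOS, matching the stated
    // inverse proportionality.
    return 0;
}
```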
  • the distance of the virtual object 402 is between a first object distance 424 and a second object distance 426 greater than the first object distance 424, and the virtual object 402 is rendered in the image 420 based on the basic principle of perspective drawing.
  • the angular size of the virtual object 402 decreases with an increase of the distance of the virtual object 402, e.g., at a first rate determined by the basic principles of perspective drawing.
  • the virtual object 402 has a first angular size at the first object distance 424, and a second angular size at the second object distance 426.
  • the second angular size is less than the first angular size.
  • the distance of the virtual object 402 from the camera position 510 is greater than the second object distance 426, and the virtual object 402 is rendered in the image 420 according to the basic principle of perspective drawing.
  • the third size S3 of the bounding box elements 606 remains constant in the remote field 428.
  • the angular size of the bounding box elements 606 decreases with an increase of the distance of the virtual object 402, e.g., at a second rate that is less than the first rate, such that the bounding box elements 606 can keep a reasonably manageable size to guarantee selectability of the bounding box elements 606.
  • the angular size corresponding to the third size S3 decreases at the second rate until it reaches a lower angular limit, allowing the angular size of the bounding box elements 606 to remain above or equal to the lower angular limit.
  • the distance of the virtual object 402 is less than the first object distance 424, and the virtual object 402 is rendered in the image 420 following the basic principle of perspective drawing.
  • the third size S3 of the bounding box elements 606 remains constant in the close field 430.
  • the angular size of the bounding box elements 606 increases with a decrease of the distance of the virtual object 402, e.g., at a third rate that is less than the first rate, such that the bounding box elements 606 can keep a reasonably manageable size to guarantee visibility of the virtual object 402 and other objects behind the virtual object 402 in the field of view.
  • the angular size of the bounding box elements 606 increases at the third rate until it reaches an upper angular limit, allowing the angular size of the bounding box elements 606 to remain below or equal to the upper angular limit.
  • the angular size of the virtual object 402 is controlled within an angular size range.
  • the angular size range is defined by the first and second angular sizes corresponding to the first and second object distances 424 and 426, respectively.
  • the first and second object distances 424 and 426 are not measured. Rather, the angular size of the virtual object 402 keeps increasing with a decrease of the distance of the virtual object 402 at the first rate. If it is determined that the angular size of the virtual object is greater than the first angular size, the angular size of the virtual object is controlled to make the angular size equal to the first angular size.
  • if it is determined that the angular size of the virtual object is less than the second angular size, the angular size of the virtual object is controlled to make the angular size equal to the second angular size.
  • the angular size range is modified by the third angular size, fourth angular size, lower angular limit, or upper angular limit. In some embodiments, if it is determined that the angular size of the virtual object is greater than the upper angular limit, the angular size of the virtual object is reduced to make the angular size equal to the upper angular limit.
  • the angular size of the virtual object 402 and the angular size of the bounding box 604 are not limited and follow the principle of perspective drawing. Rather, an angular size of the bounding box elements 606 is controlled based on two angular sizes corresponding to the first and second object distances 424 and 426. In some embodiments, the angular size of the bounding box elements 606 is limited between the two angular sizes. Alternatively, in some embodiments, varying rates of the angular size of the bounding box elements 606 are different in the fields 422, 428, and 430.
  • a first varying rate of the angular size of the bounding box elements 606 in the medium range field is greater than the varying rates of angular size of the bounding box elements 606 in the near and far fields 430 and 428.
  • the size of the virtual object 402 rendered on the image can be measured using different methods. For example, the size of the virtual object 402 is measured by an angular size that is equal to a size of an angle formed by lines connecting a camera position 510 to a surface of the virtual object 402.
  • the angular size corresponding to the first size S1 of the virtual object 402 increases as the virtual object 402 moves closer to the camera 260 in the medium range field 422, and remains unchanged or varies at a slower rate as the virtual object 402 moves in the close field 430 and remote field 428.
  • alternatively, the object size of the virtual object 402 (e.g., the first size S1) is measured by a displayed object size (e.g., a pixel count) on the image 420.
  • the displayed object size increases from the second displayed size to the first displayed size as the virtual object 402 moves from the second object distance 426 to the first object distance 424.
  • the displayed object size remains unchanged as a first displayed size in the close field 430 and as a second displayed size in the remote field 428.
  • the second displayed size is less than the first displayed size.
  • the object size of the virtual object 402 includes a first size S1 of the virtual object 402, a second size S2 of the bounding box 604, and a third size S3 of a subset of the plurality of bounding box elements 606.
  • the third size S3 of the subset of the plurality of bounding box elements 606 is adjusted based on the distance of the virtual object 402 from the camera 510, while the first size S1 of the virtual object 402 and the second size S2 of the bounding box 604 are kept unchanged and constant with the distance of the virtual object 402 from the camera 510.
  • each of the displayed first and second sizes S1' and S2' increases with a decrease of the distance from the camera position 510 based on the principle of perspective drawing.
  • the displayed third size S3' increases from a size S31' to another size S32' (which is greater than S31') on the image 420, as the virtual object 402 moves from the second object distance 426 to the first object distance 424.
  • the displayed third size S3' remains unchanged as the size S32' as the object 402 moves in the close field 430 and as the size S31' as the object 402 moves in the remote field 428.
  • the displayed third size S3' in the image 420 increases with a decrease of the distance from the camera position 510 at a smaller rate as the object 402 moves in the close field 430 and the remote field 428, compared with as the object moves in the medium range field 422.
  • the virtual object 402 appears normal and obeys the principle of perspective drawing in an observer's eye, while the third size S3 is still controlled to facilitate user interaction.
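  • A sketch of this split behavior, in which the displayed first and second sizes follow ordinary perspective projection while only the displayed third size of the bounding box elements is clamped (the helper names and the pixel-based projection are assumptions):

```cpp
#include <algorithm>

// Perspective projection of a physical size onto the image, in pixels.
float projectToPixels(float physicalSize, float distance, float focalLengthPx) {
    return focalLengthPx * physicalSize / distance;
}

struct DisplayedSizes {
    float s1;  // displayed size of the virtual object (S1')
    float s2;  // displayed size of the bounding box (S2')
    float s3;  // displayed size of the bounding box elements (S3')
};

// S1' and S2' obey perspective drawing; S3' is clamped between S31' (its value
// at the far, second object distance) and S32' (its value at the near, first
// object distance). Equivalently, the object-space size S3 could be rescaled
// so that its projection stays within this range.
DisplayedSizes displaySizes(float s1, float s2, float s3,
                            float distance, float focalLengthPx,
                            float firstObjectDistance, float secondObjectDistance) {
    DisplayedSizes out;
    out.s1 = projectToPixels(s1, distance, focalLengthPx);
    out.s2 = projectToPixels(s2, distance, focalLengthPx);
    float s31 = projectToPixels(s3, secondObjectDistance, focalLengthPx);  // far bound
    float s32 = projectToPixels(s3, firstObjectDistance,  focalLengthPx);  // near bound
    out.s3 = std::clamp(projectToPixels(s3, distance, focalLengthPx), s31, s32);
    return out;
}
```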
  • FIG. 7 is a flowchart of a method 700 for rendering visual content, in accordance with some embodiments.
  • the method is applied in the AR glasses 104D, robotic systems, vehicles, or mobile phones.
  • the method 700 is described as being implemented by an electronic device (e.g., a client device 104).
  • the method 700 is applied to determine and predict poses, map a scene, and render both virtual and real content concurrently in extended reality (e.g., VR, AR).
  • Method 700 is, optionally, governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of the electronic system.
  • Each of the operations shown in Figure 7 may correspond to instructions stored in a computer memory or non-transitory computer readable storage medium (e.g., memory 206 of the electronic system 200 in Figure 2).
  • the computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices.
  • the instructions stored on the computer readable storage medium may include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors.
  • the electronic device generates (702) information of a virtual object 402 to be displayed with contextual content from a perspective of a camera 260 (i.e., from a point of view of the camera 260).
  • the virtual object 402 has (704) an object location and an object size, and the camera 260 has (706) a camera position that enables the perspective of the camera 260.
  • the electronic device determines (708) a distance of the virtual object 402 from the camera 260 based on the object location and the camera position, and adjusts (710) the object size of the virtual object 402 based on the distance of the virtual object 402 from the camera 260.
  • the virtual object 402 is rendered (712) with the contextual content, and displayed (714) at the object location with the adjusted object size in a scene associated with the contextual content. As such, the virtual object 402 is displayed jointly with the contextual content in an image 420.
  • the object size of the virtual object 402 is adjusted by determining (716) an angular size of the virtual object 402 based on the object size and the distance of the virtual object 402 from the camera 260 and controlling (718) the angular size of the virtual object 402 within an angular size range.
  • the angular size range is between a first angular size and a second angular size smaller than the first angular size.
  • the angular size of the virtual object 402 is controlled within the angular size range by (1) in accordance with a determination that the angular size of the virtual object 402 is greater than the first angular size, reducing the object size of the virtual object 402 to make the angular size equal to the first angular size and (2) in accordance with a determination that the angular size of the virtual object 402 is less than the second angular size, increasing the object size of the virtual object 402 to make the angular size equal to the second angular size.
  • the angular size range is less than an upper angular limit.
  • the angular size of the virtual object 402 is controlled within the angular size range by, in accordance with a determination that the angular size of the virtual object 402 is greater than the upper angular limit, reducing the object size of the virtual object 402 to make the angular size equal to the upper angular limit.
  • the angular size range is greater than a lower angular limit.
  • the angular size of the virtual object is controlled within the angular size range further by, in accordance with a determination that the angular size of the virtual object on the display is less than the lower angular limit, increasing the object size of the virtual object to make the angular size equal to the lower angular limit.
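  • Operations 716-718 can be sketched as a clamp on the angular size, where reducing or increasing the object size means rescaling it so that its angular size lands on the violated bound; the function names and radian units are assumptions for illustration.

```cpp
#include <cmath>

// Angular size (radians) subtended by an object of a given size at a distance.
float angularSize(float objectSize, float distance) {
    return 2.0f * std::atan(objectSize / (2.0f * distance));
}

// Object size that yields a target angular size at the given distance.
float sizeForAngularSize(float targetAngular, float distance) {
    return 2.0f * distance * std::tan(targetAngular / 2.0f);
}

// Keep the angular size within [secondAngular, firstAngular] (second < first),
// reducing the object size when it subtends too large an angle and increasing
// it when it subtends too small an angle.
float controlObjectSize(float objectSize, float distance,
                        float firstAngular, float secondAngular) {
    float as = angularSize(objectSize, distance);
    if (as > firstAngular)  return sizeForAngularSize(firstAngular,  distance);
    if (as < secondAngular) return sizeForAngularSize(secondAngular, distance);
    return objectSize;  // already within the angular size range
}
```
  • An additional upper or lower angular limit, or the display-ratio variant described next, would add corresponding extra comparisons in the same way.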
  • the virtual object 402 is displayed (720) with a displayed object size on a display of the electronic device, and the object size of the virtual object 402 is adjusted (722) such that a ratio of the displayed object size and a display size of the display is controlled within a display ratio range.
  • the display ratio range is (724) between a first display ratio threshold and a second display ratio threshold less than the first display ratio threshold, and the displayed object size is (726) in a displayed size range between the first displayed size and the second displayed size less than the first displayed size.
  • the display ratio range is less than an upper display ratio limit, and the displayed size range is less than an upper displayed size limit. In some embodiments, the display ratio range is greater than a lower display ratio limit, and the displayed object size is greater than a lower displayed size.
  • the virtual object 402 includes a bounding box 604 having a plurality of bounding box elements 606.
  • the bounding box 604 and bounding box elements 606 are not rendered with the virtual object 402.
  • the object size of the virtual object 402 includes a first size S1 of the virtual object 402, a second size S2 of the bounding box 604, and a third size S3 of a subset of the plurality of bounding box elements 606.
  • the electronic device adjusts the object size of the virtual object 402 by adjusting the third size S3 of the subset of the plurality of bounding box elements 606 based on the distance of the virtual object 402 from the camera 510 and keeping the first size S1 of the virtual object 402 and the second size S2 of the bounding box 604 constant with the distance of the virtual object 402 from the camera 510.
  • the electronic device obtains a background image 420 of the contextual content captured by the camera 260, and executes an augmented reality (AR) application.
  • the virtual object 402 is overlaid on the background image 420 in an AR environment enabled by the AR application.
  • the electronic device creates a background image 420 of the contextual content including a plurality of virtual objects, and executes a virtual reality (VR) application.
  • the virtual object 402 is overlaid on the background image 420 in a VR environment enabled by the VR application.
  • the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context.
  • the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.
  • stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages can be implemented in hardware, firmware, software or any combination thereof.

Abstract

This application relates to image rendering. An electronic device generates information of a virtual object to be displayed with contextual content from a perspective of a camera. The virtual object has an object location and an object size, and the camera has a camera position that enables the perspective of the camera. The electronic device determines a distance of the virtual object from the camera based on the object location and the camera position, and adjusts the object size of the virtual object based on the distance of the virtual object from the camera. The virtual object is rendered with the contextual content, and the virtual object is displayed at the object location with the adjusted object size in a scene associated with the contextual content.
PCT/US2022/028746 2022-05-11 2022-05-11 Adaptive resizing of manipulable and readable objects WO2023219612A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2022/028746 WO2023219612A1 (fr) 2022-05-11 2022-05-11 Adaptive resizing of manipulable and readable objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2022/028746 WO2023219612A1 (fr) 2022-05-11 2022-05-11 Adaptive resizing of manipulable and readable objects

Publications (1)

Publication Number Publication Date
WO2023219612A1 true WO2023219612A1 (fr) 2023-11-16

Family

ID=88730701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/028746 WO2023219612A1 (fr) 2022-05-11 2022-05-11 Adaptive resizing of manipulable and readable objects

Country Status (1)

Country Link
WO (1) WO2023219612A1 (fr)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220058888A1 (en) * 2019-08-28 2022-02-24 Shenzhen Sensetime Technology Co., Ltd. Image processing method and apparatus, and computer storage medium


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22941828

Country of ref document: EP

Kind code of ref document: A1