US20180101966A1 - Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data


Info

Publication number
US20180101966A1
Authority
US
United States
Prior art keywords
scene
objects
model
viewing
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/726,316
Other versions
US10380762B2
Inventor
Ken Lee
Yasmin Jahir
Xin Hou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VanGogh Imaging Inc
Original Assignee
VanGogh Imaging Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by VanGogh Imaging Inc filed Critical VanGogh Imaging Inc
Priority to US15/726,316, granted as US10380762B2
Assigned to VANGOGH IMAGING, INC. Assignment of assignors interest (see document for details). Assignors: HOU, Xin; JAHIR, YASMIN; LEE, KEN
Publication of US20180101966A1
Application granted granted Critical
Publication of US10380762B2
Legal status: Active

Classifications

    • G06T 7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T 15/20: 3D image rendering; geometric effects; perspective computation
    • G06T 7/579: Depth or shape recovery from multiple images, from motion
    • G06T 9/001: Image coding; model-based coding, e.g. wire frame
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/44012: Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N 21/6587: Control parameters, e.g. trick play commands, viewpoint selection
    • H04N 21/816: Monomedia components involving special video data, e.g. 3D video
    • G06T 2207/10024: Image acquisition modality; color image
    • G06T 2207/10028: Image acquisition modality; range image; depth image; 3D point clouds
    • G06T 2207/30244: Subject of image; camera pose
    • G06T 2219/024: Multi-user, collaborative environment
    • G06T 3/0037
    • G06T 3/067: Reshaping or unfolding 3D tree structures onto 2D planes

Definitions

  • the subject matter of this application relates generally to methods and apparatuses, including computer program products, for real-time remote collaboration and virtual presence using simultaneous localization and mapping (SLAM) to construct three-dimensional (3D) models and update a scene based upon sparse data, including two-dimensional (2D) and 3D video compression using SLAM for real-time remote interaction.
  • a video can be streamed to a viewer at the remote location, or the event can be recorded and streamed later to the viewer.
  • some form of compression such as MPEG-4 is used to reduce the amount of data being transmitted over the network by a factor of 100 or more. This allows the transmission of the video to be practical over low-bandwidth wired networks or most wireless networks.
  • Simultaneous localization and mapping is a computer modeling technique that is used to map and track the real world as a 3D model.
  • the methods and systems described herein utilize SLAM to compress in real time a live video stream of a remote scene for the purpose of viewing that scene from any location. Once re-rendered as a 3D model at the viewer's device, the live remote scene can then be viewed using a VR headset as if the viewer is at the remote location.
  • the technology described herein can be used to capture a scene (including objects in the scene) of a first location as one or more 3D models, transfer the 3D model(s) in real time to a second location that is remote from the first location, and then render viewing images of the 3D model from a different viewing perspective using the pose of a viewing element (e.g., digital screen, camera, image viewer, or headset) at the second location.
  • the second location can be equipped with a VR headset or other similar hardware to view the 3D model of the first location from any viewing angle.
  • the systems and methods described herein advantageously transfer only the changing portions of the scene and/or objects in the scene to the second location.
  • the methods and systems described herein transmit an entire 3D model of the scene—and objects in the scene—from the first location to the second location, and use a graphics processor in the viewer's device to render image(s) using the 3D model. It should be appreciated that the techniques described herein provide the advantage of ‘virtually’ copying the scene and objects at the first location, storing 3D models of the scene and objects in memory of the viewer's device at the second location, and then rendering the scene and objects in real time (e.g., as a video stream) from the ‘virtual’ scene.
  • Another advantage provided by the methods and systems described herein is that the image processing device at the first location needs only to transmit changes in the ‘position’ of the objects and the sensor location relative to the scene for each frame—and not the entire scene for each frame—to the viewer's device at the second location, in order for the viewer to move the objects and the sensor location in the virtual scene to replicate the same visual experience as if the remote viewer is at the first location. Because transmission of the changes in position and sensor location involves much less data than sending the entire scene, this technique advantageously provides for substantial compression of, e.g., a video stream transmitted from the first location to the second location.
  • the systems and methods described herein advantageously transfer only the pose of the object to the viewer's device at the second location once the viewer's device has received the 3D model(s).
  • subsequent transmissions need to include only the sparse feature information of the non-rigid object in order for the viewer's device at the second location to recreate the scene correctly.
  • the sparse feature information can include feature points such as points associated with aspects of the person's body (e.g., head, feet, hands, arms).
  • the sensor and server computing device need only capture and transmit the positional changes associated with these feature points to the viewing device—instead of the entire model—and the viewing device can update the 3D model at the remote location using the sparse feature information to track the person's movements through the scene.
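  • As an illustrative sketch only (not taken from the patent), the following Python shows one way such a sparse update might be structured: after the initial full model transfer, each frame carries just the 3D positions of a few named feature points, which the viewing device applies to its stored copy of the model. The field names, feature names, and JSON encoding are assumptions.
```python
import json
import time

# Hypothetical sparse-feature update for a non-rigid object (e.g., a person).
# After the initial 3D model transfer, only these few points are streamed.
FEATURE_POINTS = ["head", "left_hand", "right_hand", "left_foot", "right_foot"]

def make_sparse_update(object_id, feature_positions, timestamp=None):
    """Pack a per-frame update containing only sparse feature-point positions."""
    return json.dumps({
        "object_id": object_id,
        "timestamp": timestamp if timestamp is not None else time.time(),
        "features": {name: list(map(float, feature_positions[name]))
                     for name in FEATURE_POINTS if name in feature_positions},
    })

def apply_sparse_update(local_model, update_msg):
    """Move the stored feature points of the viewer-side model to the new positions."""
    update = json.loads(update_msg)
    for name, xyz in update["features"].items():
        local_model["features"][name] = xyz   # viewer-side model is a plain dict here
    return local_model

# Example: a message of roughly a hundred bytes replaces retransmitting the whole mesh.
local_model = {"object_id": "person_1",
               "features": {n: [0.0, 0.0, 0.0] for n in FEATURE_POINTS}}
msg = make_sparse_update("person_1", {"head": (0.1, 1.7, 0.0), "left_hand": (0.4, 1.1, 0.2)})
local_model = apply_sparse_update(local_model, msg)
print(len(msg), "bytes per frame;", local_model["features"]["head"])
```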
  • the invention in one aspect, features a system for generating a video stream of a scene including one or more objects.
  • the system comprises a sensor device that captures a plurality of images of one or more objects in a scene.
  • the system further comprises a server computing device coupled to the sensor device that, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image.
  • the server computing device for each image, generates an initial 3D model of the scene using the image.
  • the server computing device for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene.
  • the system further comprises a viewing device coupled to the server computing device.
  • the viewing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device.
  • the viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene.
  • the viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
  • the invention in another aspect, features a computerized method of generating a video stream of a scene including one or more objects.
  • a sensor device captures a plurality of images of one or more objects in a scene.
  • a server computing device coupled to the sensor device, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image.
  • the server computing device for each image, generates an initial 3D model of the scene using the image.
  • the server computing device for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene.
  • a viewing device coupled to the server computing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device.
  • the viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene.
  • the viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
  • the server computing device stores the initial 3D models of the one or more objects, the 3D model of the scene, and the pose information in a database.
  • the viewing device receives the at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and the pose information of the sensor device, from the server computing device during a real-time streaming session.
  • the viewing device is a virtual reality (VR) headset.
  • the viewing device generates an updated 3D model of the one or more objects in the scene based upon at least one of (i) updated pose information received from the server computing device or (ii) updated pose information of the viewing perspective of the viewing device.
  • the viewing device receives an image from the server computing device and applies the image to at least one of the initial 3D model or the initial 3D models of the one or more objects to generate a photorealistic 3D model.
  • the initial 3D model of the scene is generated using simultaneous localization and mapping (SLAM). In some embodiments, the initial 3D models of the one or more objects in the scene are generated using simultaneous localization and mapping (SLAM).
  • the server computing device determines one or more changes to at least one of the 3D models of the one or more objects or the 3D model of the scene based upon the pose information. In some embodiments, the one or more changes comprise data associated with one or more feature points on the at least one of the 3D models of the one or more objects or the 3D model of the scene. In some embodiments, the viewing device receives the one or more changes from the server computing device and the viewing device updates the initial 3D models based upon the one or more changes.
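  • To make the claimed data flow concrete, the hedged sketch below models the three kinds of information the claims describe: initial 3D models sent once, per-frame sensor pose updates, and the viewing device's own locally captured pose. The class and message names are hypothetical and not part of the patent.
```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class InitialModel:            # sent once per object (or once for the scene)
    model_id: str
    vertices: List[List[float]]
    faces: List[List[int]]

@dataclass
class PoseUpdate:              # sent every frame: sensor pose relative to the scene
    model_id: str
    pose_4x4: List[List[float]]

@dataclass
class ViewerState:             # captured locally on the viewing device, never transmitted
    pose_4x4: List[List[float]]

class ViewingDevice:
    def __init__(self):
        self.models: Dict[str, InitialModel] = {}
        self.sensor_poses: Dict[str, List[List[float]]] = {}

    def receive(self, msg):
        if isinstance(msg, InitialModel):
            self.models[msg.model_id] = msg                  # heavy payload, received once
        elif isinstance(msg, PoseUpdate):
            self.sensor_poses[msg.model_id] = msg.pose_4x4   # light payload, every frame

    def render(self, viewer: ViewerState):
        # A real renderer would combine the stored models with either the streamed
        # sensor pose or the viewer's own pose; this sketch only reports what would be drawn.
        return [(model_id, self.sensor_poses.get(model_id)) for model_id in self.models]

identity = [[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
device = ViewingDevice()
device.receive(InitialModel("scene", vertices=[[0.0, 0.0, 0.0]], faces=[]))
device.receive(PoseUpdate("scene", pose_4x4=identity))
print(device.render(ViewerState(pose_4x4=identity)))
```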
  • FIG. 1 is a block diagram of a system for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 2 is a detailed block diagram of specific software processing modules executing in an exemplary image processing module for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 3 is a flow diagram of a method of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 4 is a flow diagram of a method of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 5 is an exemplary viewing device.
  • FIG. 6A depicts an exemplary object to be scanned by the sensor, and an exemplary 3D model of the object as generated by the system.
  • FIG. 6B is an exemplary user interface screen for display on the viewing device.
  • FIG. 7A is an exemplary 3D model prior to transmission to the viewing device.
  • FIG. 7B is an exemplary 3D model after being received by the viewing device.
  • FIG. 8A is an exemplary 3D scene prior to transmission to the viewing device.
  • FIG. 8B is an exemplary 3D scene after being received by the viewing device.
  • FIG. 1 is a block diagram of a system 100 for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • the system 100 includes a sensor 103 coupled to a computing device 104 .
  • the computing device 104 includes an image processing module 106 .
  • the computing device can also be coupled to a database 108 or other data storage device, e.g., used for storing certain 3D models, images, pose information, and other data as described herein.
  • the system 100 also includes a communications network 110 coupled to the computing device 104 , and a viewing device 112 communicably coupled to the network 110 in order to receive, e.g., 3D model data, image data, and other related data from the computing device 104 for the purposes described herein.
  • the sensor 103 is positioned to capture images of a scene 101 , which includes one or more physical objects (e.g., objects 102 a - 102 b ).
  • Exemplary sensors that can be used in the system 100 include, but are not limited to, real-time 3D depth sensors, digital cameras, combination 3D depth and RGB camera devices, and other types of devices that are capable of capturing depth information of the pixels along with the images of a real-world object and/or scene to collect data on its position, location, and appearance.
  • the sensor 103 is embedded into the computing device 104 , such as a camera in a smartphone or a 3D VR capture device, for example.
  • the sensor 103 further includes an inertial measurement unit (IMU) to capture data points such as heading, linear acceleration, rotation, and the like.
  • the computing device 104 receives images (also called scans) of the scene 101 from the sensor 103 and processes the images to generate 3D models of objects (e.g., objects 102 a - 102 b ) represented in the scene 101 .
  • the computing device 104 can take on many forms, including both mobile and non-mobile forms.
  • Exemplary computing devices include, but are not limited to, a laptop computer, a desktop computer, a tablet computer, a smart phone, an internet of things (IoT) device, augmented reality (AR)/virtual reality (VR) devices (e.g., glasses, headset apparatuses, and so forth), or the like.
  • the sensor 103 and computing device 104 can be embedded in a larger mobile structure such as a robot or unmanned aerial vehicle (UAV). It should be appreciated that other computing devices can be used without departing from the scope of the invention.
  • the computing device 104 includes network-interface components to connect to a communications network (e.g., network 110 ).
  • the network-interface components include components to connect to a wireless network, such as a Wi-Fi or cellular network, in order to access a wider network, such as the Internet.
  • the computing device 104 includes an image processing module 106 configured to receive images captured by the sensor 103 and analyze the images in a variety of ways, including detecting the position and location of objects represented in the images and generating 3D models of objects in the images.
  • the image processing module 106 is a hardware and/or software module that resides on the computing device 104 to perform functions associated with analyzing images captured by the scanner, including the generation of 3D models (e.g., .OBJ files) based upon objects in the images.
  • the functionality of the image processing module 106 is distributed among a plurality of computing devices.
  • the image processing module 106 operates in conjunction with other modules that are either also located on the computing device 104 or on other computing devices coupled to the computing device 104 . It should be appreciated that any number of computing devices, arranged in a variety of architectures, resources, and configurations (e.g., cluster computing, virtual computing, cloud computing) can be used without departing from the scope of the invention.
  • An exemplary image processing module 106 is the Starry Night SDK, available from VanGogh Imaging, Inc. of McLean, Va.
  • the image processing module 106 comprises specialized hardware (such as a processor or system-on-chip) that is embedded into, e.g., a circuit board or other similar component of another device.
  • the image processing module 106 is specifically programmed with the image processing and modeling software functionality described below.
  • FIG. 2 is a detailed block diagram 200 of specific software processing modules executing in an exemplary image processing module 106 at the computing device 104 .
  • the image processing module 106 receives images and related data 202 as input from the sensor (e.g., the RGB sensor, the 3D depth sensor and, optionally, the IMU).
  • the modules 204 - 214 each provide specific image processing and 3D model generation capabilities to the SLAM module 216 , which generates a dense map, dense tracking, and a pose of the 3D models of the scene and objects in the scene.
  • the sparse tracking module 210 generates a sparse map and sparse tracking of the 3D models of the scene and objects in the scene.
  • the sparse tracking module 210 and the SLAM module 216 each sends its respective tracking information, along with the pose from the SLAM module 216 , to the unified tracking module 218 , which integrates the received information into a final pose of the 3D model(s).
  • the modules 210 and 216 also send their respective mapping information to the photogrammetry module 220 , which performs functions such as texture refinement, hole-filling, and geometric correction.
  • the updated 3D models from the photogrammetry module 220 are further processed by the shape-based registration module 222 .
  • the image processing module 106 provides the pose and photo-realistic 3D model as output 224 to, e.g., the viewing device 112 .
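  • The skeleton below is an assumed orchestration of the FIG. 2 stages (sparse tracking, dense SLAM, unified tracking, photogrammetry, shape-based registration). Each stage body is a placeholder, since the patent does not specify the underlying algorithms; only the wiring between stages is taken from the description above.
```python
# Skeleton of the FIG. 2 processing chain; each stage is a placeholder standing in
# for the module of the same name, not an implementation of the patent's algorithms.
def sparse_tracking(frame):
    return {"sparse_map": [], "sparse_track": frame.get("features", [])}

def dense_slam(frame):
    return {"dense_map": [], "dense_track": [], "pose": frame.get("pose_guess")}

def unified_tracking(sparse, dense):
    # Integrate sparse and dense tracking (plus the SLAM pose) into a final pose.
    return dense["pose"] if dense["pose"] is not None else "pose_from_sparse"

def photogrammetry(sparse, dense):
    # Texture refinement, hole-filling, and geometric correction on the mapped model.
    return {"model": "refined_3d_model", "maps": (sparse["sparse_map"], dense["dense_map"])}

def shape_based_registration(model):
    return {"model": model["model"], "registered": True}

def process_frame(frame):
    sparse = sparse_tracking(frame)
    dense = dense_slam(frame)
    pose = unified_tracking(sparse, dense)
    model = shape_based_registration(photogrammetry(sparse, dense))
    return {"pose": pose, "model": model}   # output 224: pose plus photorealistic model

print(process_frame({"features": [(1, 2)], "pose_guess": [[1, 0], [0, 1]]}))
```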
  • the database 108 is coupled to the computing device 104 , and operates to store data used by the image processing module 106 during its image analysis functions.
  • the data storage module 108 can be integrated with the server computing device 104 or be located on a separate computing device.
  • the communications network 110 may be a local network, such as a LAN, or a wide area network, such as the Internet and/or a cellular network.
  • the network 110 is composed of several discrete networks and/or sub-networks (e.g., cellular to Internet) that enable the components of the system 100 to communicate with each other.
  • the viewing device 112 is a computing device that receives information such as image data, 3D model data, and other types of data described herein from the image processing module 106 of the server computing device 104 for rendering of the scene 101 and objects 102 a - 102 b as captured by the sensor 103 . As shown in FIG. 1 , the viewing device 112 is positioned at a second location that is remote from the first location where the sensor 103 and computing device 104 are located. It should be appreciated that, in some embodiments, the first location and second location do not need to be separate physical or geographical locations. Exemplary viewing devices include, but are not limited to, laptop computers, desktop computers, tablets, smartphones, smart televisions, VR/AR hardware (e.g., glasses, headset), IoT devices, and the like.
  • the viewing device 112 includes, e.g., a CPU 114 and a GPU 116 , which are specialized processors embedded in the viewing device 112 for the purpose of receiving 3D model data and pose data from the image processing module 106 via network 110 , updating 3D model(s) stored at the viewing device 112 using the received information, and rendering real-time image data (e.g., a video stream) based upon the updated 3D model(s) to provide a viewing experience to a user of the viewing device 112 that is the same as the viewing experience captured by the sensor 103 and the computing device 104 at the first location.
  • FIG. 3 is a flow diagram of a method 300 of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1 .
  • the sensor 103 captures ( 302 ) one or more images of the scene 101 , including one or more of the objects 102 a - 102 b and transmits the images to the image processing module 106 of the computing device 104 .
  • the image processing module 106 generates ( 304 ) 3D models of the scene 101 and objects 102 a - 102 b using the images captured by the sensor 103 .
  • the 3D generation is performed using dense and/or sparse SLAM (sometimes called Fusion), which stitches incoming scans into a single 3D model of the scene or object.
  • Exemplary processing for the image processing module 106 is described above with respect to FIG. 2 .
  • the image processing module 106 also tracks ( 306 ) the pose of the sensor 103 relative to the scene 101 and the objects 102 a - 102 b. For example, if the sensor 103 is non-stationary (i.e., the sensor moves in relation to the scene and/or objects), the image processing module 106 receives pose information relating to the sensor 103 and stores the pose information in correlation with the captured images.
  • the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device for rendering and viewing the images.
  • the image processing module 106 stores ( 308 ) the generated 3D models and relative pose information in, e.g., database 108 , for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below).
  • the image processing module 106 transmits ( 310 ) the generated 3D models of the objects 102 a - 102 b and/or scene 101 to the viewing device 112 (e.g., via communications network 110 ) and, as the objects 102 a - 102 b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information to the viewing device 112 —such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103 . To reduce the amount of data being transmitted to the viewing device, in some embodiments the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112 .
  • this process advantageously acts as a compressing technique because the amount of data is small but the viewing device 112 can replicate the complete viewing experience.
  • the CPU 114 of the viewing device 112 receives the pose information and the changes in the 3D model(s) from the image processing module 106 via network 110 .
  • the CPU 114 updates the 3D model(s) stored at the viewing device 112 using the received changes and pose information, and transmits the updated 3D model(s) to the GPU 116 for rendering into, e.g., stereo pair images, single 2D images, or other similar outputs—for viewing by a user at the viewing device 112 .
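  • A minimal sketch of that viewer-side update step, assuming the cached model is a plain vertex array and the streamed update is a 4x4 pose matrix (both assumptions, not details from the patent):
```python
import numpy as np

# Hypothetical viewer-side update: the cached 3D model (received once) is re-posed
# each frame from a streamed 4x4 transform instead of receiving new geometry.
class CachedObject:
    def __init__(self, vertices):
        self.vertices = np.asarray(vertices, dtype=float)   # N x 3, in object coordinates

    def posed_vertices(self, pose_4x4):
        """Return vertices placed in the scene using the latest streamed pose."""
        homogeneous = np.hstack([self.vertices, np.ones((len(self.vertices), 1))])
        return (homogeneous @ np.asarray(pose_4x4).T)[:, :3]

obj = CachedObject([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
pose = np.eye(4); pose[:3, 3] = [0.0, 0.0, 2.0]   # object moved 2 m along z this frame
print(obj.posed_vertices(pose))                    # the geometry itself is never resent
```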
  • FIG. 5 depicts an exemplary viewing device 112 for the transmitted output.
  • the image processing module 106 transmits ( 312 ) the 3D models and relative pose information to the viewing device 112 , and the viewing device 112 renders the 2D video streams using the relative sensor (or camera) pose information.
  • the image processing module 106 further transmits image data captured by the sensor 103 to the viewing device 112 , and the viewing device 112 uses the image data to render, e.g., a photorealistic 3D model of the objects and/or scene captured by the sensor 103 .
  • the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112 .
  • FIG. 4 is a flow diagram of a method 400 of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1 .
  • the sensor 103 captures ( 402 ) one or more images of the scene 101 , including one or more of the objects 102 a - 102 b and transmits the images to the image processing module 106 of the computing device 104 .
  • the image processing module 106 generates ( 404 ) 3D models of the scene 101 and objects 102 a - 102 b using the images captured by the sensor 103 .
  • the 3D models generated by the image processing module 106 are photorealistic.
  • the 3D generation is performed using dense and/or sparse SLAM (sometimes called Fusion) which stitches incoming scans into a single 3D model of a scene and/or objects in the scene.
  • Exemplary processing for the image processing module 106 is described above with respect to FIG. 2 .
  • FIG. 6A depicts an exemplary object 102 a (e.g., a toy horse) to be scanned by the sensor, and an exemplary 3D model 602 generated by the system 100 .
  • the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device (e.g., viewing device 112 ) for rendering and viewing the images.
  • the image processing module 106 stores ( 406 ) the generated 3D models and relative pose information in, e.g., database 108 , for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below).
  • the image processing module 106 transmits ( 408 ) the generated 3D models of the objects 102 a - 102 b and/or scene 101 to the viewing device 112 (e.g., via communications network 110 ) and, as the objects 102 a - 102 b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information as well as any texture changes to the viewing device 112 —such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103 .
  • FIG. 6B depicts an exemplary user interface screen for display on the viewing device 112 .
  • the user interface screen includes options 604 for displaying the 3D models—including an option to Stream Object.
  • the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112 .
  • this process advantageously acts as a compressing technique because the amount of data is small but the viewing device 112 can replicate the complete viewing experience.
  • FIG. 7A depicts an exemplary 3D model prior to transmission to the viewing device 112
  • FIG. 7B depicts an exemplary 3D model after being received by the viewing device 112 —including the post-processing steps that generate a photorealistic reconstructed 3D model.
  • the image processing module 106 transmits ( 410 ) the 3D models and relative pose information to the viewing device 112 , and the viewing device 112 renders viewing images of the 3D models using both the relative pose information of the viewing device 112 and the virtual copy of the remote scene and object(s).
  • the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112 .
  • the viewing device 112 is not tied to the original video capture perspective of the sensor 103 because the complete 3D model is recreated at the viewing device 112 . Therefore, the viewing device (e.g., the CPU 114 and GPU 116 ) at the second location can manipulate the 3D model(s) locally in order to produce a perspective of the model(s) that is completely independent of what is being captured by the sensor 103 at the first location. For example, the viewing device 112 can be used to ‘walk around’ the virtual scene, which is a true 3D copy of the first location (such as via a VR viewing device).
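  • The following sketch illustrates why an independent viewpoint is possible: with the full model cached locally, the viewing device can project it through any camera pose. The pinhole intrinsics and poses are made-up values for illustration only.
```python
import numpy as np

# Sketch: because the full 3D model lives on the viewing device, it can be projected
# from any viewer pose, not just the sensor's. Intrinsics and poses here are assumed.
def project(points_world, viewer_pose_4x4, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Project world-space points into a 2D image seen from the viewer's camera."""
    world_to_cam = np.linalg.inv(np.asarray(viewer_pose_4x4, dtype=float))
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    cam = (homogeneous @ world_to_cam.T)[:, :3]
    z = cam[:, 2:3]
    return np.hstack([fx * cam[:, 0:1] / z + cx, fy * cam[:, 1:2] / z + cy])

points = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 4.0]])
viewer_pose = np.eye(4); viewer_pose[:3, 3] = [0.5, 0.0, 0.0]   # viewer steps to the side
print(project(points, viewer_pose))   # a viewpoint the sensor itself never captured
```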
  • the scene 101 may be static (e.g., inside a museum) and the sensor 103 may move in relation to the scene 101 .
  • the sensor 103 captures one or more images of the scene 101 , and the image processing module 106 can easily generate a 3D model of the scene 101 . Further, as long as the image processing module 106 captures the exact pose of the sensor 103 relative to the scene, the image processing module 106 renders a 2D image using the 3D model. Therefore, the image processing module 106 can simply transmit the 3D model of the scene and relative pose information to the viewing device 112 for rendering, e.g., as a video stream of the static scene.
  • the image processing module 106 only needs to generate a photorealistic 3D model of the room and the sequential sensor pose information as it moves (e.g., pose one at time one, pose two at time two, etc.).
  • the viewing device 112 can render a video of the room completely and accurately based on the 3D model and pose information.
  • the resultant amount of information needed to replicate the video stream at the viewing device 112 is a fraction of the size that would be required to save the entire video stream and transmit the stream to the viewing device 112 .
  • the only cost is the conversion of 3D model(s) into 2D images using the GPU 116 at the viewing device 112 .
  • this 3D model to 2D video conversion can be done, e.g., in the cloud or using another computing device.
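  • A back-of-the-envelope comparison, using assumed sizes rather than figures from the patent, shows why streaming only the pose sequence is so much cheaper than retransmitting geometry every frame:
```python
# Comparison under assumed numbers (not figures from the patent): a one-time model
# transfer plus per-frame poses versus naively resending the geometry every frame.
FRAME_RATE = 60                      # frames per second
MODEL_BYTES = 20 * 1024 * 1024       # assumed one-time photorealistic room model (~20 MB)
POSE_BYTES = 16 * 4                  # 4x4 pose of 32-bit floats = 64 bytes per frame
FULL_FRAME_BYTES = MODEL_BYTES       # naive alternative: retransmit the model each frame

per_second_sparse = FRAME_RATE * POSE_BYTES
per_second_naive = FRAME_RATE * FULL_FRAME_BYTES
print(f"pose-only streaming: {per_second_sparse} B/s after a one-time {MODEL_BYTES >> 20} MB transfer")
print(f"naive retransmission: {per_second_naive / 1e6:.0f} MB/s")
print(f"steady-state ratio: ~{per_second_naive // per_second_sparse:,}x less data")
```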
  • FIG. 8A depicts an exemplary 3D scene prior to transmission to the viewing device 112
  • FIG. 8B depicts an exemplary 3D scene after being received by the viewing device 112 —including the post-processing steps that generate a photorealistic reconstructed 3D scene.
  • the scene 101 may be static and include one or more static or moving objects 102 a - 102 b along with a moving sensor 103 .
  • similar to the static scene use case described previously, the image processing module 106 generates 3D models of the object(s) in the scene and captures the pose information of the sensor relative to the object(s). The image processing module 106 transmits the 3D models and the relative pose information to the viewing device 112 , and the device 112 can recreate the scene 101 plus the exact locations of the objects 102 a - 102 b within the scene to completely replicate the captured scene.
  • the scene 101 may include non-rigid, moving objects—such as people.
  • the image processing module 106 can generate a 3D model of the non-rigid object and, using non-rigid registration techniques, can then send ‘sparse’ information to the viewing device 112 . This ‘sparse’ information can then be used by the viewing device 112 to reshape the 3D model and then render the 3D model from the scene 101 .
  • the image processing module 106 For example, once the image processing module 106 generates a 3D model of a human face and transfers the 3D model to the viewing device 112 , the image processing module 106 only needs to track a small number of feature points of the face and transfer those feature points to the viewing device 112 to enable recreation of a facial expression accurately, e.g., in a video on the viewing device 112 .
  • the amount of this ‘sparse’ information is a fraction of the dataset normally needed to send the entire new 3D model to the viewing device 112 .
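  • As one hypothetical way to apply such sparse information, the sketch below reshapes a cached mesh by blending the displacements of a few streamed landmarks with inverse-distance weights. This stands in for the non-rigid registration the patent refers to; it is not the patent's actual method, and all coordinates are invented.
```python
import numpy as np

# Reshape a cached non-rigid model from a handful of streamed landmark positions:
# each vertex is pulled by nearby landmark displacements with inverse-distance weights.
def deform(vertices, landmarks_before, landmarks_after, eps=1e-6):
    vertices = np.asarray(vertices, float)
    before = np.asarray(landmarks_before, float)
    displacement = np.asarray(landmarks_after, float) - before
    out = vertices.copy()
    for i, v in enumerate(vertices):
        d = np.linalg.norm(before - v, axis=1) + eps
        w = 1.0 / d
        w /= w.sum()                       # inverse-distance weights over the landmarks
        out[i] = v + w @ displacement      # weighted blend of the landmark motion
    return out

face = np.array([[0.0, 0.0, 0.0], [0.02, 0.0, 0.0], [0.0, 0.05, 0.0]])
mouth_before = np.array([[0.0, -0.03, 0.0], [0.02, -0.03, 0.0]])
mouth_after = mouth_before + [0.0, 0.01, 0.0]    # streamed update: mouth corners moved up
print(deform(face, mouth_before, mouth_after))   # the whole mesh follows the sparse points
```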
  • the system 100 can capture and transmit a scene 101 as a changing 3D model that is viewable by the viewing device 112 (e.g., VR or AR glasses).
  • the pose information of the sensor 103 is not required because having the 3D model available at the viewing device 112 allows for rendering of the viewing images from any angle. Therefore, the sensor 103 and image processing module 106 need only capture and transmit to the viewing device 112 the object pose relative to the scene 101 as well as sparse feature points for the non-rigid objects along with texture change information. Using just this set of information, the viewing device 112 is capable of fully recreating for playback the scene and objects captured by the sensor and image processing module.
  • the system can capture changes in texture due to lighting conditions and the location of the lighting source, especially if the location and brightness of the lighting source are generally stable. As a result, the system can render the correct texture without having to transmit the actual texture to the viewing device 112 . Instead, the viewing device 112 can simply render the texture based on the 3D model and the location and brightness of the lighting source. In another example, some key texture information (such as high-resolution face texture(s)) can be sent separately and then added to the ‘virtual scene’ to provide more realistic texture information.
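  • A minimal sketch of that lighting idea, assuming simple Lambertian (diffuse) shading from a known light position and brightness; the shading model and all values are assumptions rather than details from the patent.
```python
import numpy as np

# If the light's position and brightness are known and stable, per-vertex shading can
# be computed on the viewing device instead of streaming updated textures each frame.
def lambert_shading(vertices, normals, light_pos, light_brightness=1.0):
    to_light = np.asarray(light_pos, float) - np.asarray(vertices, float)
    to_light /= np.linalg.norm(to_light, axis=1, keepdims=True)
    normals = np.asarray(normals, float)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    intensity = np.clip(np.sum(normals * to_light, axis=1), 0.0, 1.0)
    return light_brightness * intensity   # one diffuse shading value per vertex

verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
norms = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(lambert_shading(verts, norms, light_pos=[0.0, 0.0, 3.0]))
```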
  • a sensor device at a first location captures images of objects in the scene (e.g., players, referees, game ball, etc.) and the scene itself (e.g., the playing field).
  • the sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device.
  • the server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the players, ball, and other objects in the scene and an initial 3D model of the scene.
  • the server computing device transmits the initial 3D models and the pose information to the remote viewing device (e.g., VR headset), which then generates a virtual representation of the scene (including the objects) as a video stream.
  • the VR headset can capture pose information associated with the viewing perspective of the VR headset—which is independent from the pose information of the sensor device received from the server computing device.
  • the headset can then utilize its own pose information to render a video stream from the perspective of the headset and, as the person wearing the headset moves around, the headset can render a different viewpoint of the scene and objects in the scene from that being captured by the sensor device.
  • the viewer at the remote location can traverse the scene and view the action from a perspective completely independent of the sensor that is viewing the scene locally, providing an immersive and unique experience for the viewer.
  • the sensor device at the first location captures images of the museum scene as well as artwork, sculptures, people, and other objects in the museum.
  • the sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device.
  • the server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the objects in the scene and an initial 3D model of the scene.
  • a sensor device can be used to capture a guided tour of the museum, viewing the various exhibits and wings.
  • the server computing device then transmits the initial 3D models and the pose information to the remote viewing device (e.g., tablet, mobile phone), which then generates a virtual representation of the museum (including the artwork, people, etc.) using the 3D models and the pose information. Then, as the pose information changes (due to the sensor device moving through the museum), the server computing device can transmit the changing pose information to the remote viewing device, which automatically renders an updated representation of the scene and objects based upon the changing pose information, as part of a live and/or prerecorded video stream. In this way, the person viewing the stream can feel as though they are walking through the museum and seeing the exhibits and artwork in a first-person perspective.
  • Live Streaming: for example, in order to live stream a 3D scene such as a sports event, a concert, a live presentation, and the like, the techniques described herein can be used to immediately send out a sparse frame to the viewing device at the remote location. As the 3D model becomes more complete, the techniques provide for adding full texture.
  • the techniques can leverage 3D model compression to further reduce the geometric complexity and provide a seamless streaming experience. Recording for Later ‘Replay’: the techniques can advantageously be used to store images and relative pose information (as described above) in order to replay the scene and objects at a later time.
  • the computing device can store 3D models, image data, pose data, and sparse feature point data associated with the sensor capturing, e.g., a video of the scene and objects in the scene. Then, the viewing device 112 can later receive this information and recreate the entire video using the models, images, pose data and feature point data.
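  • One hypothetical way to organize such a recording is a per-frame log of pose and sparse-feature data that can be replayed later against the stored models; the file format and field names below are illustrative only, not defined by the patent.
```python
import json
import time

# Record per-frame pose and sparse-feature data to a log (one JSON object per line)
# and replay it later against the cached 3D models.
def record_frame(log_file, sensor_pose, sparse_features):
    log_file.write(json.dumps({"t": time.time(),
                               "sensor_pose": sensor_pose,
                               "sparse": sparse_features}) + "\n")

def replay(path, apply_frame, speed=1.0):
    with open(path) as f:
        frames = [json.loads(line) for line in f]
    for prev, cur in zip(frames, frames[1:] + [None]):
        apply_frame(prev["sensor_pose"], prev["sparse"])   # update models, then render
        if cur is not None:
            time.sleep(max(0.0, (cur["t"] - prev["t"]) / speed))

with open("session.jsonl", "w") as log:
    record_frame(log,
                 [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
                 {"person_1": {"head": [0.1, 1.7, 0.0]}})
replay("session.jsonl", lambda pose, sparse: print("frame:", sparse))
```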
  • the above-described techniques can be implemented in digital and/or analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them.
  • the implementation can be as a computer program product, i.e., a computer program tangibly embodied in a machine-readable storage device, for execution by, or to control the operation of, a data processing apparatus, e.g., a programmable processor, a computer, and/or multiple computers.
  • a computer program can be written in any form of computer or programming language, including source code, compiled code, interpreted code, and/or machine code, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, or other unit suitable for use in a computing environment.
  • a computer program can be deployed to be executed on one computer or on multiple computers at one or more sites.
  • Method steps can be performed by one or more specialized processors executing a computer program to perform functions by operating on input data and/or generating output data. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry, e.g., a FPGA (field programmable gate array), a FPAA (field-programmable analog array), a CPLD (complex programmable logic device), a PSoC (Programmable System-on-Chip), ASIP (application-specific instruction-set processor), or an ASIC (application-specific integrated circuit), or the like.
  • Subroutines can refer to portions of the stored computer program and/or the processor, and/or the special circuitry that implement one or more functions.
  • processors suitable for the execution of a computer program include, by way of example, special purpose microprocessors.
  • a processor receives instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and/or data.
  • Memory devices such as a cache, can be used to temporarily store data. Memory devices can also be used for long-term data storage.
  • a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer can also be operatively coupled to a communications network in order to receive instructions and/or data from the network and/or to transfer instructions and/or data to the network.
  • Computer-readable storage mediums suitable for embodying computer program instructions and data include all forms of volatile and non-volatile memory, including by way of example semiconductor memory devices, e.g., DRAM, SRAM, EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and optical disks, e.g., CD, DVD, HD-DVD, and Blu-ray disks.
  • the processor and the memory can be supplemented by and/or incorporated in special purpose logic circuitry.
  • the above described techniques can be implemented on a computer in communication with a display device, e.g., a CRT (cathode ray tube), plasma, or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse, a trackball, a touchpad, or a motion sensor, by which the user can provide input to the computer (e.g., interact with a user interface element).
  • feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, and/or tactile input.
  • the above described techniques can be implemented in a distributed computing system that includes a back-end component.
  • the back-end component can, for example, be a data server, a middleware component, and/or an application server.
  • the above described techniques can be implemented in a distributed computing system that includes a front-end component.
  • the front-end component can, for example, be a client computer having a graphical user interface, a Web browser through which a user can interact with an example implementation, and/or other graphical user interfaces for a transmitting device.
  • the above described techniques can be implemented in a distributed computing system that includes any combination of such back-end, middleware, or front-end components.
  • Transmission medium can include any form or medium of digital or analog data communication (e.g., a communication network).
  • Transmission medium can include one or more packet-based networks and/or one or more circuit-based networks in any configuration.
  • Packet-based networks can include, for example, the Internet, a carrier internet protocol (IP) network (e.g., local area network (LAN), wide area network (WAN), campus area network (CAN), metropolitan area network (MAN), home area network (HAN)), a private IP network, an IP private branch exchange (IPBX), a wireless network (e.g., radio access network (RAN), Bluetooth, Wi-Fi, WiMAX, general packet radio service (GPRS) network, HiperLAN), and/or other packet-based networks.
  • Circuit-based networks can include, for example, the public switched telephone network (PSTN), a legacy private branch exchange (PBX), a wireless network (e.g., RAN, code-division multiple access (CDMA) network, time division multiple access (TDMA) network, global system for mobile communications (GSM) network), and/or other circuit-based networks.
  • Communication protocols can include, for example, Ethernet protocol, Internet Protocol (IP), Voice over IP (VOIP), a Peer-to-Peer (P2P) protocol, Hypertext Transfer Protocol (HTTP), Session Initiation Protocol (SIP), H.323, Media Gateway Control Protocol (MGCP), Signaling System #7 (SS7), a Global System for Mobile Communications (GSM) protocol, a Push-to-Talk (PTT) protocol, a PTT over Cellular (POC) protocol, Universal Mobile Telecommunications System (UMTS), 3GPP Long Term Evolution (LTE) and/or other communication protocols.
  • Devices of the computing system can include, for example, a computer, a computer with a browser device, a telephone, an IP phone, a mobile device (e.g., cellular phone, personal digital assistant (PDA) device, smart phone, tablet, laptop computer, electronic mail device), and/or other communication devices.
  • the browser device includes, for example, a computer (e.g., desktop computer and/or laptop computer) with a World Wide Web browser (e.g., Chrome™ from Google, Inc., Microsoft® Internet Explorer® available from Microsoft Corporation, and/or Mozilla® Firefox available from Mozilla Corporation).
  • Mobile computing devices include, for example, a Blackberry® from Research in Motion, an iPhone® from Apple Corporation, and/or an Android™-based device.
  • IP phones include, for example, a Cisco® Unified IP Phone 7985G and/or a Cisco® Unified Wireless Phone 7920 available from Cisco Systems, Inc.
  • Comprise, include, and/or plural forms of each are open ended and include the listed parts and can include additional parts that are not listed. And/or is open ended and includes one or more of the listed parts and combinations of the listed parts.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Described are methods and systems for generating a video stream of a scene including one or more objects. A sensor captures images of objects in a scene. A server coupled to the sensor, for each image, generates an initial 3D model for the objects and an initial 3D model of the scene. The server, for each image, captures pose information of the sensor as the sensor moves in relation to the scene or as the objects move in relation to the sensor. A viewing device receives the models and the pose information from the server. The viewing device captures pose information of the viewing device as the viewing device moves in relation to the scene. The viewing device renders a video stream on a display element using the received 3D models and at least one of the pose information of the sensor or the pose information of the viewing device.

Description

    RELATED APPLICATIONS
  • This application claims priority to U.S. Provisional Patent Application No. 62/405,372, filed on Oct. 7, 2016, the entirety of which is incorporated herein by reference.
  • TECHNICAL FIELD
  • The subject matter of this application relates generally to methods and apparatuses, including computer program products, for real-time remote collaboration and virtual presence using simultaneous localization and mapping (SLAM) to construct three-dimensional (3D) models and update a scene based upon sparse data, including two-dimensional (2D) and 3D video compression using SLAM for real-time remote interaction.
  • BACKGROUND
  • To visually experience a live event from a remote location, a video can be streamed to a viewer at the remote location, or the event can be recorded and streamed later to the viewer. However, because bandwidth is limited in most cases, some form of compression (either lossless or lossy) such as MPEG-4 is used to reduce the amount of data being transmitted over the network by a factor of 100 or more. This allows the transmission of the video to be practical over low-bandwidth wired networks or most wireless networks.
  • With the advent of virtual reality (VR) and associated viewing devices (such as VR headsets), there is an emerging interest in virtually experiencing live events remotely. But the amount of data required for transmission over a network may cause significant problems with the quality and efficiency of the viewing experience, because an example data size for a single 3D model could be in the tens of megabytes. As an example, at sixty frames per second, transmitting and processing the frames in sequence could result in gigabytes of data per second. Even with significant compression, such as not transmitting portions of the scene that do not change from frame to frame (similar to a video compression strategy), the process still results in tens of megabytes of data to be transmitted remotely, which makes it impractical, especially for wireless networks. Also, methods to further compress the data, such as traditional 3D compression to reduce the number of triangles, can significantly reduce visual quality.
  • SUMMARY
  • Therefore, what is needed are methods and systems for lossless (or slightly lossy) compression to transmit a live three-dimensional scene, which in some cases includes objects in the scene, to a remote location by segmenting the scene as a set of rigid and non-rigid photo-realistic 3D model objects and backgrounds (also called assets)—and then transmitting that data to the remote location once. Once this data transmission is accomplished, only the sparse pose information of the assets needs to be transmitted to the remote location. At the receiving device, a local computer graphics unit is used to render a replica of the remote scene while using a fraction of the bandwidth of traditional approaches. The bandwidth savings enables the application of these techniques to wireless networks. In the case of rapid scene changes, when new assets are presented, the system is capable of transmitting the new assets to the remote location and rendering them accordingly.
  • Simultaneous localization and mapping (SLAM) is a computer modeling technique that is used to map and track the real world as a 3D model. The methods and systems described herein utilize SLAM to compress in real time a live video stream of a remote scene for the purpose of viewing that scene from any location. Once re-rendered as a 3D model at the viewer's device, the live remote scene can then be viewed using a VR headset as if the viewer were at the remote location. It should be appreciated that, in one embodiment, the technology described herein can be used to capture a scene (including objects in the scene) of a first location as one or more 3D models, transfer the 3D model(s) in real time to a second location that is remote from the first location, and then render viewing images of the 3D model from a different viewing perspective using the pose of a viewing element (e.g., digital screen, camera, image viewer, or headset) at the second location. In some cases, the second location can be equipped with a VR headset or other similar hardware to view the 3D model of the first location from any viewing angle. Even when there are substantive changes in the scene at the first location, the systems and methods described herein advantageously transfer only the changing portions of the scene and/or objects in the scene to the second location.
  • Therefore, instead of traditional methods that involve streaming new 2D image frames from the first location to the second location, the methods and systems described herein transmit an entire 3D model of the scene—and objects in the scene—from the first location to the second location, and use a graphics processor in the viewer's device to render image(s) using the 3D model. It should be appreciated that the techniques described herein provide the advantage of ‘virtually’ copying the scene and objects at the first location, storing 3D models of the scene and objects in memory of the viewer's device at the second location, and then rendering the scene and objects in real time (e.g., as a video stream) from the ‘virtual’ scene.
  • Another advantage provided by the methods and systems described herein is that the image processing device at the first location needs only to transmit changes in the ‘position’ of the objects and the sensor location relative to the scene for each frame—and not the entire scene for each frame—to the viewer's device at the second location, in order for the viewer to move the objects and the sensor location in the virtual scene to replicate the same visual experience as if the remote viewer is at the first location. Because transmission of the changes in position and sensor location involves much less data than sending the entire scene, this technique advantageously provides for substantial compression of, e.g., a video stream transmitted from the first location to the second location.
  • Similarly, for moving, rigid objects in the scene at the first location, the systems and methods described herein advantageously transfer only the pose of the object to the viewer's device at the second location once the viewer's device has received the 3D model(s). For non-rigid objects such as people, once the viewer's device at the second location has received the full 3D model of a non-rigid object from the first location, subsequent transmissions need to include only the sparse feature information of the non-rigid object in order for the viewer's device at the second location to recreate the scene correctly.
  • For example, the sparse feature information can include feature points such as points associated with aspects of the person's body (e.g., head, feet, hands, arms). As the person moves in the scene, the sensor and server computing device need only capture and transmit the positional changes associated with these feature points to the viewing device—instead of the entire model—and the viewing device can update the 3D model at the remote location using the sparse feature information to track the person's movements through the scene.
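  • As a non-limiting illustration of this idea, the following sketch (in Python, with hypothetical message fields and function names not taken from the patent) shows the kind of per-frame sparse update a viewing device could apply to a previously received non-rigid model: only the tracked feature points travel over the network, and the stored model is re-posed around them locally.

      from dataclasses import dataclass
      from typing import Dict, Tuple

      Vec3 = Tuple[float, float, float]

      @dataclass
      class SparseUpdate:
          # Per-frame message for a non-rigid object: only the feature points
          # that were tracked this frame, not the full 3D model.
          object_id: str
          timestamp_ms: int
          feature_points: Dict[str, Vec3]   # e.g. {"head": (x, y, z), "left_hand": ...}

      def apply_sparse_update(control_points: Dict[str, Vec3],
                              update: SparseUpdate) -> Dict[str, Vec3]:
          # The viewing device moves only the named control points; the rest of
          # the stored model is deformed around them on the local device.
          merged = dict(control_points)
          merged.update(update.feature_points)
          return merged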
  • The invention, in one aspect, features a system for generating a video stream of a scene including one or more objects. The system comprises a sensor device that captures a plurality of images of one or more objects in a scene. The system further comprises a server computing device coupled to the sensor device that, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image. The server computing device, for each image, generates an initial 3D model of the scene using the image. The server computing device, for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene. The system further comprises a viewing device coupled to the server computing device. The viewing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device. The viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene. The viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
  • The invention, in another aspect, features a computerized method of generating a video stream of a scene including one or more objects. A sensor device captures a plurality of images of one or more objects in a scene. A server computing device coupled to the sensor device, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image. The server computing device, for each image, generates an initial 3D model of the scene using the image. The server computing device, for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene. A viewing device coupled to the server computing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device. The viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene. The viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
  • Any of the above aspects can include one or more of the following features. In some embodiments, the server computing device stores the initial 3D models of the one or more objects, the 3D model of the scene, and the pose information in a database. In some embodiments, the viewing device receives the at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and the pose information of the sensor device, from the server computing device during a real-time streaming session.
  • In some embodiments, the viewing device is a virtual reality (VR) headset. In some embodiments, the viewing device generates an updated 3D model of the one or more objects in the scene based upon at least one of (i) updated pose information received from the server computing device or (ii) updated pose information of the viewing perspective of the viewing device. In some embodiments, the viewing device receives an image from the server computing device and applies the image to at least one of the initial 3D model or the initial 3D models of the one or more objects to generate a photorealistic 3D model.
  • In some embodiments, the initial 3D model of the scene is generated using simultaneous localization and mapping (SLAM). In some embodiments, the initial 3D models of the one or more objects in the scene are generated using simultaneous localization and mapping (SLAM). In some embodiments, the server computing device determines one or more changes to at least one of the 3D models of the one or more objects or the 3D model of the scene based upon the pose information. In some embodiments, the one or more changes comprise data associated with one or more feature points on the at least one of the 3D models of the one or more objects or the 3D model of the scene. In some embodiments, the viewing device receives the one or more changes from the server computing device and the viewing device updates the initial 3D models based upon the one or more changes.
  • Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating the principles of the invention by way of example only.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The advantages of the invention described above, together with further advantages, may be better understood by referring to the following description taken in conjunction with the accompanying drawings. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention.
  • FIG. 1 is a block diagram of a system for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 2 is a detailed block diagram of specific software processing modules executing in an exemplary image processing module for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 3 is a flow diagram of a method of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 4 is a flow diagram of a method of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
  • FIG. 5 is an exemplary viewing device.
  • FIG. 6A depicts an exemplary object to be scanned by the sensor, and an exemplary 3D model of the object as generated by the system.
  • FIG. 6B is an exemplary user interface screen for display on the viewing device.
  • FIG. 7A is an exemplary 3D model prior to transmission to the viewing device.
  • FIG. 7B is an exemplary 3D model after being received by the viewing device.
  • FIG. 8A is an exemplary 3D scene prior to transmission to the viewing device.
  • FIG. 8B is an exemplary 3D scene after being received by the viewing device.
  • DETAILED DESCRIPTION
  • FIG. 1 is a block diagram of a system 100 for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction. Certain embodiments of the systems and methods described in this application utilize:
  • the real-time object recognition and modeling techniques as described in U.S. Pat. No. 9,715,761, titled “Real-Time 3D Computer Vision Processing Engine for Object Recognition, Reconstruction, and Analysis;”
    the dynamic 3D modeling techniques as described in U.S. patent application Ser. No. 14/849,172, titled “Real-Time Dynamic Three-Dimensional Adaptive Object Recognition and Model Reconstruction;”
    the shape-based registration and modeling techniques described in U.S. patent application Ser. No. 15/441,166, titled “Shape-Based Registration for Non-Rigid Objects with Large Holes;”
    the 3D photogrammetry techniques described in U.S. patent application Ser. No. 15/596,590, titled “3D Photogrammetry;” and
    the sparse SLAM techniques described in U.S. patent application Ser. No. 15/638,278, titled “Sparse Simultaneous Localization and Mapping with Unified Tracking.”
  • Each of the above-referenced patents and patent applications is incorporated by reference herein in its entirety. The methods and systems described in the above patents and patent applications, and in the present patent application, can be implemented using the Starry Night SDK, available from VanGogh Imaging, Inc. of McLean, Virginia.
  • The system 100 includes a sensor 103 coupled to a computing device 104. The computing device 104 includes an image processing module 106. In some embodiments, the computing device can also be coupled to a database 108 or other data storage device, e.g., used for storing certain 3D models, images, pose information, and other data as described herein. The system 100 also includes a communications network 110 coupled to the computing device 104, and a viewing device 112 communicably coupled to the network 110 in order to receive, e.g., 3D model data, image data, and other related data from the computing device 104 for the purposes described herein.
  • The sensor 103 is positioned to capture images of a scene 101, which includes one or more physical objects (e.g., objects 102 a-102 b). Exemplary sensors that can be used in the system 100 include, but are not limited to, real-time 3D depth sensors, digital cameras, combination 3D depth and RGB camera devices, and other types of devices that are capable of capturing depth information of the pixels along with the images of a real-world object and/or scene to collect data on its position, location, and appearance. In some embodiments, the sensor 103 is embedded into the computing device 104, such as a camera in a smartphone or a 3D VR capture device, for example. In some embodiments, the sensor 103 further includes an inertial measurement unit (IMU) to capture data points such as heading, linear acceleration, rotation, and the like.
  • The computing device 104 receives images (also called scans) of the scene 101 from the sensor 103 and processes the images to generate 3D models of objects (e.g., objects 102 a-102 b) represented in the scene 101. The computing device 104 can take on many forms, including both mobile and non-mobile forms. Exemplary computing devices include, but are not limited to, a laptop computer, a desktop computer, a tablet computer, a smart phone, an internet of things (IoT) device, augmented reality (AR)/virtual reality (VR) devices (e.g., glasses, headset apparatuses, and so forth), or the like. In some embodiments, the sensor 103 and computing device 104 can be embedded in a larger mobile structure such as a robot or unmanned aerial vehicle (UAV). It should be appreciated that other computing devices can be used without departing from the scope of the invention. The computing device 104 includes network-interface components to connect to a communications network (e.g., network 110). In some embodiments, the network-interface components include components to connect to a wireless network, such as a Wi-Fi or cellular network, in order to access a wider network, such as the Internet.
  • The computing device 104 includes an image processing module 106 configured to receive images captured by the sensor 103 and analyze the images in a variety of ways, including detecting the position and location of objects represented in the images and generating 3D models of objects in the images.
  • The image processing module 106 is a hardware and/or software module that resides on the computing device 104 to perform functions associated with analyzing images captured by the sensor 103, including the generation of 3D models (e.g., .OBJ files) based upon objects in the images. In some embodiments, the functionality of the image processing module 106 is distributed among a plurality of computing devices. In some embodiments, the image processing module 106 operates in conjunction with other modules that are either also located on the computing device 104 or on other computing devices coupled to the computing device 104. It should be appreciated that any number of computing devices, arranged in a variety of architectures, resources, and configurations (e.g., cluster computing, virtual computing, cloud computing) can be used without departing from the scope of the invention. An exemplary image processing module 106 is the Starry Night SDK, available from VanGogh Imaging, Inc. of McLean, Virginia.
  • It should be appreciated that in one embodiment, the image processing module 106 comprises specialized hardware (such as a processor or system-on-chip) that is embedded into, e.g., a circuit board or other similar component of another device. In this embodiment, the image processing module 106 is specifically programmed with the image processing and modeling software functionality described below.
  • FIG. 2 is a detailed block diagram 200 of specific software processing modules executing in an exemplary image processing module 106 at the computing device 104. As shown in FIG. 2, the image processing module 106 receives images and related data 202 as input from the sensor (e.g., the RGB sensor, the 3D depth sensor, and, optionally, the IMU). The modules 204-214 each provide specific image processing and 3D model generation capabilities to the SLAM module 216, which generates a dense map, dense tracking information, and a pose of the 3D models of the scene and objects in the scene. Also, as shown, the sparse tracking module 210 generates a sparse map and sparse tracking information for the 3D models of the scene and objects in the scene. The sparse tracking module 210 and the SLAM module 216 each send their respective tracking information, along with the pose from the SLAM module 216, to the unified tracking module 218, which integrates the received information into a final pose of the 3D model(s). The modules 210 and 216 also send their respective mapping information to the photogrammetry module 220, which performs functions such as texture refinement, hole-filling, and geometric correction. The updated 3D models from the photogrammetry module 220 are further processed by the shape-based registration module 222. As a result, the image processing module 106 provides the pose and photo-realistic 3D model as output 224 to, e.g., the viewing device 112. A schematic sketch of this data flow appears below.
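  • The following is a minimal, hypothetical sketch (in Python) of the FIG. 2 data flow described above. The stub functions merely stand in for the actual modules and show only how their outputs feed one another; none of the internals reflect the real implementations, which are described in the incorporated patents and applications.

      def slam_module(rgb, depth, imu=None):
          dense_map, dense_tracking, pose = {}, {}, [0.0] * 6
          return dense_map, dense_tracking, pose

      def sparse_tracking_module(rgb, depth):
          sparse_map, sparse_tracking = {}, {}
          return sparse_map, sparse_tracking

      def unified_tracking_module(dense_tracking, sparse_tracking, pose):
          return pose  # integrates dense and sparse tracks into a final pose

      def photogrammetry_module(dense_map, sparse_map):
          return {"mesh": [], "texture": None}  # texture refinement, hole-filling, geometric correction

      def shape_based_registration_module(model):
          return model

      def process_frame(rgb, depth, imu=None):
          dense_map, dense_tracking, pose = slam_module(rgb, depth, imu)
          sparse_map, sparse_tracking = sparse_tracking_module(rgb, depth)
          final_pose = unified_tracking_module(dense_tracking, sparse_tracking, pose)
          model = shape_based_registration_module(photogrammetry_module(dense_map, sparse_map))
          return final_pose, model  # corresponds to output 224 sent toward the viewing device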
  • Further details about the specific functionality and processing of each module described above with respect to FIG. 2 are provided in U.S. Pat. No. 9,715,761, titled “Real-Time 3D Computer Vision Processing Engine for Object Recognition, Reconstruction, and Analysis;” U.S. patent application Ser. No. 14/849,172, titled “Real-Time Dynamic Three-Dimensional Adaptive Object Recognition and Model Reconstruction;” U.S. patent application Ser. No. 15/441,166, titled “Shape-Based Registration for Non-Rigid Objects with Large Holes;” U.S. patent application Ser. No. 15/596,590, titled “3D Photogrammetry;” and U.S. patent application Ser. No. 15/638,278, titled “Sparse Simultaneous Localization and Mapping with Unified Tracking,” each of which is incorporated herein by reference.
  • The database 108 is coupled to the computing device 104, and operates to store data used by the image processing module 106 during its image analysis functions. The database 108 can be integrated with the server computing device 104 or be located on a separate computing device.
  • The communications network 110 may be a local network, such as a LAN, or a wide area network, such as the Internet and/or a cellular network. In some embodiments, the network 110 comprises several discrete networks and/or sub-networks (e.g., cellular to Internet) that enable the components of the system 100 to communicate with each other.
  • The viewing device 112 is a computing device that receives information such as image data, 3D model data, and other types of data described herein from the image processing module 106 of the server computing device 104 for rendering of the scene 101 and objects 102 a-102 b as captured by the sensor 103. As shown in FIG. 1, the viewing device 112 is positioned at a second location that is remote from the first location where the sensor 103 and computing device 104 are located. It should be appreciated that, in some embodiments, the first location and second location do not need to be separate physical or geographical locations. Exemplary viewing devices include, but are not limited to, laptop computers, desktop computers, tablets, smartphones, smart televisions, VR/AR hardware (e.g., glasses, headset), IoT devices, and the like.
  • The viewing device 112 includes, e.g., a CPU 114 and a GPU 116, which are specialized processors embedded in the viewing device 112 for the purpose of receiving 3D model data and pose data from the image processing module 106 via network 110, updating 3D model(s) stored at the viewing device 112 using the received information, and rendering real-time image data (e.g., a video stream) based upon the updated 3D model(s) to provide a viewing experience to a user of the viewing device 112 that is the same as the viewing experience captured by the sensor 103 and the computing device 104 at the first location.
  • FIG. 3 is a flow diagram of a method 300 of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1. The sensor 103 captures (302) one or more images of the scene 101, including one or more of the objects 102 a-102 b, and transmits the images to the image processing module 106 of the computing device 104. The image processing module 106 generates (304) 3D models of the scene 101 and objects 102 a-102 b using the images captured by the sensor 103. The 3D model generation is performed using dense and/or sparse SLAM (sometimes called fusion), which stitches incoming scans into a single 3D model of the scene or object. Exemplary processing for the image processing module 106 is described above with respect to FIG. 2.
  • The image processing module 106 also tracks (306) the pose of the sensor 103 relative to the scene 101 and the objects 102 a-102 b. For example, if the sensor 103 is non-stationary (i.e., the sensor moves in relation to the scene and/or objects), the image processing module 106 receives pose information relating to the sensor 103 and stores the pose information in correlation with the captured images.
  • Once the image processing module 106 has generated the 3D models and captured the relative pose information, the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device for rendering and viewing the images. In one action, the image processing module 106 stores (308) the generated 3D models and relative pose information in, e.g., database 108, for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below).
  • In another action, the image processing module 106 transmits (310) the generated 3D models of the objects 102 a-102 b and/or scene 101 to the viewing device 112 (e.g., via communications network 110) and, as the objects 102 a-102 b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information to the viewing device 112—such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103. To reduce the amount of data being transmitted to the viewing device, in some embodiments the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112. As explained above, this process advantageously acts as a compression technique because the amount of data is small but the viewing device 112 can still replicate the complete viewing experience. For example, the CPU 114 of the viewing device 112 receives the pose information and the changes in the 3D model(s) from the image processing module 106 via network 110. The CPU 114 updates the 3D model(s) stored at the viewing device 112 using the received changes and pose information, and transmits the updated 3D model(s) to the GPU 116 for rendering into, e.g., stereo pair images, single 2D images, or other similar outputs—for viewing by a user at the viewing device 112. FIG. 5 depicts an exemplary viewing device 112 for the transmitted output.
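  • A minimal sketch (in Python, with an illustrative JSON layout that is assumed rather than specified by the patent) of this change-only transmission: once the 3D models have been delivered, each frame carries only the sensor pose, if it changed, and any modified model regions.

      import json

      def encode_frame_update(prev_pose, curr_pose, changed_vertices, eps=1e-4):
          # Send only what changed since the last frame: the 6-DoF sensor pose
          # (x, y, z, roll, pitch, yaw) and a sparse map of modified vertices.
          update = {}
          if any(abs(a - b) > eps for a, b in zip(prev_pose, curr_pose)):
              update["sensor_pose"] = list(curr_pose)
          if changed_vertices:
              update["model_changes"] = changed_vertices  # vertex index -> new position
          return json.dumps(update).encode("utf-8")       # tens of bytes, not a full frame

      # Example: a frame in which only the sensor moved.
      payload = encode_frame_update([0, 0, 0, 0, 0, 0], [0.02, 0, 1.5, 0, 0.1, 0], {})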
  • In another action, the image processing module 106 transmits (312) the 3D models and relative pose information to the viewing device 112, and the viewing device 112 renders the 2D video streams using the relative sensor (or camera) pose information. In some embodiments, the image processing module 106 further transmits image data captured by the sensor 103 to the viewing device 112, and the viewing device 112 uses the image data to render, e.g., a photorealistic 3D model of the objects and/or scene captured by the sensor 103. Thus, the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112.
  • FIG. 4 is a flow diagram of a method 400 of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1. The sensor 103 captures (402) one or more images of the scene 101, including one or more of the objects 102 a-102 b, and transmits the images to the image processing module 106 of the computing device 104. The image processing module 106 generates (404) 3D models of the scene 101 and objects 102 a-102 b using the images captured by the sensor 103. In some embodiments, the 3D models generated by the image processing module 106 are photorealistic. The 3D model generation is performed using dense and/or sparse SLAM (sometimes called fusion), which stitches incoming scans into a single 3D model of a scene and/or objects in the scene. Exemplary processing for the image processing module 106 is described above with respect to FIG. 2. FIG. 6A depicts an exemplary object 102 a (e.g., a toy horse) to be scanned by the sensor, and an exemplary 3D model 602 generated by the system 100.
  • Once the image processing module 106 has generated the 3D models and captured the relative pose information, the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device (e.g., viewing device 112) for rendering and viewing the images. In one action, the image processing module 106 stores (406) the generated 3D models and relative pose information in, e.g., database 108, for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below).
  • In another action, the image processing module 106 transmits (408) the generated 3D models of the objects 102 a-102 b and/or scene 101 to the viewing device 112 (e.g., via communications network 110) and, as the objects 102 a-102 b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information as well as any texture changes to the viewing device 112—such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103. FIG. 6B depicts an exemplary user interface screen for display on the viewing device 112. As shown, the user interface screen includes options 604 for displaying the 3D models—including an option to Stream Object. To reduce the amount of data being transmitted to the viewing device, in some embodiments the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112. As explained above, this process advantageously acts as a compression technique because the amount of data is small but the viewing device 112 can still replicate the complete viewing experience. FIG. 7A depicts an exemplary 3D model prior to transmission to the viewing device 112, and FIG. 7B depicts an exemplary 3D model after being received by the viewing device 112—including the post-processing steps that generate a photorealistic reconstructed 3D model.
  • In another action, the image processing module 106 transmits (410) the 3D models and relative pose information to the viewing device 112, and the viewing device 112 renders viewing images of the 3D models using both the relative pose information of the viewing device 112 and the virtual copy of the remote scene and object(s). Thus, the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112.
  • It should be appreciated that, in the above-described 3D embodiments, the viewing device 112 is not tied to the original video capture perspective of the sensor 103 because the complete 3D model is recreated at the viewing device 112. Therefore, the viewing device (e.g., the CPU 114 and GPU 116) at the second location can manipulate the 3D model(s) locally in order to produce a perspective of the model(s) that is completely independent of what is being captured by the sensor 103 at the first location. For example, the viewing device 112 can be used to ‘walk around’ the virtual scene, which is a true 3D copy of the first location (such as via a VR viewing device).
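  • The sketch below (Python with NumPy; the pose convention and function names are assumptions for illustration) shows why the viewing perspective is decoupled from the capture perspective: given the locally stored model, the viewer builds a view transform from its own headset pose and transforms the model's vertices, which the GPU 116 would then rasterize for display.

      import numpy as np

      def view_from_pose(position, yaw_pitch_roll):
          # Build a 4x4 world-to-camera matrix from the headset pose so the local
          # 3D copy can be rendered from any viewpoint, independent of the sensor.
          yaw, pitch, roll = yaw_pitch_roll
          cz, sz = np.cos(yaw), np.sin(yaw)
          cy, sy = np.cos(pitch), np.sin(pitch)
          cx, sx = np.cos(roll), np.sin(roll)
          Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
          Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
          Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
          R = Rz @ Ry @ Rx                       # camera orientation in the world
          view = np.eye(4)
          view[:3, :3] = R.T
          view[:3, 3] = -R.T @ np.asarray(position, dtype=float)
          return view

      def transform_vertices(vertices, view):
          # Apply the view matrix to the stored model vertices (N x 3 array).
          v = np.hstack([vertices, np.ones((len(vertices), 1))])
          return (view @ v.T).T[:, :3]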
  • As can be appreciated, there is a wide variety of use cases that can take advantage of the systems and methods described herein:
  • In one example, the scene 101 may be static (e.g., inside a museum) and the sensor 103 may move in relation to the scene 101. The sensor 103 captures one or more images of the scene 101, and the image processing module 106 can easily generate a 3D model of the scene 101. Further, as long as the image processing module 106 captures the exact pose of the sensor 103 relative to the scene, a 2D image can be rendered from the 3D model. Therefore, the image processing module 106 can simply transmit the 3D model of the scene and the relative pose information to the viewing device 112 for rendering, e.g., as a video stream of the static scene. For example, if the sensor 103 captures a ten-minute video of a room as the sensor 103 moves around the room, the image processing module 106 only needs to generate a photorealistic 3D model of the room and the sequential sensor pose information as the sensor moves (e.g., pose one at time one, pose two at time two, etc.). The viewing device 112 can render a video of the room completely and accurately based on the 3D model and pose information. Hence, the resultant amount of information needed to replicate the video stream at the viewing device 112 is a fraction of the size that would be required to save the entire video stream and transmit the stream to the viewing device 112. The only cost is the conversion of the 3D model(s) into 2D images using the GPU 116 at the viewing device 112. In some embodiments, this 3D-model-to-2D-video conversion can be done, e.g., in the cloud or on another computing device. FIG. 8A depicts an exemplary 3D scene prior to transmission to the viewing device 112, and FIG. 8B depicts an exemplary 3D scene after being received by the viewing device 112—including the post-processing steps that generate a photorealistic reconstructed 3D scene.
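  • To make the bandwidth argument concrete, the back-of-the-envelope comparison below (Python) uses assumed, round numbers that are illustrative only and not figures from the patent: a ten-minute walkthrough sent as a conventional compressed 2D video stream versus a one-time model transfer plus a per-frame 6-DoF pose.

      fps, minutes = 30, 10
      frames = fps * minutes * 60

      video_bitrate_bps = 8_000_000                  # assumed ~8 Mbit/s HD stream
      video_bytes = video_bitrate_bps / 8 * minutes * 60

      model_bytes = 30_000_000                       # assumed one-time photorealistic room model
      pose_bytes = frames * (6 * 4 + 8)              # six 32-bit floats plus a timestamp per frame
      slam_total = model_bytes + pose_bytes

      print(f"2D video stream     : {video_bytes / 1e6:7.1f} MB")
      print(f"3D model + pose data: {slam_total / 1e6:7.1f} MB")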
  • In another example, the scene 101 may be static and include one or more static or moving objects 102 a-102 b along with a moving sensor 103. Similar to the static scene use case described previously, the image processing module 106 generates 3D models of the object(s) in the scene and captures the pose information of the sensor relative to the object(s). The image processing module 106 transmits the 3D models and the relative pose information to the viewing device 112, and the device 112 can recreate the scene 101 plus the exact locations of the objects 102 a-102 b within the scene to completely replicate the captured scene.
  • In another example, the scene 101 may include non-rigid, moving objects—such as people. In these cases, the image processing module 106 can generate a 3D model of the non-rigid object and, using non-rigid registration techniques, send ‘sparse’ information to the viewing device 112. This ‘sparse’ information can then be used by the viewing device 112 to reshape and render the 3D model of the object from the scene 101. For example, once the image processing module 106 generates a 3D model of a human face and transfers the 3D model to the viewing device 112, the image processing module 106 only needs to track a small number of feature points of the face and transfer those feature points to the viewing device 112 to enable recreation of a facial expression accurately, e.g., in a video on the viewing device 112. The amount of this ‘sparse’ information is a fraction of the dataset normally needed to send an entire new 3D model to the viewing device 112.
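  • As a toy stand-in (Python with NumPy) for the non-rigid reshaping described above, the sketch below deforms a stored face mesh so that it follows the small set of transmitted landmark positions; each landmark drags its nearby vertices along. It is illustrative only and does not reproduce the registration techniques of the incorporated applications.

      import numpy as np

      def reshape_from_landmarks(vertices, landmark_to_vertex, landmarks, radius=0.05):
          # vertices: (N, 3) stored face mesh; landmarks: landmark id -> new 3D position;
          # landmark_to_vertex: landmark id -> index of its anchor vertex in the mesh.
          out = vertices.astype(float).copy()
          for lm_id, target in landmarks.items():
              anchor = vertices[landmark_to_vertex[lm_id]]
              offset = np.asarray(target, dtype=float) - anchor
              dist = np.linalg.norm(vertices - anchor, axis=1)
              weight = np.clip(1.0 - dist / radius, 0.0, 1.0)[:, None]
              out += weight * offset                # nearby vertices follow the landmark
          return out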
  • In another example, the system 100 can capture and transmit a scene 101 as a changing 3D model that is viewable by the viewing device 112 (e.g., VR or AR glasses). In this example, the pose information of the sensor 103 is not required because having the 3D model available at the viewing device 112 allows for rendering of the viewing images from any angle. Therefore, the sensor 103 and image processing module 106 only need to capture and transmit to the viewing device 112 the object pose relative to the scene 101, as well as sparse feature points for the non-rigid objects along with texture change information. Using just this set of information, the viewing device 112 is capable of fully recreating the scene and objects captured by the sensor and image processing module for playback.
  • In another example, the system can capture changes in texture due to lighting conditions, as well as the location of the lighting source—especially if the location and brightness of the lighting source are generally stable. As a result, the system can render the correct texture without having to transmit the actual texture to the viewing device 112. Instead, the viewing device 112 can simply render the texture based on the 3D model and the location and brightness of the lighting source. In another example, some key texture information (such as high-resolution face texture(s)) can be sent separately and then added to the ‘virtual scene’ to provide more realistic texture information.
  • The following section provides exemplary use cases describing specific applications of the systems and methods described herein.
  • In a first use case, a sensor device at a first location (e.g., a live sporting event) captures images of objects in the scene (e.g., players, referees, game ball, etc.) and the scene itself (e.g., the playing field). The sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device. The server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the players, ball, and other objects in the scene and an initial 3D model of the scene. The server computing device then transmits the initial 3D models and the pose information to the remote viewing device (e.g., VR headset), which then generates a virtual representation of the scene (including the objects) as a video stream. At this point, the VR headset can capture pose information associated with the viewing perspective of the VR headset—which is independent from the pose information of the sensor device received from the server computing device. The headset can then utilize its own pose information to render a video stream from the perspective of the headset and, as the person wearing the headset moves around, the headset can render a different viewpoint of the scene and objects in the scene from that being captured by the sensor device. As such, the viewer at the remote location can traverse the scene and view the action from a perspective completely independent of the sensor that is viewing the scene locally, providing an immersive and unique experience for the viewer.
  • In another use case, the sensor device at the first location (e.g., a museum) captures images of the museum scene as well as artwork, sculptures, people, and other objects in the museum. The sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device. The server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the objects in the scene and an initial 3D model of the scene. For example, a sensor device can be used to capture a guided tour of the museum, viewing the various exhibits and wings. The server computing device then transmits the initial 3D models and the pose information to the remote viewing device (e.g., tablet, mobile phone), which then generates a virtual representation of the museum (including the artwork, people, etc.) using the 3D models and the pose information. Then, as the pose information changes (due to the sensor device moving through the museum), the server computing device can transmit the changing pose information to the remote viewing device, which automatically renders an updated representation of the scene and objects based upon the changing pose information, as part of a live and/or prerecorded video stream. In this way, the person viewing the stream can feel as though they are walking through the museum and seeing the exhibits and artwork in a first-person perspective.
  • It should be appreciated that the methods, systems, and techniques described herein are applicable to a wide variety of useful commercial and/or technical applications. Such applications can include, but are not limited to:
  • Augmented Reality/Virtual Reality, Robotics, Education, Part Inspection, E-Commerce, Social Media, Internet of Things—to capture, track, and interact with real-world objects from a scene for representation in a virtual environment, such as remote interaction with objects and/or scenes by a viewing device in another location, including any applications where there may be constraints on file size and transmission speed but a high-definition image is still capable of being rendered on the viewing device;
    Live Streaming—for example, in order to live stream a 3D scene such as a sports event, a concert, a live presentation, and the like, the techniques described herein can be used to immediately send out a sparse frame to the viewing device at the remote location. As the 3D model becomes more complete, the techniques provide for adding full texture. This is similar to video applications that display a low-resolution image first while the applications download a high-definition image. Furthermore, the techniques can leverage 3D model compression to further reduce the geometric complexity and provide a seamless streaming experience;
    Recording for Later ‘Replay’—the techniques can advantageously be used to store images and relative pose information (as described above) in order to replay the scene and objects at a later time. For example, the computing device can store 3D models, image data, pose data, and sparse feature point data associated with the sensor capturing, e.g., a video of the scene and objects in the scene. Then, the viewing device 112 can later receive this information and recreate the entire video using the models, images, pose data and feature point data.
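  • A minimal sketch (Python; the class and field names are illustrative, not a storage format defined by the patent) of this record-and-replay idea: the one-time assets are stored once, each frame contributes only a small pose/feature record, and replay walks the records through the viewer's normal rendering path.

      import time

      class SessionRecorder:
          def __init__(self):
              self.assets = {}        # object/scene 3D models, stored once
              self.frames = []        # sparse per-frame updates

          def add_asset(self, asset_id, model_bytes):
              self.assets[asset_id] = model_bytes

          def add_frame(self, sensor_pose, feature_points=None):
              self.frames.append({"t": time.time(),
                                  "sensor_pose": sensor_pose,
                                  "feature_points": feature_points or {}})

          def replay(self, render):
              # render() stands in for the viewing device's model-update-and-draw path.
              for frame in self.frames:
                  render(self.assets, frame["sensor_pose"], frame["feature_points"])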
  • The above-described techniques can be implemented in digital and/or analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The implementation can be as a computer program product, i.e., a computer program tangibly embodied in a machine-readable storage device, for execution by, or to control the operation of, a data processing apparatus, e.g., a programmable processor, a computer, and/or multiple computers. A computer program can be written in any form of computer or programming language, including source code, compiled code, interpreted code, and/or machine code, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one or more sites.
  • Method steps can be performed by one or more specialized processors executing a computer program to perform functions by operating on input data and/or generating output data. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry, e.g., a FPGA (field programmable gate array), a FPAA (field-programmable analog array), a CPLD (complex programmable logic device), a PSoC (Programmable System-on-Chip), ASIP (application-specific instruction-set processor), or an ASIC (application-specific integrated circuit), or the like. Subroutines can refer to portions of the stored computer program and/or the processor, and/or the special circuitry that implement one or more functions.
  • Processors suitable for the execution of a computer program include, by way of example, special purpose microprocessors. Generally, a processor receives instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and/or data. Memory devices, such as a cache, can be used to temporarily store data. Memory devices can also be used for long-term data storage. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. A computer can also be operatively coupled to a communications network in order to receive instructions and/or data from the network and/or to transfer instructions and/or data to the network. Computer-readable storage mediums suitable for embodying computer program instructions and data include all forms of volatile and non-volatile memory, including by way of example semiconductor memory devices, e.g., DRAM, SRAM, EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and optical disks, e.g., CD, DVD, HD-DVD, and Blu-ray disks. The processor and the memory can be supplemented by and/or incorporated in special purpose logic circuitry.
  • To provide for interaction with a user, the above described techniques can be implemented on a computer in communication with a display device, e.g., a CRT (cathode ray tube), plasma, or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse, a trackball, a touchpad, or a motion sensor, by which the user can provide input to the computer (e.g., interact with a user interface element). Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, and/or tactile input.
  • The above described techniques can be implemented in a distributed computing system that includes a back-end component. The back-end component can, for example, be a data server, a middleware component, and/or an application server. The above described techniques can be implemented in a distributed computing system that includes a front-end component. The front-end component can, for example, be a client computer having a graphical user interface, a Web browser through which a user can interact with an example implementation, and/or other graphical user interfaces for a transmitting device. The above described techniques can be implemented in a distributed computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the computing system can be interconnected by transmission medium, which can include any form or medium of digital or analog data communication (e.g., a communication network). Transmission medium can include one or more packet-based networks and/or one or more circuit-based networks in any configuration. Packet-based networks can include, for example, the Internet, a carrier internet protocol (IP) network (e.g., local area network (LAN), wide area network (WAN), campus area network (CAN), metropolitan area network (MAN), home area network (HAN)), a private IP network, an IP private branch exchange (IPBX), a wireless network (e.g., radio access network (RAN), Bluetooth, Wi-Fi, WiMAX, general packet radio service (GPRS) network, HiperLAN), and/or other packet-based networks. Circuit-based networks can include, for example, the public switched telephone network (PSTN), a legacy private branch exchange (PBX), a wireless network (e.g., RAN, code-division multiple access (CDMA) network, time division multiple access (TDMA) network, global system for mobile communications (GSM) network), and/or other circuit-based networks.
  • Information transfer over transmission medium can be based on one or more communication protocols. Communication protocols can include, for example, Ethernet protocol, Internet Protocol (IP), Voice over IP (VOIP), a Peer-to-Peer (P2P) protocol, Hypertext Transfer Protocol (HTTP), Session Initiation Protocol (SIP), H.323, Media Gateway Control Protocol (MGCP), Signaling System #7 (SS7), a Global System for Mobile Communications (GSM) protocol, a Push-to-Talk (PTT) protocol, a PTT over Cellular (POC) protocol, Universal Mobile Telecommunications System (UMTS), 3GPP Long Term Evolution (LTE) and/or other communication protocols.
  • Devices of the computing system can include, for example, a computer, a computer with a browser device, a telephone, an IP phone, a mobile device (e.g., cellular phone, personal digital assistant (PDA) device, smart phone, tablet, laptop computer, electronic mail device), and/or other communication devices. The browser device includes, for example, a computer (e.g., desktop computer and/or laptop computer) with a World Wide Web browser (e.g., Chrome™ from Google, Inc., Microsoft® Internet Explorer® available from Microsoft Corporation, and/or Mozilla® Firefox available from Mozilla Corporation). Mobile computing devices include, for example, a Blackberry® from Research in Motion, an iPhone® from Apple Corporation, and/or an Android™-based device. IP phones include, for example, a Cisco® Unified IP Phone 7985G and/or a Cisco® Unified Wireless Phone 7920 available from Cisco Systems, Inc.
  • Comprise, include, and/or plural forms of each are open ended and include the listed parts and can include additional parts that are not listed. And/or is open ended and includes one or more of the listed parts and combinations of the listed parts.
  • One skilled in the art will realize the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the invention described herein.

Claims (22)

1. A system for generating a video stream of a scene including one or more objects, the system comprising:
a sensor device that captures a plurality of images of one or more objects in a scene;
a server computing device coupled to the sensor device that, for each image:
generates an initial 3D model for each of the one or more objects in the scene using the image;
generates an initial 3D model of the scene using the image;
captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene (i) as the sensor device moves in relation to the scene or (ii) as the one or more objects in the scene move in relation to the sensor device; and
a viewing device coupled to the server computing device that:
receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene, or (ii) the pose information of the sensor device, from the server computing device;
captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene; and
renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
2. The system of claim 1, wherein the server computing device stores the initial 3D models of the one or more objects, the 3D model of the scene, and the pose information in a database.
3. The system of claim 1, wherein the viewing device receives the at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and the pose information of the sensor device, from the server computing device, during a real-time streaming session.
4. The system of claim 1, wherein the viewing device is a virtual reality (VR) headset.
5. The system of claim 1, wherein the viewing device generates an updated 3D model of the one or more objects in the scene based upon at least one of (i) updated pose information received from the server computing device or (ii) updated pose information of the viewing perspective of the viewing device.
6. The system of claim 1, wherein the viewing device receives an image from the server computing device and applies the image to at least one of the initial 3D model or the initial 3D models of the one or more objects to generate a photorealistic 3D model.
7. The system of claim 1, wherein the initial 3D model of the scene is generated using simultaneous localization and mapping (SLAM).
8. The system of claim 1, wherein the initial 3D models of the one or more objects in the scene are generated using simultaneous localization and mapping (SLAM).
9. The system of claim 1, wherein the server computing device determines one or more changes to at least one of the 3D models of the one or more objects or the 3D model of the scene based upon the pose information.
10. The system of claim 9, wherein the one or more changes comprise data associated with one or more feature points on the at least one of the 3D models of the one or more objects or the 3D model of the scene.
11. The system of claim 10, wherein the viewing device receives the one or more changes from the server computing device and the viewing device updates the initial 3D models based upon the one or more changes.
12. A computerized method of generating a video stream of a scene including one or more objects, the method comprising:
capturing, by a sensor device, a plurality of images of one or more objects in a scene;
for each image:
generating, by a server computing device coupled to the sensor device, an initial 3D model for each of the one or more objects in the scene using the image;
generating, by the server computing device, an initial 3D model of the scene using the image;
capturing, by the server computing device, pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene; and
receiving, by a viewing device coupled to the server computing device, (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device;
capturing, by the viewing device, pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene; and
rendering, by the viewing device, a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
13. The method of claim 12, wherein the server computing device stores the initial 3D models of the one or more objects, the 3D model of the scene, and the pose information in a database.
14. The method of claim 12, wherein the viewing device receives the at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and the pose information of the sensor device, from the server computing device during a real-time streaming session.
15. The method of claim 12, wherein the viewing device is a virtual reality (VR) headset.
16. The method of claim 12, wherein the viewing device generates an updated 3D model of the one or more objects in the scene based upon at least one of (i) updated pose information received from the server computing device or (ii) updated pose information of the viewing perspective of the viewing device.
17. The method of claim 12, wherein the viewing device receives an image from the server computing device and applies the image to at least one of the initial 3D model or the initial 3D models of the one or more objects to generate a photorealistic 3D model.
18. The method of claim 12, wherein the initial 3D model of the scene is generated using simultaneous localization and mapping (SLAM).
19. The method of claim 12, wherein the initial 3D models of the one or more objects in the scene are generated using simultaneous localization and mapping (SLAM).
20. The method of claim 12, further comprising determining, by the server computing device, one or more changes to at least one of the 3D models of the one or more objects or the 3D model of the scene based upon the pose information.
21. The method of claim 20, wherein the one or more changes comprise data associated with one or more feature points on the at least one of the 3D models of the one or more objects or the 3D model of the scene.
22. The method of claim 21, wherein the viewing device receives the one or more changes from the server computing device and the viewing device updates the initial 3D models based upon the one or more changes.
US15/726,316 2016-10-07 2017-10-05 Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data Active US10380762B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/726,316 US10380762B2 (en) 2016-10-07 2017-10-05 Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662405372P 2016-10-07 2016-10-07
US15/726,316 US10380762B2 (en) 2016-10-07 2017-10-05 Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data

Publications (2)

Publication Number Publication Date
US20180101966A1 true US20180101966A1 (en) 2018-04-12
US10380762B2 US10380762B2 (en) 2019-08-13

Family

ID=61829752

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/726,316 Active US10380762B2 (en) 2016-10-07 2017-10-05 Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data

Country Status (1)

Country Link
US (1) US10380762B2 (en)

Family Cites Families (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6525722B1 (en) 1995-08-04 2003-02-25 Sun Microsystems, Inc. Geometry compression for regular and irregular mesh structures
US6275235B1 (en) 1998-12-21 2001-08-14 Silicon Graphics, Inc. High precision texture wrapping method and device
US6259815B1 (en) 1999-03-04 2001-07-10 Mitsubishi Electric Research Laboratories, Inc. System and method for recognizing scanned objects with deformable volumetric templates
US6525725B1 (en) 2000-03-15 2003-02-25 Sun Microsystems, Inc. Morphing decompression in a graphics system
US20040104935A1 (en) * 2001-01-26 2004-06-03 Todd Williamson Virtual reality immersion system
US7248257B2 (en) 2001-02-14 2007-07-24 Technion Research & Development Foundation Ltd. Low bandwidth transmission of 3D graphical data
GB0126526D0 (en) 2001-11-05 2002-01-02 Canon Europa Nv Three-dimensional computer modelling
WO2004003850A1 (en) 2002-06-28 2004-01-08 Fujitsu Limited Three-dimensional image comparing program, three-dimensional image comparing method, and three-dimensional image comparing device
US7317456B1 (en) 2002-12-02 2008-01-08 Ngrain (Canada) Corporation Method and apparatus for transforming point cloud data to volumetric data
JP2005353047A (en) 2004-05-13 2005-12-22 Sanyo Electric Co Ltd Three-dimensional image processing method and three-dimensional image processor
US7657081B2 (en) 2004-09-03 2010-02-02 National Research Council Of Canada Recursive 3D model optimization
WO2006027339A2 (en) 2004-09-06 2006-03-16 The European Community, Represented By The European Commission Method and system for 3d scene change detection
US7602398B2 (en) 2005-01-28 2009-10-13 Microsoft Corporation Decorating surfaces with textures
JP4871352B2 (en) 2005-03-11 2012-02-08 クリアフォーム インク. Automatic reference system and apparatus for 3D scanning
US8625854B2 (en) 2005-09-09 2014-01-07 Industrial Research Limited 3D scene scanner and a position and orientation system
WO2007038330A2 (en) 2005-09-22 2007-04-05 3M Innovative Properties Company Artifact mitigation in three-dimensional imaging
US8194074B2 (en) 2006-05-04 2012-06-05 Brown Battle M Systems and methods for photogrammetric rendering
US8139067B2 (en) 2006-07-25 2012-03-20 The Board Of Trustees Of The Leland Stanford Junior University Shape completion, animation and marker-less motion capture of people, animals or characters
US8090194B2 (en) 2006-11-21 2012-01-03 Mantis Vision Ltd. 3D geometric modeling and motion capture using both single and dual imaging
US8290305B2 (en) 2009-02-13 2012-10-16 Harris Corporation Registration of 3D point cloud data to 2D electro-optical image data
WO2010129363A2 (en) 2009-04-28 2010-11-11 The Regents Of The University Of California Markerless geometric registration of multiple projectors on extruded surfaces using an uncalibrated camera
US8542252B2 (en) 2009-05-29 2013-09-24 Microsoft Corporation Target digitization, extraction, and tracking
KR101619076B1 (en) 2009-08-25 2016-05-10 삼성전자 주식회사 Method of detecting and tracking moving object for mobile platform
KR101697184B1 (en) 2010-04-20 2017-01-17 삼성전자주식회사 Apparatus and Method for generating mesh, and apparatus and method for processing image
KR101054736B1 (en) 2010-05-04 2011-08-05 성균관대학교산학협력단 Method for 3d object recognition and pose estimation
US8437506B2 (en) 2010-09-07 2013-05-07 Microsoft Corporation System for fast, probabilistic skeletal tracking
US8676623B2 (en) 2010-11-18 2014-03-18 Navteq B.V. Building directory aided navigation
US8587583B2 (en) 2011-01-31 2013-11-19 Microsoft Corporation Three-dimensional environment reconstruction
US20170054954A1 (en) 2011-04-04 2017-02-23 EXTEND3D GmbH System and method for visually displaying information on real objects
DE102011015987A1 (en) 2011-04-04 2012-10-04 EXTEND3D GmbH System and method for visual presentation of information on real objects
US9053571B2 (en) 2011-06-06 2015-06-09 Microsoft Corporation Generating computer models of 3D objects
US9520072B2 (en) 2011-09-21 2016-12-13 University Of South Florida Systems and methods for projecting images onto an object
CA3041707C (en) 2011-11-15 2021-04-06 Manickam UMASUTHAN Method of real-time tracking of moving/flexible surfaces
US8908913B2 (en) 2011-12-19 2014-12-09 Mitsubishi Electric Research Laboratories, Inc. Voting-based pose estimation for 3D sensors
US8766979B2 (en) 2012-01-20 2014-07-01 Vangogh Imaging, Inc. Three dimensional data compression
US8682049B2 (en) 2012-02-14 2014-03-25 Terarecon, Inc. Cloud-based medical image processing system with access control
US9041711B1 (en) 2012-05-08 2015-05-26 Google Inc. Generating reduced resolution textured model from higher resolution model
US10127722B2 (en) * 2015-06-30 2018-11-13 Matterport, Inc. Mobile capture visualization incorporating three-dimensional and two-dimensional imagery
WO2014052824A1 (en) 2012-09-27 2014-04-03 Vangogh Imaging Inc. 3d vision processing
US9898848B2 (en) 2012-10-05 2018-02-20 Max-Planck-Gesellschaft Zur Foerderung Der Wissenschaften E. V. Co-registration—simultaneous alignment and modeling of articulated 3D shapes
US9058693B2 (en) * 2012-12-21 2015-06-16 Dassault Systemes Americas Corp. Location correction of virtual objects
US9251590B2 (en) 2013-01-24 2016-02-02 Microsoft Technology Licensing, Llc Camera pose estimation for 3D reconstruction
US9940553B2 (en) 2013-02-22 2018-04-10 Microsoft Technology Licensing, Llc Camera/object pose from predicted coordinates
US9269003B2 (en) 2013-04-30 2016-02-23 Qualcomm Incorporated Diminished and mediated reality effects from reconstruction
US9171402B1 (en) 2013-06-19 2015-10-27 Google Inc. View-dependent textures for interactive geographic information system
US9715761B2 (en) 2013-07-08 2017-07-25 Vangogh Imaging, Inc. Real-time 3D computer vision processing engine for object recognition, reconstruction, and analysis
US20170278293A1 (en) 2013-07-18 2017-09-28 Google Inc. Processing a Texture Atlas Using Manifold Neighbors
EP2874118B1 (en) 2013-11-18 2017-08-02 Dassault Systèmes Computing camera parameters
US9613388B2 (en) 2014-01-24 2017-04-04 Here Global B.V. Methods, apparatuses and computer program products for three dimensional segmentation and textured modeling of photogrammetry surface meshes
KR102211592B1 (en) 2014-03-19 2021-02-04 삼성전자주식회사 Electronic device for processing image and method thereof
US9299195B2 (en) 2014-03-25 2016-03-29 Cisco Technology, Inc. Scanning and tracking dynamic objects with depth cameras
US20150325044A1 (en) 2014-05-09 2015-11-12 Adornably, Inc. Systems and methods for three-dimensional model texturing
US10055876B2 (en) 2014-06-06 2018-08-21 Matterport, Inc. Optimal texture memory allocation
US20150371440A1 (en) 2014-06-19 2015-12-24 Qualcomm Incorporated Zero-baseline 3d map initialization
EP3192057A4 (en) 2014-09-10 2018-03-21 Vangogh Imaging Inc. Real-time dynamic three-dimensional adaptive object recognition and model reconstruction
US9607388B2 (en) 2014-09-19 2017-03-28 Qualcomm Incorporated System and method of pose estimation
US9710960B2 (en) 2014-12-04 2017-07-18 Vangogh Imaging, Inc. Closed-form 3D model generation of non-rigid complex objects from incomplete and noisy scans
EP3032495B1 (en) 2014-12-10 2019-11-13 Dassault Systèmes Texturing a 3d modeled object
US9769443B2 (en) 2014-12-11 2017-09-19 Texas Instruments Incorporated Camera-assisted two dimensional keystone correction
US10347031B2 (en) 2015-03-09 2019-07-09 Carestream Dental Technology Topco Limited Apparatus and method of texture mapping for dental 3D scanner
US20160358382A1 (en) 2015-06-04 2016-12-08 Vangogh Imaging, Inc. Augmented Reality Using 3D Depth Sensor and 3D Projection
US10169917B2 (en) 2015-08-20 2019-01-01 Microsoft Technology Licensing, Llc Augmented reality
US10249087B2 (en) 2016-01-29 2019-04-02 Magic Leap, Inc. Orthogonal-projection-based texture atlas packing of three-dimensional meshes
US10169676B2 (en) 2016-02-24 2019-01-01 Vangogh Imaging, Inc. Shape-based registration for non-rigid objects with large holes
US9922443B2 (en) 2016-04-29 2018-03-20 Adobe Systems Incorporated Texturing a three-dimensional scanned model with localized patch colors
US10192347B2 (en) 2016-05-17 2019-01-29 Vangogh Imaging, Inc. 3D photogrammetry
US20180005015A1 (en) 2016-07-01 2018-01-04 Vangogh Imaging, Inc. Sparse simultaneous localization and matching with unified tracking
US10573018B2 (en) * 2016-07-13 2020-02-25 Intel Corporation Three dimensional scene reconstruction based on contextual analysis
US20180114363A1 (en) 2016-10-25 2018-04-26 Microsoft Technology Licensing, Llc Augmented scanning of 3d models

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040010493A1 (en) * 1997-11-19 2004-01-15 Ns Solutions Corporation Database system and a method of data retrieval from the system
US20140017653A1 (en) * 2012-07-10 2014-01-16 Gordon W. Romney Apparatus, system, and method for a virtual instruction cloud
US20180001885A1 (en) * 2016-06-29 2018-01-04 Ford Global Technologies, Llc Method and system for torque control
US20180014454A1 (en) * 2016-07-12 2018-01-18 Yetter Manufacturing Company Seed furrow closing wheel

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11868675B2 (en) 2015-10-08 2024-01-09 Interdigital Vc Holdings, Inc. Methods and systems of automatic calibration for dynamic display configurations
US20180122142A1 (en) * 2016-10-31 2018-05-03 Verizon Patent And Licensing Inc. Methods and Systems for Dynamically Customizing a Scene for Presentation to a User
US10388072B2 (en) * 2016-10-31 2019-08-20 Verizon Patent And Licensing Inc. Methods and systems for dynamically customizing a scene for presentation to a user
US10839610B2 (en) 2016-10-31 2020-11-17 Verizon Patent And Licensing Inc. Methods and systems for customizing a scene for presentation to a user
US10643444B2 (en) * 2016-11-28 2020-05-05 Korea Institute Of Civil Engineering And Building Technology Facility management system using Internet of things (IoT) based sensor and unmanned aerial vehicle (UAV), and method for the same
US20180151045A1 (en) * 2016-11-28 2018-05-31 Korea Institute Of Civil Engineering And Building Technology Facility management system using internet of things (iot) based sensor and unmanned aerial vehicle (uav), and method for the same
US10269116B2 (en) * 2016-12-26 2019-04-23 Intel Corporation Proprioception training method and apparatus
US11127212B1 (en) * 2017-08-24 2021-09-21 Sean Asher Wilens Method of projecting virtual reality imagery for augmenting real world objects and surfaces
US11107270B2 (en) * 2017-11-08 2021-08-31 Siemens Healthcare Gmbh Medical scene model
US20190139300A1 (en) * 2017-11-08 2019-05-09 Siemens Healthcare Gmbh Medical scene model
CN108650523A (en) * 2018-05-22 2018-10-12 广州虎牙信息科技有限公司 Live streaming room display and virtual object selection method, server, terminal and medium
US11159798B2 (en) * 2018-08-21 2021-10-26 International Business Machines Corporation Video compression using cognitive semantics object analysis
US11741673B2 (en) 2018-11-30 2023-08-29 Interdigital Madison Patent Holdings, Sas Method for mirroring 3D objects to light field displays
CN111340922A (en) * 2018-12-18 2020-06-26 北京三星通信技术研究有限公司 Positioning and mapping method and electronic equipment
US20220076402A1 (en) * 2019-04-05 2022-03-10 Waymo Llc High bandwidth camera data transmission
CN110503631A (en) * 2019-07-24 2019-11-26 山东师范大学 Remote sensing image change detection method
CN111641841A (en) * 2020-05-29 2020-09-08 广州华多网络科技有限公司 Virtual trampoline activity data exchange method, device, medium and electronic equipment
CN111862163A (en) * 2020-08-03 2020-10-30 湖北亿咖通科技有限公司 Trajectory optimization method and device
CN112017242A (en) * 2020-08-21 2020-12-01 北京市商汤科技开发有限公司 Display method and device, equipment and storage medium
CN112423014A (en) * 2020-11-19 2021-02-26 上海电气集团股份有限公司 Remote review method and device
US20220182596A1 (en) * 2020-12-03 2022-06-09 Samsung Electronics Co., Ltd. Method of providing adaptive augmented reality streaming and apparatus performing the method
US11758107B2 (en) * 2020-12-03 2023-09-12 Samsung Electronics Co., Ltd. Method of providing adaptive augmented reality streaming and apparatus performing the method
WO2024100028A1 (en) * 2022-11-08 2024-05-16 Nokia Technologies Oy Signalling for real-time 3d model generation

Also Published As

Publication number Publication date
US10380762B2 (en) 2019-08-13

Similar Documents

Publication Publication Date Title
US10380762B2 (en) Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data
US10839585B2 (en) 4D hologram: real-time remote avatar creation and animation control
US10586395B2 (en) Remote object detection and local tracking using visual odometry
US20160358382A1 (en) Augmented Reality Using 3D Depth Sensor and 3D Projection
US11861797B2 (en) Method and apparatus for transmitting 3D XR media data
CN109471842B (en) Image file format, image file generating method, image file generating device and application
CN110663257B (en) Method and system for providing virtual reality content using 2D captured images of a scene
US11170552B2 (en) Remote visualization of three-dimensional (3D) animation with synchronized voice in real-time
US20190073825A1 (en) Enhancing depth sensor-based 3d geometry reconstruction with photogrammetry
WO2013165440A1 (en) 3d reconstruction of human subject using a mobile device
US20220385721A1 (en) 3d mesh generation on a server
US11620779B2 (en) Remote visualization of real-time three-dimensional (3D) facial animation with synchronized voice
KR102141319B1 (en) Super-resolution method for multi-view 360-degree image and image processing apparatus
Han Mobile immersive computing: Research challenges and the road ahead
US11908068B2 (en) Augmented reality methods and systems
US11335063B2 (en) Multiple maps for 3D object scanning and reconstruction
US20190304161A1 (en) Dynamic real-time texture alignment for 3d models
US20130127994A1 (en) Video compression using virtual skeleton
CN112308977A (en) Video processing method, video processing apparatus, and storage medium
CN110433491A (en) Motion synchronization response method, system, device and storage medium for virtual spectators
US20240062467A1 (en) Distributed generation of virtual content
Bortolon et al. Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles
EP3899870A1 (en) Cloud-based camera calibration
Eisert et al. Volumetric video–acquisition, interaction, streaming and rendering
US20230022344A1 (en) System and method for dynamic images virtualisation

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS Assignment

Owner name: VANGOGH IMAGING, INC., VIRGINIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, KEN;JAHIR, YASMIN;HOU, XIN;REEL/FRAME:044433/0421

Effective date: 20171208

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4