US20180101966A1 - Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3d model and update a scene based on sparse data - Google Patents
- Publication number
- US20180101966A1
- Authority
- US
- United States
- Legal status
- Granted
Classifications
- G06T7/75 — Determining position or orientation of objects or cameras using feature-based methods involving models
- G06T15/20 — Perspective computation
- G06T7/579 — Depth or shape recovery from multiple images from motion
- G06T9/001 — Model-based coding, e.g. wire frame
- H04N21/23418 — Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44012 — Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
- H04N21/6587 — Control parameters, e.g. trick play commands, viewpoint selection
- H04N21/816 — Monomedia components involving special video data, e.g. 3D video
- G06T2207/10024 — Color image
- G06T2207/10028 — Range image; Depth image; 3D point clouds
- G06T2207/30244 — Camera pose
- G06T2219/024 — Multi-user, collaborative environment
- G06T3/067 — Reshaping or unfolding 3D tree structures onto 2D planes
Definitions
- the subject matter of this application relates generally to methods and apparatuses, including computer program products, for real-time remote collaboration and virtual presence using simultaneous localization and mapping (SLAM) to construct three-dimensional (3D) models and updating a scene based upon sparse data, including two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- a video can be streamed to a viewer at the remote location, or the event can be recorded and streamed later to the viewer.
- some form of compression such as MPEG-4 is used to reduce the amount of data being transmitted over the network by a factor of 100 or more. This allows the transmission of the video to be practical over low-bandwidth wired networks or most wireless networks.
- Simultaneous localization and mapping is a computer modeling technique that is used to map and track the real world as a 3D model.
- the methods and systems described herein utilize SLAM to compress in real time a live video stream of a remote scene for the purpose of viewing that scene from any location. Once re-rendered as a 3D model at the viewer's device, the live remote scene can then be viewed using a VR headset as if the viewer is at the remote location.
- the technology described herein can be used to capture a scene (including objects in the scene) of a first location as one or more 3D models, transfer the 3D model(s) in real time to a second location that is remote from the first location, and then render viewing images of the 3D model from a different viewing perspective using the pose of a viewing element (e.g., digital screen, camera, image viewer, or headset) at the second location.
- the second location can be equipped with a VR headset or other similar hardware to view the 3D model of the first location from any viewing angle.
- the systems and methods described herein advantageously transfer only the changing portions of the scene and/or objects in the scene to the second location.
- the methods and systems described herein transmit an entire 3D model of the scene—and objects in the scene—from the first location to the second location, and use a graphics processor in the viewer's device to render image(s) using the 3D model. It should be appreciated that the techniques described herein provide the advantage of ‘virtually’ copying the scene and objects at the first location, storing 3D models of the scene and objects in memory of the viewer's device at the second location, and then rendering the scene and objects in real time (e.g., as a video stream) from the ‘virtual’ scene.
- Another advantage provided by the methods and systems described herein is that the image processing device at the first location needs only to transmit changes in the ‘position’ of the objects and the sensor location relative to the scene for each frame—and not the entire scene for each frame—to the viewer's device at the second location, in order for the viewer to move the objects and the sensor location in the virtual scene to replicate the same visual experience as if the remote viewer is at the first location. Because transmission of the changes in position and sensor location involves much less data than sending the entire scene, this technique advantageously provides for substantial compression of, e.g., a video stream transmitted from the first location to the second location.
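The compression claim above can be made concrete with a back-of-the-envelope sketch. The wire format below (a per-frame 6-DoF pose delta packed as six 32-bit floats) is purely illustrative and not taken from the patent:

```python
import struct

# Hypothetical wire format: one frame's pose change as six 32-bit floats
# (translation deltas x, y, z plus orientation deltas roll, pitch, yaw).
def pack_pose_delta(dx, dy, dz, droll, dpitch, dyaw):
    return struct.pack("<6f", dx, dy, dz, droll, dpitch, dyaw)

delta = pack_pose_delta(0.01, 0.0, -0.02, 0.0, 0.001, 0.0)
pose_bytes = len(delta)            # 24 bytes per frame

# Raw size of one uncompressed 640x480 RGB frame, for comparison:
frame_bytes = 640 * 480 * 3        # 921,600 bytes per frame

compression_ratio = frame_bytes / pose_bytes
```

Even measured against an MPEG-4 stream that is already compressed by a factor of 100 or more, sending only pose and position changes leaves orders of magnitude to spare, which is the basis of the "substantial compression" advantage described above.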
- the systems and methods described herein advantageously transfer only the pose of the object to the viewer's device at the second location once the viewer's device has received the 3D model(s).
- for non-rigid objects, such as people, subsequent transmissions need to include only the sparse feature information of the non-rigid object in order for the viewer's device at the second location to recreate the scene correctly.
- the sparse feature information can include feature points such as points associated with aspects of the person's body (e.g., head, feet, hands, arms).
- the sensor and server computing device need only capture and transmit the positional changes associated with these feature points to the viewing device—instead of the entire model—and the viewing device can update the 3D model at the remote location using the sparse feature information to track the person's movements through the scene.
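A minimal sketch of that update path, with hypothetical feature-point names and a flat dictionary standing in for the viewer's stored 3D model:

```python
# Sparse update sketch: only the named feature points move; the rest of
# the previously transmitted model (mesh, texture) is left untouched.
# All names and structures here are illustrative, not from the patent.
def apply_sparse_update(feature_points, deltas):
    for name, (dx, dy, dz) in deltas.items():
        x, y, z = feature_points[name]
        feature_points[name] = (x + dx, y + dy, z + dz)
    return feature_points

model = {"head": (0.0, 1.7, 0.0), "left_hand": (-0.4, 1.0, 0.1)}
# One frame's payload: the hand moved, everything else stayed put.
model = apply_sparse_update(model, {"left_hand": (0.05, 0.1, 0.0)})
```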
- the invention in one aspect, features a system for generating a video stream of a scene including one or more objects.
- the system comprises a sensor device that captures a plurality of images of one or more objects in a scene.
- the system further comprises a server computing device coupled to the sensor device that, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image.
- the server computing device for each image, generates an initial 3D model of the scene using the image.
- the server computing device for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene.
- the system further comprises a viewing device coupled to the server computing device.
- the viewing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device.
- the viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene.
- the viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
- the invention in another aspect, features a computerized method of generating a video stream of a scene including one or more objects.
- a sensor device captures a plurality of images of one or more objects in a scene.
- a server computing device coupled to the sensor device, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image.
- the server computing device for each image, generates an initial 3D model of the scene using the image.
- the server computing device for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene.
- a viewing device coupled to the server computing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device.
- the viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene.
- the viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
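The perspective-dependent rendering step can be illustrated in miniature: a vertex of the shared 3D model lands in different camera-space positions as the viewing pose changes. Real renderers use full 4x4 matrices on the GPU; this pared-down yaw-plus-translation form is only a sketch:

```python
import math

def world_to_view(point, viewer_pos, viewer_yaw):
    """Express a model-space vertex in the viewer's camera frame
    (translation followed by a rotation about the y axis)."""
    px, py, pz = (p - v for p, v in zip(point, viewer_pos))
    c, s = math.cos(-viewer_yaw), math.sin(-viewer_yaw)
    return (c * px + s * pz, py, -s * px + c * pz)

vertex = (1.0, 0.0, 5.0)
# The same stored model yields different images for different poses:
front_view = world_to_view(vertex, (0.0, 0.0, 0.0), 0.0)
side_view = world_to_view(vertex, (0.0, 0.0, 0.0), math.pi / 2)
```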
- the server computing device stores the initial 3D models of the one or more objects, the 3D model of the scene, and the pose information in a database.
- the viewing device receives the at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and the pose information of the sensor device, from the server computing device during a real-time streaming session.
- the viewing device is a virtual reality (VR) headset.
- the viewing device generates an updated 3D model of the one or more objects in the scene based upon at least one of (i) updated pose information received from the server computing device or (ii) updated pose information of the viewing perspective of the viewing device.
- the viewing device receives an image from the server computing device and applies the image to at least one of the initial 3D model or the initial 3D models of the one or more objects to generate a photorealistic 3D model.
- the initial 3D model of the scene is generated using simultaneous localization and mapping (SLAM). In some embodiments, the initial 3D models of the one or more objects in the scene are generated using simultaneous localization and mapping (SLAM).
- the server computing device determines one or more changes to at least one of the 3D models of the one or more objects or the 3D model of the scene based upon the pose information. In some embodiments, the one or more changes comprise data associated with one or more feature points on the at least one of the 3D models of the one or more objects or the 3D model of the scene. In some embodiments, the viewing device receives the one or more changes from the server computing device and the viewing device updates the initial 3D models based upon the one or more changes.
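One plausible way for the server computing device to determine those changes is a per-frame diff against the previous feature-point set; the threshold and names below are assumptions for illustration, not the patent's method:

```python
def changed_features(prev, curr, eps=1e-3):
    """Return only the feature points that moved more than `eps` since
    the last frame -- the sparse payload sent to the viewing device."""
    out = {}
    for name, p in curr.items():
        q = prev.get(name)
        if q is None or any(abs(a - b) > eps for a, b in zip(p, q)):
            out[name] = p
    return out

prev = {"head": (0.0, 1.7, 0.0), "hand": (0.3, 1.0, 0.0)}
curr = {"head": (0.0, 1.7, 0.0), "hand": (0.35, 1.0, 0.0)}
payload = changed_features(prev, curr)   # only the hand moved
```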
- FIG. 1 is a block diagram of a system for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- FIG. 2 is a detailed block diagram of specific software processing modules executing in an exemplary image processing module for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- FIG. 3 is a flow diagram of a method of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- FIG. 4 is a flow diagram of a method of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- FIG. 5 is an exemplary viewing device.
- FIG. 6A depicts an exemplary object to be scanned by the sensor, and an exemplary 3D model of the object as generated by the system.
- FIG. 6B is an exemplary user interface screen for display on the viewing device.
- FIG. 7A is an exemplary 3D model prior to transmission to the viewing device.
- FIG. 7B is an exemplary 3D model after being received by the viewing device.
- FIG. 8A is an exemplary 3D scene prior to transmission to the viewing device.
- FIG. 8B is an exemplary 3D scene after being received by the viewing device.
- FIG. 1 is a block diagram of a system 100 for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- the system 100 includes a sensor 103 coupled to a computing device 104 .
- the computing device 104 includes an image processing module 106 .
- the computing device can also be coupled to a database 108 or other data storage device, e.g., used for storing certain 3D models, images, pose information, and other data as described herein.
- the system 100 also includes a communications network 110 coupled to the computing device 104 , and a viewing device 112 communicably coupled to the network 110 in order to receive, e.g., 3D model data, image data, and other related data from the computing device 104 for the purposes described herein.
- the sensor 103 is positioned to capture images of a scene 101 , which includes one or more physical objects (e.g., objects 102 a - 102 b ).
- Exemplary sensors that can be used in the system 100 include, but are not limited to, real-time 3D depth sensors, digital cameras, combination 3D depth and RGB camera devices, and other types of devices that are capable of capturing depth information of the pixels along with the images of a real-world object and/or scene to collect data on its position, location, and appearance.
- the sensor 103 is embedded into the computing device 104 , such as a camera in a smartphone or a 3D VR capture device, for example.
- the sensor 103 further includes an inertial measurement unit (IMU) to capture data points such as heading, linear acceleration, rotation, and the like.
- the computing device 104 receives images (also called scans) of the scene 101 from the sensor 103 and processes the images to generate 3D models of objects (e.g., objects 102 a - 102 b ) represented in the scene 101 .
- the computing device 104 can take on many forms, including both mobile and non-mobile forms.
- Exemplary computing devices include, but are not limited to, a laptop computer, a desktop computer, a tablet computer, a smart phone, an internet of things (IoT) device, augmented reality (AR)/virtual reality (VR) devices (e.g., glasses, headset apparatuses, and so forth), or the like.
- the sensor 103 and computing device 104 can be embedded in a larger mobile structure such as a robot or unmanned aerial vehicle (UAV). It should be appreciated that other computing devices can be used without departing from the scope of the invention.
- the computing device 104 includes network-interface components to connect to a communications network (e.g., network 110 ).
- the network-interface components include components to connect to a wireless network, such as a Wi-Fi or cellular network, in order to access a wider network, such as the Internet.
- the computing device 104 includes an image processing module 106 configured to receive images captured by the sensor 103 and analyze the images in a variety of ways, including detecting the position and location of objects represented in the images and generating 3D models of objects in the images.
- the image processing module 106 is a hardware and/or software module that resides on the computing device 104 to perform functions associated with analyzing images captured by the sensor, including the generation of 3D models (e.g., .OBJ files) based upon objects in the images.
- the functionality of the image processing module 106 is distributed among a plurality of computing devices.
- the image processing module 106 operates in conjunction with other modules that are either also located on the computing device 104 or on other computing devices coupled to the computing device 104 . It should be appreciated that any number of computing devices, arranged in a variety of architectures, resources, and configurations (e.g., cluster computing, virtual computing, cloud computing) can be used without departing from the scope of the invention.
- An exemplary image processing module 106 is the Starry Night SDK, available from VanGogh Imaging, Inc. of McLean, Va.
- the image processing module 106 comprises specialized hardware (such as a processor or system-on-chip) that is embedded into, e.g., a circuit board or other similar component of another device.
- the image processing module 106 is specifically programmed with the image processing and modeling software functionality described below.
- FIG. 2 is a detailed block diagram 200 of specific software processing modules executing in an exemplary image processing module 106 at the computing device 104 .
- the image processing module 106 receives images and related data 202 as input from the sensor (e.g., the RGB sensor, the 3D depth sensor and, optionally, the IMU).
- the modules 204 - 214 each provide specific image processing and 3D model generation capabilities to the SLAM module 216 , which generates a dense map, dense tracking, and a pose of the 3D models of the scene and objects in the scene.
- the sparse tracking module 210 generates a sparse map and sparse tracking of the 3D models of the scene and objects in the scene.
- the sparse tracking module 210 and the SLAM module 216 each sends its respective tracking information, along with the pose from the SLAM module 216 , to the unified tracking module 218 , which integrates the received information into a final pose of the 3D model(s).
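The unified tracking step might be sketched as a weighted blend of the two pose estimates; the fixed weight below stands in for whatever confidence-driven integration the module actually performs, and all names are illustrative:

```python
def fuse_poses(sparse_pose, dense_pose, w_sparse=0.3):
    """Combine two 6-DoF pose estimates (x, y, z, roll, pitch, yaw)
    into a single final pose by weighted averaging."""
    w_dense = 1.0 - w_sparse
    return tuple(w_sparse * a + w_dense * b
                 for a, b in zip(sparse_pose, dense_pose))

# Sparse tracking estimates x = 1.0 while dense SLAM estimates x = 1.2;
# the unified result lands between them, favoring the dense estimate:
final_pose = fuse_poses((1.0, 0.0, 0.0, 0.0, 0.0, 0.0),
                        (1.2, 0.0, 0.0, 0.0, 0.0, 0.0))
```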
- the modules 210 and 216 also send their respective mapping information to the photogrammetry module 220 , which performs functions such as texture refinement, hole-filling, and geometric correction.
- the updated 3D models from the photogrammetry module 220 are further processed by the shape-based registration module 222 .
- the image processing module 106 provides the pose and photo-realistic 3D model as output 224 to, e.g., the viewing device 112 .
- the database 108 is coupled to the computing device 104 , and operates to store data used by the image processing module 106 during its image analysis functions.
- the data storage module 108 can be integrated with the server computing device 104 or be located on a separate computing device.
- the communications network 110 may be a local network, such as a LAN, or a wide area network, such as the Internet and/or a cellular network.
- the network 110 comprises several discrete networks and/or sub-networks (e.g., cellular to Internet) that enable the components of the system 100 to communicate with each other.
- the viewing device 112 is a computing device that receives information such as image data, 3D model data, and other types of data described herein from the image processing module 106 of the server computing device 104 for rendering of the scene 101 and objects 102 a - 102 b as captured by the sensor 103 . As shown in FIG. 1 , the viewing device 112 is positioned at a second location that is remote from the first location where the sensor 103 and computing device 104 are located. It should be appreciated that, in some embodiments, the first location and second location do not need to be separate physical or geographical locations. Exemplary viewing devices include, but are not limited to, laptop computers, desktop computers, tablets, smartphones, smart televisions, VR/AR hardware (e.g., glasses, headset), IoT devices, and the like.
- the viewing device 112 includes, e.g., a CPU 114 and a GPU 116 , which are specialized processors embedded in the viewing device 112 for the purpose of receiving 3D model data and pose data from the image processing module 106 via network 110 , updating 3D model(s) stored at the viewing device 112 using the received information, and rendering real-time image data (e.g., a video stream) based upon the updated 3D model(s) to provide a viewing experience to a user of the viewing device 112 that is the same as the viewing experience captured by the sensor 103 and the computing device 104 at the first location.
- FIG. 3 is a flow diagram of a method 300 of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1 .
- the sensor 103 captures ( 302 ) one or more images of the scene 101 , including one or more of the objects 102 a - 102 b and transmits the images to the image processing module 106 of the computing device 104 .
- the image processing module 106 generates ( 304 ) 3D models of the scene 101 and objects 102 a - 102 b using the images captured by the sensor 103 .
- the 3D generation is performed using dense and/or sparse SLAM (sometimes called fusion), which stitches incoming scans into a single 3D model of the scene or object.
- Exemplary processing for the image processing module 106 is described above with respect to FIG. 2 .
- the image processing module 106 also tracks ( 306 ) the pose of the sensor 103 relative to the scene 101 and the objects 102 a - 102 b. For example, if the sensor 103 is non-stationary (i.e., the sensor moves in relation to the scene and/or objects), the image processing module 106 receives pose information relating to the sensor 103 and stores the pose information in correlation with the captured images.
- the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device for rendering and viewing the images.
- the image processing module 106 stores ( 308 ) the generated 3D models and relative pose information in, e.g., database 108 , for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below).
- the image processing module 106 transmits ( 310 ) the generated 3D models of the objects 102 a - 102 b and/or scene 101 to the viewing device 112 (e.g., via communications network 110 ) and, as the objects 102 a - 102 b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information to the viewing device 112 —such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103 . To reduce the amount of data being transmitted to the viewing device, in some embodiments the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112 .
- this process advantageously acts as a compressing technique because the amount of data is small but the viewing device 112 can replicate the complete viewing experience.
- the CPU 114 of the viewing device 112 receives the pose information and the changes in the 3D model(s) from the image processing module 106 via the network 110 .
- the CPU 114 updates the 3D model(s) stored at the viewing device 112 using the received changes and pose information, and transmits the updated 3D model(s) to the GPU 116 for rendering into, e.g., stereo pair images, single 2D images, or other similar outputs—for viewing by a user at the viewing device 112 .
- FIG. 5 depicts an exemplary viewing device 112 for the transmitted output.
- the image processing module 106 transmits ( 312 ) the 3D models and relative pose information to the viewing device 112 , and the viewing device 112 renders the 2D video streams using the relative sensor (or camera) pose information.
- the image processing module 106 further transmits image data captured by the sensor 103 to the viewing device 112 , and the viewing device 112 uses the image data to render, e.g., a photorealistic 3D model of the objects and/or scene captured by the sensor 103 .
- the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112 .
- FIG. 4 is a flow diagram of a method 400 of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1 .
- the sensor 103 captures ( 402 ) one or more images of the scene 101 , including one or more of the objects 102 a - 102 b and transmits the images to the image processing module 106 of the computing device 104 .
- the image processing module 106 generates ( 404 ) 3D models of the scene 101 and objects 102 a - 102 b using the images captured by the sensor 103 .
- the 3D models generated by the image processing module 106 are photorealistic.
- the 3D generation is performed using dense and/or sparse SLAM (sometimes called Fusion), which stitches incoming scans into a single 3D model of a scene and/or objects in the scene.
- Exemplary processing for the image processing module 106 is described above with respect to FIG. 2 .
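The stitching step can be sketched as plain point-cloud accumulation, assuming each scan arrives with a 4x4 camera-to-world pose from the SLAM tracker (a real dense-fusion pipeline would also merge overlapping surfaces):

```python
import numpy as np

def fuse_scans(scans, poses):
    # Transform each per-frame point cloud into the world frame using its
    # tracked pose, then accumulate everything into a single model cloud.
    fused = []
    for points, pose in zip(scans, poses):
        homogeneous = np.hstack([points, np.ones((len(points), 1))])
        fused.append((homogeneous @ pose.T)[:, :3])
    return np.vstack(fused)
```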
- FIG. 6A depicts an exemplary object 102 a (e.g., a toy horse) to be scanned by the sensor, and an exemplary 3D model 602 generated by the system 100 .
- the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device (e.g., viewing device 112 ) for rendering and viewing the images.
- the image processing module 106 stores ( 406 ) the generated 3D models and relative pose information in, e.g., database 108 , for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below).
- the image processing module 106 transmits ( 408 ) the generated 3D models of the objects 102 a - 102 b and/or scene 101 to the viewing device 112 (e.g., via communications network 110 ) and, as the objects 102 a - 102 b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information as well as any texture changes to the viewing device 112 —such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103 .
- FIG. 6B depicts an exemplary user interface screen for display on the viewing device 112 .
- the user interface screen includes options 604 for displaying the 3D models—including an option to Stream Object.
- the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112 .
- this process advantageously acts as a compressing technique because the amount of data is small but the viewing device 112 can replicate the complete viewing experience.
- FIG. 7A depicts an exemplary 3D model prior to transmission to the viewing device 112 .
- FIG. 7B depicts an exemplary 3D model after being received by the viewing device 112 —including the post-processing steps that generate a photorealistic reconstructed 3D model.
- the image processing module 106 transmits ( 410 ) the 3D models and relative pose information to the viewing device 112 , and the viewing device 112 renders viewing images of the 3D models using both the relative pose information of the viewing device 112 and the virtual copy of the remote scene and object(s).
- the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112 .
- the viewing device 112 is not tied to the original video capture perspective of the sensor 103 because the complete 3D model is recreated at the viewing device 112 . Therefore, the viewing device (e.g., the CPU 114 and GPU 116 ) at the second location can manipulate the 3D model(s) locally in order to produce a perspective of the model(s) that is completely independent of what is being captured by the sensor 103 at the first location. For example, the viewing device 112 can be used to ‘walk around’ the virtual scene, which is a true 3D copy of the first location (such as via a VR viewing device).
- the scene 101 may be static (e.g., inside a museum) and the sensor 103 may move in relation to the scene 101 .
- the sensor 103 captures one or more images of the scene 101 , and the image processing module 106 can easily generate a 3D model of the scene 101 . Further, as long as the image processing module 106 captures the exact pose of the sensor 103 relative to the scene, it can render a 2D image using the 3D model. Therefore, the image processing module 106 can simply transmit the 3D model of the scene and relative pose information to the viewing device 112 for rendering, e.g., as a video stream of the static scene.
- the image processing module 106 only needs to generate a photorealistic 3D model of the room and the sequential sensor pose information as it moves (e.g., pose one at time one, pose two at time two, etc.).
- the viewing device 112 can render a video of the room completely and accurately based on the 3D model and pose information.
- the resultant amount of information needed to replicate the video stream at the viewing device 112 is a fraction of the size that would be required to save the entire video stream and transmit the stream to the viewing device 112 .
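A back-of-envelope comparison makes the savings concrete (the numbers here are illustrative assumptions, not figures from the patent): once the 3D model is at the viewing device, each frame needs only a sensor pose rather than a raster image.

```python
# One uncompressed 1080p RGB frame versus one sensor pose
# (quaternion + translation as float32). Illustrative sizes only.
FRAME_BYTES = 1920 * 1080 * 3   # 6,220,800 bytes per raw video frame
POSE_BYTES = 7 * 4              # 28 bytes per pose update

per_frame_savings = FRAME_BYTES / POSE_BYTES  # roughly five orders of magnitude
```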
- the only cost is the conversion of 3D model(s) into 2D images using the GPU 116 at the viewing device 112 .
- this 3D model to 2D video conversion can be done, e.g., in the cloud or using another computing device.
- FIG. 8A depicts an exemplary 3D scene prior to transmission to the viewing device 112 .
- FIG. 8B depicts an exemplary 3D scene after being received by the viewing device 112 —including the post-processing steps that generate a photorealistic reconstructed 3D scene.
- the scene 101 may be static and include one or more static or moving objects 102 a - 102 b along with a moving sensor 103 .
- Similar to the static scene use case described previously, the image processing module 106 generates 3D models of the object(s) in the scene and captures the pose information of the sensor relative to the object(s). The image processing module 106 transmits the 3D models and the relative pose information to the viewing device 112 , and the device 112 can recreate the scene 101 plus the exact locations of the objects 102 a - 102 b within the scene to completely replicate the captured scene.
- the scene 101 may include non-rigid, moving objects—such as people.
- the image processing module 106 can generate a 3D model of the non-rigid object and, using non-rigid registration techniques, the image processing module 106 can then send ‘sparse’ information to the viewing device 112 . This ‘sparse’ information can then be used by the viewing device 112 to reshape the 3D model and then render the updated 3D model of the scene 101 .
- For example, once the image processing module 106 generates a 3D model of a human face and transfers the 3D model to the viewing device 112 , the image processing module 106 only needs to track a small number of feature points of the face and transfer those feature points to the viewing device 112 to enable recreation of a facial expression accurately, e.g., in a video on the viewing device 112 .
- the amount of this ‘sparse’ information is a fraction of the dataset normally needed to send the entire new 3D model to the viewing device 112 .
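Packing and applying such a sparse update might look like the following sketch (the id list and float32 layout are assumptions for illustration; the real system would derive the tracked points from non-rigid registration as described above):

```python
import numpy as np

def pack_sparse_update(point_ids, tracked_points):
    # Serialize only the tracked feature points for transmission.
    coords = np.asarray(tracked_points, dtype=np.float32)
    return {"ids": list(point_ids), "coords": coords.tobytes()}

def apply_sparse_update(model_vertices, update):
    # Viewer side: move just the listed feature points of the stored model.
    coords = np.frombuffer(update["coords"], dtype=np.float32).reshape(-1, 3)
    reshaped = model_vertices.copy()
    reshaped[update["ids"]] = coords
    return reshaped
```

For a face model with thousands of vertices but only dozens of tracked landmarks, the update payload stays in the hundreds of bytes.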
- the system 100 can capture and transmit a scene 101 as a changing 3D model that is viewable by the viewing device 112 (e.g., VR or AR glasses).
- the pose information of the sensor 103 is not required because having the 3D model available at the viewing device 112 allows for rendering of the viewing images from any angle. Therefore, the sensor 103 and image processing module 106 only need to capture and transmit to the viewing device 112 the object pose relative to the scene 101 as well as sparse feature points for the non-rigid objects along with texture change information. Using just this set of information, the viewing device 112 is capable of fully recreating for playback the scene and objects captured by the sensor and image processing module.
- the system can capture changes in texture due to lighting conditions as a function of the location of the lighting source—especially if the location and brightness of the lighting source is generally stable. As a result, the system can render the correct texture without having to transmit the actual texture to the viewing device 112 . Instead, the viewing device 112 can simply render the texture based on the 3D model and the location and brightness of the lighting source. In another example, some key texture information (such as high-resolution face texture(s)) can be sent separately and then added to the ‘virtual scene’ to provide more realistic texture information.
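One way to sketch this viewer-side texture recomputation is a simple Lambertian (diffuse) model, used here as an assumed stand-in for rendering "based on the 3D model and the location and brightness of the lighting source":

```python
import numpy as np

def shade(albedo, points, normals, light_pos, brightness=1.0):
    # Recompute per-point intensity from the stable light source instead
    # of transmitting updated textures (Lambertian/diffuse assumption).
    to_light = light_pos - points
    to_light = to_light / np.linalg.norm(to_light, axis=1, keepdims=True)
    n_dot_l = np.clip(np.sum(normals * to_light, axis=1), 0.0, None)
    return albedo * brightness * n_dot_l[:, None]
```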
- a sensor device at a first location captures images of objects in the scene (e.g., players, referees, game ball, etc.) and the scene itself (e.g., the playing field).
- the sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device.
- the server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the players, ball, and other objects in the scene and an initial 3D model of the scene.
- the server computing device transmits the initial 3D models and the pose information to the remote viewing device (e.g., VR headset), which then generates a virtual representation of the scene (including the objects) as a video stream.
- the VR headset can capture pose information associated with the viewing perspective of the VR headset—which is independent from the pose information of the sensor device received from the server computing device.
- the headset can then utilize its own pose information to render a video stream from the perspective of the headset and, as the person wearing the headset moves around, the headset can render a different viewpoint of the scene and objects in the scene from that being captured by the sensor device.
- the viewer at the remote location can traverse the scene and view the action from a perspective completely independent of the sensor that is viewing the scene locally, providing an immersive and unique experience for the viewer.
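Because the full model lives on the viewing device, rendering from the headset's own viewpoint reduces to inverting its camera-to-world pose into a view matrix (a standard graphics identity, sketched here; the function name is illustrative):

```python
import numpy as np

def view_matrix(headset_pose):
    # Invert a rigid 4x4 camera-to-world pose: the result maps world
    # coordinates into the headset's viewing frame for rendering.
    rotation = headset_pose[:3, :3]
    translation = headset_pose[:3, 3]
    view = np.eye(4)
    view[:3, :3] = rotation.T
    view[:3, 3] = -rotation.T @ translation
    return view
```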
- the sensor device at the first location captures images of the museum scene as well as artwork, sculptures, people, and other objects in the museum.
- the sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device.
- the server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the objects in the scene and an initial 3D model of the scene.
- a sensor device can be used to capture a guided tour of the museum, viewing the various exhibits and wings.
- the server computing device then transmits the initial 3D models and the pose information to the remote viewing device (e.g., tablet, mobile phone), which then generates a virtual representation of the museum (including the artwork, people, etc.) using the 3D models and the pose information. Then, as the pose information changes (due to the sensor device moving through the museum), the server computing device can transmit the changing pose information to the remote viewing device, which automatically renders an updated representation of the scene and objects based upon the changing pose information, as part of a live and/or prerecorded video stream. In this way, the person viewing the stream can feel as though they are walking through the museum and seeing the exhibits and artwork in a first-person perspective.
- Live Streaming: for example, in order to live stream a 3D scene such as a sports event, a concert, a live presentation, and the like, the techniques described herein can be used to immediately send out a sparse frame to the viewing device at the remote location. As the 3D model becomes more complete, the techniques provide for adding full texture.
- the techniques can leverage 3D model compression to further reduce the geometric complexity and provide a seamless streaming experience.
- Recording for Later ‘Replay’: the techniques can advantageously be used to store images and relative pose information (as described above) in order to replay the scene and objects at a later time.
- the computing device can store 3D models, image data, pose data, and sparse feature point data associated with the sensor capturing, e.g., a video of the scene and objects in the scene. Then, the viewing device 112 can later receive this information and recreate the entire video using the models, images, pose data and feature point data.
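A minimal sketch of such a recording (field names are illustrative): store the one-time models next to a timeline of sparse per-frame records, then iterate the timeline back in capture order for replay.

```python
class Recording:
    # Holds the one-time 3D models plus a timeline of sparse per-frame
    # records (sensor pose + feature points), which together are enough
    # for a viewing device to re-render the video later.
    def __init__(self, models):
        self.models = models
        self.timeline = []

    def record_frame(self, timestamp, sensor_pose, feature_points):
        self.timeline.append(
            {"t": timestamp, "pose": sensor_pose, "features": feature_points})

    def replay(self):
        # Yield frames in capture order for the viewing device to render.
        return iter(sorted(self.timeline, key=lambda f: f["t"]))
```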
- the above-described techniques can be implemented in digital and/or analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them.
- the implementation can be as a computer program product, i.e., a computer program tangibly embodied in a machine-readable storage device, for execution by, or to control the operation of, a data processing apparatus, e.g., a programmable processor, a computer, and/or multiple computers.
- a computer program can be written in any form of computer or programming language, including source code, compiled code, interpreted code, and/or machine code, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, or other unit suitable for use in a computing environment.
- a computer program can be deployed to be executed on one computer or on multiple computers at one or more sites.
- Method steps can be performed by one or more specialized processors executing a computer program to perform functions by operating on input data and/or generating output data. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry, e.g., a FPGA (field programmable gate array), a FPAA (field-programmable analog array), a CPLD (complex programmable logic device), a PSoC (Programmable System-on-Chip), ASIP (application-specific instruction-set processor), or an ASIC (application-specific integrated circuit), or the like.
- Subroutines can refer to portions of the stored computer program and/or the processor, and/or the special circuitry that implement one or more functions.
- processors suitable for the execution of a computer program include, by way of example, special purpose microprocessors.
- a processor receives instructions and data from a read-only memory or a random access memory or both.
- the essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and/or data.
- Memory devices, such as a cache, can be used to temporarily store data. Memory devices can also be used for long-term data storage.
- a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- a computer can also be operatively coupled to a communications network in order to receive instructions and/or data from the network and/or to transfer instructions and/or data to the network.
- Computer-readable storage mediums suitable for embodying computer program instructions and data include all forms of volatile and non-volatile memory, including by way of example semiconductor memory devices, e.g., DRAM, SRAM, EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and optical disks, e.g., CD, DVD, HD-DVD, and Blu-ray disks.
- the processor and the memory can be supplemented by and/or incorporated in special purpose logic circuitry.
- the above described techniques can be implemented on a computer in communication with a display device, e.g., a CRT (cathode ray tube), plasma, or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse, a trackball, a touchpad, or a motion sensor, by which the user can provide input to the computer (e.g., interact with a user interface element).
- feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, and/or tactile input.
- the above described techniques can be implemented in a distributed computing system that includes a back-end component.
- the back-end component can, for example, be a data server, a middleware component, and/or an application server.
- the above described techniques can be implemented in a distributed computing system that includes a front-end component.
- the front-end component can, for example, be a client computer having a graphical user interface, a Web browser through which a user can interact with an example implementation, and/or other graphical user interfaces for a transmitting device.
- the above described techniques can be implemented in a distributed computing system that includes any combination of such back-end, middleware, or front-end components.
- Transmission medium can include any form or medium of digital or analog data communication (e.g., a communication network).
- Transmission medium can include one or more packet-based networks and/or one or more circuit-based networks in any configuration.
- Packet-based networks can include, for example, the Internet, a carrier internet protocol (IP) network (e.g., local area network (LAN), wide area network (WAN), campus area network (CAN), metropolitan area network (MAN), home area network (HAN)), a private IP network, an IP private branch exchange (IPBX), a wireless network (e.g., radio access network (RAN), Bluetooth, Wi-Fi, WiMAX, general packet radio service (GPRS) network, HiperLAN), and/or other packet-based networks.
- Circuit-based networks can include, for example, the public switched telephone network (PSTN), a legacy private branch exchange (PBX), a wireless network (e.g., RAN, code-division multiple access (CDMA) network, time division multiple access (TDMA) network, global system for mobile communications (GSM) network), and/or other circuit-based networks.
- Communication protocols can include, for example, Ethernet protocol, Internet Protocol (IP), Voice over IP (VOIP), a Peer-to-Peer (P2P) protocol, Hypertext Transfer Protocol (HTTP), Session Initiation Protocol (SIP), H.323, Media Gateway Control Protocol (MGCP), Signaling System #7 (SS7), a Global System for Mobile Communications (GSM) protocol, a Push-to-Talk (PTT) protocol, a PTT over Cellular (POC) protocol, Universal Mobile Telecommunications System (UMTS), 3GPP Long Term Evolution (LTE) and/or other communication protocols.
- Devices of the computing system can include, for example, a computer, a computer with a browser device, a telephone, an IP phone, a mobile device (e.g., cellular phone, personal digital assistant (PDA) device, smart phone, tablet, laptop computer, electronic mail device), and/or other communication devices.
- the browser device includes, for example, a computer (e.g., desktop computer and/or laptop computer) with a World Wide Web browser (e.g., Chrome™ from Google, Inc., Microsoft® Internet Explorer® available from Microsoft Corporation, and/or Mozilla® Firefox available from Mozilla Corporation).
- Mobile computing devices include, for example, a Blackberry® from Research in Motion, an iPhone® from Apple Corporation, and/or an Android™-based device.
- IP phones include, for example, a Cisco® Unified IP Phone 7985G and/or a Cisco® Unified Wireless Phone 7920 available from Cisco Systems, Inc.
- Comprise, include, and/or plural forms of each are open ended and include the listed parts and can include additional parts that are not listed. And/or is open ended and includes one or more of the listed parts and combinations of the listed parts.
Description
- This application claims priority to U.S. Provisional Patent Application No. 62/405,372, filed on Oct. 7, 2016, the entirety of which is incorporated herein by reference.
- The subject matter of this application relates generally to methods and apparatuses, including computer program products, for real-time remote collaboration and virtual presence using simultaneous localization and mapping (SLAM) to construct three-dimensional (3D) models and updating a scene based upon sparse data, including two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction.
- To visually experience a live event from a remote location, a video can be streamed to a viewer at the remote location, or the event can be recorded and streamed later to the viewer. However, because bandwidth is limited in most cases, some form of compression (either lossless or lossy) such as MPEG-4 is used to reduce the amount of data being transmitted over the network by a factor of 100 or more. This allows the transmission of the video to be practical over low-bandwidth wired networks or most wireless networks.
- With the advent of virtual reality (VR) and associated viewing devices (such as VR headsets), there is an emerging interest in virtually experiencing live events remotely. But, the amount of data required for transmission over a network may cause significant problems with quality and efficiency of the viewing experience, because an example data size for a single 3D model could be in tens of megabytes. As an example, for sixty frames per second, transmitting and processing the frames in sequence could result in gigabytes of data per second. Even with significant compression such as not transmitting portions of the scene that do not change from frame to frame (similar to video compression strategy), the process still results in tens of megabytes of data to be transmitted remotely—which makes it impractical, especially for wireless networks. Also, methods to further compress the data, such as traditional 3D compression to reduce the number of triangles, can significantly reduce visual quality.
- Therefore, what is needed are methods and systems for lossless (or slightly lossy) compression to transmit a live three-dimensional scene, which in some cases includes objects in the scene, to a remote location by segmenting the scene as a set of rigid and non-rigid photorealistic 3D model objects and backgrounds (these are also called assets)—and then transmitting the data to the remote location once. Once the data transmission is accomplished, only the sparse pose information of the assets needs to be transmitted to the remote location. At the receiving device, a local computer graphics unit is used to render a replica of the remote scene while using a fraction of the bandwidth of traditional approaches. The bandwidth savings enables the application of these techniques to wireless networks. In the case of rapid scene changes when new assets are presented, the system is capable of transmitting new assets to the remote location and rendering the assets accordingly.
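The asset-once, sparse-poses-thereafter idea can be sketched as sender-side bookkeeping (the message layout and names here are hypothetical, for illustration only):

```python
class AssetStream:
    # Each asset's full 3D model is transmitted exactly once; every
    # subsequent frame message carries only the sparse pose data.
    def __init__(self):
        self.sent = set()

    def frame_message(self, assets, poses):
        new_assets = {aid: mesh for aid, mesh in assets.items()
                      if aid not in self.sent}
        self.sent.update(new_assets)
        return {"new_assets": new_assets, "poses": poses}
```

After the first frame, `new_assets` is empty until the scene changes and a new asset appears, matching the rapid-scene-change case described above.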
- Simultaneous localization and mapping (SLAM) is a computer modeling technique that is used to map and track the real world as a 3D model. The methods and systems described herein utilize SLAM to compress in real time a live video stream of a remote scene for the purpose of viewing that scene from any location. Once re-rendered as a 3D model at the viewer's device, the live remote scene can then be viewed using a VR headset as if the viewer is at the remote location. It should be appreciated that, in one embodiment, the technology described herein can be used to capture a scene (including objects in the scene) of a first location as one or more 3D models, transfer the 3D model(s) in real time to a second location that is remote from the first location, and then render viewing images of the 3D model from a different viewing perspective using the pose of a viewing element (e.g., digital screen, camera, image viewer, or headset) at the second location. In some cases, the second location can be equipped with a VR headset or other similar hardware to view the 3D model of the first location from any viewing angle. Even when there are substantive changes in the scene at the first location, the systems and methods described herein advantageously transfer only the changing portions of the scene and/or objects in the scene to the second location.
- Therefore, instead of traditional methods that involve streaming new 2D image frames from the first location to the second location, the methods and systems described herein transmit an entire 3D model of the scene—and objects in the scene—from the first location to the second location, and use a graphics processor in the viewer's device to render image(s) using the 3D model. It should be appreciated that the techniques described herein provide the advantage of ‘virtually’ copying the scene and objects at the first location, storing 3D models of the scene and objects in memory of the viewer's device at the second location, and then rendering the scene and objects in real time (e.g., as a video stream) from the ‘virtual’ scene.
- Another advantage provided by the methods and systems described herein is that the image processing device at the first location needs only to transmit changes in the ‘position’ of the objects and the sensor location relative to the scene for each frame—and not the entire scene for each frame—to the viewer's device at the second location, in order for the viewer to move the objects and the sensor location in the virtual scene to replicate the same visual experience as if the remote viewer is at the first location. Because transmission of the changes in position and sensor location involves much less data than sending the entire scene, this technique advantageously provides for substantial compression of, e.g., a video stream transmitted from the first location to the second location.
- Similarly, for moving, rigid objects in the scene at the first location, the systems and methods described herein advantageously transfer only the pose of the object to the viewer's device at the second location once the viewer's device has received the 3D model(s). For non-rigid objects such as people, once the viewer's device at the second location has received the full 3D model of a non-rigid object from the first location, subsequent transmissions need to include only the sparse feature information of the non-rigid object in order for the viewer's device at the second location to recreate the scene correctly.
- For example, the sparse feature information can include feature points such as points associated with aspects of the person's body (e.g., head, feet, hands, arms). As the person moves in the scene, the sensor and server computing device need only capture and transmit the positional changes associated with these feature points to the viewing device—instead of the entire model—and the viewing device can update the 3D model at the remote location using the sparse feature information to track the person's movements through the scene.
- The invention, in one aspect, features a system for generating a video stream of a scene including one or more objects. The system comprises a sensor device that captures a plurality of images of one or more objects in a scene. The system further comprises a server computing device coupled to the sensor device that, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image. The server computing device, for each image, generates an initial 3D model of the scene using the image. The server computing device, for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene. The system further comprises a viewing device coupled to the server computing device. The viewing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device. The viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene. The viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
- The invention, in another aspect, features a computerized method of generating a video stream of a scene including one or more objects. A sensor device captures a plurality of images of one or more objects in a scene. A server computing device coupled to the sensor device, for each image, generates an initial 3D model for each of the one or more objects in the scene using the image. The server computing device, for each image, generates an initial 3D model of the scene using the image. The server computing device, for each image, captures pose information of the sensor device relative to at least one of the scene or one or more of the objects in the scene as the sensor device moves in relation to the scene. A viewing device coupled to the server computing device receives (i) at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and (ii) the pose information of the sensor device, from the server computing device. The viewing device captures pose information of a viewing perspective of the viewing device relative to at least one of the scene or one or more of the objects in the scene as the viewing perspective of the viewing device moves in relation to the scene. The viewing device renders a video stream of at least one of the one or more objects or the scene on a display element of the viewing device using the received initial 3D models and at least one of (i) the pose information of the sensor device or (ii) the pose information of the viewing perspective of the viewing device.
- Any of the above aspects can include one or more of the following features. In some embodiments, the server computing device stores the initial 3D models of the one or more objects, the 3D model of the scene, and the pose information in a database. In some embodiments, the viewing device receives the at least one of the initial 3D models of the one or more objects or the initial 3D model of the scene and the pose information of the sensor device, from the server computing device during a real-time streaming session.
- In some embodiments, the viewing device is a virtual reality (VR) headset. In some embodiments, the viewing device generates an updated 3D model of the one or more objects in the scene based upon at least one of (i) updated pose information received from the server computing device or (ii) updated pose information of the viewing perspective of the viewing device. In some embodiments, the viewing device receives an image from the server computing device and applies the image to at least one of the initial 3D model or the initial 3D models of the one or more objects to generate a photorealistic 3D model.
- In some embodiments, the initial 3D model of the scene is generated using simultaneous localization and mapping (SLAM). In some embodiments, the initial 3D models of the one or more objects in the scene are generated using simultaneous localization and mapping (SLAM). In some embodiments, the server computing device determines one or more changes to at least one of the 3D models of the one or more objects or the 3D model of the scene based upon the pose information. In some embodiments, the one or more changes comprise data associated with one or more feature points on the at least one of the 3D models of the one or more objects or the 3D model of the scene. In some embodiments, the viewing device receives the one or more changes from the server computing device and the viewing device updates the initial 3D models based upon the one or more changes.
- Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating the principles of the invention by way of example only.
- The advantages of the invention described above, together with further advantages, may be better understood by referring to the following description taken in conjunction with the accompanying drawings. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention.
-
FIG. 1 is a block diagram of a system for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction. -
FIG. 2 is a detailed block diagram of specific software processing modules executing in an exemplary image processing module for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction. -
FIG. 3 is a flow diagram of a method of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction. -
FIG. 4 is a flow diagram of a method of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction. -
FIG. 5 is an exemplary viewing device. -
FIG. 6A depicts an exemplary object to be scanned by the sensor, and an exemplary 3D model of the object as generated by the system. -
FIG. 6B is an exemplary user interface screen for display on the viewing device. -
FIG. 7A is an exemplary 3D model prior to transmission to the viewing device. -
FIG. 7B is an exemplary 3D model after being received by the viewing device. -
FIG. 8A is an exemplary 3D scene prior to transmission to the viewing device. -
FIG. 8B is an exemplary 3D scene after being received by the viewing device. -
FIG. 1 is a block diagram of a system 100 for two-dimensional (2D) and three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction. Certain embodiments of the systems and methods described in this application utilize: - the real-time object recognition and modeling techniques as described in U.S. Pat. No. 9,715,761, titled “Real-
Time 3D Computer Vision Processing Engine for Object Recognition, Reconstruction, and Analysis;”
the dynamic 3D modeling techniques as described in U.S. patent application Ser. No. 14/849,172, titled “Real-Time Dynamic Three-Dimensional Adaptive Object Recognition and Model Reconstruction;”
the shape-based registration and modeling techniques described in U.S. patent application Ser. No. 15/441,166, titled “Shape-Based Registration for Non-Rigid Objects with Large Holes;”
the 3D photogrammetry techniques described in U.S. patent application Ser. No. 15/596,590, titled “3D Photogrammetry;” and
the sparse SLAM techniques described in U.S. patent application Ser. No. 15/638,278, titled “Sparse Simultaneous Localization and Mapping with Unified Tracking.” - Each of the above-referenced patents and patent applications is incorporated by reference herein in its entirety. The methods and systems described in the above patents and patent applications, and in the present patent application, can be implemented using the Starry Night SDK, available from VanGogh Imaging, Inc. of McLean, Virginia.
- The system 100 includes a sensor 103 coupled to a computing device 104. The computing device 104 includes an image processing module 106. In some embodiments, the computing device can also be coupled to a database 108 or other data storage device, e.g., used for storing certain 3D models, images, pose information, and other data as described herein. The system 100 also includes a communications network 110 coupled to the computing device 104, and a viewing device 112 communicably coupled to the network 110 in order to receive, e.g., 3D model data, image data, and other related data from the computing device 104 for the purposes described herein. - The
sensor 103 is positioned to capture images of a scene 101, which includes one or more physical objects (e.g., objects 102a-102b). Exemplary sensors that can be used in the system 100 include, but are not limited to, real-time 3D depth sensors, digital cameras, combination 3D depth and RGB camera devices, and other types of devices that are capable of capturing depth information of the pixels along with the images of a real-world object and/or scene to collect data on its position, location, and appearance. In some embodiments, the sensor 103 is embedded into the computing device 104, such as a camera in a smartphone or a 3D VR capture device, for example. In some embodiments, the sensor 103 further includes an inertial measurement unit (IMU) to capture data points such as heading, linear acceleration, rotation, and the like. - The
computing device 104 receives images (also called scans) of the scene 101 from the sensor 103 and processes the images to generate 3D models of objects (e.g., objects 102a-102b) represented in the scene 101. The computing device 104 can take on many forms, including both mobile and non-mobile forms. Exemplary computing devices include, but are not limited to, a laptop computer, a desktop computer, a tablet computer, a smart phone, an internet of things (IoT) device, augmented reality (AR)/virtual reality (VR) devices (e.g., glasses, headset apparatuses, and so forth), or the like. In some embodiments, the sensor 103 and computing device 104 can be embedded in a larger mobile structure such as a robot or unmanned aerial vehicle (UAV). It should be appreciated that other computing devices can be used without departing from the scope of the invention. The computing device 104 includes network-interface components to connect to a communications network (e.g., network 110). In some embodiments, the network-interface components include components to connect to a wireless network, such as a Wi-Fi or cellular network, in order to access a wider network, such as the Internet. - The
computing device 104 includes an image processing module 106 configured to receive images captured by the sensor 103 and analyze the images in a variety of ways, including detecting the position and location of objects represented in the images and generating 3D models of objects in the images. - The
image processing module 106 is a hardware and/or software module that resides on the computing device 104 to perform functions associated with analyzing images captured by the scanner, including the generation of 3D models (e.g., .OBJ files) based upon objects in the images. In some embodiments, the functionality of the image processing module 106 is distributed among a plurality of computing devices. In some embodiments, the image processing module 106 operates in conjunction with other modules that are either also located on the computing device 104 or on other computing devices coupled to the computing device 104. It should be appreciated that any number of computing devices, arranged in a variety of architectures, resources, and configurations (e.g., cluster computing, virtual computing, cloud computing) can be used without departing from the scope of the invention. An exemplary image processing module 106 is the Starry Night SDK, available from VanGogh Imaging, Inc. of McLean, Virginia. - It should be appreciated that in one embodiment, the
image processing module 106 comprises specialized hardware (such as a processor or system-on-chip) that is embedded into, e.g., a circuit board or other similar component of another device. In this embodiment, the image processing module 106 is specifically programmed with the image processing and modeling software functionality described below. -
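For illustration only, the stitching idea behind the module's 3D model generation (described further below with respect to FIG. 3, where incoming scans are stitched into a single 3D model) can be sketched as a toy point-cloud fusion: each incoming scan is transformed into a shared world frame by its pose and merged on a voxel grid. The function names and the voxel-merge scheme here are illustrative assumptions, not the module's actual SLAM implementation, which is described in the incorporated references.

```python
def transform(point, pose):
    """Apply a rigid pose (3x3 rotation R, translation t) to a 3D point."""
    R, t = pose
    x, y, z = point
    return tuple(R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(3))

def fuse_scans(scans, poses, voxel=0.01):
    """Stitch per-frame scans into a single model: move each scan's points into
    the shared world frame using that frame's pose, then deduplicate points
    that fall into the same voxel (here, a 1 cm grid)."""
    voxels = {}
    for scan, pose in zip(scans, poses):
        for p in scan:
            wx, wy, wz = transform(p, pose)
            key = (round(wx / voxel), round(wy / voxel), round(wz / voxel))
            voxels.setdefault(key, (wx, wy, wz))  # keep one point per voxel
    return list(voxels.values())

identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
shifted = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.004, 0.0, 0.0])  # 4 mm motion
scan = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
model = fuse_scans([scan, scan], [identity, shifted])
print(len(model))  # → 2: the two scans' overlapping points collapse together
```

Because overlapping points from successive scans land in the same voxel, the fused model grows with the size of the scene rather than with the number of frames.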
FIG. 2 is a detailed block diagram 200 of specific software processing modules executing in an exemplary image processing module 106 at the computing device 104. As shown in FIG. 2, the image processing module 106 receives images and related data 202 as input from the sensor (e.g., the RGB sensor, the 3D depth sensor and, optionally, the IMU). The modules 204-214 each provide specific image processing and 3D model generation capabilities to the SLAM module 216, which generates a dense map, dense tracking, and a pose of the 3D models of the scene and objects in the scene. Also, as shown, the sparse tracking module 210 generates a sparse map and sparse tracking of the 3D models of the scene and objects in the scene. The sparse tracking module 210 and the SLAM module 216 each send its respective tracking information, along with the pose from the SLAM module 216, to the unified tracking module 218, which integrates the received information into a final pose of the 3D model(s). The resulting 3D models are then provided to the photogrammetry module 220, which performs functions such as texture refinement, hole-filling, and geometric correction. The updated 3D models from the photogrammetry module 220 are further processed by the shape-based registration module 222. As a result, the image processing module 106 provides the pose and photo-realistic 3D model as output 224 to, e.g., the viewing device 112. - Further details about the specific functionality and processing for each module described above with respect to
FIG. 2 are described in the real-time object recognition and modeling techniques as described in U.S. Pat. No. 9,715,761, titled “Real-Time 3D Computer Vision Processing Engine for Object Recognition, Reconstruction, and Analysis;” U.S. patent application Ser. No. 14/849,172, titled “Real-Time Dynamic Three-Dimensional Adaptive Object Recognition and Model Reconstruction;” U.S. patent application Ser. No. 15/441,166, titled “Shape-Based Registration for Non-Rigid Objects with Large Holes;” U.S. patent application Ser. No. 15/596,590, titled “3D Photogrammetry;” and U.S. patent application Ser. No. 15/638,278, titled “Sparse Simultaneous Localization and Mapping with Unified Tracking,” each of which is incorporated herein by reference. - The
database 108 is coupled to the computing device 104, and operates to store data used by the image processing module 106 during its image analysis functions. The data storage module 108 can be integrated with the server computing device 104 or be located on a separate computing device. - The
communications network 110 may be a local network, such as a LAN, or a wide area network, such as the Internet and/or a cellular network. In some embodiments, the network 110 comprises several discrete networks and/or sub-networks (e.g., cellular to Internet) that enable the components of the system 100 to communicate with each other. - The
viewing device 112 is a computing device that receives information such as image data, 3D model data, and other types of data described herein from the image processing module 106 of the server computing device 104 for rendering of the scene 101 and objects 102a-102b as captured by the sensor 103. As shown in FIG. 1, the viewing device 112 is positioned at a second location that is remote from the first location where the sensor 103 and computing device 104 are located. It should be appreciated that, in some embodiments, the first location and second location do not need to be separate physical or geographical locations. Exemplary viewing devices include, but are not limited to, laptop computers, desktop computers, tablets, smartphones, smart televisions, VR/AR hardware (e.g., glasses, headset), IoT devices, and the like. - The
viewing device 112 includes, e.g., a CPU 114 and a GPU 116, which are specialized processors embedded in the viewing device 112 for the purpose of receiving 3D model data and pose data from the image processing module 106 via network 110, updating 3D model(s) stored at the viewing device 112 using the received information, and rendering real-time image data (e.g., a video stream) based upon the updated 3D model(s) to provide a viewing experience to a user of the viewing device 112 that is the same as the viewing experience captured by the sensor 103 and the computing device 104 at the first location. -
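As described below, the module can transmit only changes in the 3D model(s) rather than the full model; the viewer-side update step can be pictured as a vertex diff, where the server sends only the vertices that moved and the viewing device patches its stored copy. The diff format below is a hypothetical illustration; the patent does not specify a wire format.

```python
def model_delta(prev_vertices, curr_vertices, eps=1e-6):
    """Server side: diff two versions of a model's vertex list, keeping only
    the vertices that actually moved, keyed by vertex index."""
    return {i: v for i, (u, v) in enumerate(zip(prev_vertices, curr_vertices))
            if max(abs(a - b) for a, b in zip(u, v)) > eps}

def apply_delta(vertices, delta):
    """Viewer side: patch the locally stored model with the received changes."""
    for i, v in delta.items():
        vertices[i] = v
    return vertices

server_prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
server_curr = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 1.0, 0.0)]
delta = model_delta(server_prev, server_curr)   # only vertex 1 moved
viewer_copy = list(server_prev)                 # viewer already holds the old model
apply_delta(viewer_copy, delta)
print(len(delta), viewer_copy == server_curr)   # → 1 True
```

One changed vertex crosses the network instead of the whole model, which is the compression effect the surrounding text describes.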
FIG. 3 is a flow diagram of a method 300 of two-dimensional (2D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1. The sensor 103 captures (302) one or more images of the scene 101, including one or more of the objects 102a-102b, and transmits the images to the image processing module 106 of the computing device 104. The image processing module 106 generates (304) 3D models of the scene 101 and objects 102a-102b using the images captured by the sensor 103. The 3D generation is performed using dense and/or sparse SLAM (sometimes called fusion), which stitches incoming scans into a single 3D model of the scene or object. Exemplary processing for the image processing module 106 is described above with respect to FIG. 2. - The
image processing module 106 also tracks (306) the pose of the sensor 103 relative to the scene 101 and the objects 102a-102b. For example, if the sensor 103 is non-stationary (i.e., the sensor moves in relation to the scene and/or objects), the image processing module 106 receives pose information relating to the sensor 103 and stores the pose information in correlation with the captured images. - Once the
image processing module 106 has generated the 3D models and captured the relative pose information, the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device for rendering and viewing the images. In one action, the image processing module 106 stores (308) the generated 3D models and relative pose information in, e.g., database 108, for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below). - In another action, the
image processing module 106 transmits (310) the generated 3D models of the objects 102a-102b and/or scene 101 to the viewing device 112 (e.g., via communications network 110) and, as the objects 102a-102b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information to the viewing device 112—such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103. To reduce the amount of data being transmitted to the viewing device, in some embodiments the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112. As explained above, this process advantageously acts as a compression technique because the amount of data is small but the viewing device 112 can replicate the complete viewing experience. For example, the CPU 114 of the viewing device 112 receives the pose information and the changes in the 3D model(s) from the image processing module 106 via network 110. The CPU 114 updates the 3D model(s) stored at the viewing device 112 using the received changes and pose information, and transmits the updated 3D model(s) to the GPU 116 for rendering into, e.g., stereo pair images, single 2D images, or other similar outputs—for viewing by a user at the viewing device 112. FIG. 5 depicts an exemplary viewing device 112 for the transmitted output. - In another action, the
image processing module 106 transmits (312) the 3D models and relative pose information to the viewing device 112, and the viewing device 112 renders the 2D video streams using the relative sensor (or camera) pose information. In some embodiments, the image processing module 106 further transmits image data captured by the sensor 103 to the viewing device 112, and the viewing device 112 uses the image data to render, e.g., a photorealistic 3D model of the objects and/or scene captured by the sensor 103. Thus, the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112. -
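One way to picture how a viewing device turns the 3D model plus a camera pose into 2D frames is a minimal pinhole projection. The focal length and principal point below are illustrative assumptions; the actual rendering described above runs on the viewing device's GPU.

```python
def project(points, cam_pose, f=500.0, cx=320.0, cy=240.0):
    """Render one 2D frame from a 3D model: transform world points into the
    camera frame given its pose, then apply a pinhole projection."""
    R, t = cam_pose  # world-to-camera rotation (3x3) and translation
    pixels = []
    for x, y, z in points:
        X, Y, Z = (R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i]
                   for i in range(3))
        if Z <= 0:          # behind the camera: not visible in this frame
            continue
        pixels.append((cx + f * X / Z, cy + f * Y / Z))
    return pixels

identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
model = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0)]
print(project(model, identity))  # → [(320.0, 240.0), (445.0, 240.0)]
```

Re-running the projection with each streamed pose yields successive video frames without any image data crossing the network.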
FIG. 4 is a flow diagram of a method 400 of three-dimensional (3D) video compression using simultaneous localization and mapping (SLAM) for real-time remote interaction, using the system 100 of FIG. 1. The sensor 103 captures (402) one or more images of the scene 101, including one or more of the objects 102a-102b, and transmits the images to the image processing module 106 of the computing device 104. The image processing module 106 generates (404) 3D models of the scene 101 and objects 102a-102b using the images captured by the sensor 103. In some embodiments, the 3D models generated by the image processing module 106 are photorealistic. The 3D generation is performed using dense and/or sparse SLAM (sometimes called fusion), which stitches incoming scans into a single 3D model of a scene and/or objects in the scene. Exemplary processing for the image processing module 106 is described above with respect to FIG. 2. FIG. 6A depicts an exemplary object 102a (e.g., a toy horse) to be scanned by the sensor, and an exemplary 3D model 602 generated by the system 100. - Once the
image processing module 106 has generated the 3D models and captured the relative pose information, the image processing module 106 can perform a number of different actions relating to transferring the 3D models and pose information to a viewing device (e.g., viewing device 112) for rendering and viewing the images. In one action, the image processing module 106 stores (406) the generated 3D models and relative pose information in, e.g., database 108, for future retrieval and viewing by a viewing device (i.e., as part of a playback feature as will be described below). - In another action, the
image processing module 106 transmits (408) the generated 3D models of the objects 102a-102b and/or scene 101 to the viewing device 112 (e.g., via communications network 110) and, as the objects 102a-102b and/or sensor 103 move in relation to each other, the image processing module 106 can stream in real-time the relative pose information as well as any texture changes to the viewing device 112—such that the viewing device 112 can manipulate the previously-received 3D models to match the viewing experience being captured by the sensor 103. FIG. 6B depicts an exemplary user interface screen for display on the viewing device 112. As shown, the user interface screen includes options 604 for displaying the 3D models—including an option to Stream Object. To reduce the amount of data being transmitted to the viewing device, in some embodiments the image processing module 106 only transmits changes in the 3D model(s) to the viewing device 112. As explained above, this process advantageously acts as a compression technique because the amount of data is small but the viewing device 112 can replicate the complete viewing experience. FIG. 7A depicts an exemplary 3D model prior to transmission to the viewing device 112, and FIG. 7B depicts an exemplary 3D model after being received by the viewing device 112—including the post-processing steps that generate a photorealistic reconstructed 3D model. - In another action, the
image processing module 106 transmits (410) the 3D models and relative pose information to the viewing device 112, and the viewing device 112 renders viewing images of the 3D models using both the relative pose information of the viewing device 112 and the virtual copy of the remote scene and object(s). Thus, the viewing device 112 uses the 3D model to create a virtual copy of the first location's live scene (‘virtual scene’) and the images in the video stream are rendered using the GPU 116 on the viewing device 112. - It should be appreciated that, in the above-described 3D embodiments, the
viewing device 112 is not tied to the original video capture perspective of the sensor 103 because the complete 3D model is recreated at the viewing device 112. Therefore, the viewing device (e.g., the CPU 114 and GPU 116) at the second location can manipulate the 3D model(s) locally in order to produce a perspective of the model(s) that is completely independent of what is being captured by the sensor 103 at the first location. For example, the viewing device 112 can be used to ‘walk around’ the virtual scene, which is a true 3D copy of the first location (such as via a VR viewing device). - As can be appreciated, there is a wide variety of different use cases that can take advantage of the systems and methods described herein:
- In one example, the
scene 101 may be static (e.g., inside a museum) and the sensor 103 may move in relation to the scene 101. The sensor 103 captures one or more images of the scene 101, and the image processing module 106 can easily generate a 3D model of the scene 101. Further, as long as the image processing module 106 captures the exact pose of the sensor 103 relative to the scene, the image processing module 106 can render a 2D image using the 3D model. Therefore, the image processing module 106 can simply transmit the 3D model of the scene and relative pose information to the viewing device 112 for rendering, e.g., as a video stream of the static scene. For example, if the sensor 103 captures a ten-minute video of a room as the sensor 103 moves around the room, the image processing module 106 only needs to generate a photorealistic 3D model of the room and the sequential sensor pose information as it moves (e.g., pose one at time one, pose two at time two, etc.). The viewing device 112 can render a video of the room completely and accurately based on the 3D model and pose information. Hence, the resultant amount of information needed to replicate the video stream at the viewing device 112 is a fraction of the size that would be required to save the entire video stream and transmit the stream to the viewing device 112. The only cost is the conversion of 3D model(s) into 2D images using the GPU 116 at the viewing device 112. In some embodiments, this 3D-model-to-2D-video conversion can be done in, e.g., the cloud or using another computing device. FIG. 8A depicts an exemplary 3D scene prior to transmission to the viewing device 112, and FIG. 8B depicts an exemplary 3D scene after being received by the viewing device 112—including the post-processing steps that generate a photorealistic reconstructed 3D scene. - In another example, the
scene 101 may be static and include one or more static or moving objects 102a-102b along with a moving sensor 103. Similar to the static scene use case described previously, the image processing module 106 generates 3D models of the object(s) in the scene and captures the pose information of the sensor relative to the object(s). The image processing module 106 transmits the 3D models and the relative pose information to the viewing device 112, and the device 112 can recreate the scene 101 plus the exact locations of the objects 102a-102b within the scene to completely replicate the captured scene. - In another example, the
scene 101 may include non-rigid, moving objects—such as people. In these cases, the image processing module 106 can generate a 3D model of the non-rigid object and, using non-rigid registration techniques, can then send ‘sparse’ information to the viewing device 112. This ‘sparse’ information can then be used by the viewing device 112 to reshape the 3D model and then render the 3D model from the scene 101. For example, once the image processing module 106 generates a 3D model of a human face and transfers the 3D model to the viewing device 112, the image processing module 106 only needs to track a small number of feature points of the face and transfer those feature points to the viewing device 112 to enable recreation of a facial expression accurately, e.g., in a video on the viewing device 112. The amount of this ‘sparse’ information is a fraction of the dataset normally needed to send the entire new 3D model to the viewing device 112. - In another example, the
system 100 can capture and transmit a scene 101 as a changing 3D model that is viewable by the viewing device 112 (e.g., VR or AR glasses). In this example, the pose information of the sensor 103 is not required because having the 3D model available at the viewing device 112 allows for rendering of the viewing images from any angle. Therefore, the sensor 103 and image processing module 106 only need to capture and transmit to the viewing device 112 the object pose relative to the scene 101 as well as sparse feature points for the non-rigid objects along with texture change information. Using just this set of information, the viewing device 112 is capable of fully recreating for playback the scene and objects captured by the sensor and image processing module. - In another example, the system can capture changes in texture due to lighting conditions as well as the location of the lighting source—especially if the location and brightness of the lighting source are generally stable. As a result, the system can render the correct texture without having to transmit the actual texture to the
viewing device 112. Instead, the viewing device 112 can simply render the texture based on the 3D model and the location and brightness of the lighting source. In another example, some key texture information (such as high-resolution face texture(s)) can be sent separately and then added to the ‘virtual scene’ to provide more realistic texture information. - The following section provides exemplary use cases describing specific applications of the systems and methods described herein.
- In a first use case, a sensor device at a first location (e.g., a live sporting event) captures images of objects in the scene (e.g., players, referees, game ball, etc.) and the scene itself (e.g., the playing field). The sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device. The server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the players, ball, and other objects in the scene and an initial 3D model of the scene. The server computing device then transmits the initial 3D models and the pose information to the remote viewing device (e.g., VR headset), which then generates a virtual representation of the scene (including the objects) as a video stream. At this point, the VR headset can capture pose information associated with the viewing perspective of the VR headset—which is independent of the pose information of the sensor device received from the server computing device. The headset can then utilize its own pose information to render a video stream from the perspective of the headset and, as the person wearing the headset moves around, the headset can render a different viewpoint of the scene and objects in the scene from that being captured by the sensor device. As such, the viewer at the remote location can traverse the scene and view the action from a perspective completely independent of the sensor that is viewing the scene locally, providing an immersive and unique experience for the viewer.
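The independent-viewpoint idea in this use case reduces to evaluating the same shared 3D model under two different poses. A geometry-only sketch follows, with hypothetical positions for a courtside sensor and a remote headset; a real headset pose would carry a full rotation matrix rather than a single yaw angle.

```python
import math

def bearing_to(point, cam_pos, cam_yaw):
    """Horizontal angle (radians) at which a world point appears relative to a
    camera's forward (+z) axis; positive means to the camera's right."""
    dx = point[0] - cam_pos[0]
    dz = point[2] - cam_pos[2]
    return math.atan2(dx, dz) - cam_yaw

ball = (0.0, 0.0, 10.0)                    # one object in the shared 3D model
sensor_pose = ((0.0, 0.0, 0.0), 0.0)       # courtside sensor, facing the ball
headset_pose = ((10.0, 0.0, 10.0), 0.0)    # remote viewer 'standing' to the side

a = bearing_to(ball, *sensor_pose)
b = bearing_to(ball, *headset_pose)
print(round(a, 3), round(b, 3))  # → 0.0 -1.571: dead ahead vs. 90° to the left
```

Both viewpoints are computed from the same transmitted model; only the headset's locally captured pose differs, which is why no extra data from the sensor is needed to change perspective.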
- In another use case, the sensor device at the first location (e.g., a museum) captures images of the museum scene as well as artwork, sculptures, people, and other objects in the museum. The sensor device also captures pose information of the sensor device—both as the sensor device moves throughout the scene and as the objects move within the scene in relation to the sensor device. The server computing device coupled to the sensor device uses the captured images and the pose information to generate initial 3D models of the objects in the scene and an initial 3D model of the scene. For example, a sensor device can be used to capture a guided tour of the museum, viewing the various exhibits and wings. The server computing device then transmits the initial 3D models and the pose information to the remote viewing device (e.g., tablet, mobile phone), which then generates a virtual representation of the museum (including the artwork, people, etc.) using the 3D models and the pose information. Then, as the pose information changes (due to the sensor device moving through the museum), the server computing device can transmit the changing pose information to the remote viewing device, which automatically renders an updated representation of the scene and objects based upon the changing pose information, as part of a live and/or prerecorded video stream. In this way, the person viewing the stream can feel as though they are walking through the museum and seeing the exhibits and artwork in a first-person perspective.
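The bandwidth benefit behind both use cases (and the ten-minute room example above) can be made concrete with rough arithmetic. Every figure below is an illustrative assumption, not a measurement from the described system:

```python
# Assumed figures for a ten-minute walkthrough streamed two ways.
fps, seconds = 30, 10 * 60
video_bitrate_bps = 4_000_000          # assumption: 4 Mbit/s conventional 2D video
video_bytes = video_bitrate_bps // 8 * seconds

model_bytes = 20_000_000               # assumption: one-time textured 3D model
pose_bytes_per_frame = 16 * 4          # a 4x4 float32 pose matrix per frame
pose_bytes = pose_bytes_per_frame * fps * seconds

slam_total = model_bytes + pose_bytes
print(f"video: {video_bytes / 1e6:.0f} MB, model+poses: {slam_total / 1e6:.1f} MB")
# → video: 300 MB, model+poses: 21.2 MB (roughly 14x less under these assumptions)
```

The pose stream is tiny compared with the one-time model transfer, so the saving grows with the length of the session.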
- It should be appreciated that the methods, systems, and techniques described herein are applicable to a wide variety of useful commercial and/or technical applications. Such applications can include, but are not limited to:
- Augmented Reality/Virtual Reality, Robotics, Education, Part Inspection, E-Commerce, Social Media, Internet of Things—to capture, track, and interact with real-world objects from a scene for representation in a virtual environment, such as remote interaction with objects and/or scenes by a viewing device in another location, including any applications where there may be constraints on file size and transmission speed but a high-definition image is still capable of being rendered on the viewing device;
Live Streaming—for example, in order to live stream a 3D scene such as a sports event, a concert, a live presentation, and the like, the techniques described herein can be used to immediately send out a sparse frame to the viewing device at the remote location. As the 3D model becomes more complete, the techniques provide for adding full texture. This is similar to video applications that display a low-resolution image first while the applications download a high-definition image. Furthermore, the techniques can leverage 3D model compression to further reduce the geometric complexity and provide a seamless streaming experience;
- Recording for Later ‘Replay’—the techniques can advantageously be used to store images and relative pose information (as described above) in order to replay the scene and objects at a later time. For example, the computing device can store the 3D models, image data, pose data, and sparse feature point data associated with the sensor capturing, e.g., a video of the scene and objects in the scene. Then, the viewing device 112 can later receive this information and recreate the entire video using the models, images, pose data, and feature point data.
- The above-described techniques can be implemented in digital and/or analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The implementation can be as a computer program product, i.e., a computer program tangibly embodied in a machine-readable storage device, for execution by, or to control the operation of, a data processing apparatus, e.g., a programmable processor, a computer, and/or multiple computers. A computer program can be written in any form of computer or programming language, including source code, compiled code, interpreted code, and/or machine code, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one or more sites.
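The sparse-first ordering in the Live Streaming use case above can be sketched as follows. This is an illustrative assumption, not a defined protocol: the message shapes and the `send` callback are hypothetical, and stand in for whatever transport the viewing device uses. The sender emits sparse feature points immediately, then progressively finer mesh levels, and finally the full texture, analogous to images that load at low resolution first.

```python
def stream_progressively(sparse_points, mesh_levels, texture, send):
    """Emit a sparse preview frame, then coarse-to-fine model refinements."""
    send({"kind": "sparse", "points": list(sparse_points)})  # instant preview
    for level, mesh in enumerate(mesh_levels):               # coarse -> fine
        send({"kind": "mesh", "level": level, "data": mesh})
    send({"kind": "texture", "data": texture})               # full detail last
```

A viewer can render each message as it arrives, so the remote scene appears immediately and sharpens over time; 3D model compression (reducing geometric complexity per level) fits naturally into the per-level mesh payloads.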
- Method steps can be performed by one or more specialized processors executing a computer program to perform functions by operating on input data and/or generating output data. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array), an FPAA (field-programmable analog array), a CPLD (complex programmable logic device), a PSoC (programmable system-on-chip), an ASIP (application-specific instruction-set processor), or an ASIC (application-specific integrated circuit), or the like. Subroutines can refer to portions of the stored computer program and/or the processor, and/or the special circuitry that implement one or more functions.
- Processors suitable for the execution of a computer program include, by way of example, special purpose microprocessors. Generally, a processor receives instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and/or data. Memory devices, such as a cache, can be used to temporarily store data. Memory devices can also be used for long-term data storage. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. A computer can also be operatively coupled to a communications network in order to receive instructions and/or data from the network and/or to transfer instructions and/or data to the network. Computer-readable storage media suitable for embodying computer program instructions and data include all forms of volatile and non-volatile memory, including by way of example semiconductor memory devices, e.g., DRAM, SRAM, EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and optical disks, e.g., CD, DVD, HD-DVD, and Blu-ray disks. The processor and the memory can be supplemented by and/or incorporated in special purpose logic circuitry.
- To provide for interaction with a user, the above-described techniques can be implemented on a computer in communication with a display device, e.g., a CRT (cathode ray tube), plasma, or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse, a trackball, a touchpad, or a motion sensor, by which the user can provide input to the computer (e.g., interact with a user interface element). Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, and/or tactile input.
- The above-described techniques can be implemented in a distributed computing system that includes a back-end component. The back-end component can, for example, be a data server, a middleware component, and/or an application server. The above-described techniques can be implemented in a distributed computing system that includes a front-end component. The front-end component can, for example, be a client computer having a graphical user interface, a Web browser through which a user can interact with an example implementation, and/or other graphical user interfaces for a transmitting device. The above-described techniques can be implemented in a distributed computing system that includes any combination of such back-end, middleware, or front-end components.
- The components of the computing system can be interconnected by transmission medium, which can include any form or medium of digital or analog data communication (e.g., a communication network). Transmission medium can include one or more packet-based networks and/or one or more circuit-based networks in any configuration. Packet-based networks can include, for example, the Internet, a carrier internet protocol (IP) network (e.g., local area network (LAN), wide area network (WAN), campus area network (CAN), metropolitan area network (MAN), home area network (HAN)), a private IP network, an IP private branch exchange (IPBX), a wireless network (e.g., radio access network (RAN), Bluetooth, Wi-Fi, WiMAX, general packet radio service (GPRS) network, HiperLAN), and/or other packet-based networks. Circuit-based networks can include, for example, the public switched telephone network (PSTN), a legacy private branch exchange (PBX), a wireless network (e.g., RAN, code-division multiple access (CDMA) network, time division multiple access (TDMA) network, global system for mobile communications (GSM) network), and/or other circuit-based networks.
- Information transfer over transmission medium can be based on one or more communication protocols. Communication protocols can include, for example, Ethernet protocol, Internet Protocol (IP), Voice over IP (VOIP), a Peer-to-Peer (P2P) protocol, Hypertext Transfer Protocol (HTTP), Session Initiation Protocol (SIP), H.323, Media Gateway Control Protocol (MGCP), Signaling System #7 (SS7), a Global System for Mobile Communications (GSM) protocol, a Push-to-Talk (PTT) protocol, a PTT over Cellular (POC) protocol, Universal Mobile Telecommunications System (UMTS), 3GPP Long Term Evolution (LTE) and/or other communication protocols.
- Devices of the computing system can include, for example, a computer, a computer with a browser device, a telephone, an IP phone, a mobile device (e.g., cellular phone, personal digital assistant (PDA) device, smart phone, tablet, laptop computer, electronic mail device), and/or other communication devices. The browser device includes, for example, a computer (e.g., desktop computer and/or laptop computer) with a World Wide Web browser (e.g., Chrome™ from Google, Inc., Microsoft® Internet Explorer® available from Microsoft Corporation, and/or Mozilla® Firefox available from Mozilla Corporation). Mobile computing devices include, for example, a Blackberry® from Research in Motion, an iPhone® from Apple Corporation, and/or an Android™-based device. IP phones include, for example, a Cisco® Unified IP Phone 7985G and/or a Cisco® Unified Wireless Phone 7920 available from Cisco Systems, Inc.
- “Comprise,” “include,” and/or plural forms of each are open ended and include the listed parts and can include additional parts that are not listed. “And/or” is open ended and includes one or more of the listed parts and combinations of the listed parts.
- One skilled in the art will realize the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the invention described herein.
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/726,316 US10380762B2 (en) | 2016-10-07 | 2017-10-05 | Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662405372P | 2016-10-07 | 2016-10-07 | |
US15/726,316 US10380762B2 (en) | 2016-10-07 | 2017-10-05 | Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180101966A1 true US20180101966A1 (en) | 2018-04-12 |
US10380762B2 US10380762B2 (en) | 2019-08-13 |
Family
ID=61829752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/726,316 Active US10380762B2 (en) | 2016-10-07 | 2017-10-05 | Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data |
Country Status (1)
Country | Link |
---|---|
US (1) | US10380762B2 (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180122142A1 (en) * | 2016-10-31 | 2018-05-03 | Verizon Patent And Licensing Inc. | Methods and Systems for Dynamically Customizing a Scene for Presentation to a User |
US20180151045A1 (en) * | 2016-11-28 | 2018-05-31 | Korea Institute Of Civil Engineering And Building Technology | Facility management system using internet of things (iot) based sensor and unmanned aerial vehicle (uav), and method for the same |
CN108650523A (en) * | 2018-05-22 | 2018-10-12 | 广州虎牙信息科技有限公司 | The display of direct broadcasting room and virtual objects choosing method, server, terminal and medium |
US10269116B2 (en) * | 2016-12-26 | 2019-04-23 | Intel Corporation | Proprioception training method and apparatus |
US20190139300A1 (en) * | 2017-11-08 | 2019-05-09 | Siemens Healthcare Gmbh | Medical scene model |
CN110503631A (en) * | 2019-07-24 | 2019-11-26 | 山东师范大学 | A kind of method for detecting change of remote sensing image |
CN111340922A (en) * | 2018-12-18 | 2020-06-26 | 北京三星通信技术研究有限公司 | Positioning and mapping method and electronic equipment |
CN111641841A (en) * | 2020-05-29 | 2020-09-08 | 广州华多网络科技有限公司 | Virtual trampoline activity data exchange method, device, medium and electronic equipment |
CN111862163A (en) * | 2020-08-03 | 2020-10-30 | 湖北亿咖通科技有限公司 | Trajectory optimization method and device |
CN112017242A (en) * | 2020-08-21 | 2020-12-01 | 北京市商汤科技开发有限公司 | Display method and device, equipment and storage medium |
CN112423014A (en) * | 2020-11-19 | 2021-02-26 | 上海电气集团股份有限公司 | Remote review method and device |
US11127212B1 (en) * | 2017-08-24 | 2021-09-21 | Sean Asher Wilens | Method of projecting virtual reality imagery for augmenting real world objects and surfaces |
US11159798B2 (en) * | 2018-08-21 | 2021-10-26 | International Business Machines Corporation | Video compression using cognitive semantics object analysis |
US20220076402A1 (en) * | 2019-04-05 | 2022-03-10 | Waymo Llc | High bandwidth camera data transmission |
US20220182596A1 (en) * | 2020-12-03 | 2022-06-09 | Samsung Electronics Co., Ltd. | Method of providing adaptive augmented reality streaming and apparatus performing the method |
US11741673B2 (en) | 2018-11-30 | 2023-08-29 | Interdigital Madison Patent Holdings, Sas | Method for mirroring 3D objects to light field displays |
US11868675B2 (en) | 2015-10-08 | 2024-01-09 | Interdigital Vc Holdings, Inc. | Methods and systems of automatic calibration for dynamic display configurations |
WO2024100028A1 (en) * | 2022-11-08 | 2024-05-16 | Nokia Technologies Oy | Signalling for real-time 3d model generation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040010493A1 (en) * | 1997-11-19 | 2004-01-15 | Ns Solutions Corporation | Database system and a method of data retrieval from the system |
US20140017653A1 (en) * | 2012-07-10 | 2014-01-16 | Gordon W. Romney | Apparatus, system, and method for a virtual instruction cloud |
US20180001885A1 (en) * | 2016-06-29 | 2018-01-04 | Ford Global Technologies, Llc | Method and system for torque control |
US20180014454A1 (en) * | 2016-07-12 | 2018-01-18 | Yetter Manufacturing Company | Seed furrow closing wheel |
Family Cites Families (69)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6525722B1 (en) | 1995-08-04 | 2003-02-25 | Sun Microsystems, Inc. | Geometry compression for regular and irregular mesh structures |
US6275235B1 (en) | 1998-12-21 | 2001-08-14 | Silicon Graphics, Inc. | High precision texture wrapping method and device |
US6259815B1 (en) | 1999-03-04 | 2001-07-10 | Mitsubishi Electric Research Laboratories, Inc. | System and method for recognizing scanned objects with deformable volumetric templates |
US6525725B1 (en) | 2000-03-15 | 2003-02-25 | Sun Microsystems, Inc. | Morphing decompression in a graphics system |
US20040104935A1 (en) * | 2001-01-26 | 2004-06-03 | Todd Williamson | Virtual reality immersion system |
US7248257B2 (en) | 2001-02-14 | 2007-07-24 | Technion Research & Development Foundation Ltd. | Low bandwidth transmission of 3D graphical data |
GB0126526D0 (en) | 2001-11-05 | 2002-01-02 | Canon Europa Nv | Three-dimensional computer modelling |
WO2004003850A1 (en) | 2002-06-28 | 2004-01-08 | Fujitsu Limited | Three-dimensional image comparing program, three-dimensionalimage comparing method, and three-dimensional image comparing device |
US7317456B1 (en) | 2002-12-02 | 2008-01-08 | Ngrain (Canada) Corporation | Method and apparatus for transforming point cloud data to volumetric data |
JP2005353047A (en) | 2004-05-13 | 2005-12-22 | Sanyo Electric Co Ltd | Three-dimensional image processing method and three-dimensional image processor |
US7657081B2 (en) | 2004-09-03 | 2010-02-02 | National Research Council Of Canada | Recursive 3D model optimization |
WO2006027339A2 (en) | 2004-09-06 | 2006-03-16 | The European Community, Represented By The European Commission | Method and system for 3d scene change detection |
US7602398B2 (en) | 2005-01-28 | 2009-10-13 | Microsoft Corporation | Decorating surfaces with textures |
JP4871352B2 (en) | 2005-03-11 | 2012-02-08 | クリアフォーム インク. | Automatic reference system and apparatus for 3D scanning |
US8625854B2 (en) | 2005-09-09 | 2014-01-07 | Industrial Research Limited | 3D scene scanner and a position and orientation system |
WO2007038330A2 (en) | 2005-09-22 | 2007-04-05 | 3M Innovative Properties Company | Artifact mitigation in three-dimensional imaging |
US8194074B2 (en) | 2006-05-04 | 2012-06-05 | Brown Battle M | Systems and methods for photogrammetric rendering |
US8139067B2 (en) | 2006-07-25 | 2012-03-20 | The Board Of Trustees Of The Leland Stanford Junior University | Shape completion, animation and marker-less motion capture of people, animals or characters |
US8090194B2 (en) | 2006-11-21 | 2012-01-03 | Mantis Vision Ltd. | 3D geometric modeling and motion capture using both single and dual imaging |
US8290305B2 (en) | 2009-02-13 | 2012-10-16 | Harris Corporation | Registration of 3D point cloud data to 2D electro-optical image data |
WO2010129363A2 (en) | 2009-04-28 | 2010-11-11 | The Regents Of The University Of California | Markerless geometric registration of multiple projectors on extruded surfaces using an uncalibrated camera |
US8542252B2 (en) | 2009-05-29 | 2013-09-24 | Microsoft Corporation | Target digitization, extraction, and tracking |
KR101619076B1 (en) | 2009-08-25 | 2016-05-10 | 삼성전자 주식회사 | Method of detecting and tracking moving object for mobile platform |
KR101697184B1 (en) | 2010-04-20 | 2017-01-17 | 삼성전자주식회사 | Apparatus and Method for generating mesh, and apparatus and method for processing image |
KR101054736B1 (en) | 2010-05-04 | 2011-08-05 | 성균관대학교산학협력단 | Method for 3d object recognition and pose estimation |
US8437506B2 (en) | 2010-09-07 | 2013-05-07 | Microsoft Corporation | System for fast, probabilistic skeletal tracking |
US8676623B2 (en) | 2010-11-18 | 2014-03-18 | Navteq B.V. | Building directory aided navigation |
US8587583B2 (en) | 2011-01-31 | 2013-11-19 | Microsoft Corporation | Three-dimensional environment reconstruction |
US20170054954A1 (en) | 2011-04-04 | 2017-02-23 | EXTEND3D GmbH | System and method for visually displaying information on real objects |
DE102011015987A1 (en) | 2011-04-04 | 2012-10-04 | EXTEND3D GmbH | System and method for visual presentation of information on real objects |
US9053571B2 (en) | 2011-06-06 | 2015-06-09 | Microsoft Corporation | Generating computer models of 3D objects |
US9520072B2 (en) | 2011-09-21 | 2016-12-13 | University Of South Florida | Systems and methods for projecting images onto an object |
CA3041707C (en) | 2011-11-15 | 2021-04-06 | Manickam UMASUTHAN | Method of real-time tracking of moving/flexible surfaces |
US8908913B2 (en) | 2011-12-19 | 2014-12-09 | Mitsubishi Electric Research Laboratories, Inc. | Voting-based pose estimation for 3D sensors |
US8766979B2 (en) | 2012-01-20 | 2014-07-01 | Vangogh Imaging, Inc. | Three dimensional data compression |
US8682049B2 (en) | 2012-02-14 | 2014-03-25 | Terarecon, Inc. | Cloud-based medical image processing system with access control |
US9041711B1 (en) | 2012-05-08 | 2015-05-26 | Google Inc. | Generating reduced resolution textured model from higher resolution model |
US10127722B2 (en) * | 2015-06-30 | 2018-11-13 | Matterport, Inc. | Mobile capture visualization incorporating three-dimensional and two-dimensional imagery |
WO2014052824A1 (en) | 2012-09-27 | 2014-04-03 | Vangogh Imaging Inc. | 3d vision processing |
US9898848B2 (en) | 2012-10-05 | 2018-02-20 | Max-Planck-Gesellschaft Zur Foerderung Der Wissenschaften E. V. | Co-registration—simultaneous alignment and modeling of articulated 3D shapes |
US9058693B2 (en) * | 2012-12-21 | 2015-06-16 | Dassault Systemes Americas Corp. | Location correction of virtual objects |
US9251590B2 (en) | 2013-01-24 | 2016-02-02 | Microsoft Technology Licensing, Llc | Camera pose estimation for 3D reconstruction |
US9940553B2 (en) | 2013-02-22 | 2018-04-10 | Microsoft Technology Licensing, Llc | Camera/object pose from predicted coordinates |
US9269003B2 (en) | 2013-04-30 | 2016-02-23 | Qualcomm Incorporated | Diminished and mediated reality effects from reconstruction |
US9171402B1 (en) | 2013-06-19 | 2015-10-27 | Google Inc. | View-dependent textures for interactive geographic information system |
US9715761B2 (en) | 2013-07-08 | 2017-07-25 | Vangogh Imaging, Inc. | Real-time 3D computer vision processing engine for object recognition, reconstruction, and analysis |
US20170278293A1 (en) | 2013-07-18 | 2017-09-28 | Google Inc. | Processing a Texture Atlas Using Manifold Neighbors |
EP2874118B1 (en) | 2013-11-18 | 2017-08-02 | Dassault Systèmes | Computing camera parameters |
US9613388B2 (en) | 2014-01-24 | 2017-04-04 | Here Global B.V. | Methods, apparatuses and computer program products for three dimensional segmentation and textured modeling of photogrammetry surface meshes |
KR102211592B1 (en) | 2014-03-19 | 2021-02-04 | 삼성전자주식회사 | Electronic device for processing image and method thereof |
US9299195B2 (en) | 2014-03-25 | 2016-03-29 | Cisco Technology, Inc. | Scanning and tracking dynamic objects with depth cameras |
US20150325044A1 (en) | 2014-05-09 | 2015-11-12 | Adornably, Inc. | Systems and methods for three-dimensional model texturing |
US10055876B2 (en) | 2014-06-06 | 2018-08-21 | Matterport, Inc. | Optimal texture memory allocation |
US20150371440A1 (en) | 2014-06-19 | 2015-12-24 | Qualcomm Incorporated | Zero-baseline 3d map initialization |
EP3192057A4 (en) | 2014-09-10 | 2018-03-21 | Vangogh Imaging Inc. | Real-time dynamic three-dimensional adaptive object recognition and model reconstruction |
US9607388B2 (en) | 2014-09-19 | 2017-03-28 | Qualcomm Incorporated | System and method of pose estimation |
US9710960B2 (en) | 2014-12-04 | 2017-07-18 | Vangogh Imaging, Inc. | Closed-form 3D model generation of non-rigid complex objects from incomplete and noisy scans |
EP3032495B1 (en) | 2014-12-10 | 2019-11-13 | Dassault Systèmes | Texturing a 3d modeled object |
US9769443B2 (en) | 2014-12-11 | 2017-09-19 | Texas Instruments Incorporated | Camera-assisted two dimensional keystone correction |
US10347031B2 (en) | 2015-03-09 | 2019-07-09 | Carestream Dental Technology Topco Limited | Apparatus and method of texture mapping for dental 3D scanner |
US20160358382A1 (en) | 2015-06-04 | 2016-12-08 | Vangogh Imaging, Inc. | Augmented Reality Using 3D Depth Sensor and 3D Projection |
US10169917B2 (en) | 2015-08-20 | 2019-01-01 | Microsoft Technology Licensing, Llc | Augmented reality |
US10249087B2 (en) | 2016-01-29 | 2019-04-02 | Magic Leap, Inc. | Orthogonal-projection-based texture atlas packing of three-dimensional meshes |
US10169676B2 (en) | 2016-02-24 | 2019-01-01 | Vangogh Imaging, Inc. | Shape-based registration for non-rigid objects with large holes |
US9922443B2 (en) | 2016-04-29 | 2018-03-20 | Adobe Systems Incorporated | Texturing a three-dimensional scanned model with localized patch colors |
US10192347B2 (en) | 2016-05-17 | 2019-01-29 | Vangogh Imaging, Inc. | 3D photogrammetry |
US20180005015A1 (en) | 2016-07-01 | 2018-01-04 | Vangogh Imaging, Inc. | Sparse simultaneous localization and matching with unified tracking |
US10573018B2 (en) * | 2016-07-13 | 2020-02-25 | Intel Corporation | Three dimensional scene reconstruction based on contextual analysis |
US20180114363A1 (en) | 2016-10-25 | 2018-04-26 | Microsoft Technology Licensing, Llc | Augmented scanning of 3d models |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11868675B2 (en) | 2015-10-08 | 2024-01-09 | Interdigital Vc Holdings, Inc. | Methods and systems of automatic calibration for dynamic display configurations |
US20180122142A1 (en) * | 2016-10-31 | 2018-05-03 | Verizon Patent And Licensing Inc. | Methods and Systems for Dynamically Customizing a Scene for Presentation to a User |
US10388072B2 (en) * | 2016-10-31 | 2019-08-20 | Verizon Patent And Licensing Inc. | Methods and systems for dynamically customizing a scene for presentation to a user |
US10839610B2 (en) | 2016-10-31 | 2020-11-17 | Verizon Patent And Licensing Inc. | Methods and systems for customizing a scene for presentation to a user |
US10643444B2 (en) * | 2016-11-28 | 2020-05-05 | Korea Institute Of Civil Engineering And Building Technology | Facility management system using Internet of things (IoT) based sensor and unmanned aerial vehicle (UAV), and method for the same |
US20180151045A1 (en) * | 2016-11-28 | 2018-05-31 | Korea Institute Of Civil Engineering And Building Technology | Facility management system using internet of things (iot) based sensor and unmanned aerial vehicle (uav), and method for the same |
US10269116B2 (en) * | 2016-12-26 | 2019-04-23 | Intel Corporation | Proprioception training method and apparatus |
US11127212B1 (en) * | 2017-08-24 | 2021-09-21 | Sean Asher Wilens | Method of projecting virtual reality imagery for augmenting real world objects and surfaces |
US11107270B2 (en) * | 2017-11-08 | 2021-08-31 | Siemens Healthcare Gmbh | Medical scene model |
US20190139300A1 (en) * | 2017-11-08 | 2019-05-09 | Siemens Healthcare Gmbh | Medical scene model |
CN108650523A (en) * | 2018-05-22 | 2018-10-12 | 广州虎牙信息科技有限公司 | The display of direct broadcasting room and virtual objects choosing method, server, terminal and medium |
US11159798B2 (en) * | 2018-08-21 | 2021-10-26 | International Business Machines Corporation | Video compression using cognitive semantics object analysis |
US11741673B2 (en) | 2018-11-30 | 2023-08-29 | Interdigital Madison Patent Holdings, Sas | Method for mirroring 3D objects to light field displays |
CN111340922A (en) * | 2018-12-18 | 2020-06-26 | 北京三星通信技术研究有限公司 | Positioning and mapping method and electronic equipment |
US20220076402A1 (en) * | 2019-04-05 | 2022-03-10 | Waymo Llc | High bandwidth camera data transmission |
CN110503631A (en) * | 2019-07-24 | 2019-11-26 | 山东师范大学 | A kind of method for detecting change of remote sensing image |
CN111641841A (en) * | 2020-05-29 | 2020-09-08 | 广州华多网络科技有限公司 | Virtual trampoline activity data exchange method, device, medium and electronic equipment |
CN111862163A (en) * | 2020-08-03 | 2020-10-30 | 湖北亿咖通科技有限公司 | Trajectory optimization method and device |
CN112017242A (en) * | 2020-08-21 | 2020-12-01 | 北京市商汤科技开发有限公司 | Display method and device, equipment and storage medium |
CN112423014A (en) * | 2020-11-19 | 2021-02-26 | 上海电气集团股份有限公司 | Remote review method and device |
US20220182596A1 (en) * | 2020-12-03 | 2022-06-09 | Samsung Electronics Co., Ltd. | Method of providing adaptive augmented reality streaming and apparatus performing the method |
US11758107B2 (en) * | 2020-12-03 | 2023-09-12 | Samsung Electronics Co., Ltd. | Method of providing adaptive augmented reality streaming and apparatus performing the method |
WO2024100028A1 (en) * | 2022-11-08 | 2024-05-16 | Nokia Technologies Oy | Signalling for real-time 3d model generation |
Also Published As
Publication number | Publication date |
---|---|
US10380762B2 (en) | 2019-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10380762B2 (en) | Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3D model and update a scene based on sparse data | |
US10839585B2 (en) | 4D hologram: real-time remote avatar creation and animation control | |
US10586395B2 (en) | Remote object detection and local tracking using visual odometry | |
US20160358382A1 (en) | Augmented Reality Using 3D Depth Sensor and 3D Projection | |
US11861797B2 (en) | Method and apparatus for transmitting 3D XR media data | |
CN109471842B (en) | Image file format, image file generating method, image file generating device and application | |
CN110663257B (en) | Method and system for providing virtual reality content using 2D captured images of a scene | |
US11170552B2 (en) | Remote visualization of three-dimensional (3D) animation with synchronized voice in real-time | |
US20190073825A1 (en) | Enhancing depth sensor-based 3d geometry reconstruction with photogrammetry | |
WO2013165440A1 (en) | 3d reconstruction of human subject using a mobile device | |
US20220385721A1 (en) | 3d mesh generation on a server | |
US11620779B2 (en) | Remote visualization of real-time three-dimensional (3D) facial animation with synchronized voice | |
KR102141319B1 (en) | Super-resolution method for multi-view 360-degree image and image processing apparatus | |
Han | Mobile immersive computing: Research challenges and the road ahead | |
US11908068B2 (en) | Augmented reality methods and systems | |
US11335063B2 (en) | Multiple maps for 3D object scanning and reconstruction | |
US20190304161A1 (en) | Dynamic real-time texture alignment for 3d models | |
US20130127994A1 (en) | Video compression using virtual skeleton | |
CN112308977A (en) | Video processing method, video processing apparatus, and storage medium | |
CN110433491A (en) | Movement sync response method, system, device and the storage medium of virtual spectators | |
US20240062467A1 (en) | Distributed generation of virtual content | |
Bortolon et al. | Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles | |
EP3899870A1 (en) | Cloud-based camera calibration | |
Eisert et al. | Volumetric video–acquisition, interaction, streaming and rendering | |
US20230022344A1 (en) | System and method for dynamic images virtualisation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| FEPP | Fee payment procedure | ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| AS | Assignment | Owner name: VANGOGH IMAGING, INC., VIRGINIA. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LEE, KEN; JAHIR, YASMIN; HOU, XIN. REEL/FRAME: 044433/0421. Effective date: 20171208 |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STCF | Information on status: patent grant | PATENTED CASE |
| MAFP | Maintenance fee payment | PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY. Year of fee payment: 4 |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |