WO2023075810A1 - System and method for extracting, transplanting live images for streaming blended, hyper-realistic reality - Google Patents

System and method for extracting, transplanting live images for streaming blended, hyper-realistic reality

Info

Publication number
WO2023075810A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
environment
live video
platform
server
Prior art date
Application number
PCT/US2021/062965
Other languages
French (fr)
Inventor
William J. Benman
Original Assignee
Benman William J
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Benman William J filed Critical Benman William J
Publication of WO2023075810A1 publication Critical patent/WO2023075810A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/157 Conference systems defining a virtual conference space and using avatars or agents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • the present technology relates to communications technology.
  • the present invention relates to systems and methods for streaming multimedia content.
  • the present technology pertains to a system and method for extracting and transplanting live video avatar images for blended and hyper-realistic reality and for streaming multimedia content.
  • Real world functionality is mapped onto numerous objects in the environment with a motion-based input system allowing the user to use the objects in the environment (e.g., computer, desk, file cabinets, documents, etc.) in same manner as the objects would be used in the real world.
  • Benman’s system allows the user to travel into the work areas of coworkers and see and interact with live images of the coworkers in the environment.
  • if the coworker is in a remote office using a computer equipped with software effective to create a virtual environment as described by Benman, and the user has a wall, window, bookshelf or other scene in the background, that information would have to be removed in order to place the person’s image into the virtual environment in such a way as to create an image of the person sitting in the computer-generated office environment.
  • monochromatic screens have been used in television and film productions to extract a foreground image and overlay it over a background image. For example, this process is used daily in television to allow a person standing in front of a blue screen to have their image extracted and combined with a video image of a map to provide a weather report.
  • massively multi-user online virtual worlds (MMORPGs) and metaverses are experiencing rapid growth in users for a variety of applications.
  • users experience a virtual environment through a headset which enables the user to have an immersive experience of the virtual environment.
  • the use of a headset requires the user to be represented in the virtual environment as a computer-generated avatar.
  • These avatars are either cartoonish or lack sufficient realism.
  • conventional avatars are impersonal and not sufficiently realistic for business and other applications other than gaming.
  • the system for extracting and transplanting live video avatar images of the present invention includes a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment.
  • the code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
  • the system includes a camera coupled to the processor to provide live video images of the user in the first environment and code for spatially filtering the images to provide a spatially filtered extracted second live video avatar.
  • This embodiment further includes code for combining the first live video avatar with the second live video avatar to provide an enhanced extracted depth map based third live video avatar. Images from multiple cameras and/or depth sensors are combined simultaneously to provide the third live video avatar using the spatially enhanced extracted depth map.
  • the inventive system includes code for extracting a live video avatar from film or video.
  • Another embodiment includes an arrangement with multiple displays for sensing a position of a user with automatic camera, display, microphone and/or speaker activation and switching based on user position and viewing angle.
  • a routing server is included for receiving streams from multiple users and sending to each user the live video avatar images from other users based on their locations in a shared space or for use in a local user’s AR environment.
  • the inventive system for streaming multimedia content of the present invention includes a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment; and a routing server operationally coupled to the first and second client machines effective to route a multimedia stream from the first user so that it is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
  • the streaming multimedia content provides a live video (Silhouette) avatar with associated audio.
  • the routing server can be operationally coupled to the metaverse server however the routing server can also be operationally coupled directly to the first and second client platforms.
  • the routing server is operationally coupled to a second metaverse server operating on a fifth platform, and the routing server routes a real-time multimedia stream from the first or the second user into the second metaverse, allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
  • the routing server provides interworld ‘passport’ operability between metaverses operating in accordance with the first and second operational paradigms respectively.
  • a further embodiment of this presentation comprises a system for extracting and transplanting live video image streams, the system comprising: an image sensor for providing a live video image stream of a first user or object disposed in a heterogeneous first environment with an arbitrary background; a processor operationally coupled to the image sensor to receive the live video image stream; code stored in a non-transitory tangible medium for execution by the processor for extracting a live video image stream of the first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted live video image stream of the first user; a mobile display system coupled to the processor for showing the extracted live video image stream to a second user in a second environment separate and distinct from the first environment, said second environment being an augmented reality environment including at least part of said second user’s second environment; and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
  • the image sensor includes a depth sensor.
  • the code further includes code for combining the first live video image stream with a second live video image stream from the depth sensor to provide an enhanced extracted depth map based third live video image stream.
  • the code further includes code for combining images from multiple cameras or depth sensors simultaneously to provide the third live video image stream.
  • the system includes multiple displays.
  • the system includes an arrangement for sensing a position of a user with a camera or microphone and for automatically and selectively activating a display based on user position and viewing angle in response thereto.
  • the system includes multiple cameras.
  • the code further includes code for effecting automatic camera activation based on a user’s position in a user’s environment.
  • the system further includes an arrangement for sending the extracted live video image stream from the first user to the second user via a routing server.
  • the system further includes a second arrangement for receiving said extracted live video image stream from said routing server and displaying the live video stream to the second user.
  • the display includes augmented reality goggles, augmented reality glasses or a free space display.
  • the processor is mounted on a first platform and the display is mounted on a second physically separate platform.
  • the second platform includes a second processor for executing code fixed in a non-transitory tangible medium for effecting a user selectable multimode display operation, said multimode operation including a video conferencing mode, a virtual conferencing mode and a mixed reality conferencing mode.
  • the code further includes code for displaying extracted image data in each of said modes.
  • the system includes a system for transplanting the extracted live video image stream into a computer rendered hyper-realistic augmented reality representation of a user’s environment.
  • the code further includes code for enabling a user to experience said hyper-realistic augmented reality representation of the user’s environment as a blended reality environment.
  • the code further includes code for enabling a second user to be present in said hyper-realistic environment by which the first user’s environment is rendered in virtual reality.
  • Another embodiment of this presentation comprises a system for extracting and transplanting live video image streams, the system comprising: a computing and communications platform; a sensor coupled to the platform for creating a first live video image stream of a user or object disposed in a heterogeneous first environment with an arbitrary background; a first processor coupled to the sensor; code stored in a non-transitory tangible medium for execution by the first processor for extracting a live video image stream of a first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted depth map based live video avatar; a routing server; a second platform having a second processor coupled to the routing server; and code stored in a non-transitory tangible medium on the second platform for receiving the live video image stream from the routing server and for causing a display system coupled to the second processor to show the extracted live video image stream in a second environment independent from the first environment, said second environment being an augmented reality environment and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment.
  • Another embodiment of this presentation comprises a system for streaming multimedia content into a metaverse, the system comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the server being implemented with software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client machines whereby multimedia from the first user is displayed to the second user in the artificial environment.
  • the routing server is operationally coupled to the metaverse server.
  • the routing server is operationally coupled directly to the first and second client platforms.
  • the routing server is operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings.
  • the routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
  • the routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
  • Another embodiment of this presentation comprises a system for streaming multimedia content from a first platform to a second platform comprising: a first platform for sending and receiving multimedia content; a second platform operationally coupled to the first platform for sending and receiving multimedia content; and software stored on a medium on the first and second platforms adapted for execution by first and second processors on the first and second platforms respectively for streaming multimedia content from the first platform into an artificial reality environment for display on the second platform.
  • the software further includes code for execution by the first and second processors for streaming multimedia content from the second platform into the artificial reality environment for display on the first platform.
  • the multimedia content is a real time video data stream with synchronized audio.
  • the multimedia content is live video imagery of a user along with audio and position data.
  • the first and second platforms are client platforms.
  • the artificial reality environment is on a server.
  • the system further includes a server for routing the streaming multimedia content between the first and the second client platforms.
  • the first client platform is coupled to the server via a first metaverse and the second client platform is coupled to the server via a second metaverse.
  • the first and second metaverses are stored on first and second metaverse servers respectively.
  • the first and second metaverses are implemented by first and second processors mounted on first and second independent systems respectively executing software stored on the first and second independent systems whereby the first and second metaverses operate in accordance with first and second, diverse run time architectures, frameworks, engines or protocols respectively.
  • the artificial reality environment is an augmented reality environment.
  • the location of the streaming multimedia content in a virtual or augmented reality world is determined by a transmitting platform user as it is transmitted by the first or second platform.
  • the location of the streaming multimedia content in the virtual or augmented reality world is determined by a receiving platform user as it is received by the second or the first platform.
  • Another embodiment of this presentation comprises a method for creating an interworld avatar and using the avatar to navigate between virtual worlds on disparate platforms including the steps of: providing at least one client machine; providing at least one world server; providing at least one routing server; interconnecting each of the servers and connecting at least one of the servers to the client machine; and executing software stored on a tangible medium with a processor on the client machine or one of the servers to provide a live video avatar for use in a world provided by the world server via the routing server.
  • embodiments of this presentation comprise a system for extracting and transplanting live video avatar images providing a silhouette live video avatar and a system for providing multimedia service to external metaverses and client platforms.
  • the system for extracting and transplanting live video avatar images includes a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment.
  • the code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
  • the system for providing silhouette live video avatar and multimedia service includes a server with metaverse software executed by a processor to provide an artificial reality environment in accordance with a first operational paradigm.
  • a first client platform is operationally coupled to the server for enabling a first user to experience the artificial environment.
  • a second client platform is operationally coupled to the server for enabling a second user to experience the artificial environment.
  • a routing server is operationally coupled to the first and second client machines to route multimedia from the first user so that it is displayed to the second user in the artificial environment at a location provided by the routing server.
  • the routing server is operationally coupled to a second metaverse server to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
  • Figure 1 is a block diagram of an illustrative implementation of a mobile wireless platform configured to send and receive Silhouette streams in accordance with the present teachings.
  • FIG. 2 is a flow diagram showing the Silhouette applet of the present invention in more detail.
  • Figure 3 is a block diagram of an illustrative implementation of a system for capturing and displaying Silhouette imagery via the mobile wireless platform of Figure 1 in connection with the teachings of the present invention.
  • Figure 4 is a block diagram of an illustrative embodiment of a display subsystem adapted for use in connection with the present invention.
  • Figure 5 is a flow diagram of an illustrative embodiment of the technique for capturing and displaying Silhouette images on mobile wireless platforms of the present invention.
  • Figure 6 is a set of diagrams that illustrate the unique multi-mode conferencing capability of the present invention.
  • Figure 7 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention.
  • Figure 8 is a block diagram showing the interworld portal interface of Figure 7 in more detail.
  • Figure 9 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention with Silhouette extraction and transplantation functionality distributed throughout the ecosystem in various implementations.
  • Figure 10 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in a simple interoperable mode of operation by which a user moves from a first metaverse to a second metaverse by simply logging into the second metaverse and selecting silhouette avatar functionality in accordance with the present teachings.
  • Figure 11 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in an alternative interoperable mode of operation by which a user’s (User #1’s) multimedia (extracted video and audio stream) is sent to multiple metaverses (#2-#5) via User #1’s home metaverse (Metaverse #1) through the portal interface and routing server in accordance with the present teachings.
  • U.S. Patent No. 5,966,130, issued October 12, 1999 to Benman, and U.S. Patent No. 6,798,407 to Benman, the teachings of both of which are incorporated herein by reference, disclose and claim systems for enabling users to see and interact with each other as live images in computer generated (aka virtual, augmented reality, artificial reality and/or metaverse) environments in real time.
  • This technology is known as Silhouette® and is currently offered as a service via a highly realistic computer-generated environment called the Nexos® by Integrated Virtual Networks, Inc. of Los Angeles, California.
  • a live avatar or ‘silhouette’ is an avatar with a real time live video image texture.
  • this Silhouette live streaming video image technology is extended as a communications service to third party metaverses outside of the Silhouette/Nexos system architecture to offer highly realistic, real-time presence in such metaverses.
  • depth sensing technology is used to improve edge detection performance in Silhouette systems.
  • FIG. 1 is a block diagram of an illustrative implementation of a mobile wireless platform configured to send and receive Silhouette streams in accordance with the present teachings.
  • live video (Silhouette) avatars are created on mobile platforms (e.g. smartphones) 100 with forward and/or rearward facing cameras 102.
  • Depth data is provided by the video camera 102, an infrared camera 104 or a range finder (such as a laser range finder) 106.
  • the user’s live video image is extracted from the user’s background by a Silhouette applet 108.
  • the applet is software (referred to herein as ‘code’) stored in a tangible medium (memory 142) and executed by a processor 110.
  • the applet extracts the Silhouette using any technique for creating a depth map, such as binocular triangulation or dual-camera disparity.
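For reference, the dual-camera disparity route to a depth map follows the standard stereo relation depth = focal length × baseline / disparity. The sketch below is illustrative only; the focal length and baseline values are assumptions, not parameters from this disclosure.

```python
import numpy as np

def depth_from_disparity(disparity_px: np.ndarray,
                         focal_length_px: float = 800.0,   # assumed focal length in pixels
                         baseline_m: float = 0.06) -> np.ndarray:
    """Convert a dual-camera disparity map to a depth map.

    Standard stereo relation: depth = (focal length * baseline) / disparity.
    Pixels with zero disparity (no stereo match) are mapped to infinity.
    """
    disparity = disparity_px.astype(np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = (focal_length_px * baseline_m) / disparity[valid]
    return depth

# Example: a 2x2 disparity map in pixels
print(depth_from_disparity(np.array([[20.0, 40.0], [0.0, 10.0]])))
```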
  • an additional feature of the present invention is the provision of a scheme by which the heavy data processing, storage and transmission, typically associated with the creation and use of depth data, is minimized.
  • the optional additional solution provided by the present teachings is to combine the depth sensing operation with the Benman spatial filtering operation to provide improved edge detection performance without the heavy data processing and/or storage typically associated with the creation and handling of depth data.
  • edge data from depth sensing is used to provide a boundary for a mask in a key frame, at a much lower frame rate, to set an edge around a user (aka a ‘silhouette’) within which the spatial filtering operation is thereafter performed on all frames at a high (e.g., real time) frame rate.
  • the spatial filtering is then performed with an exclusive NOR operation during a pixel-by-pixel comparison of a current video image frame to a reference frame obtained with the subject out of sight of the camera or by other suitable means such as by simply detecting movement of the user.
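This reference-frame comparison can be sketched roughly as follows, assuming 8-bit RGB frames held as NumPy arrays; the per-channel tolerance stands in for the bitwise exclusive-NOR test, and the optional key-frame mask argument represents the depth-derived boundary described above. Function and parameter names are illustrative.

```python
from typing import Optional

import numpy as np

def spatial_filter(current: np.ndarray,
                   reference: np.ndarray,
                   key_frame_mask: Optional[np.ndarray] = None,
                   tolerance: int = 12) -> np.ndarray:
    """Return a boolean foreground mask for the extracted 'silhouette'.

    Pixels that match the subject-free reference frame (the exclusive-NOR
    sense of 'equal') are background; pixels that differ are foreground.
    If a low-rate key-frame mask from the depth sensor is supplied, the
    comparison is confined to that boundary.
    """
    diff = np.abs(current.astype(np.int16) - reference.astype(np.int16))
    foreground = np.any(diff > tolerance, axis=-1)    # differs in any colour channel
    if key_frame_mask is not None:
        foreground &= key_frame_mask                  # restrict to the depth-derived edge
    return foreground

# Toy usage: a 4x4 frame where the 'user' occupies the centre
reference = np.zeros((4, 4, 3), np.uint8)
current = reference.copy()
current[1:3, 1:3] = 200
print(spatial_filter(current, reference).astype(int))
```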
  • FIG 2 is a flow diagram showing the Silhouette applet 108 of Figure 1 in more detail.
  • the applet 108 includes a spatial data stream buffer or module 109 coupled to the video camera 102 of Figure 1 and a depth data stream buffer or module 111 coupled to the depth sensors 102, 104 and 106 of Figure 1.
  • the spatial filter data stream is fed to a Silhouette live video extraction module 115 through a stream selection and combination module 113.
  • the Silhouette extraction module captures and extracts a user’s live video imagery from the user’s physical heterogeneous environment without requiring the use of a homogenous monochromatic background such as a blue screen or a green screen as is common in film and television production.
  • a key feature of the present invention resides in the fact that the live video Silhouette avatar, extracted by the extraction module 115, is sent to an edge analyzer module 117.
  • the edge analyzer module 117 either automatically assesses extraction edge quality or responds to a manual inspection of edge quality perceived by the user via the display and signaled by the user via the user interface 140.
  • edge analyzer module 117 assesses edge quality using any of many known image analysis techniques. (See “Edge Detection” in Wikipedia at
  • the edge analyzer 117 examines the edges of the spatially filtered data stream by checking on a pixel-by-pixel basis for optical noise around the periphery of the extracted live video avatar for all or a subset of the live video avatar image frames.
  • edges of the live video avatar extracted by the extraction module from the depth data stream are also automatically assessed by the edge analyzer module 117. Once again, optical noise is assessed along the edges of the user’s live video avatar image stream. The edge analyzer 117 then selects the live video avatar with the best edge performance for handling by the processor 110.
  • the edge analyzer 117 can also select a stream that is a combination of spatial image data and depth sensed image data. That is, the user’s image may be based on the spatial image processed data stream as to the bulk of the user’s face and body, and the edge data pixels may be provided by the depth image data processed stream and vice versa.
  • This approach enables the user or the analyzer to take the best imagery from either stream as to the bulk of the image along with the best edges, regardless of the source of each. This is useful inasmuch as depth sensed data streams are subject to distortion, discoloration and other errors and artifacts that limit image quality. Moreover, the processing of depth data tends to burden processors, memory and bandwidth capabilities of the system.
  • the present invention takes the best of both techniques, when needed, and allows user input as to the best image stream.
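The edge-quality comparison can be pictured roughly as below; the peripheral colour-variance metric is only an illustrative stand-in for the "many known image analysis techniques" mentioned above, and the function names are not from this disclosure.

```python
import numpy as np

def edge_noise_score(mask: np.ndarray, frame: np.ndarray) -> float:
    """Score optical noise along the periphery of an extracted avatar:
    the periphery is the set of mask pixels bordering background, and the
    score is the colour variance there (lower means cleaner edges)."""
    interior = mask.copy()
    interior[1:-1, 1:-1] &= (mask[:-2, 1:-1] & mask[2:, 1:-1] &
                             mask[1:-1, :-2] & mask[1:-1, 2:])
    periphery = mask & ~interior
    if not periphery.any():
        return float("inf")
    return float(frame[periphery].astype(np.float64).var())

def select_best_stream(spatial_candidate, depth_candidate):
    """Each candidate is a (mask, frame) pair; keep the one with cleaner edges.
    A fuller implementation could also splice the better edge pixels onto the
    better body pixels, as the text contemplates."""
    s = edge_noise_score(*spatial_candidate)
    d = edge_noise_score(*depth_candidate)
    return spatial_candidate if s <= d else depth_candidate
```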
  • a mask and a logic operation can be utilized.
  • the mask is provided by the depth map.
  • the logical AND operation is executed between the RGB pixels from the depth sensor and the RGB pixels from the spatial filter. Pixels from the spatial filter outside the detected edges will be discarded. Pixels resulting from the AND operation within the silhouette will be passed on for processing, transmission, reception, transplantation and display. This results in better edge detection than conventional spatial filtering schemes without the data load typically associated with depth sensing of video images at real time frame rates.
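A minimal sketch of this mask-and-AND combination, assuming the depth map has been thresholded into a subject mask and that both inputs are aligned NumPy arrays; the 1.5 m subject-depth threshold and all names are illustrative assumptions.

```python
import numpy as np

def combine_depth_and_spatial(rgb: np.ndarray,
                              spatial_mask: np.ndarray,
                              depth_m: np.ndarray,
                              max_subject_depth_m: float = 1.5) -> np.ndarray:
    """Keep only pixels that pass BOTH the depth-derived mask and the spatial
    filter (the logical AND described above); everything else is discarded,
    i.e. made fully transparent in the returned RGBA frame.

    The 1.5 m subject-depth threshold is an illustrative assumption.
    """
    depth_mask = depth_m < max_subject_depth_m        # silhouette boundary from the depth map
    silhouette = depth_mask & spatial_mask            # AND of the two extractions
    alpha = np.where(silhouette, 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])                    # HxWx4 RGBA output
```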
  • a depth map is combined with the result of the exclusive NOR operation to yield a point cloud of RGB-D or RGBA-D data.
  • This data stream is bundled with user location data in a VR or AR environment and forwarded through the processor 110 to a server 20 (Figure 3) for routing in accordance with the teachings of the above-referenced Benman patents as discussed more fully below.
  • the invention is not limited to the method by which the depth map data and the spatially filtered data are cross-correlated and/or combined.
  • Each of the modules in Figure 2 may be implemented in software and/or hardware. If implemented in software, the modules may be stored as code in memory 142 and executed by the processor 110.
  • the output of the edge analyzer module is fed into the processor 110 of the platform 100.
  • the platform 100 may be a PC, Smartphone, tablet or other suitable wireless computing and communications device. However, in the illustrative embodiment, the platform is a Smartphone or Tablet. In either case, the platform processor 110 communicates with a routing server 20 (see Figure 3) via a WiFi transceiver 120 and/or a cellular transceiver 130 in response to commands from a user via a conventional input/output interface 140.
  • FIG 3 is a block diagram of an illustrative implementation of a system 10 for capturing and displaying Silhouette imagery via the mobile wireless platform 100 of Figure 1 in connection with the teachings of the present invention.
  • the system 10 includes the routing server 20 adapted to route extracted video and audio streams received from remote client machines in accordance with the above referenced Benman patents and applications.
  • the image and audio data streams are communicated between the server 20 and a platform client 100 via either a cellular network 60 or a WiFi receiver 80, an Internet Service Provider 40 and the Internet 30.
  • the present teachings are not limited to the Internet and may be implemented on an Intranet or a circuit switched network without departing from the scope of the present teachings.
  • the Silhouette applet 108 receives spatial filter enhanced RGB-D or RGBA-D streams from the routing server 20 and outputs the extracted image data to an onboard display (not shown) or a remote display via a WiFi transceiver 120 or a Bluetooth transceiver 132. Obviously, a wired connection may be used for this purpose as well.
  • the Bluetooth transceiver 132 couples VR or AR enabled display glasses or goggles (not shown) to the mobile wireless platform 100 to output extracted images in one of three modes as discussed more fully below.
  • the display is an onboard 3D display with integrated eye-tracking capability such as that currently offered by LG as the DX2000 display.
  • FIG. 4 is a block diagram of an illustrative embodiment of a display subsystem 50 adapted for use in connection with the present invention.
  • the inventive display subsystem 50 includes a Bluetooth transceiver 52 coupled to a processor 54.
  • the processor 54 is coupled to a laser (not shown) or other mechanism adapted to output an image on the lens of goggles (not shown), glasses (not shown) or other display device 56 such as the screen of a Smartphone or tablet, free space display, desktop monitor or a standalone wired or wireless display.
  • a miniature solid-state electronic compass 55 is included within the frame of the goggle along with an accelerometer 53 and an eye tracker 58.
  • Eye tracking in goggles is known in the art. See SensoMotoric Instruments (SMI) of Boston, MA.
  • the goggles or virtual glasses worn by each user are optionally detected and electronically removed from the live avatar imagery depicted at the receiver.
  • the components of the inventive goggle system may be implemented as an add-on or retrofit for a user’s conventional glasses, prescription or otherwise.
  • Ear buds or other audio output devices 57 are included as is common in the art.
  • FIG 5 is a flow diagram of an illustrative embodiment of the technique for capturing and displaying Silhouette images on mobile wireless platforms of the present invention.
  • a local user of the wireless platform 100 activates the Silhouette applet or application 108 (see Figures 1 and 2) and at step 204 logs into the Silhouette server 20.
  • a usage monitor 205 is activated.
  • the usage monitor runs preferably, but not necessarily, at the server 20 and maintains a database of data relating to the duration of time for which each live video avatar stream is received by the user in accordance with the teachings of the above-referenced Benman patents.
  • the system 100 is adapted to provide Silhouette live video avatar communication in a computer generated (virtual reality or VR) environment, an augmented reality (AR) environment, or a simple video conferencing mode, with or without the user’s background being extracted using the techniques disclosed herein.
  • the user is given a prompt to select a Silhouette environment mode for the session at step 206.
  • a voice recognition system is provided to enable the user to select the desired mode via speech or voice input. Nonetheless, manual selection is contemplated within the scope of the present teachings as well.
  • a sender with whom the user is ultimately connected is displayed in the sender’s actual environment in accordance with a typical conventional video-conferencing call while the user may be seen on the remote end by the sender as extracted without his or her actual background.
  • at step 210, the user is ultimately connected to one or more senders and each sender is displayed in the user’s actual environment.
  • the remote senders are displayed via virtual goggles or a free space display.
  • the positions of the senders are fixed in the user’s actual environment such that when the local user moves his or her head, the remote sender’s position in the local user’s environment remains unchanged. This compensation is achieved by the goggle processor 54 or the platform processor 110 using data from the three-axis linear and rotational accelerometer 53 and onboard compass 55 and thereby effects a geo-fixing of the sender in the user’s environment.
  • While the remote sender is geo-fixed in the local user’s environment, the remote sender remains free to move about in her environment. In the illustrative embodiment, this movement will cause the remote sender to move about in the local user’s environment as well, assuming the remote user is utilizing the Silhouette technology disclosed herein.
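The geo-fixing compensation amounts to re-expressing a fixed world-space anchor in the viewer's head frame each time the compass and accelerometer report a new orientation; the yaw-only transform below is a simplified, illustrative sketch of that idea, not the disclosed implementation.

```python
import numpy as np

def world_to_display(anchor_world: np.ndarray,
                     viewer_pos: np.ndarray,
                     viewer_yaw_deg: float) -> np.ndarray:
    """Transform a geo-fixed world-space anchor into the viewer's head-relative
    frame so the sender's avatar stays put when the viewer's head turns.

    viewer_yaw_deg would come from the goggle's compass/accelerometer; a full
    implementation would use all three rotation axes.
    """
    yaw = np.radians(viewer_yaw_deg)
    # Rotate the world offset by the inverse of the head yaw.
    rot = np.array([[np.cos(-yaw), -np.sin(-yaw)],
                    [np.sin(-yaw),  np.cos(-yaw)]])
    offset = anchor_world[:2] - viewer_pos[:2]
    xy = rot @ offset
    return np.array([xy[0], xy[1], anchor_world[2] - viewer_pos[2]])

# The sender stays anchored 2 m in front of the room origin; turning the head
# 30 degrees only changes where that point falls in the display frame.
print(world_to_display(np.array([2.0, 0.0, 0.0]), np.zeros(3), 30.0))
```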
  • in a blended reality mode, multiple cameras and/or depth sensors are deployed around the local user’s actual environment, or the remote sender’s actual environment, so that the user’s environment is accurately mapped to provide a corresponding virtual environment at 1:1 scale.
  • the virtual environment may be shown or not shown (totally transparent) but simply used for reference as to the location of the participants enabling their positions to be displayed in a realistic and accurate manner.
  • if, at step 206, the user selects a Silhouette virtual conference, then at step 212 the user navigates a virtual world and, when in range and line of sight of other users, receives and processes live video avatar streams from other users at step 214.
  • These ‘other users’ or ‘senders’ are then displayed in the virtual environment at step 216 as per the teachings of the above-referenced Benman patents which have been incorporated herein by reference.
  • Figure 6(a) shows a sender in her actual background and depicts a conventional video conferencing image seen by a receiver per step 208 of Figure 5.
  • a Silhouette image of the sender is extracted as shown in Figure 6(b). This extracted Silhouette may then be displayed in a computer generated virtual (3D) environment in the virtual conferencing mode of step 216 (Figure 5) as depicted in Figure 6(c).
  • the extracted Silhouette of the sender is depicted in the receiver’s actual environment as shown in Figure 6(d). In the best mode, this is achieved with the novel goggle system disclosed herein.
  • alternatively, the display may be a mixed or augmented reality free-space display such as the Heliodisplay™ sold by IO2 Technology of San Bruno, California (http://www.io2technology.com/).
  • the extracted and transplanted live video avatars used for the augmented reality conference mode as well as the virtual conference mode are three-dimensional (3D) avatars.
  • this may be accomplished using an onboard 3D depth sensing camera system, such as that provided by Apple’s iPhone X class smartphones with TrueDepth cameras, the HTC EVO 3D, LG Optimus 3D and Sharp Aquos SH-12C model smartphones, or a 2D camera and software processing such as the capability provided by Extreme Reality Ltd of Israel.
  • an external camera such as a Microsoft Kinect may be coupled, wirelessly or via a wired connection, to the platform to provide 3D imagery.
  • a particularly novel aspect of the present invention is the provision of a live 3D avatar in a video conferencing mode.
  • in an embodiment implemented in software in accordance with the present teachings, a user’s background (either the sender’s, the receiver’s or another real-world environment) is rendered, preferably in three dimensions.
  • the extracted Silhouette live avatars are then transplanted into the 3D (or 2D) rendering of the real world environment for presentation in a virtual conferencing mode or augmented reality mode in accordance with the present teachings.
  • at step 218, during or after any session using any of the three above-described modes (video, mixed reality or virtual conferencing), the user is enabled to effect a mode switch. This may be achieved via a button or icon activation or via an audio (speech enabled) or video (gesture enabled) cue. If a mode switch is desired, the system returns to mode selection step 206. If mode switching is not desired, the user is given an option to initiate a new call.
  • if a new call is desired, the system first enables voice or manual selection of a user from a contact list, phone number or virtual address (not shown) and again returns to step 206. If no new call or mode switch is desired, then at step 222 the session is terminated and at step 224 the user logs off.
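The session flow of Figure 5 can be summarized as a small control loop; the callables below stand in for the applet's voice or manual prompts and are illustrative, not part of this disclosure.

```python
from enum import Enum, auto

class Mode(Enum):
    VIDEO_CONFERENCE = auto()    # sender shown in actual background (step 208)
    MIXED_REALITY = auto()       # senders geo-fixed in the user's real environment (step 210)
    VIRTUAL_CONFERENCE = auto()  # senders transplanted into a virtual world (steps 212-216)

def run_session(select_mode, run_mode, wants_mode_switch, start_new_call):
    """Drive the Figure 5 flow: choose a mode, run it, then either switch
    modes (back to step 206), start a new call (also back to step 206),
    or terminate the session and log off (steps 222 and 224)."""
    while True:
        mode = select_mode()       # step 206: voice or manual selection
        run_mode(mode)             # steps 208 / 210 / 212-216
        if wants_mode_switch():    # step 218
            continue
        if start_new_call():       # select a contact, then back to mode selection
            continue
        break                      # steps 222/224: terminate and log off
```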
  • the system 10 includes software stored in memory 142 for tracking the local user’s position in the local user’s actual environment and sending the incoming streams from the server to the local user’s smartphone, tablet, laptop, desktop, television, internet enabled appliance, free space display, cave, etc. to allow the user to move about in his or her environment without interruption of the VR or AR conferencing session.
  • the system will automatically activate each display as the user comes into range and looks in the direction of each display, in a multiple display setup, using facial recognition technology.
  • the system will activate cameras, located on or near these devices or distributed throughout the user’s environment, to follow the user to provide continuous live video extraction of the user during the session and subject to muting per the voice or other commands and/or preferences of the user.
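The display and camera switching just described might be approximated as follows; the range and gaze-angle thresholds are assumptions, and a simple bearing test stands in for the facial recognition the text contemplates.

```python
import math

def pick_active_station(user_pos, user_gaze_deg, stations,
                        max_range_m=3.0, max_gaze_error_deg=30.0):
    """Choose which display/camera pair to activate as the user moves.

    Each station is (name, (x, y)). A station is eligible when the user is
    within range and looking roughly toward it; the nearest eligible station
    wins, so the session follows the user around the environment.
    """
    best = None
    for name, (sx, sy) in stations:
        dx, dy = sx - user_pos[0], sy - user_pos[1]
        dist = math.hypot(dx, dy)
        bearing = math.degrees(math.atan2(dy, dx))
        gaze_error = abs((bearing - user_gaze_deg + 180) % 360 - 180)
        if dist <= max_range_m and gaze_error <= max_gaze_error_deg:
            if best is None or dist < best[1]:
                best = (name, dist)
    return best[0] if best else None

stations = [("desk_monitor", (1.0, 0.0)), ("kitchen_tv", (0.0, 4.0))]
print(pick_active_station((0.0, 0.0), 0.0, stations))   # -> 'desk_monitor'
```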
  • the system 10 is programmed to enable the user to move seamlessly from Silhouette VR mode to Silhouette AR mode.
  • in this mode, a user might engage someone in the Nexos or some other VR environment using Silhouette and then continue the conference in the user’s real world environment in an AR mode, and vice versa.
  • the system 10 may effect this in a number of ways including simply automatically switching the incoming live video streams to the user’s AR display instead of the VR display in the manner disclosed above or upon voice command.
  • the system 10 sends either a 2D rendering or a 3D rendering of the local user’s environment to the remote user(s) to enable navigation by the remote user(s) in the local user’s environment.
  • This will require the user to scan his or her environment with a camera with software, preferably on the sending system, that converts the image to 3D.
  • Many programs are currently available for this purpose. See for example Make3D:
  • the phrase ‘navigation functionality’ means enabling User 2 to move around in User 1’s environment and vice versa. This can be accomplished using an iPhone X class phone or other environment scanner to capture each user’s environment. With an iPhone running a scanning app, User 1 can simply hold up the phone and turn around to capture a 360° view; the app then detects and renders the surfaces in the environment. Those surfaces are sent to User 2, allowing User 2 to navigate within User 1’s environment.
  • by multi-user functionality, we mean allowing multiple users to share the same environment simultaneously in real time using Silhouette. This would require each person’s stream to be sent to the others, as would be the case in a conference call but with streaming live video avatars per our technology.
  • the invention has been disclosed as including a depth sensor for creating a depth map based first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the depth sensor; code fixed in a tangible medium for execution by the processor for extracting the depth map from the first environment to provide an extracted depth map based live video avatar; and a display system coupled to the processor for showing the extracted depth map based live video avatar in a second environment diverse from the first environment.
  • the system has been disclosed as including a camera coupled to the processor to provide live video images of the user in the first environment and code for spatially filtering the images to provide a spatially filtered extracted second live video avatar.
  • This embodiment further includes code for combining the first live video avatar with the second live video avatar to provide an enhanced extracted depth map based third live video avatar. Images from multiple cameras and/or depth sensors are combined simultaneously to provide the third live video avatar using the spatially enhanced extracted depth map.
  • the inventive system includes code for extracting a live video avatar from film or video.
  • Another embodiment includes an arrangement with multiple displays for sensing a position of a user with automatic camera, display, microphone and/or speaker activation and switching based on user position and viewing angle.
  • in one embodiment, a routing server is included for receiving streams from multiple users and sending to each user the live video avatar images from other users based on their locations in a shared space or for use in a local user’s AR environment.
  • the display may be holographic, distributed, free space and/or optical (glass or goggles).
  • an arrangement is included for providing a heads up display showing where users are onscreen.
  • the system can include code for enabling voice activation along with code for enabling automatic signaling by which navigation into someone’s virtual space prior to connecting through the routing server will ping (via text or call) his or her phone to meet you at your coordinates in the virtual world from wherever he or she is in reality.
  • the system can include code for effecting gaze correction, beautification and/or age reduction.
  • the software can include code for providing a heads-up display showing where users are onscreen, hyper-realism (enhancement of augmented reality environments), persistent (always present in the second environment) experience, age and gender filtering. Further, code may be included for enabling automatic signaling by which navigation into someone’s virtual room or office will ping his or her phone to meet you there wherever he or she is in reality.
  • Silhouette functionality enables inter-world operability. That is, the present teachings allow a user to move from a first metaverse to a second metaverse, regardless of whether each metaverse operates on a different runtime architecture, framework, engine or protocol such as Unity, Unreal, X3D, Web-XR and others.
  • Figure 7 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention.
  • Figure 7 shows plural metaverses including, by way of example, Unity, Web-XR, X3D, Unreal and a generic server representing every other server type.
  • the metaverses depicted in Figure 7 are exemplary of a number of metaverses currently known and used.
  • the present teachings are not limited to use with the metaverse servers depicted in Figure 7. Numerous additional types of metaverses may be used without departing from the scope of the present teachings inasmuch as the present invention is adapted to operate with metaverses of any run time architecture, framework, engine or protocol.
  • Each metaverse server is typically implemented in software stored on a tangible medium for execution by an onboard processor.
  • Each metaverse is typically, though not necessarily, mounted within a unique housing. In any case, each metaverse provides a platform for a variety of users to enter and experience a virtual reality or gaming environment.
  • Each metaverse can host millions of users. In Figure 7, plural users are illustrated as being operationally coupled to a respective metaverse via one of a plurality of user platforms.
  • the user platforms are typically a desktop or laptop computer or a mobile device such as a tablet or a smartphone.
  • Figure 7 shows multiple such user platforms operationally coupled to each of the metaverses.
  • the user platform also typically includes a headset through which the user is enabled to view the chosen metaverse in an immersive manner.
  • headsets necessitate a rendering of the actual user as a cartoonish avatar, fantasy character (typically for gaming) or a lifelike computer-generated replica.
  • the cartoon avatar and fantasy characters interfere with a sense of realism in the environment.
  • while lifelike computer-generated replicas are typically more realistic, they are not yet able to convince the human brain of their realism and cause a well-known and disturbing uncanny valley experience on the part of the user. For more on the uncanny valley effect, see
  • the present invention provides Silhouette live video streaming technology to such diverse and independent metaverses.
  • Silhouette makes a web camera intelligent, allowing it to capture live video of a user and extract the user’s video from the user’s background environment.
  • Silhouette then combines the user’s live video data stream with the user’s synchronized audio, along with orientation and position data, and sends it to a dedicated routing server for duplication as necessary and routing to other users within range and line of sight in the virtual world or metaverse.
  • Silhouette has been disclosed for use with a dedicated and operationally coupled virtual world server.
  • a system and method are disclosed for extending Silhouette functionality to diverse and sundry off-platform metaverses such as those depicted in Figure 7 by way of illustration.
  • to this end, the system includes a Silhouette routing server implemented in accordance with the teachings of the above-referenced Benman patents and a portal interface.
  • the portal interface is implemented on the same platform as the routing server.
  • the portal interface may be implemented within each metaverse without departing from the scope of the present teachings.
  • the portal interface serves to provide a uniform data stream to the routing server (or, in an alternative embodiment discussed below, a user platform) regardless of the run time architecture, framework, engine or protocol of the off-platform metaverse to which the routing server is operationally coupled. This is accomplished by converting the incoming data stream from each diverse metaverse into a single protocol such as X3D, by way of example. This conversion may be performed by the portal interface; however, in the best mode, the conversion is performed on the metaverse platforms, thereby freeing the portal interface to perform other functions such as compressing, decompressing, encrypting, decrypting, and directing data streams between the metaverses and the routing server. Online real-time protocol converters are known in the art; see for example:
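The normalization role of the portal interface can be pictured as mapping each runtime's native message format into one uniform stream object before it reaches the routing server; the field names and converter registry below are illustrative assumptions (the disclosure names X3D only as one example target protocol).

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class PortalStream:
    """Uniform stream handed to the routing server, regardless of the source
    metaverse's runtime (Unity, Unreal, X3D, Web-XR, ...)."""
    user_id: str
    position: tuple          # (x, y, z) in the source world's coordinates
    orientation: tuple       # (yaw, pitch, roll) in degrees
    av_payload: bytes        # extracted live video plus synchronized audio

# Per-runtime converters registered with the portal interface. In the best
# mode the conversion runs on the metaverse platform itself; it is shown at
# the portal here only for simplicity. The message keys are hypothetical.
CONVERTERS: Dict[str, Callable[[dict], PortalStream]] = {
    "unity": lambda m: PortalStream(m["uid"], tuple(m["pos"]), tuple(m["rot"]), m["av"]),
    "webxr": lambda m: PortalStream(m["id"], tuple(m["xyz"]), tuple(m["ypr"]), m["media"]),
}

def to_uniform(runtime: str, message: dict) -> PortalStream:
    """Convert a native metaverse message into the uniform stream format."""
    return CONVERTERS[runtime](message)
```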
  • each metaverse delivers data to the routing server via the portal interface including a user’s live video avatar or ‘silhouette’.
  • This is made possible by the deployment of Silhouette extraction and transplantation modules to each metaverse or on the user’s platforms through each metaverse.
  • the Silhouette extraction and transplantation technology is disclosed in detail in the Benman patents incorporated herein by reference.
  • a Silhouette applet is distributed to each user by a host metaverse.
  • the applet or module may provide for extraction and transplantation or some subset thereof depending on the extent to which the metaverse operator desires to perform these functions on the metaverse server.
  • the applets are deployed to the user platforms and operate on the client side to minimize the load on the metaverse server processors.
  • Figure 8 below illustrates various options for deployment of the Silhouette modules within each metaverse ecosystem in accordance with the present teachings.
  • FIG 8 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention with Silhouette extraction and transplantation functionality distributed throughout the ecosystem in various implementations.
  • ‘SET’ represents a complete silhouette module adapted to perform extraction and transplantation functions.
  • SE represents a module limited to performing extraction only and
  • ST represents a module adapted to perform the transplantation function only.
  • each Silhouette module, whether located at the metaverse server or on the user platform, in whole or in part (see Figure 9 below), performs the functions of extracting the user’s personal live video image and audio stream and sending it, along with any other multimedia the user desires to stream in a blended reality or hyperreality mode as discussed more fully below, and of transplanting received streams into a user’s chosen metaverse environment.
  • FIG 9 is a block diagram showing the interworld portal interface of Figure 7 in more detail.
  • the interworld portal interface is implemented in software stored on a tangible medium and executed by a processor and includes a platform coordinate interface module.
  • the platform coordinate interface module receives metaverse avatar coordinate data from each metaverse and sends it to the routing server via a coordinate translator.
  • an N-dimensional grid or addressing system may be employed by which each external metaverse is assigned a point, i.e., a coordinate location or address, at which the entire grid of the external metaverse is located.
  • This function of assigning the metaverse address in the N-dimensional grid can be performed by the routing server or the coordinate interface. In the best mode, this function is performed by the routing server as each external metaverse is registered in the system.
  • the platform coordinate interface module receives position coordinate data from the routing server, through the coordinate translator, for distribution as necessary to other user platforms that are determined by the routing server as intended recipients for each received multimedia stream.
  • Outgoing coordinate translation is key for interworld and multi-world operability as discussed more fully below.
  • the platform coordinate interface module also includes an avatar orientation and scaling interface that receives data from each metaverse as to the three-axis orientation (for each axis of rotation) of each avatar and the scaling to be employed (e.g., x, y, z dimensions) at each of the received coordinates. This is to account for any differences in the scales of the external metaverses being served by the Silhouette routing server.
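The grid addressing, coordinate translation and scaling described above might look roughly like the following registry and translation functions; the origins and scale factors shown are illustrative values only, not data from this disclosure.

```python
from dataclasses import dataclass

@dataclass
class MetaverseRegistration:
    """Assigned by the routing server when an external metaverse registers."""
    grid_origin: tuple   # address of the metaverse's own origin in the global grid
    scale: float         # metres per local unit, to reconcile differing world scales

REGISTRY = {
    "metaverse_1": MetaverseRegistration((0.0, 0.0, 0.0), 1.0),
    "metaverse_2": MetaverseRegistration((10_000.0, 0.0, 0.0), 0.5),
}

def to_global(world: str, local_xyz):
    """Local metaverse coordinates -> global grid coordinates."""
    reg = REGISTRY[world]
    return tuple(origin + local * reg.scale
                 for origin, local in zip(reg.grid_origin, local_xyz))

def to_local(world: str, global_xyz):
    """Global grid coordinates -> the target metaverse's local coordinates."""
    reg = REGISTRY[world]
    return tuple((g - origin) / reg.scale
                 for g, origin in zip(global_xyz, reg.grid_origin))

# An avatar at (3, 0, 2) in metaverse_2 maps to (10001.5, 0, 1) on the grid.
print(to_global("metaverse_2", (3.0, 0.0, 2.0)))
```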
  • an optional world server interface is employed by which a copy of a host metaverse is stored on the portal interface and serves the function of ascertaining the necessary orientation and scaling of each outgoing silhouette stream as well as line of sight between avatars as is needed to determine to whom the outgoing streams are to be sent.
  • An optional avatar transformation controller is also included in the portal interface to facilitate smooth avatar switching functionality onboard between silhouette avatar and host world avatar types at the option of the user. This eliminates the need for each metaverse operator to develop a system for performing this function onsite.
  • Incoming and outgoing management of Silhouette streams is handled by a stream routing server interface, a stream location interface and an audio/video stream module under the control of the Silhouette routing and world server module.
  • the Silhouette routing and world server module determines the coordinates to which each incoming stream is to be directed and passes the coordinates to the stream routing server interface.
  • the stream routing server interface passes the stream receiving coordinates to the audio/video stream location interface and the Silhouette audio/video streaming module.
  • the Silhouette audio/video streaming module sends and receives the Silhouette streams, while the audio/video stream location interface provides IP addressing to the audio/video streaming module for each outgoing stream packet as it is sent.
  • a live video (Silhouette) avatar is streamed from a first user in any metaverse using any run time architecture or framework such as Unity, WebXR, X3D, etc., and is extracted in accordance with the teachings of the above-referenced Benman patents, Nos. 5,966,130 and 6,798,407, the teachings of which are incorporated into this application by reference.
  • the extracted live video/audio avatar stream is forwarded to the Silhouette routing server and duplicated as necessary to provide a stream to other users in-world within a predetermined range of the first user’s location and with a clear line of sight in-world.
  • the present invention provides interoperability between metaverses allowing a user to appear in multiple metaverses with one avatar.
  • a user in one metaverse (Metaverse #1) simply logs into another metaverse (Metaverse #2) and selects a silhouette avatar type when presented with an avatar option by an avatar switching module provided to the metaverse platform by the Silhouette routing server platform. Thereafter, the user can use his or her silhouette as an avatar in the second metaverse just as it is employed in the first metaverse.
  • This simple case is depicted in Figure 10.
  • Figure 10 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in a simple interoperable mode of operation by which a user (User #1) moves from a first metaverse (Metaverse #1) to a second metaverse by simply logging into the second metaverse (Metaverse #2) and selecting silhouette avatar functionality in accordance with the present teachings.
  • a user can move from one metaverse into another and, if it is desired, experience both metaverses simultaneously using the ‘passport’ feature of the present invention.
  • the portal interface handles authentication of the user to enter a second and/or third metaverse from the first selected (e.g., home) metaverse.
  • This passport mode offers several additional features including the ability to stream a user’s ‘micro-verse’ from the first metaverse to the second metaverse.
  • a ‘micro-verse’ is a portion of the user’s environment, either in the virtual world of the first metaverse or in the user’s actual real-world environment as-is or scanned and rendered in hyper-realistic mode, in which other users can navigate as well.
  • the present teachings provide a passport mode of operation that enables each user to appear in multiple metaverses simultaneously, with or without silhouette avatars and with or without micro-verses, inasmuch as the system of the present invention is adapted to stream any multimedia content from one metaverse to another under the control of the end user as to metaverse(s), location within metaverses, avatar type, and environment type (e.g., hyper-realistic or not). This is depicted in Figure 11.
  • FIG 11 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in an alternative interoperable mode of operation by which a user’s (User #l’s) multimedia (extracted video and audio stream represented generally as a silhouette 70) is sent to multiple metaverses (#2 - 5) via User #l’s home metaverse (Metaverse #1) through the portal interface and routing server in accordance with the present teachings.
  • users in any of the other Metaverses 2 - 5, as selected by User #1, are able to see User #1 in Metaverses 2 - 5, transplanted as a live video silhouette avatar along with any additional multimedia content chosen for transmission by User #1 in accordance with the present teachings.
  • a system for streaming multimedia content into a metaverse comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the server being implemented with software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client machines whereby multimedia from the first user is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
  • the routing server may be operationally coupled to the metaverse server or directly to the first and second client platforms.
  • the routing server is adapted to be operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings.
  • the routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user, operationally coupled to the second metaverse, to view and hear multimedia content from the first or second user in the second metaverse.
  • the routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
  • the portal interface can be distributed between the metaverses.
  • the portal interface can be integrated into the routing server without departing from the scope of the present teachings. It is therefore intended by the appended claims to cover any and all such applications, modifications and embodiments within the scope of the present invention.
  • this writing discloses a system for extracting and transplanting live video avatar images providing silhouette live video avatar and a system for providing multimedia service to external metaverses and client platforms.
  • the system for extracting and transplanting live video avatar images including a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment.
  • the code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
  • the system for providing silhouette live video avatar and multimedia service includes a server with metaverse software executed by a processor to provide an artificial reality environment in accordance with a first operational paradigm.
  • a first client platform is operationally coupled to the server for enabling a first user to experience the artificial environment.
  • a second client platform is operationally coupled to the server for enabling a second user to experience the artificial environment.
  • a routing server is operationally coupled to the first and second client machines to route multimedia from the first user so that it is displayed to the second user in the artificial environment at a location provided by the routing server.
  • the routing server is operationally coupled to a second metaverse server to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
  • a system for extracting and transplanting live video image streams comprising: an image sensor for providing a live video image stream of a first user or object disposed in a heterogeneous first environment with an arbitrary background; a processor operationally coupled to the image sensor to receive the live video image stream; code stored in a non-transitory tangible medium for execution by the processor for extracting a live video image stream of the first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted live video image stream of the first user; a mobile display system coupled to the processor for showing the extracted live video image stream to a second user in a second environment separate and distinct from the first environment, said second environment being an augmented reality environment including at least part of said second user’s second environment; and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
  • a further implementation of any of the preceding or following implementations occurs in which the image sensor includes a depth sensor.
  • code further includes code for combining the first live video image stream with a second live video image stream from the depth sensor to provide an enhanced extracted depth map based third live video image stream.
  • code further includes code for combining images from multiple cameras or depth sensors simultaneously to provide the third live video image stream.
  • a further implementation of any of the preceding or following implementations further including an arrangement for sensing a position of a user with a camera or microphone and automatically selectively activating a display based on user position and viewing angle in response thereto.
  • code further includes code for effecting automatic camera activation based on a user’s position in a user’s environment.
  • a further implementation of any of the preceding or following implementations further including an arrangement for sending the extracted live video image stream from the first user to the second user via a routing server.
  • a further implementation of any of the preceding or following implementations including a second arrangement for receiving said extracted live video image stream from said routing server and displaying the live video stream to the second user.
  • the second platform includes a second processor for executing code fixed in a non-transitory tangible medium for effecting a user selectable multimode display operation, said multimode operation including a video conferencing mode, a virtual conferencing mode and a mixed reality conferencing mode.
  • code further includes code for displaying extracted image data in each of said modes.
  • system includes a system for transplanting the extracted live video image stream into a computer rendered hyper-realistic augmented reality representation of a user’s environment.
  • code further includes code for enabling a user to experience said hyper- realistic augmented reality representation of the user’s environment as a blended reality environment.
  • code further includes code for enabling a second user to be present in said hyper-realistic environment by which the first user’s environment is rendered in virtual reality.
  • a system implementation for extracting and transplanting live video image streams comprising: a computing and communications platform; a sensor coupled to the platform for creating a first live video image stream of a user or object disposed in a heterogeneous first environment with an arbitrary background; a first processor coupled to the sensor; code stored in a non-transitory tangible medium for execution by the first processor for extracting a live video image stream of a first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted depth map based live video avatar; a routing server; a second platform having a second processor coupled to the routing server; and code stored in a non-transitory tangible medium on the second platform for receiving the live video image stream from the routing server and for causing a display system coupled to the second processor to show the extracted live video image stream in a second environment independent from the first environment, said second environment being an augmented reality environment and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
  • a further implementation is a system for streaming multimedia content into a metaverse comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the server being implemented with software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client machines whereby multimedia from the first user is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
  • routing server is operationally coupled to the metaverse server.
  • routing server is operationally coupled directly to the first and second client platforms.
  • routing server is operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings.
  • routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
  • routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
  • a system for streaming multimedia content from a first platform to a second platform comprising: a first platform for sending and receiving multimedia content; a second platform operationally coupled to the first platform for sending and receiving multimedia content; and software stored on a medium on the first and second platforms adapted for execution by first and second processors on the first and second platforms respectively for streaming multimedia content from the first platform into an artificial reality environment for display on the second platform.
  • a further implementation of any of the preceding or following implementations occurs wherein the software further includes code for execution by the first and second processors for streaming multimedia content from the second platform into the artificial reality environment for display on the first platform.
  • the multimedia content is a real time video data stream with synchronized audio.
  • a further implementation of any of the preceding or following implementations occurs wherein the multimedia content is live video imagery of a user along with audio and position data.
  • a further implementation of any of the preceding or following implementations occurs wherein the first and second platforms are client platforms.
  • a further implementation of any of the preceding or following implementations occurs wherein the artificial reality environment is on a server.
  • a further implementation of any of the preceding or following implementations further includes a server for routing the streaming multimedia content between the first and the second client platforms.
  • a further implementation of any of the preceding or following implementations occurs wherein the first client platform is coupled to the server via a first metaverse and the second client platform is coupled to the server via a second metaverse.
  • a further implementation of any of the preceding or following implementations occurs wherein the first and second metaverses are stored on first and second metaverse servers respectively.
  • a further implementation of any of the preceding or following implementations occurs wherein the first and second metaverses are implemented by first and second processors mounted on first and second independent systems respectively executing software stored on the first and second independent systems whereby the first and second metaverses operate in accordance with first and second, diverse run time architectures, frameworks, engines or protocols respectively.
  • the artificial reality environment is an augmented reality environment.
  • a further implementation of any of the preceding or following system implementations occurs wherein the location of the streaming multimedia content in a virtual or augmented reality world is determined by a transmitting platform user as it is transmitted by the first or second platform.
  • a further implementation of any of the preceding or following system implementations occurs wherein the location of the streaming multimedia content in the virtual or augmented reality world is determined by a receiving platform user as it is received by the second or the first platform.
  • a method for creating an interworld avatar and using the avatar to navigate between virtual worlds on disparate platforms including the steps of: providing at least one client machine; providing at least one world server; providing at least one routing server; interconnecting each of the servers and connecting at least one of the servers to the client machine; and executing software stored on a tangible medium with a processor on the client machine or one of the servers to provide a live video avatar for use in a world provided by the world server via the routing server.
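The coordinate translation and N-dimensional grid addressing described in the list above (each registered external metaverse is assigned an address in a shared grid, and avatar coordinates are translated between worlds with per-world scaling) are not given a concrete implementation in this text. The following is a minimal Python sketch under stated assumptions only: it assumes the routing server records an origin and a per-axis scale for each registered metaverse, and every class and function name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MetaverseRegistration:
    """Hypothetical record created by the routing server when an external
    metaverse registers: its assigned origin in the shared N-dimensional
    grid and its per-axis scale relative to grid units."""
    origin: tuple   # grid address assigned to this metaverse
    scale: tuple    # local units per grid unit, per axis

def to_grid(reg: MetaverseRegistration, local_pos: tuple) -> tuple:
    """Translate an avatar position from a metaverse's local coordinates
    into the shared grid used by the routing server."""
    return tuple(o + p / s for o, p, s in zip(reg.origin, local_pos, reg.scale))

def to_local(reg: MetaverseRegistration, grid_pos: tuple) -> tuple:
    """Translate a shared-grid position back into a metaverse's local
    coordinates (used when distributing streams to recipient platforms)."""
    return tuple((g - o) * s for g, o, s in zip(grid_pos, reg.origin, reg.scale))

# Example: two registered metaverses with different scales and origins.
world_a = MetaverseRegistration(origin=(0.0, 0.0, 0.0), scale=(1.0, 1.0, 1.0))
world_b = MetaverseRegistration(origin=(1000.0, 0.0, 0.0), scale=(0.5, 0.5, 0.5))

grid_pos = to_grid(world_a, (12.0, 0.0, -3.0))   # avatar position in world A
print(to_local(world_b, grid_pos))               # the same point expressed in world B
```

In this sketch, moving a position from one registered world to another is simply to_local(world_b, to_grid(world_a, pos)); a real implementation would also carry the orientation and scaling data handled by the avatar orientation and scaling interface described above.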

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Presented is a system for extracting and transplanting live video avatar images providing a silhouette live video avatar and a system for providing multimedia service to external metaverses and client platforms. The system for extracting and transplanting live video avatar images includes a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment. The code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user's environment.

Description

SYSTEM AND METHOD FOR EXTRACTING, TRANSPLANTING LIVE IMAGES FOR STREAMING BLENDED, HYPER-REALISTIC REALITY
REFERENCE TO RELATED APPLICATIONS
This application is related to and claims internal US priority to U.S. patent application 16/899,760 filed June 12, 2020, of inventor W. Benman and entitled SYSTEM AND METHOD FOR EXTRACTING AND TRANSPLANTING LIVE VIDEO AVATAR IMAGES, which application is hereby incorporated by reference. This application claims the benefit of and priority to U.S. patent application No 17/537,246, filed November 29, 2021, for inventor W. Benman, entitled SYSTEM AND METHOD FOR STREAMING MULTIMEDIA CONTENT, which claims the benefit of and priority to U.S. provisional patent application no. 63/272,859 filed October 28, 2021, by inventor W Benman, entitled SYSTEM AND METHOD FOR INTERWORLD OPERABILITY, both of which are incorporated by reference herein.
BACKGROUND
Field:
The present technology relates to communications technology. Particularly, the present invention relates to systems and methods for streaming multimedia content. In greater particularity, the present technology pertains to a system and method for extracting and transplanting live video avatar images for blended and hyper-realistic reality and for streaming multimedia content.
Description of the Related Art:
U. S. Patent No. 5,966,130 entitled INTEGRATED VIRTUAL NETWORKS issued October 12, 1999, to W. J. Benman, the teachings of which are incorporated herein by reference and hereinafter referred to as the ‘130 patent (Docket No. Virtual-IA), disclosed and claimed a computer-based system which allows a user to see a realistic three-dimensional representation of an environment, such as an office, on a computer screen. Real world functionality is mapped onto numerous objects in the environment with a motion-based input system allowing the user to use the objects in the environment (e.g., computer, desk, file cabinets, documents, etc.) in the same manner as the objects would be used in the real world.
In addition, Benman’s system allows the user to travel into the work areas of coworkers and see and interact with live images of the coworkers in the environment. In order to display an image of the user or a coworker in the environment, it is necessary to remove any background imagery inconsistent with the computer-generated environment from the transplanted image prior to display. For example, if the coworker is in a remote office using a computer equipped with software effective to create a virtual environment as described by Benman, and the coworker has a wall, window, bookshelf or other scene in the background, that information would have to be removed in order to place the person’s image into the virtual environment in such a way as to create an image of the person sitting in the computer-generated office environment.
Monochromatic (e.g., blue or green) screens have long been used in television and film production to extract a foreground image and overlay it on a background image. For example, this process is used daily in television to allow a person standing in front of a blue screen to have their image extracted and combined with a video image of a map to provide a weather report.
However, it would be impractical to require each coworker located in an office, hotel, home or other environment to have a monochromatic background. Accordingly, there was a need for an image processing system or technique which could transplant a desired image from one scene into another scene regardless of the background in the first scene.
The need in the art was addressed by U.S. Patent Number 6,798,407, entitled SYSTEM AND METHOD FOR PROVIDING A FUNCTIONAL VIRTUAL ENVIRONMENT WITH REAL TIME EXTRACTED AND TRANSPLANTED IMAGES, by William J. Benman, issued September 28, 2004, and U.S. Patent Number 5,966,130, INTEGRATED VIRTUAL NETWORKS, by William J. Benman, issued October 12, 1999, the teachings of both of which are incorporated herein by reference. These patents disclose and claim systems for enabling users to see and interact with each other as live images in computer generated environments in real time. This technology is named Silhouette® and will soon be offered as a service via a highly realistic computer generated environment called the Nexos℠ by Integrated Virtual Networks, Inc. of Los Angeles, California.
The referenced applications disclose illustrative embodiments of Silhouette utilizing a spatial filtering scheme to effect extraction of each user’s image, though not limited thereto. Unfortunately, it is often difficult to obtain acceptable image quality at the edges (aka the ‘silhouette’) of the user’s video image due to a variety of optical effects, measurement inaccuracies, etc. Hence, there is a need in the art for a system or method for further improving the image quality of Silhouette live video avatars, particularly along the edges thereof.
In addition, computer generated massively multi-user online virtual worlds, also known as ‘MMORPGs’ or ‘metaverses’, are experiencing rapid growth in users for a variety of applications. Generally, users experience a virtual environment through a headset which enables the user to have an immersive experience of the virtual environment. However, the use of a headset requires the user to be represented in the virtual environment as a computer-generated avatar. These avatars are either cartoonish or lack sufficient realism. In any case, conventional avatars are impersonal and not sufficiently realistic for business and other applications other than gaming.
In addition, conventional avatars are often time consuming to create. Moreover, an avatar created for one world cannot be used in another world due to the fact that metaverses are typically not based on a common framework, run-time architecture, rendering engine and/or protocol. Accordingly, an avatar created in one metaverse cannot be quickly and easily used in another metaverse. Hence, a user in one world cannot easily navigate to another. In short, there is no interworld operability.
Hence, a need also remains in the art for a system or method for providing a more realistic avatar that can be used in any metaverse and enables navigating between metaverses on diverse platforms.
SUMMARY
The need in the art is addressed by the system for extracting and transplanting live video avatar images for blended and hyper-realistic reality of the present invention and the system for streaming multimedia content of the present invention.
In one embodiment, the system for extracting and transplanting live video avatar images of the present invention includes a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment. The code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
In a second embodiment, the system includes a camera coupled to the processor to provide live video images of the user in the first environment and code for spatially filtering the images to provide a spatially filtered extracted second live video avatar. This embodiment further includes code for combining the first live video avatar with the second live video avatar to provide an enhanced extracted depth map based third live video avatar. Images from multiple cameras and or depth sensors are combined simultaneously to provide the third live video avatar using the spatially enhanced extracted depth map.
In a third embodiment, the inventive system includes code for extracting a live video avatar from film or video. Another embodiment includes an arrangement with multiple displays for sensing a position of a user with automatic camera, display, microphone and/or speaker activation and switching based on user position and viewing angle.
A routing server is included for receiving streams from multiple users and sending to each user the live video avatar images from other users based on their locations in a shared space or for use in a local user’s AR environment.
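The routing described above (sending each user the live video avatar images of other users based on their locations in a shared space) is elsewhere in this text characterized as duplicating each incoming stream to users within a predetermined range of the sender and with a clear line of sight. No implementation is given, so the following is a minimal Python sketch that assumes the routing server is told each user's position and is handed a visibility test by the world server; the function names and the example range value are illustrative.

```python
import math

def in_range(p, q, max_dist):
    """True if two avatar positions are within the predetermined range."""
    return math.dist(p, q) <= max_dist

def route_stream(sender_id, positions, has_line_of_sight, max_dist=30.0):
    """Return the user ids that should receive the sender's extracted
    audio/video stream: those in range with a clear line of sight.
    `positions` maps user id -> (x, y, z); `has_line_of_sight` is a
    callable assumed to be supplied by the world server."""
    src = positions[sender_id]
    return [
        uid for uid, pos in positions.items()
        if uid != sender_id
        and in_range(src, pos, max_dist)
        and has_line_of_sight(src, pos)
    ]

# Usage with a trivial visibility test (no occluders).
positions = {"user1": (0, 0, 0), "user2": (5, 0, 0), "user3": (100, 0, 0)}
print(route_stream("user1", positions, lambda a, b: True))  # -> ['user2']
```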
In the illustrative embodiment, the inventive system for streaming multimedia content of the present invention includes a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment; and a routing server operationally coupled to the first and second client machines effective to route a multimedia stream from the first user so that it is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
In a best mode, the streaming multimedia content provides a live video (Silhouette) avatar with associated audio. The routing server can be operationally coupled to the metaverse server; however, the routing server can also be coupled directly to the first and second client platforms.
To provide interworld operability, the routing server is operationally coupled to a second metaverse server operating on a fifth platform, and the routing server routes a real time multimedia stream from the first or the second user into the second metaverse, allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse. In accordance with the present teachings, the routing server provides interworld ‘passport’ operability between metaverses operating in accordance with the first and second operational paradigms respectively.
A further embodiment of this presentation comprises a system for extracting and transplanting live video image streams, the system comprising: an image sensor for providing a live video image stream of a first user or object disposed in a heterogeneous first environment with an arbitrary background; a processor operationally coupled to the image sensor to receive the live video image stream; code stored in a non-transitory tangible medium for execution by the processor for extracting a live video image stream of the first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted live video image stream of the first user; a mobile display system coupled to the processor for showing the extracted live video image stream to a second user in a second environment separate and distinct from the first environment, said second environment being an augmented reality environment including at least part of said second user’s second environment; and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
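The geo-fixing recited above (the first user's image stays at a fixed location in the second user's environment even as the second user moves the mobile display) is not tied to any particular implementation in this text. One common way to realize it is to anchor the avatar at a fixed world coordinate and recompute only its viewer-relative placement each frame from the tracked device pose. The numpy sketch below illustrates that idea under the assumption that the AR runtime reports a 4x4 camera-to-world pose matrix; it is an illustration, not the claimed method.

```python
import numpy as np

# The avatar is geo-fixed at a world-space anchor chosen once; only the
# viewer's tracked device pose changes from frame to frame.
anchor_world = np.array([1.5, 0.0, -2.0, 1.0])   # homogeneous world position

def avatar_in_view(camera_pose_world: np.ndarray) -> np.ndarray:
    """Given the viewer's current camera-to-world pose (4x4, as typically
    reported by an AR runtime), return the anchor expressed in camera
    coordinates. Because the anchor never changes, moving the mobile
    display changes only the viewpoint, not the avatar's world location."""
    world_to_camera = np.linalg.inv(camera_pose_world)
    return world_to_camera @ anchor_world

# Two different device poses: the avatar stays put in the world; only its
# camera-space coordinates change.
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[:3, 3] = [0.5, 0.0, 0.0]   # the viewer stepped half a meter sideways
print(avatar_in_view(pose_a)[:3], avatar_in_view(pose_b)[:3])
```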
According to an embodiment of this presentation, the image sensor includes a depth sensor.
According to an embodiment of this presentation, the code further includes code for combining the first live video image stream with a second live video image stream from the depth sensor to provide an enhanced extracted depth map based third live video image stream.

According to an embodiment of this presentation, the code further includes code for combining images from multiple cameras or depth sensors simultaneously to provide the third live video image stream.
According to an embodiment of this presentation, the system includes multiple displays.
According to an embodiment of this presentation, the system includes an arrangement for sensing a position of a user with a camera or microphone and automatically and selectively activating a display based on user position and viewing angle in response thereto.
According to an embodiment of this presentation, the system includes multiple cameras.
According to an embodiment of this presentation, the code further includes code for effecting automatic camera activation based on a user’s position in a user’s environment.
According to an embodiment of this presentation, the system further includes an arrangement for sending the extracted live video image stream from the first user to the second user via a routing server.
According to an embodiment of this presentation, the system further includes a second arrangement for receiving said extracted live video image stream from said routing server and displaying the live video stream to the second user.
According to an embodiment of this presentation, the display includes augmented reality goggles, augmented reality glasses or a free space display.
According to an embodiment of this presentation, the processor is mounted on a first platform and the display is mounted on a second physically separate platform.
According to an embodiment of this presentation, the second platform includes a second processor for executing code fixed in a non-transitory tangible medium for effecting a user selectable multimode display operation, said multimode operation including a video conferencing mode, a virtual conferencing mode and a mixed reality conferencing mode.

According to an embodiment of this presentation, the code further includes code for displaying extracted image data in each of said modes.
According to an embodiment of this presentation, the system includes a system for transplanting the extracted live video image stream into a computer rendered hyper- realistic augmented reality representation of a user’s environment.
According to an embodiment of this presentation, the code further includes code for enabling a user to experience said hyper-realistic augmented reality representation of the user’s environment as a blended reality environment.
According to an embodiment of this presentation, the code further includes code for enabling a second user to be present in said hyper-realistic environment by which the first user’s environment is rendered in virtual reality.
Another embodiment of this presentation comprises, a system for extracting and transplanting live video image streams, the system comprising: a computing and communications platform; a sensor coupled to the platform for creating first live video image stream of a user or object disposed in a heterogeneous first environment with an arbitrary background; a first processor coupled to the sensor; code stored in a non-transitory tangible medium for execution by the first processor for extracting live video image stream of a first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted depth map based live video avatar; a routing server; a second platform having a second processor coupled to the routing server; and code stored in a non-transitory tangible medium on the second platform for receiving the live video image stream from the routing server and for causing a display system coupled to the second processor to show the extracted live video image stream in a second environment independent from the first environment, said second environment being an augmented reality environment and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
Another embodiment of this presentation comprises a system for streaming multimedia content into a metaverse, the system comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the server being implemented with software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server mounted implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client machines whereby multimedia from the first user is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
According to an embodiment of this presentation, the routing server is operationally coupled to the metaverse server.
According to an embodiment of this presentation, the routing server is operationally coupled directly to the first and second client platforms.

According to an embodiment of this presentation, the routing server is operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings.
According to an embodiment of this presentation, the routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
According to an embodiment of this presentation, the routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
Another embodiment of this presentation comprises a system for streaming multimedia content from a first platform to a second platform comprising: a first platform for sending and receiving multimedia content; a second platform operationally coupled to the first platform sending and receiving multimedia content; and software stored on a medium on the first and second platforms adapted for execution by first and second processors on the first and second platforms respectively for streaming multimedia content from the first platform into an artificial reality environment for display on the second platform.
According to an embodiment of this presentation, the software further includes code for execution by the first and second processors for streaming multimedia content from the second platform into the artificial reality environment for display on the first platform.
According to an embodiment of this presentation, the multimedia content is a real time video data stream with synchronized audio.

According to an embodiment of this presentation, the multimedia content is live video imagery of a user along with audio and position data.
According to an embodiment of this presentation, the first and second platforms are client platforms.
According to an embodiment of this presentation, the artificial reality environment is on a server.
According to an embodiment of this presentation, the system further includes a server for routing the streaming multimedia content between the first and the second client platforms.
According to an embodiment of this presentation, the first client platform is coupled to the server via a first metaverse and the second client platform is coupled to the server via a second metaverse.
According to an embodiment of this presentation, the first and second metaverses are stored on first and second metaverse servers respectively.
According to an embodiment of this presentation, the first and second metaverses are implemented by first and second processors mounted on first and second independent systems respectively executing software stored on the first and second independent systems whereby the first and second metaverses operate in accordance with first and second, diverse run time architectures, frameworks, engines or protocols respectively.
According to an embodiment of this presentation, the artificial reality environment is an augmented reality environment.
According to an embodiment of this presentation, the location of the streaming multimedia content in a virtual or augmented reality world is determined by a transmitting platform user as it is transmitted by the first or second platform.
According to an embodiment of this presentation, the location of the streaming multimedia content in the virtual or augmented reality world is determined by a receiving platform user as it is received by the second or the first platform.

Another embodiment of this presentation comprises a method for creating an interworld avatar and using the avatar to navigate between virtual worlds on disparate platforms including the steps of: providing at least one client machine; providing at least one world server; providing at least one routing server; interconnecting each of the servers and connecting at least one of the servers to the client machine; and executing software stored on a tangible medium with a processor on the client machine or one of the servers to provide a live video avatar for use in a world provided by the world server via the routing server.
Other embodiments of this presentation comprise a system for extracting and transplanting live video avatar images providing silhouette live video avatar and a system for providing multimedia service to external metaverses and client platforms. The system for extracting and transplanting live video avatar images includes a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment. The code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment. The system for providing silhouette live video avatar and multimedia service includes a server with metaverse software executed by a processor to provide an artificial reality environment in accordance with a first operational paradigm. A first client platform is operationally coupled to the server for enabling a first user to experience the artificial environment. A second client platform is operationally coupled to the server for enabling a second user to experience the artificial environment. A routing server is operationally coupled to the first and second client machines to route multimedia from the first user so that it is displayed to the second user in the artificial environment at a location provided by the routing server. To provide interworld operability, the routing server is operationally coupled to a second metaverse server to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of an illustrative implementation of a mobile wireless platform configured to send and receive Silhouette streams in accordance with the present teachings.
Figure 2 is a flow diagram showing the Silhouette applet of the present invention in more detail.
Figure 3 is a block diagram of an illustrative implementation of a system for capturing and displaying Silhouette imagery via the mobile wireless platform of Figure 1 in connection with the teachings of the present invention.
Figure 4 is a block diagram of an illustrative embodiment of a display subsystem adapted for use in connection with the present invention.
Figure 5 is a flow diagram of an illustrative embodiment of the technique for capturing and displaying Silhouette images on mobile wireless platforms of the present invention.
Figure 6 is a set of diagrams that illustrate the unique multi-mode conferencing capability of the present invention.
Figure 7 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention.

Figure 8 is a block diagram showing the interworld portal interface of Figure 7 in more detail.
Figure 9 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention with Silhouette extraction and transplantation functionality distributed throughout the ecosystem in various implementations.
Figure 10 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in a simple interoperable mode of operation by which a user moves from a first metaverse to a second metaverse by simply logging into the second metaverse and selecting silhouette avatar functionality in accordance with the present teachings.
Figure 11 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in an alternative interoperable mode of operation by which a user’s (User #l’s) multimedia (extracted video and audio stream) is sent to multiple metaverses (#2 - 5) via User #l’s home metaverse (Metaverse #1) through the portal interface and routing server in accordance with the present teachings.
DESCRIPTION
Illustrative embodiments and exemplary applications will now be described to disclose the advantageous teachings of the present invention.
While the present invention is described herein with reference to illustrative embodiments for particular applications, it should be understood that the invention is not limited thereto. Those having ordinary skill in the art and access to the teachings provided herein will recognize additional modifications, applications, and embodiments within the scope thereof and additional fields in which the present invention would be of significant utility. U. S. Patent Number 6,798,407, SYSTEM AND METHOD FOR PROVIDING A FUNCTIONAL VIRTUAL ENVIRONMENT WITH REAL TIME EXTRACTED AND TRANSPLANTED IMAGES by William J. Benman, issued September 28, 2004, and U.S. Patent Number 5,966,130, INTEGRATED VIRTUAL NETWORKS by William J. Benman, issued October 12, 1999, the teachings of both of which are incorporated herein by reference, disclose and claim systems for enabling users to see and interact with each other as live images in computer generated (aka virtual, augmented reality, artificial reality and/or metaverse) environments in real time. This technology is known as Silhouette® and is currently offered as a service via a highly realistic computer-generated environment called the Nexos® by Integrated Virtual Networks, Inc. of Los Angeles, California.
As disclosed in these patents, a live avatar or ‘silhouette’ is an avatar with a real time live video image texture. In accordance with the present teachings, this Silhouette live streaming video image technology is extended as a communications service to third party metaverses outside of the Silhouette/Nexos system architecture to offer highly realistic, real-time presence in such metaverses.
The above-referenced Benman patents disclose illustrative embodiments of Silhouette utilizing a spatial filtering scheme to effect extraction of each user’s image, though not limited thereto. While the disclosed technique is effective, there is a need to further improve the edge detection performance of this Silhouette spatial filtering technique.
As discussed more fully below, in accordance with the present teachings, depth sensing technology is used to improve edge detection performance in Silhouette systems.
Dual Mode Extraction with Depth Enhanced Spatial Filtering:
Figure 1 is a block diagram of an illustrative implementation of a mobile wireless platform configured to send and receive Silhouette streams in accordance with the present teachings. As shown in Figure 1, in accordance with the teachings of the present invention, live video (Silhouette) avatars are created on mobile platforms (e.g. smartphones) 100 with forward and/or rearward facing cameras 102. Depth data is provided by the video camera 102, an infrared camera 104 or a range finder (such as a laser range finder) 106. The user’s live video image is extracted from the user’s background by a Silhouette applet 108. The applet is software (referred to herein as ‘code’) stored in a tangible medium (memory 142) and executed by a processor 110.
In the best mode, the applet extracts the Silhouette using any technique for creating a depth map, such as binocular triangulation or dual-camera disparity, TrueDepth sensing, focus pixels, optical (e.g., laser), acoustic and/or other range finding methods currently known in the art. For an excellent article on this, see “iPhone XR: A Deep Dive Into Depth,” published October 29, 2018, by B. Sandofsky.
As is well known in the art, the processing of depth data, particularly for video at the real-time frame rates required for Silhouette, is computationally intense and requires large amounts of memory and, in Silhouette, bandwidth.
Hence, an additional feature of the present invention is the provision of a scheme by which the heavy data processing, storage and transmission typically associated with the creation and use of depth data is minimized. Where the data and processing load is too heavy using one or more of these depth sensing technologies, the optional additional solution provided by the present teachings is to combine the depth sensing operation with the Benman spatial filtering operation to provide improved edge detection performance without the heavy data processing and/or storage typically associated with the creation and handling of depth data.
This is achieved by using edge data from depth sensing to provide a boundary for a mask in a key frame, at a much lower frame rate, to set an edge around a user (aka a ‘silhouette’) within which the spatial filtering operation is thereafter performed on all the frames at a high (e.g., real time) frame rate. The spatial filtering is then performed with an exclusive NOR operation during a pixel-by-pixel comparison of a current video image frame to a reference frame obtained with the subject out of sight of the camera or by other suitable means such as by simply detecting movement of the user.
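As an illustration of the scheme just described, the following numpy sketch builds a coarse user mask from a depth key frame at a low rate and then runs the per-frame comparison against an empty-scene reference frame only within that mask. The depth threshold and color tolerance are placeholder assumptions, not values taken from this text, and the exclusive-NOR comparison is approximated with a tolerance test that flags pixels matching the reference frame as background.

```python
import numpy as np

def depth_keyframe_mask(depth: np.ndarray, max_user_depth: float = 1.5) -> np.ndarray:
    """Coarse user mask from a depth key frame (computed at a much lower
    rate than the video); pixels nearer than the threshold are treated as
    the user. The threshold is a placeholder assumption."""
    return depth < max_user_depth

def spatial_filter_in_mask(frame: np.ndarray, reference: np.ndarray,
                           mask: np.ndarray, tol: int = 12) -> np.ndarray:
    """Per-frame spatial filtering restricted to the key-frame mask, run at
    the full video frame rate. Pixels that match the empty-scene reference
    frame within the tolerance are treated as background (the role played
    by the exclusive NOR comparison); the rest are kept as the silhouette."""
    same_as_reference = np.all(
        np.abs(frame.astype(np.int16) - reference.astype(np.int16)) <= tol, axis=-1)
    foreground = mask & ~same_as_reference
    extracted = np.zeros_like(frame)
    extracted[foreground] = frame[foreground]
    return extracted

# Usage on synthetic data (H x W x 3 RGB frames, H x W depth in meters).
H, W = 480, 640
reference = np.zeros((H, W, 3), np.uint8)                    # empty-scene reference frame
frame = reference.copy(); frame[100:300, 200:400] = 200      # "user" pixels
depth = np.full((H, W), 3.0); depth[90:310, 190:410] = 1.0   # user is near the camera
silhouette = spatial_filter_in_mask(frame, reference, depth_keyframe_mask(depth))
```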
Figure 2 is a flow diagram showing the Silhouette applet 108 of Figure 1 in more detail. As shown in Figure 2, the applet 108 includes a spatial data stream buffer or module 109 coupled to the video camera 102 of Figure 1 and a depth data stream buffer or module 111 coupled to the depth sensors 102, 104 and 106 of Figure 1. The spatial filter data stream is fed to a Silhouette live video extraction module 115 through a stream selection and combination module 113.
As disclosed in the above referenced Benman patents, the Silhouette extraction module captures and extracts a user’s live video imagery from the user’s physical heterogeneous environment without requiring the use of a homogenous monochromatic background such as a blue screen or a green screen as is common in film and television production.
A key feature of the present invention resides in the fact that the live video Silhouette avatar, extracted by the extraction module 115, is sent to an edge analyzer module 117. The edge analyzer module 117 either automatically assesses extraction edge quality or responds to a manual inspection of edge quality perceived by the user via the display and signaled by the user via the user interface 140.
If the edge analyzer module 117 is operating in ‘automatic edge detection and optimization mode’, the edge analyzer 117 assesses edge quality using any of many known image analysis techniques (see, for example, the “Edge detection” article on Wikipedia).
In accordance with the present teachings, the edge analyzer 117 examines the edges of the spatially filtered data stream by checking on a pixel-by-pixel basis for optical noise around the periphery of the extracted live video avatar for all or a subset of the live video avatar image frames.
The edges of the live video avatar extracted by the extraction module from the depth data stream are also automatically assessed by the edge analyzer module 117. Once again, optical noise is assessed along the edges of the user’s live video avatar image stream. The edge analyzer 117 then selects the live video avatar with the best edge performance for handling by the processor 110.
It should be noted that, in accordance with the present teachings, the edge analyzer 117 can also select a stream that is a combination of spatial image data and depth sensed image data. That is, the user’s image may be based on the spatial image processed data stream as to the bulk of the user’s face and body, and the edge data pixels may be provided by the depth image data processed stream and vice versa.
This approach enables the user or the analyzer to take the best imagery from either stream as to the bulk of the image along with the best edges, regardless of the source of each. This is useful inasmuch as depth sensed data streams are subject to distortion, discoloration and other errors and artifacts that limit image quality. Moreover, the processing of depth data tends to burden the processor, memory and bandwidth capabilities of the system.
Further, as mentioned above and discussed in the Benman patents, live video avatars based on spatial image processing, while offering excellent image quality, often suffer from undesirable edge effects.
For these and other reasons, the present invention takes the best of both techniques, when needed and/or necessary, and allows user input as to the best image stream.
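The text does not say how the edge analyzer 117 scores edge quality or how a combined stream is assembled, so the sketch below adopts a simple assumed heuristic: isolated foreground specks along the silhouette boundary are counted as optical noise, the extraction with the lower score is selected, and when the depth-based extraction has the cleaner edge its boundary pixels are grafted onto the bulk of the spatially filtered image. The masks are boolean arrays, and all names and scoring rules are illustrative.

```python
import numpy as np

def boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels with at least one 4-connected background neighbor."""
    padded = np.pad(mask, 1)
    all_neighbors = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                     & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~all_neighbors

def edge_noise_score(mask: np.ndarray) -> float:
    """Lower is better: fraction of boundary pixels that are isolated specks
    (no 4-connected foreground neighbor at all), treated here as noise."""
    padded = np.pad(mask, 1)
    any_neighbor = (padded[:-2, 1:-1] | padded[2:, 1:-1]
                    | padded[1:-1, :-2] | padded[1:-1, 2:])
    isolated = mask & ~any_neighbor
    return float(isolated.sum()) / max(int(boundary(mask).sum()), 1)

def select_or_combine(spatial_rgb, spatial_mask, depth_rgb, depth_mask):
    """Pick the extraction with the cleaner edge; if the depth stream wins,
    keep the spatial stream's bulk pixels and take only the boundary pixels
    from the depth stream (one possible 'combined' stream)."""
    if edge_noise_score(spatial_mask) <= edge_noise_score(depth_mask):
        return spatial_rgb
    combined = spatial_rgb.copy()
    edge = boundary(depth_mask)
    combined[edge] = depth_rgb[edge]
    return combined
```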
In this ‘dual mode’ embodiment employing depth data and spatially filtered data, a mask and a logic operation can be utilized. The mask is provided by the depth map. Within the mask, the logical AND operation is executed between the RGB pixels from the depth sensor and the RGB pixels from the spatial filter. Pixels from the spatial filter outside the edges detected will be discarded. Pixels resulting from the AND operation within the silhouette will be passed on for processing, transmission, reception, transplantation and display. This results in better edge detection than conventional spatial filtering schemes without the data load typically associated with depth sensing of video images at real-time frame rates.
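A minimal Python sketch of one reading of this dual-mode combination is given below; the array shapes and names are assumptions made for illustration only.

```python
import numpy as np

def combine_dual_mode(rgb_spatial, spatial_fg, depth_mask):
    """One reading of the dual-mode combination described above.

    depth_mask:  H x W bool silhouette mask taken from the depth map.
    spatial_fg:  H x W bool foreground mask from the spatial filter (XNOR step).
    rgb_spatial: H x W x 3 uint8 spatially filtered colour image.
    The logical AND keeps only pixels that both streams agree belong to the
    user; spatial-filter pixels falling outside the depth edge are discarded.
    """
    keep = depth_mask & spatial_fg
    out = np.zeros_like(rgb_spatial)
    out[keep] = rgb_spatial[keep]
    return out, keep
```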
In this embodiment, a depth map is combined with the result of the exclusive NOR operation to yield a point cloud of RGB-D or RGBA-D data. This data stream is bundled with user location data in a VR or AR environment and forwarded through the processor 110 to a server 20 (Figure 3) for routing in accordance with the teachings of the above-referenced Benman patents, as discussed more fully below. However, the invention is not limited to the method by which the depth map data and the spatially filtered data are cross-correlated and/or combined.
Each of the modules in Figure 2 may be implemented in software and/or hardware. If implemented in software, the modules may be stored as code in memory 142 and executed by the processor 110.
The output of the edge analyzer module is fed into the processor 110 of
Figure 1.
Mobile Platform Implementation:
As shown in Figure 1, the platform 100 may be a PC, Smartphone, tablet or other suitable wireless computing and communications device. However, in the illustrative embodiment, the platform is a Smartphone or Tablet. In either case, the platform processor 110 communicates with a routing server 20 (see Figure 3) via a WiFi transceiver 120 and/or a cellular transceiver 130 in response to commands from a user via a conventional input/output interface 140.
Figure 3 is a block diagram of an illustrative implementation of a system 10 for capturing and displaying Silhouette imagery via the mobile wireless platform 100 of Figure 1 in connection with the teachings of the present invention. The system 10 includes the routing server 20 adapted to route extracted video and audio streams received from remote client machines in accordance with the above-referenced Benman patents and applications. The image and audio data streams are communicated between the server 20 and a platform client 100 via either a cellular network 60 or a WiFi receiver 80, an Internet Service Provider 40 and the Internet 30. Those of ordinary skill in the art will appreciate that the present teachings are not limited to the Internet and may be implemented on an Intranet or a circuit switched network without departing from the scope of the present teachings. In receive mode, the Silhouette applet 108 receives spatial filter enhanced RGB-D or RGBA-D streams from the routing server 20 and outputs the extracted image data to an onboard display (not shown) or a remote display via a WiFi transceiver 120 or a Bluetooth transceiver 132. Obviously, a wired connection may be used for this purpose as well.
In the illustrative embodiment, as shown in Figure 1, the Bluetooth transceiver 132 couples VR or AR enabled display glasses or goggles (not shown) to the mobile wireless platform 100 to output extracted images in one of three modes as discussed more fully below. In another embodiment, the display is an onboard 3D display with integrated eye-tracking capability such as that currently offered by LG as the DX2000 display.
Figure 4 is a block diagram of an illustrative embodiment of a display subsystem 50 adapted for use in connection with the present invention. The inventive display subsystem 50 includes a Bluetooth transceiver 52 coupled to a processor 54. In a goggle implementation, the processor 54 is coupled to a laser (not shown) or other mechanism adapted to output an image on the lens of goggles (not shown), glasses (not shown) or other display device 56 such as the screen of a Smartphone or tablet, free space display, desktop monitor or a standalone wired or wireless display.
In the goggle embodiment, in accordance with the present teachings, a miniature solid-state electronic compass 55 is included within the frame of the goggle along with an accelerometer 53 and an eye tracker 58. Eye tracking in goggles is known in the art. See SensoMotoric Instruments (SMI) of Boston, MA.
In the best mode, the goggles or virtual glasses worn by each user are optionally detected and electronically removed from the live avatar imagery depicted at the receiver. In addition, and as an alternative, the components of the inventive goggle system may be implemented as an add-on or retrofit for a user’s conventional glasses, prescription or otherwise. Ear buds or other audio output devices 57 are included as is common in the art.
Figure 5 is a flow diagram of an illustrative embodiment of the technique for capturing and displaying Silhouette images on mobile wireless platforms of the present invention. At step 202, a local user of the wireless platform 100 activates the Silhouette applet or application 108 (see Figures 1 and 2) and at step 204 logs into the Silhouette server 20. At this point, a usage monitor 205 is activated. The usage monitor runs preferably, but not necessarily, at the server 20 and maintains a database of data relating to the duration of time for which each live video avatar stream is received by the user in accordance with the teachings of the above-referenced Benman patents.
Multi-mode Operation:
In accordance with the present teachings, the system 100 is adapted to provide Silhouette live video avatar communication in a computer-generated (virtual reality or VR) environment, an augmented reality (AR) environment, or a simple video conferencing mode with or without the user’s background being extracted using the techniques disclosed herein.
In accordance with the present invention, after a successful login to the server at step 204 and account authentication, etc., the user is prompted to select a Silhouette environment mode for the session at step 206. In the best mode, a voice recognition system is provided to enable the user to select the desired mode via speech or voice input. Nonetheless, manual selection is contemplated within the scope of the present teachings as well.
If, at this step, the user selects a conventional videoconference, then at step 208, a sender with whom the user is ultimately connected is displayed in the sender’s actual environment in accordance with a typical conventional video-conferencing call, while the user may be seen on the remote end by the sender as extracted, without his or her actual background. This affords considerably more privacy for video conferencing compared to conventional solutions such as Zoom or Skype, in which the user’s home is on display in the background for a video-conferencing call that may be broadcast on television to millions, as was the case during the Covid-19 pandemic in 2020. This problem is obviated with the technology disclosed and claimed in the present application.
If at step 206 the user selects an augmented reality (i.e. a free-space) conference mode, then at step 210, the user is ultimately connected to one or more senders and each sender is displayed in the user’s actual environment. In this case, the remote senders are displayed via virtual goggles or a free space display. In the goggle mode, the positions of the senders are fixed in the user’s actual environment such that when the local user moves his or her head, the remote sender’s position in the local user’s environment remains unchanged. This compensation is achieved by the goggle processor 54 or the platform processor 110 using data from the three-axis linear and rotational accelerometer 53 and onboard compass 55 and thereby effects a geo-fixing of the sender in the user’s environment.
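For illustration only, the following Python sketch shows one simplified way the orientation data from the compass 55 and accelerometer 53 could be used to keep a remote sender geo-fixed on a display. The parameter names, field-of-view values and screen resolution are assumptions made for the example, not features of the disclosed goggle system.

```python
def geo_fixed_screen_position(anchor_yaw_deg, anchor_pitch_deg,
                              head_yaw_deg, head_pitch_deg,
                              fov_h_deg=90.0, fov_v_deg=60.0,
                              width=1920, height=1080):
    """Minimal sketch of geo-fixing a remote sender in the local environment.

    The sender is anchored at a fixed compass bearing and elevation in the
    room (anchor_*). Each frame, the goggle's compass/accelerometer report the
    wearer's current head orientation (head_*); subtracting the two gives the
    angular offset at which the avatar must be drawn so that it appears to
    stay put while the head moves. A return value of None means the anchor is
    currently outside the field of view.
    """
    def wrap(angle):                    # wrap an angle difference into [-180, 180)
        return (angle + 180.0) % 360.0 - 180.0

    dyaw = wrap(anchor_yaw_deg - head_yaw_deg)
    dpitch = anchor_pitch_deg - head_pitch_deg
    if abs(dyaw) > fov_h_deg / 2 or abs(dpitch) > fov_v_deg / 2:
        return None
    x = (0.5 + dyaw / fov_h_deg) * width     # horizontal offset from screen centre
    y = (0.5 - dpitch / fov_v_deg) * height  # vertical offset (screen y grows downward)
    return int(x), int(y)
```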
While the remote sender is geo-fixed in the local user’s environment, the remote sender remains free to move about in her environment. In the illustrative embodiment, this movement will cause the remote sender to move about in the local user’s environment as well assuming the remote user is utilizing the Silhouette technology disclosed herein.
In accordance with the present invention, in a blended reality mode, multiple cameras and/or depth sensors are deployed around the local user’s actual environment, or the remote sender’s actual environment, so that the user’s environment is accurately mapped to provide a corresponding virtual environment at 1:1 scale. In this case, the virtual environment may be shown or not shown (totally transparent) but simply used for reference as to the location of the participants, enabling their positions to be displayed in a realistic and accurate manner.
This would be particularly useful in certain applications such as a conference room, whereby the local user is able to sit at a real or virtual table and see each participant seated around the table as though present in the local user’s mixed or augmented reality space using Silhouette as described in the above-referenced Benman patents and modified herein.
If, at step 206, the user selects a Silhouette virtual conference, then at step 212, the user navigates a virtual world and, when in range and line of sight of other users, receives and processes live video avatar streams from other users at step 214. These ‘other users’ or ‘senders’ are then displayed in the virtual environment at step 216 as per the teachings of the above-referenced Benman patents which have been incorporated herein by reference.
The unique multi-mode conferencing capability of the present invention is illustrated in Figure 6. Figure 6(a) shows a sender in her actual background and depicts a conventional video conferencing image seen by a receiver per step 208 of Figure 5.
In the event the receiver selects at step 206 (Figure 5) either the augmented reality conferencing mode or the virtual conferencing mode, then a Silhouette image of the sender is extracted as shown in Figure 6(b). This extracted Silhouette may then be displayed in a computer generated virtual (3D) environment in the virtual conferencing mode of step 216 (Figure 5) as depicted in Figure 6(c).
In accordance with the present teachings, if the augmented reality conferencing mode is selected at step 206, the extracted Silhouette of the sender is depicted in the receiver’s actual environment as shown in Figure 6(d). In the best mode, this is achieved with the novel goggle system disclosed herein.
As an alternative, another technique may be used for a mixed or augmented reality free-space display, such as the Heliodisplay™ sold by IO2 Technology of San Bruno, California (http://www.io2technology.com/).
In the best mode, the extracted and transplanted live video avatars used for the augmented reality conference mode as well as the virtual conference mode are three-dimensional (3D) avatars. This is achieved using an onboard 3D depth sensing camera system, such as that provided by Apple’s iPhone X class smartphones with TrueDepth cameras, the HTC EVO 3D, LG Optimus 3D and Sharp Aquos SH-12C model smartphones, or a 2D camera and software processing such as the capability provided by Extreme Reality Ltd of Israel. As another alternative, an external camera, such as Microsoft’s Kinect, may be coupled, wirelessly or via a wired connection, to the platform to provide 3D imagery.
In addition, a particularly novel aspect of the present invention is the provision of a live 3D avatar in a video conferencing mode. In this ‘hyper-realism’ mode, implemented in software in accordance with the present teachings, a user’s background (either the sender’s, the receiver’s or another real-world environment) is rendered, preferably in three dimensions. The extracted Silhouette live avatars are then transplanted into the 3D (or 2D) rendering of the real-world environment for presentation in a virtual conferencing mode or augmented reality mode in accordance with the present teachings.
Returning to Figure 5, at step 218, during or after any session using any of the three above-described modes (video, mixed reality or virtual conferencing), the user is enabled to effect a mode switch. This may be achieved via a button or icon activation or via an audio (speech enabled) or video (gesture enabled) cue. If a mode switch is desired, at step 218 the system returns to mode selection step 206. If mode switching is not desired, then at step 218, the user is given an option to initiate a new call.
If, and whenever, a new call is desired, the system first enables voice or manual selection of a user from a contact list or phone number or virtual address (not shown) and again returns to step 206. If no new call or mode switch is desired, then at step 222 the session is terminated and at step 224 the user logs off.
Automatic Display and Camera Following:
In the best mode, the system 10 includes software stored in memory 142 for tracking the local user’s position in the local user’s actual environment and sending the incoming streams from the server to the local user’s smartphone, tablet, laptop, desktop, television, internet-enabled appliance, free space display, cave, etc., to allow the user to move about in his or her environment without interruption of the VR or AR conferencing session. In this process, the system will automatically activate each display, in a multiple display setup, as the user comes into range and looks in the direction of each display, using facial recognition technology. Simultaneously, the system will activate cameras, located on or near these devices or distributed throughout the user’s environment, to follow the user and provide continuous live video extraction of the user during the session, subject to muting per the voice or other commands and/or preferences of the user.
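By way of a non-limiting illustration, the short Python sketch below shows one way the display-selection step could be expressed. The Device class, positions, range and angle thresholds are hypothetical placeholders introduced for the example; in practice the positions and gaze direction would come from the tracking and facial recognition functions described above.

```python
from dataclasses import dataclass
import math

@dataclass
class Device:
    name: str
    x: float
    y: float            # device position in the room (metres), an assumed layout

def select_active_display(displays, user_x, user_y, gaze_deg,
                          max_range_m=4.0, max_angle_deg=30.0):
    """Sketch of choosing which display (and co-located camera) to activate.

    A display is a candidate if the tracked user is within range of it and the
    user's gaze direction points roughly toward it; the nearest candidate wins.
    """
    best, best_dist = None, float("inf")
    for d in displays:
        dx, dy = d.x - user_x, d.y - user_y
        dist = math.hypot(dx, dy)
        bearing = math.degrees(math.atan2(dy, dx))
        angle_off = abs((bearing - gaze_deg + 180.0) % 360.0 - 180.0)
        if dist <= max_range_m and angle_off <= max_angle_deg and dist < best_dist:
            best, best_dist = d, dist
    return best   # None means no display should be active right now
```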
VR to AR Transition and Vice Versa:
In an alternative ‘blended reality’ or ‘BR’ mode of operation, the system 10 is programmed to enable the user to move seamlessly from Silhouette VR mode to Silhouette AR mode. In this mode, a user might engage someone in the Nexos or some other VR environment using Silhouette and then continue the conference in the user’s real world environment in an AR mode and vice versa. The system 10 may effect this in a number of ways including simply automatically switching the incoming live video streams to the user’s AR display instead of the VR display in the manner disclosed above or upon voice command.
In the blended reality embodiment, the system 10 sends either a 2D rendering or a 3D rendering of the local user’s environment to the remote user(s) to enable navigation by the remote user(s) in the local user’s environment. This will require the user to scan his or her environment with a camera with software, preferably on the sending system, that converts the image to 3D. Many programs are currently available for this purpose; see, for example, Make3D.
In accordance with the present teachings, the phrase ‘navigation functionality’ means enabling User 2 to move around in User 1’s environment and vice versa. This can be accomplished using the iPhone X class phone or other environment scanner to capture each User’s environment. With an iPhone with a scanning app, User 1 can simply hold up the phone and turn around to capture a 360-degree view; the app then detects and renders the surfaces in the environment. Those surfaces are sent to User 2, allowing User 2 to navigate within User 1’s environment.
By ‘multi-user functionality’ we mean allowing multiple users to share the same environment simultaneously in real time using Silhouette. This would require each person’s stream to be sent to the others, as would be the case in a conference call but with streaming live video avatars per our technology.
Other Alternative Embodiments:
Thus, the invention has been disclosed as including a depth sensor for creating a depth map based first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the depth sensor; code fixed in a tangible medium for execution by the processor for extracting the depth map from the first environment to provide an extracted depth map based live video avatar; and a display system coupled to the processor for showing the extracted depth map based live video avatar in a second environment diverse from the first environment.
In a second embodiment, the system has been disclosed as including a camera coupled to the processor to provide live video images of the user in the first environment and code for spatially filtering the images to provide a spatially filtered extracted second live video avatar. This embodiment further includes code for combining the first live video avatar with the second live video avatar to provide an enhanced extracted depth map based third live video avatar. Images from multiple cameras and/or depth sensors are combined simultaneously to provide the third live video avatar using the spatially enhanced extracted depth map.
In a third embodiment, the inventive system includes code for extracting a live video avatar from film or video. Another embodiment includes an arrangement with multiple displays for sensing a position of a user with automatic camera, display, microphone and/or speaker activation and switching based on user position and viewing angle.
A routing server is disclosed for receiving streams from multiple users and sending to each user the live video avatar images from other users based on their locations in a shared space or for use in a local user’s AR environment. The display may be holographic, distributed, free space and/or optical (glass or goggles). In the best mode, an arrangement is included for providing a heads up display showing where users are onscreen.
The system can include code for enabling voice activation along with code for enabling automatic signaling by which navigation into someone’s virtual space prior to connecting through the routing server will ping (via text or call) his or her phone to meet you at your coordinates in the virtual world from wherever he or she is in reality. The system can include code for effecting gaze correction, beautification and/or age reduction. The software can include code for providing a heads-up display showing where users are onscreen, hyper-realism (enhancement of augmented reality environments), persistent (always present in the second environment) experience, age and gender filtering. Further, code may be included for enabling automatic signaling by which navigation into someone’s virtual room or office will ping his or her phone to meet you there wherever he or she is in reality.
The present invention has been described herein with reference to particular embodiments for a particular application. Those having ordinary skill in the art and access to the present teachings will recognize additional modifications, applications and embodiments within the scope thereof. For example, it should be noted that Silhouette’s live video user stream can be interlaced with a virtual key manually passed in-world to provide for access to secure computers, systems and/or network assets.
Interworld Operability
As discussed more fully below, Silhouette functionality enables inter-world operability. That is, the present teachings allow a user to move from a first metaverse to a second metaverse, regardless of whether each metaverse operates on a different runtime architecture, framework, engine or protocol such as Unity, Unreal, X3D, Web-XR and others.
Figure 7 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention. Figure 7 shows plural metaverses including, by way of example, Unity, Web-XR, X3D, Unreal and a generic server representing every other server type. The metaverses depicted in Figure 7 are exemplary of a number of metaverses currently known and used. The present teachings are not limited to use with the metaverse servers depicted in Figure 7. Numerous additional types of metaverses may be used without departing from the scope of the present teachings inasmuch as the present invention is adapted to operate with metaverses of any run time architecture, framework, engine or protocol.
Each metaverse server is typically implemented in software stored on a tangible medium for execution by an onboard processor. Each metaverse is typically, though not necessarily, mounted within a unique housing. In any case, each metaverse provides a platform for a variety of users to enter and experience a virtual reality or gaming environment. Each metaverse can host millions of users. In Figure 7, plural users are illustrated as being operationally coupled to a respective metaverse via one of a plurality of user platforms.
The user platforms are typically a desktop or laptop computer or a mobile device such as a tablet or a smartphone. In any case, Figure 7 shows multiple such user platforms operationally coupled to each of the metaverses.
Conventionally, the user platform also typically includes a headset through which the user is enabled to view the chosen metaverse in an immersive manner. As noted above, while the user experience with a headset can be rich, headsets necessitate a rendering of the actual user as a cartoonish avatar, fantasy character (typically for gaming) or a lifelike computer-generated replica. The cartoon avatars and fantasy characters interfere with a sense of realism in the environment. While lifelike computer-generated replicas are typically more realistic, they are not yet able to convince the human brain of their realism and cause a well-known and disturbing ‘uncanny valley’ experience on the part of the user. (For more on the uncanny valley effect, see, for example, the “Uncanny valley” article in Wikipedia.)
To address this problem, many designers of metaverses have endeavored to make the replicas more realistic. However, the rendering of lifelike replicas places a computational burden on the host processor which limits the scalability of the metaverse.
To address this problem, the present invention provides Silhouette live video streaming technology to such diverse and independent metaverses. As described in the above-referenced Benman patents, the teachings of which have been incorporated herein by reference, Silhouette makes a web camera intelligent, allowing it to capture live video of a user and extract the user’s video from the user’s background environment. Silhouette then combines the user’s live video data stream with the user’s synchronized audio, along with orientation and position data, and sends it to a dedicated routing server for duplication as necessary and routing to other users within range and line of sight in the virtual world or metaverse.
Previously, Silhouette has been disclosed for use with a dedicated and operationally coupled virtual world server. In accordance with the present teachings, a system and method are disclosed for extending Silhouette functionality to diverse and sundry off-platform metaverses such as those depicted in Figure 7 by way of illustration.
As shown in Figure 7, this is accomplished by a Silhouette routing server implemented in accordance with the teachings of the above-referenced Benman patents and a portal interface. In the best mode, the portal interface is implemented on the same platform as the routing server. However, the portal interface may be implemented within each metaverse without departing from the scope of the present teachings.
The portal interface serves to provide a uniform data stream to the routing server (or, in an alternative embodiment discussed below, a user platform) regardless of the run time architecture, framework, engine or protocol of the off-platform metaverse to which the routing server is operationally coupled. This is accomplished by converting the incoming data stream from each diverse metaverse into a single protocol such as X3D, by way of example. This conversion may be performed by the portal interface; however, in the best mode, the conversion is performed on the metaverse platforms, thereby freeing the portal interface to perform other functions such as compressing, decompressing, encrypting, decrypting, and directing data streams between the metaverses and the routing server. Online real-time protocol converters are known in the art; see, for example, the converters listed below (a simplified sketch of the portal-side normalization step follows the list):
1. the InstantLabs online converter; and
2. Any Conv.
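For illustration only, the following Python sketch shows one way the portal interface’s normalization step could be organized: a per-engine adapter converts each metaverse’s native update into a single common envelope before it reaches the routing server. The envelope fields and the adapter registry are assumptions made for the example and do not define the actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class NormalizedStream:
    """Common envelope handed to the routing server, regardless of the
    source metaverse's engine or protocol. Field names are illustrative."""
    user_id: str
    metaverse_id: str
    position: tuple        # (x, y, z) in the source world's local units
    orientation: tuple     # three-axis rotation of the avatar
    payload: bytes         # compressed live video avatar frame plus audio

# One adapter per supported engine (Unity, Web-XR, X3D, Unreal, ...); each
# converts that engine's native update message into the common envelope.
ADAPTERS: Dict[str, Callable[[dict], NormalizedStream]] = {}

def register_adapter(engine: str, adapter: Callable[[dict], NormalizedStream]) -> None:
    """Register a per-engine adapter, assumed to be supplied by or for each metaverse."""
    ADAPTERS[engine] = adapter

def portal_normalize(engine: str, native_message: dict) -> NormalizedStream:
    """Dispatch an incoming native update to the matching adapter so the
    routing server only ever sees the common envelope."""
    if engine not in ADAPTERS:
        raise ValueError(f"no adapter registered for engine '{engine}'")
    return ADAPTERS[engine](native_message)
```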
In any case, each metaverse delivers data to the routing server via the portal interface including a user’s live video avatar or ‘silhouette’. This is made possible by the deployment of Silhouette extraction and transplantation modules to each metaverse or on the user’s platforms through each metaverse. The Silhouette extraction and transplantation technology is disclosed in detail in the Benman patents incorporated herein by reference.
In the illustrative embodiment, a Silhouette applet is distributed to each user by a host metaverse. The applet or module may provide for extraction and transplantation or some subset thereof depending on the extent to which the metaverse operator desires to perform these functions on the metaverse server. In the best mode, the applets are deployed to the user platforms and operate on the client side to minimize the load on the metaverse server processors. Figure 8 below illustrates various options for deployment of the Silhouette modules within each metaverse ecosystem in accordance with the present teachings.
Figure 8 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention with Silhouette extraction and transplantation functionality distributed throughout the ecosystem in various implementations. In Figure 8, ‘SET’ represents a complete silhouette module adapted to perform extraction and transplantation functions. ‘SE’ represents a module limited to performing extraction only and ‘ST’ represents a module adapted to perform the transplantation function only.
In any case, each Silhouette module, whether located at the metaverse server or on the user platform, in whole or in part (see Figure 9 below), performs the functions of extracting the user’s personal live video image and audio stream and sending it, along with any other multimedia the user desires to stream in a blended reality or hyper-reality mode as discussed more fully below, and of transplanting received streams into the user’s chosen metaverse environment.
Figure 9 is a block diagram showing the interworld portal interface of Figure 7 in more detail. In the illustrative embodiment, the interworld portal interface is implemented in software stored on a tangible medium and executed by a processor and includes a platform coordinate interface module. The platform coordinate interface module receives metaverse avatar coordinate data from each metaverse and sends it to the routing server via a coordinate translator. For this purpose, an N-dimensional grid or addressing system may be employed by which each external metaverse is assigned a coordinate location or address at which the entire grid of the external metaverse is located. This function of assigning the metaverse address in the N-dimensional grid can be performed by the routing server or the coordinate interface. In the best mode, this function is performed by the routing server as each external metaverse is registered in the system.
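By way of illustration, the Python sketch below shows one simple form such an addressing scheme could take: each registered metaverse receives an origin (and, anticipating the scaling discussed below, a scale factor) in the routing server’s global grid, and the coordinate translator maps avatar positions between local and global coordinates. The field names and the linear mapping are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class MetaverseRegistration:
    """Assumed registry entry: each external metaverse is given an origin in
    the routing server's global grid plus a scale factor relative to it."""
    origin: tuple        # (X, Y, Z) offset of the metaverse's grid within the global grid
    scale: float = 1.0   # size of one local unit expressed in global-grid units

def to_global(reg: MetaverseRegistration, local_xyz):
    """Translate an avatar position from a metaverse's local coordinates into
    the global grid maintained by the routing server."""
    return tuple(o + reg.scale * c for o, c in zip(reg.origin, local_xyz))

def to_local(reg: MetaverseRegistration, global_xyz):
    """Inverse translation, used when distributing positions back out to a metaverse."""
    return tuple((g - o) / reg.scale for o, g in zip(reg.origin, global_xyz))
```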
In addition, the platform coordinate interface module receives position coordinate data from the routing server, through the coordinate translator, for distribution as necessary to other user platforms that are determined by the routing server as intended recipients for each received multimedia stream. Outgoing coordinate translation is key for interworld and multi-world operability as discussed more fully below.
The platform coordinate interface module also includes an avatar orientation and scaling interface that receives data from each metaverse as to the three-axis orientation (e.g., a rotation angle for each axis of rotation) of each avatar and the scaling to be employed (e.g., x, y, z dimensions) at each of the received coordinates. This is to account for any differences in scale among the external metaverses being served by the Silhouette routing server.
As an alternative, an optional world server interface is employed by which a copy of a host metaverse is stored on the portal interface and serves the function of ascertaining the necessary orientation and scaling of each outgoing silhouette stream as well as line of sight between avatars as is needed to determine to whom the outgoing streams are to be sent.
An optional avatar transformation controller is also included in the portal interface to facilitate smooth avatar switching functionality onboard between silhouette avatar and host world avatar types at the option of the user. This eliminates the need for each metaverse operator to develop a system for performing this function onsite.
Incoming and outgoing management of Silhouette streams is handled by a stream routing server interface, a stream location interface and an audio/video stream module under the control of the Silhouette routing and world server module. The Silhouette routing and world server module determines the coordinates to which each incoming stream is to be directed and passes the coordinates to the stream routing server interface. The stream routing server interface passes the stream receiving coordinates to the audio/video stream location interface and the Silhouette audio/video streaming module. The Silhouette audio/video streaming module sends and receives the Silhouette streams, while the audio/video stream location interface provides IP addressing to the audio/video streaming module for each outgoing stream packet as each packet of data is sent by the audio/video streaming unit.
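For illustration only, the following Python sketch shows one simplified form the routing decision could take: the sender’s packet is duplicated to every other user within a predetermined range and with a clear line of sight. The data layout, the range value and the line-of-sight stub are assumptions made for the example; in the disclosed system the line-of-sight determination relies on the world geometry held by the world server.

```python
import math

def route_stream(sender, recipients, packet,
                 max_range=30.0, has_line_of_sight=lambda a, b: True):
    """Minimal sketch of the routing decision: duplicate the sender's
    audio/video packet to every other user in range and in line of sight.

    sender / recipients are dicts carrying global-grid positions (as produced
    by the coordinate translator), a user_id and, for recipients, an
    ip_address used by the streaming module for delivery.
    """
    deliveries = []
    sx, sy, sz = sender["position"]
    for user in recipients:
        if user["user_id"] == sender["user_id"]:
            continue
        ux, uy, uz = user["position"]
        dist = math.sqrt((ux - sx) ** 2 + (uy - sy) ** 2 + (uz - sz) ** 2)
        if dist <= max_range and has_line_of_sight(sender, user):
            deliveries.append((user["ip_address"], packet))   # one duplicate per recipient
    return deliveries
```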
Thus, in the best mode, a live video (Silhouette) avatar is streamed from a first user in any metaverse using any run time architecture or framework such as Unity, Web-XR, X3D, etc., and is extracted in accordance with the teachings of the above-referenced Benman patents nos. 5,966,130 and 6,798,407, the teachings of which are incorporated into this application by reference. The extracted live video/audio avatar stream is forwarded to the Silhouette routing server and duplicated as necessary to provide a stream to other users in-world within a predetermined range of the first user’s location and with a clear line of sight in-world.
Interoperability:
Those skilled in the art can appreciate that the present invention provides interoperability between metaverses allowing a user to appear in multiple metaverses with one avatar. In the simple case, a user (User #1) in one metaverse (Metaverse #1) simply logs into another metaverse (Metaverse #2) and selects a silhouette avatar type when presented with an avatar option using an avatar switching module provided by the metaverse platform for the Silhouette routing server platform. Thereafter, the user can use his or her silhouette as their avatar in the second metaverse just as it is employed in the first metaverse. This simple case is depicted in Figure 10.
Figure 10 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in a simple interoperable mode of operation by which a user (User #1) moves from a first metaverse (Metaverse #1) to a second metaverse by simply logging into the second metaverse (Metaverse #2) and selecting silhouette avatar functionality in accordance with the present teachings.
However, as an alternative, a user can move from one metaverse into another and, if it is desired, experience both metaverses simultaneously using the ‘passport’ feature of the present invention.
In a passport mode of operation, the user moves from one metaverse to another via the portal interface of Figures 7 - 9. In this case, the portal interface handles authentication of the user to enter a second and/or third metaverse from the first selected (e.g., home) metaverse.
This passport mode offers several additional features including the ability to stream a user’s ‘micro-verse’ from the first metaverse to the second metaverse. As defined herein, a ‘micro-verse’ is a portion of the user’s environment in the virtual world of the first metaverse, or in the user’s actual real-world environment as is or as scanned and rendered in hyper-realistic mode, in which other users can navigate as well.
The benefit of having one’s actual environment scanned and rendered in 3D is that it enables visitors to navigate within one’s local environment via their silhouettes and be seen by the host in the host’s actual real world environment using 3D enabled glasses or other immersive display technology. In accordance with the present teachings, this mode of operation is referred to as ‘blended reality’ mode and the environment in which it is enabled is herein referred to as being ‘hyper-realistic’.
In any case, the present teachings provide a passport mode of operation that enables each user to appear in multiple metaverses simultaneously, with or without silhouette avatars and with or without micro-verses, inasmuch as the system of the present invention is adapted to stream any multimedia content from one metaverse to another under the control of the end user as to metaverse(s), location within metaverses, avatar type, and environment type (e.g., hyper-realistic or not). This is depicted in Figure 11.
Figure 11 is a high-level block diagram showing an illustrative embodiment of the interoperable system of the present invention in an alternative interoperable mode of operation by which a user’s (User #1’s) multimedia (extracted video and audio stream represented generally as a silhouette 70) is sent to multiple metaverses (#2 - 5) via User #1’s home metaverse (Metaverse #1) through the portal interface and routing server in accordance with the present teachings. As a result, users in any other Metaverses 2 - 5, as selected by User #1, are able to see User #1 in Metaverses 2 - 5 transplanted as a live video silhouette avatar along with any additional multimedia content chosen for transmission by User #1 in accordance with the present teachings.
In short, a system is disclosed for streaming multimedia content into a metaverse comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the server being implemented with software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client machines whereby multimedia from the first user is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
The routing server may be operationally coupled to the metaverse server or directly to the first and second client platforms.
The routing server is adapted to be operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings. The routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user, operationally coupled to the second metaverse, to view and hear multimedia content from the first or second user in the second metaverse.
In any case, the routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
The present invention has been described herein with reference to a particular embodiment for a particular application. Those having ordinary skill in the art and access to the present teachings will recognize additional modifications, applications and embodiments within the scope thereof. For example, the portal interface can be distributed between the metaverses. In addition, the portal interface can be integrated into the routing server without departing from the scope of the present teachings. It is therefore intended by the appended claims to cover any and all such applications, modifications and embodiments within the scope of the present invention.
All elements, parts, and steps described herein are preferably included. It is to be understood that any of these elements, parts and steps may be replaced by other elements, parts, and steps or deleted altogether as will be obvious to those skilled in the art.
The foregoing description of the technology has been presented for purposes of illustration and description and is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments disclosed were meant only to explain the principles of the technology and its practical application to thereby enable others skilled in the art to best use the technology in various embodiments and with various modifications suited to the particular use contemplated. The scope of the technology is to be defined by the following claims.
Broadly, this writing discloses a system for extracting and transplanting live video avatar images providing silhouette live video avatar and a system for providing multimedia service to external metaverses and client platforms. The system for extracting and transplanting live video avatar images including a sensor for creating a first live video avatar of a user or object disposed in a heterogeneous first environment with an arbitrary background; a processor coupled to the sensor; code fixed in a tangible medium for execution by the processor for extracting a map from the first environment to provide an extracted map based live video avatar; and a display system coupled to the processor for showing the extracted map based live video avatar in a second environment diverse from the first environment. The code further includes code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment. The system for providing silhouette live video avatar and multimedia service includes a server with metaverse software executed by a processor to provide an artificial reality environment in accordance with a first operational paradigm. A first client platform is operationally coupled to the server for enabling a first user to experience the artificial environment. A second client platform is operationally coupled to the server for enabling a second user to experience the artificial environment. A routing server is operationally coupled to the first and second client machines to route multimedia from the first user so that it is displayed to the second user in the artificial environment at a location provided by the routing server. To provide interworld operability, the routing server is operationally coupled to a second metaverse server to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
This writing also discloses the following implementations.
As a first implementation is a system for extracting and transplanting live video image streams comprising: an image sensor for providing a live video image stream of a first user or object disposed in a heterogeneous first environment with an arbitrary background; a processor operationally coupled to the image sensor to receive the live video image stream; code stored in a non-transitory tangible medium for execution by the processor for extracting a live video image stream of the first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted live video image stream of the first user; a mobile display system coupled to the processor for showing the extracted live video image stream to a second user in a second environment separate and distinct from the first environment, said second environment being an augmented reality environment including at least part of said second user’s second environment; and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
A further implementation of any of the preceding or following implementations occurs in which the image sensor includes a depth sensor.
A further implementation of any of the preceding or following implementations occurs wherein the code further includes code for combining the first live video image stream with a second live video image stream from the depth sensor to provide an enhanced extracted depth map based third live video image stream.
A further implementation of any of the preceding or following implementations occurs wherein the code further includes code for combining images from multiple cameras or depth sensors simultaneously to provide the third live video image stream.
A further implementation of any of the preceding or following implementations wherein there are multiple displays.
A further implementation of any of the preceding or following implementations further including an arrangement for sensing a position of a user with a camera or microphone and automatically selectively activating a display based on user position and viewing angle in response thereto.
A further implementation of any of the preceding or following implementations further including multiple cameras.
A further implementation of any of the preceding or following implementations wherein the code further includes code for effecting automatic camera activation based on a user’s position in a user’s environment. A further implementation of any of the preceding or following implementations further including an arrangement for sending the extracted live video image stream from the first user to the second user via a routing server.
A further implementation of any of the preceding or following implementations including a second arrangement for receiving said extracted live video image stream from said routing server and displaying the live video stream to the second user.
A further implementation of any of the preceding or following implementations wherein the display includes augmented reality goggles, augmented reality glasses or a free space display.
A further implementation of any of the preceding or following implementations wherein the processor is mounted on a first platform and the display is mounted on a second physically separate platform.
A further implementation of any of the preceding or following implementations wherein the second platform includes a second processor for executing code fixed in a non-transitory tangible medium for effecting a user selectable multimode display operation, said multimode operation including a video conferencing mode, a virtual conferencing mode and a mixed reality conferencing mode.
A further implementation of any of the preceding or following implementations wherein the code further includes code for displaying extracted image data in each of said modes.
A further implementation of any of the preceding or following implementations wherein the system includes a system for transplanting the extracted live video image stream into a computer rendered hyper-realistic augmented reality representation of a user’s environment.
A further implementation of any of the preceding or following implementations wherein the code further includes code for enabling a user to experience said hyper-realistic augmented reality representation of the user’s environment as a blended reality environment.
A further implementation of any of the preceding or following implementations wherein the code further includes code for enabling a second user to be present in said hyper-realistic environment by which the first user’s environment is rendered in virtual reality.
As a further implementation is a system implementation for extracting and transplanting live video image streams comprising: a computing and communications platform; a sensor coupled to the platform for creating first live video image stream of a user or object disposed in a heterogeneous first environment with an arbitrary background; a first processor coupled to the sensor; code stored in a non-transitory tangible medium for execution by the first processor for extracting live video image stream of a first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted depth map based live video avatar; a routing server; a second platform having a second processor coupled to the routing server; and code stored in a non-transitory tangible medium on the second platform for receiving the live video image stream from the routing server and for causing a display system coupled to the second processor to show the extracted live video image stream in a second environment independent from the first environment, said second environment being an augmented reality environment and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
A further implementation is a system for streaming multimedia content into a metaverse comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the server being implemented with software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client machines whereby multimedia from the first user is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
A further implementation of any of the preceding or following system implementations occurs wherein the routing server is operationally coupled to the metaverse server.
A further implementation of any of the preceding or following system implementations occurs wherein the routing server is operationally coupled directly to the first and second client platforms.
A further implementation of any of the preceding or following system implementations occurs wherein the routing server is operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings.
A further implementation of any of the preceding or following system implementations occurs wherein the routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
A further implementation of any of the preceding or following system implementations occurs wherein the routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
As a further implementation is a system for streaming multimedia content from a first platform to a second platform comprising: a first platform for sending and receiving multimedia content; a second platform operationally coupled to the first platform for sending and receiving multimedia content; and software stored on a medium on the first and second platforms adapted for execution by first and second processors on the first and second platforms respectively for streaming multimedia content from the first platform into an artificial reality environment for display on the second platform.
A further implementation of any of the preceding or following implementations occurs wherein the software further includes code for execution by the first and second processors for streaming multimedia content from the second platform into the artificial reality environment for display on the first platform. A further implementation of any of the preceding or following implementations occurs wherein the multimedia content is a real time video data stream with synchronized audio.
A further implementation of any of the preceding or following implementations occurs wherein the multimedia content is live video imagery of a user along with audio and position data.
A further implementation of any of the preceding or following implementations occurs wherein the first and second platforms are client platforms.
A further implementation of any of the preceding or following implementations occurs wherein the artificial reality environment is on a server.
A further implementation of any of the preceding or following implementations occurs wherein a server is included for routing the streaming multimedia content between the first and the second client platforms.
A further implementation of any of the preceding or following implementations occurs wherein the first client platform is coupled to the server via a first metaverse and the second client platform is coupled to the server via a second metaverse.
A further implementation of any of the preceding or following implementations occurs wherein the first and second metaverses are stored on first and second metaverse servers respectively.
A further implementation of any of the preceding or following implementations occurs wherein the first and second metaverses are implemented by first and second processors mounted on first and second independent systems respectively executing software stored on the first and second independent systems whereby the first and second metaverses operate in accordance with first and second, diverse run time architectures, frameworks, engines or protocols respectively. A further implementation of any of the preceding or following system implementations occurs wherein the artificial reality environment is an augmented reality environment.
A further implementation of any of the preceding or following system implementations occurs wherein the location of the streaming multimedia content in a virtual or augmented reality world is determined by a transmitting platform user as it is transmitted by the first or second platform.
A further implementation of any of the preceding or following system implementations occurs wherein the location of the streaming multimedia content in the virtual or augmented reality world is determined by a receiving platform user as it is received by the second or the first platform.
As a further implementation there is a method for creating an interworld avatar and using the avatar to navigate between virtual worlds on disparate platforms including the steps of: providing at least one client machine; providing at least one world server; providing at least one routing server; interconnecting each of the servers and connecting at least one of the servers to the client machine; and executing software stored on a tangible medium with a processor on the client machine or one of the servers to provide a live video avatar for use in a world provided by the world server via the routing server.
It is possible that in the examination and ultimate allowance of this writing as a patent, some text may have been omitted by requirement of the jurisdiction examining this writing. In interpreting this writing, the original text without deletions is to be used.
Amendments, alterations, or characterizations made in order to expedite allowance are to be considered to have been made without any prejudice, waiver, disclaimer, or estoppel, and without forfeiture or dedication to the public of any subject matter as originally presented.
Reviewers of this writing or any related writing shall not reasonably infer any disclaimers or disavowals of any subject matter as originally contained herein. To the extent any amendments, alterations, characterizations, or other assertions previously made in this or in any related writing with respect to any art, prior or otherwise, could be construed as a disclaimer of any subject matter supported by the original writing herein, any such disclaimer is hereby rescinded and retracted.

Claims

1. A system for extracting and transplanting live video image streams comprising: an image sensor for providing a live video image stream of a first user or object disposed in a heterogeneous first environment with an arbitrary background; a processor operationally coupled to the image sensor to receive the live video image stream; code stored in a non-transitory tangible medium for execution by the processor for extracting a live video image stream of the first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted live video image stream of the first user; a mobile display system coupled to the processor for showing the extracted live video image stream to a second user in a second environment separate and distinct from the first environment, said second environment being an augmented reality environment including at least part of said second user’s second environment; and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
2. The system of Claim 1 wherein the image sensor includes a depth sensor.
3. The system of Claim 2 wherein the code further includes code for combining the first live video image stream with a second live video image stream from the depth sensor to provide an enhanced extracted depth map based third live video image stream.
4. The system of Claim 3 wherein the code further includes code for combining images from multiple cameras or depth sensors simultaneously to provide the third live video image stream.
5. The system of Claim 1 further including multiple displays.
6. The system of Claim 5 including an arrangement for sensing a position of a user with a camera or microphone and automatically selectively activating a display based on user position and viewing angle in response thereto.
7. The system of Claim 1 further including multiple cameras.
8. The system of Claim 7 wherein the code further includes code for effecting automatic camera activation based on a user’s position in a user’s environment.
9. The system of Claim 1 further including an arrangement for sending the extracted live video image stream from the first user to the second user via a routing server.
10. The system of Claim 9 further including a second arrangement for receiving said extracted live video image stream from said routing server and displaying the live video stream to the second user.
11. The system of Claim 1 wherein the display includes augmented reality goggles, augmented reality glasses or a free space display.
12. The system of Claim 1 wherein the processor is mounted on a first platform and the display is mounted on a second physically separate platform.
13. The system of Claim 12 wherein the second platform includes a second processor for executing code fixed in a non-transitory tangible medium for effecting a user selectable multimode display operation, said multimode operation including a video conferencing mode, a virtual conferencing mode and a mixed reality conferencing mode.
14. The system of Claim 13 wherein the code further includes code for displaying extracted image data in each of said modes.
15. The system of Claim 1 wherein the system includes a system for transplanting the extracted live video image stream into a computer rendered hyper-realistic augmented reality representation of a user’s environment.
16. The system of Claim 15 wherein the code further includes code for enabling a user to experience said hyper-realistic augmented reality representation of the user’s environment as a blended reality environment.
17. The system of Claim 15 wherein the code further includes code for enabling a second user to be present in said hyper-realistic environment in which the first user’s environment is rendered in virtual reality.
18. A system for extracting and transplanting live video image streams comprising: a computing and communications platform; a sensor coupled to the platform for creating a first live video image stream of a user or object disposed in a heterogeneous first environment with an arbitrary background; a first processor coupled to the sensor; code stored in a non-transitory tangible medium for execution by the first processor for extracting a live video image stream of a first user from the live video image stream of the arbitrary background of the heterogeneous first environment to provide an extracted depth map based live video avatar; a routing server; a second platform having a second processor coupled to the routing server; and code stored in a non-transitory tangible medium on the second platform for receiving the live video image stream from the routing server and for causing a display system coupled to the second processor to show the extracted live video image stream in a second environment independent from the first environment, said second environment being an augmented reality environment and said code further including code for geo-fixing a location of the live video image stream of the first user in the second environment for viewing by the second user, whereby movement of the mobile display by the second user does not change the position of the first user in the second user’s environment.
19. A system for streaming multimedia content into a metaverse comprising: a metaverse server for providing an artificial reality environment in accordance with a first operational paradigm, the metaverse server being implemented in software fixed on a tangible medium and adapted to be executed by a processor mounted in a first housing; a first client platform operationally coupled to the server for enabling a first user to experience the artificial environment, the first client platform being implemented in software fixed on a tangible medium and adapted to be executed by a first processor mounted within a second housing; a second client platform operationally coupled to the server for enabling a second user to experience the artificial environment, the second client platform being implemented in software fixed on a tangible medium and adapted to be executed by a second processor mounted within a third housing; and a routing server implemented in software fixed in a tangible medium and executed by a processor mounted within a fourth housing physically independent from the first, second and third housings, the routing server operationally coupled to the first and second client platforms whereby multimedia from the first user is displayed to the second user in the artificial environment at a location determined by the first user and executed by the routing server.
20. The system of Claim 19 wherein the routing server is operationally coupled to the metaverse server.
21. The system of Claim 19 wherein the routing server is operationally coupled directly to the first and second client platforms.
22. The system of Claim 19 wherein the routing server is operationally coupled to a second metaverse server operating on a fifth platform in accordance with a second operational paradigm via software executed by a processor mounted within a fifth housing physically independent from the first, second, third and fourth housings.
23. The system of Claim 22 wherein the routing server is adapted to route a real time multimedia stream from the first or the second user into the second metaverse allowing a third user operationally coupled to the second metaverse to view and hear multimedia content from the first or second user in the second metaverse.
24. The system of Claim 23 wherein the routing server is adapted to route real time multimedia content from the first or second user to the third user through the first and second metaverses operating in accordance with the first and second operational paradigms respectively.
25. A system for streaming multimedia content from a first platform to a second platform comprising: a first platform for sending and receiving multimedia content; a second platform operationally coupled to the first platform for sending and receiving multimedia content; and software stored on a medium on the first and second platforms adapted for execution by first and second processors on the first and second platforms respectively for streaming multimedia content from the first platform into an artificial reality environment for display on the second platform.
26. The system of Claim 25 wherein the software further includes code for execution by the first and second processors for streaming multimedia content from the second platform into the artificial reality environment for display on the first platform.
27. The system of Claim 25 wherein the multimedia content is a real time video data stream with synchronized audio.
28. The system of Claim 27 wherein the multimedia content is live video imagery of a user along with audio and position data.
29. The system of Claim 25 wherein the first and second platforms are client platforms.
30. The system of Claim 25 wherein the artificial reality environment is on a server.
31. The system of Claim 25 further including a server for routing the streaming multimedia content between the first and the second client platforms.
32. The system of Claim 31 wherein the first client platform is coupled to the server via a first metaverse and the second client platform is coupled to the server via a second metaverse.
33. The system of Claim 32 wherein the first and second metaverses are stored on first and second metaverse servers respectively.
34. The system of Claim 33 wherein the first and second metaverses are implemented by first and second processors mounted on first and second independent systems respectively executing software stored on the first and second independent systems whereby the first and second metaverses operate in accordance with first and second, diverse run time architectures, frameworks, engines or protocols respectively.
35. The system of Claim 25 wherein the artificial reality environment is an augmented reality environment.
36. The system of Claim 25 wherein the location of the streaming multimedia content in a virtual or augmented reality world is determined by a transmitting platform user as it is transmitted by the first or second platform.
37. The system of Claim 25 wherein the location of the streaming multimedia content in the virtual or augmented reality world is determined by a receiving platform user as it is received by the second or the first platform.
38. A method for creating an interworld avatar and using the avatar to navigate between virtual worlds on disparate platforms including the steps of: providing at least one client machine; providing at least one world server; providing at least one routing server; interconnecting each of the servers and connecting at least one of the servers to the client machine; and executing software stored on a tangible medium with a processor on the client machine or one of the servers to provide a live video avatar for use in a world provided by the world server via the routing server.
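Claims 1 through 4 above turn on extracting the live image of a user from an arbitrary, heterogeneous background with the help of a depth sensor. The short Python sketch below illustrates depth-keyed extraction in general under assumed inputs, an aligned RGB frame and a metric depth map, together with an assumed foreground depth band; it is an illustration of the general technique, not the claimed implementation.

import numpy as np


def extract_live_avatar(rgb: np.ndarray,
                        depth_m: np.ndarray,
                        near_m: float = 0.3,
                        far_m: float = 1.5) -> np.ndarray:
    """rgb: (H, W, 3) uint8; depth_m: (H, W) float32 depth in metres.

    Returns an (H, W, 4) uint8 RGBA frame in which pixels outside the
    foreground depth band are fully transparent, ready to be transplanted
    into a second environment.
    """
    foreground = (depth_m > near_m) & (depth_m < far_m)
    alpha = (foreground * 255).astype(np.uint8)
    return np.dstack([rgb, alpha])

A production pipeline would typically refine this binary mask with matting and temporal smoothing before encoding the RGBA stream for transmission through a routing server.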
PCT/US2021/062965 2021-10-28 2021-12-10 System and method for extracting, transplanting live images for streaming blended, hyper-realistic reality WO2023075810A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163272859P 2021-10-28 2021-10-28
US63/272,859 2021-10-28
US202117537246A 2021-11-29 2021-11-29
US17/537,246 2021-11-29

Publications (1)

Publication Number Publication Date
WO2023075810A1 true WO2023075810A1 (en) 2023-05-04

Family

ID=86158410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/062965 WO2023075810A1 (en) 2021-10-28 2021-12-10 System and method for extracting, transplanting live images for streaming blended, hyper-realistic reality

Country Status (1)

Country Link
WO (1) WO2023075810A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5966130A (en) * 1994-05-12 1999-10-12 Benman, Jr.; William J. Integrated virtual networks
US20080278474A1 (en) * 1999-07-29 2008-11-13 Benman William J System and method for volumetric display of video images extracted from arbitrary background environments
US6798407B1 (en) * 2000-11-28 2004-09-28 William J. Benman System and method for providing a functional virtual environment with real time extracted and transplanted images
US20090128555A1 (en) * 2007-11-05 2009-05-21 Benman William J System and method for creating and using live three-dimensional avatars and interworld operability
US20190188895A1 (en) * 2017-12-14 2019-06-20 Magic Leap, Inc. Contextual-based rendering of virtual avatars
US20210392292A1 (en) * 2020-06-12 2021-12-16 William J. Benman System and method for extracting and transplanting live video avatar images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SANDOFSKY BEN: "iPhone XR: A Deep Dive Into Depth", LUX, XP093066635, Retrieved from the Internet <URL:https://lux.camera/iphone-xr-a-deep-dive-into-depth/> [retrieved on 20230724] *

Similar Documents

Publication Publication Date Title
US11218669B1 (en) System and method for extracting and transplanting live video avatar images
CN109952759B (en) Improved method and system for video conferencing with HMD
US11563779B2 (en) Multiuser asymmetric immersive teleconferencing
US11125996B2 (en) Sedentary virtual reality method and systems
CN107924584B (en) Augmented reality
US9524588B2 (en) Enhanced communication between remote participants using augmented and virtual reality
CN110413108B (en) Virtual picture processing method, device and system, electronic equipment and storage medium
US7643064B1 (en) Predictive video device system
KR101598069B1 (en) System and method for eye alignment in video
EP3504873A1 (en) Communicating in a virtual reality environment
CN111989914A (en) Remote presentation device operating method
EP3465631B1 (en) Capturing and rendering information involving a virtual environment
JP2014049797A (en) Display device with camera
WO2023075810A1 (en) System and method for extracting, transplanting live images for streaming blended, hyper-realistic reality
KR20150113795A (en) Apparatus and Method for Controlling Eye-contact Function
WO2016182504A1 (en) A virtual reality headset
Young Removing spatial boundaries in immersive mobile communications
US20230319120A1 (en) Systems and methods for enabling user-controlled extended reality
WO2024100703A1 (en) Video display device, video display system, and method for controlling video display device
US20230319221A1 (en) Systems and methods for enabling user-controlled extended reality
Van Broeck et al. Real-time 3D video communication in 3D virtual worlds: Technical realization of a new communication concept
WO2023003575A2 (en) Using simple masks for online expression
WO2020162035A1 (en) Information processing device, information processing method, and program
WO2024019713A1 (en) Copresence system
WO2022242856A1 (en) Communication devices, adapting entity and methods for augmented/mixed reality communication

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21961617

Country of ref document: EP

Kind code of ref document: A1