WO2022129646A1

WO2022129646A1 - Virtual reality environment

Info

Publication number: WO2022129646A1
Application number: PCT/EP2021/086907
Authority: WO
Inventors: Basil LIM; James D. CARSWELL; Veronica O'KEANE; Darren W. RODDY; Erik O'HANLON
Original assignee: Technological University Dublin; The Provost, Fellows, Foundation Scholars, And The Other Members Of Board, Of The College Of The Holy And Undivided Trinity Of Queen Elizabeth, Near Dublin
Priority date: 2020-12-18
Filing date: 2021-12-20
Publication date: 2022-06-23
Also published as: GB202020196D0

Abstract

A method (300) of generating a virtual reality environment comprises obtaining (302) depth data of at least a part of a real scene using one or more sensors (102). The method comprises generating (304) a virtual environment by generating a three-dimensional virtual representation of the at least a part of the real scene using the obtained depth data. The method further comprises displaying (306) the generated virtual representation on at least one display device (108).

Description

VIRTUAL REALITY ENVIRONMENT

FIELD

The present invention relates to a method and a system for generating a virtual reality environment, and in particular but not exclusively to generating a virtual reality surgical environment.

BACKGROUND

Recent advances in technologies such as virtual reality, augmented reality and mixed reality have highlighted how such technologies might be used to improve performance in many fields, including medicine.

For example, collaboration between surgeons, and providing training opportunities for trainee surgeons, can be difficult. One problem in particular may be that only a limited or restricted number of people may be able to be present in an operating room. Typically, remote collaboration and training (supervision, consultation) can be achieved by displaying a video feed (live or recorded) of the operating environment to remote surgeons and/or trainee surgeons.

Virtual and/or augmented reality presents an alternative approach for enabling remote collaboration between surgeons located in different places, and also for improving training of surgeons who might otherwise not be exposed to many surgeries. However, a number of shortcomings and obstacles have thus far prevented the widespread adoption of virtual and/or augmented reality technology in the medical field.

The present invention has been devised with the foregoing in mind.

SUMMARY

According to a first aspect, there is provided a method of generating a virtual reality environment. The method may comprise obtaining depth data of at least a part of a real scene using one or more sensors. The method may also comprise generating a virtual environment by generating a three-dimensional virtual representation of the at least a part of the real scene using the obtained depth data. The method may further comprise displaying the virtual environment on at least one display device.

Generating a virtual environment which can be displayed on a display device may enable the virtual environment to be accessed by one or more remote viewers. That may enable improved remote collaboration, supervision and training in a large number of different fields, including medical surgery. The one or more sensors may be worn by a person located at the real scene. Using sensors worn by a person located at the real scene to obtain depth data of the real scene may prevent or inhibit a line-of-sight from the sensors to the real scene being blocked. That may enable the virtual environment to be accurately generated without any lost or missing information. Line-of-sight issues are common in conventional surgical assistance systems (e.g., tracking and navigation systems). Using sensors worn by a person located at the real scene to obtain depth data may also enable a simple way to generate a virtual environment, for example as the person moves around the real scene. Sensors worn by a person may also be convenient, easily portable and widely available, increasing accessibility and ease of generating a virtual environment without requiring expensive and typically bulky specialist technology.

The one or more sensors may be disposed in or on a head-mounted display (HMD) worn by the person located at the scene. That may enable the virtual environment to be both generated and accessed by the person located at the scene using a single, portable device, increasing simplicity of generating and accessing a virtual environment. A HMD may also be worn without impeding normal movement of the wearer. The HMD may further comprise a processor for generating the virtual environment. That may provide a fully self-contained device capable of generating and accessing a virtual environment.

Generating the virtual environment may be performed in substantially real-time as the depth data is obtained. That may enable the virtual environment to be generated substantially immediately as the depth data of the real scene is obtained by the one or more sensors. That may prevent or reduce the need for the virtual environment to be generated in advance of when it is required (for example, for a surgery requiring remote collaboration with one or more additional surgeons). That may improve ease and efficiency of generating a virtual environment which can be accessed remotely. That may enable significant time savings in setting up the virtual environment in comparison virtual environments which are pre-generated or pre-rendered before use. In addition, generating the virtual environment in substantially real-time may allow the virtual environment to better reflect the current real scene. A virtual environment that is pre-rendered may not accurately reflect the real scene on which it is based, as the real scene may change between generating the virtual environment and accessing the virtual environment.

Displaying the virtual environment may be performed in substantially real-time as the virtual environment is generated.

Obtaining the depth data may be performed substantially continuously. The method may further comprise updating the virtual environment using the most recently obtained depth data. That may enable the virtual environment to most accurately reflect the real scene on which it is based. It may allow the virtual environment to constantly adapt to any changes in the real scene which are captured in the depth data obtained by the one or more sensors. That may provide increased accuracy between the real scene and the virtual environment, which may be critical in applications such as surgery in which events can unfold rapidly which alter the real scene. That, in combination with generating the virtual environment in substantially real-time as the depth data is obtained, may prevent or reduce any time lag between events unfolding at the real scene and being reflected in the virtual environment. That may ensure that remote viewers see changes in the virtual environment substantially at the same time as a person located at the real scene sees corresponding changes in the real scene.

The method may further comprise determining a relative position, in the virtual environment, of a viewpoint of a person viewing the virtual environment on the at least one display device. Determining the relative position may comprise using one or more simultaneous localization and mapping, SLAM, algorithms. If a relative position, in the virtual environment, of a person viewing the virtual environment is known, that may enable the viewpoint to be changed. That may enable the person to view the virtual environment from a number of different viewpoints. That may increase the amount of information the person may be able to glean from the real scene, which may improve understanding of the real scene and aid in one or both of remote collaboration and training.

The method may further comprise navigating the virtual environment using the at least one display device. That may enable a remote viewer to experience the virtual environment in virtual reality and/or explore the virtual environment, for example by viewing the virtual environment from a different position or perspective, rather than simply visualizing the virtual environment from a single perspective and/or without any individual control.

The at least one display device comprises a head-mounted display, HMD. That may provide a remote viewer with a more immersive experience, which may further improve remote collaboration and/or training.

The method may further comprise incorporating, in the virtual environment, a virtual representation of the person viewing the virtual environment based on the determined relative position of the viewpoint. That may enable remote viewers to be visualized in the virtual environment as if the remote viewers were present in the real scene.

The method may further comprise determining a position or spatial location of the one or more sensors relative to the real scene. The position or spatial location may be determined using one or more simultaneous localization and mapping (SLAM) algorithms. That may enable the virtual environment to include a virtual representation of different parts of the real scene such that the spatial relationship in the virtual environment mirrors or corresponds to the spatial relationship of the respect parts of the real scene. That may enable a more comprehensive virtual environment to be generated even if only a single sensor is used to obtain depth data of the real scene. A single sensor can be moved around the real scene to obtain depth data for multiple parts of the real scene. The depth data can then be used to generate a virtual environment in which the different parts are correct spatially positioned relative to one another.

The method may further comprise incorporating, in the virtual environment, a virtual representation of a person located at the real scene based on the determined relative position of the one or more sensors. That may enable the person to be visualized in the virtual environment by remote viewers.

The method may further comprise detecting an interaction of the person located at the real scene and/or a viewer of the at least one display device with the virtual environment. The interaction may be detected using one or more sensors to detect hand movements and gestures of the person located at the real scene and/or the viewer of the at least one display device. The method may further comprise displaying the interaction in the virtual environment. That may enable the person located at the real scene and/or a remote viewer to add further visual information to the virtual environment which can be used for collaboration, supervision or training purposes. The interaction with the virtual environment may be or comprise generating an annotation that is incorporated into the virtual environment.

The method may comprise displaying the interaction in augmented reality over the real scene to the person located at the real scene.

The depth data may be or comprise point cloud data. Alternatively, the depth data may be or comprise a depth map. Obtaining point cloud data or a depth map of the real scene may be advantageous as that data can be obtained using compact, lightweight sensors such as time-of- flight sensors. Such sensors can easily be worn by a person located at the real scene without incurring strenuous physical effort.

The method may comprise obtaining colour and/or texture data of the real scene. The method may further comprise projecting the colour and/or texture data onto the virtual representation. The colour and/or texture data of the real scene may be obtained using the same sensor(s) used to obtain the depth data of the real scene.

According to a second aspect, there is provided a system for generating a virtual reality environment. The system may comprise one or more sensors configured to obtain depth data of at least a part of a real scene. The system may further comprise a processor configured to generate a virtual environment by generating a virtual representation of the at least a part of the real scene using the obtained depth data. The system may also comprise one or more displays configured to display the virtual environment.

The system of the second aspect may be configured to perform one or more method steps described with respect to the method of the first aspect. The optional features from any aspect may be combined with the features of any other aspect, in any combination. For example, the system of the second aspect may be configured to perform the method of the first aspect, and may comprise any one or more features corresponding to features described with reference to the method of the first aspect. Furthermore, the method of the first aspect may comprise any of the optional features described with reference to the system of the second aspect. Features may be interchangeable between different aspects and embodiments, and may be removed from and/or added to different aspects and embodiments.

Features which are described in the context of separate aspects and embodiments of the invention may be used together and/or be interchangeable wherever possible. Similarly, where features are described in the context of a single embodiment for brevity, those features may also be provided separately or in any suitable sub-combination. Features described in connection with the method of the first aspect may have corresponding features definable with respect to the system of the second aspect, and these embodiments are specifically envisaged.

BRIEF DESCRIPTION OF DRAWINGS

The invention will now be described by way of example only with reference to the accompanying drawings in which:

FIG. 1 shows a system for generating a virtual reality environment in accordance with an embodiment of the invention, the system comprising sensors located on a head-mounted display; and

FIG. 2 shows a virtual reality environment generated by the system shown in FIG. 1 ;

FIG. 3 shows a virtual reality environment generated by the system shown in FIG. 1 and containing an annotation;

FIG. 4 shows another system for generating a virtual reality environment in accordance with an embodiment of the invention, the system comprising a light field array camera; and

FIG. 5 shows a method of generating a virtual reality environment in accordance with an embodiment of the invention.

Like reference numerals and designations in the various drawings may indicate like elements.

DETAILED DESCRIPTION

Figure 1 shows a system 100 configured to generate a virtual environment in accordance with an embodiment of the invention. In the embodiment shown, the system 100 is configured to generate a virtual surgical environment, but the system 100 may equally be configured to generate any type of virtual environment, for example depending upon a desired application. The system 100 comprises one or more sensors 102 configured to obtain depth data of at least a part of a real scene at which the sensors 102 are located. In the embodiment shown, the real scene is a surgical environment. In the present disclosure, the term ‘depth data’ refers to data capturing or detailing the three-dimensional shape or structural appearance of the real scene, e.g., topography of the real scene. In the embodiment shown, the sensors 102 are or comprise a time-of flight depth sensor, such as a LIDAR sensor. The sensors 102 are configured to obtain depth data of a plurality of points of the real scene in order to create a point cloud of the real scene. Alternatively or additionally, the sensors 102 may be or comprise a different type of depth sensor such as an interferometry sensor, a stereo triangulation sensor, a structured light sensor, a depth sensing camera (for example, an RGB-D camera) etc. The sensors 102 may alternatively be configured to obtain a depth map of the real scene. Alternatively, the sensors 102 may be or comprise one or more cameras configured to obtain images and/or video from which depth data such as point cloud data can be extracted using image analysis (for example, using a processor such as described below). The image analysis may comprise using one or simultaneous localization and mapping (SLAM) algorithms, or one or more machine learning algorithms to extract depth data from the captured images and/or video.

In the embodiment shown, the sensors 102 are located on a head-mounted display (HMD) 104 worn by a person located at a real scene. The HMD 104 may be a virtual or augmented reality HMD, such as a Microsoft Hololens 2. In the embodiment shown, the sensors 102 are located on the HMD 104 such that the field of view of the sensors 102 from which depth data is obtained is substantially similar or identical to the field of view of the person wearing the sensors 102, although this is not essential. Alternatively, the sensors 102 may be worn on a different part of the person’s body, for example the chest or shoulders.

Alternatively or additionally, one or more sensors 102 may not be worn by the person. For example, the sensors 102 may be located at one or more points, locations or positions around the real scene. One or more of the sensors 102 may be stationary (e.g., located at a fixed point) relative to the real scene. For example, the support may be mounted on a stationary support such as a tripod. The sensors 102 may be or comprise one or more light field camera arrays such as a Google light field camera array. A light field camera array comprises a plurality of cameras each configured to capture a real scene from a different perspective, for example a different location on a spherical surface to which each of the cameras in the array is mounted, in order to obtain depth data of the real scene. The plurality of perspectives can then be merged together to provide a three-dimensional model of the real scene which can be viewed from different viewpoints. Alternatively, one or more of the sensors 102 may be movable relative to the scene, for example the sensors 102 may be mounted on a movable support such as a movable arm or a support that is slidably movable along a track or rail. It will be appreciated that a HMD 104 may not necessarily be required to be worn by a person located at the real scene (e.g., a surgeon) if the HMD 104 does not comprise the sensors 102. However, a HMD 104 may be worn by a person located at the real scene at least for display purposes (discussed further below), if not for obtaining depth data of the real scene.

The system 100 further comprises a processor 106 configured to generate a three-dimensional (3D) virtual representation (e.g., at least a shape or structure) of at least a part of the real scene using the depth data obtained by the sensors 102. If the system comprises a plurality of sensors 102, data obtained from multiple sensors 102 may be combined for increased accuracy. In the embodiment shown, the processor 106 is configured to use a virtual environment engine such as Unity to generate the virtual representation, although other suitable software or platforms may alternatively be used. In the embodiment shown, the processor 106 is configured to generate a 3D virtual representation of the real scene using point cloud data obtained by the time-of-flight depth sensor(s) 102. In some embodiments, the processor 106 is further configured to generate a 3D mesh using each point of the point cloud data as a vertex and creating polygons between the points, although this is not essential. Accuracy of the mesh may be increased, for example, by increasing a number and/or density of measured points in the point cloud. Alternatively, accuracy of the mesh may be decreased (increasing the speed of reconstruction or generation of the virtual representation of the real scene), for example, by removing points from the point cloud. Removing points from the point cloud may be performed randomly using a Gaussian distribution or other suitable optimization algorithm.

In the embodiment shown, the processor 106 is located in or on the HMD 104 itself. Alternatively, the processor 106 may not be located in or on the HMD 104, and may instead be located, for example, in a computer. The HMD 104 may be configured to form a wireless connection with the processor 106 (e.g., with the computer in which the processor 106 is located) to transferthe depth data obtained by the sensors 102 to the processor 106. Alternatively, a wired connection between the HMD 104 and the processor 106 may be used.

If the sensors 102 are worn elsewhere on the body of the person, or are instead located around the scene, the processor 106 may be configured to form a wired or wireless connection with the sensors 102, either directly or indirectly, to receive the depth data obtained by the sensors 102.

The system 100 further comprises one or more displays 108 configured to display the generated virtual representation. This allows one or more remote viewers to access the virtual environment generated by the processor 106, using the one or more displays 108. The processor 106 is configured to transmit or relay the generated virtual environment to the displays 108 shortly or substantially immediately (in substantially real-time, or on a short delay) after the virtual environment is generated. The various components of the system 100, in particular the sensors 102, HMD 104, processor 106 and displays 108, may form a network. The system 100 may utilise a networking API such as Photon to form the network. The network structure of the system 100 may enable the virtual environment to be shared and experienced by multiple parties simultaneously.

In the embodiment shown, the displays 108 are located remotely from the real scene (for example, at a different location than the real scene). The virtual environment may be accessed by a remote viewer using a virtual or augmented reality device, for example a peripheral device such as a virtual reality HMD or an augmented reality HMD, or a projector configured to project the virtual environment. In the embodiment shown, the displays 108 comprise a HMD (for example, a virtual reality HMD) worn by a second person (e.g., a remote viewer). That may enable a remote viewer to view and/or experience the virtual representation of the real scene in an immersive manner, substantially similar to how the second person could view and/or experience the real scene itself. Additionally or alternatively, the displays 108 may comprise one or more screens or monitors on which a remote viewer is able to view the virtual representation of the real scene. For example, the display screens 108 may be arranged to partially or fully surround or enclose the remote viewer (for example, in a substantially circular or spherical arrangement around the remote viewer) in order to provide an immersive experience of the virtual environment, similar to that provided by an HMD display 108. Alternatively, the display screen(s) 108 may be configured to provide a conventional substantially planar display of the virtual environment. Alternatively, the displays 108 may be or comprise any suitable display that enables a viewer to access and/or experience the virtual environment, such as a holographic display device.

In the embodiment shown, the displays 108 are configured to form a wireless connection with the processor 106 (e.g., with the HMD 104 of the person located at the real scene in or on which the processor 106 is located, or with a computer in which the processor 106 is located) to receive the virtual representation of the real scene from the processor 106, as indicated by the dashed line in Figure 1. Alternatively, a wired connection between the processor 106 and the displays 108 may be used. In the embodiment shown, the second person or remote viewer is a remote collaborating or observing surgeon and/or trainee surgeon. However, the specific connection type between the various components of the system 100 is not essential.

In the embodiment shown, the sensors 102 are configured to obtain depth data of the real scene substantially continuously. The processor 106 is configured to generate an updated virtual representation of the real scene using the depth data obtained by the sensors 102. In the embodiment shown, the processor 106 is configured to generate an updated virtual representation of the scene substantially continuously (for example, in substantially real-time). That may be useful for generating a virtual representation of a real scene which may change rapidly and/or frequently (such as a surgical operation). Alternatively, the processor 106 may be configured to generate an updated virtual representation of the scene periodically (for example, after a pre-determined period of time has elapsed such as substantially 5 seconds, substantially 10 seconds, substantially 30 seconds etc.), or in response to the processor 106 detecting a change in the depth data that is above a threshold (for example, a predetermined threshold). That may reduce the processing requirements of generating the updated virtual representation of the real scene. Generating an updated virtual representation of the real scene periodically may be appropriate for a real scene which may change slowly and/or infrequently. Alternatively, the sensors 102 may be configured to obtain depth data of the real scene periodically after a predetermined time period has elapsed. Alternatively, the sensors 102 may be configured to obtain depth data of the real scene at a single point of time or for a single pre-determined period of time initially, to generate a virtual environment corresponding to the real scene in its initial state.

In some embodiments, the system 100 further comprises one or more cameras 110 configured to obtain colour data of the real scene. The cameras 1 10 may be worn by the person located at the scene. The cameras 110 may be located on the HMD 104 worn by the person. Alternatively, the cameras 110 may not be worn by the person, but may be placed at one or more fixed points relative to the real scene, or may be moveable with respect to the real scene as described above for the sensors 102. Captured image and/or video data from multiple cameras 110 may be combined for increased accuracy. That may enable the processor 106 to project colour and/or texture information onto the virtual representation of the real scene (e.g., using well-known projection methods), such that the virtual representation viewed by a remote viewer contains both shape and colour information. To reduce processing requirements, the projected colour and/or texture information may only be updated for parts of the virtual representation of the real scene which are within the field of view of the remote viewer viewing the display 108. In some embodiments, the cameras 110 may be or comprise the sensors 102 configured to obtain depth data of the real scene, for example where image analysis is used to extract depth data from images and/or video captured by the cameras 1 10, as described above.

In the embodiment shown, the processor 106 is configured to determine a relative position (spatial location) of the sensors 102 (which are mounted on the HMD 104 worn by the person located at the real scene) as the person and sensors 102 move around the real scene, for example using one or more well-known simultaneous localization and mapping (SLAM) algorithms. The processor 106 may be configured to determine a relative position of the sensors 102 by, for example, comparing or correlating obtained depth data to previously obtained depth data. Additionally or alternatively, the system 100 may comprise an inertial measurement unit (IMU) worn by the person located at the real scene, or mounted on the same moveable support as the sensors 102. The IMU may be configured to measure the movement and position (spatial location) of the sensors 102 as the sensors 102 move around the real scene. The IMU may be located in or on the HMD 104 worn by the person. The processor 106 may be configured to utilise movement data from the IMU to determine a relative position of the sensors 102 as the person and sensors 102 move around the real scene. Alternatively, the system 100 may comprise one or more trackers which can be placed or mounted on or adjacent the sensors 102. The trackers may form part of a conventional triangulation-based tracking system by which the position of the trackers (and therefore the sensors 102) can be determined. An advantage of determining the position of the sensors 102 using SLAM and/or an IMU is that there are no potential line-of-sight requirements which may be necessary for a triangulation-based tracking system. In any case, determining the position of the sensors 102 may enable the virtual representation of the real scene to include virtual representations of different parts of the real scene such that their spatial relationship in the virtual environment mirrors the spatial relationship of the respective parts of the real scene. That may be achieved even if using only a single sensor 102 to obtain depth data. A single sensor 102 can be moved around the real scene to obtain depth data for multiple parts of the real scene, which can then be used to generate a virtual environment in which the different parts are correctly spatially positioned relative to one another, based on the determined position of the sensor 102 from which the depth data was obtained. That may enable the system 100 to generate a virtual environment which a remote viewer can move around, explore and interact with (discussed further below). The same approach may be utilized for one or more sensors 102 which are not worn by the person but are nonetheless movable relative to the real scene, for example mounted on a movable support as described above.

In some embodiments, the processor 106 is also configured to update the virtual representation of the real scene based on the determined position of the sensors 102. If, based on a determined position of the sensors 102, the processor 106 detects that the sensors 102 are obtaining depth data from a part of the real scene for which depth data has previously been obtained, the processor 106 is configured to update the virtual environment using the newly obtained depth data for that part of the real scene. In that way, the virtual representation of each part of the real scene may be retained until new or more recent depth data is obtained for that part of the real scene. Once new depth data is obtained for that part of the real scene, the virtual representation of that part of the real scene is updated using the new depth data. That may allow the virtual representation of the real scene to be as up to date as possible with respect to the real scene.

Alternatively, the processor 106 may not be configured to determine a position of the sensors 102 as the sensors 102 move around the real scene. The virtual environment may only contain a virtual representation of a part of the real scene that is currently within the field of view of the sensors 102, and may not contain or retain a virtual representation of a part of the real scene that was previously within the field of view of the sensors 102. In the embodiment shown, with the sensors 102 located in the HMD 104 worn by the person located at the real scene, , the virtual environment may therefore contain only a virtual representation of what the person located at the real scene can currently see.

Alternatively, the sensors 102 may be located at a fixed position relative to the real scene rather than movable relative to the real scene as described above. If the system 100 comprises a single fixed sensor 102, the processor 106 may not be configured to determine a position of the sensor 102. There may be no need to do so, as a single fixed sensor 102 provides only a single unchanging perspective of the real scene. If the system 100 comprises a plurality of fixed sensors 102, the processor 106 may be configured to determine a position of the sensors 102. The processor 106 may be configured to correlate one or more common features in depth data obtained by two or more sensors 102 in order to determine a position of the sensors 102 relative to one another, for example using one or more well-known SLAM algorithms. That may enable the processor 106 to generate a virtual environment in which the virtual representation of the parts of the real scene for which each sensor 102 obtains depth data are positioned relative to one another spatially accurately. Alternatively, the processor 106 may not be configured to determine a position of each sensor 102, but the known position of each fixed sensor 102 may be provided in order for the processor to generate the virtual environment.

In the embodiment shown, the HMD display 108 further comprises a processor 116. The processor 116 may be located in the HMD display 108 itself, or in a separate computer (for example, having a wireless or wired connection to the HMD display 108). The processor 116 is configured to determine a relative position of the HMD display 108 in the virtual environment, for example using one or more well-known SLAM algorithms. Additionally or alternatively, the HMD display 108 may comprise an IMU. The processor 116 may be configured to correlate movement data from the IMU to a relative position in the virtual environment displayed on the HMD display 108. In either case, the part or location of the virtual environment displayed on the HMD display 108 changes as the actual spatial location of the remote viewer changes. The part of the virtual environment displayed on the HMD display 108 therefore reflects the relative position and/or movement of the remote viewer in real space. In that way, the remote viewer experiences moving through the virtual environment as if they were moving through the real scene itself. Alternatively, if the virtual environment is being displayed on one or more display screens 108 arranged to partially or fully surround or enclose the remote viewer, a remote viewer may be able to navigate through the virtual environment similarly to if they were wearing the HMD display 108 (for example, by using one or more IMUs to correlate movement of the remote viewer to a relative position in the virtual environment, or by having the remote viewer use an omnidirectional treadmill to monitor movement). Alternatively, if the virtual environment is being displayed on a conventional display screen 108, the remote viewer may be able to navigate the virtual environment using manual controls (buttons, mouse etc.). For display screens on mobile devices, for example smartphones or tablets, the display screen 108 may be configured to provide a moveable window into the virtual environment. For example, device sensors (e.g., accelerometers, gyroscopes etc.) of the mobile device may enable tracking of movement (e.g., distance and/or direction) of the mobile device in the local environment, such as when being held by a remote viewer who is walking. That may allow movement of the remote viewer (e.g., mobile device) in the local environment to be reflected by a corresponding movement in the virtual environment (optionally at a substantially 1 :1 rate), which can be used to alter which part of the virtual environment is displayed on the display screen. For example, 1 m of forward movement by the remote viewer in the local environment may be tracked by one or more device sensors. The display screen of the mobile device may then show a corresponding part of the virtual environment which is substantially 1 m further forward than a part of the virtual environment previously shown on the display screen (if relative movement in the local environment and virtual environment is tracked at a substantially 1 :1 rate). Additionally, topographical mapping of the local environment, for example using vision-based or sensor-based (such as LIDAR or a rangefinder laser) mapping, may be performed. The topographical mapping may be performed using the mobile device (for example, a smartphone camera), or using HMD-mounted sensors as described above. The topographical data may be combined with the device sensor data (e.g., accelerometer data) to ensure accurate tracking of movement in the local environment and a substantially 1 :1 rate of local environment movement to virtual environment movement shown on the display screen. Alternatively, movement of the remote viewer in the local environment may be tracked directly using topographical mapping. In each case, the processor 116 is configured to determine a relative position of the viewpoint of the remote viewer in the virtual environment as the remote viewer navigates the virtual environment.

Alternatively, the processor 116 may not be configured to determine a relative position of the HMD display 108 (or of the viewpoint of the remote viewer) in the virtual environment. In that case, the remote viewer may not be able to experience moving or navigating through the virtual environment. However, the HMD display 108 (or other display 108) may still display a virtual representation of a part of the real scene that is currently within the field of view of the sensors 102 worn by the person at the real scene. If the system 100 comprises a plurality of sensors 102 at different positions, the displays 108 may display a virtual representation of a part of the real scene that is currently within the field of view of one of the sensors 102. The remote viewer may be able to selectively instruct the display 108 to display a virtual representation corresponding to one of the sensors 102.

In the embodiment shown, because the processor 106 is configured to determine a relative position of the sensors 102 in the real scene, a position of the person wearing the sensors 102 (wearing the HMD 104 in the embodiment shown) can also be determined or inferred. In the embodiment shown, the processor 106 is further configured to generate a virtual representation or avatar of the person at the real scene, based on the determined or inferred position of the person, which can be displayed to a remote viewer together with the virtual representation of the real scene. The remote viewer may be able to select whether or not the virtual representation or avatar of the person at the real scene is displayed together with the virtual representation of the real scene. Similarly, because the processor 1 16 is configured to determine a relative position of the HMD display 108, a position of the remote viewer can also be determined or inferred. A virtual representation or avatar of the remote viewer may therefore be included in the virtual environment. The avatar of the remote viewer may be displayed in augmented reality to the person located at the real scene, for example using the HMD 104. In the embodiment shown, the processor 106 is configured to determine hand movements and gestures (e.g., a specific hand shape or movement) of the person at the real scene which are detected by the sensors 102. In the embodiment shown, the processor 106 is configured to use the Mixed Reality Tool Kit 2 (MRTK2) API to recognize gestures. The MRTK2 enables exact gestures to be configured, because MRTK2 allows for joint recognition. However, any suitable software, such as a different Mixed Reality API, may alternatively be used for hand movement and gesture recognition. In this way, a virtual representation of the person’s hands may be displayed on the displays 108. In addition, the processor 106 may be configured to generate annotations resulting from any gestures determined to have been made by the person at the real scene. The annotations can then be displayed over the virtual representation of the real scene on the displays 108, for example to highlight a particular part of the virtual representation of the real scene. The remote viewer may be able to select whether or not annotations or a virtual representation of the person’s hands are displayed together with the virtual representation of the real scene. Alternatively, the processor 106 may not be configured to determine hand movements and gestures.

Similarly, in the embodiment shown, the HMD display 108 worn by the remote viewer comprises one or more sensors 118. The processor 116 is configured to determine hand movements and gestures of the remote viewer which are detected by the sensors 118. The processor 116 is configured to use the MRTK2 API to recognize gestures. The processor 116 is configured to generate annotations resulting from gestures determined to have been made by the remote viewer. The annotations can be then be displayed in the virtual environment, together with the virtual representation of the real scene. The annotations made by the remote viewer may also be displayed to the person at the real scene, for example, in augmented reality, using the HMD 104 to display the annotation over the real scene. The relative position of the annotation in the virtual environment is known, and the position of the HMD 104 relative to the corresponding position of the annotation in the real scene is also known. An annotation may therefore appear to be substantially fixed in space to the person at the real scene, when displayed in augmented reality. Alternatively, the sensors 118 may not be a part of the HMD display 108 worn by the remote viewer, but may be provided as a separate part of the system 100. The system 100 may comprise the sensors 118 irrespective of the type of display 108 used. The sensors 118 may be or comprise one or more IMUs configured to be worn by the remote viewer in order to track movements and gestures. The sensors 118 may alternative be configured to capture visual data of the remote viewer in order to determine hand movements and gestures. The sensors 118 may be configured to be placed relative to the remote viewer in order to capture that visual data. Alternatively, the system 100 may not comprise the sensors 1 18 and the processor 116 may not be configured to determine hand movements and gestures of the remote viewer.

In the embodiment shown, both the HMD 104 and the HMD display 108 each comprise one or more microphones, one or more speakers and one or more cameras (e.g., sensors 118) configured to obtain sound and visual information which may be used in or included in the virtual environment. For example, audio information recorded by the microphones of the HMD 104 may be played through the speakers of the HMD display 108, and vice versa, to enable two-way audio communication between the person at the real scene and the remote viewer. It will be appreciated that the system 100 may comprise microphones and speakers capable of two-way communication between the real scene and a remote viewer, irrespective of the type of display 108 used. Alternatively, the system 100 may not comprise audio components and may be configured to generate a virtual environment containing visual information only.

In some embodiments, the processor 106 is configured to identify one or more objects located in the real scene. The processor 106 may be configured to use depth data, such as point cloud data, obtained by the one or more sensors 102, to identify the objects in the real scene. For example, in a surgical application, the processor 106 may be configured to identify surgical implements such as scalpels, probes etc. from the depth data obtained by the sensors 102. The processor 106 may further be configured to generate a virtual environment including a label for the identified object which is viewable in the virtual environment. That may be useful for training purposes, for example when demonstrating a surgical technique to trainee surgeons who are viewing remotely. The processor 106 may be configured to identify the one or more objects by comparison to known objects in a database, or using a machine learning algorithm trained to identify objects.

Figures 2A to 2C show an example of the operation of the system 100, and the virtual environment that the system 100 is configured to create.

Figure 2A shows a person at the real scene wearing the sensors 102. In the embodiment shown, the person is an operating surgeon wearing a HMD 104 comprising the sensors 102, and the real scene is an operating theater. Figure 2B shows a virtual environment comprising a virtual representation of the real scene or real world environment shown in Figure 2A, together with the relative position of a remote viewer wearing a HMD display 108. Figure 2C shows the remote viewer viewing the virtual environment shown in Figure 2B.

The sensors 102 worn by the person (on the HMD 104 in the embodiment shown) at the real scene obtain depth data of the real scene. The processor 106 generates a 3D virtual representation of the real scene using the obtained depth data. The processor 106 also tracks the movement of the sensors 102 relative to the real scene. The sensors 102 also monitor the person’s hand gestures and the processor 106 can generate virtual annotations 120 which can be incorporated into the virtual environment together with the virtual representation of the real scene (as shown in Figure 3). In some cases, sound and visual information recorded using microphones and cameras at the real scene (e.g., on the HMD 104) can be incorporated into the virtual environment. All of that information may be used to generate and update the virtual environment, as described above. The remote viewer is able to view the virtual environment generated by the processor 116. The processor 116 tracks the movement of the HMD display 108 of the remote viewer. The sensors 118 monitor the remote viewer’s hand gestures and the processor 116 can generate virtual annotations which can be incorporated into the virtual environment together with the virtual representation of the real scene. Similarly, sound and visual information recorded using microphones and cameras at the remote location (e.g., on the HMD display 108) can be incorporated into the virtual environment.

The virtual environment shown in Figure 2B therefore synchronises movement, audio, gestures and annotations detected at the real scene and at the remote location. Additional remote viewers may also connect to the virtual environment (e.g., via a display 108) and similarly update their position, hand gestures, sound and visual information for other users of the system 100 (e.g., person at the real scene and other remote viewers) in the virtual environment.

As described in detail above, the system 100 is configured to create a shared, networked virtual environment capable of being updated in real-time, including entity position and created entities to be shared across multiple users. The processors 106, 1 16 may use a networking API such as Photon to enable the networked virtual environment. Interactions and/or annotations detected by the sensors 102 on the HMD 104 may be accessed by the processor 106 (for example, running a virtual environment engine such as Unity). The processor 106 may then update the virtual environment, and relay the updated virtual environment to the one or more displays 108. The same may apply, vice versa, for interactions and/or annotations detected by sensors 118 on the HMD display 108.

In some embodiments, the HMD 104 may act as a host, generating and updating a real time virtual environment, for example using an on-board processor 106 and determining its spatial position relative to the real scene or real world environment. The other HMD display 108 may connect to the HMD 104 and share spatial information (and optionally visual or audio information) with the HMD 104 to update the virtual environment. Alternative, both (or more) HMDs 104, 108 can connect to a separate computer that handles spatial (and optionally audio and visual information) from both HMDs 104, 108, updates the virtual environment and then transmits the updated virtual environment to each HMD 104, 108.

In some embodiments, the processor 106 is further configured to recognize one or more specific objects (e.g., surgical implements) in the real scene, and the spatial position of the objects relative to the remote viewer’s position in the virtual environment. In that way, the objects may be displayed in the virtual environment, oriented and positioned accurately.

For surgical applications, the system 100 enables flexible and intuitive intraoperative collaboration. A remote surgeon can view a virtual representation of the operating theater. The virtual operating environment can be generated, at the time required, using sensors 102 worn by the operating surgeon. That may simplify remote surgical collaboration, by reducing or avoiding the need for the virtual environment to be pre-rendered or pre-generated. Instead, the virtual environment can be generated and updated, in substantially real-time if required, as the operating surgeon moves around the operating room. In addition, the components used to generate the virtual operating environment (sensors 102 etc.) may be portable, lightweight and easily available, meaning that a virtual representation of substantially any operating room can be generated without bulky equipment. Wireless connectivity may also allow the operating surgeon to freely move around the operating room without contacting equipment or wires/cables which might otherwise affect or interfere with generation of the virtual environment. That may substantially increase the accessibility of remote surgical collaboration to a much greater number of surgeons, whilst simultaneously reducing cost.

Similarly, the system 100 may enable virtual supervision and surgical training. Trainee surgeons may be able to perform surgery on virtual representations of patients within a virtual environment. Trainee surgeons may also be able to view a real surgery without the need to be physically present, by viewing a virtual representation of the surgery. Supervising surgeons viewing remotely can also supervise training from their own display, annotating and communicating in real-time.

Figure 4 shows a system 200 configured to generate a virtual environment in accordance with an embodiment of the invention. The system 200 is substantially similar to the system 100 described above, and comprises one or more sensors 202, a HMD 204 worn by an operating surgeon, a processor 206 and one or more displays 208.

In the embodiment shown, the one or more sensors 202 comprise a light field camera array, depicted by the circular outline in Figure 4. The light field camera array 202 comprises a plurality of cameras arranged in a spherical (or at least partially spherical, for example hemispherical) array. Each of the plurality of cameras is configured to capture image and/or video data of a real scene from a different angles or perspectives. The plurality of cameras are configured to capture image and/or video data simultaneously. The processor 206 is configured to combine the image and/or video data captured by the plurality of cameras to create a 3D model of the real scene, the 3D model forms at least a part of a virtual environment comprising a three-dimensional virtual representation of the real scene. In the embodiment shown, the processor 206 is configured to update the 3D model in substantially real-time as the cameras in the array capture new data.

The HMD 208 worn by a remote viewer is configured to display the virtual representation of the real scene generated by the processor 206. The processor 206 is configured to determine a spatial position of the HMD 208 substantially as described above with respect to the system 100, such that the remote viewer can navigate and explore the virtual environment. Due to the physical nature of a light field camera array, the display of the virtual representation is limited to viewpoints which are contained within a radius of the spherical array of cameras. The relative motion of the HMD 208 must stay within the spatial bounds of the light field camera array 202. However, the system 200 may comprise multiple light field camera arrays 202 positioned at different locations relative to the real scene. That may provide a virtual environment comprising a virtual representation of the real scene from multiple locations. The remote viewer may be able to selectively instruct the HMD 208 to display a virtual representation corresponding to one of the light field camera arrays 202.

Figure 4A shows a person at the real scene wearing a HMD 204, with a light field camera array 202 configured to obtain depth data of the real scene. In the embodiment shown, the person is an operating surgeon wearing the HMD 204, and the real scene is an operating theater. Figure 4B shows a remote viewer wearing a HMD 208 viewing the virtual environment generated by the processor 206. It will be appreciated that if the operating surgeon is captured by the light field camera array 202, it may not be necessary to generate a virtual representation of the operating surgeon to include in the virtual environment.

In the embodiment shown, the system 200 is also configured to enable the operating surgeon and a remote viewer to interact with the virtual environment, for example by making virtual annotations 220, substantially as described above with respect to the system 100.

Figure 5 shows a method 300 of generating a virtual environment in accordance with an embodiment of the invention. In the embodiment shown, the method 200 comprises generating a virtual surgical environment, but the method 200 may be used to generate any type of virtual environment. In the embodiment shown, the method 200 comprises generating a virtual environment using the system 100 described above, but the method 200 may be implemented using any suitable system.

At step 302, the method 300 comprises obtaining depth data of at least a part of a real scene using one or more sensors. The depth data may be obtained substantially as described above with respect to the systems 100, 200. At step 304, the method 300 comprises generating a virtual environment by generating a three-dimensional virtual representation of the at least a part of the real scene using the obtained depth data. The virtual environment may be generated substantially as described above with respect to the systems 100, 200. At step 306, the method 300 comprises displaying the generated virtual representation on at least one display device. The at least one display device may be a display device 108, 208 as described above with respect to the systems 100, 200. It will be appreciated that the method 300 may comprise one or more steps corresponding to one or more functions and/or operations of one or more components (or combinations of components) of the systems 100, 200 described above.

From reading the present disclosure, other variations and modifications will be apparent to the skilled person. Such variations and modifications may involve equivalent and other features which are already known in the art of virtual reality, and which may be used instead of, or in addition to, features already described herein.

For the sake of completeness, it is also stated that the term "comprising" does not exclude other elements or steps, the term "a" or "an" does not exclude a plurality, a single processor or other unit may fulfil the functions of several means recited in the claims and any reference signs in the claims shall not be construed as limiting the scope of the claims.

Claims

1 . A method of generating a virtual reality environment, the method comprising: obtaining depth data of at least a part of a real scene using one or more sensors ; generating a virtual environment by generating a three-dimensional virtual representation of the at least a part of the real scene using the obtained depth data; displaying the generated virtual representation on at least one display device.

2. The method of claim 1 , wherein the one or more sensors are worn by a person located at the real scene.

3. The method of claim 2, wherein the one or more sensors are disposed in or on a headmounted display, HMD, worn by the person.

4. The method of any preceding claim, wherein generating the virtual environment is performed in substantially real-time as the depth data is obtained.

5. The method of claim 4, wherein displaying the virtual representation is performed in substantially real-time as the virtual representation is generated.

6. The method of any preceding claim, wherein obtaining depth data is performed substantially continuously, and wherein the method further comprises updating the virtual environment using the most-recently obtained depth data.

7. The method of any preceding claim, further comprising determining a relative position, in the virtual environment, of a viewpoint of a person viewing the virtual environment on the at least one display device, optionally using one or more simultaneous localization and mapping, SLAM, algorithms.

8. The method of claim 7, further comprising navigating the virtual environment using the at least one display device.

9. The method of any preceding claim, wherein the at least one display device comprises a head-mounted display, HMD.

10. The method of any of claims 7 to 9, further comprising incorporating, in the virtual environment, a virtual representation of the person viewing the virtual environment based on the determined relative position of the viewpoint.

11 . The method of any preceding claim, further comprising determining a position or spatial location of the one or more sensors relative to the real scene, optionally using one or more simultaneous localization and mapping, SLAM, algorithms.

12. The method of any preceding claim dependent from claim 2, further comprising incorporating, in the virtual environment, a virtual representation of the person located at the real scene based on the determined relative position of the one or more sensors.

13. The method of any preceding claim, further comprising detecting an interaction of the person located at the real scene and/or a viewer of the at least one display device with the virtual environment, optionally by using one or more sensors to detect hand movements and gestures of the person located at the real scene and/or the viewer of the at least one display device.

14. The method of claim 13, further comprising displaying the interaction in the virtual environment.

15. The method of claim 13 or of claim 14, wherein the interaction with the virtual environment comprises generating an annotation that is incorporated into the virtual environment.

16. The method of any of claims 13 to 15 dependent from claim 3, further comprising displaying the interaction in augmented reality over the real scene to the person located at the real scene.

17. The method of any preceding claim, wherein the depth data comprises: i) point cloud data; or ii) a depth map.

18. The method of any preceding claim, further comprising obtaining colour and/or texture data of the real scene and projecting the colour and/ortexture data onto the virtual representation.

19. A system for generating a virtual reality environment, the system comprising: one or more sensors configured to obtain depth data of at least a part of a real scene; a processor configured to generate a virtual environment by generating a virtual representation of the at least a part of the real scene using the obtained depth data; and one or more displays configured to display the virtual environment.