SE544895C2 - Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier - Google Patents

Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier

Info

Publication number
SE544895C2
Authority
SE
Sweden
Prior art keywords
data
user
processing unit
future
gaze
Prior art date
Application number
SE2150590A
Other languages
Swedish (sv)
Other versions
SE2150590A1 (en)
Inventor
Haibo Li
Original Assignee
Gazelock AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gazelock AB filed Critical Gazelock AB
Priority to SE2150590A priority Critical patent/SE544895C2/en
Priority to PCT/SE2022/050400 priority patent/WO2022240331A1/en
Publication of SE2150590A1 publication Critical patent/SE2150590A1/en
Publication of SE544895C2 publication Critical patent/SE544895C2/en

Classifications

    • G06T 19/003 - Navigation within 3D models or images
    • G02B 27/0093 - Optical systems or apparatus with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/012 - Head tracking input arrangements
    • G06F 3/013 - Eye tracking input arrangements
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/147 - Digital output to display device using display panels
    • G06T 7/277 - Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V 10/82 - Image or video recognition or understanding using neural networks
    • G06V 40/19 - Sensors for eye characteristics, e.g. of the iris
    • G06V 40/23 - Recognition of whole body movements, e.g. for sport training
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06T 2207/10016 - Video; image sequence
    • G09G 2380/08 - Biomedical applications
    • G09G 5/12 - Synchronisation between the display unit and other units, e.g. other display units, video-disc players
    • H04N 21/65 - Transmission of management data between client and server

Abstract

Commands based on motoric sensor data describing movements of a user's (U) body parts (H) and/or at least one gaze parameter (GD) of the user are transmitted from a local site (LS) to a remote site (RS), where control commands (Cctrl) are produced based thereon. At the remote site (RS), milieu sensors (151, 152, 153) register primary sensory data (DV1, DV2, DA) describing visual and/or acoustic characteristics of the remote site's (RS) environment. In response to the control commands (Cctrl), the milieu sensors register the primary sensory data within a segment of the environment at the remote site (RS). Content data (Dcontent) reflecting said segment of the environment are produced at the remote site (RS) based on the primary sensory data. A first processing unit (131) at the local site (LS) estimates future control data based on an estimate of a future motion (mL') of the user's (U) body parts (H) and/or the at least one gaze parameter (GD), and produces the command data (CMD) on the further basis of the estimated future control data.

Description

TECHNICAL FIELD

The invention relates generally to techniques for providing true high-fidelity multisensory presence for a subject at a remote place. In particular, the present invention concerns a data and command transmission system according to the preamble of claim 1 and a corresponding computer-implemented method. The invention also relates to a computer program and a non-volatile data carrier storing such a computer program.
BACKGROUND

There are numerous applications in which it is useful to enable people to actively navigate and explore remote locations and interact with objects and people at such locations. Existing video streaming, video surveillance or videoconference systems are unable to provide a multisensory presence of a remote place at such high fidelity that a person truly experiences a feeling of "actually being" at the remote place. Today's systems of said kinds at best offer a passive watching experience.
Recently, technical advancements have been made in the fields of virtual reality (VR), augmented reality (AR) and mixed reality (MR) that deliver an immersive presence experience.
VR delivers a simulated experience that can be either similar to or completely different from the real-world experience. Standard VR systems typically use VR headsets to generate artificial images, sounds and other sensations that simulate a user's physical presence in a virtual environment. A person using VR equipment is able to look around in an artificial world, move around in it and interact with virtual subjects and/or items therein. VR is mainly used for exploring a virtual or artificial world.
AR can be regarded as an extension of VR, which offers an interactive experience of a real-world environment. Here, objects residing in the real world are enhanced by computer-generated perceptual information that sometimes extends across multiple sensory modalities, including visual, auditory and haptic.
A yet more advanced form of VR is MR, which merges the real and virtual worlds to produce new environments for visualizations. In the MR world, physical and virtual objects co-exist and interact with one another in real time. Thus, MR takes place neither exclusively in the physical world nor exclusively in the virtual world, but in a hybrid world of reality and virtual reality.
Obviously, although VR/AR/MR may provide interesting experiences, none of these techniques is designed for exploring a physical remote location, or environment.
Recently, there has been a rapid development in the field of mobile robotic telepresence (MRP), with an increasing number of commercial systems becoming available. An MRP system contains a video conferencing system mounted on a mobile robotic base. The system allows a pilot user (local viewer) to control the robot and thus create an impression of "moving around" in the robot's environment. The primary aim of MRP systems is to provide social interaction between humans. These systems contain the physical robot, including its sensors and actuators, and an interface used to control the robot. The pilot user is a person who remotely connects to the robot via said interface. The goal of MRP systems is to present the pilot user in the remote environment and allow the pilot user to "move around" in the environment where the robot is located, as well as to interact with people in that environment. Nevertheless, the MRP system does not enable the pilot user to experience the feeling of "being" at the remote place.
Neither VR/AR/MR nor MRP offers a user the ability to fully explore a remote environment. In our capacity as human beings, we explore an environment by actively involving our body motions in reality. For example, we not only watch the environment using our eyes, but also with the eyes in the head on the shoulders of a body that moves. We look at details with the eyes, we also look around with the mobile head, and we go-and-look with the mobile body.
To bring about a true immersion of being in a remote environment for a viewer at a local site, we need to simulate how we actively involve our body movement when exploring the remote environment. For this, in turn, we need precise tracking of the motion of the viewer's head in all six degrees of freedom, the eye gaze movements, and hand and body gestures. All these motions must then be converted into motions performed by a robot moving around in the real world at the remote site. Audio-video data captured as representations of the remote environment must be returned to the local place and displayed locally to the viewer. Such a manner of operating the robot at the remote place potentially enables an immersive experience of "being there".
Below follow examples of earlier attempts to tackle this problem.
US 10,437,335 shows a wearable Haptic Human/Machine Interface (HHMI) that receives electrical activity from muscles and nerves of a user. An electrical signal is determined having characteristics based on the received electrical activity. The electrical signal is generated and applied to an object to cause an action dependent on the received electrical activity. The object can be a biological component of the user, such as a muscle, another user, or a remotely located machine such as a drone. Exemplary uses include mitigating tremor, accelerated learning, cognitive therapy, remote robotic, drone and probe control and sensing, virtual and augmented reality, stroke, brain and spinal cord rehabilitation, gaming, education, pain relief, entertainment, remote surgery, remote participation in and/or observation of an event such as a sporting event, biofeedback and remotality. Remotality is the perception of a reality occurring remote from the user. The reality may be remote in time, location and/or physical form. The reality may be consistent with the natural world, comprised of an alternative, fictional world, or a mixture of natural and fictional constituents. The document is targeted at creating an immersion experience. However, the disclosure is narrowed down to how to deliver such an experience through electrical sensations in the user's muscles and nerves.
US 2019/0362557 discloses examples of wearable systems and methods that can use multiple inputs (e.g., gesture, head pose, eye gaze, voice, totem and/or environmental factors such as location) to determine a command that should be executed and objects in the three-dimensional (3D) environment that should be operated on. The wearable system can detect when different inputs converge, such as when a user seeks to select a virtual object using multiple inputs such as eye gaze, head pose, hand gesture and totem input. Upon detecting an input convergence, the wearable system can perform a transmodal filtering scheme that leverages the converged inputs to assist in properly interpreting what command the user is providing or what object the user is targeting. The document is mainly focused on how to develop multimodal inputs for user interaction. For example, how to fuse two modalities from a multimodal input point of view is described. Here, environmental factors are basically understood to mean the location of an object or the user.
The above systems may be capable of providing a near immersive experience of "being there". However, they fail to overcome an important technical hurdle represented by system latency. Ideally, if the user moves his/her head, the resulting view of the remote environment should change immediately. If the delay between the head movement and the change in view exceeds a critical threshold, say 20 milliseconds, the user will have an experience ranging from slight discomfort to full motion sickness. In any case, the user will not experience any immersion effect.
Reducing the system latency below the critical threshold has proven to be challenging, inter alia due to the long and complex two-way chain of connections that must be run through, the amount of commands and data that must be passed back and forth, and various delay constraints imposed, for example, by the movements of the mechanical parts of the robotic equipment.
SUMMARY

It is therefore an object of the present invention to offer an improved solution for providing a remote presence experience capable of duping a person's brain in such a realistic manner that the person gains an immersive experience of a remote scene.
According to one aspect of the invention, the object is achieved by a data and command transmission system including a local site and a remote site. The local site contains at least one motoric sensor, at least one presentation device and a first processing unit. The at least one motoric sensor is configured to register control data reflecting a respective motion of at least one body part of a user with respect to at least one spatial dimension and/or at least one gaze parameter of the user. The at least one presentation device is configured to present at least one of image and acoustic data to the user, and the first processing unit is configured to receive the control data and, based thereon, produce command data. The remote site contains a second processing unit and at least one milieu sensor. The second processing unit is configured to receive the command data and, based thereon, produce control commands. The at least one milieu sensor is configured to register primary sensory data describing visual and/or acoustic characteristics of an environment at the remote site. The at least one milieu sensor is controllable in response to the control commands with respect to at least one spatial dimension so as to register the primary sensory data within a particular segment of the environment at the remote site. The second processing unit is further configured to produce content data based on the primary sensory data, which content data reflect the particular segment of the environment at the remote site. Moreover, the first processing unit is configured to receive the content data and, based thereon, produce image and/or acoustic data adapted to be presented to the user via the at least one presentation device. Particularly, the first processing unit is configured to estimate future control data based on an estimate of a future motion of the at least one body part of the user and/or the at least one gaze parameter of the user, and to produce the command data on the further basis of the estimated future control data. For example, the system may have a first channel configured to transmit the command data from the local site to the remote site, and a second channel configured to transmit the content data from the remote site to the local site.
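To make the division of work between the two sites easier to follow, the sketch below outlines the exchange in Python. It is a minimal illustration only; all class, field and function names are assumptions introduced here and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative message types; the names are assumptions, not from the patent.

@dataclass
class ControlData:
    head_pose: tuple                     # (x, y, z, yaw, pitch, roll) of the user's head
    gaze_point: Optional[tuple] = None   # point of regard, e.g. (u, v) on a display

@dataclass
class CommandData:
    target_pose: tuple        # pose the milieu sensors should assume
    predicted: bool = False   # True if derived from estimated future control data

@dataclass
class ContentData:
    video_frame: bytes        # encoded frame of the selected environment segment
    audio_chunk: bytes        # spatial audio captured for the same segment


def local_site_step(control: ControlData, predicted_control: ControlData) -> CommandData:
    """First processing unit: turn registered and predicted control data into command data."""
    # Here we simply forward the predicted pose; a real system would blend both sources.
    return CommandData(target_pose=predicted_control.head_pose, predicted=True)


def remote_site_step(cmd: CommandData) -> ContentData:
    """Second processing unit: aim the milieu sensors and return content data."""
    # Placeholder capture; in practice this drives a robotic sensor platform.
    return ContentData(video_frame=b"", audio_chunk=b"")
```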
This system is advantageous because it enables control of the at least one milieu sensor essentially contemporaneously with the user's body-part movements and/or gaze activities. It is therefore possible to provide a true immersion effect.
For example, this allows medical staff to treat infectious patients without risking being infected. Expert physicians may also treat patients at different locations without having to travel. In addition, cultural experiences may be highly enriched. For instance, people with disabilities are enabled to conveniently visit remote museums, and spectators can experience moving pictures in an interactive and very realistic manner.
According to one embodiment of this aspect of the invention, the first processing unit is configured to estimate the future motion of the at least one body part of the user and/or the at least one gaze parameter of the user using a Kalman filtering technique or a particle filtering technique. Thus, the fact that the body parts and the gaze parameters obey the physical laws of motion can be modelled efficiently.
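The patent names Kalman filtering and particle filtering but gives no equations. As a hedged illustration, the following Python sketch implements a generic constant-velocity Kalman filter for a single pose coordinate (for example head yaw) and extrapolates it a short horizon into the future; the motion model, noise values and sampling rate are assumptions, not figures from the patent.

```python
import numpy as np

class ConstantVelocityKalman:
    """Predicts a future value of one pose coordinate (e.g. head yaw)."""

    def __init__(self, dt: float, process_var: float = 1e-2, meas_var: float = 1e-3):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition over (position, velocity)
        self.H = np.array([[1.0, 0.0]])              # only the position is measured
        self.Q = process_var * np.eye(2)             # process noise covariance
        self.R = np.array([[meas_var]])              # measurement noise covariance
        self.x = np.zeros((2, 1))                    # state estimate
        self.P = np.eye(2)                           # estimate covariance

    def update(self, z: float) -> None:
        # Predict one step ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new measurement z.
        y = np.array([[z]]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P

    def predict_ahead(self, horizon: float) -> float:
        """Extrapolate the coordinate `horizon` seconds into the future."""
        F_h = np.array([[1.0, horizon], [0.0, 1.0]])
        return float((F_h @ self.x)[0, 0])

# Example: feed yaw samples at 100 Hz and ask for the value 20 ms ahead.
kf = ConstantVelocityKalman(dt=0.01)
for yaw in [0.00, 0.01, 0.02, 0.04, 0.06]:
    kf.update(yaw)
future_yaw = kf.predict_ahead(0.02)
```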
According to another embodiment of this aspect of the invention, the second processing unit is configured to estimate a future content of the primary sensory data, and produce the content data based on the estimated future content of the primary sensory data. Thereby, the prediction efforts at the command-generating side, i.e. the local site, can be relaxed. Instead, a part of the prediction work can be transferred to the remote site. Preferably, the second processing unit is configured to estimate the future content of the primary sensory data using a Kalman filtering technique.
According to still another embodiment of this aspect of the invention, the second processing unit contains a first deep-learning neural network trained to estimate the future motion of the at least one body part of the user by predicting an intent of the user based on the primary sensory data. The second processing unit is configured to feed a series of samples of the primary sensory data into the first deep-learning neural network, in which series each sample contains data reflecting content data in a particular segment of the environment at the remote site registered at a particular point in time. In response thereto, the second processing unit is configured to receive, from the first deep-learning neural network, an estimate of a position of the at least one body part of the user at a future point in time occurring after a latest sample in the series of samples of the primary sensory data. This allows efficient and accurate prediction of the user's commands.
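No network architecture is disclosed for the first deep-learning neural network. The sketch below is one plausible stand-in, assuming a recurrent model (PyTorch) that consumes a sequence of feature vectors derived from the primary sensory data samples and outputs a six-degree-of-freedom estimate of the body part's future pose; every dimension and layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class IntentPredictor(nn.Module):
    """Illustrative stand-in for the first deep-learning network (DL1)."""

    def __init__(self, feature_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 6)  # predicted pose: x, y, z, yaw, pitch, roll

    def forward(self, samples: torch.Tensor) -> torch.Tensor:
        # samples: (batch, sequence_length, feature_dim), one feature vector
        # per registered sample of the primary sensory data.
        _, last_hidden = self.encoder(samples)
        return self.head(last_hidden.squeeze(0))

# Example: a batch of one sequence of ten samples.
model = IntentPredictor()
sequence = torch.randn(1, 10, 128)
future_pose = model(sequence)   # shape (1, 6)
```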
According to yet another embodiment of this aspect of the invention, at least one of the motoric sensors is configured to register control data reflecting motions in the form of gestures of a hand of the user. Here, the first processing unit is further configured to produce the command data based on the gestures of the hand. For example, this enables precise and responsive control of a robotic hand at the remote site.
Preferably, therefore, the remote site has a robotic hand configured to be controlled in response to a subset of the control commands. The second processing unit is further configured to generate the subset of the control commands based on a subset of the command data produced by the first processing unit. The subset is based on the gestures of the hand. In other words, the user's gestures can be directly transferred into corresponding movements of the robotic hand. This provides the user with excellent control capabilities at the remote site.
According to another embodiment of this aspect of the invention, the first processing unit contains a second deep-learning neural network trained to estimate a future position of the hand based on a spatio-temporal relationship between gaze fixations and the command data produced based on the gestures of the hand. Here, the gaze parameter expresses a series of gaze fixations, each of which represents a gaze point located on a landmark at the local site during at least a threshold period. The first processing unit is configured to: feed the series of gaze fixations into the second deep-learning neural network; and, in response thereto, receive from the second deep-learning neural network an estimate of a position of the hand at a future point in time occurring after the latest registered gaze fixation in the series of gaze fixations. Thus, accurate prediction of the user's hand movements can be made using the concept of eye-hand coordination. This, in turn, enables very efficient control of the remote site.
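Again, the patent does not specify how the second deep-learning neural network is built. As a rough sketch under that caveat, the model below maps a short window of gaze fixations to a predicted hand position; the window length, layer sizes and the use of a plain feed-forward network are all assumptions.

```python
import torch
import torch.nn as nn

class GazeToHandPredictor(nn.Module):
    """Illustrative stand-in for the second network (DL2): maps a window of
    recent gaze fixations to an estimated future hand position."""

    def __init__(self, window: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window * 3, 64),   # each fixation is a 3-D gaze point
            nn.ReLU(),
            nn.Linear(64, 3),            # predicted hand position (x, y, z)
        )

    def forward(self, fixations: torch.Tensor) -> torch.Tensor:
        # fixations: (batch, window, 3) -> flatten the temporal window
        return self.net(fixations.flatten(start_dim=1))

model = GazeToHandPredictor()
fixation_window = torch.randn(1, 5, 3)
predicted_hand = model(fixation_window)   # shape (1, 3)
```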
According to one embodiment of this aspect of the invention, the first processing unit is configured to estimate the future motion of the at least one body part of the user, e.g. a hand, based on a series of registered samples of the at least one gaze parameter of the user.
According to yet one embodiment of this aspect of the invention, at least one of the motoric sensors includes a movable camera arranged on a head-mounted display configured to be worn on the head of the user. The movable camera is configured to register control data in the form of a video signal representing visual characteristics of an environment at the local site. The first processing unit is further configured to determine motions of the user's head with respect to the at least one spatial dimension based on the video signal representing the visual characteristics of the environment at the local site. This arrangement provides movement registration according to the inside-out principle.
Additionally, or alternatively, according to one embodiment of this aspect of the invention, at least one of the motoric sensors at the local site contains a stationary camera, which is configured to register control data in the form of a respective video signal representing the user. Here, the first processing unit is configured to determine motions of the user with respect to the at least one spatial dimension based on the respective video signals representing the user. This arrangement provides movement registration according to the outside-in principle.

According to still another embodiment of this aspect of the invention, at least one of the motoric sensors is comprised in a mobile terminal. The at least one motoric sensor is configured to register control data in the form of a set of vectors reflecting movements of the mobile terminal. Here, it is presumed that the user holds the mobile terminal in his/her hand, and therefore the movements are indicative of the movements of a hand of the user holding the mobile terminal. Hence, a readily available and cost-efficient hand sensor can be used to interact with the system.
Preferably, the mobile terminal contains a proximity sensor, an accelerometer, a gyroscope and/or a compass configured to register the control data in the form of the set of vectors. The first processing unit is configured to interpret the control data as movements of the hand of the user with respect to the at least one spatial dimension.
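As an illustration of how such sensor readings could be turned into the set of vectors, the sketch below performs naive dead reckoning over accelerometer and gyroscope samples. This is an assumption for clarity only; the patent does not prescribe any particular integration scheme, and practical systems would add drift compensation.

```python
import numpy as np

def movement_vectors(accel: np.ndarray, gyro: np.ndarray, dt: float) -> np.ndarray:
    """Crude dead-reckoning sketch (an assumption, not the patent's method):
    integrate accelerometer samples twice to get translation increments and
    gyroscope samples once to get rotation increments for each time step.

    accel: (N, 3) linear acceleration in m/s^2, gravity already removed
    gyro:  (N, 3) angular rate in rad/s
    Returns an (N, 6) array: (dx, dy, dz, d_yaw, d_pitch, d_roll) per sample.
    """
    velocity = np.cumsum(accel * dt, axis=0)   # first integration: velocity
    translation = velocity * dt                # displacement per time step
    rotation = gyro * dt                       # angle increment per time step
    return np.hstack([translation, rotation])
```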
Additionally, the mobile terminal may contain a camera system configured to register the at least one gaze parameter of the user. The at least one gaze parameter is presumed to contain a point of regard of the user on a display of the mobile terminal. Here, the display is configured to present the image data produced by the first processing unit. As a result, the user has access to a highly intuitive means of interaction with the system.

According to a further embodiment of this aspect of the invention, the estimate of the future motion of the at least one gaze parameter of the user contains an estimate of a future point of regard of the user on the display. The future point of regard is here based on a trajectory of the point of regard over the display and/or the image data presented on the display. Hence, the movements of the point of regard, as such, can be accurately predicted.
According to another aspect of the invention, the object is achieved by a computer-implemented method of transmitting data and commands between a local site and a remote site. The method involves, at the local site, registering, by means of at least one motoric sensor, control data reflecting a respective motion of at least one body part of a user with respect to at least one spatial dimension and/or at least one gaze parameter of the user. The method also involves, at the local site, presenting, via at least one presentation device, image and/or acoustic data to the user, and receiving, in a first processing unit, the control data. Based thereon, the method involves producing command data, and transmitting the command data from the local site to the remote site. At the remote site, the method involves receiving the command data in a second processing unit and, based thereon, producing control commands to at least one milieu sensor configured to register primary sensory data describing visual and/or acoustic characteristics of an environment at the remote site. The at least one milieu sensor is controllable in response to the control commands with respect to at least one spatial dimension. The method also involves registering, at the remote site, the primary sensory data within a particular segment of the environment at the remote site, which segment is designated by the control commands. At the remote site, the method further involves producing content data based on the primary sensory data, which content data reflect the particular segment of the environment at the remote site. Additionally, the method involves: transmitting the content data from the remote site to the local site; receiving the content data at the local site and, based thereon, producing image and/or acoustic data adapted to be presented to the user via the at least one presentation device. At the local site, the method involves estimating future control data based on an estimate of a future motion of the at least one body part of the user and/or the at least one gaze parameter of the user; and producing the command data on the further basis of the estimated future control data. The advantages of this method, as well as the preferred embodiments thereof, are apparent from the discussion above with reference to the proposed system.
According to a further aspect of the invention, the object is achieved by a computer program loadable into a non-volatile data carrier communicatively connected to a processing unit. The computer program includes software for executing the above method when the program is run on the processing unit.
According to another aspect of the invention, the object is achieved by a non-volatile data carrier containing the above computer program.
Further advantages, beneficial features and applications of the present invention will be apparent from the following description and the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS

The invention is now to be explained more closely by means of preferred embodiments, which are disclosed as examples, and with reference to the attached drawings.

Figure 1 shows a system overview of a first embodiment according to the invention;
Figure 2 shows a system overview of a second embodiment according to the invention;
Figure 3 shows a system overview of a third embodiment according to the invention;
Figure 4 shows a system overview of a fourth embodiment according to the invention;
Figure 5 shows a block diagram of a first processing unit according to one embodiment of the invention;
Figure 6 shows a block diagram of a second processing unit according to one embodiment of the invention; and
Figure 7 illustrates, by means of a flow diagram, the general method according to the invention.
DETAILED DESCRIPTION

In Figure 1, we see an overview of a data and command transmission system according to a first embodiment of the invention. The system contains a local site LS and a remote site RS.
The local site LS includes at least one motoric sensor, here exemplified by a head-mounted display 110 and video cameras 121, 122 and 123. Each of the motoric sensors is configured to register control data V1, V2 and V3 respectively, reflecting a respective motion of at least one body part of a user U, such as the head H, with respect to at least one spatial dimension xL, yL, zL, wxL, wyL and/or wzL. Preferably, the control data reflect six degrees of freedom represented by three rotational parameters and three translational parameters. These parameters may thus express translation along vertical, transverse and longitudinal axes respectively, as well as yaw, pitch and roll rotation respectively around these axes. The arrangement of the video cameras 121, 122 and 123 constitutes an example of the so-called outside-in principle for registering the user movements.
As will be described below, alternatively or additionally, the control data may also reflect at least one gaze parameter GD of the user U, for example a point of regard on a display or any other surface at the local site LS.
The local site LS also includes at least one presentation device configured to present image and/or acoustic data to the user U. In Figure 1, the presentation device is exemplified by the display of the head-mounted display 110. The head-mounted display 110 may be represented by an AR/VR/MR glass headset configured to provide the user U with a first-person view of the remote site RS.
The head-mounted display 110 preferably contains two sets of small display optics in front of each eye of the user U. The visual optics part includes two displays: a first display in the peripheral area provides the wide field of view, and a second display in the fovea area covers only a small area but with a much higher pixel density. The images of these two displays are then optically combined through a semi-transparent mirror to offer a blended visual field covering both a wide field of view and a fovea view with high detail. The head-mounted display 110 may further include two video cameras configured to pick up images of the user's left and right eyeballs respectively in order to register at least one gaze parameter GD of the user U.
The gaze parameters GD may be used to control the semi-transparent mirror to synthesize a foveated rendering of the display. With such an arrangement, the user U may be provided with a high-resolution first-person view of the remote site RS. It is also advantageous if the head-mounted display 110 includes a spatial speaker system that is configured to generate a spatial sound field. Namely, this further enhances the user's U experience of immersion.
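In the patent the fovea and peripheral images are combined optically via the semi-transparent mirror. Purely as a software analogue of the same idea, the sketch below computes the pixel rectangle of a high-resolution fovea region centred on the current gaze point; the window size and coordinate conventions are assumptions.

```python
def fovea_window(gaze_x: float, gaze_y: float,
                 frame_width: int, frame_height: int,
                 fovea_size: int = 256) -> tuple:
    """Return the pixel rectangle of the high-resolution fovea region centred
    on the current gaze point, clamped to the frame borders (a simplified
    sketch, not the patent's optical combination)."""
    half = fovea_size // 2
    left = min(max(int(gaze_x) - half, 0), frame_width - fovea_size)
    top = min(max(int(gaze_y) - half, 0), frame_height - fovea_size)
    return left, top, left + fovea_size, top + fovea_size

# Example: a 1920x1080 peripheral frame with the gaze near the upper left corner.
print(fovea_window(100.0, 80.0, 1920, 1080))   # -> (0, 0, 256, 256)
```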
Additionally, the local site LS includes a first processing unit 131, for instance comprised in a computer, which is configured to receive the control data GD, V1, V2 and/or V3 and, based thereon, produce command data CMD.
The remote site RS includes a second processing unit 141 and at least one milieu sensor 151, 152 and 153. The second processing unit 141 is configured to receive the command data CMD and, based thereon, produce control commands Cctrl. For example, the command data CMD may be transmitted via a first channel 161 over the Internet, which first channel 161 is configured to transmit the command data CMD from the local site LS to the remote site RS. Each of the at least one milieu sensor, which in Figure 1 is exemplified by first and second image recorders 151 and 152 respectively and a microphone 153, is configured to register primary sensory data DV1, DV2 and/or DA respectively, which describe visual and/or acoustic characteristics of an environment at the remote site RS. For example, the first image recorder 151 may contain a still camera configured to pick up image data at a relatively high resolution, and the second image recorder 152 may contain a video camera configured to pick up moving image data at a relatively low resolution.
Each of the at least one milieu sensor 151, 152 and 153 is controllable in response to the control commands Cctrl with respect to at least one spatial dimension xR, yR, zR, wxR, wyR and/or wzR so as to register the primary sensory data DV1, DV2 and DA respectively within a particular segment of the environment at the remote site RS, typically a segment covering a principal spatial angle at which the milieu sensor 151, 152 or 153 in question is aimed. Analogous to the control data, the control commands Cctrl are preferably configured to cause the at least one milieu sensor 151, 152 and 153 to move in six degrees of freedom represented by three rotational parameters and three translational parameters. The at least one spatial dimension xR, yR, zR, wxR, wyR and/or wzR may thus express translation along vertical, transverse and longitudinal axes respectively, as well as yaw, pitch and roll rotation respectively around these axes.
The second processing unit 141 is further configured to produce content data Dcontent based on the primary sensory data DV1, DV2 and DA respectively, which content data Dcontent reflect the particular segment of the environment at the remote site RS, e.g. moving image data around the principal spatial angle at which the second image recorder 152 is aimed. The content data Dcontent may be transmitted from the remote site RS via a second channel 162 over the Internet to the local site LS. The second channel 162 is configured to transmit the content data Dcontent from the remote site RS to the local site LS.
The first processing unit 131 is configured to receive the content data Dcontent and, based thereon, produce image and/or acoustic data DVA adapted to be presented to the user U via the at least one presentation device, which in Figure 1 is represented by the head-mounted display 110.

In addition, the first processing unit 131 is configured to estimate future control data based on an estimate of a future motion mL' of the at least one body part H of the user U. For example, the future motion mL' of the at least one body part H of the user U may be estimated by the first processing unit 131 using a Kalman filtering technique or a particle filtering technique, as will be discussed below.

Additionally, or alternatively, the first processing unit 131 may be configured to estimate the future control data based on one or more gaze parameters GD of the user U, such as the user's point of regard on a presentation device presenting the content data Dcontent, saccades, i.e. rapid, simultaneous movements of both of the user's U eyes between two or more phases of fixation in the same direction, as well as other types of characteristic eye movements. In such a case, the first processing unit 131 is configured to produce the command data CMD on the further basis of the estimated future control data, which, in turn, is based at least in part on the estimated gaze parameter(s) GD.
The above-described estimation of the future control data renders it possible to reduce the system latency substantially. Namely, based on the estimated future control data, the first processing unit 131 may produce the command data CMD much earlier than if only the already generated control data GD, V1 and V2 were available. Thus, the proposed low-latency presence technique is key to providing a true immersion experience of "being" at the remote site RS. In short, the control data and the estimated future control data cause a robotic platform 150 at the remote site RS to move in response to the control commands Cctrl as a duplication of the user's U motion at the local site LS, which robotic platform 150 carries the milieu sensors 151, 152 and 153.

To further reduce the system latency, according to one embodiment of the invention, the second processing unit 141 is configured to estimate a future content of the primary sensory data DR'. Additionally, the second processing unit 141 is configured to produce the content data Dcontent on the further basis of the estimated future content of the primary sensory data DR'. As a result, parts of the content data Dcontent may be produced even before the command data CMD have arrived in the second processing unit 141. Analogous to the estimated command data, the estimated future content of the primary sensory data DR' may be produced using a Kalman filtering technique.
Alternatively, the second processing unit 141 may contain, or by other means be communicatively connected to, a first deep-learning neural network DL1 that is trained to estimate the future motion mL' of the at least one body part H of the user U by predicting an intent of the user U based on the primary sensory data DV1 and/or DV2. Further, the second processing unit 141 is configured to feed a series of samples of the primary sensory data DV1 and/or DV2 into the first deep-learning neural network DL1, in which series each sample includes data reflecting content data Dcontent in a particular segment of the environment at the remote site RS registered at a particular point in time. In response thereto, the second processing unit 141 is configured to receive from the first deep-learning neural network DL1 an estimate of a position of the at least one body part H of the user U at a future point in time, i.e. a point in time occurring after a latest sample in the series of samples of the primary sensory data DV1 and/or DV2.
Figure 2 shows an overview of a data and command transmission system according to a second embodiment of the invention. In Figure 2, all entities, units and signals bearing the same reference as a reference in Figure 1 designate the entities, units and signals as described above with reference to Figure 1.

Analogous to the above, in Figure 2, motoric sensors in the form of video cameras 121, 122 and 123 are configured to register control data reflecting motions of the user U. Here, however, at least one video camera, exemplified by 123, is configured to register control data V3 specifically reflecting motions in the form of gestures G of a hand M of the user U. The first processing unit 131 is further configured to produce the command data CMD based on the gestures G of the hand M.
This is beneficial, since it facilitates for the user U to control a robotic hand 170 at the remote site RS. Naturally, it is therefore preferable if the remote site RS contains a robotic hand 170, which is configured to be controlled in response to a subset of the control commands Cctrl, and the second processing unit 141 is further configured to generate the subset of the control commands Cctrl based on a subset of the command data CMD produced by the first processing unit 131, which subset, in turn, is based on the gestures G of the hand M. The first processing unit 131 may be configured to estimate the future motion mL' of the at least one body part H of the user U, which body part is represented by the hand M, by using a particle filtering technique based on a 3D physical hand motion model.
To reduce system latency, according to one embodiment of the invention, the first processing unit 131 contains, or is by other means communicatively connected to, a second deep-learning neural network DL2, which is trained to estimate a future position of the hand M based on a spatio-temporal relationship between gaze fixations and the command data CMD produced based on the gestures G of the hand M. The gaze parameter GD here expresses a series of gaze fixations, each of which represents a gaze point located on a landmark at the local site LS during at least a threshold period. The landmarks may be in the form of virtual objects presented to the user via the head-mounted display 110.

The first processing unit 131 is configured to feed the series of gaze fixations into the second deep-learning neural network DL2. In response thereto, the first processing unit 131 is configured to receive from the second deep-learning neural network DL2 an estimate of a position of the hand M at a future point in time occurring after the latest registered gaze fixation in the series of gaze fixations. Thereby, substantial time savings can be made in the process of duplicating the gestures G of the hand M at the remote site RS.
The technique of controlling the robotic platform 150 at the remote site RS may contain the following steps:

1) recover a pose of the user's U head H expressed by three rotational parameters and three translational parameters;
2) compute a gaze parameter, for example as described above;
3) compute gestures GesL of the hand M;
4) send the pose, the gaze and the gesture parameters from the local site LS to the remote site RS, and predict their future values, for example according to

(RR, TR, GR, GesR) = f(RL, TL, GL, GesL, A, V),

where RL are the rotation parameters extracted at the local site LS, RR are the predicted rotation parameters at the remote site RS, TL are the translational parameters extracted at the local site LS, TR are the predicted translational parameters at the remote site RS, GL are the gaze parameters extracted at the local site LS, GR are the predicted gaze parameters at the remote site RS, GesL are the gesture parameters computed at the local site LS, GesR are the predicted gesture parameters at the remote site RS, and A and V are the audio and video contents respectively at the remote site RS. The audio content A and the video content V are included because these components are comprised in the physical environment that shapes the user's actions. Consequently, the environmental factors represented by the audio and video contents A and V can be used to make predictions about the user's actions. f(·) is a prediction function, which provides a prediction of the estimated future motion of the user U and/or a prediction of the estimated future attention of the user U in terms of what to view and/or listen to;
5) at the remote site RS, use the predicted pose, gaze and gesture parameters to control the capture of the audio and/or visual content;
6) send the predicted audio and/or visual content to the local site LS;
7) at the local site LS, display a synthesized view to the user U, which synthesized view contains stitched images from two or more wide-angle cameras to create a 360-degree view, and pan/rotate the image based on head movement; and
8) send the synthesized peripheral and fovea views to the head-mounted display 110.

Figure 3 shows an overview of a data and command transmission system according to a third embodiment of the invention. In Figure 3, all entities, units and signals bearing the same reference as a reference in Figures 1 and/or 2 designate the same entities, units and signals as described above with reference to Figures 1 and 2.

In Figure 3, at least one of the motoric sensors is represented by a movable camera 124 that is arranged on the head-mounted display 110 configured to be worn on the head H of the user U. The movable camera 124 is configured to register control data in the form of a video signal V4 representing the visual characteristics of an environment at the local site LS, for example a room in which the user U is located. Here, the first processing unit 131 is configured to determine motions of the user's U head with respect to the at least one spatial dimension xL, yL, zL, wxL, wyL and/or wzL based on the video signal V4 representing the visual characteristics of the environment at the local site LS. Namely, the motions of the user's U head can be derived by studying how the video signal's V4 representation of the environment at the local site LS varies over time. This embodiment is an example of the so-called inside-out principle for registering the user movements.
Figure 4 shows an overview of a data and command transmission system according to a fourth embodiment of the invention. In Figure 4, all entities, units and signals bearing the same reference as a reference in any of Figures 1 to 3 designate the entities, units and signals as described above with reference to these Figures.

In Figure 4, at least one of the motoric sensors is comprised in a mobile terminal 410. Hence, the motoric sensor(s) may be represented by a standard camera unit 411 of the mobile terminal 410, for example the front camera and/or the so-called selfie camera.
Analogous to the above, the motoric sensor(s) is (are) configured to register control data reflecting movements. Here, the control data appear in the form of a set of vectors mvs reflecting movements mT of the mobile terminal 410. It is further presumed that the user U holds the mobile terminal 410 in his/her hand M, and therefore the movements mT are indicative of the movements of the hand M of the user U. This is advantageous because the hand movements can be registered in a simple and straightforward manner.
According to one embodiment of the invention, the mobile terminal 410 contains a proximity sensor, an accelerometer, a gyroscope and/or a compass configured to register said control data in the form of the set of vectors mvs. Based on the same assumption as above, the first processing unit 131 is configured to interpret the control data as movements of the hand M of the user U with respect to the at least one spatial dimension xL, yL, zL, wxL, wyL and/or wzL. Thus, the first processing unit 131 may provide the command data CMD based on at least one motoric sensor being alternative or additional to the camera unit 411.

According to another embodiment of the invention, the mobile terminal 410 contains a camera system 411 configured to register the at least one gaze parameter GD of the user U. The camera system 411 may thus be represented by the so-called selfie camera of the mobile terminal 410. The at least one gaze parameter GD describes a point of regard of the user U on a display 412 of the mobile terminal 410, which display 412 is configured to present the image data DVA produced by the first processing unit 131. The image data DVA, in turn, reflect content data Dcontent from the particular segment of the environment at the remote site RS. In other words, the at least one gaze parameter GD may provide feedback relating to what catches the user's U visual attention at the remote site RS.

Here, the estimate of the future motion mL' of the at least one gaze parameter GD of the user U preferably contains an estimate of a future point of regard of the user U on the display 412. The future point of regard is based on a trajectory of the point of regard over the display 412 and/or the image data DVA presented on the display 412. In other words, the future point of regard is determined based on a sequence of events occurring at the local site LS, at the remote site RS, or a combination thereof. In any case, such prediction of the at least one gaze parameter GD typically reduces the system latency dramatically.
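One simple way to obtain such a future point of regard from the trajectory alone is linear extrapolation of the most recent gaze samples, as sketched below. This is an assumption for illustration; the patent leaves the extrapolation method open, and a practical system might also use the presented image data as described above.

```python
import numpy as np

def predict_point_of_regard(trajectory: np.ndarray, horizon_ms: float,
                            sample_interval_ms: float = 10.0) -> np.ndarray:
    """Extrapolate the point of regard on the display `horizon_ms` into the
    future from its recent trajectory (simple linear sketch).

    trajectory: (N, 2) recent gaze points in display coordinates (pixels)
    """
    velocity = (trajectory[-1] - trajectory[-2]) / sample_interval_ms  # pixels per ms
    return trajectory[-1] + velocity * horizon_ms

# Example: the gaze has been drifting to the right; predict 20 ms ahead.
recent = np.array([[400.0, 300.0], [404.0, 300.5], [408.0, 301.0]])
print(predict_point_of_regard(recent, horizon_ms=20.0))   # -> [416. 302.]
```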
Moreover, additional latency reduction is potentially attainable if the first processing unit 131 is configured to estimate the future motion mL' of the at least one body part H of the user U on the further basis of a series of registered samples of the at least one gaze parameter GD of the user U. Namely, this predicts an intent of the user U as expressed by the eye-to-hand coordination process.
Figure 5 shows a block diagram of a first processing unit 131 according to one embodiment of the invention. It is generally advantageous if the first processing unit 131 is configured to assist in effecting the above-described procedure in an automatic manner by executing a computer program 517. Therefore, the first processing unit 131 may include a memory unit 516, i.e. a non-volatile data carrier, storing the computer program 517, which, in turn, contains software for making processing circuitry in the form of at least one processor 515 in the first processing unit 131 execute the actions mentioned in this disclosure when the computer program 517 is run on the at least one processor 515.

Figure 6 shows a block diagram of a second processing unit 141 according to one embodiment of the invention. It is generally advantageous if the second processing unit 141 is configured to assist in effecting the above-described procedure in an automatic manner by executing a computer program 617. Therefore, the second processing unit 141 may include a memory unit 616, i.e. a non-volatile data carrier, storing the computer program 617, which, in turn, contains software for making processing circuitry in the form of at least one processor 615 in the second processing unit 141 execute the actions mentioned in this disclosure when the computer program 617 is run on the at least one processor 615.

In order to sum up, and with reference to the flow diagram in Figure 7, we will now describe the computer-implemented method according to one embodiment of the invention for transmitting data and commands between a first processing unit 131 at a local site LS and a second processing unit 141 at a remote site RS.

In a first step 705, control data are registered by at least one motoric sensor at the local site LS. The control data reflect a respective motion of at least one body part of a user with respect to at least one spatial dimension and/or at least one gaze parameter of the user.

In a subsequent step 720, the first processing unit 131 estimates future control data based on an estimate of a future motion of the at least one body part of the user and/or the at least one gaze parameter of the user.

In a step 710 effected after steps 705 and 720, the first processing unit 131 produces command data based on the control data and the estimated future control data.
A step 715 thereafter transmits the command data from the local site LS to the remote site RS. In a following step 725, the command data are received in the second processing unit 141 at the remote site RS.
Subsequently, in a step 730, the second processing unit 141 produces control commands based on the command data. The control commands, in turn, control at least one milieu sensor at the remote site RS, which at least one milieu sensor is configured to register primary sensory data describing visual and/or acoustic characteristics of an environment at the remote site RS. Specifically, the control commands are configured to control the at least one milieu sensor with respect to at least one spatial dimension so as to register primary sensory data within a particular segment of the environment at the remote site RS.

In a step 735 thereafter, such primary sensory data are registered. Then, in a step 740, content data are produced by the second processing unit 141 based on the primary sensory data. The content data reflect characteristics of the particular segment of the environment at the remote site RS.

In a subsequent step 745, the content data are transmitted from the second processing unit 141 to the first processing unit 131; and in a following step 750, the content data are received in the first processing unit 131.

Then, in a step 755, based on the content data, the first processing unit 131 produces image and/or acoustic data for presentation to the user via at least one presentation device at the local site LS. The image and/or acoustic data are presented to the user (not shown), whereafter the procedure loops back to step 750. After step 745, the procedure also loops back to step 705 for registering updated control data at the local site LS.
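The flow of steps 705 to 755 can be summarised as a single loop. The schematic below uses placeholder objects and method names (none of them from the patent) purely to show the ordering of the steps.

```python
def transmission_loop(local, remote, presentation):
    """Schematic of the method of Figure 7; all method names are placeholders.

    `local`, `remote` and `presentation` stand for the first processing unit,
    the second processing unit and the presentation device respectively.
    """
    while True:
        control = local.register_control_data()                       # step 705
        predicted = local.estimate_future_control(control)            # step 720
        command = local.produce_command_data(control, predicted)      # step 710
        received_cmd = local.transmit(command)                        # steps 715 / 725
        ctrl_cmds = remote.produce_control_commands(received_cmd)     # step 730
        sensory = remote.register_primary_sensory_data(ctrl_cmds)     # step 735
        content = remote.produce_content_data(sensory)                # step 740
        received_content = remote.transmit(content)                   # steps 745 / 750
        presentation.show(local.produce_image_and_audio(received_content))  # step 755
```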
The process steps described with reference to Figure 7 may be controlled by means of a programmed processor. Moreover, although the embodiments of the invention described above with reference to the drawings comprise a processor and processes performed in at least one processor, the invention thus also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, a code intermediate between source and object code such as in partially compiled form, or in any other form suitable for use in the implementation of the process according to the invention. The program may either be a part of an operating system, or be a separate application. The carrier may be any entity or device capable of carrying the program. For example, the carrier may comprise a storage medium, such as a Flash memory, a ROM (Read Only Memory), for example a DVD (Digital Video/Versatile Disk), a CD (Compact Disc) or a semiconductor ROM, an EPROM (Erasable Programmable Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), or a magnetic recording medium, for example a floppy disc or hard disc. Further, the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or by other means. When the program is embodied in a signal which may be conveyed directly by a cable or other device or means, the carrier may be constituted by such cable or device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant processes.
Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims.
The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, integers, steps or components. The term does not preclude the presence or addition of one or more additional elements, features, integers, steps or components or groups thereof. The indefinite article "a" or "an" does not exclude a plurality. In the claims, the word "or" is not to be interpreted as an exclusive or (sometimes referred to as "XOR"). On the contrary, expressions such as "A or B" cover all the cases "A and not B", "B and not A" and "A and B", unless otherwise indicated. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims should not be construed as limiting the scope. It is also to be noted that features from the various embodiments described herein may freely be combined, unless it is explicitly stated that such a combination would be unsuitable.
The invention is not restricted to the described embodiments in the figures, but may be varied freely within the scope of the claims.

Claims (30)

Claims
1. A data and command transmission system comprising: a local site (LS) comprising: at least one motoric sensor (110, 121, 122, 123, 124, 410) configured to register control data (GD, V1, V2, V3, V4, mvs) reflecting a respective motion of at least one body part (H, M) of a user (U) with respect to at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU) and/or at least one gaze parameter (GD) of the user (U), at least one presentation device (110, 410) configured to present at least one of image and acoustic data to the user (U), and a first processing unit (131) configured to receive the control data (GD, V1, V2, V3, V4, mvs), and based thereon produce command data (CMD); and a remote site (RS) comprising: a second processing unit (141) configured to receive the command data (CMD) and based thereon produce control commands (Ccm), and at least one milieu sensor (151, 152, 153) configured to register primary sensory data (DV1, DV2, DA) describing visual and/or acoustic characteristics of an environment at the remote site (RS), the at least one milieu sensor (151, 152, 153) being controllable in response to the control commands (Ccm) with respect to at least one spatial dimension (xR', yR', zR', ωxR', ωyR', ωzR') so as to register the primary sensory data (DV1, DV2, DA) within a particular segment of the environment at the remote site (RS), wherein the second processing unit (141) is further configured to produce content data (Dcontent) based on the primary sensory data (DV1, DV2, DA), which content data (Dcontent) reflect the particular segment of the environment at the remote site (RS), and wherein the first processing unit (131) is further configured to receive the content data (Dcontent), and based thereon, produce image and/or acoustic data (DI/A) adapted to be presented to the user (U) via the at least one presentation device (110, 410), characterized in that the second processing unit (141) is configured to estimate a future content of the primary sensory data (DR'), and produce the content data (Dcontent) based on the estimated future content of the primary sensory data (DR'), wherein the estimate of the future content of the primary sensory data (DR') is such that at least a part of the content data (Dcontent) can be produced before the command data (CMD) have arrived in the second processing unit (141), and wherein the content data (Dcontent) are produced to comprise image data reflecting a particular segment of the environment at the remote site.
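By way of a hedged illustration of the characterising feature, the second processing unit could extrapolate the commanded viewing direction and pre-produce content data for the predicted segment before the next command data arrive. The constant-velocity model and all names below are assumptions for illustration, not the claimed implementation.

import numpy as np

class FutureContentEstimator:
    def __init__(self):
        self.prev = None                 # last commanded pan/tilt
        self.velocity = np.zeros(2)      # change per command interval

    def update(self, pan_tilt):
        pt = np.asarray(pan_tilt, dtype=float)
        if self.prev is not None:
            self.velocity = pt - self.prev
        self.prev = pt

    def predict(self, steps_ahead=1):
        # constant-velocity guess of where the milieu sensor will be pointed,
        # i.e. which segment to register and pre-render
        return self.prev + steps_ahead * self.velocity

est = FutureContentEstimator()
for pan_tilt in [(0.0, 0.0), (2.0, 0.5), (4.1, 1.0)]:   # recent command history
    est.update(pan_tilt)
print(est.predict())    # segment to pre-produce content for, here about [6.2, 1.5]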
2. The system according to claim 1, wherein the first processing unit (131) is configured to estimate the future motion (mU) of the at least one body part (H) of the user (U) and/or the at least one gaze parameter (GD) of the user (U) using a Kalman filtering technique or a particle filtering technique.
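As a hedged illustration of the Kalman-filtering alternative, the sketch below tracks one gaze coordinate with a constant-velocity model and predicts it a few samples ahead; the sampling interval, noise levels and look-ahead horizon are assumptions.

import numpy as np

dt = 1 / 60                                   # assumed sampling interval (s)
F = np.array([[1, dt], [0, 1]])               # state transition (position, velocity)
H = np.array([[1, 0]])                        # only the position is measured
Q = 1e-4 * np.eye(2)                          # process noise (assumption)
R = np.array([[1e-2]])                        # measurement noise (assumption)

x = np.zeros((2, 1))                          # state estimate
P = np.eye(2)                                 # estimate covariance

def kalman_step(z):
    global x, P
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update with the new gaze measurement z
    y = np.array([[z]]) - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x = x_pred + K @ y
    P = (np.eye(2) - K @ H) @ P_pred

for z in [0.10, 0.12, 0.15, 0.19]:            # registered gaze x-coordinates
    kalman_step(z)

n = 5                                          # look-ahead in samples
x_future = np.linalg.matrix_power(F, n) @ x    # predicted future gaze position
print(x_future[0, 0])

A particle filter would replace the linear predict/update pair with a set of sampled state hypotheses weighted by the same kind of measurement likelihood.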
3. The system according to claim 1, wherein the second processing unit (141) is configured to estimate the future content of the primary sensory data (DR') using a Kalman filtering technique.
4. The system according to any one of the preceding claims, wherein the second processing unit (141) comprises a first deep-learning neural network (DL1) trained to estimate the future motion (mU) of the at least one body part (H) of the user (U) by predicting an intent of the user (U) based on the primary sensory data (DV1, DV2, DA), and the second processing unit (141) is configured to: feed a series of samples of the primary sensory data (DV1, DV2, DA) into the first deep-learning neural network (DL1), in which series each sample comprises data reflecting content data (Dcontent) in a particular segment of the environment at the remote site (RS) registered at a particular point in time, and in response thereto receive from the first deep-learning neural network (DL1) an estimate of a position of the at least one body part (H) of the user (U) at a future point in time occurring after a latest sample in the series of samples of the primary sensory data (DV1, DV2, DA).
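One way such a network could be organised, offered only as a sketch, is a small recurrent model that consumes one feature vector per registered sample and outputs a position estimate for a future point in time; the architecture, feature dimension and the use of PyTorch are assumptions, not the claimed network DL1.

import torch
import torch.nn as nn

class IntentPredictor(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)    # estimated (x, y, z) of the body part

    def forward(self, samples):             # samples: (batch, time, feat_dim)
        out, _ = self.lstm(samples)
        return self.head(out[:, -1])        # estimate for a future point in time

model = IntentPredictor()
series = torch.randn(1, 10, 32)             # ten feature vectors, one per registered sample
future_position = model(series)             # shape (1, 3)
print(future_position.shape)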
5. The system according to any one of the preceding claims, comprising: a first channel (161) configured to transmit the command data (CMD) from the local site (LS) to the remote site (RS), and a second channel (162) configured to transmit the content data (Dcontent) from the remote site (RS) to the local site (LS).
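As a rough illustration of such a two-channel arrangement, the sketch below opens one datagram socket per direction, so command data and content data travel separately; the host names, port numbers and the choice of UDP are placeholders and assumptions only.

import socket

COMMAND_CHANNEL = ("remote.example", 9001)   # local site -> remote site
CONTENT_CHANNEL = ("local.example", 9002)    # remote site -> local site

cmd_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
content_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_command_data(payload: bytes):
    # first channel: command data from the local site to the remote site
    cmd_sock.sendto(payload, COMMAND_CHANNEL)

def send_content_data(payload: bytes):
    # second channel: content data from the remote site to the local site
    content_sock.sendto(payload, CONTENT_CHANNEL)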
6. The system according to any one of the preceding claims, wherein at least one of the at least one motoric sensor (123) is configured to register control data (V3) reflecting motions in the form of gestures (G) of a hand (M) of the user (U), and the first processing unit (131) is further configured to produce the command data (CMD) based on the gestures (G) of the hand (M).
7. The system according to claim 6, wherein the remote site (RS) comprises a robotic hand (170) configured to be controlled in response to a subset of the control commands (Ccm), and the second processing unit (141) is further configured to generate the subset of the control commands (Ccm) based on a subset of the command data (CMD) produced by the first processing unit (131), which subset is based on the gestures (G) of the hand (M).
8. The system according to any one of the claims 6 or 7, wherein the first processing unit (131) comprises a second deep-learning neural network (DL2) trained to estimate a future position of the hand (M) based on a spatio-temporal relationship between gaze fixations and the command data (CMD) produced based on the gestures (G) of the hand (M), the gaze parameter (GD) expressing a series of gaze fixations each of which represents a gaze point being located on a landmark at the local site (LS) during at least a threshold period, and the first processing unit (131) is configured to: feed the series of gaze fixations into the second deep-learning neural network (DL2), and in response thereto receive from the second deep-learning neural network (DL2) an estimate of a position of the hand (M) at a future point in time occurring after a latest registered gaze fixation in the series of gaze fixations.
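A corresponding sketch for a fixation-driven hand predictor, again under assumed network sizes and an assumed encoding of each fixation as gaze point plus dwell time, could look as follows; none of these choices are taken from the claimed network DL2.

import torch
import torch.nn as nn

class HandFromGaze(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 3)      # estimated future (x, y, z) of the hand

    def forward(self, fixations):            # fixations: (batch, n_fixations, 3)
        h, _ = self.gru(fixations)
        return self.out(h[:, -1])

dl2_sketch = HandFromGaze()
# three fixations: gaze point (x, y) on a landmark plus dwell time in seconds
fixations = torch.tensor([[[0.31, 0.62, 0.40],
                           [0.35, 0.60, 0.55],
                           [0.52, 0.48, 0.30]]])
print(dl2_sketch(fixations))                 # estimate after the latest registered fixation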
9. The system according to any one of the preceding claims, wherein: at least one of the at least one motoric sensor comprises a movable camera (124) arranged on a head-mounted display (110) configured to be worn on the head (H) of the user (U), which movable camera (124) is configured to register control data in the form of a video signal (V4) representing visual characteristics of an environment at the local site (LS), and the first processing unit (131) is configured to determine motions of the user's (U) head with respect to the at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU) based on the video signal (V4) representing the visual characteristics of the environment at the local site (LS).
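One conceivable way to determine head motions from such a video signal, sketched below under the assumption of a calibrated camera and two consecutive frames, is classical two-frame visual odometry with OpenCV; the feature detector, the placeholder camera matrix and the function name are illustrative choices, not the claimed processing.

import cv2
import numpy as np

def estimate_head_rotation(frame_prev, frame_next, K):
    """Return a rotation matrix describing the camera (and hence head) motion
    between two consecutive frames of the video signal."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(frame_prev, None)
    kp2, des2 = orb.detectAndCompute(frame_next, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R

# Placeholder intrinsics for a 640x480 sensor (assumed values):
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])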
10. The system according to any one of the preceding claims, wherein: each of at least one of the at least one motoric sensor comprises a respective stationary camera (121, 122, 123) at the local site (LS), which respective stationary camera (121, 122, 123) is configured to register control data in the form of a respective video signal (V1, V2, V3) representing the user (U), and the first processing unit (131) is configured to determine motions of the user (U) with respect to the at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU) based on the respective video signals (V1, V2, V3) representing the user (U).
11. The system according to any one of the preceding claims, wherein at least one of the at least one motoric sensor is comprised in a mobile terminal (410), which said at least one motoric sensor is configured to register control data in the form of a set of vectors (mvs) reflecting movements (mT) of the mobile terminal (410), which movements (mT) are indicative of a hand (M) of the user (U) holding the mobile terminal (410).
12. The system according to claim 11, wherein the mobile terminal (410) comprises at least one of a proximity sensor, an accelerometer, a gyroscope and a compass configured to register said control data in the form of the set of vectors (mvs), and the first processing unit (131) is configured to interpret said control data as movements of the hand (M) of the user (U) with respect to the at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU).
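A naive illustration of how such a set of vectors could be derived from accelerometer and gyroscope samples is the dead-reckoning sketch below; the 100 Hz rate, the assumption of gravity-compensated acceleration and the simple integration scheme are illustrative assumptions only.

import numpy as np

DT = 0.01                                   # assumed 100 Hz IMU sampling interval (s)

def integrate_imu(accel_samples, gyro_samples):
    """Return translation (m) and rotation (rad) vectors accumulated over the window."""
    velocity = np.zeros(3)
    translation = np.zeros(3)
    rotation = np.zeros(3)
    for a, w in zip(accel_samples, gyro_samples):
        velocity += np.asarray(a) * DT      # linear acceleration -> velocity
        translation += velocity * DT        # velocity -> displacement (xU, yU, zU)
        rotation += np.asarray(w) * DT      # angular rate -> angles (ωxU, ωyU, ωzU)
    return translation, rotation

accel = [(0.0, 0.1, 0.0)] * 10              # gravity-compensated samples (assumed)
gyro = [(0.0, 0.0, 0.2)] * 10
movement_vectors = integrate_imu(accel, gyro)   # interpreted as hand movements
print(movement_vectors)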
13. The system according to any one of the claims 11 or 12, wherein the mobile terminal (410) further comprises a camera system (411) configured to register the at least one gaze parameter (GD) of the user (U), which at least one gaze parameter (GD) comprises a point of regard of the user (U) on a display (412) of the mobile terminal (410), which display (412) is configured to present the image data (DI/A) produced by the first processing unit (131).
14. The system according to claim 13, wherein the estimate of the future motion (mU) of the at least one gaze parameter (GD) of the user (U) comprises an estimate of a future point of regard of the user (U) on the display (412), which future point of regard is based on at least one of: a trajectory of the point of regard over the display (412) and the image data (DI/A) presented on the display (412).
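The trajectory-based alternative can be illustrated by a simple least-squares extrapolation of recent gaze samples on the display; the linear fit, the normalised display coordinates and the look-ahead time are assumptions, not the claimed estimator.

import numpy as np

def predict_point_of_regard(times, gaze_xy, t_future):
    """Fit x(t) and y(t) linearly and evaluate the fit at a future time t_future."""
    times = np.asarray(times)
    gaze_xy = np.asarray(gaze_xy)
    cx = np.polyfit(times, gaze_xy[:, 0], deg=1)
    cy = np.polyfit(times, gaze_xy[:, 1], deg=1)
    return float(np.polyval(cx, t_future)), float(np.polyval(cy, t_future))

t = [0.00, 0.02, 0.04, 0.06]                        # sample times (s)
g = [(0.40, 0.50), (0.42, 0.51), (0.44, 0.52), (0.46, 0.53)]
print(predict_point_of_regard(t, g, t_future=0.10)) # expected about (0.50, 0.55)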
15. The system according to any one of the preceding claims, wherein the first processing unit (131) is configured to estimate the future motion (mU) of the at least one body part (H) of the user (U) based on a series of registered samples of the at least one gaze parameter (GD) of the user (U).
16. A computer-implemented method of transmitting data and commands between a local site (LS) and a remote site (RS), the method comprising, at the local site (LS): registering, by means of at least one motoric sensor (110, 121, 122, 123, 124, 410), control data (GD, V1, V2, V3, V4, mvs) reflecting a respective motion of at least one body part (H, M) of a user (U) with respect to at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU) and/or at least one gaze parameter (GD) of the user (U), presenting, via at least one presentation device (110, 410), at least one of image and acoustic data to the user (U), receiving, in a first processing unit (131), the control data (GD, V1, V2, V3, V4, mvs), and based thereon, producing command data (CMD), transmitting the command data (CMD) from the local site (LS) to the remote site (RS); at the remote site (RS): receiving the command data (CMD) in a second processing unit (141), and based thereon producing control commands (Ccm) to at least one milieu sensor (151, 152, 153) being configured to register primary sensory data (DV1, DV2, DA) describing visual and/or acoustic characteristics of an environment at the remote site (RS), the at least one milieu sensor (151, 152, 153) being controllable in response to the control commands (Ccm) with respect to at least one spatial dimension (xR', yR', zR', ωxR', ωyR', ωzR'), registering the primary sensory data (DV1, DV2, DA) within a particular segment of the environment at the remote site (RS), which segment is designated by the control commands (Ccm), producing content data (Dcontent) based on the primary sensory data (DV1, DV2, DA), which content data (Dcontent) reflect the particular segment of the environment at the remote site (RS), transmitting the content data (Dcontent) from the remote site (RS) to the local site (LS); receiving the content data (Dcontent), and based thereon producing image and/or acoustic data (DI/A) adapted to be presented to the user (U) via the at least one presentation device (110, 410), characterized by, at the local site (LS): estimating a future content of the primary sensory data (DR'), and producing the content data (Dcontent) based on the estimated future content of the primary sensory data (DR'), wherein the estimate of the future content of the primary sensory data (DR') is such that at least a part of the content data (Dcontent) can be produced before the command data (CMD) have arrived in the second processing unit (141), and wherein the content data (Dcontent) are produced to comprise image data reflecting a particular segment of the environment at the remote site.
17. The method according to claim 16, comprising: estimating the future motion (mU) of the at least one body part (H) of the user (U) and/or the at least one gaze parameter (GD) of the user (U) using a Kalman filtering technique or a particle filtering technique.
18. The method according to claim 16, comprising: estimating the future content of the primary sensory data (DR') using a Kalman filtering technique.
19. The method according to any one of the claims 16 to 18, further comprising: feeding a series of samples of the primary sensory data (DV1, DV2, DA) into a first deep-learning neural network (DL1) that is trained to estimate the future motion (mU) of the at least one body part (H) of the user (U) by predicting an intent of the user (U) based on the primary sensory data (DV1, DV2, DA), each sample in said series comprising data reflecting content data (Dcontent) in a particular segment of the environment at the remote site (RS) registered at a particular point in time, and receiving from the first deep-learning neural network (DL1) an estimate of a position of the at least one body part (H) of the user (U) at a future point in time occurring after a latest sample in the series of samples of the primary sensory data (DV1, DV2, DA).
20. The method according to any one of the claims 16 to 19, comprising: transmitting the command data (CMD) from the local site (LS) to the remote site (RS) via a first channel (161), and transmitting the content data (Dcontent) from the remote site (RS) to the local site (LS) via a second channel (162).
21. The method according to any one of the claims 16 to 20, comprising: registering control data (V3) reflecting motions in the form of gestures (G) of a hand (M) of the user (U) via at least one of the at least one motoric sensor (123), and producing the command data (CMD) based on the gestures (G) of the hand (M).
22. The method according to claim 21, wherein the remote site (RS) comprises a robotic hand (170) configured to be controlled in response to a subset of the control commands (Ccm), and the method further comprises: generating the subset of the control commands (Ccm) based on a subset of the command data (CMD) produced by the first processing unit (131), which subset is based on the gestures (G) of the hand (M).
23. The method according to any one of the claims 21 or 22, further comprising: feeding a series of gaze fixations into a second deep-learning neural network (DL2) that is trained to estimate a future position of the hand (M) based on a spatio-temporal relationship between gaze fixations and the command data (CMD) produced based on the gestures (G) of the hand (M), the series of gaze fixations being expressed by the gaze parameter (GD), wherein each fixation represents a gaze point being located on a landmark at the local site (LS) during at least a threshold period, and receiving from the second deep-learning neural network (DL2) an estimate of a position of the hand (M) at a future point in time occurring after a latest registered gaze fixation in the series of gaze fixations.
24. The method according to any one of the claims 16 to 20, wherein at least one of the at least one motoric sensor comprises a movable camera (124) arranged on a head-mounted display (110) configured to be worn on the head (H) of the user (U), and the method further comprises: registering, via the movable camera (124), control data in the form of a video signal (V4) representing visual characteristics of an environment at the local site (LS), and determining motions of the user's (U) head with respect to the at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU) based on the video signal (V4) representing the visual characteristics of the environment at the local site (LS).
25. The method according to any one of the claims 16 to 24, wherein each of at least one of the at least one motoric sensor comprises a respective stationary camera (121, 122, 123) at the local site (LS), and the method further comprises: registering, via the respective stationary cameras (121, 122, 123), control data in the form of a respective video signal (V1, V2, V3) representing the user (U), and determining motions of the user (U) with respect to the at least one spatial dimension (xU, yU, zU, ωxU, ωyU, ωzU) based on the respective video signals (V1, V2, V3) representing the user (U).
26. The method according to any one of the claims 16 to 25, wherein at least one of the at least one motoric sensor is comprised in a mobile terminal (410), and the method further comprises: registering, via said at least one motoric sensor, control data in the form of a set of vectors (mvs) reflecting movements (mT) of the mobile terminal (410), which movements (mT) are indicative of a hand (M) of the user (U) holding the mobile terminal (410).
27. The method according to claim 26, wherein the mobile terminal (410) further comprises a camera system (411) configured to register the at least one gaze parameter (GD) of the user (U), which at least one gaze parameter (GD) comprises a point of regard of the user (U) on a display (412) of the mobile terminal (410), which display (412) is configured to present the image data (DI/A) produced by the first processing unit (131), and the estimating of the future motion (mU) of the at least one gaze parameter (GD) of the user (U) comprises: generating an estimate of a future point of regard of the user (U) on the display (412), which future point of regard is based on at least one of: a trajectory of the point of regard over the display (412) and the image data (DI/A) presented on the display (412).
28. The method according to any one of the claims 16 to 27, wherein the estimating of the future motion (mU) of the at least one body part (H) of the user (U) is based on a series of registered samples of the at least one gaze parameter (GD) of the user (U).
29. A computer program (517; 617) loadable into a non-volatile data carrier (516; 616) communicatively connected to at least one processor (515; 615), the computer program (517; 617) comprising software for executing the method according to any of the claims 16 to 28 when the computer program (517; 617) is run on the at least one processor (515; 615).
30. A non-volatile data carrier (516; 616) containing the computer program (517; 617) of claim 29.
SE2150590A 2021-05-10 2021-05-10 Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier SE544895C2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SE2150590A SE544895C2 (en) 2021-05-10 2021-05-10 Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier
PCT/SE2022/050400 WO2022240331A1 (en) 2021-05-10 2022-04-25 Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
SE2150590A SE544895C2 (en) 2021-05-10 2021-05-10 Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier

Publications (2)

Publication Number Publication Date
SE2150590A1 SE2150590A1 (en) 2022-11-11
SE544895C2 true SE544895C2 (en) 2022-12-20

Family

ID=84029712

Family Applications (1)

Application Number Title Priority Date Filing Date
SE2150590A SE544895C2 (en) 2021-05-10 2021-05-10 Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier

Country Status (2)

Country Link
SE (1) SE544895C2 (en)
WO (1) WO2022240331A1 (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170045941A1 (en) * 2011-08-12 2017-02-16 Sony Interactive Entertainment Inc. Wireless Head Mounted Display with Differential Rendering and Sound Localization
US20140354515A1 (en) * 2013-05-30 2014-12-04 Oculus Vr, Llc Perception based predictive tracking for head mounted displays
WO2015192117A1 (en) * 2014-06-14 2015-12-17 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
US20160364881A1 (en) * 2015-06-14 2016-12-15 Sony Computer Entertainment Inc. Apparatus and method for hybrid eye tracking
US20170115488A1 (en) * 2015-10-26 2017-04-27 Microsoft Technology Licensing, Llc Remote rendering for virtual images
WO2018083211A1 (en) * 2016-11-04 2018-05-11 Koninklijke Kpn N.V. Streaming virtual reality video
US20180286105A1 (en) * 2017-04-01 2018-10-04 Intel Corporation Motion biased foveated renderer
EP3392827A1 (en) * 2017-04-17 2018-10-24 INTEL Corporation Collaborative multi-user virtual reality
WO2019092698A1 (en) * 2017-11-10 2019-05-16 Infinity Augmented Reality Israel Ltd. Device, system and method for improving motion estimation using a human motion model

Also Published As

Publication number Publication date
SE2150590A1 (en) 2022-11-11
WO2022240331A1 (en) 2022-11-17

Similar Documents

Publication Publication Date Title
US11921913B2 (en) Generating and providing immersive experiences to users isolated from external stimuli
Loomis et al. Immersive virtual environment technology as a basic research tool in psychology
RU2664397C2 (en) Virtual reality display system
CN109799900B (en) Wrist-mountable computing communication and control device and method of execution thereof
US6774885B1 (en) System for dynamic registration, evaluation, and correction of functional human behavior
CN115769174A (en) Avatar customization for optimal gaze recognition
JP2020502603A (en) Field of view (FOV) aperture of virtual reality (VR) content on head-mounted display
IL290002B2 (en) Automatic control of wearable display device based on external conditions
TW202103646A (en) Augmented reality system and method for tele-proctoring a surgical procedure
EP1131734B1 (en) System for dynamic registration, evaluation, and correction of functional human behavior
CN105078580B (en) Surgical robot system and its laparoscopic procedure method and human body temperature type operation image processing apparatus and its method
US20200241299A1 (en) Enhanced reality systems
CN114641251A (en) Surgical virtual reality user interface
US20220407902A1 (en) Method And Apparatus For Real-time Data Communication in Full-Presence Immersive Platforms
EP3797931A1 (en) Remote control system, information processing method, and program
CN107708819A (en) Response formula animation for virtual reality
Mazuryk et al. History, applications, technology and future
Mihelj et al. Introduction to virtual reality
Joshi et al. Inattentional blindness for redirected walking using dynamic foveated rendering
SE544895C2 (en) Data and command transmission system, computer-implemented method of transmitting data and commands, computer program and non-volatile data carrier
Thalmann et al. Virtual reality software and technology
JP2023095862A (en) Program and information processing method
Zhang Human-robot interaction in augmented virtuality: perception, cognition and action in 360◦ video-based robotic telepresence systems
WO2022091832A1 (en) Information processing device, information processing system, information processing method, and information processing terminal
Wu et al. Launching your VR neuroscience laboratory

Legal Events

Date Code Title Description
NUG Patent has lapsed