CN117111724A - Data processing method and system for XR - Google Patents


Info

Publication number
CN117111724A
Authority
CN
China
Prior art keywords
information
participant
content
display
data
Prior art date
Legal status
Pending
Application number
CN202211408599.3A
Other languages
Chinese (zh)
Inventor
印眈峰
吴梅荣
Current Assignee
Ningbo Longtai Medical Technology Co ltd
Original Assignee
Intuitive Vision Co ltd
Priority date
Filing date
Publication date
Application filed by Intuitive Vision Co ltd
Priority to CN202211408599.3A
Publication of CN117111724A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F3/1454 Digital output to display device; Cooperation and interconnection of the display device with other functional units involving copying of the display data of a local workstation or window to a remote workstation or window so that an actual copy of the data is displayed simultaneously on two or more displays, e.g. teledisplay
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality

Abstract

The embodiments of this specification provide a data processing method and system for XR. The method includes the following steps: creating a canvas in a virtual space in response to a request from an annotation requester; displaying content to be annotated on the canvas, wherein the content to be annotated is annotated data and/or unannotated original data; obtaining annotation information created on the canvas by the annotation requester using a ray interaction system, wherein the annotation information includes annotation content and an annotation path; and sharing the content to be annotated and the annotation information to the terminals of other participants for display.

Description

Data processing method and system for XR
Description of the division
This application is a divisional application of the Chinese application filed on September 28, 2022, with application No. 2022111915615, entitled 'An XR-based multi-person collaboration method and system'.
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a data processing method and system for XR.
Background
In current multi-person communication scenarios, participants in different places often cannot attend important conferences at any time because of the time and travel costs involved. Existing multi-user remote communication solutions on the market can only use computers and mobile phones to display flat, two-dimensional pictures for explanation and communication.
It is therefore desirable to provide an XR-based multi-person collaboration method and system that offers a more direct and efficient way for off-site participants to collaborate in depth.
Disclosure of Invention
One of the embodiments of the present specification provides a data processing method for XR, comprising: creating a canvas in a virtual space in response to a request from an annotation requester; displaying content to be annotated on the canvas, wherein the content to be annotated is annotated data and/or unannotated original data; obtaining annotation information created on the canvas by the annotation requester using a ray interaction system, wherein the annotation information includes annotation content and an annotation path; and sharing the content to be annotated and the annotation information to terminals of other participants for display.
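A minimal sketch of how the canvas and annotation information described above might be organized is shown below. It is illustrative only: the names (Annotation, Canvas, share_to_participants) and the Python representation are assumptions, not structures defined in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point3D = Tuple[float, float, float]

@dataclass
class Annotation:
    requester_id: str                # the annotation requester
    content: str                     # annotation content (e.g., text or stroke attributes)
    path: List[Point3D] = field(default_factory=list)  # annotation path drawn via the ray interaction system

@dataclass
class Canvas:
    canvas_id: str
    content_to_annotate: str         # annotated data and/or unannotated original data shown on the canvas
    annotations: List[Annotation] = field(default_factory=list)

def share_to_participants(canvas: Canvas, other_terminals: List[str]) -> None:
    """Share the content to be annotated and its annotation information with the other participants' terminals."""
    for terminal in other_terminals:
        # A real system would serialize the canvas and push it over the network connection.
        print(f"sending canvas {canvas.canvas_id} with {len(canvas.annotations)} annotation(s) to {terminal}")
```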
One of the embodiments of the present specification provides a data processing system for XR, comprising: a generation module configured to create a canvas in a virtual space in response to a request from an annotation requester; and a presentation module configured to: display content to be annotated on the canvas, wherein the content to be annotated is annotated data and/or unannotated original data; obtain annotation information created on the canvas by the annotation requester using a ray interaction system, wherein the annotation information includes annotation content and an annotation path; and share the content to be annotated and the annotation information to terminals of other participants for display.
One of the embodiments of the present specification provides a data processing apparatus for XR, the apparatus comprising: at least one storage medium storing computer instructions; at least one processor executing computer instructions to perform the data processing method for XR described above.
One of the embodiments of the present specification provides a computer-readable storage medium storing computer instructions that, when read by a computer, cause the computer to perform the data processing method for XR described above.
Drawings
The present specification will be further elucidated by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in the drawings, like numerals represent like structures, wherein:
FIG. 1 is a schematic diagram illustrating an application scenario of an XR-based multi-person collaboration system, according to some embodiments of the present specification;
FIG. 2 is an exemplary block diagram of an XR-based multi-person collaboration system, shown in accordance with some embodiments of the present description;
FIG. 3 is an exemplary flow chart of an XR-based multi-person collaboration method shown in accordance with some embodiments of the present description;
FIG. 4 is a flowchart of an exemplary method of determining location information of a participant in a virtual space according to some embodiments of the present description;
FIG. 5 is an exemplary flow chart of an XR-based multi-person online live method, according to some embodiments of the present description;
FIG. 6 is an exemplary flow chart for real-time updating of location information according to some embodiments of the present description;
FIG. 7 is an exemplary flow chart of determining presentation priority of sub-action information according to some embodiments of the present description;
FIG. 8 is an exemplary flow chart of a data processing method for XR shown in accordance with some embodiments of the present disclosure;
FIG. 9 is an exemplary flow chart of a content presentation to be marked shown in accordance with some embodiments of the present description;
FIG. 10 is an exemplary flow chart for determining predicted presentation content according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," and/or "the" do not denote the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
Fig. 1 is a schematic diagram illustrating an application scenario 100 of an XR-based multi-person collaboration system, according to some embodiments of the present specification. XR (Extended Reality) is a general term for various new immersive technologies such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). Through computers, XR can combine the real with the virtual to create a virtual space in which humans and machines can interact.
As shown in fig. 1, an application scenario 100 of an XR-based multi-person collaborative system may include a processing device 110, a network 120, a storage device 130, a terminal 140, and a data acquisition device 150. The components in the application scenario 100 of the XR-based multi-person collaboration system may be connected in one or more different ways. For example, the data acquisition device 150 may be connected to the processing device 110 through the network 120. As another example, as shown in FIG. 1, the data acquisition device 150 may be directly connected to the processing device 110.
In some embodiments, the application scenario 100 of an XR-based multi-person collaboration system may include scenarios in which multiple persons who are not in the same physical space need to collaborate. For example, the application scenario 100 may include academic conferences, remote consultation, teaching and training, surgical instruction, live broadcast, and the like. An XR-based multi-person collaboration system may create a virtual space, and the application scenario 100 may be implemented through that virtual space. For example, in a surgical guidance scenario, the medical staff participating in the surgery can communicate interactively in the virtual space, share the patient's data recorded by medical equipment, and live broadcast to an expert in the virtual space so that the expert can provide remote operation guidance. Furthermore, in the virtual space, 3D models such as a heart model can be shared and disassembled, video and image data related to the operation can be displayed, and the participating staff can share data information by wearing terminal devices such as VR devices and AR devices.
The data acquisition device 150 may be a device that acquires audio and video data related to the participant and the space in which the participant is located. The data acquisition device 150 may include a panoramic camera 151, an ordinary camera 152, motion sensors (not shown), and the like.
The processing device 110 may process data and/or information obtained from the storage device 130, the terminal 140, and/or the data acquisition device 150. The processing device 110 may include a server data center. In some embodiments, the processing device 110 may host a simulated virtual world, or metaverse, for the terminal 140. For example, the processing device 110 may generate the participant's location data based on the images of the participant collected by the data acquisition device 150. As another example, the processing device 110 may generate location information of the participant in the virtual space based on the location data of the participant.
In some embodiments, the processing device 110 may be a computer, a user console, a single server or a group of servers, or the like. The server group may be centralized or distributed. For example, a specified region of the metaverse may be emulated by a single server. In some embodiments, the processing device 110 may include multiple simulation servers dedicated to physical simulation in order to manage interactions and process collisions between characters and objects in the metaverse.
In some embodiments, the processing device 110 may be implemented on a cloud platform. For example, the cloud platform may include private cloud, public cloud, hybrid cloud, community cloud, distributed cloud and inter-cloud, multi-cloud, and the like, or a combination thereof.
In some embodiments, the processing device 110 may include a storage device dedicated to storing data related to objects and characters in the metaverse. The data stored in the storage device may include object shapes, avatar shapes and appearances, audio clips, metaverse-related scripts, and other metaverse-related objects. In some embodiments, the processing device 110 may be implemented by a computing device having a processor, memory, input/output (I/O), communication ports, and the like. In some embodiments, the processing device 110 may be implemented on a processing circuit (e.g., a processor, a CPU) of the terminal 140.
The terminal 140 may be a device that allows a user to participate in a virtual reality experience. In some embodiments, the terminal 140 may include a VR headset, VR glasses, a VR patch, a stereoscopic head-mounted display or the like, a personal computer (PC), a cell phone, or any combination thereof. For example, the terminal 140 may include Google Glass™, Oculus Rift™, Gear VR™, etc. In particular, the terminal 140 may include a display device 141 on which virtual content may be presented and displayed. The user may view virtual content (e.g., content to be annotated, annotation information, etc.) via the display device 141.
In some embodiments, a user may interact with virtual content through display device 141. For example, when a user wears display device 141, the user's head movements and/or gaze directions may be tracked, thereby presenting virtual content in response to changes in user position and/or direction, providing an immersive and convincing virtual reality experience reflecting changes in the user's perspective.
In some embodiments, the terminal 140 can further include an input component 142. The input component 142 may enable user interaction with virtual content displayed on the display device 141; the virtual content may include data information uploaded by the participant. For example, the input component 142 may include a touch sensor, a microphone, etc. configured to receive user inputs that may be provided to the terminal 140 and control the virtual world by changing the visual content presented on the display device. In some embodiments, the user input received by the input component may include, for example, touch, voice input, and/or gesture input, and may be perceived by any suitable sensing technique (e.g., capacitive, resistive, acoustic, optical). In some embodiments, the input component 142 may include a handle, a glove, a stylus, a game console, or the like.
In some embodiments, the display device 141 (or the processing device 110) may track the input component 142 and present virtual elements based on the tracking of the input component 142. The virtual element may include a representation of the input component 142 (e.g., an image of a user's hand, fingers). The virtual element may be presented in a 3D position in the virtual reality experience that corresponds to the real position of the input component 142.
For example, one or more sensors may be used to track the input component 142. The display device 141 may receive signals collected from the input component 142 by one or more sensors over a wired or wireless network. The signals may include any suitable information capable of tracking the input component 142, such as the output of one or more inertial measurement units (e.g., accelerometers, gyroscopes, magnetometers) in the input component 142, a Global Positioning System (GPS) sensor in the input component 142, or the like, or a combination thereof.
The signals may indicate the position (e.g., in the form of three-dimensional coordinates) and/or orientation (e.g., in the form of three-dimensional rotational coordinates) of the input component 142. In some embodiments, the sensors may include one or more optical sensors for tracking the input component 142. For example, the sensor may use visible light and/or a depth camera to locate the input component 142.
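The sketch below illustrates, under assumed names (Pose, track_input_component, virtual_element_pose), how the tracked position and orientation signals described above could be combined into a pose and used to place the corresponding virtual element; the disclosure does not prescribe a particular data format.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Pose:
    position: Tuple[float, float, float]      # three-dimensional coordinates of the input component
    orientation: Tuple[float, float, float]   # three-dimensional rotational coordinates (e.g., roll, pitch, yaw)

def track_input_component(sensor_position, sensor_orientation) -> Pose:
    """Fuse the position/orientation signals reported by the input component's sensors into a single pose."""
    return Pose(tuple(sensor_position), tuple(sensor_orientation))

def virtual_element_pose(tracked: Pose) -> Pose:
    """Place the virtual element (e.g., the image of the user's hand) at the 3D position in the
    virtual reality experience that corresponds to the real position of the input component."""
    # Identity mapping in this sketch; a real system would apply calibration and coordinate transforms.
    return tracked
```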
In some embodiments, input component 142 may include a haptic component that may provide haptic feedback to a user. For example, the haptic component may include a plurality of force sensors, motors, and/or actuators. The force sensor may measure the magnitude and direction of the force applied by the user and input these measurements to the processing device 110.
The processing device 110 may convert the input measurements into movements of one or more virtual elements (e.g., virtual fingers, a virtual palm, etc.) that may be displayed on the display device 141. The processing device 110 may then calculate one or more interactions between the one or more virtual elements and at least a portion of the participants and output these interactions as computer signals (i.e., signals representative of the feedback force). The motors or actuators in the haptic component may apply feedback forces to the user based on the computer signals received from the processing device 110, so that the participant experiences a realistic haptic sensation of the subject during surgical instruction. In some embodiments, the magnitude of the feedback force may be preset by a user or operator through a terminal device (e.g., the terminal 140), or determined according to a default setting of the XR-based multi-person collaboration system.
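As an illustration of the feedback-force signal mentioned above, the following sketch uses a simple spring model: the deeper a virtual fingertip penetrates a virtual object, the larger the restoring force, capped at a preset maximum. The model, function name, and numeric values are assumptions for illustration; the disclosure does not specify how the feedback force is computed.

```python
import numpy as np

def feedback_force(penetration_depth: float, contact_normal: np.ndarray,
                   stiffness: float = 200.0, max_magnitude: float = 5.0) -> np.ndarray:
    """Return a feedback-force vector for the haptic component.

    penetration_depth: how far the virtual element is inside the virtual object (meters).
    contact_normal: direction pushing the virtual element back out of the object.
    max_magnitude: the preset limit on the force magnitude (newtons).
    """
    magnitude = min(stiffness * max(penetration_depth, 0.0), max_magnitude)
    unit_normal = contact_normal / np.linalg.norm(contact_normal)
    return magnitude * unit_normal

# Example: the virtual fingertip is 2 mm inside a virtual organ surface.
force = feedback_force(0.002, np.array([0.0, 0.0, 1.0]))
print(force)   # [0.  0.  0.4]
```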
In some embodiments, an audio device (not shown) configured to provide audio signals to a user may be further included in the application scenario 100 of the XR-based multi-person collaboration system. For example, an audio device (e.g., a speaker) may play sound emitted by a participant. In some embodiments, the audio device may include an electromagnetic speaker (e.g., moving coil speaker, moving iron speaker, etc.), a piezoelectric speaker, an electrostatic speaker (e.g., condenser speaker), or the like, or any combination thereof. In some embodiments, the audio device may be integrated into the terminal 140. In some embodiments, the terminal 140 may include two audio devices located on left and right sides of the terminal 140, respectively, to provide audio signals to the left and right ears of the user.
The storage device 130 may be used to store data and/or instructions. For example, the storage device 130 may be used to store relevant information and/or data collected by the data acquisition device 150. The storage device 130 may obtain data and/or instructions from, for example, the processing device 110. In some embodiments, the storage device 130 may store data and/or instructions that the processing device 110 executes or uses to accomplish the exemplary methods described in this specification. In some embodiments, the storage device 130 may be integrated on the processing device 110.
The network 120 may provide a conduit for the exchange of information and/or data. In some embodiments, information may be exchanged between the processing device 110, the storage device 130, the terminal 140, and the data acquisition device 150 via the network 120. For example, the terminal 140 may acquire data information and the like from the processing device 110 through the network 120.
It should be noted that the above description of the application scenario 100 of an XR-based multi-person collaboration system is for illustrative purposes only and is not intended to limit the scope of the present disclosure. For example, the components and/or functions of the application scenario 100 of the XR-based multi-person collaboration system may vary or change depending on the particular implementation scenario. In some embodiments, the application scenario 100 of the XR-based multi-person collaboration system may include one or more additional components (e.g., storage devices, networks, etc.), and/or one or more of the components described above may be omitted. Additionally, two or more components of the application scenario 100 of the XR-based multi-person collaboration system may be integrated into one component, and one component may be implemented on two or more sub-components.
Fig. 2 is an exemplary block diagram of an XR-based multi-person collaboration system 200, shown in accordance with some embodiments of the present description. In some embodiments, XR-based multi-person collaboration system 200 may include a connection module 210, a location module 220, a download module 230, a presentation module 240, and a generation module 250.
The connection module 210 may be used to establish a communication connection with terminals of at least two participants.
The positioning module 220 may be configured to determine location information of at least two participants in the virtual space through a preset 3D coordinate location algorithm.
In some embodiments, the position information of the at least two participants in the virtual space is related to the position data of the at least two participants in the real space, and the position data of the at least two participants in the real space is acquired through terminals of the at least two participants.
In some embodiments, the positioning module 220 may be further configured to create a virtual space; creating a virtual character corresponding to each of at least two participants in a virtual space, wherein the virtual character has initial position information in the virtual space; acquiring position data of a participant in an actual space, and associating the position data with position information of a corresponding virtual character in the virtual space; acquiring movement data of a participant in an actual space based on the position data of the participant; and updating the initial position information based on the movement data through a preset 3D coordinate position algorithm, and determining the updated position information.
In some embodiments, the positioning module 220 may be configured to create a virtual space and to create a virtual character in the virtual space corresponding to each of the at least two participants. In some embodiments, the positioning module 220 is further configured to determine, based on the obtained position data of the participant in the real space, the position information of the avatar corresponding to the participant in the virtual space through a preset 3D coordinate position algorithm. In some embodiments, the positioning module 220 may be configured to display the avatar in the virtual space based on the avatar's position information.
In some embodiments, the positioning module 220 may be further configured to scan the actual space in which the participant is located, and spatially locate the participant; for the scanned participants, determining real-time position data of the participants in the actual space; determining first movement information of the participant in the real space based on the real-time position data; determining initial position information of a virtual character in a virtual space; acquiring first action information of a participant in an actual space; the first motion information includes sub-motion information for each part of the participant's body; and synchronously updating second movement information and/or second action information of the virtual character based on the first movement information and/or the first action information through a preset 3D coordinate position algorithm.
In some embodiments, the positioning module 220 may be further configured to determine at least one core body part of the participant based on the current scene; determining a presentation priority of sub-action information of each part of the body of the participant based on the at least one core body part; determining display parameters of the action information based on the display priority of the sub-action information, wherein the display parameters comprise display frequency and display precision; and synchronizing second action information of the virtual character corresponding to the participant based on the presentation parameters.
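A minimal sketch of the scene-dependent presentation priority described above is given below. The scene names, body parts, frequencies, and precision levels are illustrative assumptions; the disclosure only states that core body parts are determined from the current scene and that display frequency and precision follow from the presentation priority.

```python
# Hypothetical mapping from the current scene to core body parts, and from
# presentation priority to display parameters (display frequency and precision).
CORE_PARTS_BY_SCENE = {
    "surgical_guidance": ["hands", "head"],
    "teaching_training": ["head", "arms"],
}

DISPLAY_PARAMS = {
    "high": {"frequency_hz": 60, "precision": "fine"},
    "low":  {"frequency_hz": 10, "precision": "coarse"},
}

def display_parameters(scene: str, body_part: str) -> dict:
    """Core body parts of the current scene get high presentation priority; other parts get low priority."""
    priority = "high" if body_part in CORE_PARTS_BY_SCENE.get(scene, []) else "low"
    return DISPLAY_PARAMS[priority]

# Example: during surgical guidance the hand sub-actions are synchronized at high frequency and precision.
print(display_parameters("surgical_guidance", "hands"))   # {'frequency_hz': 60, 'precision': 'fine'}
```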
The download module 230 may be configured to save data information uploaded by at least two participants and provide data download services to the at least two participants, where the data download services include at least one of creating a data download channel and providing download resources.
In some embodiments, the download module 230 may be used to obtain shared data uploaded by the participants.
The presentation module 240 may be used to synchronously present data information on the terminals of at least two participants.
In some embodiments, the terminal comprises at least one of VR visualization device, AR visualization device, mobile terminal handset, PC computer terminal.
In some embodiments, the presentation module 240 may be used to present shared data within a virtual space.
In some embodiments, presentation module 240 may be configured to create at least one second space and/or second window in the virtual space, wherein each of the at least one second space and/or second window corresponds to one participant; and displaying the shared data of the corresponding participants through the second space and/or the second window.
In some embodiments, the presentation module 240 may be configured to: present content to be annotated on a canvas, wherein the content to be annotated is annotated data and/or unannotated original data; obtain annotation information created on the canvas by the annotation requester using a ray interaction system, wherein the annotation information includes annotation content and an annotation path; and share the content to be annotated and the annotation information to the terminals of other participants for display.
In some embodiments, the content to be annotated is content presented at any location in any one of a plurality of windows on the terminal of the annotation requester.
In some embodiments, the presentation module 240 may be further configured to obtain presentation settings of the annotation requester, the presentation settings including real-time presentation during annotation and presentation after annotation is completed; and share the content to be annotated and its annotation information to the terminals of other participants for display based on the presentation settings.
In some embodiments, the presentation module 240 may be further configured to determine perspective information for each participant based on the position information of each of the other participants; and determine and display the presentation content for each participant based on that participant's perspective information, wherein the presentation content includes the content to be annotated and/or the annotation information under the perspective information.
The generation module 250 may be used to create a canvas within the virtual space in response to a request by an annotation requester.
It should be noted that the above description of the system and its modules is for convenience of description only and is not intended to limit the application to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the principles of the system, various modules may be combined arbitrarily or a subsystem may be constructed in connection with other modules without departing from such principles. For example, the connection module 210, the location module 220, the download module 230, and the presentation module 240 may combine to form an XR-based multi-person online live broadcast system. For another example, presentation module 240 and generation module 250 may combine to form a data processing system for XR. Such variations are within the scope of the application.
Fig. 3 is an exemplary flow chart of an XR-based multi-person collaboration method shown in accordance with some embodiments of the present description. As shown in fig. 3, the process 300 may include the following steps.
In step 310, a communication connection is established with the terminals of at least two participants. In some embodiments, step 310 may be performed by connection module 210.
Participants may refer to the persons taking part in a collaboration. Different collaboration scenarios may involve different participants. For example, in an operating room VR scenario, the participants may include a surgical operator (e.g., a doctor performing the surgery), a remote expert, and an operator. The operator can manage user permissions and archive the guidance process for later traceability.
Multi-person collaboration can be used for academic conferences, remote consultation, medical training, surgery live broadcast, medical instrument training, and the like. During multi-person collaboration, multi-person live broadcast, real-time annotation sharing, and the like can be realized. Details of multi-person live broadcast may be found elsewhere in this specification, e.g., FIGS. 5, 6, and 7. Details of real-time annotation sharing may be found elsewhere in this specification, e.g., FIGS. 8, 9, and 10.
In some embodiments, a server data center may be established, with the participants being connected to the server data center through the terminals of the participants to effect a communication connection.
The server data center can be a platform that carries instant communication for multi-person collaboration. The server data center can comprise at least one server whose performance meets the requirements of multi-person collaborative operation. It can be used for multi-user, multi-terminal access, can guarantee the stability and real-time performance of access from multiple terminals, and can guarantee the security and integrity of the server data.
In some embodiments, the communication connection may be used for audio instant communication, video instant communication, multi-person virtual space technology communication, and the like. The audio instant communication may include, among other things, recording, transmitting, and receiving of audio information. Video instant communication may include recording, decoding, transmitting, and receiving video information. Multi-person collaboration may be achieved through multi-person virtual space technology communication. For example, a live surgical specialist may be invited to remotely assist. For another example, the invited participant may view and communicate with other participants and assist the live view of the participant who sent the invitation. For another example, the participant may also use the tagging function for local tagging and may present tagged content to other participants in real-time.
The participant's terminal may refer to a device used by the participant to participate in the collaboration. In some embodiments, the participant's terminal may include a device for enabling the participant to connect with a server data center and a data collection device. The data collection device is a device for collecting data of audio, video, etc. of the actual space where the participant is located, for example, a panoramic camera, a general camera, AR glasses, a mobile phone, a motion sensor, a depth camera, etc. The participant terminal may also include a display device that may display data acquired from the server data center.
In some embodiments, the terminal comprises at least one of VR visualization device, AR visualization device, mobile terminal handset, PC computer terminal. For example, devices that enable participants to connect with a server data center may include AR glasses, VR helmets, PCs, cell phones, and the like.
Step 320, determining position information of at least two participants in the virtual space through a preset 3D coordinate position algorithm. In some embodiments, step 320 may be performed by the positioning module 220.
Virtual space may refer to the space in which virtual objects are presented. The virtual space may be created based on information of the real space or based on preset virtual information; details regarding creation of virtual spaces may be found in the description of other parts of this specification, e.g., fig. 4.
In some embodiments, the virtual space may correspond to different scenarios, for example, may include academic conferences, teaching training, case interrogation, operating room VR scenarios, surgical procedure detail scenarios, pathological data sharing scenarios, surgical navigation information scenarios, patient vital sign data scenarios, and the like. In the virtual space, location information and data information of the participants may be presented. Details regarding the location information and the data information may be found in the description of the other parts of the present specification, for example, step 340 of fig. 3.
By way of example only, in a surgical procedure detail scenario, a surgeon may wear a terminal device and connect to the server data center, and may project the surgical picture of the real space seen by the surgeon into the virtual space for live broadcast to remote specialists and students. The remote specialists and students may view and study the surgical details on site in real time by connecting to the server data center, and the surgeon may also communicate with the remote specialists and obtain remote guidance from them.
As another example, in a teaching and training scenario, a teacher and students can join the virtual space through their participant terminals. The teacher can conduct a live training broadcast in the virtual space; data such as three-dimensional models, images, and text can be imported and shared in the virtual space; the teacher and students can walk around and interact in the virtual space; and the shared data can be edited and annotated.
In some embodiments, a spatial coordinate system may be provided in the virtual space, which may be used to represent the spatial positional relationship of the virtual object in the virtual space. Multiple participants can communicate and interact in the same virtual space through the participant terminals.
In some embodiments, the virtual object may include a spatial background, a virtual character, a virtual window, a canvas, data information, and the like. In some embodiments, the virtual space may include a spatial background, which may be a real-time image or other preset image of the real space. In some embodiments, a avatar corresponding to each participant may be included in the virtual space. Details regarding the avatar may be found in the introduction to the rest of the description, for example, in fig. 4.
In some embodiments, the virtual space may include a plurality of second windows and/or a plurality of second spaces. Details regarding the second window and/or the second space may be found in the description of the other parts of the present specification, for example, in fig. 5. In some embodiments, the virtual space may include a canvas. For details on the canvas, see, for example, FIG. 8 for an introduction to the rest of the specification.
Location information may refer to information related to the participant's location and/or action in virtual space. The location information may include initial location information and real-time location information of the participant in the virtual space. The initial position information may refer to initial positions of the respective participants in the virtual space. For details on the initial position information, see the introduction of the content of the rest of the present description, for example, fig. 4.
In some embodiments, the location information may include motion information of a virtual character corresponding to the participant in the virtual space. The motion information may refer to physical motion information generated by the participant in real space. For details of the action information, reference may be made to the description of the other parts of the present specification. For example, fig. 6.
In some embodiments, the motion information may further include head motion information of the virtual character in the virtual space corresponding to an actual motion of the participant, from which perspective information of the participant in the virtual space may be determined. Details regarding the viewing angle information can be found in the description of the other parts of the present specification, for example, fig. 9.
In some embodiments, the position information of the at least two participants in the virtual space is related to the position data of the at least two participants in the real space, and the position data of the at least two participants in the real space is acquired through terminals of the at least two participants.
The actual space may refer to the space in which the participant is actually located. For example, the actual space may refer to an office, study, outdoor location, etc. where the participants are located.
The location data may refer to data related to the location and/or actions of the participant in real space. In some embodiments, the location data may include the location and/or actions of the participant in real space. Wherein the position may be represented by coordinates in real space. For example, the coordinates may be represented by coordinates composed of longitude and latitude or by coordinate information based on other preset coordinate systems. In some embodiments, the location data may include a coordinate location of the participant, a movement speed, an acceleration, an action of the body part, a direction of the participant terminal (i.e., an orientation of the participant), and so forth. The location data may include real-time location data.
The position data of the participant may be determined by a positioning device and/or a data acquisition device (e.g., a camera, a sensor) in the real space in which the participant is located, and by receiving the data sent by the positioning device, the data acquisition device, and the like. For example, the location of the user may be determined based on the received positioning-device data. As another example, the participant's actions may be determined by cameras and sensors. The position data may be acquired by connecting to the positioning device and the data acquisition device in the actual space in which the participant is located.
Exemplary positioning devices may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a beidou navigation system, a galileo positioning system, a Quasi Zenith Satellite System (QZSS), a base station positioning system, a Wi-Fi positioning system.
In some embodiments, the location information of the participant in the virtual space may be determined based on the location data of the participant in the real space. For example, a database may be preset in the server data center, in which the position data of the participants may be associated with the position information of the avatar. The database may be established based on correspondence between location data and location information in the history data. The correspondence of the position data and the position information may be determined by a 3D coordinate position algorithm.
In some embodiments, the position information may be updated based on the position data of the participant in the real space, so as to achieve synchronization of the position information of the participant in the real space and the virtual space. Details regarding updating the location information may be found in other parts of the present description, for example in fig. 4.
In some embodiments, the 3D coordinate position algorithm may transform the position data (e.g., 3D coordinates) in the real space into position information in the virtual space through a projective transformation matrix. For example, coordinates in the real space may be converted into coordinates in the virtual-space coordinate system.
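A minimal sketch of such a coordinate conversion is shown below, using a 4x4 homogeneous transformation as one common way to realize a projective transformation matrix; the concrete matrix and the helper name to_virtual_coordinates are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def to_virtual_coordinates(real_point: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Map a 3D point in the real-space coordinate system into the virtual-space
    coordinate system using a 4x4 projective (homogeneous) transformation matrix."""
    p = np.append(real_point, 1.0)   # homogeneous coordinates
    q = transform @ p
    return q[:3] / q[3]

# Illustrative transform: rotate 90 degrees about the Z axis and shift the origin.
T = np.array([[0.0, -1.0, 0.0, 2.0],
              [1.0,  0.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 0.5],
              [0.0,  0.0, 0.0, 1.0]])
print(to_virtual_coordinates(np.array([1.0, 2.0, 0.0]), T))   # [0.  1.  0.5]
```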
In some embodiments, determining location information of at least two participants in the virtual space by a preset 3D coordinate location algorithm includes: creating a virtual space; creating a virtual character corresponding to each of at least two participants in a virtual space, wherein the virtual character has initial position information in the virtual space; acquiring the position data of the participants in the actual space, and associating the position data with the position information of the corresponding virtual characters in the virtual space; acquiring movement data of a participant in an actual space based on the position data of the participant; and updating the initial position information based on the movement data through a preset 3D coordinate position algorithm, and determining the updated position information. For details of determining the location information, reference may be made to the content of the rest of the description, e.g. fig. 4.
In some embodiments, determining, by a preset 3D coordinate position algorithm, position information of the virtual character corresponding to the participant in the virtual space based on the acquired position data of the participant in the real space includes: scanning the actual space of the participants, and performing space positioning on the participants; for the scanned participants, determining real-time position data of the participants in the actual space; determining first movement information of the participant in the real space based on the real-time position data; determining initial position information of a virtual character in a virtual space; acquiring first action information of a participant in an actual space; the first motion information includes sub-motion information of each part of the body of the participant; and synchronously updating second movement information and/or second action information of the virtual character corresponding to the participant based on the first movement information and/or the first action information through a preset 3D coordinate position algorithm. Further, the position information is determined based on the second movement information and/or the second action information. For details of determining the location information, reference may be made to the content of the rest of the description, e.g. fig. 6.
Step 330: save the data information uploaded by the at least two participants and provide a data download service for the at least two participants, wherein the data download service includes at least one of creating a data download channel and providing download resources. In some embodiments, step 330 may be performed by the download module 230.
The data information may refer to information shared in a virtual space. For example, the data information may include 3D models, videos, documents, operation manuals, and the like. In some embodiments, the data information may include content to be marked and marking information. Details concerning the content to be marked and the marking information can be found in the description of other contents of the present specification, for example, in fig. 8.
In some embodiments, the data information may be data uploaded by the participant, or may be data retrieved from another platform (e.g., a network cloud platform). The data information may be stored in a storage device of the server data center. In response to the data request of the participant, the server data center can be connected with other platforms and call corresponding data information, can call the data information uploaded to the server data center by the participant, and can call the data information stored in the storage device of the server data center.
The data download service may refer to a service for downloading data information through a communication module (e.g., an LTE communication module) connected to a corresponding communication network (e.g., a 4G network). The participant may obtain the data information through the data download service.
In some embodiments, the download channel may be created at the server data center and the participant terminals. The download channels may be plural, one for each participant. The participants can acquire the required information data through the data downloading channel.
In some embodiments, the data information may be stored in a storage device of the server data center in a classification (e.g., by data type), each type of data information corresponding to one data download channel, and corresponding data information may be obtained from a different data download channel in response to the type of data request by the participant.
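The sketch below illustrates one possible arrangement of the type-specific download channels described above: data information is stored by type, and a participant's request is served from the channel matching the requested type. The storage layout, item names, and function name are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical store: data information classified by type, one download channel per type.
DATA_STORE = defaultdict(list)
DATA_STORE["3d_model"].append("heart_model.glb")
DATA_STORE["video"].append("surgery_closeup.mp4")
DATA_STORE["document"].append("operation_manual.pdf")

def handle_download_request(participant_id: str, data_type: str) -> list:
    """Serve a participant's data request from the download channel matching the requested type."""
    channel = DATA_STORE.get(data_type, [])
    print(f"participant {participant_id} downloads {len(channel)} item(s) from the '{data_type}' channel")
    return list(channel)

handle_download_request("participant_1", "3d_model")   # downloads 1 item from the '3d_model' channel
```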
Step 340, synchronously displaying the data information on the terminals of the at least two participants. In some embodiments, step 340 may be performed by presentation module 240.
In some embodiments, the data information may be displayed in a virtual space, and the participant terminal may be connected to a server data center to obtain the data information, and the data information may be synchronously displayed through a display device of the participant terminal.
In some embodiments, different presentation modes may be determined according to different participant terminals. For example, the PC and the mobile phone may display data information through a screen of the PC and the mobile phone, and the AR glasses and/or the VR helmet may display data information through a screen projected inside the AR glasses and/or the VR helmet.
By establishing the 3D virtual space and synchronously sharing data information in it, local and remote participants are kept in sync, which solves problems of offline multi-person conferences such as limits on the number of attendees and on the venue. Through the virtual space, participants can interact with object information more intuitively, in a manner close to face-to-face communication; more platforms are supported, so different devices can join a discussion anytime and anywhere, which saves a great deal of time and allows a collaboration team to be formed quickly and efficiently. Meanwhile, the virtual space can produce records, which makes it convenient for other staff to study and consult them later, to summarize experience, and even to investigate and collect evidence.
Fig. 4 is a flowchart illustrating an exemplary method of determining location information of a participant in a virtual space according to some embodiments of the present description. In some embodiments, the process 400 may be performed by the positioning module 220.
Step 410, creating a virtual space; creating a virtual character corresponding to each of the at least two participants in a virtual space, wherein the virtual character has initial position information in the virtual space.
In some embodiments, a coordinate system may be established in any real space, model data of a real space model may be created based on the real space coordinate system and the real space scan data, and an actual space coordinate system corresponding to the real space model may be established.
In some embodiments, the virtual space may be created based on a design, e.g., the virtual space may be a virtual operating room of the design, or the like.
The avatar may refer to the virtual character corresponding to the participant in the virtual space. A corresponding avatar may be assigned to the participant according to a default setting when the participant connects to the server data center, or a plurality of previously created candidate avatars may be provided to the participant, who selects one of them as his or her avatar. The position information of the participant can be synchronously displayed through the corresponding avatar. For example, if participant 1 selects avatar 1 and then moves to the left in the real space, the corresponding avatar 1 also moves to the left in the virtual space.
In some embodiments, the initial position information of the avatar may be determined according to a preset rule. For example, each avatar is preset with an initial position: when a participant selects an avatar, the corresponding initial position is determined. Alternatively, the participant may select the initial position of his or her avatar in the virtual space.
Step 420, obtain the position data of the participants in the real space, and associate the position data with the position information of the corresponding virtual characters in the virtual space.
In some embodiments, the participant's position data in real space may be acquired through connection with a positioning device, a data acquisition device.
In some embodiments, a storage device for each avatar may be provided in the server data center. The position data of the participant may be stored in a storage device corresponding to the participant after being acquired, and the server data center may convert the position data into position information of the avatar through a preset 3D coordinate position algorithm.
Step 430, obtaining movement data of the participant in the real space based on the position data of the participant.
Movement data may refer to data related to the movement of the participant in real space. The movement data may include the direction and distance the participant moved, etc.
In some embodiments, the participant's movement data may be determined based on the direction of movement and the coordinate points before and after the movement. The distance moved may be determined based on the coordinates of the participant before and after the movement and the distance formula. For example, if the position data of the participant includes a leftward movement, a coordinate (1, 2) before the movement, and a coordinate (1, 3) after the movement, the movement distance calculated from the coordinates before and after the movement is 1 meter, so the movement data is 'moved 1 meter to the left'.
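The worked example above can be reproduced with a short sketch; the function name movement_distance is an assumption, and the direction is taken from the reported position data rather than computed.

```python
import math

def movement_distance(before, after):
    """Distance moved, computed from the coordinates before and after the move (the distance formula)."""
    return math.dist(before, after)

# The example above: the participant moves from (1, 2) to (1, 3) and the reported direction is "left".
direction = "left"
distance = movement_distance((1, 2), (1, 3))
print(direction, distance)   # left 1.0
```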
Step 440, updating the initial position information based on the movement data by presetting a 3D coordinate position algorithm, and determining the updated position information.
In some embodiments, the virtual space needs to acquire the spatial position information of each participant. A participant is at the initial position when entering the virtual space. After the user undergoes a relative displacement, the movement data is uploaded to the server data center through the participant terminal. The server data center converts the movement data into movement information in the virtual space through the projective transformation matrix of the 3D coordinate position algorithm, updates the position of the virtual character, and then synchronizes the movement information to the other participant terminals, so that the other participants can see the real-time movement of the participant's avatar in the virtual space.
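A server-side sketch of this upload-convert-update-synchronize loop is shown below. The function and callback names are hypothetical, and the conversion reuses a 4x4 projective transformation matrix as in the earlier sketch; the disclosure does not fix these details.

```python
import numpy as np

avatar_positions = {}   # participant_id -> position information of the avatar in the virtual space

def on_movement_uploaded(participant_id, real_position, transform, notify_terminals):
    """Handle movement data uploaded by a participant terminal.

    transform: the projective transformation matrix of the 3D coordinate position algorithm.
    notify_terminals: callback that pushes the update to the other participants' terminals.
    """
    # 1. Convert the uploaded real-space position into virtual-space position information.
    p = np.append(np.asarray(real_position, dtype=float), 1.0)
    q = transform @ p
    virtual_position = (q[:3] / q[3]).tolist()
    # 2. Update the avatar's position in the virtual space.
    avatar_positions[participant_id] = virtual_position
    # 3. Synchronize the movement information to the other participant terminals.
    notify_terminals({"participant": participant_id, "position": virtual_position})
```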
Based on the position data of the participants in the actual space and the position information of their avatars in the virtual space, a more realistic, all-around, multi-layered rendering and display effect can be provided for the participants, creating a sense of presence close to face-to-face communication and making the communication more effective.
Fig. 5 is an exemplary flow chart of an XR-based multi-person online live broadcast method, according to some embodiments of the present description. As shown in fig. 5, the process 500 may include the following steps.
Step 510, establishing a communication connection with terminals of at least two participants. In some embodiments, step 510 may be performed by connection module 210.
For definition and explanation of participants and terminals, and methods of establishing communication connections, reference may be made to fig. 3 and its associated description.
Step 520, creating a virtual space, creating a virtual character corresponding to each of the at least two participants in the virtual space. In some embodiments, step 520 may be performed by the positioning module 220.
With respect to the definition and description of virtual spaces and virtual characters, and the method of creating virtual spaces and virtual characters, reference may be made to FIG. 4 and its associated description.
In step 530, the position information of the virtual character corresponding to the participant in the virtual space is determined based on the obtained position data of the participant in the real space by the preset 3D coordinate position algorithm. In some embodiments, step 530 may be performed by the positioning module 220.
For definition and explanation of the 3D coordinate location algorithm, location data and location information, reference may be made to fig. 3 and its associated description. With respect to the method of determining and updating location information in real time, reference may be made to fig. 4 and its associated description.
Step 540, displaying the avatar in the virtual space based on the avatar's position information. In some embodiments, step 540 may be performed by the positioning module 220.
In some embodiments, the corresponding created avatar may be displayed at coordinates corresponding to the location information according to the location information of the avatar. When the position information is changed, the display of the avatar is changed in real time with the change of the position information. For a detailed description of the avatar, reference may be made to fig. 4 and its associated description.
Step 550, the shared data uploaded by the participants is obtained, and the shared data is displayed in the virtual space. In some embodiments, step 550 may be performed by the download module 230 and the presentation module 240.
Shared data refers to data uploaded to a virtual space by a participant. Shared data has various manifestations such as video, audio, images, models, etc., and the shared data in different application scenarios may be different.
For example, in exposing an operating room VR scene, the shared data may include panoramic (e.g., spatial design of the operating room, positional orientation, instrument placement, etc.) data for the operating room. For another example, in presenting surgical details, the shared data may include close-up (e.g., physician's hand manipulation, instrument manipulation, patient surgical site, etc.) data of the surgical procedure. For another example, in the case of pathological data sharing, the shared data may include a three-dimensional image model of the patient, pathological pictures, videos, and the like. For another example, when presenting surgical navigational information, the shared data may include a view of the surgical robot (e.g., a surgical planning view), and so forth. For another example, when patient vital sign data is presented, the shared data may include vital sign monitoring data (e.g., vital sign, blood pressure, heart rate, electrocardiogram, blood oxygen saturation, etc.) during a patient procedure. For another example, when a far-end expert video frame is presented, the shared data may include video data, audio data, etc. of an expert photographed by a camera. For another example, in an interactive scenario, the shared data may include model manipulation, spatial annotation, group chat message boards, private chat dialogs, and so on.
In some embodiments, step 550 further comprises creating at least one second space and/or second window in the virtual space, wherein each of the at least one second space and/or second window corresponds to one participant; and displaying the shared data of the corresponding participants through the second space and/or the second window.
The second space and/or the second window refers to a space and/or window created in the virtual space for displaying the shared data. In some embodiments, the second space and/or the second window may be a window visible only to the corresponding participants, e.g., a private chat window between two participants. In some embodiments, the second space and/or the second window may be preset by the system, or the participant may drag it to move its location.
In some embodiments, the participant may create the second space and/or the second window as desired, e.g., the participant may choose to create the second space and/or the second window at a creation interface of the terminal. In some embodiments, the second space and/or the second window may also be created by default by the system.
In some embodiments, the different second spaces and/or second windows may correspond to different participants and exhibit different shared data. In some embodiments, the second space and/or the second window may be in a one-to-one correspondence with the participants through dynamic allocation, e.g., when the participants enter the virtual space, the system automatically creates a corresponding second space and/or second window for the participants. For another example, the second space and/or the second window created by the participant by himself corresponds to himself.
In some embodiments, the participant may upload the data to be shared received or stored by the terminal to a server of the system, and other participants may download the shared data from the server as needed. For details on the method of sharing data, see fig. 3 and its associated description.
The XR-based multi-person online live broadcast method enables participants to interact immersively in the virtual space from a first-person perspective, which increases learning interest and skill proficiency and overcomes the limits of physical space and audience size that prevent optimal guidance. It also displays the data shared by participants intuitively in the virtual space and facilitates information synchronization among participants, thereby improving the efficiency and effect of discussion, guidance, and the like.
Fig. 6 is an exemplary flow chart for real-time updating of location information according to some embodiments of the present description. In some embodiments, the process 600 may be performed by the positioning module 220.
In some embodiments, the participant terminal may scan the actual space in which the participant is located, spatially locating the participant; for the participant who completes the scan, the participant terminal may determine real-time location data 610 of the participant in real space.
In some embodiments, the participant terminal may scan the actual space in which the participant is located based on a variety of means, for example, the participant may hold the terminal and scan the actual space surroundings with the depth camera of the terminal.
For another example, the participant terminal can acquire an anchor point of the actual space and perform multi-point spatial scanning of a specific plane of the actual space; if the anchor point of the actual space is successfully positioned, the space is successfully located, and if the scanning fails, the participant is reminded that the space scan is incomplete.
In some embodiments, the participant terminal may determine real-time location data 610 of the participant in real space based on a variety of ways. For example, the participant terminal may draw a spatial profile after the actual spatial scan is completed, and determine real-time location data 610 of the participant based on the relative location information of the participant and the spatial reference. For another example, the participant terminal may also obtain real-time location data 610 of the participant directly from a positioning method such as GPS.
In some embodiments, the participant terminal may determine first movement information 620 of the participant in real space based on the real-time location data 610.
The first movement information 620 refers to information generated by the movement of the participant in the real space. The first movement information may include movement information of the participant in a position, distance, altitude, etc. of the real space.
In some embodiments, the first movement information 620 may be obtained in various ways. For example, it may be determined based on the real-time position data of the participant in real space: when the real-time position data of the participant changes, the participant terminal may calculate the corresponding first movement information from the changed data.
As another example, it may be determined by anchor point positioning of the participants in real space, i.e. it may be determined by movement information of the anchor points. For more on acquiring the first movement information, see fig. 4 and its related description.
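As a hedged illustration of the position-based approach above, the sketch below derives first movement information (displacement, distance moved, and height change) from two consecutive real-time position samples; the field names and the axis convention are assumptions.

```python
import math

def first_movement_info(prev_pos, curr_pos):
    """Derive movement information from two consecutive real-time position samples.

    prev_pos / curr_pos are (x, y, z) tuples in real space; the height change is
    taken from the z axis here, which is an assumed axis convention.
    """
    dx, dy, dz = (c - p for c, p in zip(curr_pos, prev_pos))
    return {
        "displacement": (dx, dy, dz),                         # change in position
        "distance": math.sqrt(dx * dx + dy * dy + dz * dz),   # distance moved
        "height_change": dz,                                  # altitude change
    }

print(first_movement_info((1.0, 2.0, 0.0), (1.0, 4.0, 0.0)))
```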
In some embodiments, the server may determine initial location information 660 of the virtual character in the virtual space.
For example, after scanning the actual space in which the participant is located, the participant terminal may determine position data of the current participant in the actual space, and the server may acquire the position data and map it into initial position information of the virtual character in the virtual space. For another example, the initial position information may be preset by the server. For more explanation of determining initial position information of a virtual character in a virtual space, reference may be made to fig. 4 and its related description.
In some embodiments, the participant terminal may obtain first action information 630 of the participant in the real space; the first action information 630 includes sub-action information for various parts of the participant's body.
The first action information 630 refers to body action information generated by the participant in real space. The first action information 630 may include limb action information (e.g., stretching the arms, shaking the body, walking, squatting), facial expression information (e.g., blinking, opening the mouth), etc.
The sub-action information refers to the specific action information of each part of the participant's body, for example, the leg action information and arm action information in a running action. In some embodiments, the participant terminal may divide the participant's actions into sub-actions of a plurality of body parts, thereby obtaining sub-action information for each body part.
In some embodiments, when a participant performs an action, at least one body part may participate in or constitute the action. For example, when the participant performs a running action, the feet, legs, arms, etc. may each produce a corresponding action, with the feet and legs serving as the core parts of the running action. The core parts of a participant may differ across scenes and actions. For more description of different scenarios and core body parts, see fig. 6 and its associated description.
In some embodiments, the first action information and the sub-action information may be obtained in a variety of ways. For example, it may be acquired by a camera, a wearable device, a sensor, or the like. Specifically, the camera can capture real-time image information of the participant, and changes of all parts of the body of the participant can be obtained through real-time image processing so as to obtain first action information and sub-action information; the wearable device can be fixedly connected with elements such as a displacement sensor, an angle sensor and the like at the corresponding joint movement part, and is used for acquiring the change information of each part of the body of the participant and converting the change information into first action information and sub-action information.
In some embodiments, the participant terminal may synchronously update the second movement information 670 and/or the second action information 680 of the virtual character corresponding to the participant based on the first movement information 620 and/or the first action information 630 through a preset 3D coordinate position algorithm.
In some embodiments, synchronously updating the second movement information and/or the second action information of the avatar corresponding to the participant based on the first movement information and/or the first action information may include updating the second movement information based on the first movement information, updating the second action information based on the first action information, or updating both the second movement information and the second action information based on the first movement information and the first action information, and the like.
In some embodiments, the participant terminal may synchronously update the second movement information and/or the second action information of the virtual character corresponding to the participant based on the first movement information and/or the first action information through various methods. For example, the participant terminal may acquire the first movement information and/or the first action information of the participant by scanning the real space in real time and transmit the first movement information and/or the first action information to the server, and then coordinate-convert the first movement information and/or the first action information of the participant and the data of the virtual character by a preset 3D coordinate position algorithm, thereby obtaining the second movement information and/or the second action information of the corresponding virtual character.
For example, the participant terminal acquires first movement information indicating that the participant moves forward two meters in the real space, and first action information indicating walking with a 70 cm stride while the hanging arms swing about 15°; coordinate conversion can then be performed by the preset 3D coordinate position algorithm to combine these data with the virtual character, thereby obtaining second movement information of the virtual character moving forward two meters in the virtual space and second action information of walking with a 70 cm stride while the hanging arms swing about 15°.
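A minimal sketch of this synchronization step, assuming the avatar is stored as a simple dictionary and the coordinate conversion reduces to a uniform scale; it is illustrative only and not the patent's actual 3D coordinate position algorithm.

```python
import numpy as np

def sync_avatar(avatar, first_movement=None, first_action=None, scale=1.0):
    """Synchronously update the avatar's second movement and/or second action info.

    The coordinate conversion is a simple uniform scale (an assumption); the
    per-part action information is mirrored onto the avatar unchanged.
    """
    if first_movement is not None:
        delta = scale * np.asarray(first_movement["displacement"], dtype=float)
        avatar["position"] = (np.asarray(avatar["position"]) + delta).tolist()
    if first_action is not None:
        # e.g. {"arms": "15-degree swing", "legs": "70 cm stride"}
        avatar.setdefault("actions", {}).update(first_action)
    return avatar

avatar = {"participant_id": "P1", "position": [0.0, 0.0, 0.0]}
sync_avatar(avatar,
            first_movement={"displacement": (0.0, 2.0, 0.0)},
            first_action={"arms": "15-degree swing", "legs": "70 cm stride"})
print(avatar)
```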
In some embodiments, the server may determine at least one core body part of the participant based on the current scene; determining a presentation priority 640 of sub-action information for each part of the participant's body based on the at least one core body part; determining display parameters 650 of the action information based on the display priority of the sub-action information, wherein the display parameters 650 comprise display frequency and display precision; the second action information 680 of the avatar corresponding to the participant is synchronized based on the presentation parameters.
The current scene refers to a scene within the current virtual space, such as an academic conference, remote consultation, etc., and for more description of the scene, reference may be made to fig. 1 and its related content.
The core body part refers to the body part that is most important to the participant's current actions. For example, in a surgical instruction scenario, the core part when a doctor performs a surgical operation may be the hand; for another example, during a medical consultation, the core part of the person being consulted may be the face.
In some embodiments, the core body part may be determined in a plurality of ways, for example, a comparison table of core body parts corresponding to different phases of different scenes may be preset, and the core body part in the current scene is determined based on the preset comparison table. For another example, the core body part may also be determined according to the duration of the movement of the part, and for a part with a longer duration of movement, it may be considered to take on the current main actions of the participant, being the core body part.
Presentation priority 640 refers to the presentation priority of sub-action information for each part of the participant's body. Presentation priority 640 may be represented by a ranking or level, e.g., a presentation priority ranking may be reflected in a value of 1-10, with a smaller value being the earlier the ranking, indicating that the corresponding sub-action information is to be presented before. For another example, a value of 1-10 may reflect the level of presentation priority, with a larger value indicating a higher level and corresponding sub-action information to be presented before.
In some embodiments, the server may preset the display priority of each part in different scenes, and the sub-action information may be displayed based on the preset display priority comparison table in actual application. For example, in a live surgery scene, the display priority of the hand sub-action information is highest, the arm is next, and the leg is lowest, so that the doctor's actions can be displayed based on the preset priority comparison table during the actual surgical live broadcast. In some embodiments, the presentation priority may also be determined based on the scene information and the action information of various parts of the body. For details of determining presentation priority, reference may be made to fig. 7 and its description.
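The preset comparison-table approach can be pictured as a simple lookup; the scenes, parts, and priority values below are illustrative assumptions.

```python
# Preset comparison table: scene -> body part -> presentation priority
# (1 = highest priority). The scenes and values are illustrative assumptions.
PRIORITY_TABLE = {
    "surgery_live": {"hand": 1, "arm": 2, "leg": 3},
    "consultation": {"face": 1, "hand": 2},
}

def presentation_priority(scene, body_part, default=10):
    """Look up the presentation priority of a body part's sub-action in a scene."""
    return PRIORITY_TABLE.get(scene, {}).get(body_part, default)

print(presentation_priority("surgery_live", "hand"))   # 1 -> shown first
print(presentation_priority("surgery_live", "torso"))  # 10 -> lowest priority
```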
The presentation parameters 650 refer to parameters related to sub-action presentation, and for example, the presentation parameters may include presentation frequency and presentation accuracy of actions, and the like.
The presentation frequency refers to the update frequency of the sub-actions. For example, the range of display frequencies may be set to low display frequencies of 30-60 hertz, medium display frequencies of 60-90 hertz, and high display frequencies of 90-120 hertz. In some embodiments, the presentation frequency may be a fixed numerical choice or may be freely variable within the presentation frequency range. In some embodiments, the presentation frequency may be preset by the server, and may also be determined based on the presentation priority of the sub-action information, e.g., the greater the presentation priority, the higher the presentation frequency.
The display accuracy refers to the display accuracy of the sub-action, and the display accuracy can be represented by pixels. For example, pixels of 1280×720 may be used as the fluent display precision, pixels of 1920×1080 may be used as the standard display precision, and pixels of 2560×1440 and above may be used as the high-definition display precision. In some embodiments, the presentation accuracy may be preset by the server, and may also be determined based on the presentation priority of the sub-action information, e.g., the greater the presentation priority, the higher the presentation accuracy.
In some embodiments, the presentation parameters may be determined based on the presentation priority of the sub-action information; sub-actions with higher priority may be presented with higher frequency and accuracy. For example, during surgery, changes in the doctor's hand actions are displayed with higher frequency and accuracy, while actions of other parts, such as shaking the body, can be displayed at a lower frequency; for another example, during a consultation, the facial expression of the person being consulted may be presented with higher frequency and accuracy.
In some embodiments, the server may preset the parameter tables corresponding to different display priorities, for example, preset the display frequency corresponding to the first display priority to be 120hz, the display precision to be 2560×1440 pixels, the display frequency corresponding to the second display priority to be 90hz, and the display precision to be 1920×1080 pixels. In some embodiments, the presentation parameters may also be set by the participants themselves.
In some embodiments, after determining the display parameters of each part of the participant based on the display priorities of the sub-action information, the server may obtain the display parameter data of each part, and further update the second action information of the virtual character synchronously according to the display parameters of different parts. For example, the second motion information of the corresponding part is collected and updated according to the display frequency of different parts (such as the hand display frequency of 120hz and the leg display frequency of 60hz for the same participant); for another example, the second motion information is displayed according to the display precision of different parts (for example, for the above participant, the hand display precision is 2560×1440 pixels, and the leg display precision is 1280×720 pixels).
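A sketch of how a preset parameter table keyed by presentation priority might drive per-part display frequency and precision; the table values follow the illustrative numbers mentioned above, and the function names are assumptions.

```python
# Preset parameter table keyed by presentation priority (1 = highest).
# The frequencies (Hz) and resolutions use the illustrative values mentioned above.
PARAMETER_TABLE = {
    1: {"frequency_hz": 120, "resolution": (2560, 1440)},
    2: {"frequency_hz": 90,  "resolution": (1920, 1080)},
    3: {"frequency_hz": 60,  "resolution": (1280, 720)},
}

def display_parameters(priority):
    """Return the display frequency and precision for a given presentation priority."""
    # Priorities beyond the table fall back to the lowest preset entry.
    return PARAMETER_TABLE.get(priority, PARAMETER_TABLE[max(PARAMETER_TABLE)])

# Per-part update: the hand is refreshed more often and at higher precision than the leg.
for part, prio in {"hand": 1, "leg": 3}.items():
    print(part, display_parameters(prio))
```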
In some embodiments, the location information may be updated based on the second movement information by a preset 3D coordinate location algorithm, and a detailed description may be made with reference to fig. 4 and its related content.
By determining the display parameters according to the display priority and synchronizing the second action information accordingly, high-frequency, high-precision display is adopted for the more important action changes and lower-frequency, lower-precision display for the less important ones, so that server resources can be effectively saved while the action display effect is ensured.
According to the position updating method shown in fig. 6, the action information and movement information of participants in the actual space can be accurately mapped into the virtual space in real time, so that remote participants can watch and understand the operation details of the operators in the actual space in real time. This provides real-time guiding information, avoids interference caused by unnecessary actions, and improves the immersive experience of the participants.
Fig. 7 is an exemplary diagram of determining presentation priority of sub-action information according to some embodiments of the present description. In some embodiments, the flow 700 may be performed by the positioning module 220.
In some embodiments, the presentation priority of sub-action information may be determined based on a processing model.
In some embodiments, a processing model may be used to determine the presentation priority of sub-action information. The processing model may be a machine learning model; for example, it may include a convolutional neural network (Convolutional Neural Networks, CNN) model and a deep neural network (Deep Neural Networks, DNN) model.
In step 710, motion trajectories and motion feature vectors for each part of the body may be determined from the convolutional neural network model based on the motion images.
The action image may refer to an image of a sub-action of each part of the participant's body. Action images may be acquired by data acquisition devices in the actual space of the participants. For example, the action image may be an action video or picture of participant A taken by a panoramic camera.
In some embodiments, a convolutional neural network model may be used to determine at least one motion trajectory and motion feature vector corresponding to the motion image for at least one motion image process.
The motion profile may refer to a motion profile of a body part of the participant. The motion trajectories may be represented by a sequence or matrix of position coordinates of the respective body part at successive points in time, wherein each sequence or matrix element may represent the position coordinates of the central position of a part on the body at the corresponding moment in time. For example, the motion trajectory sequence may be ((1, 0), (1, 1), (1, 2)), where (1, 0), (1, 1), (1, 2) are the position coordinates of the right hand of the participant a at three consecutive time points, respectively.
The motion feature vector may refer to a feature vector of the motion of various parts of the body. The elements of the motion feature vector may include a part name, the degree of importance of the part's action in each scene, how frequently the part acts, and the like. A plurality of parts, and hence part names, may be obtained from the motion image. The actions of each part can be preset with different degrees of importance for different scenes. The frequency of a part's action can be represented by the number of actions within a preset time period. For example, in training courses, both the finger movements and facial movements of the teacher may be set to a higher level of importance. By way of example only, the motion feature vector may be (1, 40, 3), where 1 may represent the hand, 40 may represent the degree of importance of the hand to the current scene, and 3 may represent that the hand has moved 3 times.
In some embodiments, a deep neural network model may be used to process motion trajectories, motion feature vectors, and scene information to determine presentation priorities for sub-motion information.
The scene information may be represented based on various forms, for example, may be represented by vectors. The elements in the scene information may correspond to one type of scene according to a preset relationship between a preset scene and numbers and/or letters. For example, 1 in the scene vector (1) may represent a training scene.
In step 720, the display priority of the sub-motion information can be determined through the deep neural network model based on the scene information, the motion trajectories of the respective parts of the body, and the motion feature vectors of the respective parts of the body.
In some embodiments, the deep neural network model may be used to process at least one motion trajectory, motion feature vector, and scene information corresponding to the motion image to determine a presentation priority of the sub-motion information. Details regarding presentation priority of sub-action information may be found in the description of other parts of the present specification, for example, fig. 6.
In some embodiments, the processing model may be obtained by jointly training the convolutional neural network model and the deep neural network model. For example, a training sample, namely a historical action image, is input into the initial convolutional neural network model to obtain at least one historical motion trajectory and historical motion feature vector corresponding to the historical action image; the output of the initial convolutional neural network model, together with the historical scene information corresponding to the historical action image, is then used as the input of the initial deep neural network model. During training, a loss function is established based on the labels of the training samples and the output of the initial deep neural network model, and the parameters of the initial convolutional neural network model and the initial deep neural network model are iteratively updated simultaneously based on the loss function until a preset condition is met and training is completed. The parameters of the convolutional neural network model and the deep neural network model in the trained processing model are thereby determined.
In some embodiments, the training samples may be obtained based on historical motion images acquired by the data acquisition device and historical scene information corresponding thereto. The label of the training sample may be a historical presentation priority of the corresponding sub-action information. The labels may be manually marked.
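The joint training described above might be sketched as follows in PyTorch; the network shapes, the number of body parts and priority levels, and the dummy data are all assumptions chosen only to make the example self-contained.

```python
import torch
from torch import nn

# Illustrative shapes only: action images as 3x64x64 tensors, 8 body parts,
# trajectories of 16 (x, y, z) points, 3-element motion feature vectors,
# a 4-dimensional scene vector, and 3 priority levels. All are assumptions.
NUM_PARTS, TRAJ_LEN, FEAT_DIM, SCENE_DIM, NUM_LEVELS = 8, 16, 3, 4, 3

class TrajectoryCNN(nn.Module):
    """Predicts per-part motion trajectories and motion feature vectors from an action image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.traj_head = nn.Linear(32, NUM_PARTS * TRAJ_LEN * 3)
        self.feat_head = nn.Linear(32, NUM_PARTS * FEAT_DIM)

    def forward(self, image):
        h = self.backbone(image)
        return self.traj_head(h), self.feat_head(h)

class PriorityDNN(nn.Module):
    """Maps trajectories, feature vectors and scene information to per-part priority logits."""
    def __init__(self):
        super().__init__()
        in_dim = NUM_PARTS * (TRAJ_LEN * 3 + FEAT_DIM) + SCENE_DIM
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_PARTS * NUM_LEVELS))

    def forward(self, traj, feat, scene):
        logits = self.net(torch.cat([traj, feat, scene], dim=-1))
        return logits.view(-1, NUM_PARTS, NUM_LEVELS)

cnn, dnn = TrajectoryCNN(), PriorityDNN()
optimizer = torch.optim.Adam(list(cnn.parameters()) + list(dnn.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One joint training step on dummy data standing in for historical samples and labels.
image = torch.randn(2, 3, 64, 64)
scene = torch.randn(2, SCENE_DIM)
labels = torch.randint(0, NUM_LEVELS, (2, NUM_PARTS))   # manually marked priorities

traj, feat = cnn(image)
logits = dnn(traj, feat, scene)
loss = loss_fn(logits.reshape(-1, NUM_LEVELS), labels.reshape(-1))
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```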
The display priority of the sub-action information is determined through the machine learning model, so that the speed of determining the display priority can be improved, and the accuracy of the display priority can be improved.
In some embodiments, the presentation priority of sub-action information may also be determined by means of a vector database. Specifically, a scene motion vector can be constructed based on the scene information and the sub-action information of each part of the participant's body; a reference vector is then retrieved from the vector database based on the scene motion vector, and the display priority of the sub-action information corresponding to the reference vector is used as the current priority.
The sub-action information may be represented by a sub-action information vector. The elements in the sub-action information vector may represent body part names and corresponding actions. Different actions may be represented based on different numbers or letters. For example, the sub motion information vector is (1, 2), where 1 denotes a hand and 2 denotes a hand motion as a fist. In some embodiments, the scene information, sub-action information may be combined to determine a scene action vector. The scene motion vector may be a multidimensional vector. For example, a in the scene motion vectors (a, b) may represent a consultation scene, and b may represent a sub-motion information vector.
In some implementations, the scene action vector may be obtained through an embedded layer. The embedded layer may be a machine learning model, for example, the embedded layer may be a recurrent neural network model (Recurrent Neural Network, RNN), or the like. The input of the embedded layer may be scene information, sub-action information of each part of the body of the participant, and the output may be a scene action vector.
The vector database may refer to a database containing historical scene motion vectors. In some embodiments, the preset database includes the historical scene motion vector and the display priority of the sub-motion information corresponding to the historical scene motion vector.
The reference vector may refer to a historical scene motion vector having a similarity to the scene motion vector exceeding a preset threshold. For example, if the preset threshold is 80% and the similarity between the historical scene motion vector 1 and the scene motion vector in the vector database is 90%, the historical scene motion vector 1 is the reference vector. In some embodiments, the reference vector may be a historical scene motion vector that has the highest similarity to the scene motion vector.
The similarity between the scene motion vector and a historical scene motion vector may be determined based on the vector distance between them. Vector distances may include the Manhattan distance, Euclidean distance, Chebyshev distance, cosine distance, Mahalanobis distance, and the like; the similarity can be computed by substituting the vector values into the formula for the chosen distance type.
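A toy sketch of the vector-database lookup, using cosine similarity and a 0.8 threshold as assumed values; a production system would use a real vector store rather than an in-memory list.

```python
import numpy as np

# Toy "vector database": historical scene motion vectors with their recorded priorities.
# The vectors and the 0.8 similarity threshold are illustrative assumptions.
HISTORY = [
    {"vector": np.array([1.0, 0.0, 2.0]), "priority": {"hand": 1, "arm": 2}},
    {"vector": np.array([0.0, 1.0, 1.0]), "priority": {"face": 1}},
]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def lookup_priority(scene_motion_vector, threshold=0.8):
    """Return the priorities of the most similar historical vector above the threshold."""
    best = max(HISTORY, key=lambda h: cosine_similarity(scene_motion_vector, h["vector"]))
    if cosine_similarity(scene_motion_vector, best["vector"]) >= threshold:
        return best["priority"]
    return None  # no sufficiently similar reference vector found

print(lookup_priority(np.array([1.0, 0.1, 2.1])))
```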
In some embodiments, the embedded layer may be obtained by joint training with the deep neural network model. Training samples are input into the initial embedded layer to obtain scene motion vectors; the output of the initial embedded layer is then used as the input of the initial deep neural network model. During training, a loss function is established based on the labels and the output of the initial deep neural network model, and the parameters of the initial embedded layer and the initial deep neural network model are iteratively updated simultaneously based on the loss function until a preset condition is met and training is completed, at which point the parameters of the embedded layer and the deep neural network model are determined.
In some embodiments, the training samples may be historical scene information, historical sub-action information for various parts of the participant's body. The label of the training sample may be a historical presentation priority of the corresponding sub-action information. The labels may be manually marked.
Obtaining the parameters of the embedded layer through this joint training solves the problem that labels are difficult to obtain when the embedded layer is trained alone, and allows the embedded layer to better produce scene motion vectors that reflect the scene information and sub-action information.
Based on the historical data, a vector database is preset, and then the display priority of the sub-action information is determined, so that the determined display priority can be more in line with the actual situation.
Fig. 8 is a flow diagram of an exemplary flow 800 of a data processing method for XR, according to some embodiments of the present description. The process 800 may be performed by the presentation module 240 and the generation module 250.
At step 810, a canvas is created within the virtual space in response to the request of the annotation requester.
The annotation requester refers to the participant who makes the annotation request.
In some embodiments, the annotation requester may send an annotation request at the terminal and be received by the server.
The canvas refers to a canvas in the virtual scene that displays the content to be marked. The canvas may take a variety of forms, such as a three-dimensional canvas.
In some embodiments, upon receiving a request from an annotation requester, the server may create a default canvas in the virtual space. In some embodiments, the annotation requester may change the size and shape of the canvas by manual manipulation or preset options, and in some embodiments, the annotation requester may drag the canvas to move as desired.
Step 820, displaying the content to be marked on the canvas, wherein the content to be marked is marked data and/or unmarked original data.
In some embodiments, the content to be marked may originate from a variety of data. For example, the content to be marked may be data prepared in advance by the participant. For another example, the content to be marked may further include shared data corresponding to the scene, for example, a three-dimensional image model of the patient, a pathological picture, a video, and the like when the pathological data is shared. The shared data in different scenes is different, the content to be marked is also different, and for more description about the content to be marked in different scenes, reference can be made to the related content of fig. 5.
The marked data refers to data with a marking history. In some embodiments, marking may continue on already-marked data. In some embodiments, when data is marked a second time, whether to display the historical markings may be selected.
The unmarked original data refers to data without a marking history, for example, original data downloaded from a server by a participant, real-time data in a live broadcast, and the like.
In some embodiments, the content to be marked is content presented in any window, any position, of a plurality of windows on the terminal of the marking requester.
In some embodiments, different manners of presentation may be employed for different content to be marked. For example, pictures may be presented statically and video may be presented dynamically using a video source. In some embodiments, different display modes may be adopted for the incompatible terminal, for example, the VR device may perform 3D display through different images of two eyes and a proper pupil distance, the mobile terminal device may perform display through a screen, and the computer terminal may perform display through a display.
In step 830, markup information created on the canvas by the markup requester using the ray interactive system is obtained, wherein the markup information includes markup content and markup paths.
The marking information is information generated by marking the content to be marked. In some embodiments, the tag information may also include a tag time, tag location information corresponding to the tag time, and the like. For example, the marking time is nine am, and the marking is at the position of the virtual space coordinates (20, 30, 40) at this time.
The marking content refers to specific content for marking the content to be marked, and the marking content can comprise content drawn by a painting brush, inserted pictures, operations for resizing and the like.
The mark path refers to the stroke path of the mark. For example, if the annotation requester marks a "herringbone" on the canvas, the annotation path is the skimming and the right-falling of the "herringbone".
A ray interaction system refers to a system used for marking. In some embodiments, through the ray interaction system, the participant can point a ray at the content to be marked presented on the canvas and mark it through gesture operations such as touching, pressing, and the like.
In some embodiments, the terminal may automatically save the annotation information of the annotation requester locally in real time. In some embodiments, the annotation requester may actively choose to save the annotation information by touching the canvas, clicking a button, or the like.
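One possible (assumed) data structure for a piece of marking information and its local saving, shown only to make the fields discussed above concrete; the field and class names are illustrative, not part of the embodiment.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

@dataclass
class Annotation:
    """One piece of marking information created via the ray interaction system.

    The field names are assumptions for illustration; `path` stores the stroke as
    the sequence of canvas points hit by the ray while the gesture is held.
    """
    requester_id: str
    content: str                                    # e.g. brush stroke, inserted picture
    path: List[Tuple[float, float, float]] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # virtual-space coordinates

    def save_locally(self, store: list) -> None:
        """Append the annotation to a local store (stands in for the terminal's storage)."""
        store.append(self)

local_store: list = []
Annotation("P1", "brush", path=[(0, 0, 0), (0.1, 0.0, 0.0)],
           position=(20, 30, 40)).save_locally(local_store)
print(len(local_store))
```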
In step 840, the content to be marked and the marking information are shared to the terminals of other participants for display.
In some embodiments, the terminal can collect the marking information of the corresponding participant and upload it to the server; the server then sends the marking information to the other participants' terminals, where a display window is created, so that the content to be marked and the marking information are shared with the terminals of other participants for display.
In some embodiments, the content to be marked and its marking information can be shared to the terminals of other participants for display based on the display settings; displaying based on the display settings allows the display to be personalized to meet the participants' needs. For details, refer to fig. 9 and its related content.
The data processing method for XR shown in fig. 8 enables real-time annotation of shared data. Participants can also insert pictures, change brush colors, resize, undo, and clear content on the canvas, and save the results locally for later reference, summarization, and comparison. In addition, the annotation information can be shared with other participants, which facilitates the discussion of complex problems among participants.
Fig. 9 is a schematic diagram of an exemplary flow 900 of content presentation to be marked, shown in accordance with some embodiments of the present description. The process 900 may be performed by the presentation module 240.
In step 910, display settings of the annotation requester are obtained, the display settings including real-time markup display and post-markup display.
The presentation setting refers to a setting related to presenting the content to be marked and marking information. The display settings may also include information of the 3D position of the display window, the size, color, accuracy, etc. of the display screen.
Real-time mark presentation refers to the process of real-time synchronizing the marking process of a mark requester to other participants, i.e. the creation process of mark information can be seen by other participants. Post-marking presentation refers to sharing the final result after marking to other participants, i.e., the other participants can obtain the result after marking, but cannot see the creation process of the marking information.
In some embodiments, the presentation settings may be determined by the terminal's defaults, e.g., the terminal may default to presenting after marking is complete. In some embodiments, the presentation settings may also be determined by the participant's selection among the options of the terminal's presentation settings window, e.g., the participant may click or touch to select the real-time marking presentation option. In some embodiments, the terminal may record the participant's presentation settings and transmit the settings data to the server.
Step 920, sharing the content to be marked and the marking information thereof to terminals of other participants for display based on the display settings.
In some embodiments, the server may share the content to be marked and its marking information to the terminals of other participants for display according to the display settings. For example, if the server obtains that the participant selected real-time marking display, it may also obtain the content to be marked and its marking information, and then transmit the real-time data of both to the terminals of other participants for real-time display according to that setting.
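A hedged sketch of how the two display settings could drive sharing: under real-time marking display every intermediate state is pushed, while under post-marking display only the final result is sent. The function and parameter names are assumptions.

```python
def share_annotation(display_setting, annotation_stream, send):
    """Share content to be marked and its marking information according to the display setting.

    `display_setting` is either "real_time" or "after_completion"; `annotation_stream`
    yields intermediate annotation states and `send` pushes data to the other
    participants' terminals. All names here are illustrative assumptions.
    """
    last = None
    for state in annotation_stream:
        last = state
        if display_setting == "real_time":
            send(state)           # other participants watch the marking being created
    if display_setting == "after_completion" and last is not None:
        send(last)                # only the finished result is shared

share_annotation("after_completion",
                 annotation_stream=[{"stroke": 1}, {"stroke": 2, "done": True}],
                 send=print)
```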
In some embodiments, the situation shown may be different for different scenarios. For example, in the case of surgical guidance, in order to avoid affecting the surgical procedure, the contents to be marked and the marking information thereof need to be displayed at positions avoiding the surgical site of the patient, the display screen of the device, and the like. For another example, in the training and teaching process, to ensure that each participant can clearly see the content to be marked and the marking information thereof, a display window can be created in front of each participant; in the academic lecture scene, only one large display window can be created for display.
In some embodiments, sharing the content to be marked and the marking information thereof to terminals of other participants for display includes: determining perspective information of each participant based on the location information of each of the other participants; and determining the display content of each participant based on the view angle information of each participant, and displaying, wherein the display content comprises content to be marked and/or marking information under the view angle information.
The viewing angle information refers to the viewing angle information of the participant relative to the content to be marked and the marking information thereof. The viewing angle information may include azimuth, angle, altitude, distance, etc. The participants are located at different positions and the corresponding viewing angle information is different. For example, for the same display model, perspective information of a participant located right in front of the display model mainly includes partial right view information and partial front view information of the display model; the viewing angle information of the participant positioned at the upper left of the display model mainly comprises part of left view information and part of top view information of the display model.
In some embodiments, the server may determine the viewing angle information by acquiring the participant's position information from the terminal, comparing it with the display position information of the content to be marked and its marking information, and determining the relative position between the participant and the display position. For example, three-dimensional space coordinates (x, y, z) can be constructed in the virtual space. When a three-dimensional image such as a model is displayed, suppose a participant stands facing the y direction at position coordinates (1, 1, 1) and the position coordinates (e.g., center coordinates) of the display model are (1, 2, 2); the relative position between the display model and the participant can then be obtained from the calculation between the two coordinates. At this viewing angle the participant can see part of the front view of the model and part of its bottom, and the server can determine the specific viewing angle information through an algorithm. For another example, if another participant's position coordinates are (1, 0, 1), the distance to the display model in that participant's viewing angle information is greater than the distance from the participant at (1, 1, 1) to the display model, i.e., the model scale seen by this participant is smaller than that seen by the participant at (1, 1, 1).
In some embodiments, the display content that can be seen by each participant at the viewing angle can be calculated based on the viewing angle information of each participant, and the content is displayed. For example, the right side of the displayed 3D model of the participant is obtained from the perspective information of the participant, i.e., it is determined that the participant can see the right view of the model and the right view is displayed to the participant. In some embodiments, the content that a participant may see is also related to distance, e.g., for a participant that is farther away, the proportion of the corresponding presentation content may be less than the proportion of the participant that is closer.
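A minimal sketch of deriving viewing angle information from the relative position of a participant and a display window, reusing the coordinates of the example above; the returned fields and axis conventions are assumptions.

```python
import math

def view_angle_info(participant_pos, display_pos):
    """Compute simple viewing-angle information of a participant relative to a display window.

    Returns distance, azimuth (in the x-y plane) and elevation; the axis
    conventions and the returned fields are assumptions for illustration.
    """
    dx, dy, dz = (d - p for d, p in zip(display_pos, participant_pos))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(dx, dy))       # horizontal direction to the display
    elevation = math.degrees(math.asin(dz / distance)) if distance else 0.0
    return {"distance": distance, "azimuth_deg": azimuth, "elevation_deg": elevation}

# A farther participant gets a greater distance, hence a smaller apparent model scale.
print(view_angle_info((1, 1, 1), (1, 2, 2)))
print(view_angle_info((1, 0, 1), (1, 2, 2)))
```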
Displaying the content to be marked and its marking information to other participants according to the display settings facilitates the discussion of complex problems. Adopting different display settings for different content and marking information achieves the best display effect and thus improves discussion and guidance, and different display settings also allow the display to be personalized to meet participants' needs. Meanwhile, determining the display content according to the participants' viewing angle information provides a more vivid, omnidirectional, multi-layered rendering effect and enhances the participants' immersive experience.
FIG. 10 is a schematic diagram of an exemplary illustrative flow 1000 for determining predicted presentation content, according to some embodiments of the present description. In some embodiments, the process 1000 may be performed by the presentation module 240.
Step 1010, predicting a future motion profile of the participant.
Future motion trajectories refer to motion trajectories of the participants after the current time. A motion trajectory may include time information, corresponding position information, and the like. In some embodiments, the length of the time period corresponding to the future motion trajectory may be set as needed. In some embodiments, the time period corresponding to the future motion trajectory may start from the current time or may be spaced from the current time. For example, the future motion trajectory may be the motion trajectory within the next 5 seconds or within the next 10 minutes. For another example, the future motion trajectory may be the trajectory within 5 seconds starting from the current time, or the trajectory within a 10-minute period starting 5 minutes after the current time (i.e., within 5-15 minutes of the current time).
In some embodiments, the future motion profile of the participant may be predicted based on the scene and the current sub-motion information of the participant. For example, in a live surgical scenario, a physician may hold the instrument or suture in a surgical suture package and then predict that the physician will want to suture the surgical site next. In some embodiments, the future motion profile of the participant may also be predicted based on the historical scene and the time of the historical entry into the scene. For example, for a teaching scene of the same theme, a teaching person in a history teaching scene shows demonstration actions at 30 minutes into the scene, and then it can be predicted that in the current teaching scene, the teaching person will show demonstration actions at 30 minutes. For another example, for a surgical guidance scenario of the same subject, in a historical surgical guidance scenario, the guided person would focus on the surgical bed side viewing guidance at 10 minutes into the scenario, and then in the current surgical guidance scenario, the guided person would be predicted to move around the hospital bed at 10 minutes.
In some embodiments, the future motion trajectory of the participant may be predicted based on a prediction model. The prediction model may be structured as a recurrent neural network (RNN) model. The input of the prediction model is the participant's positioning data sequence over a preset historical time period ending at the current time; the output of the prediction model is the predicted position data sequence for a preset future time period.
The preset history period refers to a period of time that expires to the current time. In some embodiments, the preset history period may be a period from the entry of the participant into the scene to the current time, for example, nine points of the participant into the virtual scene, and the current time is ten points, then nine points to ten points are the preset history period. In some embodiments, the preset historical time period may be a time period from the beginning of a certain action to the current time, for example, the participant performs a squatting action, and the time from the moment the participant starts to squat to the current time is the preset historical time period.
In some embodiments, a predetermined historical time period may also contain a plurality of time point information. In some embodiments, the point-in-time information may be information of a single point-in-time. For example, the acquisition interval of the time points is 1 second, and the preset history period is the past 2 minutes, and then the preset history period includes 120 time point information. For another example, if the acquisition interval of the time points is 1 minute and the preset history period is the past 1 hour, the preset history period includes 60 pieces of time point information.
In some embodiments, one time point information may also correspond to one sub-period. For example, if the preset history period is from nine to ten o'clock, the period may be further divided into three consecutive sub-periods, i.e., the preset period contains three time point information. In some embodiments, the acquisition interval of the time points and the length of the sub-periods may be preset by the server or set by the participant.
The positioning data sequence refers to the positioning data sequence of a participant in the virtual space. The positioning data sequence may reflect the movement of the participant over the preset historical time period. Each element in the positioning data sequence corresponds to the participant's position data at one time point. For example, in a sequence of position coordinates whose third element is (1, 2, 1), if each element corresponds to a single time point spaced one second apart, then (1, 2, 1) indicates the participant's position data in the virtual space at the third second of the preset time period. For another example, if the time point corresponding to each element is a sub-period, then (1, 2, 1) indicates the participant's position data in the virtual space during the third sub-period of the preset time period.
In some embodiments, the training data of the predictive model may be sets of labeled training samples, which may be a predetermined historical period of time from the time of the history to the current time, a historical localization data sequence of the participant. The training samples may be derived from historical data stored by the server. The label of the training sample of the prediction model can be an actual position data sequence of the participant in a historical preset future time period, and the label can be obtained in a manual labeling mode.
In some embodiments, a loss function is constructed from the labels and the corresponding outputs of the initial prediction model, and the parameters of the prediction model are iteratively updated by gradient descent or other methods based on the loss function. When the preset conditions are met, model training is completed and the trained prediction model is obtained. The preset condition may be that the loss function converges, the number of iterations reaches a threshold, etc.
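An illustrative PyTorch sketch of such a prediction model and one training step; the GRU architecture, sequence lengths, and random stand-in data are assumptions rather than the embodiment's exact configuration.

```python
import torch
from torch import nn

class TrajectoryPredictor(nn.Module):
    """Recurrent model mapping a historical positioning sequence to future positions.

    The sequence lengths, hidden size and the GRU choice are illustrative assumptions.
    """
    def __init__(self, future_steps=10, hidden=64):
        super().__init__()
        self.future_steps = future_steps
        self.rnn = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, future_steps * 3)

    def forward(self, history):           # history: (batch, past_steps, 3)
        _, h = self.rnn(history)
        out = self.head(h[-1])             # (batch, future_steps * 3)
        return out.view(-1, self.future_steps, 3)

model = TrajectoryPredictor()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

history = torch.randn(4, 120, 3)           # e.g. 120 one-second samples up to now
target = torch.randn(4, 10, 3)             # actual positions over the labelled future period
loss = loss_fn(model(history), target)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```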
In step 1020, perspective information for a future point in time is determined based on the future motion profile.
The future point in time refers to a point in time after the current time. The future point in time is included in a time period corresponding to the future motion trajectory.
The viewing angle information refers to the viewing angle information of the participant relative to the content to be marked and the marking information thereof. For more about view information, see fig. 9 and its associated description.
In some embodiments, based on the future motion trajectory of the participant, the position information of the participant at the future point in time may be determined, and based on a comparison of the position information of the participant at the future point in time with the position information of the display window of the content to be marked and its marking information, the relative position between the participant and the display window may be determined, thereby determining the perspective information. For a more description of how to determine perspective information, see fig. 9 for relevant content.
Step 1030, determining corresponding predicted presentation content based on the perspective information at the future point in time.
In some embodiments, the predicted presentation content may include the content to be marked in the virtual scene, the marked content, its marking information, and the like. In some embodiments, when the predicted presentation content is determined, the content viewable by each participant at that viewing angle may be calculated based on the participant's viewing angle information at the future time point. For example, if the predicted presentation content is a marked three-dimensional heart model and its marking information, and the viewing angle information at the future time point indicates that the display window of the content to be marked and its marking information will be located 30° to the front right of the participant's position at that time, then the predicted presentation content for that participant can be predicted as a side perspective view of the marked content and its marking information at that angle.
In some embodiments, as the predicted presentation changes in real-time, pre-acquisition preparations may be made for the predicted presentation at a future point in time based on perspective information for each participant at the future point in time. For example, if the content to be marked in the virtual scene is a doctor operation, and the doctor operation corresponding to the future time point is a chest operation, the camera at the corresponding position of the participant's view angle information may be brought into a standby state for chest shooting in advance.
By predicting the future motion trail of the participant to predict the display content, the display content at the corresponding future time point can be prepared in advance, or the collection of the display content at the future time point can be prepared in advance, so that the loading speed is improved, and the use experience of the participant is optimized.
It should be noted that, the advantages that may be generated by different embodiments may be different, and in different embodiments, the advantages that may be generated may be any one or a combination of several of the above, or any other possible advantages that may be obtained.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations of the present disclosure may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested in this specification and are therefore intended to fall within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, the order in which the elements and sequences are processed, the use of numerical letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful embodiments have been discussed in the foregoing disclosure, by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation disclosed in this specification and thereby aid in understanding one or more embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof in the preceding description of the embodiments of this specification. This method of disclosure, however, is not to be interpreted as implying that more features are required than are recited in the claims. Indeed, claimed subject matter may lie in less than all of the features of a single embodiment disclosed above.
In some embodiments, numbers are used to describe quantities of components and attributes; it should be understood that such numbers used in the description of the embodiments are modified in some examples by the modifiers "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a 20% variation. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general method of preserving digits. Although the numerical ranges and parameters used to confirm the breadth of the ranges in some embodiments of this specification are approximations, in specific embodiments such numerical values are set as precisely as practicable.
Each patent, patent application, patent application publication, and other material, such as an article, book, specification, publication, or document, referred to in this specification is hereby incorporated by reference in its entirety, except for any application history document that is inconsistent with or conflicts with the content of this specification, and except for any document (now or later associated with this specification) that would limit the broadest scope of the claims of this specification. It should be noted that if the description, definition, and/or use of a term in any material accompanying this specification is inconsistent with or conflicts with what is stated in this specification, the description, definition, and/or use of the term in this specification controls.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (10)

1. A data processing method for XR, comprising:
creating a canvas within a virtual space in response to a request from an annotation requester;
displaying content to be marked on the canvas, wherein the content to be marked is marked data and/or unmarked original data;
obtaining marking information created on the canvas by the annotation requester using a ray interaction system, wherein the marking information comprises marking content and a marking path;
and sharing the content to be marked and the marking information to terminals of other participants for display.
2. The method of claim 1, wherein the content to be marked is content presented at any location in any one of a plurality of windows on the terminal of the annotation requester.
3. The method according to claim 1, wherein the sharing of the content to be marked and the marking information to terminals of other participants for display comprises:
acquiring display settings of the annotation requester, wherein the display settings comprise displaying marking in real time and displaying after marking is completed;
and sharing the content to be marked and the marking information of the content to be marked to the terminals of the other participants for display based on the display settings.
4. The method according to claim 3, wherein the sharing of the content to be marked and the marking information thereof to the terminals of the other participants for display comprises:
determining perspective information of each of the other participants based on location information of each participant;
and determining and displaying the display content of each participant based on the perspective information of that participant, wherein the display content comprises the content to be marked and/or the marking information under that perspective information.
5. A data processing system for XR, comprising:
a generation module configured to create a canvas within a virtual space in response to a request from an annotation requester;
a presentation module configured to implement the following operations:
displaying content to be marked on the canvas, wherein the content to be marked is marked data and/or unmarked original data;
obtaining marking information created on the canvas by the annotation requester using a ray interaction system, wherein the marking information comprises marking content and a marking path;
and sharing the content to be marked and the marking information to terminals of other participants for display.
6. The system of claim 5, wherein the content to be marked is content presented at any location in any one of a plurality of windows on the terminal of the annotation requester.
7. The system of claim 5, wherein the presentation module is further configured to:
acquire display settings of the annotation requester, wherein the display settings comprise displaying marking in real time and displaying after marking is completed;
and share the content to be marked and the marking information of the content to be marked to the terminals of the other participants for display based on the display settings.
8. The system of claim 7, wherein the presentation module is further configured to:
determine perspective information of each of the other participants based on location information of each participant;
and determine and display the display content of each participant based on the perspective information of that participant, wherein the display content comprises the content to be marked and/or the marking information under that perspective information.
9. A data processing apparatus for XR, the apparatus comprising:
at least one storage medium storing computer instructions;
at least one processor configured to execute the computer instructions to implement the data processing method for XR according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that the storage medium stores computer instructions which, when read by a computer, cause the computer to perform the data processing method for XR according to any one of claims 1 to 4.
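
The following is an informal, minimal sketch in Python of how the steps recited in claims 1 to 4 could fit together, provided only as a reading aid. All identifiers (Canvas, MarkingInfo, Participant, DisplaySetting, share_marking, and so on) are assumptions introduced here for illustration and do not appear in the specification, and the perspective calculation is a deliberately simplified stand-in for whatever view-angle model an actual implementation would use.

# Illustrative sketch only; all names are assumptions, not identifiers from the specification.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class DisplaySetting(Enum):
    REAL_TIME = "real_time"            # share each marking as it is drawn
    AFTER_COMPLETION = "after_done"    # share only once marking is finished


@dataclass
class MarkingInfo:
    content: str                                  # marking content (e.g. text or a shape identifier)
    path: List[Tuple[float, float, float]]        # marking path sampled from ray-interaction hits


@dataclass
class Canvas:
    content_to_mark: Optional[str] = None         # marked data and/or unmarked original data
    markings: List[MarkingInfo] = field(default_factory=list)


@dataclass
class Participant:
    name: str
    position: Tuple[float, float, float]          # location in the virtual space


def create_canvas(content_to_mark: str) -> Canvas:
    """Claim 1: create a canvas carrying the content to be marked (rendering is left abstract)."""
    return Canvas(content_to_mark=content_to_mark)


def perspective_of(participant: Participant,
                   canvas_position: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Claim 4: derive perspective information from the participant's location
    (simplified here as the vector from the participant to the canvas)."""
    return tuple(c - p for c, p in zip(canvas_position, participant.position))


def share_marking(canvas: Canvas,
                  marking: MarkingInfo,
                  setting: DisplaySetting,
                  participants: List[Participant],
                  canvas_position: Tuple[float, float, float],
                  marking_finished: bool) -> None:
    """Claims 1, 3, and 4: record the marking and share the content to be marked plus the
    marking information to the other participants, honoring the display setting and each
    participant's perspective."""
    canvas.markings.append(marking)
    if setting is DisplaySetting.AFTER_COMPLETION and not marking_finished:
        return  # defer sharing until the requester finishes marking
    for p in participants:
        view = perspective_of(p, canvas_position)
        # A real system would render canvas.content_to_mark and the marking under `view`
        # and push the result to p's terminal; printing stands in for that step here.
        print(f"{p.name}: view={view}, content={canvas.content_to_mark}, "
              f"marking={marking.content} along {len(marking.path)} points")


if __name__ == "__main__":
    canvas = create_canvas("example image to be annotated")
    others = [Participant("participant_a", (1.0, 0.0, 0.0)),
              Participant("participant_b", (0.0, 1.0, 2.0))]
    stroke = MarkingInfo(content="circle", path=[(0.1, 0.2, 0.0), (0.2, 0.3, 0.0)])
    share_marking(canvas, stroke, DisplaySetting.REAL_TIME,
                  others, canvas_position=(0.0, 0.0, 1.0), marking_finished=False)

The split between share_marking and perspective_of mirrors the claim structure: the display setting gates when sharing happens (claim 3), while the per-participant perspective determines what each terminal displays (claim 4).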
CN202211408599.3A 2022-09-28 2022-09-28 Data processing method and system for XR Pending CN117111724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211408599.3A CN117111724A (en) 2022-09-28 2022-09-28 Data processing method and system for XR

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211191561.5A CN117826976A (en) 2022-09-28 2022-09-28 XR-based multi-person collaboration method and system
CN202211408599.3A CN117111724A (en) 2022-09-28 2022-09-28 Data processing method and system for XR

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202211191561.5A Division CN117826976A (en) 2022-09-28 2022-09-28 XR-based multi-person collaboration method and system

Publications (1)

Publication Number Publication Date
CN117111724A true CN117111724A (en) 2023-11-24

Family

ID=84980721

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202211322357.2A Pending CN115576427A (en) 2022-09-28 2022-09-28 XR-based multi-user online live broadcast and system
CN202211191561.5A Pending CN117826976A (en) 2022-09-28 2022-09-28 XR-based multi-person collaboration method and system
CN202211408599.3A Pending CN117111724A (en) 2022-09-28 2022-09-28 Data processing method and system for XR

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202211322357.2A Pending CN115576427A (en) 2022-09-28 2022-09-28 XR-based multi-user online live broadcast and system
CN202211191561.5A Pending CN117826976A (en) 2022-09-28 2022-09-28 XR-based multi-person collaboration method and system

Country Status (1)

Country Link
CN (3) CN115576427A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117041474A (en) * 2023-09-07 2023-11-10 腾讯烟台新工科研究院 Remote conference system and method based on virtual reality and artificial intelligence technology

Also Published As

Publication number Publication date
CN115576427A (en) 2023-01-06
CN117826976A (en) 2024-04-05

Similar Documents

Publication Publication Date Title
US10672288B2 (en) Augmented and virtual reality simulator for professional and educational training
Van Krevelen et al. A survey of augmented reality technologies, applications and limitations
KR20190088545A (en) Systems, methods and media for displaying interactive augmented reality presentations
US20120192088A1 (en) Method and system for physical mapping in a virtual world
WO2012081194A1 (en) Medical-treatment assisting apparatus, medical-treatment assisting method, and medical-treatment assisting system
CN112346572A (en) Method, system and electronic device for realizing virtual-real fusion
CN111880659A (en) Virtual character control method and device, equipment and computer readable storage medium
CN108594999A (en) Control method and device for panoramic picture display systems
US11928384B2 (en) Systems and methods for virtual and augmented reality
US20220277506A1 (en) Motion-based online interactive platform
CN108830944B (en) Optical perspective three-dimensional near-to-eye display system and display method
CN117111724A (en) Data processing method and system for XR
Gao et al. Real-time visual representations for mixed reality remote collaboration
Zhong et al. Designing a Vision-based Collaborative Augmented Reality Application for Industrial Training.
Sereno et al. Point specification in collaborative visualization for 3D scalar fields using augmented reality
Feng et al. A comprehensive survey on AR-enabled local collaboration
Siegl et al. An augmented reality human–computer interface for object localization in a cognitive vision system
Méndez et al. Natural interaction in virtual TV sets through the synergistic operation of low-cost sensors
He et al. vConnect: Connect the real world to the virtual world
EP4280226A1 (en) Remote reproduction method, system, and apparatus, device, medium, and program product
US20230360336A1 (en) Collaborative mixed-reality system for immersive surgical telementoring
Schäfer Improving Essential Interactions for Immersive Virtual Environments with Novel Hand Gesture Authoring Tools
Andersen Effective User Guidance Through Augmented Reality Interfaces: Advances and Applications
US20240077938A1 (en) Information processing method, information processing device, and program
Welch et al. Immersive electronic books for surgical training

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20231204

Address after: No. 2555 Yinzhou Avenue, Yinzhou District, Ningbo City, Zhejiang Province, 315100

Applicant after: NINGBO LONGTAI MEDICAL TECHNOLOGY Co.,Ltd.

Address before: 17 / F, Zhaoying commercial building, 151-155 Queen's Road Central, Hong Kong, China

Applicant before: Intuitive Vision Co.,Ltd.