CN114332417B - Method, equipment, storage medium and program product for interaction of multiple scenes - Google Patents

Method, equipment, storage medium and program product for interaction of multiple scenes

Info

Publication number
CN114332417B
CN114332417B (application CN202111517619.6A)
Authority
CN
China
Prior art keywords
information
scene
current scene
tag
user
Prior art date
Legal status
Active
Application number
CN202111517619.6A
Other languages
Chinese (zh)
Other versions
CN114332417A (en)
Inventor
廖春元
杨帆
林祥杰
缪琳
Current Assignee
Hiscene Information Technology Co Ltd
Original Assignee
Hiscene Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hiscene Information Technology Co Ltd filed Critical Hiscene Information Technology Co Ltd
Priority to CN202111517619.6A
Publication of CN114332417A
Priority to PCT/CN2022/110521 (WO2023109153A1)
Application granted
Publication of CN114332417B

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 19/00 Manipulating 3D models or images for computer graphics
          • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00 Scenes; scene-specific elements
            • G06V 20/50 Context or environment of the image
              • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
            • G06V 20/60 Type of objects
              • G06V 20/64 Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Architecture (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a method, device, storage medium and program product for multi-person scene interaction. The method comprises: receiving first tag information uploaded by a first user device and corresponding to the current scene of a multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, the tag position information being used to indicate the scene position of the first tag information in the current scene; and creating or updating tag record information of the current scene information of the current scene according to the first tag information. Because the network device stores the tag record information, current scene information and the like of the current scene of the multi-person scene interaction, a user can be initialized quickly in the current scene, so that multiple users can conveniently and rapidly exchange information there, data can be shared instantly and interacted with in real time, and a good interaction effect is achieved.

Description

Method, equipment, storage medium and program product for interaction of multiple scenes
Technical Field
The application relates to the field of communication, in particular to a technology for multi-person scene interaction.
Background
A virtual scene is a digitally constructed scene, such as a digital space. By combining the virtual scene with a specific physical space and establishing a mapping relationship between the two, an augmented reality virtual scene can be obtained. In this augmented reality mode, a user can, through interactive operations, superimpose and present virtual tag information on top of the real-world environment.
Disclosure of Invention
An object of the present application is to provide a method, apparatus, storage medium and program product for multi-person scene interaction.
According to one aspect of the present application, there is provided a multi-person scene interaction method, wherein the method is applied to a network device, and comprises:
receiving first tag information uploaded by a first user equipment and corresponding to a current scene of a multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information is used to indicate the scene position of the first tag information in the current scene;
and creating or updating tag record information of the current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information.
According to another aspect of the present application, there is provided a multi-person scene interaction method, wherein the method is applied to a second user equipment, and the method includes:
acquiring shooting pose information of a shooting device in second user equipment;
receiving tag record information, issued by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information is used to indicate the scene position of the corresponding tag information in the current scene;
and superimposing and presenting at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the shooting pose information.
According to one aspect of the present application, there is provided a method of multi-person scene interaction, wherein the method comprises:
a network device receives first tag information uploaded by a first user equipment and corresponding to a current scene of a multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information is used to indicate the scene position of the first tag information in the current scene;
the network device creates or updates tag record information of the current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information;
the second user equipment acquires shooting pose information of a shooting device in the second user equipment, receives the tag record information about the current scene of the multi-person scene interaction issued by the network device, and superimposes and presents at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the shooting pose information.
According to another aspect of the present application, there is provided a method of multi-person scene interaction, wherein the method comprises:
the second user equipment acquires shooting pose information of a shooting device in the second user equipment, and receives tag record information, issued by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information is used to indicate the scene position of the corresponding tag information in the current scene; and superimposes and presents at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the shooting pose information;
the first user equipment acquires an editing operation of the first user on target tag information, and broadcasts the editing operation to the corresponding network device and the second user equipment;
and the second user equipment receives the editing operation, updates the target label information according to the editing operation, and superimposes and presents the updated target label information on a display device of the second user equipment.
According to one aspect of the present application, there is provided a network device for multi-person scene interaction, wherein the device comprises:
the system comprises a one-to-one module, a first user equipment and a second user equipment, wherein the one-to-one module is used for receiving first tag information, which is uploaded by the first user equipment and corresponds to a current scene of multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information is used for indicating the scene position of the first tag information in the current scene;
and the second module is used for creating or updating tag record information of current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information.
According to another aspect of the present application, there is provided a second user device for multi-person scene interaction, wherein the device comprises:
the second module is used for acquiring the shooting pose information of the shooting device in the second user equipment;
a module two-two, configured to receive tag record information, issued by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information is used to indicate the scene position of the corresponding tag information in the current scene;
and a module two-three, configured to superimpose and present at least one piece of tag information on the display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the shooting pose information.
According to one aspect of the present application, there is provided a computer device, wherein the device comprises:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the steps of any of the methods described above.
According to one aspect of the present application, there is provided a computer-readable storage medium having computer programs/instructions stored thereon which, when executed, cause a system to perform the steps of any of the methods described above.
According to one aspect of the present application, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of any of the methods described above.
Compared with the prior art, the present application stores the tag record information, the current scene information and the like of the current scene of the multi-person scene interaction on the network device. This enables quick initialization of a user in the current scene, so that multiple users can conveniently and rapidly exchange information in the current scene, share data instantly and interact in real time, achieving a good interaction effect.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 illustrates a method flow diagram for a multi-person scene interaction according to one embodiment of the present application;
FIG. 2 illustrates a method flow diagram for multi-person scene interaction according to another embodiment of the present application;
FIG. 3 illustrates a system method flow diagram for multi-person scene interaction according to one embodiment of the present application;
FIG. 4 illustrates a system method flow diagram for multi-person scene interaction according to another embodiment of the present application;
FIG. 5 illustrates functional blocks of a network device according to one embodiment of the present application;
FIG. 6 illustrates functional modules of a second user device according to one embodiment of the present application;
FIG. 7 illustrates an exemplary system that can be used to implement various embodiments described herein.
The same or similar reference numbers in the drawings refer to the same or similar parts.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (e.g., a central processing unit (CPU)), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory among computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in the present application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-machine interaction with a user (for example, through a touch pad), such as a smart phone or a tablet computer; the mobile electronic product may adopt any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, cloud computing being a kind of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks, wireless ad hoc networks, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device and the network device, the touch terminal, or the network device and the touch terminal through a network.
Of course, those skilled in the art will appreciate that the above devices are merely examples; other devices existing now or appearing in the future, insofar as they are applicable to the present application, are also intended to fall within the scope of protection of the present application and are incorporated herein by reference.
In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The multi-person scene interaction method of the present application is applied to an interaction system of multiple user devices and a network device; the system supports augmented reality interaction of multiple users in the same scene. The multiple users, through corresponding augmented reality devices, acquire and map the digital-world WCS (World Coordinate System) onto the physical space of the corresponding scene, so that the users achieve shared display of digital content, synchronous communication, collaborative interaction and the like across their terminals. The user device includes, but is not limited to, any mobile electronic product that can interact with a user (e.g., via a touch pad), such as a smart phone, a tablet computer, augmented reality glasses, or an augmented reality helmet. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers.
Fig. 1 shows a multi-person scene interaction method according to an aspect of the present application; the method is applied to a network device and specifically includes step S101 and step S102. In step S101, first tag information edited by a first user, uploaded by a first user device and corresponding to the current scene of a multi-person scene interaction, is received, wherein the first tag information includes corresponding first tag identification information and first tag position information, the tag position information being used to indicate the scene position of the first tag information in the current scene. In step S102, tag record information of the current scene information of the current scene is created or updated according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base includes one or more pieces of scene information, each piece of scene information includes corresponding scene identification information and scene positioning information, and the tag record information includes at least one piece of tag information. Multi-person scene interaction means that multiple users participate in scene interaction concerning the current scene. For example, the current scene may have multiple participating users at the same time: the users may enter the current scene simultaneously and interact in real time while online together, or they may enter the current scene one after another and interact based on the scene data, and so on. When any user joins the current scene, scene initialization is required to determine that user's position information in the current scene, where the position information can be characterized by the shooting pose information of the camera device of the user equipment held by the user. The current scene of the multi-person scene interaction may be the same real space for the multiple users, or different real spaces with the same or similar positioning features, etc.
Specifically, in step S101, first tag information edited by a first user, uploaded by a first user device and corresponding to the current scene of a multi-person scene interaction, is received, where the first tag information includes corresponding first tag identification information and first tag position information, and the tag position information is used to indicate the scene position of the first tag information in the current scene. For example, a first user holds a first user device on which a corresponding scene interaction application is installed; through the application, the first user can create or join a certain scene, thereby achieving multi-person scene interaction with other users. When the first user creates or joins the current scene, scene initialization is performed first and the shooting pose information of the first user device is obtained, for example the coordinate transformation relationship between the first user device and the world coordinate system of the current scene. Specific scene initialization includes, but is not limited to, three-dimensional tracking initialization, 2D recognition initialization and the like, where three-dimensional tracking initialization includes, but is not limited to, SLAM (simultaneous localization and mapping) initialization. In some cases, the scene positioning information includes 3D point cloud information of the current scene or 2D feature information of a 2D recognition graph contained in the current scene, where the 2D feature information may be the 2D recognition graph itself, or feature information extracted from the 2D recognition graph by a feature extraction algorithm; this is not limited here. In some embodiments, the scene positioning information further includes the camera intrinsics of the first user device. For example, where the scene positioning information includes 2D feature information, the camera device shoots a 2D recognition graph, or features of the 2D recognition graph are further extracted by the feature extraction algorithm, to obtain the 2D feature information corresponding to the current scene. As another example, where the scene positioning information includes 3D point cloud information, after the first user device shoots the current scene to perform scene initialization, it continues to scan the current scene with the camera device to obtain corresponding real-time scene images, and the real-time scene images are processed by the tracking thread, local mapping thread and the like of the three-dimensional tracking algorithm to obtain real-time pose information of the camera device and the 3D point cloud information corresponding to the current scene. The 3D point cloud information includes 3D points; for example, it includes 3D map points determined through image matching and depth information acquisition. In some embodiments, in addition to the corresponding 3D map points, the 3D point cloud information includes, but is not limited to: key frames corresponding to the point cloud information, co-visibility information corresponding to the point cloud information, spanning-tree information corresponding to the point cloud information, and the like.
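To make the data organization concrete, the following minimal sketch models scene information, tag information and the scene information base as plain records. This is an illustrative reading of the description, not code from the patent; all class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TagInfo:
    tag_id: str                     # tag identification information (icon, name, or link)
    position: tuple                 # tag position info: (x, y, z) in the scene's world coordinate system
    content: Optional[dict] = None  # tag content, e.g. file reference, form, application call

@dataclass
class SceneInfo:
    scene_id: str                   # scene identification information (unique per scene)
    positioning: dict               # scene positioning info: 3D point cloud or 2D feature information
    initialized: bool = False       # scene state info: whether scene initialization is complete
    tag_record: List[TagInfo] = field(default_factory=list)  # tag record information

# The scene information base maps scene identification information to scene information.
scene_info_base: Dict[str, SceneInfo] = {}
```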
The shooting pose information includes the position and attitude of the camera device in space, and conversion between image positions and spatial positions can be performed through it. For example, the pose information includes the extrinsics of the camera device relative to the world coordinate system of the current scene; as another example, it includes both those extrinsics and the intrinsics relating the camera coordinate system and the image/pixel coordinate system of the camera device. This is not limited here.
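The image-to-space conversion enabled by the pose information is, in effect, a pinhole camera model. A minimal sketch, assuming a camera described by extrinsics (R, t) relative to the scene's world coordinate system and an intrinsic matrix K; the numeric values are hypothetical:

```python
import numpy as np

def project_to_pixel(p_world, R, t, K):
    """Project a 3D point in the scene's world coordinate system to pixel coordinates.

    R (3x3) and t (3,) are the extrinsics of the camera device relative to the
    world coordinate system; K (3x3) is the camera intrinsic matrix.
    """
    p_cam = R @ np.asarray(p_world) + t  # world coordinates -> camera coordinates
    uvw = K @ p_cam                      # camera coordinates -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]              # perspective divide -> (u, v)

# Hypothetical intrinsics and a trivial pose, for illustration only.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(project_to_pixel([0.1, 0.0, 2.0], R, t, K))  # pixel position of the world point
```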
After the first user device completes initialization for the current scene, it can collect editing operations of the first user to achieve the corresponding scene interaction, where the editing operations include, but are not limited to, adding, modifying or deleting tag information in the current scene, and the concrete form of an editing operation includes, but is not limited to, user touch input, character input, voice input or gesture input. In some embodiments, tag information includes, but is not limited to: identification information; file information; form information; application invocation information; real-time sensing information, and the like. For example, the tag information may include identification information such as arrows, brush strokes (free-hand graffiti on the screen), circles, and geometric shapes. The tag information may also include information corresponding to multimedia files, such as pictures, videos, 3D models, PDF files, office documents and other kinds of files. As another example, the tag information may include form information, such as a form generated at a corresponding target image position for a user to view or enter content. The tag information may further include application invocation information for executing instructions related to an application, such as opening the application or invoking specific functions of the application, for example making a call or opening a link. The tag information may also include real-time sensing information for connecting to sensing devices (e.g., sensors) and acquiring sensing data of a target object. In some embodiments, the tag information includes corresponding tag identification information and tag position information, where the tag identification information includes any one of: a tag identifier, such as an icon, a name, or an acquisition link of the tag; tag content, such as the content of a PDF file, or the color and size of the tag. The tag position information is used to indicate the scene position of the tag information in the current scene; for example, it may be the coordinates of the tag information in the coordinate system corresponding to the current scene. For instance, the first user adds tag information at an arbitrary position on the captured image, and the first user device calculates, from the two-dimensional coordinates of the position where the tag information was added on the captured image, the three-dimensional coordinates of the tag information in the three-dimensional coordinate system corresponding to the current scene, for example the world coordinates in the world coordinate system corresponding to the current scene, or the image position information in the captured image shot by the first user device. Of course, those skilled in the art will appreciate that the above tag information is merely an example; other tag information existing now or appearing in the future, insofar as applicable to the present application, is intended to fall within the scope of protection of the present application and is incorporated herein by reference.
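Going the other way, when the first user taps a 2D position on the captured image to add a tag, the device has to recover a 3D scene coordinate for the tag position information. A sketch of that inverse mapping, assuming the depth of the tapped point is already known (for example from a hit test against the tracked point cloud); this is one plausible realization, not the patent's prescribed method:

```python
import numpy as np

def unproject_tap(u, v, depth, R, t, K):
    """Recover the world coordinate of a tapped pixel (u, v) whose depth in camera
    coordinates is known; the inverse of the projection sketched above."""
    p_cam = np.linalg.inv(K) @ np.array([u, v, 1.0]) * depth  # pixel -> camera coordinates
    return R.T @ (p_cam - t)                                  # camera -> world coordinates
```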
In step S102, tag record information of the current scene information of the current scene is created or updated according to the first tag information, where the current scene information is stored in a scene information base, the scene information base includes one or more pieces of scene information, each piece of scene information includes corresponding scene identification information and scene positioning information, and the tag record information includes at least one piece of tag information. For example, the network device receives the first tag information uploaded by the first user device, and creates or updates the tag record information about the current scene according to the first tag information, where the tag record information contains the tag information existing at the current moment in the current scene. If the first tag information includes tag information newly added by the first user and no mapped tag record information yet exists in the current scene information of the current scene, tag record information of the current scene information is created according to the first tag information. If mapped tag record information already exists for the current scene, the tag information in the tag record information is updated according to the first tag information, for example by adding the first tag information, or by replacing, modifying or deleting stored tag information according to the tag identification information, and so on.
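Continuing the illustrative data model above, the create-or-update behaviour of step S102 reduces to an upsert keyed on tag identification information; a sketch:

```python
def upsert_tag_record(scene: SceneInfo, tag: TagInfo, delete: bool = False) -> None:
    """Create the tag record if absent; otherwise add, replace, or delete by tag id."""
    index = {t.tag_id: i for i, t in enumerate(scene.tag_record)}
    if delete:
        if tag.tag_id in index:
            scene.tag_record.pop(index[tag.tag_id])   # delete a stored tag
    elif tag.tag_id in index:
        scene.tag_record[index[tag.tag_id]] = tag     # replace/modify a stored tag
    else:
        scene.tag_record.append(tag)                  # newly added tag
```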
Here, the current scene information of the current scene includes related information describing the current scene of the multi-person scene interaction, for example the scene identification information of the current scene and scene positioning information including 3D point cloud information or 2D feature information of the current scene. The corresponding scene positioning information may be positioning feature information of the current scene, such as 3D point cloud information or 2D feature information, acquired before or after the current scene is created. The corresponding scene database is used to store the scene information created by or participated in by each user, and each piece of scene information includes at least its scene identification information and scene positioning information, where the scene identification information characterizes the uniqueness of the corresponding scene information. In some cases, each piece of scene information is also bound to its creating user or participating users, and the scene information created by or participated in by a user can be found by searching the user identification information of that creating or participating user. The tag position information of each piece of tag information in the scene information stored by the network device makes it convenient for each participating user, when joining the scene, to calculate the image position of the tag information in the picture shot by that participating user's device based on their respective shooting pose information and the tag position information, so that the tag information existing at the current moment in the current scene is superimposed and presented on the display screen of the participating user's device.
In some cases, the current scene information stored in the network device is only used for scene initialization and for presenting the current tag information when users join the current scene, and does not involve real-time tag interaction among the users already participating in the current scene. In other cases, the network device acquires the editing operations of each participating user on the tag information in the current scene and updates the tag record information based on those editing operations, thereby keeping the current scene information updated and aggregated in real time, while also issuing the updated tag record information to the participating users other than the one the editing operation came from. Thus, in some embodiments, the method further includes step S103 (not shown): in step S103, the updated tag record information is distributed to a second user device of a second user, where the second user includes a user, other than the first user, currently participating in the scene interaction of the current scene. The second user is a user who joins the current scene to interact after the current scene information has been established; there may be one or more such users, which is not limited here. Similarly, if the network device obtains, in addition to the first tag information, second tag information about the current scene uploaded by a second user device (the first and second tag information may concern the same tag or different tags; "first" and "second" merely distinguish whose operation it is), the network device forwards the second tag information to the first user device and to the other second user devices (those of the second users other than the one currently operating), and so on.
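Step S103's distribution can be sketched as a broadcast that skips the device the edit came from. The in-memory session registry below is an assumption made for illustration; a real system would presumably use persistent connections such as WebSockets:

```python
from typing import Callable, Dict

# Hypothetical registry: scene_id -> {user_id: callback that pushes a message to that user's device}.
sessions: Dict[str, Dict[str, Callable[[dict], None]]] = {}

def distribute_tag_record(scene_id: str, origin_user: str, payload: dict) -> None:
    """Push the updated tag record information to every participant except the editor."""
    for user_id, send in sessions.get(scene_id, {}).items():
        if user_id != origin_user:
            send(payload)
```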
In some embodiments, the method further includes step S104 (not shown): in step S104, a scene creation request about the current scene sent by the first user device is received; in response to the scene creation request, scene identification information of the current scene is generated, the current scene information is created according to the scene identification information and stored in the scene information base; and the scene identification information of the current scene is returned to the first user device. For example, the first user is the user who initially creates the scene information for the current scene. In some embodiments, the first user device provides, in the application's interactive interface, a touch area, a touch key or a preset operation for creating a scene; if the first user device detects that the first user's touch operation on the touch area or touch key, or other user operation, is the same as or similar to the preset operation, the first user device generates a scene creation request for the current scene and sends it to the network device. This manner of generating the scene creation request is merely an example; the ways of generating it include, but are not limited to, user character input, voice input, gesture input and the like, which are not limited here. In some cases, the network device receives the scene creation request for the current scene, establishes the corresponding scene data record in the scene database, and assigns the current scene an identifier as its scene identification information, which serves to distinguish it from other scenes. After determining the corresponding scene identification information, the network device returns it to the first user device. In some embodiments, the first user device then completes the initialization of the current scene, such as 2D recognition initialization; in other embodiments it completes, for example, three-dimensional tracking initialization, and obtains the scene positioning information of the current scene, so that the scene positioning information is stored in the scene information corresponding to the scene identification information on the network device. In other cases, the first user device first initializes the current scene locally and acquires the scene positioning information of the current scene, then uploads the scene positioning information to the network device to create the multi-person interactive shared scene, and the network device first determines the corresponding scene identification information and then establishes the current scene information based on the scene positioning information and the scene identification information.
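A minimal sketch of the scene-creation handling of step S104, under the same illustrative data model; generating scene identification information as a UUID is an assumption, since the patent only requires the identifier to be unique:

```python
import uuid
from typing import Optional

def handle_scene_creation(positioning: Optional[dict] = None) -> str:
    """Create current scene information in the scene information base and return its id."""
    scene_id = uuid.uuid4().hex                 # scene identification information
    scene_info_base[scene_id] = SceneInfo(
        scene_id=scene_id,
        positioning=positioning or {},
        initialized=positioning is not None,    # positioning supplied => already initialized
    )
    return scene_id                             # returned to the first user device
```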
In some implementations, the scene creation request includes the scene positioning information of the current scene; in that case, generating the scene identification information of the current scene in response to the scene creation request and creating the current scene information according to the scene identification information comprises: in response to the scene creation request, generating the scene identification information of the current scene, and establishing the current scene information according to the scene positioning information and the scene identification information. In this scene creation process, the first user device completes scene initialization locally and acquires the current scene positioning information of the current scene before requesting the network device to create the current scene.
In some embodiments, the method further includes step S105 (not shown): in step S105, a scene interaction request about the current scene sent by a corresponding second user device is received, where the scene interaction request includes the scene identification information of the current scene; in response to the scene interaction request, the current scene positioning information of the current scene information of the current scene is determined by matching in the scene information base based on the scene identification information; and the current scene positioning information is issued to the second user device. For example, the second user is a user who joins the interaction of the current scene after the scene has been created; at that point the first user and the second user are both present in the current scene, so that they can exchange information in the current scene and achieve a good interaction effect. In some embodiments, the second user device provides, in the application's interactive interface, a touch area, a touch key or a preset operation for joining a scene; if the second user device detects the second user's corresponding touch operation, or another user operation that is the same as or similar to the preset operation (for example, the current scene is chosen by searching for and entering its scene identification information, or by matching and filtering on the second user's account or a related process, such as a job or a game), the second user device generates a scene interaction request for the current scene and sends it to the network device, where the scene interaction request includes the scene identification information of the current scene. This manner of generating the scene interaction request is merely an example; the ways of generating it include, but are not limited to, user character input, voice input, gesture input and the like, which are not limited here. Normally, the network device receives the second user's scene interaction request about the current scene, queries for the current scene positioning information corresponding to the current scene information based on the scene identification information of the current scene, and returns the current scene positioning information to the second user device so that it can perform scene initialization with it. In some embodiments, the current scene positioning information includes 3D point cloud information; the second user device shoots the current scene (or a scene whose similarity to the current scene satisfies a certain condition) through its camera device, performs point cloud initialization using the three-dimensional tracking algorithm and the current scene positioning information, and, if initialization succeeds, aligns the world coordinate system of the current scene with the world coordinate system of the point cloud in the scene positioning information.
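On the network-device side, the join flow of step S105 reduces to a lookup by scene identification information; a sketch, continuing the data model above:

```python
def handle_scene_interaction(scene_id: str):
    """Match the current scene in the scene information base and return what the
    second user device needs: positioning info for initialization plus the tag record."""
    scene = scene_info_base.get(scene_id)
    if scene is None:
        raise KeyError(f"no scene information for id {scene_id}")
    return scene.positioning, scene.tag_record
```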
In other embodiments, the current scene positioning information includes 2D feature information; the second user device shoots a 2D recognition graph in the current scene or another scene through its camera device, extracts features of the 2D recognition graph and matches them against the 2D feature information in the scene positioning information, and, if matching succeeds, aligns the world coordinate system of the current scene with the world coordinate system of the 2D feature information in the scene positioning information. Further, if tag record information already exists in the current scene information, then in some embodiments, after the second user device completes the initialization of the current scene, the tag positions in the tag record information are aligned with the current scene, and the tags can be reproduced at the positions designated when the first user edited them. For example, the network device may return the latest tag record information of the current scene to the second user device, e.g. tag record information recording only the tag information at the current moment of the current scene, so that after scene initialization the second user device can superimpose and present the one or more pieces of tag information existing at the current moment. As another example, the network device returns the complete tag record information of the current scene to the second user device, e.g. tag record information recording all tag information of the current scene, and after scene initialization the one or more pieces of tag information are superimposed and presented on the screen of the second user device. A given piece of tag information may have been added by the first user, or by another second user who joined the current scene interaction earlier.
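The 2D-feature matching at the start of this paragraph can be illustrated with off-the-shelf feature matching. The sketch below uses ORB features via OpenCV purely as an example; the patent does not name a specific feature extraction or matching algorithm, and the match threshold is an arbitrary assumption:

```python
import cv2

def matches_recognition_graph(frame_gray, stored_descriptors, min_matches=30) -> bool:
    """Match a captured frame against the stored 2D feature information of the
    2D recognition graph; success means the world coordinate systems can be aligned."""
    orb = cv2.ORB_create()
    _, descriptors = orb.detectAndCompute(frame_gray, None)
    if descriptors is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, stored_descriptors)
    return len(matches) >= min_matches
```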
In some embodiments, determining, in response to the scene interaction request, the current scene positioning information of the current scene information of the current scene by matching in the scene information base based on the scene identification information includes: in response to the scene interaction request, determining the current scene information of the current scene by matching in the scene information base based on the scene identification information, and determining the scene state information of the current scene from the current scene information, where the scene state information indicates either that initialization is complete or that initialization is not complete; and, if the scene state information indicates that initialization is complete, acquiring the current scene positioning information from the current scene information. For example, the scene state information describes whether the creating user (the first user) of the current scene information has completed the scene initialization of the current scene. In some cases, the first user device only acquires the scene identification information of the current scene via the first user's scene creation request and performs the scene initialization after sending that request. The second user can join the current scene based on the scene identification information, and the network device decides whether to return the corresponding current scene positioning information to the second user device depending on whether the first user device has completed scene initialization. In some embodiments, when the first user device has completed the scene initialization of the current scene, the network device determines that the scene state information in the current scene information of the current scene indicates initialization complete; when it has not, the network device determines that the scene state information indicates initialization not complete. This way of determining the scene state information is merely an example and is not limited here. If the scene state information indicates that initialization is complete, the network device may send the current scene positioning information of the current scene to the second user device. If it indicates that initialization is not complete, the network device cannot yet provide the second user device with the current scene positioning information of the current scene; instead, it sends corresponding status prompt information to the second user device, and only once initialization is complete does it acquire and issue the current scene positioning information. Thus, in some embodiments, the method further includes step S106 (not shown): in step S106, if the scene state information indicates that initialization is not complete, status prompt information about the current scene is sent to the second user device, where the status prompt information includes prompt information indicating that the initialization of the current scene is not complete; and the current scene positioning information is acquired from the current scene information once the scene state information of the current scene changes from initialization-not-complete to initialization-complete.
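Layering the state check onto the lookup sketched earlier, the gate of steps S105/S106 might look like this; the response shape is an illustrative assumption:

```python
def handle_scene_interaction_gated(scene_id: str) -> dict:
    """Return positioning info only once the creator has completed initialization."""
    scene = scene_info_base.get(scene_id)
    if scene is None:
        raise KeyError(f"no scene information for id {scene_id}")
    if not scene.initialized:
        # Status prompt information of step S106: initialization not complete.
        return {"status": "initializing", "hint": "scene initialization is not complete"}
    return {"status": "ready", "positioning": scene.positioning}
```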
For example, the status prompt information is used to prompt the second user that the current scene has not completed its scene initialization; its concrete presentation forms include, but are not limited to, text, voice, vibration, images, or a deliberately blank screen on the second user device. In some embodiments, after the second user device learns from the prompt information that the initialization of the current scene is not complete, it waits for the first user device to perform the scene initialization; during this time the second user device cannot take part in the scene interaction of the current scene. The network device may determine the scene state information of the current scene by querying whether the current scene information contains current scene positioning information, by relying on an initialization-complete message uploaded by the first user device, or by querying the initialization progress of the first user device, among other ways; this is not limited here. In other cases, when scene information is presented on the second user device, the corresponding scene state information can be presented alongside the scene identification information, helping the second user decide whether to join the current scene.
In some embodiments, the method further includes step S107 (not shown): in step S107, second tag information edited by the second user, uploaded by a second user device and corresponding to the current scene, is received, where the second tag information includes corresponding second tag identification information and second tag position information, the second tag position information being used to indicate the scene position of the second tag information in the current scene; and tag record information of the current scene information of the current scene is created or updated according to the second tag information. For example, in addition to the first user's editing operations on the current scene, the network device may receive the second user's editing operations on the current scene, such as adding a new tag or modifying or deleting an existing tag, so that the first and second users can exchange information in the current scene and achieve a good interaction effect. The network device determines the corresponding second tag information based on the second user's editing operation, where "second" merely distinguishes which user edited the tag, identifying the operator as a second user who joined later. The network device acquires the second tag information and creates or updates the corresponding tag record information based on it: for example, where the network device side does not yet store any tag record information about the current scene (e.g. because the second user's edit arrives before any first tag information), the network device creates the corresponding tag record information according to the second tag information; where the network device side already stores tag record information mapped to the current scene (such as the first tag information, or second tag information edited by other second users), the network device updates the tag record information according to the second tag information.
After the second user edits a second tag, the second user device may send the second tag information to the network device, which forwards it to the first user device and the other second user devices participating in the current scene; alternatively, the second user device may broadcast the second tag information directly to the first user device, the other second user devices and/or the network device. For example, the second user device broadcasts the second tag information directly to the first user device, the other second user devices and the network device, thereby reducing the delay with which the first user device and the other second user devices obtain the second tag information and improving the real-time responsiveness of the scene interaction. The second tag information corresponding to an editing operation may be the complete updated tag information for a given tag, or an operation instruction to be executed on a given tag, which after being sent is executed on the tag at each end to update the corresponding tag information, so that each user device superimposes and presents the updated tag information, achieving real-time interaction among the users participating in the current scene.
In some embodiments, the second user is given editing rights to the current scene information. For example, in the general case, any user currently participating in the current scene may edit the scene information: all participating users hold editing rights to the current scene information. In some cases, to make the current scene information easier to manage and to prevent overlapping and conflicting operations when several users interact, editing rights to the current scene information are given to only some participating users at any one time. Management of editing rights includes, but is not limited to, the following modes: a designation mode, e.g. the creator of the current scene (the first user) designates the second user(s) to be given editing rights to the current scene information; a transfer mode, e.g. editing rights originate with the first user as the scene's creator and are transferred from the user currently holding them to one or more other users; an account mode, e.g. editing rights are determined from the account information of the users participating in the current scene; and a preset mode, e.g. editing rights of participating users are determined by the job, game or other process corresponding to the current scene.
Further, editing rights generally mean that a user has operation rights over objects at any position in the current scene information. On top of this, editing rights can also be restricted to specific operation objects. For example, multiple participating users in the current scene may all edit objects in the current scene; if their operation objects differ, the network device can give all of them the corresponding editing rights. If several users target the same operation object, the network device gives the corresponding editing right to only one of the participating users for that object; specifically, one participating user is selected from those targeting the same object according to permission level, the starting time of the edit, and the like, and only that user is given the operation right for that object at that time. For example, once a user starts editing a certain object in the current scene, that user locks the object, and other users, lacking editing rights for it, cannot edit the object.
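The per-object locking described above can be sketched as a minimal lock table; the names and granularity (one lock per tag) are assumptions made for illustration:

```python
from typing import Dict, Tuple

# Hypothetical lock table: (scene_id, tag_id) -> user_id currently holding the edit right.
edit_locks: Dict[Tuple[str, str], str] = {}

def try_acquire_edit_lock(scene_id: str, tag_id: str, user_id: str) -> bool:
    """Grant the edit right for a tag only if no other participant holds it."""
    holder = edit_locks.setdefault((scene_id, tag_id), user_id)
    return holder == user_id

def release_edit_lock(scene_id: str, tag_id: str, user_id: str) -> None:
    """Release the lock, but only for the user who holds it."""
    if edit_locks.get((scene_id, tag_id)) == user_id:
        del edit_locks[(scene_id, tag_id)]
```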
In some embodiments, the method further includes step S108 (not shown): in step S108, an editing operation of a participating user of the current scene on target tag information is acquired, the target tag information is updated according to the editing operation, and the tag record information is updated based on the updated target tag information, where the target tag information is contained in the tag record information of the current scene information. For example, after the second user starts participating in the scene interaction of the current scene, the second user device receives tag information, editing operations and the like sent by other devices (other user devices or the network device) and updates the superimposed tag presentation in real time based on them. In some embodiments, the second user device also stores the tag record information of the current scene locally and updates it in real time based on the received tag information or editing operations, so as to refresh the superimposed presentation. Of course, the second user may also select tag information based on where it is presented in the current scene on the second user device, and perform editing operations on the selected target tag information, such as modification or deletion, where modification includes, but is not limited to, changing the content, scaling the display size, deforming the display shape, or moving the display position. The second user device may broadcast the editing operation to the first user, the other second users and/or the network device, reducing the delay with which the first user device and the other second user devices obtain it and improving the real-time responsiveness of the scene interaction; alternatively, the second user device may send the editing operation to the network device, which forwards it to the other participating user devices in the current scene. The network device may either forward the editing operation directly to the first user device and the other second user devices so that they update the target tag information themselves, or update the target tag information according to the editing operation and send the updated target tag information to the first user device and the other second user devices to replace the original, thereby achieving the tag update. The user device editing the tag information may broadcast the editing operation to the other participating user devices and/or the network device in real time, or at preset intervals, for example every 100 milliseconds, which is not limited here. Although this tag-update procedure has been described for the second user's editing operations, those skilled in the art should understand that it applies equally to any editing operation on a tag by any participating user of the current scene.
In some embodiments, acquiring the editing operation of the participating user of the current scene with respect to the target tag information includes: acquiring candidate editing operations of a plurality of participating users of the current scene on the target tag information; and determining an editing operation related to the target tag information from the candidate editing operations according to the user editing priority levels of the plurality of participating users. For example, in addition to the aforementioned management of editing rights, the network device may also arbitrate among operations of multiple participating users on the same tag, so that only one user operates on a given tag at any moment. Specifically, each user device collects its user's editing operations on tags; if candidate editing operations of a plurality of participating users about the same target tag information are received, the network device determines a final editing operation from the candidate editing operations according to the user editing priority levels of the plurality of participating users, where a user's editing priority level is determined based on the user's association information with respect to the current scene, for example, whether the user is the creating user of the current scene, the duration of the user's participation in the current scene, or whether the user was the earliest to edit the target tag information. For instance, if in the current scene a first user and a second user each click on a certain target tag, the network device determines, according to the click times of the first user and the second user, that the user who clicked on the target tag earliest obtains the editing priority for that tag; that user then edits the target tag, and the other user cannot edit it while that editing is in progress.
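A minimal sketch of this earliest-operation arbitration, assuming each candidate editing operation carries a click timestamp (the field names are illustrative):

```python
def pick_winning_edit(candidates):
    """candidates: dicts with 'user_id', 'tag_id' and 'clicked_at' fields.
    The user who touched the tag earliest wins the editing priority."""
    return min(candidates, key=lambda c: c["clicked_at"])

edits = [
    {"user_id": "first-user",  "tag_id": "tag-7", "clicked_at": 12.30},
    {"user_id": "second-user", "tag_id": "tag-7", "clicked_at": 12.31},
]
assert pick_winning_edit(edits)["user_id"] == "first-user"
```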
In some embodiments, the method further comprises step S109 (not shown). In step S109, an editing time of at least one participating user of the current scene with respect to the at least one tag information is obtained; and the tag record information of the current scene information is updated according to the at least one tag information and the editing time, where the tag record information comprises at least one tag information, and each tag information has at least one editing time corresponding to an editing operation. For example, after the network device creates or updates at least one piece of tag information uploaded by the first user device or the second user device, if an editing operation of at least one participating user (for example, the first user or the second user) about any of that tag information is acquired, the editing time corresponding to each editing operation is recorded. In some cases, when a participating user starts editing, the editing time of the editing operation is recorded, and the editing time and the corresponding editing operation are uploaded to the network device or the like. When the editing operation is an instantaneous operation, the network device or each user device records the operation time node corresponding to the instantaneous operation as the corresponding editing time; when the editing operation is a continuous operation, the network device or each user device samples the editing operation at regular intervals (e.g. once every 20 ms or 0.1 s) and establishes a mapping relationship between each time node and the editing operation, thereby forming a continuous editing operation with its corresponding editing times.
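The distinction between instantaneous and continuous operations might be recorded as follows; the 20 ms sampling step comes from the example above, while the data layout is an assumption of this sketch:

```python
def record_instant(log, tag_id, op, t):
    """Record the single time node of an instantaneous operation."""
    log.setdefault(tag_id, []).append((t, op))

def record_continuous(log, tag_id, start_time, samples, step=0.02):
    """Map each sampled state of an ongoing gesture (e.g. a drag) to its
    time node, one sample every `step` seconds (20 ms here)."""
    for i, state in enumerate(samples):
        log.setdefault(tag_id, []).append((start_time + i * step, state))

edit_log = {}
record_instant(edit_log, "tag-3", {"op": "create"}, 0.0)
record_continuous(edit_log, "tag-3", 1.0,
                  [{"x": 1.0}, {"x": 1.2}, {"x": 1.4}])
# edit_log["tag-3"] now holds (editing time, operation/state) pairs
```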
The network device updates the tag information and the editing time into the corresponding tag record information, where the tag record information comprises at least one tag information, and each tag information comprises corresponding tag identification information, tag position information and the like, as well as the editing time corresponding to each editing operation. Each piece of tag information has at least one editing operation (for example, the operation that added the tag), and therefore at least one corresponding editing time. If a piece of tag information has a plurality of editing operations, the states of the tag information are arranged according to the editing time corresponding to each editing operation to form a complete tag time axis, and the tag state information of the tag on that time axis is presented as the time axis advances. Specifically, tag state information describes the tag information at a given moment, such as the position, size, shape and color of the tag, and the corresponding tag state information changes after each edit of the tag information. In some embodiments, the tag record information records all tag information of the current scene; when the network device receives a joining request from another user, it determines the tag information corresponding to the current time from the tag record information and returns it to the requesting user, so that that user's device superimposes and displays the one or more pieces of tag information existing at the current time of the current scene after scene initialization is completed.
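Replaying such a tag time axis amounts to looking up the last state at or before a given moment; a minimal sketch, assuming the (editing time, tag state) layout of the sketch above:

```python
import bisect

def state_at(timeline, t):
    """timeline: (edit_time, tag_state) pairs sorted by edit_time.
    Returns the tag state in force at time t, or None if the tag
    did not yet exist."""
    times = [entry[0] for entry in timeline]
    i = bisect.bisect_right(times, t)
    return timeline[i - 1][1] if i else None

timeline = [(0.0, {"color": "red"}), (5.0, {"color": "blue"})]
assert state_at(timeline, 3.0) == {"color": "red"}
assert state_at(timeline, 9.0) == {"color": "blue"}
assert state_at(timeline, -1.0) is None
```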
In some embodiments, the method further includes step S110 (not shown). In step S110, editing operations of at least one participating user of the current scene with respect to the at least one tag information are acquired together with the editing time corresponding to each editing operation; and a scene tag library of the current scene is created or updated according to those editing operations and editing times, where the scene tag library comprises at least one tag information, and each tag information has the editing time corresponding to at least one editing operation. For example, the tag record information in the current scene information of the network device records the tag information existing at the current moment of the current scene, while the adding and changing process of each piece of tag information is stored in the corresponding scene tag library; the scene tag library corresponds to scene identification information, and the at least one piece of tag information of the corresponding scene can be retrieved through that scene identification information. In some embodiments, the scene tag library may be a storage space independent of the tag record information, so as to facilitate orderly management and reasonable resource allocation for the information related to the current scene. The at least one piece of tag information stored in the scene tag library comprises tag identification information, tag position information, editing time and the like.
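A scene tag library keyed by scene identification information might look like the following sketch; the storage layout is an assumption, chosen only to show history retrieval through the scene identifier:

```python
# scene_id -> {tag_id: [(edit_time, edit_op), ...]}
scene_tag_library = {}

def append_history(scene_id, tag_id, edit_time, edit_op):
    """Append one edit of one tag to the scene's history."""
    scene = scene_tag_library.setdefault(scene_id, {})
    scene.setdefault(tag_id, []).append((edit_time, edit_op))

def history_for_scene(scene_id):
    """Retrieve every tag's edit history via the scene identification."""
    return scene_tag_library.get(scene_id, {})

append_history("scene-001", "tag-1", 0.0, {"op": "create"})
append_history("scene-001", "tag-1", 2.5, {"op": "move", "to": (1, 2, 0)})
assert len(history_for_scene("scene-001")["tag-1"]) == 2
```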
In some embodiments, the method further includes step S111 (not shown). In step S111, a tag backtracking request of an application user about the current scene is obtained, where the tag backtracking request includes the scene identification information of the current scene; in response to the tag backtracking request, at least one piece of backtracking tag information of the current scene is determined according to the scene identification information, where each piece of backtracking tag information comprises at least one editing time; and the at least one piece of backtracking tag information is returned to the user equipment of the application user. For example, the network device stores tag information together with its editing time, and the application user can call up the tag information and tag state information along the time axis, which makes it convenient for the user to trace back the tags of the current scene. The tag backtracking request is used to request the formation of part or all of the tag information in the current scene and its change process along the time axis. The user sending the tag backtracking request is an application user of the multi-user scene interaction application and may be a user participating in the current scene interaction or a user not participating in it; for example, the application user may view a scene information list displayed on an interface and select the current scene information through operations such as touch control, thereby sending a corresponding tag backtracking request to the network device to trace back the formation and changes of all or part of the tag information in the current scene information. The tag backtracking request comprises the corresponding scene identification information and points the backtracking at the at least one piece of tag information of the current scene. In some cases, the tag backtracking request further includes user identification information of a corresponding participating user and requests backtracking of the tag information edited, in the current scene, by the participating user identified by that user identification information; in other cases, the tag backtracking request includes one or more pieces of tag identification information and requests backtracking of the specific tag information so identified in the current scene. If the corresponding application user is a participating user of the current scene, the network device determines that the participating user has already performed scene initialization, and the corresponding backtracking tag information is then returned directly to that participating user's device, so that the device superimposes and presents the backtracking tag information at the corresponding positions of the current scene along the time axis.
If the corresponding application user is a user not participating in the current scene, the network device can directly return the corresponding backtracking tag information to that user's device so that the user can view it through the device, for example with the backtracking tag information superimposed along the time axis on the scene shown by the user device. In other cases, if such a non-participating user wants to superimpose and display the corresponding backtracking tag information in the real space corresponding to the current scene determined by the scene identification information, the network device returns both the backtracking tag information and the scene positioning information of the current scene to the user, so that the user device, after completing scene initialization in the current scene determined by the scene identification information, superimposes and presents the backtracking tag information at the corresponding positions of the current scene along the time axis. Thus, in some implementations, the application user includes a user not currently participating in the current scene, and returning the at least one piece of backtracking tag information to the user equipment of the corresponding application user includes: returning the at least one piece of backtracking tag information and the scene positioning information of the current scene to the user equipment of the application user, where the scene positioning information is used for scene initialization, and the at least one piece of backtracking tag information is superimposed and presented on the current scene once scene initialization is completed.
Here, after the backtracking tag information is sent to the corresponding user equipment, it may be presented directly in a superimposed manner, for example along the time axis in the current scene shown by the user equipment; it may also be presented in an augmented reality manner. For example, if the current user is in the same or a similar real space as the current scene corresponding to the scene identification information, the network device sends the scene positioning information corresponding to the current scene to the user equipment; the user equipment captures the current scene and performs scene initialization using the scene positioning information to obtain the corresponding camera pose information, and the screen position of the backtracking tag information is calculated based on the camera pose information and the tag position information of the backtracking tag information, so that the backtracking tag information is superimposed in the real space corresponding to the current scene.
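Calculating the screen position of a tag from the camera pose information can be sketched with a simple pinhole model; the real pipeline (three-dimensional tracking, lens distortion and so on) is omitted, and the pose convention below is an assumption:

```python
import numpy as np

def tag_screen_position(tag_pos_world, R, t, K):
    """R (3x3) and t (3,): world-to-camera pose; K (3x3): intrinsics."""
    p_cam = R @ np.asarray(tag_pos_world, dtype=float) + t
    if p_cam[2] <= 0:               # behind the camera: nothing to draw
        return None
    uv = K @ (p_cam / p_cam[2])     # pinhole projection
    return float(uv[0]), float(uv[1])

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
# Identity pose: camera at the origin looking down +Z
uv = tag_screen_position([0.0, 0.0, 2.0], np.eye(3), np.zeros(3), K)
assert uv == (320.0, 240.0)
```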
In some embodiments, the tag backtracking request further includes a corresponding backtracking time, and determining, in response to the tag backtracking request, at least one piece of backtracking tag information of the current scene according to the scene identification information includes: in response to the tag backtracking request, determining at least one piece of backtracking tag information of the current scene according to the scene identification information, and determining corresponding target backtracking tag information from the at least one piece of backtracking tag information according to the backtracking time, where the editing time of each piece of target backtracking tag information corresponds to the backtracking time. For example, the backtracking time may be a time node or a time interval on the time axis. Based on a time node, the network device may determine the tag information existing at that node in the current scene, or the tag information within a preset time period before or after the node, or the tag information whose last editing time falls between the node and the current moment; the determined tag information is used as the corresponding backtracking tag information and sent to the user device, which, after receiving it, presents the backtracking tag information through its display device. Based on a time interval, the network device may determine the tag information existing within that interval and return, as the corresponding backtracking tag information, the tag information over that interval to the user device, which presents the backtracking tag information at each moment of the interval by playing the time axis in sequence.
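Resolving a backtracking time against the stored history then reduces to simple filters; a sketch under the data layout assumed in the earlier sketches:

```python
def tags_at_node(history, node):
    """Tags that already exist at a time node (first edit <= node)."""
    return {tag_id: tl for tag_id, tl in history.items()
            if tl and tl[0][0] <= node}

def edits_in_interval(history, start, end):
    """Per-tag edits whose editing times fall inside [start, end]."""
    return {tag_id: [(t, op) for t, op in tl if start <= t <= end]
            for tag_id, tl in history.items()}

history = {"tag-1": [(0.0, "create"), (4.0, "move")],
           "tag-2": [(6.0, "create")]}
assert list(tags_at_node(history, 5.0)) == ["tag-1"]
assert edits_in_interval(history, 3.0, 7.0) == {
    "tag-1": [(4.0, "move")], "tag-2": [(6.0, "create")]}
```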
Fig. 2 shows a multi-person scene interaction method according to another aspect of the present application, applied to a second user device; the method comprises step S201, step S202 and step S203. In step S201, camera pose information of a camera device in the second user device is acquired; in step S202, tag record information about the current scene of the multi-person interaction, issued by the corresponding network device, is received, where the tag record information includes at least one piece of tag information of the current scene, each piece of tag information includes corresponding tag identification information and tag position information, and the tag position information is used to indicate the scene position of the corresponding tag information in the current scene; in step S203, the at least one piece of tag information is superimposed and presented on the display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
For example, the second user is a user who joins the interaction of the current scene after the scene creation of the current scene, at which time other participating users are already present. The second user equipment can acquire the camera pose information of its camera device: for example, the second user equipment captures the current scene, or a scene whose similarity to the current scene satisfies a certain condition, through the camera device, and performs point cloud initialization using a three-dimensional tracking algorithm together with the current scene positioning information; if the initialization succeeds, the camera pose information of the camera device is obtained. For another example, the second user equipment captures a 2D recognition image in the current scene or another scene through the camera device, extracts the features of the 2D recognition image and matches them against the 2D feature information in the scene positioning information; if the matching succeeds, the initialization succeeds and the camera pose information of the camera device is obtained. In some embodiments, the network device may return only the latest tag record information of the current scene to the second user device, that is, tag record information recording only the tag information of the current moment, so that the second user device displays, superimposed after scene initialization is completed, the one or more pieces of tag information existing at the current moment of the current scene. In other embodiments, the network device returns the complete tag record information of the current scene to the second user device, for example tag record information recording all tag information of the current scene; after the second user device completes scene initialization, the one or more pieces of tag information of the whole current scene are superimposed and presented on the screen of the second user device, where any given piece of tag information may have been added by the first user or by another second user who joined the interaction of the current scene earlier.
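The 2D-recognition-image path can be illustrated with off-the-shelf feature matching; OpenCV's ORB is used here purely as a stand-in, since the document does not specify the feature algorithm, and the match threshold is an assumption:

```python
import cv2

def try_initialize(frame_gray, stored_descriptors, min_matches=30):
    """Match features of the captured frame against the 2D feature
    information held in the scene positioning information."""
    orb = cv2.ORB_create()
    _, descriptors = orb.detectAndCompute(frame_gray, None)
    if descriptors is None:
        return False  # no features found in the frame
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, stored_descriptors)
    # Enough matches -> initialization succeeds, pose can be recovered
    return len(matches) >= min_matches
```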
In some embodiments, the method further includes step S204 (not shown). In step S204, a scene interaction request about the current scene is generated based on a joining operation of the second user with respect to the current scene, where the scene interaction request includes the scene identification information of the current scene; the scene interaction request is sent to the corresponding network device; and current scene positioning information about the current scene returned by the network device is received, and scene initialization about the current scene is completed according to the current scene positioning information. For example, the second user device provides, in the application's interactive interface, a touch area, a touch key or a preset operation for joining a scene. If the second user determines the current scene (for example, by searching for and entering scene identification information, or through matching and screening based on the second user's account or a related process such as a job or a game, which is not limited herein) and then performs a touch operation on the touch area or the touch key, or a user operation that is the same as or similar to the preset operation, the second user device generates a scene interaction request about the current scene and sends it to the network device, where the scene interaction request includes the scene identification information of the current scene. The above manner of generating the scene interaction request is merely an example; the generating methods include, but are not limited to, character input, voice input, gesture input and the like, and are not limited herein. In the general case, the network device receives the scene interaction request of the second user about the current scene, determines the corresponding current scene positioning information by querying with the scene identification information, and returns the current scene positioning information to the second user device for the second user device to initialize the scene using it.
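The join handshake can be sketched as a simple request/response pair; the message fields and the in-memory scene information base are assumptions of this illustration:

```python
scene_info_base = {
    "scene-001": {"positioning": {"point_cloud_id": "pc-001",
                                  "features_2d_id": "feat-001"}},
}

def handle_scene_interaction_request(request):
    """Network-device side: look up the scene by its identification
    information and return the positioning information used for
    scene initialization."""
    scene = scene_info_base.get(request["scene_id"])
    if scene is None:
        return {"ok": False, "error": "unknown scene"}
    return {"ok": True, "positioning": scene["positioning"]}

reply = handle_scene_interaction_request({"scene_id": "scene-001"})
assert reply["ok"] and "point_cloud_id" in reply["positioning"]
```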
In some embodiments, the method further includes step S205 (not shown). In step S205, an editing operation about target tag information, broadcast by other participating user devices of the current scene on behalf of the corresponding participating users, is received, and the target tag information is updated according to the editing operation; the updated target tag information is then superimposed and presented on the display device of the second user equipment. For example, after the second user starts to participate in the scene interaction of the current scene, the second user device receives tag information, editing operations and the like broadcast by the other participating user devices and updates the superimposed tag information in real time based on what is received. In some embodiments, the second user device also stores the tag record information of the current scene locally and updates it in real time based on the received tag information or editing operations so as to keep the superimposed tag information current. Of course, the second user may likewise select corresponding tag information based on the presentation positions of the tag information of the current scene on the second user device and perform an editing operation on the selected target tag information, for example modification or deletion, where the modification includes, but is not limited to, modification of content, scaling of display size, deformation of display shape, or movement of display position. The second user device may broadcast the corresponding editing operation to the first user, other second users and/or the network device: it may send the editing operation itself, for the receiving devices to update the target tag information based on it, or it may update the target tag information according to the editing operation and broadcast the updated target tag information to the first user device and the other second user devices to replace the original target tag information and complete the tag update. The user equipment editing the tag information may broadcast the editing operation to the other participating user equipment and/or the network equipment in real time, or at preset time intervals, for example every 100 milliseconds, which is not limited herein. Here too, although described for the editing operation of the second user, it should be understood by those skilled in the art that the tag update procedure applies equally to any editing operation of any participating user of the current scene with respect to a tag.
In some embodiments, the method further includes step S206 (not shown). In step S206, a tag backtracking request about the current scene is sent to the corresponding network device, where the tag backtracking request includes the scene identification information of the current scene; at least one piece of backtracking tag information of the current scene returned by the network device based on the tag backtracking request is received, where each piece of backtracking tag information comprises its corresponding editing time; and the at least one piece of backtracking tag information is presented, where the presentation time axis corresponds to the editing time of each piece of backtracking tag information. For example, the network device stores tag information together with its editing time, and the second user can call up the tag information and tag state information along the time axis, which makes it convenient for the second user to trace back the tags of the current scene. The tag backtracking request is used to request the formation of part or all of the tag information in the current scene and its change process along the time axis. For example, the second user may view a scene information list displayed on an interface and select the current scene information through operations such as touch control, thereby sending a corresponding tag backtracking request to the network device to trace back the formation and changes of all or part of the tag information in the current scene information. The tag backtracking request comprises the corresponding scene identification information and points the backtracking at the at least one piece of tag information of the current scene; in some cases it further includes user identification information of a corresponding participating user and requests backtracking of the tag information edited in the current scene by the participating user so identified; in other cases it includes one or more pieces of tag identification information and requests backtracking of the specific tag information so identified in the current scene. In some embodiments, if the network device determines that the second user has already performed scene initialization of the current scene, it directly returns the corresponding backtracking tag information to the second user's device, so that the device superimposes and presents the backtracking tag information at the corresponding positions of the current scene along the time axis, where the time axis corresponds to the editing times of the backtracked tags.
In some embodiments, the method further includes step S207 (not shown). In step S207, second tag information about the current scene edited by the second user is obtained, where the second tag information includes corresponding second tag identification information and second tag position information, and the second tag position information is used to indicate the scene position of the second tag information in the current scene; and the second tag information is sent to the other user equipment participating in the current scene and to the network equipment.
For example, in addition to the editing operation of the first user about the current scene, the network device may also receive the editing operation of the second user about the current scene, specifically, for example, adding a new tag or modifying or deleting an existing tag, so that the first user and the second user can carry out information interaction in the current scene and achieve a good interaction effect; the network device then determines the corresponding second tag information based on the editing operation of the second user, where "second" is only used to distinguish which user edited the tag, identifying that the tag's operator is a second user who joined subsequently. The network equipment acquires the second tag information and updates the tag record information corresponding to the current scene based on it. After the second user edits the second tag, the second user device may send the second tag information to the network device, which forwards it to the first user devices and other second user devices participating in the current scene; the second user device may also broadcast the second tag information directly to the first user device, the other second user devices and the network device, thereby reducing the delay with which the first user device and the other second user devices obtain the second tag information and improving the real-time performance of the scene interaction. The second tag information corresponding to an editing operation may be the updated complete tag information of a given tag, or an operation instruction to be executed on a given tag; in the latter case, after being sent, the corresponding operation instruction is executed on the tag information at each end to update the corresponding tag information, so that each user equipment displays the updated tag information in a superimposed manner and real-time interaction among the users participating in the current scene is realized.
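The two payload forms mentioned above, a complete updated tag versus an operation instruction replayed locally, can be sketched as follows; the field names are assumptions:

```python
def apply_payload(local_tags, payload):
    """Apply a received second-tag payload on a participating device."""
    if payload["kind"] == "full_tag":
        # Complete updated tag information: replace wholesale
        local_tags[payload["tag"]["id"]] = payload["tag"]
    elif payload["kind"] == "instruction":
        # Operation instruction: replay the edit on the local copy
        local_tags[payload["tag_id"]].update(payload["changes"])

tags = {"tag-9": {"id": "tag-9", "size": 1.0}}
apply_payload(tags, {"kind": "instruction", "tag_id": "tag-9",
                     "changes": {"size": 2.0}})
assert tags["tag-9"]["size"] == 2.0
```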
In some embodiments, the method further includes step S208 (not shown). In step S208, an editing operation of the second user with respect to target tag information of the current scene is acquired, and the target tag information is updated based on the editing operation; the editing operation is then sent to the other user equipment participating in the current scene and to the network equipment, so that the target tag information is updated on those devices as well. For example, after the second user starts to participate in the scene interaction of the current scene, the second user may select corresponding target tag information based on the presentation positions of the tag information of the current scene on the second user device and perform an editing operation on it, such as modification or deletion, where the modification includes, but is not limited to, modification of content, scaling of display size, deformation of display shape, or movement of display position. The second user equipment can broadcast the corresponding editing operation to the other user equipment and the network equipment participating in the current scene interaction; alternatively, it can send the editing operation to the network equipment, which forwards it to the other participating user equipment in the current scene. Here the second user equipment may send the editing operation itself, for the receiving devices to update the target tag information based on it, or it may update the target tag information according to the editing operation and send the updated target tag information to the other user equipment and the network equipment to replace the original target tag information and thus complete the target tag update. In some embodiments, the network device updates the tag record information of the current scene according to the editing operation sent by the second user.
Fig. 3 illustrates a method of multi-person scene interaction according to an aspect of the present application, wherein the method comprises:
the method comprises the steps that network equipment receives first tag information, which is uploaded by first user equipment and corresponds to a current scene of multi-user scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information is used for indicating the scene position of the first tag information in the current scene;
the network equipment establishes or updates label record information of current scene information of the current scene according to the first label information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the label record information comprises at least one piece of label information;
the second user equipment acquires camera pose information of a camera device in the second user equipment, receives the tag record information about the current scene of the multi-user interaction issued by the network equipment, and superimposes and presents at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
Fig. 4 illustrates a method of multi-person scene interaction according to another aspect of the present application, wherein the method comprises:
the second user equipment acquires camera pose information of a camera device in the second user equipment, and receives tag record information, issued by the corresponding network equipment, about a current scene of a multi-person interaction, where the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information is used to indicate the scene position of the corresponding tag information in the current scene; and superimposes and presents at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information;
the first user equipment acquires editing operation of the first user on target tag information, and broadcasts the editing operation to corresponding network equipment and the second user equipment;
and the second user equipment receives the editing operation, updates the target label information according to the editing operation, and superimposes and presents the updated target label information on a display device of the second user equipment.
The foregoing mainly describes embodiments of the multi-user scene interaction method of the present application; in addition, the present application further provides specific devices capable of implementing the foregoing embodiments, which are described below with reference to fig. 5 and fig. 6.
Fig. 5 illustrates a network device for multi-person scene interaction according to an aspect of the present application, specifically including a one-one module 101 and a one-two module 102. The one-one module 101 is configured to receive first tag information, uploaded by a first user device, that is edited by the first user and corresponds to a current scene of multi-user scene interaction, where the first tag information includes corresponding first tag identification information and first tag position information, and the tag position information is used to indicate the scene position of the first tag information in the current scene. The one-two module 102 is configured to create or update tag record information of current scene information of the current scene according to the first tag information, where the current scene information is stored in a scene information base, the scene information base includes one or more pieces of scene information, each piece of scene information includes corresponding scene identification information and scene positioning information, and the tag record information includes at least one piece of tag information.
Here, the specific implementation of the one-one module 101 and the one-two module 102 shown in fig. 5 is the same as or similar to the embodiments of step S101 and step S102 shown in fig. 1, and is therefore not described in detail and is incorporated herein by reference.
In some embodiments, the apparatus further comprises a third module (not shown) for distributing the updated tag record information to a second user device of a second user, wherein the second user comprises a user other than the first user who is currently participating in the scene interaction of the current scene.
In some embodiments, the apparatus further includes a fourth module (not shown) for receiving a scene creation request about the current scene sent by the first user equipment; generating scene identification information of the current scene in response to the scene creation request, newly building the current scene information according to the scene identification information, and storing the current scene information in the scene information base; and returning the scene identification information of the current scene to the first user equipment. In some implementations, the scene creation request includes scene location information for the current scene; wherein, the responding to the scene creation request generates the scene identification information of the current scene, and the current scene information is newly built according to the scene identification information, which comprises the following steps: and responding to the scene creation request, generating scene identification information of the current scene, and establishing the current scene information according to the scene positioning information and the scene identification information.
In some embodiments, the apparatus further includes a fifth module (not shown) for receiving a scene interaction request about the current scene sent by the corresponding second user equipment, wherein the scene interaction request includes scene identification information of the current scene; responding to the scene interaction request, and determining current scene positioning information of current scene information of the current scene in the scene information base based on the scene identification information in a matching way; and transmitting the current scene positioning information to the second user equipment.
In some embodiments, the determining, in response to the scene interaction request, current scene positioning information of current scene information of the current scene based on matching of the scene identification information in the scene information base includes: in response to the scene interaction request, matching and determining the current scene information of the current scene in the scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, where the scene state information indicates either that initialization is complete or that it is not complete; and, if the scene state information indicates that initialization is complete, acquiring the current scene positioning information according to the current scene information. In some embodiments, the apparatus further includes a sixth module (not shown) configured to, if the scene state information indicates that initialization is not complete, send status hint information about the current scene to the second user device, where the status hint information includes hint information indicating that initialization of the current scene is not complete; and to acquire the current scene positioning information according to the current scene information once the scene state information of the current scene changes from initialization-not-complete to initialization-complete.
In some embodiments, the device further includes a seventh module (not shown) configured to receive second tag information about a current scene, which is uploaded by a second user device and corresponds to the second user edit, where the second tag information includes corresponding second tag identification information and second tag location information, and the second tag location information is used to indicate a scene location of the second tag information in the current scene; and creating or updating tag record information of current scene information of the current scene according to the second tag information.
In some embodiments, the second user is given editing rights with respect to the current scene information.
In some embodiments, the apparatus further includes an eighth module (not shown) configured to obtain an editing operation of a participating user of the current scene with respect to target tag information, update the target tag information according to the editing operation, and update the tag record information based on the updated target tag information, where the target tag information is included in the tag record information of the current scene information.
In some embodiments, the editing operation of the participating user who obtains the current scene about the target tag information includes: acquiring candidate editing operations of a plurality of participating users of the current scene on target tag information; and determining an editing operation related to the target tag information from the candidate editing operations according to the user editing priority levels of the plurality of participating users.
In some embodiments, the apparatus further comprises a ninth module (not shown) for obtaining an editing time of at least one participating user of the current scene with respect to the at least one tag information; and updating the tag record information of the current scene information according to the at least one tag information and the editing time, wherein the tag record information comprises at least one tag information, and each tag information has at least one editing time corresponding to an editing operation.
In some embodiments, the apparatus further includes a tenth module (not shown) for acquiring editing operations of at least one participating user of the current scene with respect to the at least one tag information and editing time corresponding to each editing operation; and creating or updating a scene tag library of the current scene according to the editing operation of the at least one tag information and the editing time corresponding to each editing operation, wherein the scene tag library comprises at least one tag information, and each tag information has the editing time corresponding to at least one editing operation.
In some embodiments, the apparatus further includes an eleventh module (not shown) configured to obtain a tag backtracking request of an application user about the current scene, where the tag backtracking request includes the scene identification information of the current scene; to respond to the tag backtracking request by determining at least one piece of backtracking tag information of the current scene according to the scene identification information, where each piece of backtracking tag information comprises at least one editing time; and to return the at least one piece of backtracking tag information to the user equipment of the application user. In some embodiments, the application user comprises a user not currently participating in the current scene; the returning of the at least one piece of backtracking tag information to the user equipment of the corresponding application user includes: returning the at least one piece of backtracking tag information and the scene positioning information of the current scene to the user equipment of the application user, where the scene positioning information is used for scene initialization, and the at least one piece of backtracking tag information is superimposed and presented on the current scene once scene initialization is completed.
In some embodiments, the tag backtracking request further includes a corresponding backtracking time; the determining, in response to the tag traceback request, at least one traceback tag information of the current scene according to the scene identification information includes: and responding to the tag backtracking request, determining at least one backtracking tag information of the current scene according to the scene identification information, and determining corresponding target backtracking tag information from the at least one backtracking tag information according to the backtracking time information, wherein the editing time of each target backtracking tag information corresponds to the backtracking time.
Here, the embodiments corresponding to the third to eleventh modules are the same as or similar to the embodiments of steps S103 to S111, and are therefore not described in detail herein and are incorporated by reference.
Fig. 6 illustrates a second user device for multi-person scene interaction according to another aspect of the present application, the device comprising a two-one module 201, a two-two module 202 and a two-three module 203. The two-one module 201 is configured to obtain camera pose information of a camera device in the second user equipment; the two-two module 202 is configured to receive tag record information about the current scene of a multi-person interaction issued by the corresponding network device, where the tag record information includes at least one piece of tag information of the current scene, each piece of tag information includes corresponding tag identification information and tag position information, and the tag position information is used to indicate the scene position of the corresponding tag information in the current scene; and the two-three module 203 is configured to superimpose and present at least one piece of tag information on the display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
Here, the specific embodiments of the two-one module 201, the two-two module 202 and the two-three module 203 shown in fig. 6 are the same as or similar to the embodiments of the step S201, the step S202 and the step S203 shown in fig. 2, and thus are not described in detail and are incorporated herein by reference.
In some embodiments, the apparatus further includes a two-four module (not shown) for generating a scene interaction request regarding the current scene based on a joining operation of the second user regarding the current scene, wherein the scene interaction request includes scene identification information of the current scene; sending the scene interaction request to corresponding network equipment; and receiving current scene positioning information which is returned by the network equipment and related to the current scene, and completing scene initialization related to the current scene according to the current scene positioning information.
In some embodiments, the device further includes a two-five module (not shown) configured to receive an editing operation about target tag information broadcast by other participating user devices of the current scene on behalf of the corresponding participating users, and to update the target tag information according to the editing operation; and to superimpose and present the updated target tag information on a display device of the second user equipment.
In some embodiments, the device further includes a two-six module (not shown) configured to send a tag backtracking request of the current scene to the corresponding network device, where the tag backtracking request includes the scene identification information of the current scene; to receive at least one piece of backtracking tag information of the current scene returned by the network device based on the tag backtracking request, where each piece of backtracking tag information comprises its corresponding editing time; and to present the at least one piece of backtracking tag information, where the presentation time axis corresponds to the editing time of each piece of backtracking tag information. In some embodiments, each of the at least one piece of backtracking tag information further comprises corresponding tag position information; presenting the at least one piece of backtracking tag information then includes: determining the currently presented backtracking tag information according to the current presentation time on the presentation time axis, where the current presentation time is matched to the editing time; determining the presentation position of the current backtracking tag information on the display device according to the camera pose information and the tag position information of the current backtracking tag information; and presenting the current backtracking tag information on the display device based on that presentation position information.
In some embodiments, the device further includes a two-seven module (not shown) configured to obtain second tag information about the current scene edited by the second user, where the second tag information includes corresponding second tag identification information and second tag position information, and the second tag position information is used to indicate the scene position of the second tag information in the current scene; and to send the second tag information to the other user equipment participating in the current scene and to the network equipment. In some embodiments, the apparatus further includes a two-eight module (not shown) configured to acquire an editing operation of the second user with respect to target tag information of the current scene and update the target tag information based on the editing operation; and to send the editing operation to the other user equipment and the network equipment participating in the current scene, so as to update the target tag information in the other user equipment and the network equipment.
Here, the specific implementation manners of the two-four module to the two-eight module are the same as or similar to the embodiments of the step S204 to the step S208, and thus are not described in detail, and are incorporated herein by reference.
In addition to the methods and apparatus described in the above embodiments, the present application also provides a computer-readable storage medium storing computer code which, when executed, performs a method as described in any one of the preceding claims.
The present application also provides a computer program product which, when executed by a computer device, performs a method as claimed in any preceding claim.
The present application also provides a computer device comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 7 illustrates an exemplary system that may be used to implement various embodiments described herein;
in some embodiments, as shown in fig. 7, system 300 can function as any of the devices described in the foregoing embodiments. In some embodiments, system 300 can include one or more computer-readable media (e.g., system memory 315 or NVM/storage 320) having instructions, and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement the modules that perform the actions described herein.
For one embodiment, the system control module 310 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or any suitable device or component in communication with the system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
The system memory 315 may be used, for example, to load and store data and/or instructions for the system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as a suitable DRAM. In some embodiments, the system memory 315 may comprise double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or which may be accessed by the device without being part of the device. For example, NVM/storage 320 may be accessed over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. The system 300 may wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic of one or more controllers (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic of one or more controllers of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die as logic of one or more controllers of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic of one or more controllers of the system control module 310 to form a system on chip (SoC).
In various embodiments, the system 300 may be, but is not limited to: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, keyboards, Liquid Crystal Display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, Application Specific Integrated Circuits (ASICs), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions as described above. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Furthermore, portions of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Communication media includes media whereby a communication signal containing, for example, computer readable instructions, data structures, program modules, or other data, is transferred from one system to another. Communication media may include conductive transmission media such as electrical cables and wires (e.g., optical fibers, coaxial, etc.) and wireless (non-conductive transmission) media capable of transmitting energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied as a modulated data signal, for example, in a wireless medium, such as a carrier wave or similar mechanism, such as that embodied as part of spread spectrum technology. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to: volatile memory, such as random access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory and various read-only memories (ROM, PROM, EPROM, EEPROM); magnetic and ferromagnetic/ferroelectric memory (MRAM, FeRAM); magnetic and optical storage devices (hard disk, tape, CD, DVD); and other media, now known or later developed, capable of storing computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to perform the methods and/or technical solutions according to the foregoing embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from its spirit or essential characteristics. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description; all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by a single unit or means in software or hardware. The terms "first", "second", etc. are used to denote names and do not denote any particular order.

Claims (25)

1. A multi-person scene interaction method, wherein the method is applied to a network device, the method comprising:
receiving first tag information uploaded by a first user equipment and corresponding to a current scene of a multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the first tag position information indicates the scene position of the first tag information in the current scene;
creating or updating tag record information of current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information;
receiving a scene interaction request about the current scene sent by a corresponding second user equipment, wherein the scene interaction request comprises the scene identification information of the current scene, and a first user corresponding to the first user equipment and a second user corresponding to the second user equipment are both present in the current scene;
in response to the scene interaction request, matching and determining the current scene information of the current scene in the scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, wherein the scene state information describes whether a creating user of the current scene has completed scene initialization of the current scene information, and the scene state information comprises either initialization complete or initialization incomplete; if the scene state information comprises initialization complete, obtaining the current scene positioning information according to the current scene information; if the scene state information comprises initialization incomplete, temporarily withholding the current scene positioning information of the current scene from the second user equipment, and obtaining the current scene positioning information according to the current scene information once the scene state information of the current scene has changed from initialization incomplete to initialization complete; and
issuing the current scene positioning information to the second user equipment, wherein the second user equipment completes scene initialization of the current scene according to the current scene positioning information;
wherein, if a plurality of participating users in the current scene operate on a same operation object in the current scene, the network device grants editing authority over the same operation object to only one of the plurality of participating users at any one time.
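As a non-authoritative illustration of the server-side flow recited in claim 1, the sketch below shows one way the scene information base, the tag record bookkeeping, the scene-state gating, and the exclusive editing authority could be held in memory. All names (TagInfo, SceneInfo, SceneStore, and so on) are hypothetical, not terminology from this application; Python is used only for concreteness.

    # Minimal sketch of the claim-1 flow; all names are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import Dict, Optional, Tuple

    @dataclass
    class TagInfo:
        tag_id: str                           # tag identification information
        position: Tuple[float, float, float]  # scene position in the current scene

    @dataclass
    class SceneInfo:
        scene_id: str                         # scene identification information
        positioning: Optional[object] = None  # scene positioning information
        initialized: bool = False             # scene state: initialization complete?
        tag_record: Dict[str, TagInfo] = field(default_factory=dict)
        editors: Dict[str, str] = field(default_factory=dict)  # object id -> editing user

    class SceneStore:
        def __init__(self) -> None:
            self.scenes: Dict[str, SceneInfo] = {}  # the scene information base

        def upload_tag(self, scene_id: str, tag: TagInfo) -> None:
            # create or update the tag record information of the current scene
            self.scenes[scene_id].tag_record[tag.tag_id] = tag

        def join_scene(self, scene_id: str):
            # match the current scene in the scene information base; withhold the
            # positioning information while initialization is still incomplete
            scene = self.scenes[scene_id]
            return scene.positioning if scene.initialized else None

        def acquire_edit_authority(self, scene_id: str, obj_id: str, user: str) -> bool:
            # only one participating user holds editing authority on an object at
            # a time; release and expiry are omitted from this sketch
            holder = self.scenes[scene_id].editors.setdefault(obj_id, user)
            return holder == user

In this reading, a join request that returns None corresponds to the "temporarily withholding" branch; the caller would retry, or be notified, once the creating user finishes initialization.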
2. The method of claim 1, wherein the method further comprises:
distributing the updated tag record information to the second user equipment of a second user, wherein the second user comprises a user, other than the first user, who is currently participating in the scene interaction of the current scene.
3. The method of claim 1, wherein the method further comprises:
receiving a scene creation request about the current scene sent by the first user equipment;
generating the scene identification information of the current scene in response to the scene creation request, creating the current scene information according to the scene identification information, and storing the current scene information in the scene information base; and
returning the scene identification information of the current scene to the first user equipment.
4. The method of claim 3, wherein the scene creation request includes the scene positioning information of the current scene;
wherein the generating the scene identification information of the current scene in response to the scene creation request and creating the current scene information according to the scene identification information comprises:
in response to the scene creation request, generating the scene identification information of the current scene, and creating the current scene information according to the scene positioning information and the scene identification information.
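Continuing the hypothetical SceneStore sketch given after claim 1, scene creation per claims 3 and 4 might look as follows. The identifier scheme, and the guess that supplying positioning information at creation time marks initialization as complete, are assumptions of this illustration rather than anything the claims prescribe.

    import uuid

    def create_scene(store: SceneStore, positioning=None) -> str:
        # generate scene identification information for the new current scene
        scene_id = uuid.uuid4().hex
        # store the newly built current scene information in the scene information base
        store.scenes[scene_id] = SceneInfo(
            scene_id=scene_id,
            positioning=positioning,
            initialized=positioning is not None,  # assumption, see lead-in
        )
        return scene_id  # returned to the first user equipment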
5. The method of claim 1, wherein the method further comprises:
if the scene state information comprises initialization incomplete, sending state prompt information about the current scene to the second user equipment, wherein the state prompt information comprises a prompt that initialization of the current scene is incomplete.
6. The method of any one of claims 1 to 5, wherein the method further comprises:
receiving second tag information, edited by the second user and corresponding to the current scene, uploaded by the second user equipment, wherein the second tag information comprises corresponding second tag identification information and second tag position information, and the second tag position information indicates the scene position of the second tag information in the current scene; and
creating or updating the tag record information of the current scene information of the current scene according to the second tag information.
7. The method of claim 6, wherein the second user has been granted editing authority with respect to the current scene information.
8. The method of claim 1, wherein the method further comprises:
acquiring an editing operation of a participating user of the current scene on target tag information, updating the target tag information according to the editing operation, and updating the tag record information based on the updated target tag information, wherein the target tag information is contained in the tag record information of the current scene information.
9. The method of claim 8, wherein the acquiring an editing operation of a participating user of the current scene on target tag information comprises:
acquiring candidate editing operations of a plurality of participating users of the current scene on the target tag information; and
determining the editing operation on the target tag information from among the candidate editing operations according to the user editing priority levels of the plurality of participating users.
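One plausible realization of the priority arbitration in claim 9 is sketched below; the convention that a smaller number means a higher user editing priority level, and the earliest-timestamp tie-break, are assumptions of this illustration.

    def select_edit(candidates, priority_of):
        # candidates: list of (user_id, timestamp, operation) gathered from the
        # participating users; priority_of: user_id -> editing priority level
        user_id, _, operation = min(
            candidates, key=lambda c: (priority_of[c[0]], c[1]))
        return operation  # the single editing operation applied to the target tag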
10. The method of claim 1, wherein the method further comprises:
acquiring an editing time of at least one participating user of the current scene with respect to at least one piece of tag information; and
updating the tag record information of the current scene information according to the at least one piece of tag information and the editing time, wherein the tag record information comprises the at least one piece of tag information, and each piece of tag information has at least one editing time corresponding to an editing operation.
11. The method of claim 1, wherein the method further comprises:
acquiring editing operations of at least one participating user of the current scene on at least one piece of tag information, and the editing time corresponding to each editing operation; and
creating or updating a scene tag library of the current scene according to the editing operations on the at least one piece of tag information and the editing time corresponding to each editing operation, wherein the scene tag library comprises the at least one piece of tag information, and each piece of tag information has an editing time corresponding to at least one editing operation.
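A plausible in-memory layout for the scene tag library of claims 10 and 11, keeping a per-tag history of (editing time, operation) pairs, is sketched below; the class and method names are hypothetical.

    import time

    class SceneTagLibrary:
        def __init__(self) -> None:
            self.tags = {}  # tag_id -> list of (edit_time, operation)

        def record_edit(self, tag_id, operation, edit_time=None):
            # each piece of tag information keeps the editing time of every
            # operation applied to it
            history = self.tags.setdefault(tag_id, [])
            history.append((edit_time if edit_time is not None else time.time(),
                            operation))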
12. The method of claim 10 or 11, wherein the method further comprises:
obtaining a tag backtracking request of an application user about the current scene, wherein the tag backtracking request comprises the scene identification information of the current scene;
in response to the tag backtracking request, determining at least one piece of backtracking tag information of the current scene according to the scene identification information, wherein each piece of backtracking tag information comprises at least one editing time; and
returning the at least one piece of backtracking tag information to the user equipment of the application user.
13. The method of claim 12, wherein the application user comprises a user not currently participating in the current scene; and the returning the at least one piece of backtracking tag information to the user equipment of the application user comprises:
returning the at least one piece of backtracking tag information and the scene positioning information of the current scene to the user equipment of the application user, wherein the scene positioning information is used for scene initialization, and the at least one piece of backtracking tag information is superimposed and presented on the current scene once scene initialization is complete.
14. The method of claim 12, wherein the tag backtracking request further includes a corresponding backtracking time; and the determining, in response to the tag backtracking request, at least one piece of backtracking tag information of the current scene according to the scene identification information comprises:
in response to the tag backtracking request, determining the at least one piece of backtracking tag information of the current scene according to the scene identification information, and determining corresponding target backtracking tag information from the at least one piece of backtracking tag information according to the backtracking time, wherein the editing time of each piece of target backtracking tag information corresponds to the backtracking time.
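Under the assumption that "corresponds to the backtracking time" means the latest edit at or before that time, the backtracking of claims 12 to 14 could be served by a filter like the hypothetical one below, operating on the SceneTagLibrary sketch above.

    def backtrack(library, backtrack_time):
        # return, per tag, its most recent edit at or before backtrack_time
        result = {}
        for tag_id, history in library.tags.items():
            past = [entry for entry in history if entry[0] <= backtrack_time]
            if past:
                result[tag_id] = max(past, key=lambda entry: entry[0])
        return result  # tag_id -> (edit_time, operation): the target backtracking tags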
15. A multi-person scene interaction method, wherein the method is applied to a second user equipment, the method comprising:
generating a scene interaction request about a current scene based on a joining operation, with respect to the current scene, of a second user corresponding to the second user equipment, wherein the scene interaction request comprises scene identification information of the current scene; sending the scene interaction request to a corresponding network device; and receiving current scene positioning information about the current scene returned by the network device, and completing scene initialization of the current scene according to the current scene positioning information;
acquiring camera pose information of a camera device in the second user equipment;
receiving tag record information, issued by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, the tag position information indicates the scene position of the corresponding tag information in the current scene, the at least one piece of tag information comprises first tag information about the current scene edited by a first user, and the first user and the second user corresponding to the second user equipment are both present in the current scene; and
superimposing and presenting the at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information;
wherein, if a plurality of participating users in the current scene operate on a same operation object in the current scene, the network device grants editing authority over the same operation object to only one of the plurality of participating users at any one time;
wherein the current scene positioning information is obtained via the network device performing the following steps:
in response to the scene interaction request, matching and determining current scene information of the current scene in a scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, wherein the scene state information describes whether a creating user of the current scene has completed scene initialization of the current scene information, and the scene state information comprises either initialization complete or initialization incomplete; if the scene state information comprises initialization complete, obtaining the current scene positioning information according to the current scene information; and if the scene state information comprises initialization incomplete, temporarily withholding the current scene positioning information of the current scene from the second user equipment, and obtaining the current scene positioning information according to the current scene information once the scene state information of the current scene has changed from initialization incomplete to initialization complete.
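The superimposition step in claim 15 depends on the camera pose. A standard pinhole-camera projection, sketched below with NumPy, is one conventional way to map a tag's 3D scene position onto the display; the pinhole model, the world-to-camera convention (rotation R, translation t), and the intrinsic matrix K are assumptions of this illustration, not requirements of the claim.

    import numpy as np

    def project_tag(tag_xyz, R, t, K):
        # transform the tag's scene position into camera coordinates
        p_cam = R @ np.asarray(tag_xyz, dtype=float) + t
        if p_cam[2] <= 0:
            return None                  # behind the camera: do not render
        uv = K @ (p_cam / p_cam[2])      # perspective divide, then intrinsics
        return float(uv[0]), float(uv[1])  # pixel position for the overlay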
16. The method of claim 15, wherein the method further comprises:
receiving an editing operation on target tag information, corresponding to another participating user, broadcast by that participating user's equipment in the current scene, and updating the target tag information according to the editing operation; and
superimposing and presenting the updated target tag information on the display device of the second user equipment.
17. The method of claim 15, wherein the method further comprises:
sending a tag backtracking request about the current scene to the corresponding network device, wherein the tag backtracking request comprises the scene identification information of the current scene;
receiving at least one piece of backtracking tag information of the current scene returned by the network device based on the tag backtracking request, wherein each piece of backtracking tag information comprises a corresponding editing time; and
presenting the at least one piece of backtracking tag information, wherein a presentation time axis of the at least one piece of backtracking tag information corresponds to the editing time of each piece of backtracking tag information.
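Under one plausible reading of claim 17, the presentation time axis simply orders the received items by their editing time; a hypothetical helper:

    def presentation_timeline(backtrack_tags):
        # backtrack_tags: iterable of (tag_id, edit_time) pairs
        return sorted(backtrack_tags, key=lambda item: item[1])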
18. The method of claim 15, wherein the method further comprises:
acquiring second tag information about the current scene edited by the second user, wherein the second tag information comprises corresponding second tag identification information and second tag position information, and the second tag position information indicates the scene position of the second tag information in the current scene; and
sending the second tag information to the network device and to the other user equipment participating in the current scene.
19. The method of claim 15, wherein the method further comprises:
acquiring an editing operation of the second user on target tag information of the current scene, and updating the target tag information based on the editing operation; and
sending the editing operation to the network device and to the other user equipment participating in the current scene, so as to update the target tag information in the other user equipment and in the network device.
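Claim 19 amounts to an apply-then-fan-out step. The sketch below assumes each device handle exposes a send method and the local tag record an apply method; neither name comes from the patent, and the transport is left abstract.

    def broadcast_edit(edit, local_tag_record, network_device, peer_devices):
        local_tag_record.apply(edit)  # update the target tag information locally
        network_device.send(edit)     # keep the network device's record current
        for peer in peer_devices:
            peer.send(edit)           # other participating devices re-apply it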
20. A method of multi-person scene interaction, wherein the method comprises:
a network device receives first tag information uploaded by a first user equipment and corresponding to a current scene of a multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the first tag position information indicates the scene position of the first tag information in the current scene;
the network device creates or updates tag record information of current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information; the network device receives a scene interaction request about the current scene sent by a corresponding second user equipment, wherein the scene interaction request comprises the scene identification information of the current scene, and a first user corresponding to the first user equipment and a second user corresponding to the second user equipment are both present in the current scene; in response to the scene interaction request, the network device matches and determines the current scene information of the current scene in the scene information base based on the scene identification information, and determines scene state information of the current scene according to the current scene information, wherein the scene state information describes whether a creating user of the current scene has completed scene initialization of the current scene information, and the scene state information comprises either initialization complete or initialization incomplete; if the scene state information comprises initialization complete, the network device obtains the current scene positioning information according to the current scene information; if the scene state information comprises initialization incomplete, the network device temporarily withholds the current scene positioning information of the current scene from the second user equipment, and obtains the current scene positioning information according to the current scene information once the scene state information of the current scene has changed from initialization incomplete to initialization complete; and the network device issues the current scene positioning information to the second user equipment, wherein the second user equipment completes scene initialization of the current scene according to the current scene positioning information; and
the second user equipment acquires camera pose information of a camera device in the second user equipment, receives the tag record information about the current scene of the multi-person scene interaction issued by the network device, and superimposes and presents at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information; wherein, if a plurality of participating users in the current scene operate on a same operation object in the current scene, the network device grants editing authority over the same operation object to only one of the plurality of participating users at any one time.
21. A method of multi-person scene interaction, wherein the method comprises:
a second user equipment generates a scene interaction request about a current scene based on a joining operation, with respect to the current scene, of a corresponding second user, wherein the scene interaction request comprises scene identification information of the current scene; sends the scene interaction request to a corresponding network device; receives current scene positioning information about the current scene returned by the network device, and completes scene initialization of the current scene according to the current scene positioning information; acquires camera pose information of a camera device in the second user equipment, and receives tag record information, issued by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, the tag position information indicates the scene position of the corresponding tag information in the current scene, the at least one piece of tag information comprises first tag information about the current scene edited by a first user, and the first user and the second user corresponding to the second user equipment are both present in the current scene; and superimposes and presents the at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information, wherein, if a plurality of participating users in the current scene operate on a same operation object in the current scene, the network device grants editing authority over the same operation object to only one of the plurality of participating users at any one time;
a first user equipment acquires an editing operation of the first user on target tag information, and broadcasts the editing operation to the corresponding network device and the second user equipment;
the second user equipment receives the editing operation, updates the target tag information according to the editing operation, and superimposes and presents the updated target tag information on the display device of the second user equipment;
wherein the current scene positioning information is obtained via the network device performing the following steps:
in response to the scene interaction request, matching and determining current scene information of the current scene in a scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, wherein the scene state information describes whether a creating user of the current scene has completed scene initialization of the current scene information, and the scene state information comprises either initialization complete or initialization incomplete; if the scene state information comprises initialization complete, obtaining the current scene positioning information according to the current scene information; and if the scene state information comprises initialization incomplete, temporarily withholding the current scene positioning information of the current scene from the second user equipment, and obtaining the current scene positioning information according to the current scene information once the scene state information of the current scene has changed from initialization incomplete to initialization complete.
22. A network device for multi-person scene interactions, wherein the device comprises:
a one-one module, configured to receive first tag information uploaded by a first user equipment and corresponding to a current scene of a multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the first tag position information indicates the scene position of the first tag information in the current scene; and
a one-two module, configured to create or update tag record information of current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information;
the device being further configured for: receiving a scene interaction request about the current scene sent by a corresponding second user equipment, wherein the scene interaction request comprises the scene identification information of the current scene, and a first user corresponding to the first user equipment and a second user corresponding to the second user equipment are both present in the current scene;
in response to the scene interaction request, matching and determining the current scene information of the current scene in the scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, wherein the scene state information describes whether a creating user of the current scene has completed scene initialization of the current scene information, and the scene state information comprises either initialization complete or initialization incomplete; if the scene state information comprises initialization complete, obtaining the current scene positioning information according to the current scene information; if the scene state information comprises initialization incomplete, temporarily withholding the current scene positioning information of the current scene from the second user equipment, and obtaining the current scene positioning information according to the current scene information once the scene state information of the current scene has changed from initialization incomplete to initialization complete; and
issuing the current scene positioning information to the second user equipment, wherein the second user equipment completes scene initialization of the current scene according to the current scene positioning information;
wherein, if a plurality of participating users in the current scene operate on a same operation object in the current scene, the network device grants editing authority over the same operation object to only one of the plurality of participating users at any one time.
23. A second user equipment for multi-person scene interaction, wherein the device comprises:
a two-four module, configured to generate a scene interaction request about a current scene based on a joining operation of a second user with respect to the current scene, wherein the scene interaction request comprises scene identification information of the current scene; send the scene interaction request to a corresponding network device; and receive current scene positioning information about the current scene returned by the network device, and complete scene initialization of the current scene according to the current scene positioning information;
a two-one module, configured to acquire camera pose information of a camera device in the second user equipment;
a two-two module, configured to receive tag record information, issued by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, the tag position information indicates the scene position of the corresponding tag information in the current scene, the at least one piece of tag information comprises first tag information about the current scene edited by a first user, and the first user and the second user corresponding to the second user equipment are both present in the current scene; and
a two-three module, configured to superimpose and present the at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information;
wherein, if a plurality of participating users in the current scene operate on a same operation object in the current scene, the network device grants editing authority over the same operation object to only one of the plurality of participating users at any one time;
wherein the current scene positioning information is obtained via the network device performing the following steps:
in response to the scene interaction request, matching and determining current scene information of the current scene in a scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, wherein the scene state information describes whether a creating user of the current scene has completed scene initialization of the current scene information, and the scene state information comprises either initialization complete or initialization incomplete; if the scene state information comprises initialization complete, obtaining the current scene positioning information according to the current scene information; and if the scene state information comprises initialization incomplete, temporarily withholding the current scene positioning information of the current scene from the second user equipment, and obtaining the current scene positioning information according to the current scene information once the scene state information of the current scene has changed from initialization incomplete to initialization complete.
24. A computer device, wherein the device comprises:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the steps of the method of any one of claims 1 to 19.
25. A computer-readable storage medium having a computer program/instructions stored thereon which, when executed, cause a system to perform the steps of the method of any one of claims 1 to 19.
CN202111517619.6A 2021-12-13 2021-12-13 Method, equipment, storage medium and program product for interaction of multiple scenes Active CN114332417B (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN202111517619.6A (CN114332417B) | 2021-12-13 | 2021-12-13 | Method, equipment, storage medium and program product for interaction of multiple scenes
PCT/CN2022/110521 (WO2023109153A1) | 2021-12-13 | 2022-08-05 | Multi-person scene interaction method and device, storage medium and program product

Publications (2)

Publication Number | Publication Date
CN114332417A | 2022-04-12
CN114332417B | 2023-07-14

Family ID: 81050834

Country Status (2)

Country | Link
CN (1) | CN114332417B (en)
WO (1) | WO2023109153A1 (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CP02 | Change in the address of a patent holder

CP02 details:
Address after: 201210 7th Floor, No. 1, Lane 5005, Shenjiang Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai
Patentee after: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.
Address before: Room 501/503-505, 570 shengxia Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 201203
Patentee before: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.