CN114332417A - Method, device, storage medium and program product for multi-person scene interaction


Info

Publication number: CN114332417A
Application number: CN202111517619.6A
Authority: CN (China)
Prior art keywords: information, scene, label, current scene, user
Legal status: Granted; currently active
Other versions: CN114332417B (granted publication)
Other languages: Chinese (zh)
Inventors: Liao Chunyuan (廖春元), Yang Fan (杨帆), Lin Xiangjie (林祥杰), Miao Lin (缪琳)
Current assignee: Hiscene Information Technology Co Ltd
Original assignee: Hiscene Information Technology Co Ltd

Events:
    • Application filed by Hiscene Information Technology Co Ltd, with priority to CN202111517619.6A
    • Publication of CN114332417A
    • Priority to PCT/CN2022/110521 (published as WO2023109153A1)
    • Application granted; publication of CN114332417B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/60: Type of objects
    • G06V 20/64: Three-dimensional objects

Landscapes

  • Engineering & Computer Science
  • Physics & Mathematics
  • General Physics & Mathematics
  • Theoretical Computer Science
  • Computer Graphics
  • Computer Hardware Design
  • General Engineering & Computer Science
  • Software Systems
  • Multimedia
  • Architecture
  • User Interface Of Digital Computer

Abstract

The present application provides a method, device, storage medium and program product for multi-person scene interaction. The method includes: receiving first tag information, uploaded by a first user device, that a first user has edited for a current scene of multi-person scene interaction, wherein the first tag information includes corresponding first tag identification information and first tag position information, and the tag position information indicates the scene position of the first tag information in the current scene; and creating or updating tag record information of the current scene according to the first tag information. Because the network device stores the tag record information, the current scene information, and the like of the current scene, a user joining the scene can complete initialization quickly, so that multiple users can conveniently and rapidly exchange information in the current scene, share data instantly, and interact in real time, achieving a good interaction effect.

Description

Method, device, storage medium and program product for multi-person scene interaction
Technical Field
The present application relates to the field of communications, and in particular to multi-person scene interaction technology.
Background
A virtual scene is a virtually constructed digital scene, such as a digital space. When such a digital scene is combined with a specific physical space to establish a mapping relationship, an augmented reality scene is obtained. In this augmented reality mode, a user can, through interactive operations, overlay and present virtual tag information on top of the real-world environment.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, a storage medium, and a program product for multi-person scene interaction.
According to one aspect of the present application, a multi-person scene interaction method applied to a network device is provided, the method comprising:
receiving first tag information, uploaded by a first user device, that a first user has edited for a current scene of multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information indicates the scene position of the first tag information in the current scene;
and creating or updating tag record information of the current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information.
According to another aspect of the present application, a multi-person scene interaction method applied to a second user device is provided, the method comprising:
acquiring camera pose information of a camera device in the second user device;
receiving tag record information, delivered by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information indicates the scene position of the corresponding tag information in the current scene;
and overlaying and presenting the at least one piece of tag information on a display device of the second user device according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
According to one aspect of the present application, a multi-person scene interaction method is provided, wherein the method comprises:
a network device receives first tag information, uploaded by a first user device, that a first user has edited for a current scene of multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information indicates the scene position of the first tag information in the current scene;
the network device creates or updates tag record information of the current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information;
a second user device acquires camera pose information of a camera device in the second user device, receives the tag record information about the current scene delivered by the network device, and overlays and presents at least one piece of tag information on a display device of the second user device according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
According to another aspect of the present application, a multi-person scene interaction method is provided, wherein the method comprises:
a second user device acquires camera pose information of a camera device in the second user device and receives tag record information, delivered by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information indicates the scene position of the corresponding tag information in the current scene; the second user device overlays and presents the at least one piece of tag information on its display device according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information;
a first user device acquires an editing operation of a first user on target tag information and broadcasts the editing operation to the corresponding network device and the second user device;
and the second user device receives the editing operation, updates the target tag information according to the editing operation, and overlays and presents the updated target tag information on its display device.
According to one aspect of the present application, a network device for multi-person scene interaction is provided, wherein the device comprises:
a module 1-1, configured to receive first tag information, uploaded by a first user device, that a first user has edited for a current scene of multi-person scene interaction, wherein the first tag information comprises corresponding first tag identification information and first tag position information, and the tag position information indicates the scene position of the first tag information in the current scene;
and a module 1-2, configured to create or update tag record information of the current scene information of the current scene according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the tag record information comprises at least one piece of tag information.
According to another aspect of the present application, a second user device for multi-person scene interaction is provided, wherein the second user device comprises:
a module 2-1, configured to acquire camera pose information of a camera device in the second user device;
a module 2-2, configured to receive tag record information, delivered by the corresponding network device, about the current scene of the multi-person scene interaction, wherein the tag record information comprises at least one piece of tag information of the current scene, each piece of tag information comprises corresponding tag identification information and tag position information, and the tag position information indicates the scene position of the corresponding tag information in the current scene;
and a module 2-3, configured to overlay and present the at least one piece of tag information on a display device of the second user device according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
According to an aspect of the present application, there is provided a computer apparatus, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the steps of the method as described in any one of the above.
According to one aspect of the present application, a computer-readable storage medium is provided, on which a computer program/instructions is stored, wherein the computer program/instructions, when executed, cause a system to perform the steps of the method as described in any one of the above.
According to one aspect of the present application, a computer program product is provided, comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the steps of the method as described in any one of the above.
Compared with the prior art, the network device in the present application stores the tag record information, the current scene information, and the like of the current scene of multi-person scene interaction, so that a joining user can complete initialization in the current scene quickly. Multiple users can therefore exchange information in the current scene conveniently and rapidly, share data instantly, and interact in real time, achieving a good interaction effect.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a method for multi-person scene interaction, according to an embodiment of the present application;
FIG. 2 illustrates a flow diagram of a method of multi-person scene interaction, according to another embodiment of the present application;
FIG. 3 illustrates a flow diagram of a system method for multi-person scene interaction, according to an embodiment of the present application;
FIG. 4 illustrates a flow diagram of a system method for multi-person scene interaction, according to another embodiment of the present application;
FIG. 5 illustrates functional modules of a network device according to one embodiment of the present application;
FIG. 6 illustrates functional modules of a second user equipment according to an embodiment of the present application;
FIG. 7 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The devices referred to in this application include, but are not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smart phone or a tablet computer, and the mobile electronic product may employ any operating system, such as an Android or iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, a kind of distributed computing in which one virtual supercomputer consists of a collection of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device with the network device, the touch terminal, or the network device with the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
The multi-person scene interaction method of the present application is applied to an interaction system of multiple user devices and a network device, and the system supports augmented reality interaction of multiple users in the same scene. The multiple users acquire a digital world coordinate system and map it to the physical space of the corresponding scene through their augmented reality devices, so that the users can share and display digital content, communicate synchronously, and collaborate across multiple terminals. The user devices include, but are not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smart phone, a tablet computer, augmented reality glasses, or an augmented reality helmet. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers.
Fig. 1 shows a multi-person scene interaction method according to one aspect of the present application. The method is applied to a network device and specifically includes steps S101 and S102. In step S101, first tag information, uploaded by a first user device, that a first user has edited for a current scene of multi-person scene interaction is received, wherein the first tag information includes corresponding first tag identification information and first tag position information, and the tag position information indicates the scene position of the first tag information in the current scene. In step S102, tag record information of the current scene information of the current scene is created or updated according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base includes one or more pieces of scene information, each piece of scene information includes corresponding scene identification information and scene positioning information, and the tag record information includes at least one piece of tag information. Multi-person scene interaction means that multiple users participate in scene interaction related to the current scene: multiple participating users may be present in the current scene at the same time, entering it simultaneously, being online together, and interacting in real time, or they may enter the current scene one after another and interact based on the scene data, and so on. Whenever a user joins the current scene, scene initialization is required to determine that user's position information in the current scene, which can be represented by the camera pose information of the camera device of the user device held by the user. The current scene of the multi-person scene interaction may be the same real space in which the multiple users are located, or different real spaces with the same or similar positioning features, etc.
Specifically, in step S101, first tag information, uploaded by a first user device, that a first user has edited for a current scene of multi-person scene interaction is received, wherein the first tag information includes corresponding first tag identification information and first tag position information, and the tag position information indicates the scene position of the first tag information in the current scene. For example, a first user holds a first user device on which a corresponding scene interaction application is installed; through this application the first user can create or join a scene and thereby interact with other users. When the first user creates or joins the current scene, scene initialization may be performed to acquire the camera pose information of the first user device, such as the coordinate transformation between the first user device and the world coordinate system of the current scene. Scene initialization specifically includes, but is not limited to, three-dimensional tracking initialization, 2D recognition initialization, and the like, where three-dimensional tracking initialization includes, but is not limited to, SLAM (simultaneous localization and mapping) initialization. In some cases, the scene positioning information includes 3D point cloud information of the current scene, or 2D feature information of a 2D identification map contained in the current scene, and the like, where the 2D feature information may be the 2D identification map itself or feature information extracted from it by a feature extraction algorithm, which is not limited here. In some embodiments, the scene positioning information further includes camera parameters of the first user device. For example, when the scene positioning information includes 2D feature information, the 2D identification map is captured by the camera device, and its features may further be extracted by a feature extraction algorithm to obtain the 2D feature information corresponding to the current scene. As another example, when the scene positioning information includes 3D point cloud information, after the first user device captures the current scene and completes scene initialization, it keeps scanning the current scene with the camera device to obtain real-time scene images; these are processed by the tracking thread, local mapping thread, and so on of the three-dimensional tracking algorithm to obtain the real-time pose information of the camera device and the 3D point cloud information of the current scene. The 3D point cloud information includes 3D points, for example 3D map points determined through image matching and depth information acquisition. In some embodiments, besides the corresponding 3D map points, the 3D point cloud information includes, but is not limited to: key frames corresponding to the point cloud information, co-visibility information corresponding to the point cloud information, spanning tree information corresponding to the point cloud information, and the like.
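For illustration only, the following is a minimal sketch of what 2D-recognition initialization could look like, using OpenCV ORB features as a stand-in for the unspecified feature extraction algorithm; the function names and thresholds are assumptions, not taken from the patent:

```python
# Hypothetical sketch of 2D-identification-map matching. The stored ORB
# descriptors play the role of the "2D feature information" above.
import cv2

def extract_2d_features(identification_map):
    """Extract ORB keypoints/descriptors from a 2D identification map;
    the descriptors can be stored as scene positioning information."""
    orb = cv2.ORB_create(nfeatures=1000)
    return orb.detectAndCompute(identification_map, None)

def matches_scene(frame, stored_descriptors, min_matches=30):
    """Check whether a live camera frame matches the stored 2D features."""
    orb = cv2.ORB_create(nfeatures=1000)
    _, frame_descriptors = orb.detectAndCompute(frame, None)
    if frame_descriptors is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(stored_descriptors, frame_descriptors)
    good = [m for m in matches if m.distance < 50]  # assumed threshold
    return len(good) >= min_matches
```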
The camera pose information includes the position and orientation of the camera device in space, and image positions and spatial positions can be converted through it. For example, the pose information includes the extrinsic parameters of the camera device relative to the world coordinate system of the current scene; as another example, it includes those extrinsic parameters together with the intrinsic parameters relating the camera coordinate system of the camera device to the image/pixel coordinate system, which is not limited here.
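As a concrete illustration of this conversion, the following sketch projects a tag's world coordinates into pixel coordinates from assumed extrinsic parameters (R, t) and an intrinsic matrix K; the function name and array shapes are illustrative assumptions:

```python
# Hypothetical sketch: project a tag's 3D scene position into the camera
# image using the camera pose (extrinsics R, t relative to the scene's
# world coordinate system) and the camera intrinsic matrix K.
import numpy as np

def project_tag(world_point, R, t, K):
    """world_point: (3,) world coords; R: (3,3); t: (3,); K: (3,3)."""
    p_cam = R @ world_point + t          # world -> camera coordinates
    if p_cam[2] <= 0:                    # tag lies behind the camera
        return None
    uvw = K @ (p_cam / p_cam[2])         # perspective division, then K
    return float(uvw[0]), float(uvw[1])  # (u, v) pixel position
```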
After the first user device completes initialization of the current scene, it can collect the first user's editing operations to realize the corresponding scene interaction. The editing operations include, but are not limited to, adding, modifying, or deleting tag information in the current scene, and their concrete forms include, but are not limited to, touch input, character input, voice input, or gesture input. In some embodiments, the tag information includes, but is not limited to: identification information; file information; form information; application call information; real-time sensing information; etc. For example, the tag information may include marks such as arrows, brush strokes, arbitrary on-screen graffiti, circles, and geometric shapes. The tag information may also include corresponding multimedia file information, such as pictures, videos, 3D models, PDF files, office documents, and other files. As another example, the tag information may include form information, such as a form generated at a corresponding target image position for a user to view or fill in. The tag information may also include application call information, i.e., instructions for invoking an application, such as opening the application or calling a specific function of it, e.g., making a phone call or opening a link. The tag information may also include real-time sensing information used to connect a sensing device (e.g., a sensor) and acquire sensing data of a target object. In some embodiments, the tag information includes corresponding tag identification information and tag position information, where the tag identification information includes any one of: a tag identifier, such as an icon, a name, or an acquisition link of the tag; or the tag content, such as the content of a PDF file, or the colour and size of the tag. The tag position information indicates the scene position of the tag information in the current scene, such as its coordinates in the coordinate system corresponding to the current scene. For example, when the first user adds tag information at an arbitrary position on the captured image, the first user device calculates, from the two-dimensional coordinates of that position on the captured image, the three-dimensional coordinates of the tag information in the three-dimensional coordinate system corresponding to the current scene, such as world coordinates in the world coordinate system of the current scene, or image position information in the image captured by the first user device. Of course, those skilled in the art will appreciate that the above tag information is merely exemplary, and other existing or future forms of tag information, if applicable to the present application, are also encompassed within its scope and are hereby incorporated by reference.
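As a reading aid, the following hypothetical data model gathers the tag fields described above into one structure; all field names are illustrative assumptions, not the patent's schema:

```python
# Hypothetical data model for one piece of tag information.
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class TagInfo:
    tag_id: str                       # tag identification information
    position: Tuple[float, float, float]  # scene position (world coordinates)
    kind: str = "identification"      # identification / file / form /
                                      # application-call / real-time-sensing
    content: Optional[bytes] = None   # e.g. PDF bytes, 3D model, form schema
    style: dict = field(default_factory=dict)  # colour, size, shape, ...
```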
In step S102, tag record information of the current scene information of the current scene is created or updated according to the first tag information, wherein the current scene information is stored in a scene information base, the scene information base includes one or more pieces of scene information, each piece of scene information includes corresponding scene identification information and scene positioning information, and the tag record information includes at least one piece of tag information. For example, the network device receives the first tag information uploaded by the first user device and creates or updates the tag record information of the current scene accordingly, where the tag record information contains the tag information existing in the current scene at the current moment. If the first tag information contains tag information newly added by the first user and no tag record information is yet mapped to the current scene information of the current scene, tag record information for the current scene information is created from the first tag information. If mapped tag record information already exists for the current scene, the tag information in the record is updated according to the first tag information, for example by adding the first tag information, or by replacing, modifying, or deleting stored first tag information according to its tag identification information.
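A minimal sketch of this create-or-update step on the network device might look as follows, reusing the hypothetical TagInfo model above; the class and function names are assumptions:

```python
# Hypothetical sketch of step S102 on the network device.
class TagRecord:
    """Tag record information: the tags present in a scene right now."""

    def __init__(self):
        self.tags = {}  # tag_id -> TagInfo (see the sketch above)

    def apply(self, tag, deleted=False):
        if deleted:
            self.tags.pop(tag.tag_id, None)  # remove a deleted tag
        else:
            self.tags[tag.tag_id] = tag      # add new, or replace modified

def on_first_tag_info(scene_info_base, scene_id, tag, deleted=False):
    """Create the scene's tag record if none is mapped yet, otherwise
    update the existing record with the incoming tag information."""
    scene = scene_info_base[scene_id]
    if scene.get("tag_record") is None:
        scene["tag_record"] = TagRecord()
    scene["tag_record"].apply(tag, deleted)
```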
Here, the current scene information of the current scene includes information describing the current scene of the multi-person scene interaction, for example its scene identification information, scene positioning information, and the like, where the scene positioning information includes the 3D point cloud information or 2D feature information, etc., of the current scene. The scene positioning information may be positioning feature information of the current scene, such as 3D point cloud information or 2D feature information, acquired before or after the current scene is created. The scene information base stores the scene information created by or participated in by each user; each piece of scene information contains at least its scene identification information and scene positioning information, and the scene identification information characterizes the uniqueness of the corresponding scene information. In some cases, each piece of scene information is also bound to its creating or participating users, so that the scene information created by or participated in by a user can be found by searching with that user's identification information. Because the network device stores the tag position information of each piece of tag information in the scene information, each participating user who joins the scene can compute, from that user's own camera pose information and the tag position information, the image position of the tag information in the image captured by that user's device, so that the tag information existing in the current scene at the current moment is overlaid and displayed on the display screen of each participating user's device.
In some cases, the current scene information stored in the network device is used only for scene initialization and for presenting the current tag information when users join the current scene, and does not involve real-time interaction on tag information among the users participating in the current scene. In other cases, the network device obtains each participating user's editing operations on the tag information in the current scene and updates the tag record information based on those operations, thereby keeping the current scene information updated and aggregated in real time, and at the same time sends the updated tag record information to the participating users other than the one who performed the editing operation, and so on. As in some embodiments, the method further includes step S103 (not shown): in step S103, the updated tag record information is distributed to a second user device of a second user, where the second user includes a user, other than the first user, currently participating in the scene interaction of the current scene. The second user is a user who joins the current scene for interaction after the current scene information has been established, and may be one user or several, which is not limited here. Similarly, besides forwarding the first tag information, if the network device obtains second tag information about the current scene uploaded by a second user device (for example, corresponding tags are added, modified, or deleted in the current scene; the first tag information and the second tag information may concern the same tag or different tags, "first" and "second" merely distinguishing the operating users), it forwards the second tag information to the first user device and to the other second user devices (those of second users other than the currently operating one), and so on.
In some embodiments, the method further includes step S104 (not shown). In step S104, a scene creation request about the current scene sent by the first user device is received; in response to the scene creation request, scene identification information of the current scene is generated, the current scene information is created according to the scene identification information and stored in the scene information base; and the scene identification information of the current scene is returned to the first user device. For example, the first user is the user who initially creates the scene information of the current scene. In some embodiments, the first user device provides, in the application's interactive interface, a touch area, a touch button, or a preset operation for creating a scene; if the first user device detects that the first user's operation on the touch area or touch button is the same as or similar to the preset operation, it generates a scene creation request for the current scene and sends it to the network device. This manner of generating the scene creation request is only an example; the request may also be triggered by character input, voice input, gesture input, or the like, which is not limited here. In some cases, the network device receives the scene creation request, establishes a corresponding scene data record in the scene information base, and assigns the current scene an identifier as its scene identification information, used to distinguish it from other scenes. After determining the scene identification information, the network device returns it to the first user device. In some embodiments, the first user device then completes initialization of the current scene, such as 2D recognition initialization; in other embodiments it completes initialization such as three-dimensional tracking initialization and obtains the scene positioning information of the current scene, which is subsequently stored in the scene information corresponding to the scene identification information on the network device. In still other cases, the first user device first initializes the current scene locally and obtains its scene positioning information, then uploads the scene positioning information to the network device to create the shared multi-person interaction scene, and the network device determines the corresponding scene identification information and establishes the current scene information based on the scene positioning information and the scene identification information.
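A minimal sketch of such a scene creation handler, under the assumption that the scene information base is a simple in-memory mapping, could be:

```python
# Hypothetical sketch of step S104 on the network device.
import uuid

def handle_scene_creation(scene_info_base, creator_id, positioning_info=None):
    """Assign scene identification information, store the scene entry,
    and return the identifier to the first user device."""
    scene_id = uuid.uuid4().hex  # scene identification information
    scene_info_base[scene_id] = {
        "scene_id": scene_id,
        "creator": creator_id,
        "positioning": positioning_info,  # may be uploaded after local init
        "tag_record": None,
    }
    return scene_id
```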
In some embodiments, the scene creation request includes the scene positioning information of the current scene, and generating the scene identification information of the current scene in response to the scene creation request and creating the current scene information according to the scene identification information includes: in response to the scene creation request, generating the scene identification information of the current scene, and establishing the current scene information according to the scene positioning information and the scene identification information. In this scene creation process, the first user device completes scene initialization locally and acquires the scene positioning information of the current scene before requesting the network device to create the current scene.
In some embodiments, the method further includes step S105 (not shown). In step S105, a scene interaction request about the current scene sent by a corresponding second user device is received, where the scene interaction request includes the scene identification information of the current scene; in response to the scene interaction request, the current scene positioning information of the current scene information is determined by matching in the scene information base based on the scene identification information; and the current scene positioning information is sent to the second user device. For example, the second user is a user who joins the interaction of the current scene after it has been created; the first user and the second user are then present in the current scene at the same time and can exchange information in it, achieving a good interaction effect. In some embodiments, the second user device provides, in the application's interactive interface, a touch area, a touch button, or a preset operation for joining a scene; if the second user device detects that the second user's operation matches the preset operation, it generates a scene interaction request for the current scene and sends it to the network device, the request including the scene identification information of the current scene. The target scene may be determined by the second user (for example, by searching for and entering the scene identification information), or by matching and filtering according to the second user's account or the corresponding process (such as a work task or a game), which is not limited here. This manner of generating the scene interaction request is only an example; the request may also be triggered by character input, voice input, gesture input, or the like. Typically, the network device receives the second user's scene interaction request about the current scene, queries and determines the current scene positioning information of the current scene information based on the scene identification information, and returns it to the second user device, so that the second user device can perform scene initialization with it. In some embodiments, the current scene positioning information includes 3D point cloud information: the second user device captures, through its camera device, the current scene or a scene whose similarity to it satisfies a certain condition, performs point cloud initialization using a three-dimensional tracking algorithm and the current scene positioning information, and, if initialization succeeds, aligns the world coordinate system of the current scene with the world coordinate system of the point cloud in the scene positioning information.
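The lookup on the network device side might be sketched as follows; the storage layout is the same assumed in-memory mapping as in the creation sketch above:

```python
# Hypothetical sketch of step S105 on the network device.
def handle_scene_interaction(scene_info_base, scene_id):
    """Match the scene by its identification information and return the
    positioning information a joining device needs for initialization."""
    scene = scene_info_base.get(scene_id)
    if scene is None:
        raise KeyError(f"unknown scene: {scene_id}")
    return scene["positioning"]  # 3D point cloud or 2D feature information
```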
In other embodiments, the current scene positioning information includes 2D feature information: the user device captures, through its camera device, the 2D identification map in the current scene or elsewhere, extracts the features of the 2D identification map, and matches them against the 2D feature information in the scene positioning information; if the match succeeds, the world coordinate system of the current scene is aligned with the world coordinate system of the 2D feature information in the scene positioning information. Further, if tag record information already exists in the current scene information, then in some embodiments, once the second user device completes initialization of the current scene, the tag positions in the tag record information are aligned with the current scene and each tag can be reproduced at the position its editor specified. For example, the network device returns the latest tag record information of the current scene to the second user device; if the tag record information records only the tags currently present, the second user device overlays and displays the one or more pieces of tag information existing in the current scene at the current moment after completing scene initialization. As another example, the network device returns the entire tag record information of the current scene, and after scene initialization the second user device overlays and displays one or more pieces of tag information of the current scene on its screen. Any of this tag information may have been added by the first user or by a second user who joined the current scene interaction earlier.
In some embodiments, determining, in response to the scene interaction request, the current scene positioning information of the current scene information by matching in the scene information base based on the scene identification information includes: in response to the scene interaction request, matching and determining the current scene information of the current scene in the scene information base based on the scene identification information, and determining the scene state information of the current scene from the current scene information, where the scene state information indicates either that initialization is complete or that it is not; and, if the scene state information indicates that initialization is complete, obtaining the current scene positioning information from the current scene information. For example, the scene state information describes whether the creating user (the first user) of the current scene information has completed scene initialization of the current scene. In some cases, the first user device only obtains the scene identification information of the current scene in response to the first user's creation request and performs scene initialization after sending that request. The second user can join the current scene based on the scene identification information, and the network device decides whether to return the corresponding current scene positioning information to the second user device according to whether the first user device has completed scene initialization. In some embodiments, when the first user device has completed scene initialization of the current scene, the network device determines that the scene state information in the current scene information indicates initialization complete; when the first user device has not completed it, the network device determines that the scene state information indicates initialization incomplete. This way of determining the scene state information is only an example and is not limited here. If the scene state information indicates initialization complete, the network device may send the current scene positioning information of the current scene to the second user device. If it indicates initialization incomplete, the network device may temporarily withhold the current scene positioning information and instead send corresponding status prompt information to the second user device, until initialization completes and the positioning information can be obtained and delivered. As in some embodiments, the method further includes step S106 (not shown): in step S106, if the scene state information indicates that initialization is not complete, status prompt information about the current scene is sent to the second user device, the status prompt information including a prompt that initialization of the current scene is not complete; and the current scene positioning information is obtained from the current scene information once the scene state information of the current scene changes from initialization incomplete to initialization complete.
For example, the status prompt information is used to prompt the second user that the current scene has not been initialized, and its concrete form of presentation includes, but is not limited to, text, voice, vibration, an image, or simply not displaying content on the screen of the second user device. In some embodiments, after the second user device receives the prompt that initialization of the current scene is not complete, it waits for the first user device to perform scene initialization, and during this time the second user device cannot take part in the scene interaction of the current scene. The network device may determine the scene state information of the current scene by querying whether the current scene information contains current scene positioning information, by an initialization-complete message uploaded by the first user device, by querying the initialization progress of the first user device, and so on, which is not limited here. In other cases, when scene information is presented on the second user device, the corresponding scene state information may be presented alongside the scene identification information, to help the second user decide whether to join the current scene.
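A hypothetical sketch of this state check, using the presence of positioning information as one possible initialization test (the text lists several alternatives), might be:

```python
# Hypothetical sketch of steps S105/S106 combined on the network device.
def positioning_or_prompt(scene):
    """Return positioning info once the creator finished initialization,
    otherwise a status prompt for the second user device."""
    if scene.get("positioning") is not None:  # one possible completion test
        return {"type": "positioning", "data": scene["positioning"]}
    return {"type": "status_prompt",
            "message": "initialization of the current scene is not completed"}
```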
In some embodiments, the method further includes step S107 (not shown). In step S107, second tag information about the current scene, uploaded by a second user device and edited by the second user, is received, where the second tag information includes corresponding second tag identification information and second tag position information, the latter indicating the scene position of the second tag information in the current scene; and the tag record information of the current scene is created or updated according to the second tag information. For example, besides the first user's editing operations on the current scene, the network device may receive editing operations on the current scene from second users participating in it, such as adding a new tag or modifying or deleting an existing one, so that the first user and the second users can exchange information in the current scene and achieve a good interaction effect. The network device determines the corresponding second tag information based on a second user's editing operation; "second" here only distinguishes which user edited the tag, identifying the operating user as a second user who joined later, and so on. The network device obtains the second tag information and creates or updates the corresponding tag record information based on it: if no tag information about the current scene is yet stored on the network device side, the network device may create the corresponding tag record information from the second tag information; if tag record information mapped to the current scene is already stored (such as the first tag information, or other second tag information edited by other second users), the network device updates the tag record information according to the second tag information.
Here, after the second user edits a second tag, the second user device may send the second tag information to the network device, which forwards it to the first user device and the other second user devices participating in the current scene; or the second user device may broadcast the second tag information directly to the first user device, the other second user devices, and/or the network device. For example, broadcasting directly to all of them reduces the delay with which the first user device and the other second user devices obtain the second tag information and improves the real-time quality of the scene interaction. The second tag information corresponding to an editing operation may be the complete updated tag information for a given tag, or an operation instruction concerning a given tag that each endpoint executes after receiving it, updating the corresponding tag information so that every user device overlays and displays the updated tag, thereby realizing real-time interaction among the users participating in the current scene.
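The two delivery paths could be sketched as follows; network_device and peer are assumed interfaces, not names from the patent:

```python
# Hypothetical delivery paths for newly edited tag information.
def relay_via_network_device(network_device, scene_id, tag, sender_id):
    """Server-mediated path: store first, then forward to the other peers."""
    network_device.update_tag_record(scene_id, tag)
    for peer in network_device.participants(scene_id):
        if peer.user_id != sender_id:
            peer.send(tag)

def broadcast_directly(peers, network_device, scene_id, tag):
    """Direct path: push to peers first to cut latency, then the server."""
    for peer in peers:
        peer.send(tag)
    network_device.update_tag_record(scene_id, tag)
```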
In some embodiments, the second user is given editing rights over the current scene information. For example, in the general case any user currently participating in the current scene can edit the scene information, and all participating users hold editing rights over it. In some cases, to simplify management of the current scene information and to prevent overlapping, conflicting operations when multiple users interact, only some participating users are given editing rights over the current scene information at any one time. Management of editing rights includes, but is not limited to, the following modes: a designation mode, in which the first user, as creator of the current scene, designates which second users receive editing rights over the current scene information; a transfer mode, in which rights originate with the creator (the first user) and are passed from the user currently holding them to one or more other users; an account mode, in which editing rights are determined by the account information of the participating users; and a configured mode, in which editing rights are determined by the work process, game, or other procedure corresponding to the current scene.
Further, an editing right usually means that a user may operate on an object at any position in the current scene information; beyond this, editing rights can also be restricted per object. For example, all participating users in the current scene may perform editing operations on objects in the scene, and if their operation targets differ, the network device can grant all of them the corresponding editing rights. If several users target the same object, the network device grants the corresponding editing right to only one of them, for instance by selecting one participating user according to rights level, the start time of the editing, and so on, and granting only that user the right to operate on that object at that moment. In other words, once a user starts editing an object in the current scene, the user locks that object, and other users, lacking the editing right for it, cannot edit it.
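A minimal sketch of such per-object locking, with assumed names, might be:

```python
# Hypothetical per-object edit locking.
class TagLocks:
    """First participant to start editing a tag locks it; others cannot
    edit that tag until the lock is released."""

    def __init__(self):
        self._owners = {}  # tag_id -> user_id of the current editor

    def try_lock(self, tag_id, user_id):
        owner = self._owners.setdefault(tag_id, user_id)
        return owner == user_id  # False: the tag is locked by someone else

    def release(self, tag_id, user_id):
        if self._owners.get(tag_id) == user_id:
            del self._owners[tag_id]
```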
In some embodiments, the method further includes step S108 (not shown). In step S108, an editing operation of a participating user of the current scene on target tag information is obtained, the target tag information is updated according to the editing operation, and the tag record information is updated based on the updated target tag information, where the target tag information is contained in the tag record information of the current scene information. For example, after the second user begins participating in the scene interaction of the current scene, the second user device receives tag information, editing operations, and the like sent by other devices (e.g., other user devices or the network device) and updates the overlaid tag display in real time based on them. In some embodiments, the second user device also keeps a local copy of the tag record information of the current scene and updates it in real time from the received tag information or editing operations in order to refresh the overlaid display. The second user may equally select tag information presented for the current scene on the second user device and perform editing operations on the selected target tag information, such as modifying or deleting it, where modification includes, but is not limited to, changing content, scaling the display size, deforming the display shape, or moving the display position. The second user device may broadcast the editing operation directly to the first user, the other second users, and/or the network device, reducing the delay with which the first user device and the other second user devices obtain it and improving the real-time quality of the interaction; or it may send the editing operation to the network device, which forwards it to the other participating user devices in the current scene. The network device may forward the editing operation as-is for those devices to apply to the target tag information, or it may update the target tag information itself according to the editing operation and deliver the updated target tag information to the first user device and the other second user devices to replace the original, thereby propagating the tag update. The user device editing the tag information may broadcast the editing operation to the other participating devices and/or the network device in real time, or at preset intervals, for example every 100 milliseconds, which is not limited here. As those skilled in the art will understand, besides the second user's editing operations, this tag-update process applies equally to any participating user's editing operations on the tags of the current scene.
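The interval-based broadcasting mentioned above (e.g., every 100 milliseconds) could be sketched as follows, with assumed names:

```python
# Hypothetical throttled broadcast of a continuous editing operation.
import time

class ThrottledBroadcaster:
    """Broadcast the latest state of a continuous editing operation at a
    preset interval instead of on every input event."""

    def __init__(self, send, interval=0.1):  # 0.1 s = 100 ms
        self.send = send                     # callable that broadcasts an op
        self.interval = interval
        self._last_sent = 0.0
        self._pending = None

    def submit(self, edit_op):
        self._pending = edit_op              # keep only the newest state
        now = time.monotonic()
        if now - self._last_sent >= self.interval:
            self.send(self._pending)
            self._last_sent = now
            self._pending = None
```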
In some embodiments, obtaining the editing operation of a participating user of the current scene on the target tag information includes: acquiring candidate editing operations of multiple participating users of the current scene on the target tag information; and determining the editing operation on the target tag information from the candidate editing operations according to the user editing priority levels of the multiple participating users. For example, in addition to the editing-rights management described above, the network device may filter the operations of multiple participating users on the same tag so that only one user operates a given tag at a time. Specifically, each user device collects its user's editing operations on tags; if candidate editing operations of multiple participating users on the same target tag information are received, the network device determines a final editing operation from the candidates according to the users' editing priority levels. A user's editing priority level is determined from the user's association with the current scene, for example, whether the user created the current scene, how long the user has participated in it, or whether the user was the earliest to edit the target tag information. For instance, if a first user and a second user each click a target tag in the current scene, the network device determines from their click times that the user who clicked earliest obtains the editing priority for that tag; that user then edits the target tag, and no other user can edit it while the edit is in progress. A sketch of such priority resolution follows.
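A sketch under stated assumptions: the priority key below (earliest click first, scene creator and longer participation as tie-breakers) is one plausible reading of the associated information described above, not a prescribed ordering.

```python
# Hypothetical sketch: choosing one editing operation among concurrent candidates
# on the same target tag, using a user editing priority level.
from dataclasses import dataclass

@dataclass
class CandidateEdit:
    user_id: str
    is_scene_creator: bool        # whether the user created the current scene
    participation_seconds: float  # how long the user has been in the scene
    click_time: float             # when the user selected the target tag
    operation: dict               # the proposed edit itself

def pick_winning_edit(candidates: list) -> CandidateEdit:
    # Earliest click wins; creator status and longer participation break ties.
    return min(
        candidates,
        key=lambda c: (c.click_time, not c.is_scene_creator, -c.participation_seconds),
    )
```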
In some embodiments, the method further includes step S109 (not shown). In step S109, an editing time of at least one participating user of the current scene with respect to the at least one piece of tag information is acquired, and the tag record information of the current scene information is updated according to the at least one piece of tag information and the editing time, where the tag record information includes at least one piece of tag information and each piece has at least one editing time corresponding to an editing operation. For example, after the network device creates or updates at least one piece of tag information uploaded by the first user equipment or the second user equipment, if an editing operation of at least one participating user (e.g., the first user or a second user) on any one of those pieces of tag information is obtained, the editing time corresponding to each editing operation is recorded. In some cases, when a participating user starts editing, the user device records the editing time of the operation and uploads the editing time together with the corresponding editing operation to the network device. Here, an editing operation may be instantaneous or continuous: for an instantaneous operation, the network device or each user device records the operation's time node as the corresponding editing time; for a continuous operation, the network device or each user device records the operation repeatedly (for example, every 20 ms or 0.1 s) and maps each time node to the operation at that moment, forming a continuous editing operation together with its editing times.
The network device updates the tag information, editing times, and the like into the corresponding tag record information, where the tag record information includes at least one piece of tag information and each piece includes the corresponding tag identification information, tag position information, and the editing time of each editing operation. Every piece of tag information has at least one editing operation (for example, the operation that added the tag) and therefore at least one corresponding editing time. If a piece of tag information has multiple editing operations, its states are ordered by the editing time of each operation to form a complete tag timeline, along which the tag state information is presented. The tag state information describes the tag at a given time, such as its position, size, shape, and color, and changes after every edit. In some embodiments, the tag record information records all tag information of the current scene; when the network device receives a join request from another user, it determines from the tag record information the tag information valid at the current time and returns it to the joining user, so that the joining user's device can overlay the one or more pieces of tag information existing at the current moment once scene initialization completes. A sketch of such a per-tag timeline follows.
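A minimal sketch of such a tag timeline, assuming each edit is reduced to a (time, state) pair; the TagTimeline name and its fields are illustrative assumptions.

```python
# Hypothetical sketch: a per-tag timeline of (edit_time, tag_state) pairs, so the
# tag state valid at any moment can be looked up or replayed along the time axis.
import bisect

class TagTimeline:
    def __init__(self, tag_id: str):
        self.tag_id = tag_id
        self._times = []   # edit times, kept sorted
        self._states = []  # tag state (position, size, shape, color, ...) per edit

    def record(self, edit_time: float, state: dict) -> None:
        """Insert an edit so the timeline stays ordered by editing time."""
        i = bisect.bisect(self._times, edit_time)
        self._times.insert(i, edit_time)
        self._states.insert(i, state)

    def state_at(self, t: float):
        """Return the most recent state at or before time t, or None if none exists."""
        i = bisect.bisect_right(self._times, t)
        return self._states[i - 1] if i else None
```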
In some embodiments, the method further includes step S110 (not shown). In step S110, editing operations of at least one participating user of the current scene with respect to the at least one piece of tag information and the editing time corresponding to each editing operation are acquired, and a scene tag library of the current scene is created or updated accordingly, where the scene tag library includes at least one piece of tag information and each piece has an editing time corresponding to at least one editing operation. For example, while the tag record information in the current scene information of the network device records the tag information existing at the current moment, the addition and change history of each piece of tag information is stored in the corresponding scene tag library. The scene tag library corresponds to the scene identification information, through which the tag information of the corresponding scene can be retrieved. In some embodiments, the scene tag library may occupy a storage space independent of the tag record information, which facilitates orderly management of the scene's related information and reasonable resource allocation. Each piece of tag information stored in the scene tag library includes tag identification information, tag position information, editing times, and the like, as in the sketch below.
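Building on the TagTimeline sketch above, a scene tag library might be keyed by scene identification information as follows; again the names are assumptions for illustration, not the claimed storage scheme.

```python
# Hypothetical sketch: a scene tag library, kept separately from the live tag
# record information and indexed by scene identification information.
class SceneTagLibrary:
    def __init__(self):
        self._by_scene = {}  # scene_id -> {tag_id: TagTimeline}

    def record_edit(self, scene_id: str, tag_id: str, edit_time: float, state: dict):
        """Append one edit to the add/change history of a tag in a given scene."""
        timelines = self._by_scene.setdefault(scene_id, {})
        timeline = timelines.setdefault(tag_id, TagTimeline(tag_id))
        timeline.record(edit_time, state)

    def timelines_for(self, scene_id: str) -> dict:
        """Retrieve all tag histories of a scene via its identification information."""
        return self._by_scene.get(scene_id, {})
```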
In some embodiments, the method further includes step S111 (not shown). In step S111, if a tag backtracking request of an application user about the current scene is obtained, where the tag backtracking request includes the scene identification information of the current scene, then in response to the request at least one piece of backtracking tag information of the current scene is determined according to the scene identification information, each piece of backtracking tag information including at least one editing time, and the at least one piece of backtracking tag information is returned to the user equipment of the application user. For example, because the network device stores tag information together with its editing times, an application user can call up the tag information and tag state information along the timeline, making it convenient to backtrack the tags of the current scene. The tag backtracking request requests the overlaid presentation of the formation of some or all tag information in the current scene and its changes along the timeline. The user sending the request is an application user of the multi-person scene interaction application and may or may not be participating in the current scene interaction; for example, the application user may view a scene information list displayed on the interface and select the current scene information by touch or similar operations, thereby sending the corresponding tag backtracking request to the network device to backtrack the formation and changes of all or part of the tag information in the current scene. The tag backtracking request includes the corresponding scene identification information, pointing the backtrack at the tag information of the current scene; in some cases it further includes user identification information of a participating user, requesting backtracking of the tag information edited by that user in the current scene; in other cases it further includes one or more pieces of tag identification information, requesting backtracking of those particular tags. If the application user is a participating user of the current scene, the network device determines that the user has already performed scene initialization and directly returns the corresponding backtracking tag information to the user's device, which overlays it at the corresponding positions of the current scene along a timeline.
If the application user is not participating in the current scene, the network device may likewise return the backtracking tag information directly to the user's device so that the user can view it, for example overlaid along a timeline in the scene where the device currently is. In other cases, if such a non-participating user wants the backtracking tag information overlaid in the real space corresponding to the current scene determined by the scene identification information, the network device returns not only the backtracking tag information but also the scene positioning information of the current scene, so that after the user device completes scene initialization in that scene, the backtracking tag information is overlaid at the corresponding positions along a timeline. Thus, in some embodiments, the application users include users not currently participating in the current scene, and returning the at least one piece of backtracking tag information to the user equipment of the corresponding application user includes: returning the at least one piece of backtracking tag information and the scene positioning information of the current scene to the user equipment of the application user, where the scene positioning information is used for scene initialization and the at least one piece of backtracking tag information is overlaid on the current scene after initialization completes.
After the backtracking tag information is sent to the corresponding user equipment, it may be presented directly in an overlaid manner, for example along a timeline in the scene where the user equipment currently is; or it may be presented in an augmented reality manner. For example, if the current user is in a real space identical or similar to the current scene corresponding to the scene identification information, the network device sends the scene positioning information of the current scene to the user device; the user device shoots the scene, performs scene initialization with the scene positioning information to obtain the corresponding camera pose information, and computes the screen position information of the backtracking tag information from the camera pose information and the tag position information of the backtracking tag information, thereby overlaying the backtracking tag information on the real space corresponding to the current scene.
In some embodiments, the tag backtracking request further includes a corresponding backtracking time, and determining the at least one piece of backtracking tag information of the current scene according to the scene identification information in response to the request includes: responding to the tag backtracking request, determining at least one piece of backtracking tag information of the current scene according to the scene identification information, and determining the corresponding target backtracking tag information from it according to the backtracking time information, where the editing time of each piece of target backtracking tag information corresponds to the backtracking time. For example, the backtracking time may be a single time node or a time interval on the timeline. Given a time node, the network device may determine the tag information existing at that node in the current scene, or the tag information existing within a preset period before or after the node, or the tag information whose last editing period runs from the node to the present, and use it as the corresponding backtracking tag information; the network device sends this to the user device, which presents it on its display device upon receipt. Given a time interval, the network device may determine the tag information existing within the interval and return it, with its in-interval states, as the corresponding backtracking tag information; the user device then presents the backtracking tag information at each moment of the interval by playing the timeline in sequence, as sketched below.
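A sketch of the time-based selection, assuming the backtracking time arrives either as a single time node or as a (start, end) interval; representing each tag's edits as a sorted list of (time, state) pairs is an assumption for illustration.

```python
# Hypothetical sketch: selecting target backtracking tag information for a request
# that carries either one time node or a (start, end) time interval.
def select_backtrack_states(tag_edits: dict, backtrack_time):
    """tag_edits: {tag_id: sorted list of (edit_time, state)} for the scene."""
    if isinstance(backtrack_time, tuple):          # a time interval on the axis
        start, end = backtrack_time
        return {tag: [(t, s) for t, s in edits if start <= t <= end]
                for tag, edits in tag_edits.items()}
    result = {}                                    # a single time node
    for tag, edits in tag_edits.items():
        past = [(t, s) for t, s in edits if t <= backtrack_time]
        if past:
            result[tag] = past[-1][1]  # the tag state valid at that moment
    return result
```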
Fig. 2 shows a multi-user scene interaction method applied to second user equipment according to another aspect of the present application; the method includes step S201, step S202, and step S203. In step S201, camera pose information of a camera device in the second user equipment is acquired; in step S202, tag record information about the current scene of the multi-user interaction scene is received, where the tag record information includes at least one piece of tag information of the current scene, each piece of tag information includes corresponding tag identification information and tag position information, and the tag position information indicates the scene position of the corresponding tag information in the current scene; in step S203, the at least one piece of tag information is overlaid on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
For example, the second user is a user who joins the interaction of the current scene after the scene was created, while other participating users are already present in the scene. The second user equipment may obtain the camera pose information of its camera device: for example, it shoots the current scene, or a scene whose similarity to the current scene meets a certain condition, performs point-cloud initialization using a three-dimensional tracking algorithm and the current scene positioning information, and, if initialization succeeds, obtains the camera pose information of the camera device. As another example, the second user equipment shoots a 2D identification map in the current scene or another scene, extracts features of the map, and matches them against the 2D feature information in the scene positioning information; if matching succeeds, initialization succeeds and the camera pose information of the camera device is obtained. In some embodiments, the network device returns the latest tag record information of the current scene to the second user equipment, for example tag record information recording only the tags of the current moment, so that after scene initialization the second user equipment overlays the one or more pieces of tag information existing at the current moment. In other embodiments, the network device returns the complete tag record information of the current scene, recording all of its tag information; after scene initialization of the second user equipment completes, one or more of these tags are overlaid on its screen, whether added by the first user or by other second users who joined the interaction of the current scene earlier. A sketch of the overlay projection follows.
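A minimal sketch of the overlay computation, assuming a simple pinhole camera without lens distortion; the pose is taken as a world-to-camera rotation R and translation t recovered at initialization, and fx, fy, cx, cy are the camera intrinsics. All of these names are assumptions for illustration.

```python
# Hypothetical sketch: projecting a tag's scene (world) position onto the screen
# from the camera pose obtained at scene initialization.
import numpy as np

def project_tag(tag_pos_world, R, t, fx, fy, cx, cy):
    """R: 3x3 world-to-camera rotation; t: translation (3,); returns (u, v) or None."""
    p_cam = R @ np.asarray(tag_pos_world, dtype=float) + np.asarray(t, dtype=float)
    if p_cam[2] <= 0:
        return None  # the tag lies behind the camera and is not drawn
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return float(u), float(v)
```

As the device moves, the pose is re-estimated and each tag's screen position recomputed per frame, which keeps the overlay anchored to its scene position.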
In some embodiments, the method further includes step S204 (not shown). In step S204, a scene interaction request about the current scene is generated based on a join operation of the second user with respect to the current scene, where the scene interaction request includes the scene identification information of the current scene; the scene interaction request is sent to the corresponding network device; and current scene positioning information about the current scene returned by the network device is received, scene initialization for the current scene being completed according to it. For example, the second user equipment provides in its application interaction interface a touch area, touch button, or preset operation for joining a scene. If the second user equipment detects a user operation of the second user with respect to the current scene that is the same as or similar to the preset join operation, it generates the corresponding scene interaction request and sends it to the network device, the request including the scene identification information of the current scene. The current scene may be determined, for example, by searching for and inputting scene identification information, or by matching and screening according to the second user's account or an associated process (such as a job or a game); the touch operation of the second user is not limited herein. This manner of generating the scene interaction request is only an example; generation may also rely on, without limitation, character input, voice input, or gesture input. Typically, the network device receives the second user's scene interaction request about the current scene, queries for the corresponding current scene positioning information based on the scene identification information, and returns it to the second user equipment, so that the second user equipment can perform scene initialization with the current scene positioning information.
In some embodiments, the method further includes step S205 (not shown). In step S205, an editing operation broadcast by other participating user equipment of the current scene and corresponding to the target tag information of other participating users is received, the target tag information is updated according to the editing operation, and the updated target tag information is overlaid on the display device of the second user equipment. For example, after the second user starts to participate in the scene interaction of the current scene, the second user equipment receives tag information, editing operations, and the like broadcast by other participating user equipment and updates the overlaid tag display in real time based on what it receives. In some embodiments, the second user equipment also locally stores the tag record information of the current scene and updates its local copy in real time based on the received tag information or editing operations, so as to refresh the overlaid display; of course, the second user may also select tag information based on where it is presented in the current scene and perform an editing operation on the selected target tag information, such as modification or deletion, where modification includes, but is not limited to, changing content, scaling display size, deforming display shape, or moving display position. The second user equipment may broadcast the corresponding editing operation to the first user, other second users, and/or the network device: it may broadcast the editing operation itself so that the receiving devices update the target tag information based on it, or update the target tag information according to the editing operation and broadcast the updated target tag information to the first user equipment and other second user equipment to replace the original. The user equipment that edits the tag information may broadcast editing operations to other participating user devices and/or the network device in real time or at preset intervals (for example, every 100 milliseconds, as sketched below), which is not limited herein. As those skilled in the art will understand, this tag update process applies not only to the editing operations of the second user but to any editing operation of any participating user on a tag of the current scene.
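A sketch of interval-based broadcasting under stated assumptions: send_fn stands in for whatever transport delivers operations to the other participants and/or the network device, and the class name is illustrative.

```python
# Hypothetical sketch: coalescing a user's edit operations and broadcasting them
# at a preset interval (e.g. every 100 milliseconds) instead of once per event.
import threading
import time

class EditBroadcaster:
    def __init__(self, send_fn, interval_s: float = 0.1):
        self._send = send_fn      # callable taking a list of pending operations
        self._interval = interval_s
        self._pending = []
        self._lock = threading.Lock()
        threading.Thread(target=self._loop, daemon=True).start()

    def queue(self, operation: dict) -> None:
        """Record an edit operation produced by the local user."""
        with self._lock:
            self._pending.append(operation)

    def _loop(self) -> None:
        while True:
            time.sleep(self._interval)
            with self._lock:
                batch, self._pending = self._pending, []
            if batch:
                self._send(batch)  # one broadcast per interval, not per edit
```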
In some embodiments, the method further includes step S206 (not shown). In step S206, a tag backtracking request about the current scene is sent to the corresponding network device, where the request includes the scene identification information of the current scene; at least one piece of backtracking tag information of the current scene returned by the network device based on the request is received, each piece including its corresponding editing time; and the at least one piece of backtracking tag information is presented, with the presentation timeline corresponding to the editing time of each piece. For example, because the network device stores tag information together with its editing times, the second user can call up the tag information and tag state information along the timeline, making it convenient to backtrack the tags of the current scene. The tag backtracking request requests the presentation of the formation of some or all tag information in the current scene and its changes along the timeline. For example, the second user may view a scene information list displayed on the interface and select the current scene information by touch or similar operations, thereby sending the corresponding tag backtracking request to the network device to backtrack the formation and changes of all or part of the tag information in the current scene. The request includes the corresponding scene identification information, pointing the backtrack at the tag information of the current scene; in some cases it further includes user identification information of a participating user, requesting backtracking of the tag information edited by that user in the current scene; in other cases it further includes one or more pieces of tag identification information, requesting backtracking of those particular tags. In some embodiments, the network device determines that the second user has performed scene initialization of the current scene and directly returns the corresponding backtracking tag information to the second user's device, which overlays it at the corresponding positions of the current scene along a timeline corresponding to the editing times of the backtracked tags.
In some embodiments, the method further includes step S207 (not shown). In step S207, second tag information about the current scene edited by the second user is obtained, where the second tag information includes corresponding second tag identification information and second tag position information, the second tag position information indicating the scene position of the second tag information in the current scene; and the second tag information is sent to the other user equipment participating in the current scene and to the network device.
For example, in addition to the first user's editing operations on the current scene, the network device may receive the editing operations of a second user participating in the current scene, specifically adding a new tag or modifying or deleting an existing tag, so that the first and second users can interact in the current scene to good effect. The network device determines the corresponding second tag information based on the second user's editing operation; "second" here only distinguishes which user edited the tag, identifying its operator as a second user who joined later. The network device acquires the second tag information and updates the tag record information of the current scene based on it. After the second user edits the second tag, the second user equipment may send the second tag information to the network device, which forwards it to the first user equipment and the other second user equipment participating in the current scene; alternatively, the second user equipment may broadcast the second tag information directly to the first user equipment, the other second user equipment, and the network device, reducing the delay with which the first user equipment and other second user equipment obtain it and improving the real-time performance of scene interaction. The second tag information corresponding to an editing operation may be the complete updated tag information of some tag, or an operation instruction about some tag to be executed; in the latter case each end executes the instruction on its copy of the tag information to update it, so that every user equipment overlays the updated tag and the users participating in the current scene interact in real time, as sketched below.
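A sketch of the two update forms, assuming a tag record held as a dictionary and a small, illustrative operation vocabulary (move, delete, set_content); real message schemas would differ.

```python
# Hypothetical sketch: an update message carries either the complete updated tag
# information or an operation instruction that each end replays on its local copy;
# both forms converge on the same stored state.
def apply_tag_update(tag_record: dict, message: dict) -> None:
    tag_id = message["tag_id"]
    if message["kind"] == "full":      # replace the stored tag outright
        tag_record[tag_id] = message["tag"]
    elif message["kind"] == "op":      # execute the instruction on the local copy
        op = message["op"]
        if op["type"] == "delete":
            tag_record.pop(tag_id, None)
        elif op["type"] == "move":
            tag_record[tag_id]["position"] = op["position"]
        elif op["type"] == "set_content":
            tag_record[tag_id]["content"] = op["content"]
```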
In some embodiments, the method further includes step S208 (not shown). In step S208, an editing operation of the second user on the target tag information of the current scene is acquired, the target tag information is updated based on the editing operation, and the editing operation is sent to the other user equipment participating in the current scene and to the network device so that the target tag information is updated there as well. For example, after the second user starts to participate in the scene interaction of the current scene, the second user may select corresponding target tag information based on where it is presented on the second user device and perform an editing operation on it, such as modification or deletion, where modification includes, but is not limited to, changing content, scaling display size, deforming display shape, or moving display position. The second user equipment may broadcast the corresponding editing operation to the other user equipment and the network device participating in the current scene interaction, or send it to the network device, which forwards it to the other participating user devices in the current scene. The second user equipment may send the editing operation itself so that each receiving device updates the target tag information based on it, or update the target tag information according to the editing operation and send the updated target tag information to the other user equipment and the network device to replace the original. In some embodiments, the network device updates the tag record information of the current scene according to the editing operation sent by the second user.
FIG. 3 illustrates a method of multi-person scene interaction according to an aspect of the present application, wherein the method comprises:
a network device receives first label information, uploaded by first user equipment and edited by a first user, about a current scene of multi-user scene interaction, wherein the first label information comprises corresponding first label identification information and first label position information, and the label position information is used for indicating the scene position of the first label information in the current scene;
the network equipment creates or updates label record information of current scene information of the current scene according to the first label information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the label record information comprises at least one piece of label information;
the second user equipment acquires the camera shooting pose information of a camera shooting device in the second user equipment, receives the label record information which is issued by the network equipment and is related to the current scene of the multi-person interactive scene, and displays at least one piece of label information on a display device of the second user equipment in an overlapping mode according to the label position information and the camera shooting pose information of the at least one piece of label information in the label record information.
FIG. 4 illustrates a method of multi-person scene interaction according to another aspect of the present application, wherein the method comprises:
the second user equipment acquires the camera pose information of a camera device in the second user equipment, and receives label record information, issued by the corresponding network equipment, about the current scene of the multi-person interactive scene, wherein the label record information comprises at least one piece of label information of the current scene, each piece of label information comprises corresponding label identification information and label position information, and the label position information is used for indicating the scene position of the corresponding label information in the current scene; and overlays at least one piece of label information on a display device of the second user equipment according to the label position information and the camera pose information of the at least one piece of label information in the label record information;
the first user equipment acquires the editing operation of a first user on the target label information, and broadcasts the editing operation to corresponding network equipment and the second user equipment;
and the second user equipment receives the editing operation, updates the target label information according to the editing operation, and superimposes and presents the updated target label information on a display device of the second user equipment.
In addition, the present application also provides specific devices capable of implementing the above embodiments, which are described below with reference to Fig. 5 and Fig. 6.
Fig. 5 shows a network device for multi-person scene interaction according to an aspect of the present application, which specifically includes a one-one module 101 and a one-two module 102. The one-one module 101 is configured to receive first tag information, uploaded by first user equipment and edited by a first user, about a current scene of multi-user scene interaction, where the first tag information includes corresponding first tag identification information and first tag position information, the tag position information indicating the scene position of the first tag information in the current scene. The one-two module 102 is configured to create or update tag record information of the current scene information of the current scene according to the first tag information, where the current scene information is stored in a scene information base, the scene information base includes one or more pieces of scene information, each piece of scene information includes corresponding scene identification information and scene positioning information, and the tag record information includes at least one piece of tag information.
Here, the specific implementations corresponding to the one-one module 101 and the one-two module 102 shown in Fig. 5 are the same as or similar to the embodiments of step S101 and step S102 shown in Fig. 1, and are therefore not repeated here but incorporated herein by reference.
In some embodiments, the apparatus further includes a one-three module (not shown) configured to distribute the updated tag record information to second user equipment of a second user, where the second user includes a user, other than the first user, who is currently participating in the scene interaction of the current scene.
In some embodiments, the apparatus further includes a one-four module (not shown) configured to receive a scene creation request about the current scene sent by the first user equipment; respond to the scene creation request by generating scene identification information of the current scene, creating the current scene information according to the scene identification information, and storing the current scene information in the scene information base; and return the scene identification information of the current scene to the first user equipment. In some embodiments, the scene creation request includes scene positioning information of the current scene; generating the scene identification information of the current scene in response to the scene creation request and creating the current scene information according to the scene identification information then includes: responding to the scene creation request, generating the scene identification information of the current scene, and establishing the current scene information according to the scene positioning information and the scene identification information.
In some embodiments, the apparatus further includes a one-five module (not shown) configured to receive a scene interaction request about the current scene sent by corresponding second user equipment, where the scene interaction request includes the scene identification information of the current scene; determine, in response to the scene interaction request, the current scene positioning information of the current scene information of the current scene by matching in the scene information base based on the scene identification information; and send the current scene positioning information to the second user equipment.
In some embodiments, determining, in response to the scene interaction request, the current scene positioning information of the current scene information of the current scene by matching in the scene information base based on the scene identification information includes: responding to the scene interaction request, matching and determining the current scene information of the current scene in the scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, where the scene state information indicates either that initialization is complete or that it is not; and, if the scene state information indicates that initialization is complete, acquiring the current scene positioning information according to the current scene information. In some embodiments, the apparatus further includes a one-six module (not shown) configured to send, if the scene state information indicates that initialization is not complete, state prompt information about the current scene to the second user equipment, where the state prompt information includes a prompt that initialization of the current scene is not complete; and to obtain the current scene positioning information according to the current scene information once the scene state information of the current scene changes from initialization-incomplete to initialization-complete.
In some embodiments, the apparatus further includes a one-seven module (not shown) configured to receive second tag information about the current scene, uploaded by second user equipment and edited by the second user, where the second tag information includes corresponding second tag identification information and second tag position information, the second tag position information indicating the scene position of the second tag information in the current scene; and to create or update the tag record information of the current scene according to the second tag information.
In some embodiments, the second user is given editing rights with respect to the current scene information.
In some embodiments, the apparatus further includes a one-eight module (not shown) configured to obtain an editing operation of a participating user of the current scene on target tag information, update the target tag information according to the editing operation, and update the tag record information based on the updated target tag information, where the target tag information is included in the tag record information of the current scene information.
In some embodiments, the obtaining of the editing operation of the participating user of the current scene on the target tag information includes: acquiring candidate editing operations of a plurality of participating users of the current scene on target label information; determining an editing operation with respect to the target tag information from the candidate editing operations according to the user editing priority levels of the plurality of participating users.
In some embodiments, the apparatus further includes a one-nine module (not shown) configured to obtain an editing time of at least one participating user of the current scene with respect to the at least one piece of tag information; and update the tag record information of the current scene information according to the at least one piece of tag information and the editing time, where the tag record information includes at least one piece of tag information and each piece has at least one editing time corresponding to an editing operation.
In some embodiments, the apparatus further includes a one-ten module (not shown) configured to obtain editing operations of at least one participating user of the current scene with respect to the at least one piece of tag information and the editing time corresponding to each editing operation; and create or update a scene tag library of the current scene accordingly, where the scene tag library includes at least one piece of tag information and each piece has an editing time corresponding to at least one editing operation.
In some embodiments, the apparatus further includes a one-eleven module (not shown) configured to respond, if a tag backtracking request of an application user about the current scene is obtained, to the tag backtracking request, where the tag backtracking request includes the scene identification information of the current scene; determine, in response to the request, at least one piece of backtracking tag information of the current scene according to the scene identification information, where each piece of backtracking tag information includes at least one editing time; and return the at least one piece of backtracking tag information to the user equipment of the application user. In some embodiments, the application users include users not currently participating in the current scene; returning the at least one piece of backtracking tag information to the user equipment of the corresponding application user then includes: returning the at least one piece of backtracking tag information and the scene positioning information of the current scene to the user equipment of the application user, where the scene positioning information is used for scene initialization and the at least one piece of backtracking tag information is overlaid on the current scene after scene initialization completes.
In some embodiments, the tag backtracking request further includes a corresponding backtracking time; wherein, the determining at least one piece of backtracking tag information of the current scene according to the scene identification information in response to the tag backtracking request includes: and responding to the tag backtracking request, determining at least one piece of backtracking tag information of the current scene according to the scene identification information, and determining corresponding target backtracking tag information from the at least one piece of backtracking tag information according to the backtracking time information, wherein the editing time of each piece of target backtracking tag information corresponds to the backtracking time.
Here, the specific implementations corresponding to the one-three module through the one-eleven module are the same as or similar to the embodiments of steps S103 through S111 described above, and are therefore not repeated here but incorporated herein by reference.
Fig. 6 shows second user equipment for multi-person scene interaction according to another aspect of the present application, which includes a two-one module 201, a two-two module 202, and a two-three module 203. The two-one module 201 is configured to obtain the camera pose information of a camera device in the second user equipment; the two-two module 202 is configured to receive tag record information, issued by the corresponding network device, about the current scene of the multi-user interaction scene, where the tag record information includes at least one piece of tag information of the current scene, each piece includes corresponding tag identification information and tag position information, and the tag position information indicates the scene position of the corresponding tag information in the current scene; the two-three module 203 is configured to overlay the at least one piece of tag information on a display device of the second user equipment according to the tag position information of the at least one piece of tag information in the tag record information and the camera pose information.
Here, the specific implementations corresponding to the two-one module 201, the two-two module 202, and the two-three module 203 shown in Fig. 6 are the same as or similar to the embodiments of steps S201, S202, and S203 shown in Fig. 2, and are therefore not repeated here but incorporated herein by reference.
In some embodiments, the apparatus further includes a two-four module (not shown) configured to generate a scene interaction request about the current scene based on a join operation of the second user with respect to the current scene, where the scene interaction request includes the scene identification information of the current scene; send the scene interaction request to the corresponding network device; and receive the current scene positioning information about the current scene returned by the network device, completing scene initialization for the current scene according to it.
In some embodiments, the apparatus further includes a two-five module (not shown) configured to receive an editing operation, broadcast by other participating user equipment of the current scene, corresponding to the target tag information of other participating users, and update the target tag information according to the editing operation; and to overlay the updated target tag information on the display device of the second user equipment.
In some embodiments, the apparatus further includes a two-six module (not shown) configured to send a tag backtracking request about the current scene to the corresponding network device, where the request includes the scene identification information of the current scene; receive at least one piece of backtracking tag information of the current scene returned by the network device based on the request, each piece including its corresponding editing time; and present the at least one piece of backtracking tag information, with the presentation timeline corresponding to the editing time of each piece. In some embodiments, each piece of backtracking tag information further includes corresponding tag position information; presenting the at least one piece of backtracking tag information then includes: determining the currently presented backtracking tag information according to the current presentation time on the presentation timeline, the current presentation time being matched to the editing time; determining the presentation position information of the current backtracking tag information on the display device according to the camera pose information and the tag position information of the current backtracking tag information; and presenting the current backtracking tag information on the display device based on the presentation position information.
In some embodiments, the apparatus further includes a two-seven module (not shown) configured to obtain second tag information about the current scene edited by the second user, where the second tag information includes corresponding second tag identification information and second tag position information, the second tag position information indicating the scene position of the second tag information in the current scene; and to send the second tag information to the other user equipment participating in the current scene and the network device. In some embodiments, the apparatus further includes a two-eight module (not shown) configured to obtain an editing operation of the second user on the target tag information of the current scene and update the target tag information based on the editing operation; and to send the editing operation to the other user equipment participating in the current scene and the network device so that the target tag information is updated there as well.
Here, the specific implementations corresponding to the two-four module through the two-eight module are the same as or similar to the embodiments of steps S204 through S208 described above, and are therefore not repeated here but incorporated herein by reference.
In addition to the methods and apparatus described in the embodiments above, the present application also provides a computer readable storage medium storing computer code that, when executed, performs the method as described in any of the preceding claims.
The present application also provides a computer program product, which when executed by a computer device, performs the method of any of the preceding claims.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 7 illustrates an exemplary system that can be used to implement the various embodiments described herein.
In some embodiments, as shown in FIG. 7, the system 300 can be implemented as any of the devices described in the various embodiments above. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules and perform the actions described herein.
For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or any suitable device or component in communication with system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 315 may include a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).
In various embodiments, system 300 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media include media by which communication signals containing, for example, computer-readable instructions, data structures, program modules, or other data are transmitted from one system to another. Communication media may include wired transmission media, such as cables and wires (e.g., fiber optic, coaxial), and wireless (non-conductive) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer-readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or a similar mechanism such as those embodied as part of spread-spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital, or a hybrid modulation technique.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); nonvolatile memory such as flash memory and various read-only memories (ROM, PROM, EPROM, EEPROM); magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disk, tape, CD, DVD); and other media, now known or later developed, that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (29)

1. A multi-person scene interaction method, applied to network equipment, wherein the method comprises:
receiving first label information, uploaded by first user equipment, about a current scene of a multi-person scene interaction edited by a first user, wherein the first label information comprises corresponding first label identification information and first label position information, and the first label position information indicates the scene position of the first label information in the current scene;
and creating or updating label record information of current scene information of the current scene according to the first label information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the label record information comprises at least one piece of label information.
2. The method of claim 1, wherein the method further comprises:
distributing the updated label record information to second user equipment of second users, wherein the second users comprise the users, other than the first user, currently participating in the scene interaction of the current scene.
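By way of illustration outside the claim language, the server-side data model implied by claims 1 and 2 can be sketched in Python as follows; this is a minimal, non-limiting sketch, and every name in it (SceneInfoBase, LabelInfo, receive, and so on) is an assumption made for illustration rather than an identifier from this application:

from dataclasses import dataclass, field

@dataclass
class LabelInfo:
    label_id: str      # label identification information
    position: tuple    # label position information: scene position (x, y, z)

@dataclass
class SceneInfo:
    scene_id: str      # scene identification information
    positioning: object  # scene positioning information
    label_records: dict = field(default_factory=dict)  # label record information

class SceneInfoBase:
    """Scene information base holding one or more pieces of scene information."""
    def __init__(self):
        self.scenes = {}  # maps scene identification information to SceneInfo

    def upsert_label(self, scene_id, label, second_user_devices=()):
        # Claim 1: create or update the label record information of the
        # current scene according to the uploaded first label information.
        scene = self.scenes[scene_id]
        scene.label_records[label.label_id] = label
        # Claim 2: distribute the updated label record information to the
        # second users currently participating in the scene interaction.
        for device in second_user_devices:
            device.receive(scene.label_records)  # `receive` is an assumed client hook
        return scene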
3. The method of claim 1, wherein the method further comprises:
receiving a scene creation request about the current scene sent by the first user equipment;
in response to the scene creation request, generating scene identification information of the current scene, creating the current scene information according to the scene identification information, and storing the current scene information in the scene information base;
and returning the scene identification information of the current scene to the first user equipment.
4. The method of claim 3, wherein the scene creation request includes scene positioning information of the current scene;
wherein the generating scene identification information of the current scene in response to the scene creation request, and creating the current scene information according to the scene identification information comprises:
in response to the scene creation request, generating scene identification information of the current scene, and creating the current scene information according to the scene positioning information and the scene identification information.
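A corresponding non-limiting sketch of the scene creation flow of claims 3 and 4 might look as follows; here the scene information base is modeled as a plain dictionary keyed by scene identification information, and all names are again chosen only for illustration:

import uuid

def handle_scene_create_request(scene_base, positioning=None):
    """Generate scene identification information, create the current scene
    information (per claim 4, using any scene positioning information the
    request carries), store it in the scene information base, and return the
    scene id to the first user equipment."""
    scene_id = uuid.uuid4().hex  # generated scene identification information
    scene_base[scene_id] = {
        "positioning": positioning,  # scene positioning information
        "labels": {},                # label record information, initially empty
        "initialized": positioning is not None,
    }
    return scene_id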
5. The method of claim 1, wherein the method further comprises:
receiving a scene interaction request about the current scene sent by corresponding second user equipment, wherein the scene interaction request comprises the scene identification information of the current scene;
in response to the scene interaction request, matching and determining, in the scene information base and based on the scene identification information, the current scene positioning information of the current scene information of the current scene;
and sending the current scene positioning information to the second user equipment.
6. The method of claim 5, wherein the matching and determining, in response to the scene interaction request, in the scene information base and based on the scene identification information, the current scene positioning information of the current scene information of the current scene comprises:
in response to the scene interaction request, matching and determining the current scene information of the current scene in the scene information base based on the scene identification information, and determining scene state information of the current scene according to the current scene information, wherein the scene state information comprises initialization complete or initialization incomplete;
and if the scene state information comprises initialization complete, obtaining the current scene positioning information according to the current scene information.
7. The method of claim 6, wherein the method further comprises:
if the scene state information comprises initialization incomplete, sending state prompt information about the current scene to the second user equipment, wherein the state prompt information comprises prompt information indicating that initialization of the current scene is incomplete;
and obtaining the current scene positioning information according to the current scene information once the scene state information of the current scene changes from initialization incomplete to initialization complete.
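The join handshake of claims 5 to 7 amounts to a state check before the positioning data is released; a minimal, non-limiting sketch, assuming the dictionary-based scene records from the sketch after claim 4:

def handle_scene_interaction_request(scene_base, scene_id):
    scene = scene_base[scene_id]  # match the scene by its identification information
    if not scene["initialized"]:
        # Claim 7: initialization incomplete, so return state prompt
        # information; the caller retries until the state flips.
        return {"state": "initialization incomplete"}
    # Claim 6: initialization complete, so hand back the current scene
    # positioning information for the second user equipment to localize with.
    return {"state": "initialization complete", "positioning": scene["positioning"]}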
8. The method of any of claims 1 to 7, wherein the method further comprises:
receiving second label information about the current scene, edited by a second user and uploaded by second user equipment, wherein the second label information comprises corresponding second label identification information and second label position information, and the second label position information indicates the scene position of the second label information in the current scene;
and creating or updating the label record information of the current scene according to the second label information.
9. The method of claim 8, wherein the second user is given an editing right with respect to the current scene information.
10. The method of claim 1, wherein the method further comprises:
acquiring an editing operation performed by a participating user of the current scene on target label information, updating the target label information according to the editing operation, and updating the label record information based on the updated target label information, wherein the target label information is contained in the label record information of the current scene information.
11. The method of claim 10, wherein the acquiring an editing operation performed by a participating user of the current scene on target label information comprises:
acquiring candidate editing operations of a plurality of participating users of the current scene on the target label information;
and determining the editing operation on the target label information from the candidate editing operations according to the user editing priority levels of the plurality of participating users.
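Claim 11 resolves concurrent edits by user editing priority level. One plausible reading is sketched below, with the convention (an assumption, since the claim does not fix one) that a smaller number means a higher editing priority level:

def pick_editing_operation(candidate_ops):
    # Choose the single editing operation to apply to the target label
    # information from several participants' candidate operations.
    return min(candidate_ops, key=lambda op: op["priority"])

candidates = [
    {"user": "A", "priority": 2, "action": "move", "to": (1.0, 0.5, 0.0)},
    {"user": "B", "priority": 1, "action": "delete"},
]
assert pick_editing_operation(candidates)["user"] == "B"  # B outranks A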
12. The method of claim 1, wherein the method further comprises:
acquiring the editing time of at least one participating user of the current scene with respect to at least one piece of label information;
and updating the label record information of the current scene information according to the at least one piece of label information and the editing time, wherein the label record information comprises the at least one piece of label information, and each piece of label information has at least one editing time corresponding to an editing operation.
13. The method of claim 1, wherein the method further comprises:
acquiring editing operations of at least one participating user of the current scene on at least one piece of label information, and the editing time corresponding to each editing operation;
and creating or updating a scene label library of the current scene according to the editing operations on the at least one piece of label information and the editing time corresponding to each editing operation, wherein the scene label library comprises the at least one piece of label information, and each piece of label information has the editing time corresponding to at least one editing operation.
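Claims 12 and 13 attach an editing time to every editing operation on a label, which is what makes the backtracking of claims 14 to 16 possible. A minimal sketch of such a per-label edit history follows; the record structure is assumed for illustration only:

import time

def record_label_edit(scene_label_library, label_id, edit_op):
    # Each label carries at least one editing time, one per editing operation.
    history = scene_label_library.setdefault(label_id, [])
    history.append({"op": edit_op, "edit_time": time.time()})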
14. The method of claim 12 or 13, wherein the method further comprises:
acquiring a label backtracking request of an application user about the current scene, wherein the label backtracking request comprises the scene identification information of the current scene;
in response to the label backtracking request, determining at least one piece of backtracking label information of the current scene according to the scene identification information, wherein each piece of backtracking label information comprises at least one editing time;
and returning the at least one piece of backtracking label information to the user equipment of the application user.
15. The method of claim 14, wherein the application user comprises a user not currently participating in the current scene; wherein the returning the at least one piece of backtracking label information to the user equipment of the application user comprises:
returning the at least one piece of backtracking label information and the scene positioning information of the current scene to the user equipment of the application user, wherein the scene positioning information is used for scene initialization, and the at least one piece of backtracking label information is superimposed and presented on the current scene after the scene initialization is completed.
16. The method of claim 14, wherein the label backtracking request further comprises a corresponding backtracking time; wherein the determining, in response to the label backtracking request, at least one piece of backtracking label information of the current scene according to the scene identification information comprises:
in response to the label backtracking request, determining at least one piece of backtracking label information of the current scene according to the scene identification information, and determining corresponding target backtracking label information from the at least one piece of backtracking label information according to the backtracking time, wherein the editing time of each piece of target backtracking label information corresponds to the backtracking time.
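Given such edit histories, the backtracking of claims 14 to 16 reduces to selecting, per label, the newest edit at or before the requested backtracking time; a non-limiting sketch under the same assumed structure as the sketch after claim 13:

def backtrack_labels(scene_label_library, backtrack_time):
    snapshot = {}
    for label_id, history in scene_label_library.items():
        past = [h for h in history if h["edit_time"] <= backtrack_time]
        if past:  # the label already existed at the backtracking time
            snapshot[label_id] = max(past, key=lambda h: h["edit_time"])
    return snapshot  # target backtracking label information, per claim 16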
17. A multi-person scene interaction method, applied to second user equipment, wherein the method comprises:
acquiring camera pose information of a camera device in the second user equipment;
receiving label record information about a current scene of a multi-person interaction scene issued by corresponding network equipment, wherein the label record information comprises at least one piece of label information of the current scene, each piece of label information comprises corresponding label identification information and label position information, and the label position information indicates the scene position of the corresponding label information in the current scene;
and superimposing and presenting the at least one piece of label information on a display device of the second user equipment according to the label position information of the at least one piece of label information in the label record information and the camera pose information.
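On the client side, superimposing a label according to its scene position and the camera pose information is, in the simplest reading, a pinhole projection. The sketch below assumes a world-to-camera rotation/translation convention and camera intrinsics (fx, fy, cx, cy) that the application itself does not specify:

import numpy as np

def project_label(label_pos_world, rotation, translation, fx, fy, cx, cy):
    # Transform the label's scene position into the camera frame ...
    p_cam = rotation @ np.asarray(label_pos_world, dtype=float) + translation
    if p_cam[2] <= 0:
        return None  # behind the camera: do not draw the label
    # ... then project to pixel coordinates on the display device.
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return (u, v)

# Identity pose, label 2 m straight ahead: projects to the principal point.
print(project_label([0.0, 0.0, 2.0], np.eye(3), np.zeros(3), 800.0, 800.0, 640.0, 360.0))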
18. The method of claim 17, wherein the method further comprises:
generating a scene interaction request about the current scene based on a joining operation of the second user with respect to the current scene, wherein the scene interaction request comprises the scene identification information of the current scene;
sending the scene interaction request to the corresponding network equipment;
and receiving current scene positioning information about the current scene returned by the network equipment, and completing scene initialization for the current scene according to the current scene positioning information.
19. The method of claim 17, wherein the method further comprises:
receiving an editing operation on target label information broadcast by the user equipment of another participating user of the current scene, and updating the target label information according to the editing operation;
and superimposing and presenting the updated target label information on the display device of the second user equipment.
20. The method of claim 17, wherein the method further comprises:
sending a label backtracking request about the current scene to the corresponding network equipment, wherein the label backtracking request comprises the scene identification information of the current scene;
receiving at least one piece of backtracking label information of the current scene returned by the network equipment based on the label backtracking request, wherein each piece of backtracking label information comprises a corresponding editing time;
and presenting the at least one piece of backtracking label information along a presentation timeline corresponding to the editing time of each piece of backtracking label information.
21. The method of claim 17, wherein the method further comprises:
acquiring second label information about the current scene edited by the second user, wherein the second label information comprises corresponding second label identification information and second label position information, and the second label position information indicates the scene position of the second label information in the current scene;
and sending the second label information to the user equipment of the other users participating in the current scene and to the network equipment.
22. The method of claim 17, wherein the method further comprises:
acquiring an editing operation of the second user on target label information of the current scene, and updating the target label information based on the editing operation;
and sending the editing operation to the user equipment of the other users participating in the current scene and to the network equipment, so that the target label information is updated in the other user equipment and the network equipment.
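The edit propagation of claims 19 and 22 is a simple fan-out: the editing client sends the operation both to the network equipment, so that the label record information stays current, and to every other participating user equipment, so that each peer re-renders the updated label. A sketch with stand-in endpoints, since the application does not prescribe a transport:

class Endpoint:
    """Stand-in for the network equipment or a peer's user equipment."""
    def __init__(self, name):
        self.name = name
    def send(self, edit_op):
        print(f"{self.name} <- {edit_op}")

def broadcast_edit(edit_op, network_equipment, peers):
    network_equipment.send(edit_op)  # claim 22: keep the server-side record current
    for peer in peers:
        peer.send(edit_op)           # claim 19: peers apply and re-present the label

broadcast_edit({"label_id": "L1", "action": "move", "to": (1.0, 2.0, 0.0)},
               Endpoint("network"), [Endpoint("peer-B"), Endpoint("peer-C")])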
23. A multi-person scene interaction method, wherein the method comprises:
the network equipment receives first label information, uploaded by first user equipment, about a current scene of a multi-person scene interaction edited by a first user, wherein the first label information comprises corresponding first label identification information and first label position information, and the first label position information indicates the scene position of the first label information in the current scene;
the network equipment creates or updates label record information of current scene information of the current scene according to the first label information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the label record information comprises at least one piece of label information;
and the second user equipment acquires camera pose information of a camera device in the second user equipment, receives the label record information about the current scene of the multi-person interaction scene issued by the network equipment, and superimposes and presents at least one piece of label information on a display device of the second user equipment according to the label position information of the at least one piece of label information in the label record information and the camera pose information.
24. A multi-person scene interaction method, wherein the method comprises:
the second user equipment acquires camera pose information of a camera device in the second user equipment, and receives label record information about a current scene of a multi-person interaction scene issued by corresponding network equipment, wherein the label record information comprises at least one piece of label information of the current scene, each piece of label information comprises corresponding label identification information and label position information, and the label position information indicates the scene position of the corresponding label information in the current scene; the second user equipment superimposes and presents the at least one piece of label information on a display device of the second user equipment according to the label position information of the at least one piece of label information in the label record information and the camera pose information;
the first user equipment acquires an editing operation of a first user on target label information, and broadcasts the editing operation to the corresponding network equipment and the second user equipment;
and the second user equipment receives the editing operation, updates the target label information according to the editing operation, and superimposes and presents the updated target label information on the display device of the second user equipment.
25. Network equipment for multi-person scene interaction, wherein the equipment comprises:
a first module, configured to receive first label information, uploaded by first user equipment, about a current scene of a multi-person scene interaction edited by a first user, wherein the first label information comprises corresponding first label identification information and first label position information, and the first label position information indicates the scene position of the first label information in the current scene;
and a second module, configured to create or update label record information of current scene information of the current scene according to the first label information, wherein the current scene information is stored in a scene information base, the scene information base comprises one or more pieces of scene information, each piece of scene information comprises corresponding scene identification information and scene positioning information, and the label record information comprises at least one piece of label information.
26. Second user equipment for multi-person scene interaction, wherein the equipment comprises:
a first module, configured to acquire camera pose information of a camera device in the second user equipment;
a second module, configured to receive label record information about a current scene of a multi-person interaction scene issued by corresponding network equipment, wherein the label record information comprises at least one piece of label information of the current scene, each piece of label information comprises corresponding label identification information and label position information, and the label position information indicates the scene position of the corresponding label information in the current scene;
and a third module, configured to superimpose and present the at least one piece of label information on a display device of the second user equipment according to the label position information of the at least one piece of label information in the label record information and the camera pose information.
27. A computer device, wherein the device comprises:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the steps of the method of any one of claims 1 to 22.
28. A computer-readable storage medium having computer programs/instructions stored thereon, wherein the computer programs/instructions, when executed, cause a system to perform the steps of the method of any one of claims 1 to 22.
29. A computer program product comprising computer programs/instructions, wherein the computer programs/instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 22.
CN202111517619.6A 2021-12-13 2021-12-13 Method, equipment, storage medium and program product for interaction of multiple scenes Active CN114332417B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111517619.6A CN114332417B (en) 2021-12-13 2021-12-13 Method, equipment, storage medium and program product for interaction of multiple scenes
PCT/CN2022/110521 WO2023109153A1 (en) 2021-12-13 2022-08-05 Multi-person scene interaction method and device, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111517619.6A CN114332417B (en) 2021-12-13 2021-12-13 Method, equipment, storage medium and program product for interaction of multiple scenes

Publications (2)

Publication Number Publication Date
CN114332417A (en) 2022-04-12
CN114332417B (en) 2023-07-14

Family

ID=81050834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111517619.6A Active CN114332417B (en) 2021-12-13 2021-12-13 Method, equipment, storage medium and program product for interaction of multiple scenes

Country Status (2)

Country Link
CN (1) CN114332417B (en)
WO (1) WO2023109153A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105321006A (en) * 2014-06-16 2016-02-10 深圳市雅都软件股份有限公司 Multi-scene service information automatic sharing platform with time tag and method
CN108465240A (en) * 2018-03-22 2018-08-31 腾讯科技(深圳)有限公司 Mark point position display method, device, terminal and computer readable storage medium
US20190156558A1 (en) * 2016-08-31 2019-05-23 Factualvr, Inc. Virtual reality system
WO2019218815A1 (en) * 2018-05-18 2019-11-21 腾讯科技(深圳)有限公司 Method and device for displaying marker elements in virtual scene, computer device, and computer-readable storage medium
CN111143629A (en) * 2019-12-31 2020-05-12 广州信天翁信息科技有限公司 Data storage, processing and acquisition method based on identification recognition and related device thereof
CN111176517A (en) * 2019-12-31 2020-05-19 青岛海尔科技有限公司 Method and device for setting scene and mobile phone
CN112597147A (en) * 2021-03-02 2021-04-02 北京轻松筹信息技术有限公司 Method for generating scene configuration strategy, scene configuration method and device thereof
CN113741698A (en) * 2021-09-09 2021-12-03 亮风台(上海)信息科技有限公司 Method and equipment for determining and presenting target mark information

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871912A (en) * 2017-12-04 2019-06-11 深圳市易特科信息技术有限公司 Virtual reality scenario simulator and method
CN110738737A (en) * 2019-10-15 2020-01-31 北京市商汤科技开发有限公司 AR scene image processing method and device, electronic equipment and storage medium
CN111476911B (en) * 2020-04-08 2023-07-25 Oppo广东移动通信有限公司 Virtual image realization method, device, storage medium and terminal equipment
CN114332417B (en) * 2021-12-13 2023-07-14 亮风台(上海)信息科技有限公司 Method, equipment, storage medium and program product for interaction of multiple scenes

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023109153A1 (en) * 2021-12-13 2023-06-22 亮风台(上海)信息科技有限公司 Multi-person scene interaction method and device, storage medium and program product
CN115439635A (en) * 2022-06-30 2022-12-06 亮风台(上海)信息科技有限公司 Method and equipment for presenting mark information of target object
CN115439635B (en) * 2022-06-30 2024-04-26 亮风台(上海)信息科技有限公司 Method and equipment for presenting marking information of target object
CN117369633A (en) * 2023-10-07 2024-01-09 上海铱奇科技有限公司 AR-based information interaction method and system
CN117745988A (en) * 2023-12-20 2024-03-22 亮风台(上海)信息科技有限公司 Method and equipment for presenting AR label information

Also Published As

Publication number Publication date
WO2023109153A1 (en) 2023-06-22
CN114332417B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN114332417B (en) Method, equipment, storage medium and program product for interaction of multiple scenes
CN113741698B (en) Method and device for determining and presenting target mark information
Langlotz et al. Sketching up the world: in situ authoring for mobile augmented reality
US20240078703A1 (en) Personalized scene image processing method, apparatus and storage medium
CN110399562A (en) For the device for displaying information of social application, method, displaying terminal and medium
CN108776917B (en) Synchronous processing method and device for virtual three-dimensional space
CN114445500B (en) Augmented reality scene construction method, device, terminal equipment and storage medium
CN114387400A (en) Three-dimensional scene display method, display device, electronic equipment and server
CN105635256A (en) Multimedia synchronization method, device and system
CN108765084B (en) Synchronous processing method and device for virtual three-dimensional space
CN113965773A (en) Live broadcast display method and device, storage medium and electronic equipment
CN111583348A (en) Image data encoding method and device, display method and device, and electronic device
CN114529690B (en) Augmented reality scene presentation method, device, terminal equipment and storage medium
TWM506428U (en) Display system for video stream on augmented reality
CN113965665A (en) Method and equipment for determining virtual live broadcast image
CN113596574A (en) Video processing method, video processing apparatus, electronic device, and readable storage medium
CN110047035B (en) Panoramic video hot spot interaction system and interaction equipment
KR20200100259A (en) Cloud based augmented reality platform device with multi-user participation
CN112929685B (en) Interaction method and device for VR live broadcast room, electronic device and storage medium
CN115278084A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115562480A (en) Method and device for augmented reality
CN115049574A (en) Video processing method and device, electronic equipment and readable storage medium
CN114143568A (en) Method and equipment for determining augmented reality live image
CN114387402A (en) Virtual reality scene display method and device, electronic equipment and readable storage medium
CN111866548A (en) Marking method applied to medical video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP02 Change in the address of a patent holder
CP02 Change in the address of a patent holder

Address after: 201210 7th Floor, No. 1, Lane 5005, Shenjiang Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.

Address before: Room 501 / 503-505, 570 shengxia Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 201203

Patentee before: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.