CN112581630B - User interaction method and system - Google Patents

User interaction method and system

Info

Publication number
CN112581630B
CN112581630B (application CN202011440875.5A)
Authority
CN
China
Prior art keywords
information
user
virtual object
camera
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011440875.5A
Other languages
Chinese (zh)
Other versions
CN112581630A (en)
Inventor
李江亮
周硙
牛旭恒
方俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yimu Technology Co ltd
Original Assignee
Beijing Yimu Technology Co ltd
Filing date
Publication date
Application filed by Beijing Yimu Technology Co ltd filed Critical Beijing Yimu Technology Co ltd
Priority to CN202011440875.5A priority Critical patent/CN112581630B/en
Publication of CN112581630A publication Critical patent/CN112581630A/en
Priority to PCT/CN2021/129727 priority patent/WO2022121606A1/en
Priority to TW110143724A priority patent/TWI800113B/en
Application granted granted Critical
Publication of CN112581630B publication Critical patent/CN112581630B/en
Legal status: Active

Links

Abstract

A user interaction method and system are provided. The user is located in a scene in which a sensor and visual markers are deployed, the sensor being operable to sense or determine position information of the user in the scene. The method comprises: receiving information sent by a first device of a first user, the information comprising spatial position information of the first device and identification information of the first user or the first device; identifying the first user within the sensing range of the sensor based on the spatial position information of the first device; associating the identification information of the first user or the first device with the first user within the sensing range of the sensor; tracking the first user through the sensor and updating the spatial position information of the first user; setting related information of a first virtual object associated with the first user; and sending the related information of the first virtual object to a second device of a second user.

Description

User interaction method and system
Technical Field
The invention belongs to the technical field of augmented reality, and particularly relates to a user interaction method and system.
Background
The statements in this section merely provide background information related to the present disclosure to aid in its understanding and do not necessarily constitute prior art.
Social contact is a basic human need. Online social networking has matured, and large online social platforms such as WeChat and Facebook have emerged. Correspondingly, some platforms aim mainly at enabling offline social contact, but the existing offline social platforms actually follow an online-plus-offline model: matching, communication, and organization are handled online in advance, and people then meet face to face offline. There is currently no purely offline social solution that does not require prior online communication. However, people encounter a large number of strangers every day and may need to communicate with them. Without knowledge of a stranger's information (e.g., occupation, interests, views, etc.), people's willingness to communicate is hindered and communication efficiency is reduced. There is therefore a continuing need to enable people to communicate offline in a more convenient and efficient manner.
Disclosure of Invention
One aspect of the application relates to a user interaction method, the user being located in a scene in which one or more sensors and one or more visual markers are deployed, the sensors being operable to sense or determine position information of the user in the scene, the method comprising: receiving information sent by a first device of a first user, the information comprising spatial position information of the first device and identification information of the first user or the first device, the first device determining the spatial position information by scanning the visual marker; identifying the first user within the sensing range of the sensor based on the spatial position information of the first device; associating the identification information of the first user or the first device with the first user within the sensing range of the sensor; tracking the first user through the sensor and updating the spatial position information of the first user; setting related information of a first virtual object associated with the first user, the related information comprising content information and spatial position information, the set spatial position information of the first virtual object being related to the spatial position information of the first user; and sending the related information of the first virtual object to a second device of a second user.
Another aspect of the application relates to a user interaction system, the system comprising: one or more sensors deployed in a scene, the sensors being operable to sense or determine location information of a user in the scene; one or more visual markers deployed in the scene; and a server configured to implement the method described in the embodiments of the present application.
Another aspect of the application relates to a storage medium in which a computer program is stored which, when being executed by a processor, can be used to carry out the method described in the embodiments of the application.
Another aspect of the application relates to an electronic device comprising a processor and a memory, the memory having stored therein a computer program which, when executed by the processor, is operable to carry out the method described in the embodiments of the application.
The scheme of the application thus provides a user interaction method and system that enable people to communicate offline in a more convenient and efficient manner.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 illustrates an optical communication device that may serve as a visual marker;
FIG. 2 illustrates a user interaction system according to one embodiment;
FIG. 3 illustrates a user interaction method according to one embodiment;
FIG. 4 illustrates a first user observed by a second user through his device and a virtual object associated with the first user, according to one embodiment;
FIG. 5 shows an actual image viewed by a user through his cell phone screen according to one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by the following examples with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
A visual marker refers to a marker that can be recognized by the human eye or an electronic device, and it can take a variety of forms. In some embodiments, the visual marker may be used to convey information that can be obtained by a smart device (e.g., a cell phone, smart glasses, etc.). For example, the visual marker may be an optical communication device capable of emitting coded light information, or it may be a graphic carrying coded information, such as a two-dimensional code (e.g., a QR code or applet code) or a bar code. An exemplary visual marker has a specific black-and-white pattern. FIG. 1 shows an optical communication device 100 that may be used as a visual marker, comprising three light sources (a first light source 101, a second light source 102, and a third light source 103). The optical communication device 100 further comprises a controller (not shown in FIG. 1) for selecting a respective driving mode for each light source according to the information to be conveyed. For example, in different driving modes, the controller may drive the light sources with different driving signals, so that when the optical communication device 100 is photographed by a device having an imaging function, the imaging of each light source takes on a different appearance (e.g., different color, pattern, or brightness). By analyzing the imaging of the light sources in the optical communication device 100, the driving mode of each light source at that moment can be determined, and thus the information conveyed by the optical communication device 100 at that moment can be recovered.
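As a purely illustrative sketch (not the patent's actual encoding scheme), the following Python fragment shows one way a receiving device might turn per-frame brightness observations of the three light sources into a bit sequence; the frame data, the one-bit-per-source-per-frame encoding, and the brightness threshold are all assumptions made for the example.

    # Hypothetical sketch: decode an optical communication device from per-frame
    # observations of its light sources. The 1-bit-per-source-per-frame encoding
    # and the brightness threshold are assumptions for illustration only.

    # Each frame records the measured brightness of the three light sources.
    frames = [
        {"src1": 0.9, "src2": 0.1, "src3": 0.8},
        {"src1": 0.2, "src2": 0.7, "src3": 0.9},
    ]

    THRESHOLD = 0.5  # assumed brightness threshold separating "on" from "off"

    def decode(frames):
        bits = []
        for frame in frames:
            for key in ("src1", "src2", "src3"):
                bits.append(1 if frame[key] >= THRESHOLD else 0)
        return bits

    print(decode(frames))  # e.g. [1, 0, 1, 0, 1, 1]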
To provide a user with services based on the visual markers, each visual marker may be assigned identification information (ID) by the manufacturer, manager, or user of the visual marker for uniquely identifying it. The user may use a device to capture an image of the visual marker to obtain the identification information it conveys, so that a corresponding service can be accessed based on that identification information, e.g., accessing a web page associated with the identification information, or obtaining other information associated with it (e.g., the position or pose information of the visual marker corresponding to the identification information). The devices referred to herein may be, for example, devices carried or controlled by a user (e.g., cell phones, tablet computers, smart glasses, AR glasses, smart helmets, smart watches, automobiles, etc.), or machines capable of autonomous movement (e.g., driverless cars, robots, etc.). A device may acquire an image containing the visual marker through its image acquisition device, and by analyzing the imaging of the visual marker in that image may identify the information conveyed by the visual marker and determine the position or pose of the device relative to the visual marker.
A sensor capable of sensing the position of a target may be any of various sensors that can be used to sense or determine the position information of a target in the scene, such as a camera, radar (e.g., lidar or millimeter-wave radar), a wireless signal transceiver, and the like. The target in the scene may be a person or an object. In the following embodiments, a camera is used as an example of the sensor.
FIG. 2 shows a user interaction system according to one embodiment, which comprises a visual marker 301, a camera 302, and a server (not shown in FIG. 2). The camera and the visual marker are each deployed in a real scene with a particular position and attitude (hereinafter collectively referred to as a "pose"); a first user 303 and a second user 305 carry a first device 304 and a second device 306, respectively. The first device 304 and the second device 306 have image acquisition means and are capable of recognizing the visual marker 301 through them. The first device 304 and the second device 306 may be, for example, cell phones, glasses, etc.
In one embodiment, the server may obtain the pose information of the camera and of the visual marker and, based on these, obtain the relative pose information between the camera and the visual marker. In one embodiment, the server may also obtain the relative pose information between the camera and the visual marker directly. In this way, the server may obtain a transformation matrix between the camera coordinate system and the visual marker coordinate system, which may comprise, for example, a rotation matrix R and a displacement vector t between the two coordinate systems. Coordinates in one coordinate system can be converted to coordinates in the other coordinate system through this transformation matrix. The camera may be mounted in a fixed position with a fixed orientation, but it is understood that the camera may also be movable (e.g., its position may change or its direction may be adjusted) as long as its current pose information can be determined. The current pose of the camera may be set by the server, which controls the camera's movement based on that pose information, or the camera's movement may be controlled by the camera itself or by another device, which then sends the camera's current pose information to the server. In some embodiments, the system may include more than one camera and more than one visual marker.
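As an illustrative sketch of the coordinate conversion just described, the following Python fragment applies an assumed rotation matrix R and displacement vector t to move points between the visual marker coordinate system and the camera coordinate system; the convention p_camera = R p_marker + t and the numerical values are assumptions, not values from the patent.

    import numpy as np

    # Assumed convention: p_camera = R @ p_marker + t, i.e. (R, t) maps a point
    # expressed in the visual-marker coordinate system into the camera frame.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])   # example rotation (90 degrees about x)
    t = np.array([0.5, 0.0, 2.0])     # example displacement vector, in meters

    def marker_to_camera(p_marker):
        return R @ p_marker + t

    def camera_to_marker(p_camera):
        # Inverse transform: p_marker = R^T @ (p_camera - t)
        return R.T @ (p_camera - t)

    p = np.array([0.0, 0.0, 1.0])
    print(marker_to_camera(p))
    print(camera_to_marker(marker_to_camera(p)))  # recovers p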
In one embodiment, a scene coordinate system (which may also be referred to as a real world coordinate system) may be established for the real scene, and a transformation matrix between the camera coordinate system and the scene coordinate system may be determined based on pose information of the camera in the real scene, and a transformation matrix between the visual marker coordinate system and the scene coordinate system may be determined based on pose information of the visual marker in the real scene. In this case, the coordinates in the camera coordinate system or visual marker coordinate system may be converted to coordinates in the scene coordinate system without transforming between the camera coordinate system and the visual marker coordinate system, but it will be appreciated that the relative pose information or transformation matrix between the camera and the visual marker can still be known by the server. Thus, in the present application, having a relative pose between the camera and the visual marker means that there is objectively a relative pose between the two, and the system is not required to store the relative pose information between the two in advance or use the relative pose information. For example, in one embodiment, only pose information for each of the camera and visual markers in the scene coordinate system may be stored in the system, and the relative pose of the two may not be calculated or used.
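The composition of poses described above can be sketched as follows; the 4x4 homogeneous-matrix representation and the example poses are assumptions made for illustration, showing how the camera-marker relative pose can be derived on demand from the two scene-frame poses without being stored in advance.

    import numpy as np

    def pose_matrix(R, t):
        # Build a 4x4 homogeneous transform from rotation R and translation t.
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        return T

    # Assumed poses in the scene (real world) coordinate system.
    T_scene_camera = pose_matrix(np.eye(3), np.array([3.0, 1.0, 2.5]))
    T_scene_marker = pose_matrix(np.eye(3), np.array([0.0, 1.5, 1.0]))

    # Relative pose of the marker in the camera frame, derived on demand:
    T_camera_marker = np.linalg.inv(T_scene_camera) @ T_scene_marker

    # A point known in the marker frame, expressed in scene coordinates:
    p_marker = np.array([0.1, 0.0, 0.0, 1.0])   # homogeneous coordinates
    p_scene = T_scene_marker @ p_marker
    print(T_camera_marker[:3, 3], p_scene[:3])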
The camera is used to track targets in the real scene; a target may be stationary or moving and may be, for example, a person in the scene, a stationary object, or a movable object. The camera can track the position of a person or object in the real scene by various methods known in the art. For example, when a single monocular camera is used, the position information of the target in the scene may be determined in combination with scene information (e.g., information about the plane on which a person or object in the scene is located). When a binocular camera is used, the position information of the target may be determined from the target's position in the camera's field of view and the target's depth information. When multiple cameras are used, the position information of the target may be determined from the target's position in the fields of view of the respective cameras.
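For the single-monocular-camera case mentioned above, one possible (assumed) implementation is to intersect the viewing ray of the target's pixel with the known floor plane; the intrinsics, camera pose, and plane height in the sketch below are illustrative values only.

    import numpy as np

    # Assumed pinhole intrinsics and camera pose in the scene frame.
    K = np.array([[800.0, 0.0, 640.0],
                  [0.0, 800.0, 360.0],
                  [0.0, 0.0, 1.0]])
    R_wc = np.diag([1.0, -1.0, -1.0])   # camera looking straight down (assumed)
    C = np.array([0.0, 0.0, 3.0])       # camera center, 3 m above the floor

    def position_on_ground(u, v, ground_z=0.0):
        # Back-project pixel (u, v) to a viewing ray in scene coordinates,
        # then intersect the ray with the plane z = ground_z (the floor).
        ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
        ray_scene = R_wc @ ray_cam
        s = (ground_z - C[2]) / ray_scene[2]   # ray parameter at the floor
        return C + s * ray_scene

    print(position_on_ground(700.0, 500.0))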
It will be appreciated that the system may have multiple visual markers or multiple cameras, and that the fields of view of the multiple cameras may be continuous or discontinuous.
FIG. 3 illustrates a user interaction method according to one embodiment, which may be implemented using the system described above, and which may include the steps of:
Step 401: information sent by first equipment of a first user is received, wherein the information comprises spatial position information of the first equipment and identification information of the first user or the first equipment.
The information sent by the device may be of various kinds, such as service request information, help information, comment information, or alarm information. The identification information of the device or user may be any information that can be used to identify the device or user, such as device ID information, the phone number of the device, account information of an application on the device, the user's name or nickname, the user's identity information, the user's account information, etc.
In one embodiment, the first user may use the first device to determine the spatial position information of the first device by scanning a visual marker deployed in the scene. The first user may send information to the server via the first device, the information including the spatial position information of the first device, which may be the spatial position of the first device relative to the visual marker or its spatial position in the scene. In one embodiment, the first device may be used to capture an image of the visual marker; the identification information of the visual marker and the spatial position of the first device relative to the visual marker are determined by analyzing the captured image; the position and pose of the visual marker in space are obtained through the identification information of the visual marker; and the spatial position of the first device in the scene is determined based on the position and pose of the visual marker in space and the spatial position of the first device relative to the visual marker. In one embodiment, the first device may send the identification information of the visual marker and its spatial position relative to the visual marker to the server, so that the server determines the spatial position of the first device in the scene.
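The position chain described in this embodiment can be sketched as follows; the marker lookup table, the 4x4 pose representation, and the notion that the device's pose relative to the marker comes from analyzing the marker image (e.g., a PnP-style estimate) are assumptions of the example.

    import numpy as np

    def pose_matrix(R, t):
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        return T

    # Assumed server-side lookup: marker ID -> pose of the marker in the scene.
    MARKER_POSES = {
        "marker_042": pose_matrix(np.eye(3), np.array([10.0, 4.0, 1.5])),
    }

    def device_position_in_scene(marker_id, T_marker_device):
        # T_marker_device: pose of the device in the marker frame, e.g. obtained
        # by the device analyzing its image of the marker (PnP-style estimate).
        T_scene_marker = MARKER_POSES[marker_id]
        T_scene_device = T_scene_marker @ T_marker_device
        return T_scene_device[:3, 3]

    T_marker_device = pose_matrix(np.eye(3), np.array([0.0, 0.0, 2.0]))
    print(device_position_in_scene("marker_042", T_marker_device))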
In one embodiment, the first device may also be used to determine pose information of the first device relative to the visual marker or pose information of the first device in the scene by scanning the visual marker, and may send the pose information to the server.
In one embodiment, the spatial position information and pose information of the first device may be those at the time the visual marker is scanned, or real-time position and pose information at any time after the visual marker is scanned. For example, the first device may determine its initial spatial position and pose when scanning the visual marker, and then measure or track its position and/or pose changes using various built-in sensors (e.g., acceleration sensor, magnetometer, orientation sensor, gravity sensor, gyroscope, camera, etc.) and methods known in the art (e.g., inertial navigation, visual odometry, SLAM, VSLAM, SFM, etc.), thereby determining its real-time position and/or pose.
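A minimal sketch of this real-time tracking, assuming the on-device tracker reports per-frame relative motion as a 4x4 transform, is given below; the class and field names are illustrative.

    import numpy as np

    class DevicePoseTracker:
        """Keeps a device pose current by accumulating incremental motion."""

        def __init__(self, T_scene_device_initial):
            # Initial 4x4 pose obtained when the visual marker was scanned.
            self.T = T_scene_device_initial.copy()

        def apply_delta(self, T_prev_curr):
            # T_prev_curr: relative motion from the previous to the current frame,
            # as reported by on-device tracking (inertial/visual odometry).
            self.T = self.T @ T_prev_curr

        def position(self):
            return self.T[:3, 3]

    T0 = np.eye(4); T0[:3, 3] = [10.0, 4.0, 1.5]
    tracker = DevicePoseTracker(T0)
    step = np.eye(4); step[:3, 3] = [0.1, 0.0, 0.0]   # 10 cm forward per frame
    for _ in range(5):
        tracker.apply_delta(step)
    print(tracker.position())   # roughly [10.5, 4.0, 1.5]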
The spatial location information of the first device received by the server may be, but is not limited to, coordinate information, and any information that can be used to derive the spatial location of the device belongs to the spatial location information. In one embodiment, the spatial location information of the first device received by the server may be an image of a visual marker captured by the first device from which the server may determine the spatial location of the first device. Similarly, any information that can be used to derive the pose of the device belongs to pose information, which in one embodiment may be an image of a visual marker taken by the first device.
Step 402: the first user in the image captured by the camera is identified based on the spatial location information of the first device.
The first user using the first device can be identified in the image taken by the camera by means of the spatial location information of the first device in various possible ways.
In one embodiment, an imaging position of the device or a user thereof in an image captured by the camera may be determined based on spatial position information of the device, and the user in the image captured by the camera may be identified from the imaging position.
For devices that are typically held or carried by a user, such as cell phones, smart glasses, smart watches, tablet computers, etc., the imaging location of their user in the image captured by the camera may be determined based on the spatial location information of the device. Since the user typically scans the visual marker while holding the device or wearing the device, the spatial position of the user can be inferred from the spatial position of the device and then the imaging position of the user in the image captured by the camera can be determined from the spatial position of the user. The imaging position of the device in the image shot by the camera can also be determined according to the spatial position of the device, and then the imaging position of the user can be deduced according to the imaging position of the device.
In one embodiment, a pre-established mapping between one or more spatial positions in the scene (not necessarily all of them) and one or more imaging positions in the image captured by the camera may be used, together with the spatial position information of the device, to determine the imaging position of the device or its user in the image captured by the camera. For example, for a hall scene, a number of spatial positions on the hall floor may be selected and their imaging positions in the image captured by the camera determined, after which a mapping relationship between the spatial positions and the imaging positions can be established, and the imaging position corresponding to any given spatial position can be derived from this mapping relationship.
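For the hall-floor example, such a mapping could be realized with a planar homography fitted from a few surveyed correspondences, as sketched below; the point values and the use of OpenCV's findHomography/perspectiveTransform are assumptions of the example, not requirements of the method.

    import numpy as np
    import cv2  # assumed available; any homography-fitting routine would do

    # Pre-surveyed correspondences: floor positions (meters, scene x-y) and their
    # imaging positions (pixels) in the camera image. Values are illustrative.
    floor_xy = np.array([[0, 0], [5, 0], [5, 8], [0, 8]], dtype=np.float32)
    image_uv = np.array([[210, 620], [1050, 640], [890, 180], [330, 170]],
                        dtype=np.float32)

    H, _ = cv2.findHomography(floor_xy, image_uv)

    def imaging_position(x, y):
        # Map a floor position reported by the device to the expected pixel.
        p = cv2.perspectiveTransform(np.array([[[x, y]]], dtype=np.float32), H)
        return p[0, 0]

    print(imaging_position(2.5, 4.0))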
In one embodiment, the imaging position of the device or its user in the image captured by the camera may be determined based on the spatial position information of the device and pose information of the camera, where the pose information of the camera may be pose information of the camera in its scene or pose information of the camera relative to visual markers.
After determining the imaging position of the device or its user in the image taken by the camera, the device or its user can be identified in the image according to the imaging position. For example, a device or user closest to the imaging position or a device or user whose distance from the imaging position satisfies a predetermined condition may be selected.
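A minimal sketch of this selection step, assuming the camera tracker reports detections as (track_id, pixel position) pairs and using an assumed pixel-distance threshold, might look like this:

    import numpy as np

    def identify_user(expected_uv, detections, max_pixel_dist=80.0):
        """Pick the detected person whose image position is closest to the
        imaging position predicted from the device's spatial position.

        detections: list of (track_id, (u, v)) pairs from the camera tracker.
        max_pixel_dist: assumed threshold; no match if everyone is farther away.
        """
        best_id, best_dist = None, float("inf")
        for track_id, uv in detections:
            d = np.linalg.norm(np.asarray(uv) - np.asarray(expected_uv))
            if d < best_dist:
                best_id, best_dist = track_id, d
        return best_id if best_dist <= max_pixel_dist else None

    detections = [("track_7", (640, 410)), ("track_9", (980, 395))]
    print(identify_user((655, 420), detections))   # -> "track_7"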
In one embodiment, to identify the device or its user in the image captured by the camera, the spatial position information of the device may be compared with the spatial position information of one or more devices or users determined from the camera's tracking results. The camera may determine the spatial position of a person or object in the real scene by various methods known in the art. For example, when a single monocular camera is used, the position information of the target in the scene may be determined in combination with scene information (e.g., information about the plane on which a person or object in the scene is located). When a binocular camera is used, the position information of the target may be determined from the target's position in the camera's field of view and the target's depth information. When multiple cameras are used, the position information of the target may be determined from the target's position in the fields of view of the respective cameras. In one embodiment, the images captured by the camera may also be used in conjunction with a lidar or the like to determine the spatial position information of one or more users.
In one embodiment, if there are multiple users in the vicinity of the spatial location of the first device, real-time spatial location information thereof (e.g., satellite positioning information or location information obtained by a sensor of the first device) may be received from the first device, the locations of the multiple users tracked by the camera, and the first user identified by comparing the real-time spatial location information received from the first device with the locations of the multiple users tracked by the camera.
In one embodiment, if there are multiple users in the vicinity of the spatial location of the first device, the feature information of the first user (e.g., feature information for face recognition) may be determined based on the information transmitted by the first device, the feature information of the multiple users may be collected by the camera, and the first user may be identified by comparing the feature information of the multiple users with the feature information of the first user.
Step 403: the identification information of the first user or the first device is associated to the first user in the image taken by the camera.
After identifying the first user in the image captured by the camera, the received identification information of the first user or the first device may be associated with the first user in the image. In this way, for a user in the camera's field of view, the server may know, for example, the ID information or phone number of the user's device, account information of an application on the device, the user's name or nickname, identity information, account information, and so on. After associating the identification information of the first user or the first device with the first user in the image captured by the camera, information subsequently sent by the first user or the first device can be attributed to that user based on the identification information.
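A minimal server-side sketch of this association, with illustrative field names rather than the patent's actual data model, might look like this:

    # Server-side association of received identification info with camera tracks.
    # Field names are illustrative, not the patent's actual data model.

    tracks = {
        "track_7": {"user_id": None, "position": (2.5, 4.0, 0.0)},
        "track_9": {"user_id": None, "position": (6.1, 3.2, 0.0)},
    }

    def associate(track_id, identification):
        tracks[track_id]["user_id"] = identification

    def handle_message(sender_id, payload):
        # Later messages from the same device/user can now be attributed
        # to the right person in the camera view.
        for track_id, info in tracks.items():
            if info["user_id"] == sender_id:
                return track_id, payload
        return None, payload

    associate("track_7", "user_alice@example")
    print(handle_message("user_alice@example", "new comment"))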
Step 404: tracking the first user through the camera and updating the spatial position information of the first user.
In one embodiment, the first user may be tracked by a camera and the imaging position of the first user updated, and the spatial position information of the first user determined based on the updated imaging position.
Various visual tracking methods known in the art may be used to track a user in the camera's field of view and update the user's imaging position. The camera may remain stationary or may move while tracking the user. In one embodiment, multiple cameras may be used to track the user, and their fields of view may be continuous or discontinuous. In the case of a discontinuous field of view, the user's features may be recorded so that the user can be re-identified and tracked when re-entering the field of view of one or more cameras.
In one embodiment, the spatial position information of the user may be determined from the imaging position using a pre-established mapping between one or more spatial positions in the scene (not necessarily all of them) and one or more imaging positions in the image captured by the camera.
In one embodiment, the spatial location information of the user may be determined based on pose information of the camera and the imaging location. For example, in the case of using a depth camera or a multi-view camera, the direction of the user relative to the camera may be determined based on the imaging position, the distance of the user relative to the camera may be determined using the depth information, thereby determining the position of the user relative to the camera, and then the spatial position information of the user may be further determined based on pose information of the camera. In one embodiment, the distance of the user relative to the camera may be estimated based on the imaging size of the user, and the spatial location information of the user may be determined based on the pose information of the camera and the imaging location. In one embodiment, a camera-mounted lidar or the like may be used to determine the distance of the user relative to the camera and to determine the spatial location information of the user based on the pose information of the camera and the imaging location. In one embodiment, if the fields of view of multiple cameras simultaneously cover the user, the multiple cameras may be used to collectively determine the spatial location information of the user. In one embodiment, the spatial location information of the user may be determined based on pose information of the camera, the imaging location, and optionally other information (e.g., coordinate information of the ground within the scene).
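For the depth-camera case described above, a minimal sketch (with assumed intrinsics and camera pose) back-projects the tracked imaging position using the measured depth and moves the point into the scene frame:

    import numpy as np

    K = np.array([[800.0, 0.0, 640.0],
                  [0.0, 800.0, 360.0],
                  [0.0, 0.0, 1.0]])           # assumed intrinsics
    R_wc = np.diag([1.0, -1.0, -1.0])          # assumed camera orientation
    C = np.array([0.0, 0.0, 3.0])              # assumed camera center in the scene

    def user_position(u, v, depth_m):
        # Back-project the tracked imaging position using the measured depth
        # (z-depth along the optical axis), then move the point from the
        # camera frame into the scene frame.
        p_cam = depth_m * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
        return R_wc @ p_cam + C

    print(user_position(700.0, 500.0, 2.8))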
In one embodiment, the pose information of the user or device may also be determined based on the tracking results of the camera on the user or device.
Step 405: setting related information of a first virtual object associated with the first user, wherein the related information comprises content information and spatial position information, and the spatial position information of the first virtual object is set according to the spatial position information of the first user.
For example, the spatial position of the first virtual object may be set to be a predetermined distance above the first user. The content information of the first virtual object describes the content of the virtual object and may include, for example, pictures, text, numbers, icons, animations, videos, or three-dimensional models contained in the virtual object, as well as the virtual object's shape, color, size, or pose information. In one embodiment, the content information of the first virtual object may be set according to information from the first user or the first device identified by the identification information of the first user or the first device. In one embodiment, the content information of the first virtual object may be, for example, the first user's occupation, identity, gender, age, name, or nickname.
The spatial location information of the first virtual object may change accordingly as the location of the first user changes, and content information of the virtual object (e.g., text content of the virtual object) may be updated according to new information (e.g., new comments of the user) received from the first user or the first device.
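The following sketch illustrates one possible representation of the first virtual object's related information and its update as the first user moves; the field names and the overhead offset are assumptions of the example.

    # Illustrative virtual-object record for step 405; field names and the
    # 0.3 m overhead offset are assumptions, not values from the patent.

    HEAD_OFFSET = (0.0, 0.0, 0.3)   # place the object slightly above the user

    def make_virtual_object(user_position, content):
        x, y, z = user_position
        dx, dy, dz = HEAD_OFFSET
        return {
            "content": content,                   # e.g. occupation, nickname, icon
            "position": (x + dx, y + dy, z + dz), # tied to the user's position
        }

    def update_position(virtual_object, new_user_position):
        x, y, z = new_user_position
        dx, dy, dz = HEAD_OFFSET
        virtual_object["position"] = (x + dx, y + dy, z + dz)

    obj = make_virtual_object((2.5, 4.0, 1.7), {"nickname": "XXX", "company": "XX"})
    update_position(obj, (2.9, 4.2, 1.7))
    print(obj)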
In one embodiment, pose information of the virtual object may also be set; it may be set based on the pose information of the device or user with which the virtual object is associated, or in other ways.
Step 406: and sending the related information of the first virtual object to second equipment of a second user.
The related information of the first virtual object can be used by the second device to render the first virtual object on its display medium (e.g., in an augmented reality or mixed reality manner) based on the second device's position information and/or pose information.
The position information and pose information of the second device may be determined in various possible ways. In one embodiment, the second device may determine its position and/or pose information by scanning a visual marker. In one embodiment, the position and/or pose information of the second device may be determined from the camera's tracking results for the second device or its user. In one embodiment, the second device may also use its various built-in sensors to determine its position and/or pose information. In one embodiment, the second device may use point cloud information of the scene to determine its position and/or pose information.
In one embodiment, after the spatial position information of the first virtual object and the position and pose information of the second device are obtained, the first virtual object may be superimposed at a suitable position in the real scene presented through the display medium of the second device. Where the first virtual object has pose information, the pose of the superimposed first virtual object may also be determined. The pose of the first virtual object may be adjusted according to the position and/or pose of the second device relative to the first virtual object, e.g., such that a certain direction of the first virtual object (e.g., its frontal direction) always faces the second device.
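A minimal sketch of this superposition and the "always faces the second device" behavior, under assumed pose conventions (4x4 scene-to-device transforms, yaw about the vertical axis), might be:

    import numpy as np

    def object_in_device_frame(p_obj_scene, T_scene_device):
        # Express the virtual object's scene position in the second device's
        # frame so it can be rendered at the right spot on the display medium.
        T_device_scene = np.linalg.inv(T_scene_device)
        p = T_device_scene @ np.append(p_obj_scene, 1.0)
        return p[:3]

    def facing_yaw(p_obj_scene, p_device_scene):
        # Yaw angle (about the vertical axis) that turns the object's front
        # toward the second device, i.e. a simple billboard behavior.
        d = np.asarray(p_device_scene) - np.asarray(p_obj_scene)
        return np.arctan2(d[1], d[0])

    T = np.eye(4); T[:3, 3] = [5.0, 1.0, 1.6]        # assumed second-device pose
    print(object_in_device_frame(np.array([2.5, 4.0, 2.0]), T))
    print(np.degrees(facing_yaw([2.5, 4.0, 2.0], T[:3, 3])))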
In one embodiment, after overlaying the first virtual object, the user of the second device may perform various interactive operations on the first virtual object. For example, a user of the second device may click on the first virtual object to review details thereof, change a pose of the first virtual object, change a size or color of the first virtual object, add a callout on the first virtual object, and so forth. Information related to the operation may be sent to the server, and the server may modify or adjust the first virtual object based on the information.
In an embodiment, a second virtual object may also be set for the second user of the second device in a similar manner, and the content information and spatial position information of the second virtual object may be sent to the first device or another device of the first user (the first device and the other device may be, for example, a cell phone and glasses, respectively), where the content information and spatial position information of the second virtual object may be used by the first device or the other device to render the second virtual object on its display medium based on the position information and/or pose information of that device.
In one embodiment, which other devices' or users' virtual objects a given device or user can observe, or which other devices or users can observe the virtual object of a given device or user, may be determined based on characteristic information or settings information (e.g., privacy settings) of the device or user.
The steps in the method shown in fig. 3 may be implemented by a server in the system shown in fig. 2, but it will be understood that one or more of these steps may also be implemented by other means.
FIG. 4 illustrates a first user as viewed by a second user through his device (e.g., glasses or a cell phone) and a virtual object associated with the first user, according to one embodiment. The virtual object may be, for example, an icon containing text such as "XXX of XX Company, looking to connect." The spatial position of the virtual object is associated with the spatial position of the first user and moves as the first user moves.
Although in some of the above embodiments two users are illustrated as examples, this is not a limitation and the solution of the present application is equally applicable to more users. Fig. 5 illustrates an actual image viewed by a user through his cell phone screen, including a plurality of users, each having an associated virtual object, according to one embodiment.
In the above embodiments, the camera is described as an example of a sensor, but it is understood that the embodiments herein are equally applicable to any other sensor capable of sensing the position of a target, such as a laser radar, millimeter wave radar, wireless signal transceiver, etc.
In one embodiment of the invention, the invention may be implemented in the form of a computer program. The computer program may be stored in various storage media (e.g., a hard disk, optical disc, or flash memory) and, when executed by a processor, can be used to carry out the method of the invention.
In another embodiment of the invention, the invention may be implemented in the form of an electronic device. The electronic device comprises a processor and a memory, in which a computer program is stored which, when being executed by the processor, can be used to carry out the method of the invention.
Reference herein to "various embodiments," "some embodiments," "one embodiment," or "an embodiment" means that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment" in various places throughout this document do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or properties may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or property described in connection with or illustrated in one embodiment may be combined, in whole or in part, with features, structures, or properties of one or more other embodiments without limitation, provided that the combination is logical and operable. Expressions such as "according to A," "based on A," "by A," or "using A" are meant to be non-exclusive, i.e., "according to A" may cover "according to A only" as well as "according to A and B," unless "according to A only" is specifically stated. In the present application, some exemplary operation steps are described in a certain order for clarity of explanation, but it will be understood by those skilled in the art that not every one of these steps is essential, and some of them may be omitted or replaced with others. The steps need not be performed sequentially in the manner shown; rather, some of the steps may be performed in a different order, or concurrently, as desired, provided that the new manner of execution remains logical and operable.
Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the invention. While the invention has been described in terms of several embodiments, the invention is not limited to the embodiments described herein, but encompasses various changes and modifications that may be made without departing from the scope of the invention.

Claims (14)

1. A method of user interaction, the user being located in a scene in which one or more sensors and one or more visual markers are deployed, the sensors being operable to sense or determine location information of the user in the scene, the method comprising:
receiving information sent by a first device of a first user, wherein the information comprises spatial position information of the first device and identification information of the first user or the first device, and the first device determines the spatial position information by scanning the visual marker;
identifying the first user within a sensing range of the sensor based on the spatial position information of the first device;
associating the identification information of the first user or first device to the first user within the sensing range of the sensor;
tracking the first user through the sensor and updating the spatial position information of the first user;
setting related information of a first virtual object associated with the first user, wherein the related information comprises content information and spatial position information, and the set spatial position information of the first virtual object is related to the spatial position information of the first user; and
sending the related information of the first virtual object to a second device of a second user;
wherein identifying the first user within the sensing range of the sensor based on the spatial position information of the first device comprises:
determining an imaging position of the first device or the first user in an image captured by a camera based on the spatial position information of the first device; and
identifying the first user in the image captured by the camera according to the imaging position;
or comprises:
comparing the spatial position information of the first device with the spatial position information of one or more first devices or first users determined according to a tracking result of the camera, so as to identify the first device or the first user in the image captured by the camera.
2. The method of claim 1, further comprising setting content information of the first virtual object according to information from the first user or first device identified by the identification information of the first user or first device.
3. The method of claim 1, further comprising:
updating the spatial position information of the first virtual object according to new spatial position information of the first user; and/or
updating the content information of the first virtual object according to new information from the first user or first device.
4. The method of claim 3, further comprising:
sending the new spatial position information or content information of the first virtual object to the second device of the second user.
5. The method of claim 1, wherein the related information of the first virtual object is usable by the second device to render the first virtual object on its display medium based on the position information and/or pose information of the second device.
6. The method of claim 5, wherein:
the second device determines its position information and/or pose information by scanning the visual marker; or
the position information and/or pose information of the second device is determined based on a tracking result of the camera for the second device or its user.
7. The method of claim 1, further comprising: receiving related information of an operation performed by the second user on the first virtual object through the second device.
8. The method of claim 7, further comprising: modifying the content information of the first virtual object according to the related information of the operation.
9. The method of claim 1, further comprising:
receiving information sent by a second device of the second user, wherein the information comprises spatial position information of the second device and identification information of the second user or the second device, and the second device determines the spatial position information by scanning the visual marker;
identifying the second user within a sensing range of the sensor based on the spatial position information of the second device;
associating the identification information of the second user or second device to the second user within the sensing range of the sensor;
tracking the second user through the sensor and updating the spatial position information of the second user;
setting related information of a second virtual object associated with the second user, wherein the related information comprises content information and spatial position information, and the set spatial position information of the second virtual object is related to the spatial position information of the second user; and
sending the related information of the second virtual object to a first device or other device of the first user.
10. The method of claim 1, wherein the sensor comprises one or more of:
a camera;
a radar;
a wireless signal transceiver.
11. The method of claim 1, wherein the first device determining its spatial location information by scanning the visual markers comprises:
acquiring an image of the visual marker using the first device;
determining identification information of the visual marker and a position of the first device relative to the visual marker by analyzing the image;
obtaining the position and pose information of the visual marker in space through the identification information of the visual marker; and
determining the spatial position information of the first device based on the position and pose information of the visual marker in space and the position of the first device relative to the visual marker.
12. A user interaction system, the system comprising:
one or more sensors deployed in a scene, the sensors being operable to sense or determine location information of a user in the scene;
one or more visual markers deployed in the scene; and
a server configured to implement the method of any one of claims 1-11.
13. A storage medium having stored therein a computer program which, when executed by a processor, is operable to carry out the method of any one of claims 1-11.
14. An electronic device comprising a processor and a memory, the memory having stored therein a computer program which, when executed by the processor, is operable to carry out the method of any of claims 1-11.
CN202011440875.5A 2020-12-08 2020-12-08 User interaction method and system Active CN112581630B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011440875.5A CN112581630B (en) 2020-12-08 User interaction method and system
PCT/CN2021/129727 WO2022121606A1 (en) 2020-12-08 2021-11-10 Method and system for obtaining identification information of device or user thereof in scenario
TW110143724A TWI800113B (en) 2020-12-08 2021-11-24 Method and system for obtaining identification information of a device or its user in a scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011440875.5A CN112581630B (en) 2020-12-08 User interaction method and system

Publications (2)

Publication Number Publication Date
CN112581630A CN112581630A (en) 2021-03-30
CN112581630B true CN112581630B (en) 2024-06-21



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107479699A (en) * 2017-07-28 2017-12-15 深圳市瑞立视多媒体科技有限公司 Virtual reality exchange method, apparatus and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Virtual-real registration of spatial information with hybrid hardware tracking and positioning; 武雪玲; 任福; 杜清运; Geography and Geo-Information Science (Issue 03); full text *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230804

Address after: B708, Tiandao United Building, Building 1, No. 35 Jinghai Fourth Road, Daxing District Economic and Technological Development Zone, Beijing, 100023

Applicant after: Beijing Yimu Technology Co.,Ltd.

Address before: 100176 room 801, 8 / F, block B, AVIC Plaza, 15 ronghua South Road, Yizhuang Economic and Technological Development Zone, Daxing District, Beijing

Applicant before: BEIJING WHYHOW INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant