WO2022121606A1 - Method and system for obtaining identification information of device or user thereof in scenario - Google Patents

Method and system for obtaining identification information of device or user thereof in scenario Download PDF

Info

Publication number
WO2022121606A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
camera
location information
spatial location
Prior art date
Application number
PCT/CN2021/129727
Other languages
French (fr)
Chinese (zh)
Inventor
方俊
李江亮
牛旭恒
Original Assignee
北京外号信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011440905.2A external-priority patent/CN112528699B/en
Priority claimed from CN202011440875.5A external-priority patent/CN112581630A/en
Priority claimed from CN202011442020.6A external-priority patent/CN114663491A/en
Application filed by 北京外号信息技术有限公司
Publication of WO2022121606A1 publication Critical patent/WO2022121606A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06KGRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods

Definitions

  • The present invention relates to the field of information interaction and, in particular, to a method and system for obtaining identification information of a device or its user in a scene.
  • Sensors such as cameras and radars may be deployed in a scene to sense, locate, and track the people or equipment that appear in it.
  • Although these sensors can sense the position or movement of people or equipment in the scene, they cannot obtain the identification information of those people or equipment, which makes it difficult to provide services to them.
  • Although facial recognition technology can be used to identify people, it raises user-privacy concerns and may carry legal risks.
  • Moreover, these sensors usually realize only one-way information transmission (that is, they collect information from the scene) and cannot provide information to users in the scene on that basis (for example, based on a user's real-time location), such as navigation information, instruction information, or commercial promotion information.
  • On-site manual service is usually adopted instead, which requires setting up consultation desks and stationing service personnel at a certain density throughout the venue; this is costly and offers little flexibility.
  • One aspect of the present invention relates to a method for obtaining identification information of a device or its user in a scene in which one or more sensors and one or more visual markers are deployed, the sensors being capable of sensing or determining the location information of devices or users in the scene. The method includes: receiving information sent by the device, the information including the identification information of the device or its user and the spatial location information of the device, wherein the device determines its spatial location information by scanning a visual marker; identifying the device or its user within the sensing range of the sensor based on the spatial location information of the device; and associating the identification information of the device or its user with the device or its user within the sensing range of the sensor, in order to provide services to the device or its user.
  • Another aspect of the present invention relates to a system for obtaining identification information of a device in a scene or a user thereof, the system comprising: one or more sensors deployed in the scene, the sensors capable of sensing or determining the location information of devices or users in the scene; one or more visual markers deployed in the scene; and a server configured to implement the methods described in the embodiments of the present application.
  • Another aspect of the present invention relates to a storage medium storing a computer program which, when executed by a processor, can be used to implement the method described in the embodiments of the present application.
  • Another aspect of the present invention relates to an electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, can be used to implement the method described in the embodiments of the present application.
  • With the solution of the present invention, not only can the position or movement of people or equipment in the scene be sensed, but their identification information can also be obtained, and services can be provided to the corresponding people or equipment through that identification information.
  • Not only can the location information of users in the scene be collected or monitored, but information such as navigation information, instruction information, and business promotion information can also be provided to users based on their real-time location.
  • Figure 1 shows an exemplary visual marker.
  • Figure 2 shows an optical communication apparatus that can be used as a visual marker.
  • Figure 3 illustrates a system for obtaining identification information of a device in a scene or its user, according to one embodiment.
  • Figure 4 illustrates a method for obtaining identification information of a device in a scene or its user, according to one embodiment.
  • Figure 5 illustrates a method for providing a service to a device in a scene or its user, according to one embodiment.
  • Figure 6 illustrates a method for providing information to a user in a scene through a device (here, glasses are used as an example), according to one embodiment.
  • Figure 7 illustrates a system for providing information to a user in a scene through glasses, according to one embodiment.
  • Figure 8 illustrates a method for providing information to a user in a scene through glasses, according to one embodiment.
  • Figure 9 illustrates a user interaction system, according to one embodiment.
  • Figure 10 illustrates a user interaction method, according to one embodiment.
  • Figure 11 illustrates a first user, and a virtual object associated with the first user, as observed by a second user through his device, according to one embodiment.
  • Figure 12 shows the actual image observed by a user through his cell phone screen, according to one embodiment.
  • Visual markers are markers that can be recognized by the human eye or by electronic devices, and they can take various forms.
  • Visual markers may be used to convey information that can be obtained by smart devices (e.g., cell phones, smart glasses, etc.).
  • The visual marker may be an optical communication apparatus capable of emitting encoded optical information, or it may be a graphic carrying encoded information, such as a two-dimensional code (e.g., a QR code or applet code), a barcode, or the like.
  • Figure 1 shows an exemplary visual marker with a specific black-and-white pattern.
  • Figure 2 shows an optical communication apparatus 100 that can be used as a visual marker, which includes three light sources (a first light source 101, a second light source 102, and a third light source 103).
  • the optical communication device 100 also includes a controller (not shown in FIG. 2 ) for selecting a corresponding driving mode for each light source according to the information to be communicated.
  • The controller can use different driving signals to control the light-emitting manner of each light source, so that when the optical communication apparatus 100 is photographed by a device with an imaging function, the imaging of each light source can present a different appearance (e.g., a different color, pattern, or brightness).
  • By analyzing the imaging of the light sources, the driving mode of each light source at a given moment can be determined, and thus the information transmitted by the optical communication apparatus 100 at that moment can be parsed.
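As a toy illustration of this encoding idea (the patent does not specify the actual driving modes or appearances; "steady" and "blinking" here are illustrative stand-ins), each information bit could be mapped to a driving mode that gives the light source a visually distinct imaged appearance, and decoding simply reverses that mapping:

```python
# Hypothetical sketch only: the real driving modes and imaged appearances
# are not specified in the text.
def encode_bits(bits):
    """Map each information bit to a driving mode for one light source."""
    return ["blinking" if b else "steady" for b in bits]

def decode_appearances(appearances):
    """Recover the bits from the imaged appearance of each light source."""
    return [1 if a == "blinking" else 0 for a in appearances]
```

A decoder observing the imaged appearances of the three light sources over time would thus recover the transmitted information.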
  • Each visual marker may be assigned identification information (an ID), which the manufacturer, manager, or user of the visual marker can use to uniquely identify it.
  • The user can use a device to capture an image of the visual marker to obtain the identification information it conveys, and then access a corresponding service based on that identification information, for example, visiting a webpage associated with the identification information or obtaining other information associated with it (e.g., the position or attitude information of the visual marker corresponding to the identification information).
  • the devices mentioned herein can be, for example, devices carried or controlled by users (eg, mobile phones, tablet computers, smart glasses, AR glasses, smart helmets, smart watches, cars, etc.), or machines that can move autonomously (eg, drones, driverless cars, robots, etc.).
  • The device can acquire an image containing the visual marker through its image acquisition device, and by analyzing the imaging of the visual marker in that image, it can identify the information transmitted by the marker and determine the position or attitude information of the device relative to the marker.
  • The sensors may be any of various sensors capable of sensing or determining the location information of targets in a scene, such as cameras, radars (e.g., lidar, millimeter-wave radar), wireless signal transceivers, and the like.
  • a target in a scene can be a person or an object in the scene.
  • In the following description, a camera is used as an example of such a sensor.
  • Figure 3 illustrates a system for obtaining identification information of a device in a scene or its user, which can be used to provide services or information to a user in the scene through the device, according to one embodiment.
  • the system includes a visual sign 301, a camera 302, and a server (not shown in Figure 3).
  • User 303 is in the scene and carries device 304.
  • The device 304 has an image capture device and can recognize the visual marker 301 through it.
  • device 304 may be a cell phone carried by the user.
  • Device 304 may be glasses worn by the user. The glasses themselves may have the ability to directly access the network, for example via Wi-Fi, a telecommunications network, or the like.
  • the glasses may also not have the ability to directly access the network, but may indirectly access the network through a connection (eg, a Bluetooth connection or a wired connection) between it and the user's other devices (eg, a mobile phone, a watch, etc.).
  • The visual marker 301 and the camera 302 are each installed in the real scene with a specific position and attitude (hereinafter collectively referred to as a "pose").
  • the server may obtain the respective pose information of the camera and the visual marker, and may obtain relative pose information between the camera and the visual marker based on the respective pose information of the camera and the visual marker.
  • The server may also directly obtain the relative pose information between the camera and the visual marker. In this way, the server can obtain a transformation matrix between the camera coordinate system and the visual marker coordinate system, which may include, for example, a rotation matrix R and a displacement vector t between the two coordinate systems.
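How the server might derive that camera-to-marker transform from the two scene poses can be sketched as follows (a minimal illustration; the function and variable names are assumptions, not from the patent):

```python
import numpy as np

def relative_pose(R_cam, t_cam, R_marker, t_marker):
    """Given the pose (rotation matrix R, translation vector t) of the camera
    and of the visual marker in the scene coordinate system, return the
    transform (R_rel, t_rel) mapping marker-frame points into the camera frame.
    A point transforms as p_scene = R @ p_local + t for either frame."""
    R_rel = R_cam.T @ R_marker              # rotation marker -> camera
    t_rel = R_cam.T @ (t_marker - t_cam)    # marker origin in camera frame
    return R_rel, t_rel
```

For example, with the camera at the scene origin and the marker one metre along the x-axis (both unrotated), the marker origin lies at (1, 0, 0) in the camera frame.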
  • the camera may be a camera installed in a fixed position and having a fixed orientation, but it is understood that the camera may also be a camera that can move (for example, the position or direction can be changed), as long as its current pose information can be determined.
  • The camera's pose may be set by the server, which controls the camera's movement based on that pose information; alternatively, the camera's movement may be controlled by the camera itself or by another device, with the camera's current pose information being sent to the server.
  • more than one camera may be included in the system, and more than one visual sign may also be included.
  • A scene coordinate system (which may also be referred to as a real-world coordinate system) may be established for the real scene. The transformation matrix between the camera coordinate system and the scene coordinate system can be determined based on the pose information of the camera in the real scene, and the transformation matrix between the visual marker coordinate system and the scene coordinate system can be determined based on the pose information of the visual marker in the real scene.
  • "Having a relative pose between the camera and the visual marker" means that a relative pose objectively exists between the two; it does not require the system to pre-store or use that relative pose information. In some embodiments, the relative pose information may not be stored in the system, and the relative poses of the two may be neither calculated nor used.
  • Cameras can be used to track targets in a real scene, which may be stationary or moving, such as people, stationary objects, and movable objects in the scene.
  • a camera can be used to track the position of a person or object in a real scene by various methods in the prior art.
  • For example, when a single monocular camera is used, the location information of a target in the scene can be determined in combination with scene information (e.g., information on the plane on which a person or object in the scene is located).
  • When a binocular camera is used, the position information of the target can be determined from the target's position in the camera's field of view together with the target's depth information.
  • When multiple cameras are used, the position information of the target can be determined from the target's position in the field of view of each camera.
  • the system may have multiple visual signs or multiple cameras, and the fields of view of the multiple cameras may be continuous or discontinuous.
  • Fig. 4 shows a method for obtaining identification information of a device in a scene or a user thereof according to an embodiment.
  • the method can be implemented using the system shown in Fig. 3 and can include the following steps:
  • Step 401: Receive information sent by the device, where the information includes identification information of the device or its user and spatial location information of the device.
  • the information sent by the device may be various information, such as alarm information, help information, service request information, and so on.
  • The identification information of the device or its user can be any information that can be used to identify the device or its user, such as device ID information, the device's phone number, account information for an application on the device, the user's name or nickname, the user's identity information, the user's account information, and so on.
  • The user 303 may use the device 304 to determine its spatial location information by scanning a visual marker 301 deployed in the scene.
  • The user 303 may send information to the server through the device 304; the information may include the spatial position information of the device 304, which may be either the spatial position of the device 304 relative to the visual marker 301 or the spatial position of the device 304 in the scene.
  • For example, an image of the visual marker 301 may be captured using the device 304; the identification information of the visual marker 301 and the spatial position of the device 304 relative to the visual marker 301 are determined by analyzing the captured image; the position and attitude information of the visual marker 301 in space is determined based on its identification information; and the spatial position of the device 304 in the scene is then determined from the marker's position and attitude in space together with the device's position relative to the marker.
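The final composition step can be sketched as follows (an illustrative helper under assumed names; the marker's pose is written as a rotation matrix R_marker and translation t_marker in the scene frame):

```python
import numpy as np

def device_position_in_scene(R_marker, t_marker, p_device_rel_marker):
    """Combine the marker's pose in the scene (R_marker, t_marker) with the
    device's position relative to the marker (recovered from the captured
    marker image) to obtain the device's position in the scene frame."""
    return R_marker @ np.asarray(p_device_rel_marker, float) + np.asarray(t_marker, float)
```

For instance, a device 1.5 m in front of an unrotated marker mounted at (2, 0, 0) sits at (2, 0, 1.5) in the scene, taking the marker's forward axis as z.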
  • The device 304 can also send the identification information of the visual marker 301 and its spatial position relative to the visual marker 301 to the server, so that the server can determine the spatial position of the device 304 in the scene.
  • The device 304 can also be used to scan the visual marker 301 to determine the attitude information of the device 304 relative to the visual marker 301 or the attitude information of the device 304 in the scene, and that attitude information can be sent to the server.
  • The spatial position and attitude information of the device may be those at the time of scanning the visual marker, or real-time position and attitude information at any moment after scanning.
  • A device can determine its initial spatial position and attitude when scanning a visual marker, and thereafter use various built-in sensors (e.g., acceleration sensors, magnetic sensors, orientation sensors, gravity sensors, gyroscopes, cameras) to measure or track its position and/or attitude changes by methods known in the art (e.g., inertial navigation, visual odometry, SLAM, VSLAM, SFM), thereby determining its real-time position and/or attitude.
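This dead-reckoning idea can be sketched minimally as follows (an illustration only; the class and the simple incremental-motion model are assumptions, not the patent's implementation):

```python
import numpy as np

class PoseTracker:
    """Start from the pose obtained by scanning a visual marker, then apply
    incremental motion estimates (e.g., from inertial sensors or visual
    odometry) to maintain a real-time pose."""
    def __init__(self, position, rotation):
        self.position = np.asarray(position, float)
        self.rotation = np.asarray(rotation, float)  # 3x3 rotation matrix

    def apply_delta(self, d_rotation, d_translation):
        # Increments are expressed in the device's current frame:
        # translate in the current orientation, then compose the rotation.
        self.position = self.position + self.rotation @ np.asarray(d_translation, float)
        self.rotation = self.rotation @ np.asarray(d_rotation, float)
```

Each sensor update supplies a small (d_rotation, d_translation) pair; drift accumulates over time, which is why rescanning a marker restores an absolute pose fix.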
  • The spatial location information of the device received by the server may be coordinate information, but is not limited thereto; any information from which the spatial location of the device can be derived counts as spatial location information.
  • For example, the spatial location information of the device received by the server may be an image of a visual marker captured by the device, and the server may determine the spatial location of the device from that image.
  • Similarly, any information from which the device's attitude can be derived counts as attitude information; in one embodiment this may be an image of a visual marker captured by the device.
  • Step 402: Identify the device or its user in the image captured by the camera based on the spatial location information of the device.
  • the imaging position of the device or its user in the image captured by the camera may be determined based on the spatial position information of the device, and the device or the user in the image captured by the camera may be identified according to the imaging position.
  • the imaging position of the user in the image captured by the camera can be determined based on the spatial location information of the device. Since the user usually scans the visual sign while holding the device or wearing the device, the spatial position of the user can be inferred according to the spatial position of the device, and then the imaging position of the user in the image captured by the camera can be determined according to the spatial position of the user. The imaging position of the device in the image captured by the camera can also be determined according to the spatial position of the device, and then the imaging position of the user can be inferred according to the imaging position of the device.
  • the imaging position of the device in the image captured by the camera can be determined based on the spatial location information of the device.
  • In one embodiment, a pre-established mapping relationship between one or more spatial positions in the scene (not necessarily all of them) and the corresponding imaging positions in the image captured by the camera may be used, together with the spatial location information of the device, to determine the imaging position of the device or its user in the image. For example, for a hall scene, several spatial positions on the hall floor can be selected and their imaging positions in the image captured by the camera determined; a mapping relationship between these spatial positions and imaging positions is then established, and the imaging position corresponding to any given spatial position can subsequently be deduced from that mapping.
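One simple way to realize such a pre-established mapping is to fit a parametric map from a handful of surveyed floor-to-pixel correspondences. The sketch below uses a least-squares affine fit as a stand-in (the patent does not specify the mapping's form; a homography would handle perspective more faithfully):

```python
import numpy as np

def fit_floor_to_image_map(floor_pts, image_pts):
    """Fit an affine map from floor coordinates (X, Y) to pixel coordinates
    (u, v), from a few surveyed correspondences. Returns a 3x2 matrix M such
    that [X, Y, 1] @ M approximates (u, v)."""
    A = np.hstack([np.asarray(floor_pts, float), np.ones((len(floor_pts), 1))])
    M, *_ = np.linalg.lstsq(A, np.asarray(image_pts, float), rcond=None)
    return M

def floor_to_image(M, floor_pt):
    """Deduce the imaging position for an arbitrary floor position."""
    X, Y = floor_pt
    return np.array([X, Y, 1.0]) @ M
```

Once M is fitted from the surveyed hall-floor points, the imaging position for any reported device position on the floor follows directly.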
  • In another embodiment, the imaging position of the device or its user in the image captured by the camera may be determined based on the spatial position information of the device and the pose information of the camera, where the camera's pose information may be its pose in the scene or its pose relative to a visual marker.
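With the camera's pose known, this amounts to a standard pinhole projection (a sketch under assumed intrinsic parameters fx, fy, cx, cy, which are not given in the patent):

```python
import numpy as np

def project_to_image(p_scene, R_cam, t_cam, fx, fy, cx, cy):
    """Project a 3-D scene point into pixel coordinates, given the camera's
    pose (R_cam, t_cam) in the scene and pinhole intrinsics."""
    # Transform the point from the scene frame into the camera frame.
    p_cam = R_cam.T @ (np.asarray(p_scene, float) - np.asarray(t_cam, float))
    x, y, z = p_cam
    if z <= 0:
        return None  # point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)
```

A point on the camera's optical axis projects to the principal point (cx, cy); lens distortion is ignored in this sketch.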
  • the device or its user can be identified in the image according to the imaging position. For example, a device or user closest to the imaging position may be selected, or a device or user whose distance from the imaging position satisfies a predetermined condition may be selected.
  • In one embodiment, the spatial location information of the device may be compared with the spatial location information of one or more devices or users determined from the camera's tracking results.
  • a camera can be used to determine the spatial position of a person or object in a real scene through various methods in the prior art. For example, in the case of using a single monocular camera, the location information of objects in the scene can be determined in combination with scene information (eg, information on the plane on which a person or object in the scene is located). For the case of using a binocular camera, the position information of the target can be determined according to the position of the target in the field of view of the camera and the depth information of the target.
  • When multiple cameras are used, the position information of the target can be determined from the target's position in the field of view of each camera.
  • the spatial location information of one or more users may also be determined by using images captured by a camera in combination with lidar and the like.
  • When multiple users or devices are present, a camera tracks their locations, and the device or its user is identified by comparing real-time spatial location information received from the device (e.g., satellite positioning information or location information obtained through the device's sensors) with the locations of the multiple users or devices tracked by the camera.
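The comparison step can be sketched as a nearest-neighbor association with a distance gate (the threshold value is an assumed tuning parameter, not from the patent):

```python
def match_device_to_track(device_pos, tracked, max_dist=1.0):
    """Associate a device's self-reported (x, y) position with the nearest
    camera-tracked target within max_dist metres. `tracked` maps track IDs
    to (x, y) positions; returns None if no track is close enough."""
    best_id, best_d2 = None, max_dist ** 2
    for track_id, (x, y) in tracked.items():
        d2 = (x - device_pos[0]) ** 2 + (y - device_pos[1]) ** 2
        if d2 <= best_d2:
            best_id, best_d2 = track_id, d2
    return best_id
```

A real system would also need to resolve ambiguous cases where two tracks fall within the gate, for example by using attitude or feature information as described below.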
  • In one embodiment, feature information of the device user may be determined based on the information sent by the device, feature information of multiple users may be collected by a camera, and the device user may be identified by comparing the feature information of the multiple users with that of the device user.
  • When multiple cameras are deployed, one or more cameras whose fields of view cover the device or its user can first be determined, and then the imaging position of the device or its user in the images captured by those cameras is determined.
  • Step 403: Associate the identification information of the device or its user with the device or its user in the image captured by the camera, so as to use the identification information to provide a service to the device or its user.
  • the received identification information of the device or its user may be associated with the device or its user in the image.
  • In this way, the device's ID information, phone number, or application account information can be known, as can the user's name or nickname, identity information, account information, and so on.
  • the identification information can be used to provide various services to the device or its user, such as navigation service, explanation service, information display service, and so on. In one embodiment, the above information may be provided visually, audibly, or the like.
  • A virtual object may be superimposed on the display medium of a device (e.g., a mobile phone or glasses); the virtual object may be, for example, an icon (e.g., a navigation icon), a picture, text, and the like.
  • the steps in the method shown in FIG. 4 may be implemented by the server in the system shown in FIG. 3 , but it is understood that one or more of these steps may also be implemented by other devices.
  • The device or its user in the scene can also be tracked through a camera to obtain real-time position and/or attitude information, or the device itself can be used to obtain its real-time position and/or attitude information.
  • services can be provided to the device or its user based on the location and/or attitude information.
  • Through the identification information, information can be sent to the corresponding device or user in the camera's field of view, for example navigation information, explanation information, instruction information, or advertisement information.
  • One or more visual signs and one or more cameras are deployed in a smart factory scenario where robots are used to deliver goods.
  • the camera is used to track the position of the robot, and navigation instructions are sent to the robot according to the tracked position.
  • each robot may be made to scan a visual sign, for example, when entering the scene or the camera's field of view, and send its position information and identification information. In this way, the identification information of each robot within the field of view of the camera can be easily determined, so as to send each robot a travel instruction or a navigation instruction based on its current position and the work task to be completed.
  • Information related to a virtual object may be sent to the device; the virtual object may be, for example, a picture, characters, numbers, an icon, a video, or a three-dimensional model, and the information related to the virtual object may include the spatial location information of the virtual object.
  • the virtual object can be presented on the display medium of the device.
  • The device may present the virtual object at an appropriate location on its display medium based on the device's or user's spatial location information and/or attitude information.
  • the virtual object may be presented on the display medium of the user equipment in an augmented reality or mixed reality manner, for example.
  • the virtual object is a video image or a dynamic three-dimensional model generated by video capture of live characters.
  • the virtual object may be a video image generated by real-time video capture of service personnel, and the video image may be presented on the display medium of the user equipment, so as to provide services to the user.
  • the spatial position of the video image can be set so that it can be presented on the display medium of the user equipment in the manner of augmented reality or mixed reality.
  • In one embodiment, information sent by a device or user within the camera's field of view, such as service request information, alarm information, help information, or comment information, can be recognized based on the identification information.
  • A virtual object associated with the device or the user may be created according to that information; the spatial location of the virtual object may be determined based on the location information of the device or the user, and may change accordingly as the position of the device or the user changes.
  • the content of the virtual object may be updated according to new information received from the device or user (eg, a new comment by the user).
  • Fig. 5 shows a method for providing a service to a device or a user in a scene according to one embodiment.
  • the method can be implemented using the system shown in Fig. 3 and can include the following steps:
  • Step 501: Receive information sent by the device, where the information includes identification information of the device or its user and spatial location information of the device.
  • Step 502: Identify the device or its user in the image captured by the camera based on the spatial location information of the device.
  • Step 503: Mark the device or its user in the image captured by the camera.
  • The device or user can be marked in a variety of ways; for example, the image of the device or user can be framed, a particular icon can be presented adjacent to it, or its image can be highlighted.
  • The imaging area of the marked device or user can be enlarged, or the camera can be made to focus its shooting on the marked device or user.
  • The device or user can be continuously tracked through a camera, and real-time spatial location information and/or attitude information of the device or user can be determined.
  • Step 504: Associate the identification information of the device or its user with the device or its user in the image captured by the camera, so as to use the identification information to provide services to the device or its user.
  • In this way, a person observing the image captured by the camera can know that the device or user currently needs service, as well as the current location of the device or user, so that various required services, such as explanation, navigation, consulting, and help services, can be conveniently provided.
  • This can replace help desks deployed in the venue, providing any user in the scene with the services they need in a convenient, low-cost manner.
  • the service may be provided to the user through a device carried or controlled by the user, such as a mobile phone, smart glasses, a vehicle, and the like.
  • the service may be provided visually, audibly, etc. through a telephony function, an application (APP), etc. on the device.
  • the steps in the method shown in FIG. 5 may be implemented by the server in the system shown in FIG. 3 , but it is understood that one or more of these steps may also be implemented by other devices.
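As an illustration of how steps 502 and 504 might work with a camera, the following sketch projects a device's reported spatial position into the camera image and associates the device's identification information with the nearest detected target. This is a minimal, hypothetical example, not the patented implementation: the pinhole intrinsics (FX, FY, CX, CY), the pose (R, t), the function names, and the detection list are all assumed values.

```python
import math

# Hypothetical pinhole-camera intrinsics: focal lengths and principal point.
FX, FY, CX, CY = 800.0, 800.0, 320.0, 240.0

def world_to_camera(p_world, R, t):
    """Transform a world point into camera coordinates: p_cam = R @ p_world + t."""
    return tuple(sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3))

def project_to_pixel(p_cam):
    """Project a camera-frame point to pixel coordinates (assumes z > 0)."""
    x, y, z = p_cam
    return (FX * x / z + CX, FY * y / z + CY)

def associate(device_world_pos, R, t, detections):
    """Return the index of the detected target closest to the projected device position.

    `detections` is a list of (u, v) image positions of people/devices found by
    any detector; association links the device's ID to one of them (step 504).
    """
    u, v = project_to_pixel(world_to_camera(device_world_pos, R, t))
    best = min(range(len(detections)),
               key=lambda i: math.hypot(detections[i][0] - u, detections[i][1] - v))
    return best, (u, v)

# Example: camera at the world origin looking down +z.
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]
detections = [(100.0, 240.0), (480.0, 240.0)]  # two targets detected in the image
idx, pix = associate((1.0, 0.0, 5.0), R_identity, t_zero, detections)
print(idx)  # the device reporting world position (1, 0, 5) projects to (480, 240), so 1
```

The same projected pixel position could also drive the marking of step 503 (e.g., drawing a frame around the matched detection).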
  • Fig. 6 shows a method for providing information to a user in a scene through a device (here, glasses are taken as an example) according to an embodiment; the method can be implemented using the system shown in Fig. 3 and can include the following steps:
  • Step 601 Receive information sent by the glasses, where the information includes spatial position information of the glasses.
  • the user may use the glasses to determine the spatial position information of the glasses by scanning the visual landmarks deployed in the scene.
  • the user can send information to the server through the glasses.
  • the glasses can also be used to scan the visual marker to determine the posture information of the glasses relative to the visual marker or the posture information of the glasses in the scene, and the posture information can be sent to the server.
  • the information sent by the glasses may also include information related to the glasses or their user, such as service request information, help information, alarm information, identification information (e.g., a phone number or APP account information), etc.
  • the glasses themselves may be capable of direct access to the network.
  • the glasses may not have the ability to directly access the network, but indirectly access the network through a connection between it and, for example, the user's mobile phone.
  • in this case, the server may receive the information sent by the glasses through an intermediate device such as a mobile phone.
  • Step 602 Identify the user of the glasses in the image captured by the camera based on the spatial position information of the glasses.
  • the user's identification information can be associated with the user in order to provide services to the user using the identification information.
  • Step 603 Track the user through the camera and update the spatial location information of the user.
  • a camera may be used to track the user and update the imaging position of the user, and determine the spatial position information of the user based on the updated imaging position.
  • Various visual tracking methods known in the art can be used to track the user in the field of view of the camera and update the imaging position of the user.
  • the camera can remain stationary or move while tracking the user.
  • multiple cameras may be used, which may have a continuous field of view or a discontinuous field of view. Where the field of view is discontinuous, the user's characteristics can be recorded and re-identified and tracked when the user re-enters the field of view of one or more cameras.
  • a pre-established mapping relationship between one or more spatial positions in the scene (not necessarily all of them) and corresponding imaging positions in the image captured by the camera may be used, together with the user's imaging position, to determine the user's spatial location information.
  • the spatial position information of the user may be determined based on the pose information of the camera and the imaging position. For example, when a depth camera or a multi-lens camera is used, the direction of the user relative to the camera can be determined from the imaging position, and the depth information can be used to determine the distance of the user from the camera; together these give the position of the user relative to the camera, from which the spatial position information of the user can be determined using the pose information of the camera.
  • the distance of the user relative to the camera may be estimated based on the imaging size of the user, and the spatial position information of the user may be determined based on the pose information of the camera and the imaging position.
  • the distance of the user relative to the camera may be determined by using a lidar or the like installed on the camera, and the spatial position information of the user may be determined based on the pose information of the camera and the imaging position.
  • the multiple cameras can be used to jointly determine the spatial location information of the user.
  • the spatial position information of the user may be determined based on the pose information of the camera, the imaging position, and optional other information (eg, coordinate information of the ground in the scene).
  • the user's posture information may also be determined based on the result of tracking the user by the camera.
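The depth-camera case described above can be sketched as follows: the user's imaging position plus a depth measurement is back-projected into camera coordinates, and the camera's pose is then used to recover the user's spatial position. All function names and numeric values here are illustrative assumptions, not taken from the patent.

```python
# Hypothetical pinhole-camera intrinsics for the illustration.
FX, FY, CX, CY = 800.0, 800.0, 320.0, 240.0

def pixel_depth_to_camera(u, v, depth):
    """Back-project an imaging position plus measured depth into camera coordinates."""
    x = (u - CX) / FX * depth
    y = (v - CY) / FY * depth
    return (x, y, depth)

def camera_to_world(p_cam, R, t):
    """Recover the world position: p_world = R^T @ (p_cam - t),
    where (R, t) are the camera's pose mapping world -> camera coordinates."""
    d = [p_cam[i] - t[i] for i in range(3)]
    # R^T @ d: transposed indexing swaps the row and column indices.
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))

# Example: user imaged at pixel (480, 240) with measured depth 5 m,
# camera pose = identity rotation at the world origin.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
p_world = camera_to_world(pixel_depth_to_camera(480.0, 240.0, 5.0), R, t)
print(p_world)  # → (1.0, 0.0, 5.0)
```

For the monocular variants mentioned above, `depth` would instead be estimated from the user's imaging size or measured by a lidar mounted with the camera; the back-projection itself is unchanged.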
  • Step 604 Provide information to the user through the user's glasses based on the user's spatial location information.
  • the user can be provided with various required information, such as navigation information, instruction information, tutorial information, advertising information, other information related to location-based services, and the like.
  • the above information may be provided visually, audibly, or the like.
  • a virtual object may be superimposed on the display medium of the glasses, and the virtual object may be, for example, an icon (eg, a navigation icon), a picture, a text, or the like.
  • the glasses themselves may have the ability to directly access the network, so that the glasses may directly receive indication information from the server.
  • the glasses may not have the ability to directly access the network, but may access the network indirectly through a connection with, for example, the user's mobile phone; in this case, the glasses may receive the indication information from the server through an intermediate device such as the mobile phone.
  • information may be further provided to the user in combination with the posture information of the glasses or of the user.
  • the posture information of the glasses or of the user may be determined by the glasses, or the posture information of the user may be determined from the user image captured by the camera; the posture information may include the orientation information of the user.
  • the posture information of the glasses can be obtained through its built-in sensors, for example by tracking changes from an initial posture, or determined directly by the built-in sensors of the glasses (for example, a gravity sensor, a magnetic sensor, an orientation sensor, etc.).
  • the server may directly receive the posture information from the glasses, or receive it through an intermediate device such as a mobile phone.
  • the steps in the method shown in FIG. 6 may be implemented by the server in the system shown in FIG. 3 , but it is understood that one or more of these steps may also be implemented by other devices.
  • FIG. 7 shows a system for providing information to a user in a scene through glasses, including a visual sign 701, a camera 702, and a server (not shown in FIG. 7), according to one embodiment.
  • a user 703 is in the scene and carries glasses 704 and a mobile phone 705 .
  • the mobile phone 705 can recognize the visual sign 701 through its image capture device; therefore the glasses 704 may have no image capture device, or may have one that lacks the ability to recognize the visual sign 701.
  • FIG. 8 illustrates a method of providing information to a user in a scene through glasses, which may be implemented using the system shown in FIG. 7 , according to one embodiment.
  • the method includes the following steps (some steps are similar to those in FIG. 6 and are not repeated here; it can be understood that the content described for each step in FIG. 6 can also be applied to the corresponding steps in FIG. 8):
  • Step 801 Receive information sent by the user's mobile phone, where the information includes spatial location information of the mobile phone.
  • the user can use the mobile phone to determine the spatial location information of the mobile phone by scanning the visual landmarks deployed in the scene.
  • the posture information of the mobile phone can also be determined by scanning the visual sign, and the posture information can be sent to the server.
  • Step 802 Identify the user of the mobile phone in the image captured by the camera based on the spatial location information of the mobile phone.
  • the user's identification information can be associated with the user in order to provide services to the user using the identification information.
  • Step 803 Track the user through the camera and update the spatial location information of the user.
  • the user's posture information can also be determined.
  • Step 804 Provide information to the user through the user's glasses based on the user's spatial location information.
  • the glasses themselves may have the ability to directly access the network, so that the glasses may directly receive indication information from the server.
  • the glasses may not have the ability to directly access the network, but may access the network indirectly through a connection with, for example, the user's mobile phone; in this case, the glasses may receive the indication information from the server through an intermediate device such as the mobile phone.
  • the server may first send first information to the user's mobile phone, and the mobile phone may then send second information (which may be the same as or different from the first information) to the glasses based on the first information, so as to provide the user with location-based services through the glasses.
  • information may be further provided to the user in combination with the posture information of the glasses or of the user.
  • the user may also not use the glasses, but only use the cell phone.
  • information may be provided to the user through the user's mobile phone based on the user's spatial location information.
  • the information may be further provided to the user in combination with the posture information of the mobile phone or its user.
  • the posture information of the user can be determined through the mobile phone, or through the user image captured by the camera.
  • the posture information of the mobile phone can be obtained through its built-in sensors.
  • a device used to scan a visual sign to determine its spatial location information may be referred to as a "position acquisition device", and a device used to provide information to the user may be referred to as an "information receiving device".
  • the position acquisition device and the information receiving device may be the same device, such as the user's mobile phone or glasses; they may also be different devices, such as the user's mobile phone and glasses, respectively.
  • Figure 9 illustrates a user interaction system including a visual sign 901, a camera 902, and a server (not shown in Figure 9) according to one embodiment.
  • the camera and the visual sign are each deployed in a real scene with a specific position and attitude (hereinafter collectively referred to as "pose"); the scene also contains a first user 903 and a second user 905, who carry a first device 904 and a second device 906, respectively.
  • the first device 904 and the second device 906 have image capture devices on them and can identify the visual sign 901 through the image capture devices.
  • the first device 904 and the second device 906 may be, for example, mobile phones, glasses and other devices.
  • FIG. 10 shows a user interaction method according to one embodiment, which can be implemented using the above-mentioned system, and can include the following steps:
  • Step 1001 Receive information sent by a first device of a first user, where the information includes spatial location information of the first device and identification information of the first user or the first device.
  • the first user may use the first device to determine the spatial location information of the first device by scanning the visual markers deployed in the scene.
  • the first device can also be used to scan the visual marker to determine the posture information of the first device relative to the visual marker or the posture information of the first device in the scene, and the posture information can be sent to the server.
  • Step 1002 Identify the first user in the image captured by the camera based on the spatial location information of the first device.
  • Step 1003 Associate the identification information of the first user or the first device with the first user in the image captured by the camera.
  • Step 1004 Track the first user through the camera and update the spatial location information of the first user.
  • the posture information of the user or the device may also be determined based on the result of tracking the user or the device by the camera.
  • Step 1005 Set relevant information of a first virtual object associated with the first user, where the relevant information includes content information and spatial location information, and the spatial location information of the first virtual object is set according to the spatial location information of the first user.
  • the spatial position of the first virtual object may be configured to be at a predetermined distance above the first user.
  • the content information of the first virtual object is information used to describe the content of the virtual object, which may include, for example, pictures, characters, numbers, icons, animations, videos, three-dimensional models, etc. contained in the virtual object, and may also include the virtual object's shape information, color information, size information, posture information, etc.
  • the content information of the first virtual object may be set according to the information from the first user or the first device identified by the identification information of the first user or the first device.
  • the content information of the first virtual object may be, for example, the occupation, identity, gender, age, name, nickname, etc. of the first user.
  • the spatial location information of the first virtual object may change accordingly as the location of the first user changes, and the content information of the virtual object may be updated according to new information received from the first user or the first device (e.g., a new comment by the user may update the textual content of the virtual object).
  • the pose information of the virtual object may also be set, and the pose information of the virtual object may be set based on the pose information of the device or user associated therewith, but may also be set in other ways.
  • Step 1006 Send the relevant information of the first virtual object to the second device of the second user.
  • the relevant information of the first virtual object can be used by the second device to present the first virtual object on its display medium based on its position information and/or posture information (e.g., in an augmented reality or mixed reality manner).
  • the location information and attitude information of the second device may be determined in various feasible ways.
  • the second device may determine its position information and/or gesture information by scanning the visual landmarks.
  • the location information and/or posture information of the second device may be determined through the tracking result of the second device or its user by the camera.
  • the second device may also use various sensors built in it to determine its position information and/or attitude information.
  • the second device may use point cloud information of the scene to determine its position information and/or pose information.
  • after obtaining the spatial position information of the first virtual object and the position and attitude information of the second device, the first virtual object can be superimposed at a suitable position in the real scene presented on the display medium of the second device.
  • the posture of the superimposed first virtual object may be further determined.
  • the user of the second device may perform various interactive operations on the first virtual object.
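The placement and presentation of a virtual object described above can be sketched as follows: the object is anchored a predetermined distance above the tracked user, and its world position is projected into the second device's view using that device's position and attitude. The pinhole model, the +y-up convention, the function names, and all numeric values are hypothetical assumptions for illustration only.

```python
def place_virtual_object(user_world_pos, height_above=0.3):
    """Set the virtual object's spatial position a fixed distance above the user."""
    x, y, z = user_world_pos
    return (x, y + height_above, z)  # assumes +y is "up" in world coordinates

def render_on_device(obj_world, R_dev, t_dev, fx=800.0, fy=800.0, cx=320.0, cy=240.0):
    """Project the object's world position onto the second device's display.

    (R_dev, t_dev) map world coordinates into the device's camera frame; in the
    embodiments above, the device pose may come from scanning a visual marker,
    from camera tracking, from built-in sensors, or from scene point clouds.
    """
    p = tuple(sum(R_dev[i][j] * obj_world[j] for j in range(3)) + t_dev[i]
              for i in range(3))
    if p[2] <= 0:
        return None  # object is behind the device; nothing to draw
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)

# First user tracked at (1, 0, 5) in world coordinates; second device at the origin.
obj = place_virtual_object((1.0, 0.0, 5.0))
R_dev = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_dev = [0.0, 0.0, 0.0]
print(render_on_device(obj, R_dev, t_dev))
```

As the first user moves, re-running `place_virtual_object` with the updated tracked position keeps the object following the user, matching the behavior described for the virtual object's spatial location information.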
  • a second virtual object may also be set for the second user of the second device in a similar manner, and the content information and spatial location information of the second virtual object may be sent to the first device of the first user or to other devices of the first user.
  • the first device and the other devices may be, for example, a mobile phone and glasses, respectively.
  • the content information and spatial location information of the second virtual object can be used by the first device or the other devices to present the second virtual object on their display medium based on their location information and/or posture information.
  • the steps in the method shown in FIG. 10 may be implemented by the server in the system shown in FIG. 9 , but it is understood that one or more of these steps may also be implemented by other devices.
  • the virtual object may be, for example, an icon containing text, wherein the text is "pick-up, XXX of XX company".
  • the spatial position of the virtual object is associated with the spatial position of the first user and can move as the first user moves.
  • Figure 12 shows an actual image observed by a user through his cell phone screen, the image including multiple users, each user having an associated virtual object, according to one embodiment.
  • a camera is used as an example of a sensor in the description, but it can be understood that the embodiments herein are also applicable to any other sensor that can sense or determine a target's position, such as lidar, millimeter-wave radar, wireless signal transceivers, etc.
  • the devices involved in the embodiments of the present application may be any devices carried or controlled by a user (e.g., mobile phones, tablet computers, smart glasses, AR glasses, smart helmets, smart watches, vehicles, etc.), or various machines capable of autonomous movement, such as unmanned aerial vehicles, unmanned vehicles, robots, etc., on which image acquisition devices are installed.
  • the glasses in this application may be AR glasses, smart glasses, or any other glasses that can be used to present information to the user.
  • the glasses in this application also include glasses formed by adding components or inserts to ordinary optical glasses, for example, glasses formed by adding a display device to ordinary optical glasses.
  • the present invention may be implemented in the form of a computer program.
  • the computer program can be stored in various storage media (eg, hard disk, optical disk, flash memory, etc.), and when the computer program is executed by the processor, can be used to implement the method of the present invention.
  • the present invention may be implemented in the form of an electronic device.
  • the electronic device includes a processor and a memory, and the memory stores a computer program that, when executed by the processor, can be used to implement the method of the present invention.
  • references herein to "various embodiments," "some embodiments," "one embodiment," or "an embodiment" etc. mean that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment.
  • appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment" in various places throughout this document are not necessarily referring to the same embodiment.
  • the particular features, structures, or properties may be combined in any suitable manner in one or more embodiments.
  • particular features, structures, or properties shown or described in connection with one embodiment may be combined, in whole or in part, with the features, structures, or properties of one or more other embodiments without limitation, so long as the combination is not illogical or non-functional.

Abstract

Provided are a method and system for obtaining identification information of a device or a user thereof in a scenario. One or more sensors and one or more visual marks are deployed in the scenario, and the sensor can be used for sensing or determining position information of a device or a user in the scenario. The method comprises: receiving information sent by a device, wherein the information comprises identification information of the device or a user thereof and spatial position information of the device, and the device determines the spatial position information thereof by scanning a visual mark; identifying the device or the user thereof within a sensing range of a sensor on the basis of the spatial position information of the device; and associating the identification information of the device or the user thereof with the device or the user thereof within the sensing range of the sensor, so as to provide a service for the device or the user thereof.

Description

Method and system for obtaining identification information of a device or its user in a scene

Technical Field
The present invention relates to the field of information interaction, and in particular to a method and system for obtaining identification information of a device or its user in a scene.
Background
The statements in this section merely provide background information related to the technical solutions of the present application to aid understanding; they do not necessarily constitute prior art with respect to the technical solutions of the present application.
In many scenes, driven by needs such as security, surveillance, and public services, sensors such as cameras and radars are deployed to sense, locate, and track the people or devices appearing in the scene. However, although these sensors can sense the position or movement of people or devices present in the scene, they cannot obtain the identification information of those people or devices, which makes it difficult to provide services to them. Although face recognition technology can be used to identify people, it involves infringing on user privacy and may carry legal risks. In addition, these sensors usually only achieve one-way information transmission (that is, collecting relevant information in the scene) and cannot, based on such information (for example, a user's real-time location information), provide information to users in the scene, such as navigation information, instruction information, or commercial promotion information. In the prior art, in order to provide services to users in a scene, on-site manual service is usually adopted, which requires setting up consultation desks at a certain density in the venue and arranging service personnel; this approach is costly and has low flexibility.
Summary of the Invention
One aspect of the present invention relates to a method for obtaining identification information of a device or its user in a scene, where one or more sensors and one or more visual markers are deployed in the scene, and the sensors can be used to sense or determine location information of devices or users in the scene. The method includes: receiving information sent by a device, the information including identification information of the device or its user and spatial location information of the device, where the device determines its spatial location information by scanning a visual marker; identifying, based on the spatial location information of the device, the device or its user within the sensing range of a sensor; and associating the identification information of the device or its user with the device or its user within the sensing range of the sensor, so as to provide services to the device or its user.
Another aspect of the present invention relates to a system for obtaining identification information of a device or its user in a scene, the system including: one or more sensors deployed in the scene, which can be used to sense or determine location information of devices or users in the scene; one or more visual markers deployed in the scene; and a server configured to implement the methods described in the embodiments of the present application.
Another aspect of the present invention relates to a storage medium storing a computer program which, when executed by a processor, can be used to implement the methods described in the embodiments of the present application.
Another aspect of the present invention relates to an electronic device including a processor and a memory, the memory storing a computer program which, when executed by the processor, can be used to implement the methods described in the embodiments of the present application.
Through the solution of the present invention, not only can the position or movement of people or devices present in a scene be sensed, but the identification information of those people or devices can also be obtained, and services can be provided to the corresponding people or devices through that identification information. In addition, in some embodiments, not only can the location information of users in the scene be collected or monitored, but information such as navigation information, instruction information, and commercial promotion information can also be provided to users based on their real-time location information.
Description of Drawings
Embodiments of the present invention are further described below with reference to the accompanying drawings, in which:
Figure 1 shows an exemplary visual marker;

Figure 2 shows an optical communication apparatus that can serve as a visual marker;

Figure 3 shows a system for obtaining identification information of a device or its user in a scene according to one embodiment;

Figure 4 shows a method for obtaining identification information of a device or its user in a scene according to one embodiment;

Figure 5 shows a method for providing services to a device or its user in a scene according to one embodiment;

Figure 6 shows a method for providing information to a user in a scene through a device (here, glasses are taken as an example) according to one embodiment;

Figure 7 shows a system for providing information to a user in a scene through glasses according to one embodiment;

Figure 8 shows a method for providing information to a user in a scene through glasses according to one embodiment;

Figure 9 shows a user interaction system according to one embodiment;

Figure 10 shows a user interaction method according to one embodiment;

Figure 11 shows a first user and a virtual object associated with the first user as observed by a second user through the second user's device, according to one embodiment;

Figure 12 shows an actual image observed by a user through a mobile phone screen, according to one embodiment.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings through specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it.
A visual marker is a marker that can be recognized by the human eye or by an electronic device, and it can take a wide variety of forms. In some embodiments, a visual marker can be used to convey information that can be obtained by a smart device (e.g., a mobile phone, smart glasses, etc.). For example, the visual marker may be an optical communication apparatus capable of emitting encoded optical information, or it may be a graphic carrying encoded information, such as a two-dimensional code (e.g., a QR code or applet code) or a barcode. Figure 1 shows an exemplary visual marker with a specific black-and-white pattern. Figure 2 shows an optical communication apparatus 100 that can serve as a visual marker, which includes three light sources (a first light source 101, a second light source 102, and a third light source 103). The optical communication apparatus 100 also includes a controller (not shown in Figure 2) for selecting a corresponding driving mode for each light source according to the information to be conveyed. For example, in different driving modes, the controller can use different driving signals to control the light-emitting manner of a light source, so that when the optical communication apparatus 100 is photographed with a device having an imaging function, the imaging of the light sources can present different appearances (e.g., different colors, patterns, brightness, etc.). By analyzing the imaging of the light sources in the optical communication apparatus 100, the driving mode of each light source at the moment can be resolved, thereby resolving the information conveyed by the optical communication apparatus 100 at that moment.
In order to provide corresponding services to users based on visual markers, each visual marker may be assigned identification information (an ID), which is used to uniquely identify the visual marker by its manufacturer, manager, user, or the like. A user can use a device to capture an image of a visual marker to obtain the identification information conveyed by it, and can then access corresponding services based on that identification information, for example, visiting a webpage associated with the identification information, or obtaining other information associated with the identification information (e.g., the position or attitude information of the visual marker corresponding to the identification information), and so on. The devices mentioned herein may be, for example, devices carried or controlled by a user (e.g., mobile phones, tablet computers, smart glasses, AR glasses, smart helmets, smart watches, cars, etc.), or machines capable of autonomous movement (e.g., drones, driverless cars, robots, etc.). A device can capture an image containing a visual marker through its image acquisition component, and by analyzing the imaging of the visual marker in the image it can identify the information conveyed by the visual marker and determine the device's position or attitude information relative to the visual marker.
The sensor capable of sensing a target's position may be any of various sensors that can be used to sense or determine the position information of a target in a scene, such as a camera, a radar (e.g., lidar or millimeter-wave radar), a wireless signal transceiver, and the like. A target in the scene may be a person or an object in the scene. In the following embodiments, a camera is used as an example of the sensor.
Figure 3 illustrates a system, according to one embodiment, for obtaining identification information of a device in a scene or of its user; the system can be used to provide services or information to a user in the scene through the device. The system includes a visual marker 301, a camera 302, and a server (not shown in Figure 3). A user 303 is located in the scene and carries a device 304. The device 304 has an image acquisition component and can recognize the visual marker 301 through it. In one embodiment, the device 304 may be a mobile phone carried by the user. In another embodiment, the device 304 may be glasses worn by the user. The glasses themselves may have the ability to access the network directly, for example via Wi-Fi or a telecommunication network. Alternatively, the glasses may lack direct network access and may instead access the network indirectly through a connection (e.g., a Bluetooth or wired connection) to another device of the user (e.g., a mobile phone or watch).
The visual marker 301 and the camera 302 are each installed in the real scene with a specific position and attitude (hereinafter collectively referred to as a "pose"). In one embodiment, the server may obtain the respective pose information of the camera and of the visual marker, and may derive the relative pose information between the camera and the visual marker from their respective poses. In another embodiment, the server may obtain the relative pose information between the camera and the visual marker directly. In this way, the server can obtain a transformation matrix between the camera coordinate system and the visual-marker coordinate system, which may include, for example, a rotation matrix R and a displacement vector t between the two coordinate systems. Through this transformation matrix, coordinates in one coordinate system can be converted into coordinates in the other. The camera may be installed at a fixed position with a fixed orientation, but it will be appreciated that the camera may also be movable (e.g., able to change its position or adjust its direction), as long as its current pose information can be determined.
The current pose information of the camera may be set by the server, which then controls the camera's movement based on that pose information; alternatively, the camera's movement may be controlled by the camera itself or by another device, with the camera's current pose information being sent to the server. In some embodiments, the system may include more than one camera, and may also include more than one visual marker.
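As a minimal illustrative sketch only (the rotation R and displacement t below are hypothetical values, not taken from any embodiment), the conversion of coordinates between the visual-marker and camera coordinate systems via the transformation described above can be written as:

```python
import numpy as np

def marker_to_camera(p_marker, R, t):
    """Convert a point from the visual-marker coordinate system to the
    camera coordinate system: p_cam = R @ p_marker + t."""
    return R @ np.asarray(p_marker, dtype=float) + t

def camera_to_marker(p_cam, R, t):
    """Inverse transform: p_marker = R^T @ (p_cam - t)."""
    return R.T @ (np.asarray(p_cam, dtype=float) - t)

# Illustrative relative pose: marker rotated 90 degrees about the z-axis
# relative to the camera and displaced by t.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 2.0, 3.0])

p_marker = np.array([1.0, 0.0, 0.0])
p_cam = marker_to_camera(p_marker, R, t)
```

Applying the inverse transform to `p_cam` recovers the original point, which is the round trip the server relies on when converting between the two coordinate systems.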
In one embodiment, a scene coordinate system (which may also be called a real-world coordinate system) may be established for the real scene. The transformation matrix between the camera coordinate system and the scene coordinate system can then be determined from the camera's pose in the real scene, and the transformation matrix between the visual-marker coordinate system and the scene coordinate system from the visual marker's pose in the real scene. In this case, coordinates in the camera coordinate system or the visual-marker coordinate system can be converted into coordinates in the scene coordinate system without transforming between the camera and visual-marker coordinate systems directly; nevertheless, the relative pose information or transformation matrix between the camera and the visual marker can still be known to the server. Therefore, in this application, saying that there is a relative pose between the camera and the visual marker means that a relative pose objectively exists between the two; it does not require the system to pre-store or use that relative pose information. For example, in one embodiment, the system may store only the respective poses of the camera and the visual marker in the scene coordinate system, without computing or using their relative pose.
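A short sketch, under assumed poses, of how the relative pose between camera and marker remains derivable when only their scene-coordinate poses are stored (the 4x4 homogeneous-matrix representation and the example poses are illustrative choices, not mandated by the embodiments):

```python
import numpy as np

def pose_to_matrix(R, t):
    """Pack a rotation R and translation t into a 4x4 homogeneous
    transform that maps local coordinates into scene coordinates."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(T_cam_in_scene, T_marker_in_scene):
    """Marker-to-camera transform derived from the two scene poses:
    T_cam_marker = inv(T_scene_cam) @ T_scene_marker."""
    return np.linalg.inv(T_cam_in_scene) @ T_marker_in_scene

# Illustrative scene poses: camera at x = 1, marker 5 m ahead on the z-axis.
T_cam = pose_to_matrix(np.eye(3), np.array([1.0, 0.0, 0.0]))
T_marker = pose_to_matrix(np.eye(3), np.array([0.0, 0.0, 5.0]))
T_cam_marker = relative_pose(T_cam, T_marker)
```

The marker's origin expressed in camera coordinates is then `T_cam_marker @ [0, 0, 0, 1]`, confirming that the relative pose is recoverable even though it was never stored explicitly.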
The camera can be used to track a target in the real scene; the target may be stationary or moving, and may be, for example, a person, a stationary object, or a movable object in the scene. Various methods known in the art can be used to track the position of a person or object in a real scene with a camera. For example, when a single monocular camera is used, the position information of a target in the scene can be determined in combination with scene information (e.g., information about the plane on which a person or object in the scene is located). When a binocular camera is used, the position information of the target can be determined from the target's position in the camera's field of view together with the target's depth information. When multiple cameras are used, the position information of the target can be determined from the target's position in the field of view of each camera.
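For the binocular case, a hedged sketch of recovering a target's 3D position from its pixel position and depth (the intrinsic matrix K below uses hypothetical focal length and principal point values):

```python
import numpy as np

def backproject(u, v, depth, K):
    """Recover the 3D point (in camera coordinates) for pixel (u, v)
    whose depth along the optical axis is known, via p = depth * K^-1 [u, v, 1]."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return depth * ray

# Hypothetical pinhole intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
```

A pixel at the principal point with depth 2 m back-projects to a point 2 m straight ahead on the optical axis; pixels off-center back-project proportionally off-axis.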
It will be appreciated that the system may include multiple visual markers or multiple cameras, and the fields of view of the multiple cameras may or may not be contiguous.
Figure 4 shows a method, according to one embodiment, for obtaining identification information of a device in a scene or of its user. The method can be implemented using the system shown in Figure 3 and may include the following steps:
Step 401: Receive information sent by the device, the information including identification information of the device or of its user as well as spatial location information of the device.
The information sent by the device may be of various kinds, for example alarm information, help-request information, service-request information, and so on. The identification information of the device or of its user may be any information that can be used to identify the device or its user, for example the device's ID, the device's phone number, the account information of an application on the device, the user's name or nickname, the user's identity information, the user's account information, and so on.
In one embodiment, the user 303 may use the device 304 to determine the spatial location information of the device 304 by scanning the visual marker 301 deployed in the scene. The user 303 may send information to the server through the device 304; this information may include the spatial location information of the device 304, which may be either the device's location relative to the visual marker 301 or its location in the scene. In one embodiment, the device 304 may be used to capture an image of the visual marker 301; the identification information of the visual marker 301 and the spatial position of the device 304 relative to the visual marker 301 are determined by analyzing the captured image; the position and attitude of the visual marker 301 in space are determined from its identification information; and the spatial location of the device 304 in the scene is then determined based on the position and attitude of the visual marker 301 in space together with the position of the device 304 relative to the visual marker 301. In one embodiment, the device 304 may send the identification information of the visual marker 301 and the device's position relative to the visual marker 301 to the server, so that the server can determine the spatial location of the device 304 in the scene.
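The last step of that chain can be sketched as follows; the marker registry, its key, and the pose values are all hypothetical stand-ins for whatever lookup the server actually performs from the marker's identification information:

```python
import numpy as np

# Hypothetical registry mapping a marker ID to its pose in the scene
# (rotation matrix, position); illustrative values only.
MARKERS = {
    "marker-301": (np.diag([-1.0, -1.0, 1.0]),   # rotated 180° about z
                   np.array([10.0, 0.0, 2.0])),  # mounted at (10, 0, 2)
}

def locate_device(marker_id, p_device_rel):
    """Scene position of the device, given the marker ID reported by the
    device and the device's position relative to that marker."""
    R, t = MARKERS[marker_id]
    return R @ np.asarray(p_device_rel, dtype=float) + t
```

A device 1 m right of and 2 m in front of the marker (in marker coordinates) thus resolves to a unique point in the scene coordinate system.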
In one embodiment, the device 304 may also determine, by scanning the visual marker 301, its attitude information relative to the visual marker 301 or its attitude information in the scene, and may send that attitude information to the server.
In one embodiment, the spatial location information and attitude information of the device may be those at the time the device scans the visual marker, or may be real-time location and attitude information at any moment after the scan. For example, the device may determine its initial spatial location and attitude when scanning the visual marker, and may then use its various built-in sensors (e.g., an acceleration sensor, magnetic sensor, orientation sensor, gravity sensor, gyroscope, or camera) to measure or track its changes in position and/or attitude through methods known in the art (e.g., inertial navigation, visual odometry, SLAM, VSLAM, or SFM), thereby determining its real-time position and/or attitude.
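A minimal sketch of that idea, assuming the sensor pipeline already yields displacement increments (a real inertial-navigation or VSLAM system would estimate these; here they are simply given):

```python
import numpy as np

class DeadReckoner:
    """Tracks a device's position after an initial fix obtained by scanning
    a visual marker; sensor-derived displacement increments are accumulated
    on top of the initial position."""

    def __init__(self, initial_position):
        self.position = np.asarray(initial_position, dtype=float)

    def update(self, displacement):
        """Apply one displacement increment and return the new position."""
        self.position = self.position + np.asarray(displacement, dtype=float)
        return self.position

# Initial fix from the marker scan, then two 0.5 m steps along x.
tracker = DeadReckoner([10.0, 3.0, 2.0])
tracker.update([0.5, 0.0, 0.0])
tracker.update([0.5, 0.0, 0.0])
```

In practice the increments would drift over time, which is why re-scanning a marker to obtain a fresh fix is useful.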
The spatial location information of the device received by the server may be coordinate information, but is not limited to this; any information from which the device's spatial location can be derived constitutes spatial location information. In one embodiment, the spatial location information received by the server may be an image of the visual marker captured by the device, from which the server can determine the device's spatial location. Similarly, any information from which the device's attitude can be derived constitutes attitude information, which in one embodiment may likewise be an image of the visual marker captured by the device.
Step 402: Identify the device or its user in the image captured by the camera based on the spatial location information of the device.
Various feasible approaches can be used to identify the device or its user in the image captured by the camera from the device's spatial location information.
In one embodiment, the imaging position of the device or of its user in the image captured by the camera may be determined based on the spatial location information of the device, and the device or its user may then be identified in that image according to the imaging position.
For devices usually held or carried by a user, such as mobile phones, smart glasses, smart watches, and tablet computers, the imaging position of the user in the camera image can be determined from the device's spatial location information. Since the user typically scans the visual marker while holding or wearing the device, the user's spatial location can be inferred from the device's spatial location, and the user's imaging position in the camera image can then be determined from the user's spatial location. Alternatively, the device's imaging position in the camera image can be determined from its spatial location, and the user's imaging position then inferred from the device's imaging position.
For devices not usually held or carried by a user, such as cars, robots, driverless cars, and drones, the device's imaging position in the camera image can be determined directly from the device's spatial location information.
In one embodiment, a pre-established mapping between one or more (not necessarily all) spatial locations in the scene and one or more imaging positions in the camera image, together with the device's spatial location information, may be used to determine the imaging position of the device or its user in the camera image. For example, for a hall scene, several spatial locations on the hall floor can be selected and their imaging positions in the camera image determined; a mapping between these spatial locations and imaging positions can then be established, and the imaging position corresponding to a given spatial location can be inferred from this mapping.
In one embodiment, the imaging position of the device or its user in the camera image may be determined based on the device's spatial location information and the camera's pose information, where the camera's pose information may be its pose in the scene or its pose relative to the visual marker.
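This variant is the standard pinhole projection; a sketch under assumed intrinsics and pose (the K, R, t values are hypothetical):

```python
import numpy as np

def project_to_image(p_scene, K, R, t):
    """Project a scene point into the image, given camera intrinsics K and
    a pose (R, t) that maps scene coordinates into camera coordinates."""
    p_cam = R @ np.asarray(p_scene, dtype=float) + t
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]  # perspective divide

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
```

A point on the optical axis projects to the principal point, and lateral offsets scale with focal length over depth, giving the expected imaging position used in the identification step.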
After the imaging position of the device or its user in the camera image has been determined, the device or its user can be identified in the image according to that imaging position. For example, the device or user closest to the imaging position may be selected, or a device or user whose distance from the imaging position satisfies a predetermined condition may be selected.
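The nearest-candidate selection with a distance condition can be sketched as follows (the 50-pixel threshold is an arbitrary illustrative choice for the "predetermined condition"):

```python
import numpy as np

def match_detection(expected_uv, detections_uv, max_dist=50.0):
    """Return the index of the detected device/user closest to the expected
    imaging position, or None if none is within max_dist pixels."""
    if len(detections_uv) == 0:
        return None
    diffs = np.asarray(detections_uv, dtype=float) - np.asarray(expected_uv, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= max_dist else None
```

Returning `None` when no detection satisfies the distance condition avoids associating the identification information with the wrong person when the expected position matches nobody in view.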
In one embodiment, in order to identify the device or its user in the camera image, the device's spatial location information may be compared with the spatial location information of one or more devices or users determined from the camera's tracking results. The camera can determine the spatial location of a person or object in the real scene through various methods known in the art. For example, when a single monocular camera is used, the position of a target in the scene can be determined in combination with scene information (e.g., information about the plane on which a person or object is located). When a binocular camera is used, the target's position can be determined from its position in the camera's field of view together with its depth information. When multiple cameras are used, the target's position can be determined from its position in each camera's field of view. In one embodiment, images captured by the camera may also be combined with lidar or the like to determine the spatial location information of one or more users.
In one embodiment, if there are multiple users or devices near the device's spatial location, the device's real-time spatial location information (e.g., satellite positioning information or location information obtained through the device's sensors) may be received from the device, the positions of the multiple users or devices may be tracked by the camera, and the device or its user may be identified by comparing the real-time spatial location information received from the device with the camera-tracked positions of the multiple users or devices.
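One way to realize that comparison, sketched with invented data: accumulate a short window of reported positions and pick the camera-tracked trajectory that agrees best with it over time.

```python
import numpy as np

def match_trajectory(reported, candidates):
    """Pick the camera-tracked trajectory whose positions best agree with
    the sequence of real-time positions reported by the device (minimum
    mean distance over the common time steps)."""
    reported = np.asarray(reported, dtype=float)
    errors = [np.mean(np.linalg.norm(np.asarray(c, dtype=float) - reported, axis=1))
              for c in candidates]
    return int(np.argmin(errors))

# Device-reported positions over three time steps, and two camera-tracked
# trajectories of nearby people (illustrative values).
reported = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
tracked = [
    [[0.1, 0.0], [1.0, 0.1], [2.0, 0.0]],   # nearly matches the reports
    [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]],   # a different person walking in parallel
]
```

Using a window rather than a single instant disambiguates people who momentarily stand close together but move differently.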
In one embodiment, if there are multiple users near the device's spatial location, feature information of the device's user (e.g., feature information for face recognition) may be determined based on the information sent by the device, the feature information of the multiple users may be collected by the camera, and the device's user may be identified by comparing the feature information of the multiple users with that of the device's user.
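A hedged sketch of that comparison, assuming the face features are already embedded as vectors (the vectors and the 0.8 similarity threshold are illustrative; any real face-recognition model and tuned threshold would replace them):

```python
import numpy as np

def identify_user(device_feature, tracked_features, min_similarity=0.8):
    """Compare the feature vector derived from the device's message against
    the features of camera-tracked users, using cosine similarity. Returns
    the index of the best match above the threshold, else None."""
    f = np.asarray(device_feature, dtype=float)
    f = f / np.linalg.norm(f)
    best, best_sim = None, min_similarity
    for idx, g in enumerate(tracked_features):
        g = np.asarray(g, dtype=float)
        sim = float(f @ (g / np.linalg.norm(g)))
        if sim > best_sim:
            best, best_sim = idx, sim
    return best
```

The threshold guards against associating the identification information with a tracked user whose features merely happen to be the least dissimilar.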
In one embodiment, one or more cameras whose fields of view can cover the device or its user may first be determined based on the device's spatial location information, and the imaging position of the device or its user in the images captured by those cameras may then be determined.
Step 403: Associate the identification information of the device or its user with the device or its user in the image captured by the camera, so that the identification information can be used to provide services to the device or its user.
After the device or its user has been identified in the camera image, the received identification information of the device or its user can be associated with the device or user in the image. In this way, for example, the ID, phone number, or application account information of a device in the camera's field of view can be known, or the name or nickname, identity information, or account information of a user in the camera's field of view can be known. Once the identification information of a device or user in the camera's field of view is known, it can be used to provide various services to the device or its user, for example navigation services, explanation services, information display services, and so on. In one embodiment, such information may be provided visually, audibly, or in other ways. In one embodiment, a virtual object may be superimposed on the display medium of the device (e.g., a mobile phone or glasses); the virtual object may be, for example, an icon (such as a navigation icon), a picture, or text.
The steps of the method shown in Figure 4 may be implemented by the server in the system shown in Figure 3, but it will be appreciated that one or more of these steps may also be implemented by other devices.
In one embodiment, the device or its user in the scene may also be tracked through the camera to obtain real-time position and/or attitude information, or the device itself may be used to obtain its real-time position and/or attitude information. Once the position and/or attitude information of the device or its user has been obtained, services can be provided to the device or its user based on that information.
In one embodiment, after the identification information of the device or its user has been associated with the device or its user in the camera image, information can be sent through that identification information to the corresponding device or user in the camera's field of view, for example navigation information, explanation information, instruction information, advertising information, and so on.
A specific application scenario is described below.
One or more visual markers and one or more cameras are deployed in a smart-factory scenario in which robots are used to transport goods. As a robot travels, the camera tracks its position, and navigation instructions are sent to the robot based on the tracked position. In order to determine the identification information of each robot in the camera's field of view (e.g., the robot's ID), each robot may be made to scan a visual marker, for example when entering the scene or the camera's field of view, and to send its position information and identification information. In this way, the identification information of each robot within the camera's field of view can easily be determined, and travel or navigation instructions can be sent to each robot based on its current position and the task it is to complete.
In one embodiment, information related to a virtual object may be sent to the device; the virtual object may be, for example, a picture, text, a number, an icon, a video, or a three-dimensional model, and the information related to the virtual object may include the virtual object's spatial location information. After the device receives the virtual object, it can be presented on the device's display medium. In one embodiment, the device may present the virtual object at an appropriate position on its display medium based on the spatial location and/or attitude information of the device or user. The virtual object may be presented on the display medium of the user's device in an augmented-reality or mixed-reality manner, for example. In one embodiment, the virtual object is a video image or a dynamic three-dimensional model generated by video capture of a live person. For example, the virtual object may be a video image generated by real-time video capture of a service person; this video image can be presented on the display medium of the user's device so as to provide services to the user. In one embodiment, the spatial position of the video image may be set so that it can be presented on the display medium of the user's device in an augmented-reality or mixed-reality manner.
In one embodiment, after the identification information of the device or its user has been associated with the device or its user in the camera image, information sent by a device or user within the camera's field of view can be recognized based on that identification information; such information may be, for example, service-request information, alarm information, help-request information, or comment information. In one embodiment, after the information sent by the device or user is received, a virtual object associated with the device or user may be set according to that information; the spatial location of the virtual object may be determined from the position of the device or user, and may change accordingly as the device or user moves. In this way, other users can observe the virtual object in an augmented-reality or mixed-reality manner through devices such as mobile phones or smart glasses. In one embodiment, the content of the virtual object may be updated (e.g., its text content updated) according to new information received from the device or user (e.g., a new comment by the user).
Figure 5 shows a method, according to one embodiment, for providing services to a device or user in a scene. The method can be implemented using the system shown in Figure 3 and may include the following steps:
Step 501: Receive information sent by the device, the information including identification information of the device or of its user as well as spatial location information of the device.
Step 502: Identify the device or its user in the image captured by the camera based on the spatial location information of the device.
Step 503: Mark the device or its user in the image captured by the camera.
A variety of methods can be used to mark the device or user; for example, the imaging of the device or user can be framed, a particular icon can be presented near it, or it can be highlighted. In one embodiment, the imaging region of the marked device or user can be enlarged, or the camera can be directed to capture the marked device or user specifically. In one embodiment, the device or user can be tracked continuously through the camera, and the real-time spatial location and/or attitude information of the device or user can be determined.
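The framing variant can be sketched directly on the pixel array; this is a bare-bones illustration (a production system would likely use a drawing library rather than manual pixel writes, and the box coordinates below are invented):

```python
import numpy as np

def mark_user(image, top_left, bottom_right, color=(0, 255, 0)):
    """Draw a rectangular outline around the identified user in a camera
    frame represented as an H x W x 3 array, leaving the interior intact."""
    (y0, x0), (y1, x1) = top_left, bottom_right
    image[y0, x0:x1 + 1] = color   # top edge
    image[y1, x0:x1 + 1] = color   # bottom edge
    image[y0:y1 + 1, x0] = color   # left edge
    image[y0:y1 + 1, x1] = color   # right edge
    return image

# Tiny illustrative frame with a box around rows 2-5, columns 2-6.
frame = np.zeros((10, 10, 3), dtype=np.uint8)
mark_user(frame, (2, 2), (5, 6))
```

Because only the outline pixels are written, the person inside the box remains visible to the staff observing the feed.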
Step 504: Associate the identification information of the device or its user with the device or its user in the image captured by the camera, so that the identification information can be used to provide services to the device or its user.
After the device or user has been marked in the camera image, personnel who can observe the camera image (e.g., management or service staff at an airport, station, or shopping mall) can see that the device or user currently needs service and where the device or user is currently located, and can thus conveniently provide the device or user with various required services, such as explanation services, navigation services, consulting services, or help services. In this way, information desks deployed in the scene can be replaced, and any user in the scene can be provided with the services they need in a convenient, low-cost manner.
In one embodiment, the service may be provided to the user through a device carried or controlled by the user, for example a mobile phone, smart glasses, or a vehicle. In one embodiment, the service may be provided visually, audibly, or in other ways through a telephone function, an application (app), or the like on the device.
The steps of the method shown in Figure 5 may be implemented by the server in the system shown in Figure 3, but it will be appreciated that one or more of these steps may also be implemented by other devices.
Figure 6 shows a method, according to one embodiment, for providing information to a user in a scene through a device (here, glasses are taken as an example). The method can be implemented using the system shown in Figure 3 and may include the following steps:
Step 601: Receive information sent by the glasses, the information including the spatial location information of the glasses.
In one embodiment, the user may use the glasses to determine their spatial location information by scanning a visual marker deployed in the scene. The user may send information to the server through the glasses. In one embodiment, the glasses may also be used to determine, by scanning the visual marker, their attitude information relative to the visual marker or their attitude information in the scene, and that attitude information may be sent to the server.
In one embodiment, in addition to the spatial position information of the glasses, the information sent by the glasses may include information related to the glasses or their user, for example service request information, help request information, alarm information, or identification information (such as a telephone number or APP account information).
In one embodiment, the glasses themselves may be capable of accessing the network directly. In another embodiment, the glasses may lack direct network access and instead access the network indirectly through a connection to, for example, the user's mobile phone; in this case, the server may receive the information sent by the glasses via an intermediate device such as the phone.
Step 602: Identify the user of the glasses in the image captured by the camera based on the spatial position information of the glasses.
As described above, various feasible approaches can use the spatial position information of the glasses to identify the user of the glasses in the image captured by the camera.
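One such feasible approach can be sketched as follows (a simplified illustration under assumed pinhole-camera conventions, not the only method contemplated): project the device's reported spatial position into the camera image using the camera's intrinsics and pose, then pick the detected person whose imaging position lies closest to the projection.

```python
import numpy as np

def project_to_image(K, R_cw, t_cw, p_world):
    """Project a scene point into pixel coordinates with a pinhole model;
    (R_cw, t_cw) map scene coordinates into the camera frame."""
    p_cam = R_cw @ np.asarray(p_world, float) + t_cw
    if p_cam[2] <= 0:          # behind the camera: not visible
        return None
    uv = K @ (p_cam / p_cam[2])
    return uv[:2]

def match_user(uv, detections, max_px=50.0):
    """Pick the detected person (given as image positions, e.g. bounding-box
    centers) closest to the projected device position, within a pixel gate;
    return its index, or None if nothing is close enough."""
    best, best_d = None, max_px
    for i, det in enumerate(detections):
        d = float(np.linalg.norm(np.asarray(det, float) - uv))
        if d < best_d:
            best, best_d = i, d
    return best
```

The 50-pixel gate is an illustrative tuning value; in practice it would depend on camera resolution and localization accuracy.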
After the user is identified, the user's identification information can be associated with the user so that the identification information can be used to provide services to the user.
Step 603: Track the user through the camera and update the user's spatial position information.
In one embodiment, the camera may track the user and update the user's imaging position, and the user's spatial position information may be determined based on the updated imaging position. Various visual tracking methods known in the art can be used to track the user in the camera's field of view and update the user's imaging position. While tracking the user, the camera may remain fixed or may move. In one embodiment, multiple cameras may be used during tracking, and their fields of view may be contiguous or non-contiguous. Where the fields of view are non-contiguous, the user's appearance features may be recorded so that the user can be re-identified and tracked again upon re-entering the field of view of one or more cameras.
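For the non-contiguous field-of-view case, a minimal re-identification sketch is shown below. The appearance features are assumed to come from some feature extractor (for example, an appearance-embedding network), which is out of scope here; cosine similarity and the 0.8 threshold are illustrative choices.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two appearance feature vectors."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def reidentify(stored_feature, candidate_features, threshold=0.8):
    """Match the recorded appearance feature of a user who left the field of
    view against features of people entering a camera's view; return the
    index of the best match above the threshold, or None."""
    best, best_s = None, threshold
    for i, f in enumerate(candidate_features):
        s = cosine_similarity(stored_feature, f)
        if s > best_s:
            best, best_s = i, s
    return best
```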
In one embodiment, the user's spatial position information may be determined from the imaging position using a pre-established mapping between one or more spatial positions in the scene (not necessarily all of them) and one or more imaging positions in the image captured by the camera. In one embodiment, the user's spatial position information may be determined based on the pose information of the camera and the imaging position. For example, when a depth camera or a multi-lens camera is used, the direction of the user relative to the camera can be determined from the imaging position and the distance of the user from the camera can be determined from the depth information, giving the user's position relative to the camera; the user's spatial position information can then be determined further based on the camera's pose information. In one embodiment, the distance of the user from the camera may be estimated from the user's imaging size, and the user's spatial position information determined based on the camera's pose information and the imaging position. In one embodiment, a lidar or similar device mounted on the camera may be used to determine the distance of the user from the camera, and the user's spatial position information determined based on the camera's pose information and the imaging position.
In one embodiment, if the fields of view of multiple cameras cover the user simultaneously, those cameras may be used to jointly determine the user's spatial position information. In one embodiment, the user's spatial position information may be determined based on the camera's pose information, the imaging position, and optionally other information (for example, coordinate information of the ground in the scene).
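The depth-camera variant above can be sketched as a back-projection: the pixel fixes the direction relative to the camera, the depth fixes the distance, and the camera's pose lifts the result into scene coordinates. Pinhole intrinsics K and a world-from-camera pose (R_wc, t_wc) are assumed conventions.

```python
import numpy as np

def pixel_to_world(K, R_wc, t_wc, uv, depth):
    """Back-project an imaging position (u, v) with a measured depth into
    scene coordinates, given camera intrinsics K and a world-from-camera
    pose (R_wc, t_wc)."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam = depth * ray                      # position relative to the camera
    return R_wc @ p_cam + np.asarray(t_wc, float)
```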
In one embodiment, the user's pose information may also be determined based on the camera's tracking of the user.
Step 604: Provide information to the user through the user's glasses based on the user's spatial position information.
Once the user's spatial position information is known, the user can be provided with whatever information is needed, for example navigation information, directions, explanatory information, advertising information, or other information related to location-based services. In one embodiment, this information may be presented visually, audibly, or otherwise. In one embodiment, a virtual object may be superimposed on the display medium of the glasses; the virtual object may be, for example, an icon (such as a navigation icon), a picture, or text.
In one embodiment, the glasses themselves may be capable of accessing the network directly, so that the glasses receive the indication information directly from the server. In another embodiment, the glasses may lack direct network access and instead access the network indirectly through a connection to, for example, the user's mobile phone; in this case, the glasses receive the indication information from the server via an intermediate device such as the phone.
In one embodiment, the information may be provided to the user further in combination with the pose information of the glasses or of their user. The pose information of the glasses or of the user may be determined by the glasses, or the user's pose information may be determined from user images captured by the camera; the pose information may include the user's orientation. In one embodiment, the pose information of the glasses may be obtained through their built-in sensors, for example by tracking from an initial pose or directly from the built-in sensors of the glasses (for example, a gravity sensor, a magnetic sensor, or an orientation sensor). The server may receive the pose information directly from the glasses or via an intermediate device such as a mobile phone.
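The "tracking from an initial pose" variant can be sketched as simple dead reckoning over one axis: a marker scan fixes the initial heading, and the gyroscope's yaw rate is integrated afterwards. Drift correction, which a real implementation would need, is omitted in this illustration.

```python
def integrate_heading(initial_heading_deg, yaw_rates_dps, dt):
    """Dead-reckon the device heading from a known initial orientation
    (e.g., obtained from a visual-marker scan) by integrating sampled
    gyroscope yaw rates (degrees per second) at interval dt (seconds)."""
    heading = initial_heading_deg
    for rate in yaw_rates_dps:
        heading = (heading + rate * dt) % 360.0
    return heading
```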
The steps of the method shown in FIG. 6 may be implemented by the server in the system shown in FIG. 3, but it will be understood that one or more of these steps may also be implemented by other apparatuses.
FIG. 7 shows a system according to one embodiment for providing information to a user in a scene through glasses; the system includes a visual marker 701, a camera 702, and a server (not shown in FIG. 7). A user 703 is in the scene, carrying glasses 704 and a mobile phone 705. The phone 705 can recognize the visual marker 701 through its image acquisition device; accordingly, the glasses 704 need not have an image acquisition device, or may have one that is not capable of recognizing the visual marker 701.
FIG. 8 shows a method according to one embodiment for providing information to a user in a scene through glasses; the method may be implemented using the system shown in FIG. 7. The method includes the following steps (some are similar to the steps in FIG. 6 and are not repeated here, but it will be understood that what is described for each step in FIG. 6 also applies to the corresponding step in FIG. 8):
Step 801: Receive information sent by the user's mobile phone, the information including spatial position information of the phone.
The user may use the phone to determine its spatial position information by scanning a visual marker deployed in the scene. In one embodiment, the phone's pose information may also be determined by scanning the visual marker and sent to the server.
Step 802: Identify the user of the phone in the image captured by the camera based on the spatial position information of the phone.
After the user is identified, the user's identification information can be associated with the user so that the identification information can be used to provide services to the user.
Step 803: Track the user through the camera and update the user's spatial position information.
In one embodiment, the user's pose information may also be determined.
Step 804: Provide information to the user through the user's glasses based on the user's spatial position information.
In one embodiment, the glasses themselves may be capable of accessing the network directly, so that the glasses receive the indication information directly from the server. In another embodiment, the glasses may lack direct network access and instead access the network indirectly through a connection to, for example, the user's mobile phone; in this case, the glasses receive the indication information from the server via an intermediate device such as the phone. For example, the server may first send first information to the user's phone, and the phone may then send second information to the glasses based on the first information (the second information may be the same as or different from the first information), so as to provide the user with a location-based service through the glasses.
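A minimal sketch of the phone acting as the intermediary in this indirect path (the message fields and the adaptation step are illustrative assumptions):

```python
def phone_relay(first_info, adapt=None):
    """The phone receives first information from the server and forwards
    second information to the glasses; the second information may be
    identical to the first, or adapted for the glasses (e.g., reformatted
    for their display)."""
    second_info = dict(first_info)   # copy so the server's message is untouched
    if adapt is not None:
        second_info = adapt(second_info)
    return second_info
```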
In one embodiment, the information may be provided to the user further in combination with the pose information of the glasses or of their user.
In one embodiment, the user may use only a mobile phone and no glasses. In that case, in step 804 above, information may be provided to the user through the user's phone based on the user's spatial position information. In one embodiment, the information may be provided further in combination with the pose information of the phone or of its user. The phone may determine its own pose information, or the user's pose information may be determined from user images captured by the camera. In one embodiment, the phone's pose information may be obtained through its built-in sensors.
In this application, a device used to determine its spatial position information by scanning a visual marker may be called a "position-obtaining device", and a device used to provide information to the user may be called an "information-receiving device". As can be understood from the description above, the position-obtaining device and the information-receiving device may be the same device, for example the user's phone or the user's glasses; they may also be different devices, for example a phone and glasses, respectively.
FIG. 9 shows a user interaction system according to one embodiment; the system includes a visual marker 901, a camera 902, and a server (not shown in FIG. 9). The camera and the visual marker are each deployed in a real scene with a particular position and attitude (hereinafter collectively "pose"). The scene also contains a first user 903 and a second user 905, carrying a first device 904 and a second device 906, respectively. The first device 904 and the second device 906 have image acquisition devices and can recognize the visual marker 901 through them. The first device 904 and the second device 906 may be, for example, phones, glasses, or other such devices.
FIG. 10 shows a user interaction method according to one embodiment, which may be implemented using the above system and may include the following steps:
Step 1001: Receive information sent by the first device of the first user, the information including spatial position information of the first device and identification information of the first user or the first device.
In one embodiment, the first user may use the first device to determine the first device's spatial position information by scanning a visual marker deployed in the scene. In one embodiment, the first device may also be used to determine, by scanning the visual marker, its pose information relative to the visual marker or its pose information in the scene, and this pose information may be sent to the server.
Step 1002: Identify the first user in the image captured by the camera based on the spatial position information of the first device.
Step 1003: Associate the identification information of the first user or the first device with the first user in the image captured by the camera.
Step 1004: Track the first user through the camera and update the first user's spatial position information.
In one embodiment, the pose information of the user or the device may also be determined based on the camera's tracking of the user or the device.
Step 1005: Set related information of a first virtual object associated with the first user, the related information including content information and spatial position information, wherein the spatial position information of the first virtual object is set according to the spatial position information of the first user.
For example, the spatial position of the first virtual object may be set a predetermined distance above the first user. The content information of the first virtual object is information describing the content of the virtual object; it may include, for example, pictures, text, numbers, icons, animations, videos, or three-dimensional models contained in the virtual object, and may also include the virtual object's shape, color, size, or pose information. In one embodiment, the content information of the first virtual object may be set according to information from the first user or the first device, identified by the identification information of the first user or the first device. In one embodiment, the content information of the first virtual object may be, for example, the first user's occupation, identity, gender, age, name, or nickname.
The spatial position information of the first virtual object may change as the first user's position changes, and the content information of the virtual object may be updated (for example, its text updated) according to new information received from the first user or the first device (for example, a new comment by the user).
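A sketch of such a user-anchored virtual object record on the server side (the field names and the 0.3 m offset are illustrative assumptions):

```python
def make_virtual_object(user_position, content, height_offset=0.3):
    """Create a virtual object anchored a predetermined distance above
    the user's spatial position (x, y, z), z being the vertical axis."""
    x, y, z = user_position
    return {"content": content, "position": (x, y, z + height_offset)}

def on_user_moved(obj, new_user_position, height_offset=0.3):
    """Re-anchor the virtual object when the tracked user's position is
    updated, so the object follows the user."""
    x, y, z = new_user_position
    obj["position"] = (x, y, z + height_offset)
    return obj
```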
In one embodiment, pose information of the virtual object may also be set; it may be set based on the pose information of the associated device or user, but may also be set in other ways.
Step 1006: Send the related information of the first virtual object to the second device of the second user.
The related information of the first virtual object can be used by the second device to present the first virtual object on its display medium (for example, in an augmented reality or mixed reality manner) based on the second device's position information and/or pose information.
The position information and pose information of the second device may be determined in various feasible ways. In one embodiment, the second device may determine its position information and/or pose information by scanning a visual marker. In one embodiment, the position information and/or pose information of the second device may be determined from the camera's tracking of the second device or its user. In one embodiment, the second device may also use its various built-in sensors to determine its position information and/or pose information. In one embodiment, the second device may use point cloud information of the scene to determine its position information and/or pose information.
In one embodiment, once the spatial position information of the first virtual object and the position and pose information of the second device have been obtained, the first virtual object can be superimposed at the appropriate location in the real scene presented on the display medium of the second device. Where the first virtual object has pose information, the pose of the superimposed first virtual object can be determined as well.
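Whether the second device should draw the first virtual object at all, and roughly where, can be decided in a simplified planar sketch from the relative bearing between the device's facing direction and the object. A full AR renderer would use the complete 6-DoF pose and camera projection; the 60° field of view here is an assumption.

```python
import math

def relative_bearing(device_pos, device_yaw_deg, obj_pos):
    """Bearing of the object relative to the device's facing direction,
    in (-180, 180] degrees; 0 means straight ahead. Positions are (x, y)
    in the scene's ground plane, yaw 0 faces the +x axis."""
    dx = obj_pos[0] - device_pos[0]
    dy = obj_pos[1] - device_pos[1]
    bearing = math.degrees(math.atan2(dy, dx))
    return (bearing - device_yaw_deg + 180.0) % 360.0 - 180.0

def in_view(device_pos, device_yaw_deg, obj_pos, fov_deg=60.0):
    """True if the object falls inside the device's horizontal field of view."""
    return abs(relative_bearing(device_pos, device_yaw_deg, obj_pos)) <= fov_deg / 2.0
```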
In one embodiment, after the first virtual object is superimposed, the user of the second device can perform various interactive operations on the first virtual object.
In one embodiment, a second virtual object may likewise be set for the second user of the second device, and the content information and spatial position information of the second virtual object may be sent to the first device of the first user or to another device (the first device and the other device may be, for example, a phone and glasses, respectively), wherein the content information and spatial position information of the second virtual object can be used by the first device or the other device to present the second virtual object on its display medium based on that device's position information and/or pose information.
The steps of the method shown in FIG. 10 may be implemented by the server in the system shown in FIG. 9, but it will be understood that one or more of these steps may also be implemented by other apparatuses.
FIG. 11 shows, according to one embodiment, a first user as observed by a second user through his or her device (for example, glasses or a phone), together with a virtual object associated with the first user. The virtual object may be, for example, an icon containing text, here "Airport pick-up, XXX of XX Company". The spatial position of the virtual object is associated with the spatial position of the first user and can move as the first user moves.
Although some of the embodiments above are described with two users as an example, this is not a limitation; the scheme of this application applies equally to more users. FIG. 12 shows an actual image observed by one user through his or her phone screen according to one embodiment; the image includes multiple users, each with an associated virtual object.
In the embodiments above, a camera is described as an example of the sensor, but it will be understood that the embodiments herein apply equally to any other sensor capable of sensing or determining a target's position, for example a lidar, a millimeter-wave radar, or a wireless signal transceiver.
It will be appreciated that the devices involved in the embodiments of this application may be any devices carried or controlled by a user (for example, a phone, a tablet, smart glasses, AR glasses, a smart helmet, a smart watch, or a vehicle), and may also be various machines capable of moving autonomously, for example drones, driverless cars, or robots, with an image acquisition device installed on the device. It should be noted that the glasses in this application may be AR glasses, smart glasses, or any other glasses capable of presenting information to the user. The glasses in this application also include glasses formed by adding components or attachments to ordinary optical glasses, for example glasses formed by adding a display device to ordinary optical glasses.
In one embodiment of the invention, the invention may be implemented in the form of a computer program. The computer program may be stored in various storage media (for example, a hard disk, an optical disc, or flash memory) and, when executed by a processor, can be used to implement the method of the invention.
In another embodiment of the invention, the invention may be implemented in the form of an electronic device. The electronic device includes a processor and a memory; the memory stores a computer program that, when executed by the processor, can be used to implement the method of the invention.
References herein to "various embodiments", "some embodiments", "one embodiment", or "an embodiment" mean that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments", "in some embodiments", "in one embodiment", or "in an embodiment" in various places throughout this document do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or properties may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or property shown or described in connection with one embodiment may be combined, in whole or in part, with features, structures, or properties of one or more other embodiments without limitation, as long as the combination is not illogical or inoperative. Expressions herein such as "according to A", "based on A", "by A", or "using A" are non-exclusive; that is, "according to A" may cover "according to A only" as well as "according to A and B", unless it is specifically stated to mean "according to A only".
In this application, for clarity, some illustrative operation steps are described in a certain order, but those skilled in the art will understand that each of these steps is not essential, and some of them may be omitted or replaced by other steps. Nor must these steps be performed sequentially in the manner shown; rather, some of them may be performed in a different order, or in parallel, as actually needed, as long as the new manner of execution is not illogical or inoperative.
Having thus described several aspects of at least one embodiment of the invention, it will be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the invention. Although the invention has been described through some embodiments, the invention is not limited to the embodiments described herein and also covers various changes and variations made without departing from the scope of the invention.

Claims (15)

  1. A method for obtaining identification information of a device or its user in a scene, one or more sensors and one or more visual markers being deployed in the scene, the sensors being usable to sense or determine position information of a device or user in the scene, the method comprising:
    receiving information sent by a device, the information including identification information of the device or its user and spatial position information of the device, wherein the device determines its spatial position information by scanning a visual marker;
    identifying, based on the spatial position information of the device, the device or its user within a sensing range of a sensor; and
    associating the identification information of the device or its user with the device or its user within the sensing range of the sensor, so as to provide a service to the device or its user.
  2. The method according to claim 1, further comprising:
    tracking the device or its user through the sensor and updating the spatial position information of the device or its user; and
    providing a service to the device or its user based on the spatial position information of the device or its user.
  3. The method according to claim 1, further comprising:
    tracking the device or its user through the sensor and updating the spatial position information of the device or its user; and
    setting related information of a virtual object associated with the device or its user, the related information including content information and spatial position information, wherein the spatial position information of the virtual object is related to the spatial position information of the device or its user.
  4. The method according to claim 3, further comprising:
    sending the related information of the virtual object to another device, wherein the related information of the virtual object can be used by the other device to present the virtual object on its display medium based on the other device's position information and/or pose information.
  5. The method according to claim 1, wherein the sensor comprises one or more of:
    a camera;
    a radar;
    a wireless signal transceiver.
  6. The method according to claim 1, further comprising: providing a service to the device or its user based on position information and/or pose information of the device or its user.
  7. The method according to claim 6, further comprising: sending information related to a virtual object to the device, the information including spatial position information of the virtual object, wherein the virtual object can be presented on a display medium of the device.
  8. The method according to claim 1, further comprising:
    tracking the device or its user through the sensor to obtain position information and/or pose information of the device or its user; or
    obtaining, through the device, its position information and/or pose information.
  9. The method of claim 1, wherein the sensor comprises a camera, and wherein the identifying, based on the spatial location information of the device, the device or its user within the sensing range of the sensor comprises:
    determining an imaging position of the device or its user in an image captured by the camera based on the spatial location information of the device; and
    identifying the device or its user in the image captured by the camera according to the imaging position.
  10. The method of claim 9, wherein the determining the imaging position of the device or its user in the image captured by the camera based on the spatial location information of the device comprises:
    determining the imaging position of the device or its user in the image captured by the camera based on a pre-established mapping relationship between one or more spatial positions in the scene and one or more imaging positions in the image captured by the camera, together with the spatial location information of the device; or
    determining the imaging position of the device or its user in the image captured by the camera based on the spatial location information of the device and the pose information of the camera.
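The second alternative in claim 10 amounts to projecting a known 3D position into the camera image. A minimal sketch using a standard pinhole camera model follows; the intrinsic matrix and identity pose below are hypothetical illustration values, not taken from the application.

```python
def project_to_image(p_world, K, R, t):
    """Project a 3D world point into pixel coordinates with a pinhole
    model: transform into the camera frame via (R, t), then apply the
    intrinsic matrix K and normalize by depth."""
    # world -> camera coordinates: p_cam = R @ p_world + t
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i]
             for i in range(3)]
    if p_cam[2] <= 0:
        return None  # point lies behind the camera, no imaging position
    # camera -> homogeneous pixel coordinates via K, then divide by depth
    u = K[0][0] * p_cam[0] + K[0][2] * p_cam[2]
    v = K[1][1] * p_cam[1] + K[1][2] * p_cam[2]
    return (u / p_cam[2], v / p_cam[2])

# Hypothetical intrinsics (fx, fy, cx, cy) and an identity camera pose.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

uv = project_to_image([0.5, 0.25, 2.0], K, R, t)  # → (520.0, 340.0)
```

Once the imaging position `uv` is known, the device or its user can be identified as the object detected nearest to that pixel in the captured image.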
  11. The method of claim 1, wherein the identifying, based on the spatial location information of the device, the device or its user within the sensing range of the sensor comprises:
    comparing the spatial location information of the device with the spatial location information of one or more devices or users determined according to a sensing result of the sensor, so as to identify the device or its user within the sensing range of the sensor.
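The comparison in claim 11 can be sketched as a nearest-neighbor match between the device-reported position and the candidate positions the sensor detected; the distance threshold and coordinate values below are hypothetical illustration values.

```python
import math

def match_device(reported_pos, detected, max_dist=0.5):
    """Return the index of the sensor-detected candidate closest to the
    device-reported position, or None if no candidate lies within
    max_dist (in the same length units as the coordinates)."""
    best_i, best_d = None, max_dist
    for i, p in enumerate(detected):
        d = math.dist(reported_pos, p)  # Euclidean distance (Python 3.8+)
        if d <= best_d:
            best_i, best_d = i, d
    return best_i

# Hypothetical candidate positions produced by the sensor.
candidates = [(1.0, 2.0, 0.0), (4.1, 0.9, 0.0), (7.5, 3.2, 0.0)]
idx = match_device((4.0, 1.0, 0.0), candidates)  # → 1
```

A `None` result would mean the device that reported its position is not among the objects the sensor currently perceives.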
  12. The method of claim 1, wherein the device determining its spatial location information by scanning the visual marker comprises:
    capturing an image of the visual marker using the device;
    determining identification information of the visual marker and a position of the device relative to the visual marker by analyzing the image;
    obtaining position and attitude information of the visual marker in space through the identification information of the visual marker; and
    determining the spatial location information of the device based on the position and attitude information of the visual marker in space and the position of the device relative to the visual marker.
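The last step of claim 12 is a coordinate transform: the device position expressed in the marker's frame is mapped into the world frame using the marker's known position and attitude. A minimal sketch, restricting the marker's attitude to a yaw angle about the vertical axis for brevity; the marker pose values are hypothetical.

```python
import math

def device_world_position(marker_pos, marker_yaw, p_rel):
    """Transform a device position given in the visual marker's frame
    into world coordinates: p_world = R(yaw) @ p_rel + marker_pos,
    where R(yaw) rotates about the vertical (z) axis."""
    c, s = math.cos(marker_yaw), math.sin(marker_yaw)
    x = c * p_rel[0] - s * p_rel[1] + marker_pos[0]
    y = s * p_rel[0] + c * p_rel[1] + marker_pos[1]
    z = p_rel[2] + marker_pos[2]
    return (x, y, z)

# Hypothetical marker pose: at (10, 5, 2) in the world, yawed 90 degrees.
pos = device_world_position((10.0, 5.0, 2.0), math.pi / 2, (1.0, 0.0, -0.5))
# pos is approximately (10.0, 6.0, 1.5)
```

In a full implementation the relative position would come from marker pose estimation on the captured image (e.g. a PnP solve against the marker's known geometry), and the rotation would be a full 3D attitude rather than a single yaw angle.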
  13. A system for obtaining identification information of a device or a user thereof in a scene, the system comprising:
    one or more sensors deployed in the scene, the sensors being usable to sense or determine position information of devices or users in the scene;
    one or more visual markers deployed in the scene; and
    a server configured to implement the method of any one of claims 1-12.
  14. The system of claim 13, wherein the sensor comprises one or more of the following:
    a camera;
    a radar;
    a wireless signal transceiver.
  15. A storage medium storing a computer program which, when executed by a processor, can be used to implement the method of any one of claims 1-12.
PCT/CN2021/129727 2020-12-08 2021-11-10 Method and system for obtaining identification information of device or user thereof in scenario WO2022121606A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN202011442020.6 2020-12-08
CN202011440875.5 2020-12-08
CN202011440905.2A CN112528699B (en) 2020-12-08 2020-12-08 Method and system for obtaining identification information of devices or users thereof in a scene
CN202011440875.5A CN112581630A (en) 2020-12-08 2020-12-08 User interaction method and system
CN202011442020.6A CN114663491A (en) 2020-12-08 2020-12-08 Method and system for providing information to a user in a scene
CN202011440905.2 2020-12-08

Publications (1)

Publication Number Publication Date
WO2022121606A1

Family

ID=81973104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129727 WO2022121606A1 (en) 2020-12-08 2021-11-10 Method and system for obtaining identification information of device or user thereof in scenario

Country Status (2)

Country Link
TW (1) TWI800113B (en)
WO (1) WO2022121606A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012182685A (en) * 2011-03-01 2012-09-20 Wham Net Service Corp Mountain entering and leaving notification system
CN103646565A (en) * 2013-12-24 2014-03-19 苏州众天力信息科技有限公司 WeChat based vehicle searching two-dimensional code position information storage and search method
CN111256701A (en) * 2020-04-26 2020-06-09 北京外号信息技术有限公司 Equipment positioning method and system
CN111814752A (en) * 2020-08-14 2020-10-23 上海木木聚枞机器人科技有限公司 Indoor positioning implementation method, server, intelligent mobile device and storage medium
CN112528699A (en) * 2020-12-08 2021-03-19 北京外号信息技术有限公司 Method and system for obtaining identification information of a device or its user in a scene
CN112581630A (en) * 2020-12-08 2021-03-30 北京外号信息技术有限公司 User interaction method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9336508B2 (en) * 2012-11-02 2016-05-10 Patrick Soon-Shiong Virtual planogram management, systems, and methods


Also Published As

Publication number Publication date
TWI800113B (en) 2023-04-21
TW202223749A (en) 2022-06-16

Similar Documents

Publication Publication Date Title
US20210019854A1 (en) Location Signaling with Respect to an Autonomous Vehicle and a Rider
KR102366293B1 (en) System and method for monitoring field based augmented reality using digital twin
CN107782314B (en) Code scanning-based augmented reality technology indoor positioning navigation method
US20180196417A1 (en) Location Signaling with Respect to an Autonomous Vehicle and a Rider
US10354407B2 (en) Camera for locating hidden objects
CN105408938B (en) System for the processing of 2D/3D space characteristics
CN105409212B (en) The electronic equipment with depth sense is caught with multi-view image
KR102289745B1 (en) System and method for real-time monitoring field work
EP3848674B1 (en) Location signaling with respect to an autonomous vehicle and a rider
EP2974509B1 (en) Personal information communicator
JP6896688B2 (en) Position calculation device, position calculation program, position calculation method, and content addition system
US10868977B2 (en) Information processing apparatus, information processing method, and program capable of adaptively displaying a video corresponding to sensed three-dimensional information
TWI750822B (en) Method and system for setting presentable virtual object for target
CN112528699B (en) Method and system for obtaining identification information of devices or users thereof in a scene
WO2022121606A1 (en) Method and system for obtaining identification information of device or user thereof in scenario
CN112788443A (en) Interaction method and system based on optical communication device
CN112581630A (en) User interaction method and system
WO2021057886A1 (en) Navigation method and system based on optical communication apparatus, and device, and medium
CN114663491A (en) Method and system for providing information to a user in a scene
TWI759764B (en) Superimpose virtual object method based on optical communitation device, electric apparatus, and computer readable storage medium
CN114071003B (en) Shooting method and system based on optical communication device
US20220084258A1 (en) Interaction method based on optical communication apparatus, and electronic device
CN114827338A (en) Method and electronic device for presenting virtual objects on a display medium of a device
CN112561953A (en) Method and system for target recognition and tracking in real scenes
CN111752293A (en) Method and electronic device for guiding a machine capable of autonomous movement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21902313

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21902313

Country of ref document: EP

Kind code of ref document: A1