CN112561953A - Method and system for target recognition and tracking in real scenes - Google Patents

Method and system for target recognition and tracking in real scenes Download PDF

Info

Publication number
CN112561953A
CN112561953A (application number CN201910917491.9A)
Authority
CN
China
Prior art keywords
information
camera
target
position information
virtual object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910917491.9A
Other languages
Chinese (zh)
Inventor
李江亮
方俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yimu Technology Co ltd
Original Assignee
Beijing Whyhow Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Whyhow Information Technology Co Ltd
Priority to CN201910917491.9A
Publication of CN112561953A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/292 - Multi-camera tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method and system for target recognition and tracking in a real scene having a camera and an optical communication device mounted therein with a relative pose between them. The method comprises: tracking one or more targets in the real scene by the camera; obtaining position information of the one or more targets according to the tracking result of the camera; obtaining, by a server, information from a first device and determining attribute information of the first device and position information of the first device according to the information from the first device; comparing, by the server, the position information of the first device with the position information of the one or more targets to determine a target matched with the first device; and setting related information of the target matched with the first device according to the attribute information of the first device.

Description

Method and system for target recognition and tracking in real scenes
Technical Field
The invention belongs to the technical field of augmented reality, and particularly relates to a method and a system for identifying and tracking a target in a real scene.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Common means of target recognition and tracking currently include radar and computer vision. Radar identification uses target feature information in the echo, such as amplitude, phase, spectrum and polarization, estimates physical parameters such as the size and shape of the target through various multi-dimensional mathematical transformations, and then classifies and identifies the target. Computer-vision approaches use artificial-intelligence techniques such as deep learning to train a computer on large amounts of data, thereby establishing a high-precision target detection and recognition system.
Augmented reality (AR), also known as mixed reality technology, superimposes virtual objects into a real scene via computer technology so that the real scene and the virtual objects can be rendered in real time into the same picture or space, thereby enhancing the user's perception of the real world. Because augmented reality can provide enhanced display output for a real environment, it can be used to annotate targets in a real scene.
However, radar-based target identification and tracking is costly, while target detection and identification through machine learning requires a large amount of training and has high time complexity, making it unsuitable for everyday target identification and detection. Furthermore, in augmented reality, a superimposed virtual object cannot move as the corresponding real object moves in the real scene.
Therefore, in order to solve the above problems, a fast, accurate, and low-cost target identification and tracking method and system are needed.
Disclosure of Invention
The present invention provides a method and a system for identifying and tracking targets in a real scene. Position information of one or more targets is obtained through a camera and compared with the position information of a device that has scanned an optical communication apparatus, the target matching the device is determined, and the related information of that target is set according to the attribute information of the device, thereby realizing target identification. Further, identification information associated with the target may be set based on the related information of the target and presented near the target in the images acquired by the camera; a virtual object associated with the target may also be set based on the position and related information of the target, the virtual object can be accurately presented on a display medium, and the presented virtual object can follow the target in the real scene, thereby realizing target tracking.
One aspect of the present invention relates to a method for target recognition and tracking in a real scene, wherein a camera and an optical communication device are installed in the real scene with a relative pose between them, the method comprising: tracking one or more targets in the real scene by the camera; obtaining position information of the one or more targets according to the tracking result of the camera; obtaining, by a server, information from a first device and determining attribute information of the first device and position information of the first device according to the information from the first device, wherein the position information of the first device is obtained at least partially based on the position information of the first device relative to the optical communication device; comparing, by the server, the position information of the first device with the position information of the one or more targets to determine a target matched with the first device; and setting related information of the target matched with the first device according to the attribute information of the first device.
Optionally, the obtaining the position information of the one or more targets according to the tracking result of the camera includes: obtaining position information of the one or more targets relative to the camera according to the tracking result of the camera, and determining the position information of the one or more targets relative to the optical communication device according to the position information of the one or more targets relative to the camera and the relative pose information between the camera and the optical communication device.
Optionally, the obtaining the position information of the one or more targets according to the tracking result of the camera includes: obtaining position information of the one or more targets relative to the camera according to the tracking result of the camera, and determining the position information of the one or more targets in the real scene according to the position information of the one or more targets relative to the camera and the pose information of the camera in the real scene.
Optionally, the attribute information of the first device includes information related to a user of the first device.
Optionally, the first device determines its position information relative to the optical communication apparatus at least in part by capturing an image comprising the optical communication apparatus and analyzing the image.
Optionally, the method further includes: setting identification information associated with the target based on the related information of the target, and presenting the identification information in the vicinity of the target in the image acquired by the camera.
Optionally, the method further includes: setting a virtual object associated with the target based on the related information of the target, wherein the spatial position information of the virtual object is determined according to the position information of the target; transmitting information related to the virtual object to a second device, wherein the information related to the virtual object is usable by the second device to render the virtual object on its display medium based on its position information and pose information.
Optionally, the spatial position information of the virtual object is updated with the change of the position information of the target, and is sent to the second device.
Optionally, the method further includes: acquiring attitude information of the target according to a tracking result of the camera; and setting the attitude information of the virtual object according to the attitude information of the target.
Optionally, the pose of the virtual object is adjustable according to a change in position and/or pose of the second device relative to the virtual object.
Optionally, a certain orientation of the virtual object is always towards the second device.
Another aspect of the invention relates to a system for target recognition and tracking in real scenes, comprising: one or more cameras installed in the real scene for tracking one or more targets in the real scene; one or more optical communication devices installed in the real scene, wherein the optical communication devices and the camera have relative poses; and a server for implementing any of the above methods.
A further aspect of the invention relates to a storage medium in which a computer program is stored which, when executed by a processor, can be used to carry out the above-mentioned method.
Yet another aspect of the invention relates to an electronic device comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, can be used to carry out the method described above.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary optical label;
FIG. 2 illustrates a real-world scene for target recognition and tracking according to one embodiment;
FIG. 3 illustrates a method for target recognition and tracking in a real-world scene, according to one embodiment;
FIG. 4 shows a schematic diagram of a method for target recognition and tracking in a real scene, according to one embodiment;
FIG. 5 illustrates a method for target recognition and tracking in a real scene, according to one embodiment;
FIG. 6 shows a schematic diagram of a method for target recognition and tracking in a real scene according to one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Optical communication devices are also referred to as optical labels, and the two terms are used interchangeably herein. An optical label can transmit information through different light-emitting modes, and has the advantages of a long identification distance and relaxed requirements on visible-light conditions; the information transmitted by the optical label can change over time, so it can provide a large information capacity and flexible configuration capability. Compared with a traditional two-dimensional code, the optical label has a longer identification distance and stronger information interaction capability, thereby providing great convenience for users.
An optical label may typically include a controller and at least one light source, and the controller may drive the light source in different driving modes to convey different information to the outside. Fig. 1 shows an exemplary optical label 100 comprising three light sources (a first light source 101, a second light source 102, and a third light source 103). The optical label 100 further comprises a controller (not shown in Fig. 1) for selecting a respective driving mode for each light source according to the information to be conveyed. For example, in different driving modes, the controller may control the manner in which a light source emits light using different driving signals, so that when the optical label 100 is photographed by a device with imaging capability, the image of the light source may take on different appearances (e.g., different colors, patterns, or brightness). By analyzing the imaging of the light sources in the optical label 100, the driving mode of each light source at that moment can be determined, and thus the information transmitted by the optical label 100 at that moment can be recovered.
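The following is a minimal, hypothetical Python sketch of how a controller could map an information bit stream to per-source driving modes; the patent does not specify a concrete encoding, so the function name, frequencies, and frame-based scheme are illustrative assumptions only.

```python
# Hypothetical sketch only: the patent does not define a concrete encoding scheme.
def driving_sequence(bits, high_hz=8000, low_hz=2000):
    """Map each information bit to an assumed driving mode for a light source.

    Here a '1' bit is sent by flickering the source at one frequency and a '0'
    bit at another, so that the imaged appearance of the source differs between
    the two modes; real encodings could instead vary color, pattern or brightness.
    """
    return [{"bit": b, "flicker_hz": high_hz if b else low_hz} for b in bits]

# Example: transmit the bits of a (hypothetical) label ID, one bit per frame interval
print(driving_sequence([1, 0, 1, 1]))
```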
In order to provide corresponding services to users based on optical labels, each optical label may be assigned identification information (ID) that is used by the manufacturer, manager, user, or the like of the optical label to uniquely identify the optical label. In general, the controller in the optical label may drive the light source to transmit the identification information outwards, and a user may use a device to capture images of the optical label to obtain the identification information it transmits, so that a corresponding service may be accessed based on the identification information, for example, accessing a web page associated with the identification information of the optical label, acquiring other information associated with the identification information (e.g., position information of the optical label corresponding to the identification information), and so on. The devices referred to herein may be, for example, devices that a user carries or controls (e.g., a cell phone with a camera, a tablet, smart glasses, AR glasses, a smart helmet, a smart watch, etc.), or machines capable of autonomous movement (e.g., a drone, an unmanned automobile, a robot, etc.). A device can capture images of an optical label through its camera to obtain an image containing the optical label, and can identify the information transmitted by the optical label by analyzing the imaging of the optical label (or of each light source in the optical label) in the image.
Identification information (ID) and other information of each optical label, such as service information related to the optical label and description information or attributes related to the optical label (e.g., position information, model information, physical size information, physical shape information, attitude or orientation information of the optical label), may be maintained on a server. The optical label may also have uniform or default physical size information, physical shape information, and so on. A device may use the identification information of the identified optical label to query the server for further information related to that optical label. The position information of the optical label may refer to the actual position of the optical label in the physical world, which may be indicated by geographical coordinate information. The server may be a software program running on a computing device, or a cluster of computing devices. The optical label may be offline, i.e., the optical label does not need to communicate with the server. Of course, it will be appreciated that an online optical label capable of communicating with the server is also possible.
FIG. 2 illustrates a real-world scene for target recognition and tracking, including a system for target recognition and tracking, according to one embodiment. The system includes a camera, an optical label, and a server (not shown in FIG. 2), wherein the camera and the optical label are each installed in the real scene at a specific position and attitude (hereinafter collectively referred to as a "pose").
In one embodiment, the server may obtain the respective pose information of the camera and the optical label, and may derive the relative pose information between the camera and the optical label from their respective poses. In one embodiment, the server may also directly obtain the relative pose information between the camera and the optical label. In this manner, the server may obtain a transformation matrix between the camera coordinate system and the optical label coordinate system, which may include, for example, a rotation matrix R and a displacement vector t between the two coordinate systems. Coordinates in one coordinate system can be converted to coordinates in the other coordinate system by the transformation matrix between the camera coordinate system and the optical label coordinate system. In one embodiment, when the camera and the optical label are installed, their pose information can be manually calibrated and stored in the server. The camera may be a camera mounted in a fixed position with a fixed orientation, but it is understood that the camera may also be a movable camera (e.g., with a changeable position or an adjustable direction), as long as its current pose information can be determined. The current pose information of the camera may be set by the server, which controls the movement of the camera based on that pose information; alternatively, the movement of the camera may be controlled by the camera itself or by another device, with the current pose information of the camera being sent to the server. In some embodiments, the system may include more than one camera and more than one optical label.
In one embodiment, a scene coordinate system (which may also be referred to as a real-world coordinate system) may be established for the real scene, and a transformation matrix between the camera coordinate system and the scene coordinate system may be determined based on pose information of the camera in the real scene, and a transformation matrix between the optical tag coordinate system and the scene coordinate system may be determined based on pose information of the optical tag in the real scene. In this case, the coordinates in the camera coordinate system or the optical tag coordinate system may be converted to coordinates in the scene coordinate system without transformation between the camera coordinate system and the optical tag coordinate system, but it will be appreciated that the relative pose information or transformation matrix between the camera and the optical tag can still be known.
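As a rough illustration of the coordinate conversions described above, the following Python sketch composes the poses into homogeneous transformation matrices; the pose values and variable names are placeholders, not data from the patent.

```python
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 transform T such that p_outer = T @ p_inner (homogeneous coordinates)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative poses of the camera and the optical label in the scene coordinate system
R_cam, t_cam = np.eye(3), np.array([1.0, 3.0, 0.0])
R_label, t_label = np.eye(3), np.array([4.0, 2.5, 0.0])

T_scene_from_camera = to_homogeneous(R_cam, t_cam)
T_scene_from_label = to_homogeneous(R_label, t_label)

# Relative pose (transformation matrix) between the camera and label coordinate systems
T_label_from_camera = np.linalg.inv(T_scene_from_label) @ T_scene_from_camera

# A target position expressed in the camera coordinate system (homogeneous coordinates)
p_camera = np.array([0.4, -0.1, 3.2, 1.0])
p_label = T_label_from_camera @ p_camera   # same point in the optical label coordinate system
p_scene = T_scene_from_camera @ p_camera   # same point in the scene coordinate system
print(p_label[:3], p_scene[:3])
```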
The camera is used to track targets in the real scene; a target may be stationary or moving and may be, for example, a person in the scene, a stationary object, a movable object, and so on. For the scenario shown in FIG. 2, the system may track the positions of multiple targets therein via the camera. The camera may be, for example, a monocular camera, a binocular camera, or another form of camera. The position of a person or object in a real scene can be tracked using a camera by various methods known in the art. For example, when a single monocular camera is used, the position information of a target in the scene may be determined in conjunction with scene information (e.g., information about the plane in which the person or object is located). When a binocular camera is used, the position information of the target may be determined from the position of the target in the camera's field of view together with the depth information of the target. When multiple cameras are used, the position information of the target can be determined from the position of the target in the field of view of each camera.
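For the binocular or depth-camera case mentioned above, a common back-projection step recovers the target's 3D position in the camera coordinate system from its pixel location and depth; the sketch below assumes a standard pinhole model, and the intrinsic parameters and measurements are placeholder values.

```python
import numpy as np

# Assumed pinhole intrinsics (placeholder values, not from the patent)
fx, fy, cx, cy = 1400.0, 1400.0, 960.0, 540.0

def pixel_to_camera(u, v, depth):
    """Back-project a pixel (u, v) with known depth (metres) into camera coordinates."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.array([x, y, depth])

# Illustrative detection: target centred at pixel (1020, 610), 5.3 m away
print(pixel_to_camera(1020.0, 610.0, depth=5.3))
```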
In one embodiment, the process of determining location information of objects in a scene may be performed by a camera and corresponding results may be sent to a server. In another embodiment, the location information of the objects in the scene may be determined by the server from images taken by the cameras. The server may convert the determined location information of the object into location information in a light tag coordinate system or a scene coordinate system.
A person in a scene carrying a device (e.g., a cell phone with a camera or AR glasses) may use the device to determine its location information by scanning an optical label and send the location information to a server through the device. Devices in the scene (e.g., drones, unmanned vehicles, robots, etc.) may also use cameras mounted thereon to determine their location information by scanning optical labels and sending the location information to a server. The position information may be, for example, position information of the device relative to the optical label. The device may also send attribute information of the device to the server, which may include, for example, any information about the device or its user.
After the server obtains the position information of the targets and the position information of the device, it can compare the position information of the device with the position information of each target in the scene to determine the target matching the device, and then set the related information of that target according to the attribute information of the device, thereby realizing target identification.
Take an airport hall as an example. A plurality of cameras and optical labels are installed in the airport hall, and the relative pose information between each camera and each optical label is known. Airport staff can observe all passengers and their position information in the airport hall through the cameras, but cannot identify the passengers, that is, cannot obtain their personal information (such as name, nationality, certificate number, flight number, and so on). When a traveler scans an optical label in the airport hall with a device (e.g., a cell phone or smart glasses) carried by the traveler, in exchange for an electronic registration card or for other purposes, the server obtains information from the traveler's device, including the position information of the device and the attribute information of the device (which may include information about the traveler). By comparing the position information of the device with the position information of all passengers collected by the cameras, the specific passenger using the device is determined, and the passenger's identity is labeled according to the traveler-related personal information from the device, thereby realizing identification of the passenger's identity.
Next, take a hospital as the scene. A plurality of cameras and optical labels can be installed in the outpatient or inpatient department of the hospital, with the relative pose information between each camera and each optical label known. Medical personnel can observe the patients and their position information through the cameras, but cannot determine a patient's identity, condition, department or ward, guardian or contact person, and so on. After a patient scans an optical label in the hospital with a carried device to perform registration, payment, or other operations, the server may obtain information from the patient's device, including the position information of the device and the attribute information of the device (which may also include information related to the patient). By comparing the position information of the device with the position information of all patients collected by the cameras, the specific patient using the device is determined, and the patient's identity is labeled according to the patient information from the device, thereby realizing accurate identification of the patient's identity.
FIG. 3 illustrates a method for target recognition and tracking in a real scene, according to one embodiment, the method comprising the steps of:
step 310: one or more targets in a real scene are tracked by a camera.
By continuously acquiring images of a real scene using one or more cameras, targets present in the real scene can be visually tracked. Camera-based visual tracking techniques involve detecting, extracting, recognizing, or tracking a target in a sequence of images to obtain a position, pose, velocity, acceleration, or motion trajectory of the target, etc. Visual tracking techniques are known in the art and will not be described in detail herein.
In one embodiment, the camera may only perform continuous image acquisition when tracking the target and provide the acquired images as a tracking result to the server, after which the images may be analyzed by the server and position information of the respective target determined. In another embodiment, the camera may also perform further processing on the acquired images, such as image processing, object detection, object extraction, object recognition, determining object position or pose, etc., and the corresponding processing results may be provided to the server as tracking results.
Step 320: and obtaining the position information of one or more targets in the scene according to the tracking result of the camera.
The server may receive the tracking result from the camera and obtain position information of the target in the real scene according to the tracking result. In one embodiment, the target position information finally obtained by the server may be position information of the target in a camera coordinate system, position information of the target in an optical label coordinate system, or position information of the target in a scene coordinate system. The server can realize the conversion of the target position between different coordinate systems according to the transformation matrix between the different coordinate systems. For example, the server may first obtain the position information of the target relative to the camera according to the tracking result of the camera (i.e., the position information in the camera coordinate system), then may determine the position information of the target relative to the optical label (i.e., the position information in the optical label coordinate system) according to the position information of the target relative to the camera and the relative pose information between the camera and the optical label, and may also determine the position information of the target in the real scene (i.e., the position information in the scene coordinate system) according to the position information of the target relative to the camera and the pose information of the camera in the real scene.
In one embodiment, in addition to obtaining the position information of the target, the server may obtain attitude information of the target, such as the orientation of a person or an object, and the like, according to the tracking result of the camera.
Step 330: the server obtains information from the first device and determines attribute information of the first device and location information of the first device based on the information from the first device, wherein the location information of the first device is obtained based at least in part on the location information of the first device relative to the optical label.
The information from the first device may include attribute information of the first device, location information of the first device, and other information. The first device may identify information conveyed by the optical tag by scanning the optical tag and access the server based on the information to transmit attribute information and location information of the first device to the server.
In one embodiment, the attribute information of the first device may include information related to the device, such as a device name, an identification number, and the like. In one embodiment, the attribute information of the first device may further include information related to a user using the device, such as name, occupation, identity, gender, age of the owner of the device, account information of a certain application on the device, or information related to a certain operation performed by the user using the device (e.g., changing boarding pass by scanning optical label in airport scenario, or outpatient registration or payment by scanning optical label in hospital scenario, etc.). In one embodiment, the attribute information of the first device further includes user-customized information, such as a nickname, avatar, signature, and other personalized settings of the user.
The position information of the first device may be position information of the first device relative to the optical label, or may be position information of the first device in the real scene. In one embodiment, the server may extract the location information of the first device relative to the optical label from the information from the first device. In one embodiment, the server may obtain the location information of the first device relative to the optical label by analyzing information from the first device. For example, the information from the first device may include an image taken by the first device containing an optical label, and the server may obtain the location information of the first device relative to the optical label by analyzing the image. In one embodiment, the server may obtain the location information of the first device in the real scene based on its location information relative to the optical label transmitted by the first device and the location information of the optical label itself. In one embodiment, the location information transmitted by the first device to the server may be obtained based on its location information relative to the optical label and the location information of the optical label itself, which may be obtained from the server. In one embodiment, the location information transmitted by the first device to the server may also be new location information obtained by the first device after scanning the optical label by measuring or tracking using a built-in acceleration sensor, gyroscope, camera, etc. by methods known in the art (e.g., inertial navigation, visual odometer, SLAM, VSLAM, SFM, etc.).
The device may determine its position information relative to the optical label by capturing an image including the optical label and analyzing the image. For example, the device may determine the relative distance between the optical label and the device from the imaging size of the optical label in the image, optionally together with other information (e.g., the actual physical size of the optical label, the focal length of the device's camera): the larger the imaging, the closer the distance; the smaller the imaging, the farther the distance. The device may obtain the actual physical size information of the optical label from the server using the identification information of the optical label, or the optical label may have a uniform physical size that is stored on the device. The device may determine its orientation information relative to the optical label from the perspective distortion of the optical label's imaging in the image, optionally together with other information (e.g., the imaging position of the optical label). The device may obtain the physical shape information of the optical label from the server using the identification information of the optical label, or the optical label may have a uniform physical shape that is stored on the device. In one embodiment, the device may also directly obtain the relative distance between the optical label and the device through a depth camera, a binocular camera, or the like mounted on the device. The device may also use any other positioning method known in the art to determine its position information relative to the optical label.
In one embodiment, the device may scan the optical label and determine its pose information relative to the optical label based on the imaging of the optical label; for example, when the imaging position or imaging area of the optical label is centered in the imaging field of view of the device, the device may be considered to be currently facing the optical label. The imaging direction of the optical label may further be taken into account when determining the pose of the device. As the pose of the device changes, the imaging position and/or imaging direction of the optical label on the device changes accordingly, so pose information of the device relative to the optical label can be obtained from the imaging of the optical label on the device.
In one embodiment, the position and pose information of the device relative to the optical label may also be determined as follows. Specifically, a coordinate system may be established based on the optical label, which may be referred to as the optical label coordinate system. Some points on the optical label may be taken as spatial points in the optical label coordinate system, and the coordinates of these spatial points in that coordinate system may be determined according to the physical size information and/or physical shape information of the optical label. These points on the optical label may be, for example, the corners of the housing of the optical label, the ends of a light source in the optical label, some identification points in the optical label, and so on. According to the object structure features or geometric features of the optical label, the image points corresponding to these spatial points can be found in the image taken by the device's camera, and the positions of these image points in the image can be determined.
According to the coordinates of each spatial point in the optical label coordinate system and the positions of the corresponding image points in the image, combined with the intrinsic parameter information of the device's camera, the pose information (R, t) of the device's camera in the optical label coordinate system at the time the image was taken can be calculated, where R is a rotation matrix that can be used to represent the attitude information of the device's camera in the optical label coordinate system, and t is a displacement vector that can be used to represent the position information of the device's camera in the optical label coordinate system. Methods of calculating R and t are known in the art; for example, R and t may be calculated using the 3D-2D PnP (Perspective-n-Point) method, which will not be described in detail here so as not to obscure the invention.
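As an illustration of this pose calculation, the sketch below uses OpenCV's PnP solver; the label corner coordinates, detected image points, and camera intrinsics are placeholder values, and the 10 cm square label geometry is an assumption, not a dimension from the patent.

```python
import cv2
import numpy as np

# Spatial points: corners of an assumed 10 cm square optical label, expressed in the
# optical label coordinate system (placeholder geometry).
object_points = np.array([[-0.05,  0.05, 0.0],
                          [ 0.05,  0.05, 0.0],
                          [ 0.05, -0.05, 0.0],
                          [-0.05, -0.05, 0.0]], dtype=np.float64)
# Image points: where those corners were detected in the device camera image (illustrative).
image_points = np.array([[612.0, 388.0],
                         [705.0, 392.0],
                         [701.0, 487.0],
                         [608.0, 483.0]], dtype=np.float64)
# Assumed device camera intrinsics; a real device would use calibrated values.
camera_matrix = np.array([[1400.0,    0.0, 960.0],
                          [   0.0, 1400.0, 540.0],
                          [   0.0,    0.0,   1.0]])
dist_coeffs = np.zeros(5)  # assume no lens distortion

ok, rvec, t = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)                     # rotation matrix R of the pose (R, t)
device_position_in_label = (-R.T @ t).ravel()  # device camera position in the label frame
print(device_position_in_label)
```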
Step 340: and the server compares the position information of the first equipment with the position information of one or more targets in the real scene to determine the target matched with the first equipment.
The server may, for example, select one of the one or more targets collected by the camera that is closest to the location information of the first device and consider the target to match the first device.
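A minimal sketch of this matching step follows; the coordinates and target identifiers are purely illustrative, and nearest-distance matching is simply the example criterion mentioned above.

```python
import numpy as np

# Tracked target positions (illustrative), all expressed in the same coordinate
# system as the first device's position (e.g., the scene coordinate system).
target_positions = {
    "target_01": np.array([2.1, 0.0, 4.8]),
    "target_02": np.array([5.6, 0.0, 1.2]),
    "target_03": np.array([2.3, 0.0, 4.6]),
}
device_position = np.array([2.25, 0.0, 4.65])  # first device's position (illustrative)

# Select the target closest to the first device and treat it as the match
matched = min(target_positions,
              key=lambda name: np.linalg.norm(target_positions[name] - device_position))
print(matched)  # -> "target_03"
```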
Step 350: and setting related information of the matched target according to the attribute information of the first device.
After determining the target matching the first device, the server may, for example, add the attribute information of the first device to the related information of the target, or may modify the related information of the target according to the attribute information of the first device, thereby achieving identification of the target. In one embodiment, the attribute information of the first device may include information related to the device, such as a device name, an identification number, and the like. In one embodiment, the attribute information of the first device may also include information about the user associated with the device, such as the name, occupation, identity, gender, or age of the owner of the device, account information for an application on the device, or information about an operation performed by the user using the device, such as the time at which the optical label was scanned and identified. In one embodiment, the attribute information of the first device further includes user-customized information, such as a nickname, avatar, signature, and other personalized settings of the user. Taking the airport hall example above, the attribute information from a passenger's device may include, for example, the passenger's name, nationality, certificate number, departure location, arrival location, airline, flight number, membership grade, and so on. After the server obtains this attribute information from the passenger's device, the relevant information of the passenger can be set accordingly.
In another embodiment of the present invention, after target recognition is achieved based on the above method, the server may further set identification information (e.g., icons, letters, numbers, etc.) associated with the target according to the related information of the target and present the identification information near the target in the images captured by the camera, where the identification information may move along with the target. In this way, all targets in the real scene can be tracked in the images collected by the camera and the relevant information of each target can be presented. Fig. 4 is a schematic diagram illustrating a method for identifying and tracking a target according to an embodiment of the present invention. As shown in Fig. 4, taking the above-mentioned airport hall as an example, identification information of passengers (such as VIP passenger and ordinary passenger) is set according to the information about each passenger and presented in the images collected by the camera, so that airport staff can conveniently observe the information about each passenger in the airport hall, thereby greatly improving airport security, management, service, and the like.
In another embodiment of the present invention, after target identification is achieved based on the above method, the server may further set a virtual object associated with the target according to the related information of the target, where the virtual object has spatial position information determined by the position information of the target; the server may send information about the virtual object to a second device, so that the second device can use it to render the virtual object on its display medium based on the second device's position information and pose information.
Fig. 5 shows a target recognition and tracking method for use in a real scene according to an embodiment. The method may further set a virtual object for the target matching the first device and present it on a second device. Steps 510 to 550 are similar to steps 310 to 350 in Fig. 3 and are not described again here. The method of Fig. 5 further comprises the following steps:
step 560: the server sets a virtual object associated with a target matching the first device based on the related information of the target, the spatial position information of the virtual object being determined according to the position information of the target.
After obtaining the related information of the target matching the first device, the server may set a virtual object associated with that target based on the related information, for example by setting the related information of the virtual object. The related information of the virtual object is information for describing the virtual object, and may include, for example, pictures, letters, numbers, icons, and the like contained in the virtual object, as well as shape information, color information, size information, posture information, and so on of the virtual object. In one embodiment, the related information of the virtual object may include information related to the target's device, such as a device name, an identification number, and the like. In one embodiment, the related information of the virtual object may also include information related to the target, such as the name, occupation, identity, gender, or age of the target, account information of an application of the target, or information related to an operation performed by the target using the device, such as the time at which the optical label was scanned and recognized. Taking the airport hall mentioned above as an example, the related information of the virtual object associated with a traveler may include, for example, the traveler's name, nationality, certificate number, departure location, arrival location, airline, flight number, membership grade, and so on. Based on this information, a device can render the corresponding virtual object. The server may configure the related information of the virtual object corresponding to a target according to the information related to that target, so a corresponding virtual object may be customized for each target.
The related information of the virtual object may further include spatial position information of the virtual object. The server may determine spatial location information of its virtual object based on the location information of the target. For example, the spatial position of the virtual object may be configured to be located at a predetermined distance above the target position. The spatial position information of the virtual object may be, for example, spatial position information of the virtual object with respect to the optical label, or position information of the virtual object in the scene coordinate system.
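A minimal sketch of the placement rule mentioned above (virtual object at a predetermined distance above the target) is given below; the y-up axis convention and the 0.3 m offset are assumptions for illustration.

```python
import numpy as np

def virtual_object_position(target_position, height_offset=0.3):
    """Place the virtual object a fixed offset above the target (y-up assumed)."""
    return target_position + np.array([0.0, height_offset, 0.0])

# Illustrative target position in the scene (or optical label) coordinate system
print(virtual_object_position(np.array([2.3, 1.7, 4.6])))
```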
In one embodiment, the server may further set the pose information of the virtual object, which may be the pose information of the virtual object with respect to the optical label, or its pose information in the real world coordinate system.
Step 570: the information related to the virtual object is sent to the second device, which can be used by the second device to render the virtual object on its display medium based on its position information and pose information.
The server may send the information about the virtual object to the second device in a variety of ways. In one embodiment, the server may send the information relating to the virtual object directly to the second device, for example over a wireless link. In one embodiment, the optical tag identification device may identify information (e.g., identification information) conveyed by the optical tag by scanning the optical tag disposed in the scene and use the information to access a server (e.g., by wireless signal access) to obtain information about the virtual object from the server. In one embodiment, the server may send information about the virtual object to the optical label identification device in an optical communication manner using the optical label.
The relevant information of the virtual object can be used by the optical label recognition device to render the virtual object on its display medium based on its position information and pose information.
FIG. 6 is a diagram illustrating a second device presenting a virtual object in a target recognition and tracking method according to an embodiment of the present invention. As shown in fig. 6, after the server obtains the information related to the traveler, the server may set a virtual object "VIP" associated with the traveler according to the information related to the traveler (such as VIP traveler and general traveler) and send it to the second device that scans and recognizes the optical tag, so that it can be used by the second device to present the virtual object "VIP" on its display medium based on its position information and posture information, for example.
As previously mentioned, the device may determine its position information and pose information by optical labels. The position information and the posture information of the device may be position information and posture information obtained when the device scans the optical label, or may be new position information and posture information obtained by the device after scanning the optical label by measuring or tracking using a built-in acceleration sensor, gyroscope, camera, or the like by a method known in the art (for example, inertial navigation, visual odometer, SLAM, VSLAM, SFM, or the like).
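To illustrate how a second device could render the virtual object from its own pose, the sketch below projects the virtual object's position (expressed in the optical label coordinate system) into the device's view using the pose (R_dev, t_dev) obtained from scanning the label; the intrinsics, pose, and object position are placeholder values.

```python
import numpy as np

# Assumed intrinsics of the second device's camera (placeholder values)
camera_matrix = np.array([[1400.0,    0.0, 960.0],
                          [   0.0, 1400.0, 540.0],
                          [   0.0,    0.0,   1.0]])

def project_to_display(point_in_label, R_dev, t_dev):
    """Project a point from the optical label frame onto the device's display."""
    p_cam = R_dev @ point_in_label + t_dev   # label coordinates -> device camera coordinates
    uvw = camera_matrix @ p_cam
    return uvw[:2] / uvw[2]                  # pixel coordinates on the display

# Illustrative device pose relative to the label and virtual object position
R_dev, t_dev = np.eye(3), np.array([0.0, 0.0, 0.0])
virtual_object = np.array([0.2, -0.1, 3.0])
print(project_to_display(virtual_object, R_dev, t_dev))
```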
In one embodiment, as the target moves, the spatial position information of the virtual object may be updated based on new position information of the target obtained by the camera and transmitted to the second device.
In the case of a virtual object having pose information, the pose of the virtual object may be adjusted with the position and/or pose of the device relative to the virtual object, for example, such that a certain orientation of the virtual object (e.g., the frontal direction of the virtual object) is always directed towards the device. In one embodiment, a direction from the virtual object to the device may be determined in space based on the location of the device and the virtual object, and the pose of the virtual object may be determined based on the direction. By the above method, the same virtual object can actually have respective postures for the devices at different positions.
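The "always facing the device" behaviour described above can be sketched as recomputing the virtual object's yaw from the direction between the two positions; a y-up axis convention is assumed, and the positions are illustrative.

```python
import numpy as np

def facing_yaw_degrees(virtual_pos, device_pos):
    """Yaw (rotation about the vertical axis) that turns the object's front toward the device."""
    d = device_pos - virtual_pos
    return np.degrees(np.arctan2(d[0], d[2]))

# Illustrative positions: the same virtual object gets a different yaw for each device
print(facing_yaw_degrees(np.array([2.3, 2.0, 4.6]), np.array([2.25, 1.6, 4.65])))
print(facing_yaw_degrees(np.array([2.3, 2.0, 4.6]), np.array([6.0, 1.6, 1.0])))
```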
In one embodiment of the invention, the invention may be implemented in the form of a computer program. The computer program may be stored in various storage media (e.g., a hard disk, an optical disk, a flash memory, etc.) and, when executed by a processor, can be used to implement the methods of the present invention.
In another embodiment of the invention, the invention may be implemented in the form of an electronic device. The electronic device comprises a processor and a memory in which a computer program is stored; when executed by the processor, the computer program can be used to carry out the methods of the invention.
References herein to "various embodiments," "some embodiments," "one embodiment," or "an embodiment," etc., indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment," or the like, in various places throughout this document are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or characteristic illustrated or described in connection with one embodiment may be combined, in whole or in part, with a feature, structure, or characteristic of one or more other embodiments without limitation, as long as the combination is not logically inconsistent or unworkable. Expressions appearing herein similar to "according to A," "based on A," "by A," or "using A" are non-exclusive, i.e., "according to A" may encompass "according to A only" as well as "according to A and B," unless it is specifically stated or clear from context that the meaning is "according to A only." In the present application, for clarity of explanation, some illustrative operational steps are described in a certain order, but one skilled in the art will appreciate that not every one of these operational steps is essential, and some of them may be omitted or replaced by others. It is also not necessary that these operations be performed sequentially in the manner shown; rather, some of these operations may be performed in a different order, or in parallel, as desired, provided that the new execution is not logically or operationally infeasible.
Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the invention. Although the present invention has been described by way of preferred embodiments, it is not limited to the embodiments described herein, and various changes and modifications may be made without departing from the scope of the present invention.

Claims (14)

1. A method for target recognition and tracking in a real scene, wherein a camera and an optical communication device are installed in the real scene, and the camera and the optical communication device have a relative pose therebetween, the method comprising:
tracking, by the camera, one or more targets in the real scene;
obtaining the position information of the one or more targets according to the tracking result of the camera;
the server obtains information from a first device and determines attribute information of the first device and position information of the first device according to the information from the first device, wherein the position information of the first device is obtained at least partially based on the position information of the first device relative to the optical communication device;
the server compares the position information of the first device with the position information of the one or more targets to determine a target matched with the first device;
and setting related information of a target matched with the first device according to the attribute information of the first device.
2. The method of claim 1, wherein the obtaining the location information of the one or more targets from the tracking result of the camera comprises:
obtaining position information of the one or more targets relative to the camera according to the tracking result of the camera, and
determining position information of the one or more targets relative to the optical communication device based on the position information of the one or more targets relative to the camera and relative pose information between the camera and the optical communication device.
3. The method of claim 1, wherein the obtaining the location information of the one or more targets from the tracking result of the camera comprises:
obtaining position information of the one or more targets relative to the camera according to the tracking result of the camera, and
determining position information of the one or more targets in the real scene according to the position information of the one or more targets relative to the camera and pose information of the camera in the real scene.
4. The method of claim 1, wherein the attribute information of the first device comprises information related to a user of the first device.
5. The method of claim 1, wherein the first device determines its position information relative to the optical communication apparatus at least in part by capturing an image comprising the optical communication apparatus and analyzing the image.
6. The method of claim 1, further comprising:
and setting identification information associated with the target based on the related information of the target, and presenting the identification information in the vicinity of the target in the image acquired by the camera.
7. The method of claim 1, further comprising:
setting a virtual object associated with the target based on the related information of the target, wherein the spatial position information of the virtual object is determined according to the position information of the target;
transmitting information related to the virtual object to a second device, wherein the information related to the virtual object is usable by the second device to render the virtual object on its display medium based on its position information and pose information.
8. The method of claim 7, wherein the spatial location information of the virtual object is updated as the location information of the target changes and is transmitted to the second device.
9. The method of claim 7, further comprising:
acquiring attitude information of the target according to a tracking result of the camera; and
setting the attitude information of the virtual object according to the attitude information of the target.
10. The method of claim 7, wherein the pose of the virtual object is adjustable according to a change in position and/or pose of the second device relative to the virtual object.
11. The method of claim 10, wherein a certain orientation of the virtual object is always towards the second device.
12. A system for target recognition and tracking in real scenes, comprising:
one or more cameras installed in the real scene for tracking one or more targets in the real scene;
one or more optical communication devices installed in the real scene, wherein the optical communication devices and the camera have relative poses; and
a server for implementing the method of any one of claims 1-11.
13. A storage medium in which a computer program is stored which, when being executed by a processor, is operative to carry out the method of any one of claims 1-11.
14. An electronic device comprising a processor and a memory, the memory having stored therein a computer program which, when executed by the processor, is operable to carry out the method of any of claims 1-11.
CN201910917491.9A 2019-09-26 2019-09-26 Method and system for target recognition and tracking in real scenes Pending CN112561953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910917491.9A CN112561953A (en) 2019-09-26 2019-09-26 Method and system for target recognition and tracking in real scenes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910917491.9A CN112561953A (en) 2019-09-26 2019-09-26 Method and system for target recognition and tracking in real scenes

Publications (1)

Publication Number Publication Date
CN112561953A true CN112561953A (en) 2021-03-26

Family

ID=75030056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910917491.9A Pending CN112561953A (en) 2019-09-26 2019-09-26 Method and system for target recognition and tracking in real scenes

Country Status (1)

Country Link
CN (1) CN112561953A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107450714A (en) * 2016-05-31 2017-12-08 大唐电信科技股份有限公司 Man-machine interaction support test system based on augmented reality and image recognition
CN106408667A (en) * 2016-08-30 2017-02-15 西安小光子网络科技有限公司 Optical label-based customized reality method
CN106651916A (en) * 2016-12-29 2017-05-10 深圳市深网视界科技有限公司 Target positioning tracking method and device
WO2018133354A1 (en) * 2017-01-19 2018-07-26 深圳前海弘稼科技有限公司 Information acquisition method and acquisition device
WO2019000461A1 (en) * 2017-06-30 2019-01-03 广东欧珀移动通信有限公司 Positioning method and apparatus, storage medium, and server
WO2019026390A1 (en) * 2017-08-01 2019-02-07 ソニー株式会社 Information processing device, information processing method, recording medium, and imaging device
CN107782314A (en) * 2017-10-24 2018-03-09 张志奇 A kind of augmented reality indoor positioning air navigation aid based on barcode scanning
CN107734449A (en) * 2017-11-09 2018-02-23 陕西外号信息技术有限公司 A kind of outdoor assisted location method, system and equipment based on optical label
CN109840949A (en) * 2017-11-29 2019-06-04 深圳市掌网科技股份有限公司 Augmented reality image processing method and device based on optical alignment
KR101898088B1 (en) * 2017-12-27 2018-09-12 주식회사 버넥트 Augmented Reality System with Frame Region Recording and Reproduction Technology Based on Object Tracking
CN108280368A (en) * 2018-01-22 2018-07-13 北京腾云天下科技有限公司 On a kind of line under data and line data correlating method and computing device
CN108876823A (en) * 2018-07-02 2018-11-23 晋建志 Based on across the camera multi-targets recognition locating and tracking method of time and space continuity monocular
CN109671118A (en) * 2018-11-02 2019-04-23 北京盈迪曼德科技有限公司 A kind of more people's exchange methods of virtual reality, apparatus and system

Similar Documents

Publication Publication Date Title
US11315526B2 (en) Transportation hub information system
US20200358984A1 (en) Method and System for Providing At Least One Image Captured By a Scene Camera of a Vehicle
US11614803B2 (en) Individually interactive multi-view display system for non-stationary viewing locations and methods therefor
US8860760B2 (en) Augmented reality (AR) system and method for tracking parts and visually cueing a user to identify and locate parts in a scene
CN108616563B (en) Virtual information establishing method, searching method and application system of mobile object
CN103162682A (en) Indoor path navigation method based on mixed reality
CN112419233B (en) Data annotation method, device, equipment and computer readable storage medium
CN106030610A (en) Real-time 3D gesture recognition and tracking system for mobile devices
US11263818B2 (en) Augmented reality system using visual object recognition and stored geometry to create and render virtual objects
TWI750822B (en) Method and system for setting presentable virtual object for target
CN112528699B (en) Method and system for obtaining identification information of devices or users thereof in a scene
US20220157032A1 (en) Multi-modality localization of users
CN112788443B (en) Interaction method and system based on optical communication device
CN112581630A (en) User interaction method and system
CN112561953A (en) Method and system for target recognition and tracking in real scenes
CN112558008B (en) Navigation method, system, equipment and medium based on optical communication device
Shao et al. Visual feedback control of quadrotor by object detection in movies
CN112055034B (en) Interaction method and system based on optical communication device
WO2022121606A1 (en) Method and system for obtaining identification information of device or user thereof in scenario
CN114071003B (en) Shooting method and system based on optical communication device
CN112417904B (en) Method and electronic device for presenting information related to an optical communication device
CN114827338A (en) Method and electronic device for presenting virtual objects on a display medium of a device
WO2020244578A1 (en) Interaction method employing optical communication apparatus, and electronic device
KR20230132001A (en) Method and apparatus for processing a smart object information based Infrared
CN114726996A (en) Method and system for establishing a mapping between a spatial position and an imaging position

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230803

Address after: B708, Tiandao United Building, Building 1, No. 35 Jinghai Fourth Road, Daxing District Economic and Technological Development Zone, Beijing, 100023

Applicant after: Beijing Yimu Technology Co.,Ltd.

Address before: 100176 room 801, 8 / F, block B, AVIC Plaza, 15 ronghua South Road, Yizhuang Economic and Technological Development Zone, Daxing District, Beijing

Applicant before: BEIJING WHYHOW INFORMATION TECHNOLOGY Co.,Ltd.