CN113280817A - Visual navigation based on landmarks - Google Patents

Visual navigation based on landmarks

Info

Publication number
CN113280817A
Authority
CN
China
Prior art keywords
landmark
information
degree
freedom
image
Prior art date
Legal status
Pending
Application number
CN202010652637.4A
Other languages
Chinese (zh)
Inventor
诸小熊
李军舰
姚迪狄
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010652637.4A
Publication of CN113280817A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a landmark-based visual navigation method, which comprises the following steps: determining a landmark in a visual scene; acquiring multi-degree-of-freedom information of the agent relative to the landmark; acquiring multi-degree-of-freedom change information of the agent relative to the landmark; and navigating the motion of the agent according to the multi-degree-of-freedom change information. Six-degree-of-freedom information of a landmark in the visual scene is constructed from camera image information and the attitude information of the device's gyroscope; the six degrees of freedom comprise three coordinates of the landmark in the visual scene (horizontal, vertical, and depth) and three angles at that coordinate point (pitch, rotation, and yaw). High-frame-rate visual navigation with the landmark as a reference point can then be realized from this six-degree-of-freedom information. The invention can be used to display virtual objects/characters in VR/AR, and also in scenarios such as autonomous driving and robot navigation; in combination with a gyroscope, it achieves highly real-time navigation of an agent on mobile devices with ordinary computing capability.

Description

Visual navigation based on landmarks
Technical Field
The invention relates to the technical field of map navigation, and in particular to a landmark-based visual navigation method and device.
Background
With the rapid development of computer vision technology, visual scene map construction and navigation based on computer vision are widely applied in scenarios such as VR/AR (virtual reality/augmented reality) and automatic navigation, owing to their low cost and broad applicability.
The most common visual map construction scheme is visual SLAM (Simultaneous Localization And Mapping), which builds map information through sensors, a visual odometer, and the like, and uses it to determine the current position of the agent. This scheme has several problems. First, SLAM map construction is complex: visual SLAM requires scene information captured from multiple angles as input, and then builds the map through feature extraction, matching, and similar techniques. Second, the computation is heavy and navigation is slow: because the map built by visual SLAM contains a large amount of information with rich features, map-based navigation is computationally expensive, and real-time navigation is difficult to achieve on ordinary computing devices, especially mobile devices.
Therefore, a visual navigation solution is needed that reduces the complexity of map construction, increases navigation speed, and can run on ordinary computing devices.
Disclosure of Invention
The invention aims to provide a landmark-based visual navigation method that realizes fast, simple construction of visual landmarks and highly real-time visual navigation.
In order to achieve the above object, an embodiment of the present invention provides a landmark based visual navigation method, including:
determining landmarks in a visual scene;
acquiring multi-degree-of-freedom information of the agent relative to the landmark;
acquiring multi-degree-of-freedom change information of the agent relative to the landmark;
and navigating the motion of the agent according to the multi-degree-of-freedom change information.
Further, the multi-degree-of-freedom information includes coordinate information and angle information.
Further, the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate and a depth coordinate of the agent relative to the landmark in the visual scene, and a pitch angle, a yaw angle and a rotation angle of the agent in a spatial coordinate system; acquiring the multi-degree-of-freedom information of the agent relative to the landmark specifically comprises:
acquiring the visual scene captured by the agent's camera, analyzing the visual scene, and determining the abscissa, the ordinate and the depth coordinate of the agent relative to the landmark;
acquiring sensor data of the agent, and determining the pitch angle, the yaw angle and the rotation angle of the agent in the spatial coordinate system.
Further, determining landmarks in the visual scene specifically comprises: using a region of the visual scene pre-selected by the user as the landmark.
Further, determining landmarks in the visual scene specifically comprises: identifying a salient object target in the visual scene as the landmark by using a subject identification algorithm, or detecting a specific area as the landmark by using a target detection algorithm.
Further, the method further comprises: after the multi-degree-of-freedom information is obtained, initializing an image tracking algorithm by using the multi-degree-of-freedom information, wherein the image tracking algorithm is used for obtaining the position and/or the area of the landmark in the current visual scene.
Further, the method further comprises: judging whether the current landmark is lost, and if so, stopping the motion navigation and starting a re-detection step.
Further, the re-detection step specifically comprises: detecting the landmark by taking the last frame before the loss as a template, and if the landmark is detected, re-acquiring the multi-degree-of-freedom information of the agent relative to the landmark.
Further, the area center coordinates of the landmark image are used as the abscissa and the ordinate of the landmark, from which the abscissa and the ordinate of the agent relative to the landmark are obtained, and the distance of the agent's camera from the landmark is used as the depth coordinate; the depth value is obtained as follows: the minimum circumscribed circle of the landmark image region is acquired, and the product of its radius R and a prior coefficient k is taken as the depth coordinate of the landmark, thereby obtaining the depth coordinate of the agent relative to the landmark.
Further, the multi-degree-of-freedom change information of the agent with respect to the landmark comprises: change information for the pitch, yaw and rotation angles, the displacement of the agent relative to the landmark in the landmark plane, and the depth displacement of the agent relative to the landmark; wherein the displacement in the landmark plane is the difference between the coordinates of the landmark in the current image frame and the initial coordinates of the landmark.
Further, the depth displacement is determined according to the minimum circumscribed circle radius of the current landmark image region and the minimum circumscribed circle radius of the landmark image region when the landmark is constructed.
The embodiment of the invention also provides a visual navigation device based on the landmark, which comprises:
a landmark determination module to determine landmarks in a visual scene;
the multi-degree-of-freedom information construction module is used for acquiring the multi-degree-of-freedom information of the agent relative to the landmark;
the change information acquisition module is used for acquiring the position change information of the agent relative to the landmark;
and the visual navigation module is used for navigating the motion of the agent according to the position change information.
Further, the multi-degree-of-freedom information includes coordinate information and angle information.
Further, the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate and a depth coordinate of the agent relative to the landmark in the visual scene, and a pitch angle, a yaw angle and a rotation angle of the agent in a spatial coordinate system; the multi-degree-of-freedom information construction module is specifically used for:
acquiring the visual scene captured by the agent's camera, analyzing the visual scene, and determining the abscissa, the ordinate and the depth coordinate of the agent relative to the landmark;
acquiring sensor data of the agent, and determining the pitch angle, the yaw angle and the rotation angle of the agent in the spatial coordinate system.
Further, the landmark determination module is specifically configured to: use a region of the visual scene pre-selected by the user as the landmark.
Further, the landmark determination module is specifically configured to: identify a salient object target in the visual scene as the landmark by using a subject identification algorithm, or detect a specific area as the landmark by using a target detection algorithm.
Further, the multi-degree-of-freedom information construction module is further configured to: after the multi-degree-of-freedom information is obtained, initialize an image tracking algorithm with the multi-degree-of-freedom information, wherein the image tracking algorithm is used to obtain the position and/or the area of the landmark in the current visual scene.
Further, the visual navigation module is further configured to determine whether the current landmark is lost, stop the motion navigation if the current landmark is lost, and start the re-detection module.
Further, the re-detection module is configured to detect the landmark by using the last frame before the loss is determined as a template, and if the landmark is detected, to re-acquire the multi-degree-of-freedom information of the agent relative to the landmark.
Further, the area center coordinates of the landmark image are used as the abscissa and the ordinate of the landmark, from which the abscissa and the ordinate of the agent relative to the landmark are obtained, and the distance of the agent's camera from the landmark is used as the depth coordinate; the depth value is obtained as follows: the minimum circumscribed circle of the landmark image region is acquired, and the product of its radius R and a prior coefficient k is taken as the depth coordinate of the landmark, thereby obtaining the depth coordinate of the agent relative to the landmark.
Further, the multi-degree-of-freedom change information of the agent with respect to the landmark comprises: change information for the pitch, yaw and rotation angles, the displacement of the agent relative to the landmark in the landmark plane, and the depth displacement of the agent relative to the landmark; wherein the displacement in the landmark plane is the difference between the coordinates of the landmark in the current image frame and the initial coordinates of the landmark.
Further, the depth displacement is determined according to the minimum circumscribed circle radius of the current landmark image region and the minimum circumscribed circle radius of the landmark image region when the landmark was constructed.
An embodiment of the invention further provides an image acquisition method, which comprises the following steps:
determining an acquisition object in a visual scene, wherein the acquisition object is at least one salient object or a specific area in the visual scene;
acquiring an image of the object;
acquiring multi-degree-of-freedom information of the agent relative to the acquisition object;
associating the image of the acquired object with the multi-degree-of-freedom information;
storing the image of the acquisition object and the associated multiple degree of freedom information.
Further, determining the acquisition object in the visual scene specifically comprises: identifying a salient object in the visual scene as the acquisition object by using an image subject identification algorithm, or detecting a specific area as the acquisition object by using a target detection algorithm.
Further, the multi-degree-of-freedom information includes coordinate information and angle information.
Further, the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate and a depth coordinate of the agent relative to the acquisition object in the visual scene, and a pitch angle, a yaw angle and a rotation angle of the agent in a spatial coordinate system; acquiring the multi-degree-of-freedom information of the agent relative to the acquisition object specifically comprises:
acquiring the visual scene captured by the agent's camera, analyzing the visual scene, and determining the abscissa, the ordinate and the depth coordinate of the agent relative to the acquisition object;
acquiring sensor data of the agent, and determining the pitch angle, the yaw angle and the rotation angle of the agent in the spatial coordinate system.
Further, the method further comprises:
acquiring environment attribute information at the time the agent acquires the image of the object;
associating the image of the acquisition object with the environment attribute information;
storing the associated environment attribute information.
Further, the method further comprises:
acquiring multi-degree-of-freedom information and/or environment attribute information of the current agent relative to a specified object;
acquiring an image of the specified object according to the multi-degree-of-freedom information and/or the environment attribute information;
presenting an image of the specified object.
An embodiment of the present invention further provides a computer program product comprising computer program instructions which, when executed by a processor, implement the aforementioned landmark-based visual navigation method or the aforementioned image acquisition method.
An embodiment of the present invention further provides a computer-readable storage medium having a computer program stored thereon which, when executed, implements the aforementioned landmark-based visual navigation method or the aforementioned image acquisition method.
The invention has the following beneficial effects. The invention provides a landmark-based visual navigation method, which comprises the following steps: acquiring multi-degree-of-freedom information of the agent relative to the landmark; acquiring multi-degree-of-freedom change information of the agent relative to the landmark; and navigating the motion of the agent according to the multi-degree-of-freedom change information. Six-degree-of-freedom information of a landmark in the visual scene is constructed from camera image information and the attitude information of the device's gyroscope; the six degrees of freedom comprise three coordinates of the landmark in the visual scene (horizontal, vertical, and depth) and three angles at that coordinate point (pitch, rotation, and yaw). High-frame-rate visual navigation with the landmark as a reference point can then be realized from this six-degree-of-freedom information. The invention can be used to display six-degree-of-freedom virtual objects/characters in VR/AR, and also in scenarios such as autonomous driving and robot navigation. In combination with gyroscope information, highly real-time navigation of an agent can be achieved on mobile devices with ordinary computing capability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only embodiments of the present invention, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a method according to a first embodiment of the present invention.
Fig. 2 is a schematic diagram of landmark regions in a visual scene.
Fig. 3 is a block diagram of an apparatus according to a second embodiment of the present invention.
Fig. 4 is a flowchart of a method according to a third embodiment of the present invention.
Detailed Description
To enable those skilled in the art to understand and implement the present invention, the technical solutions of the present invention are described below clearly and completely with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Because the map image information constructed by visual SLAM has complex image features and the algorithm complexity of visual SLAM is high, real-time navigation is difficult to achieve on mobile devices, particularly those with ordinary computing capability (such as mobile phones).
This scheme instead uses an image tracking algorithm: only the landmark region needs to be tracked, and the displacement and attitude change of the agent relative to the landmark can be obtained from the image coordinate information and the gyroscope information. Most existing image tracking algorithms are highly real-time, so real-time image tracking can be achieved on mobile devices. Combined with the gyroscope information, highly real-time navigation of the agent can therefore be achieved on mobile devices with ordinary computing capability.
An agent here mainly refers to a mobile device equipped with a camera, a gyroscope and a computing unit, such as a smartphone or a camera-equipped drone.
Example one
Referring to Fig. 1, an embodiment of the present invention provides a landmark-based visual navigation method, which comprises a landmark determination step, a multi-degree-of-freedom information construction step, a change information acquisition step, and a visual navigation step.
A landmark determination step determines a landmark in the visual scene. The landmark is a marked region used as a position and attitude reference during motion navigation; its region in the visual scene can be pre-selected by the user. As shown in Fig. 2, the user takes a vertical cabinet in the visual scene as the landmark (a sketch of such user selection follows below). Landmarks may also be determined by intelligent algorithms, for example: the most salient object target in the visual scene is identified as the landmark using a subject recognition algorithm, or, in a particular scene, a specific area (e.g., a logo) is detected as the landmark using a target detection algorithm.
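By way of illustration only, the following is a minimal sketch of the user pre-selection variant, assuming an OpenCV-based implementation; the helper name select_landmark and the window title are illustrative and not taken from the patent.

```python
# Illustrative sketch: the user pre-selects the landmark region,
# assuming OpenCV. select_landmark is a hypothetical helper, not the
# patent's API.
import cv2

def select_landmark(frame):
    # Let the user drag a rectangle around the landmark region
    # (e.g. the vertical cabinet of Fig. 2).
    x, y, w, h = cv2.selectROI("select landmark", frame, showCrosshair=True)
    cv2.destroyWindow("select landmark")
    return (x, y, w, h)  # landmark region in image coordinates
```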
A multi-degree-of-freedom information construction step acquires the multi-degree-of-freedom information of the agent relative to the landmark. After the multi-degree-of-freedom information is obtained, an image tracking algorithm is initialized with it so as to realize region tracking at the visual image level.
The multi-degree-of-freedom information is six-degree-of-freedom information. The six degrees of freedom comprise the abscissa, the ordinate and the depth coordinate of the agent relative to the landmark in the visual scene, and the pitch angle, the yaw angle and the rotation angle of the agent in a spatial coordinate system. The visual scene is the scene in an image frame captured by the agent's camera. Attitude angle information such as the pitch angle, the yaw angle and the rotation angle can be obtained from a gyroscope in the agent.
As shown in Fig. 2, the area center coordinates (x, y) of the landmark image are taken as the abscissa and the ordinate of the landmark, from which the abscissa and the ordinate of the agent relative to the landmark are obtained. The distance of the agent's camera from the landmark is taken as the depth coordinate. The depth value is obtained as follows: the minimum circumscribed circle of the landmark image region is acquired, and the product of its radius R and a prior coefficient k is taken as the depth coordinate of the landmark, i.e. d = R × k, where k is an empirical value set according to the specific application and scene (an illustrative sketch follows).
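A minimal sketch of this six-degree-of-freedom construction, assuming OpenCV/NumPy: taking the circumscribed circle over the ROI corner points, the helper name build_landmark_6dof and the value of K_PRIOR are assumptions; the text specifies only that the region centre gives (x, y), that d = R × k, and that the angles come from the gyroscope.

```python
# Illustrative sketch of landmark 6-DoF initialization, assuming
# OpenCV/NumPy. K_PRIOR stands in for the empirical coefficient k.
import cv2
import numpy as np

K_PRIOR = 50.0  # assumed prior coefficient k, set per application/scene

def build_landmark_6dof(roi, gyro_angles):
    """roi = (x, y, w, h) landmark region; gyro_angles = (pitch, roll, yaw)."""
    x, y, w, h = roi
    # Region centre gives the landmark's abscissa/ordinate (x, y).
    cx, cy = x + w / 2.0, y + h / 2.0
    # Minimum circumscribed circle of the landmark region, taken here
    # over the ROI corner points for simplicity.
    pts = np.array([[x, y], [x + w, y], [x, y + h], [x + w, y + h]],
                   dtype=np.float32)
    _, radius = cv2.minEnclosingCircle(pts)
    depth = radius * K_PRIOR           # d = R * k, as in the text
    pitch, roll, yaw = gyro_angles     # attitude from the agent's gyroscope
    return {"x": cx, "y": cy, "d": depth,
            "pitch": pitch, "roll": roll, "yaw": yaw, "R0": radius}
```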
A change information acquisition step acquires the multi-degree-of-freedom change information of the agent relative to the landmark. The multi-degree-of-freedom change information comprises: the change information of the three attitude angles (delta_P, delta_R, delta_Y), the displacement of the agent in the landmark plane (delta_x, delta_y), and the depth displacement delta_d of the agent relative to the landmark.
The change information of the three attitude angles is the difference between the current attitude of the agent and the attitude recorded when the landmark was constructed. Taking the pitch angle as an example: if the current gyroscope pitch reading is P1 and the pitch angle when the landmark was constructed is P0, the pitch change is delta_P = P1 - P0. The change information of the three attitude angles (delta_P, delta_R, delta_Y) is obtained in the same way.
For the position change, the position and the area of the current landmark in the image are obtained through the image tracker. The displacement in the landmark plane is the difference between the coordinates of the landmark in the current image frame and the initial coordinates recorded when the landmark was constructed. Taking the horizontal axis x as an example: if the horizontal coordinate of the landmark in the current image is x1 and its initial position is x0, then delta_x = x1 - x0. The displacements in the image plane (delta_x, delta_y) are obtained in the same way.
For the depth displacement, let the minimum circumscribed circle radius of the current landmark image region be R1 and the minimum circumscribed circle radius of the landmark image region when the landmark was constructed be R0; then delta_d = k × (R1/R0). A sketch of this change-information computation follows.
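The sketch below mirrors the formulas above literally (delta_P = P1 - P0, delta_x = x1 - x0, delta_d = k * (R1/R0)); the function signature and the landmark record layout from the earlier sketch are assumptions.

```python
# Illustrative sketch of the change-information step. `landmark` is the
# record produced at construction time (an assumed layout), gyro_now is
# the current gyroscope reading, and track_xy / track_radius come from
# the image tracker for the current frame. k is the same prior
# coefficient used when the landmark was constructed.
def compute_deltas(landmark, gyro_now, track_xy, track_radius, k):
    P1, Rot1, Y1 = gyro_now
    # Attitude change: current reading minus the reading recorded when
    # the landmark was constructed, e.g. delta_P = P1 - P0.
    delta_P = P1 - landmark["pitch"]
    delta_R = Rot1 - landmark["roll"]
    delta_Y = Y1 - landmark["yaw"]
    # In-plane displacement: delta_x = x1 - x0, delta_y = y1 - y0.
    delta_x = track_xy[0] - landmark["x"]
    delta_y = track_xy[1] - landmark["y"]
    # Depth displacement from the circumscribed-circle radii, as stated
    # in the text: delta_d = k * (R1 / R0).
    delta_d = k * (track_radius / landmark["R0"])
    return delta_P, delta_R, delta_Y, delta_x, delta_y, delta_d
```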
A visual navigation step navigates the motion of the agent according to the multi-degree-of-freedom change information.
Preferably, the visual navigation step further comprises: judging whether the current landmark is lost by using the image tracking algorithm, and if so, stopping the motion navigation and starting a re-detection step. Taking the KCF (Kernelized Correlation Filter) tracking algorithm as an example, the current tracking state can be judged from the filter response value of each frame.
Preferably, the re-detection step specifically comprises: detecting the landmark by taking the image of the last frame before the loss was determined as a template, and if the landmark is detected, re-acquiring the six-degree-of-freedom information of the landmark.
The image tracking algorithm can be any algorithm that realizes object tracking through images, and is not limited to the KCF visual target tracking algorithm. A sketch of loss detection and re-detection follows.
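An illustrative sketch of loss detection and re-detection, assuming an opencv-contrib build with the KCF tracker. Note that OpenCV's Python binding exposes only a per-frame success flag rather than the raw filter response the text refers to, so the flag is used here as a stand-in loss criterion; the matching threshold of 0.7 is likewise an assumption.

```python
# Illustrative sketch: KCF tracking with template-matching re-detection,
# assuming opencv-contrib. The success flag substitutes for the filter
# response value described in the text.
import cv2

tracker = cv2.TrackerKCF_create()
# tracker.init(first_frame, (x, y, w, h)) is assumed to have been called
# with the landmark region from the construction step.

def track_or_redetect(frame, last_template):
    ok, bbox = tracker.update(frame)
    if ok:
        return bbox
    # Landmark lost: stop navigation and re-detect, using the landmark
    # patch from the last frame before the loss as the template.
    res = cv2.matchTemplate(frame, last_template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    if max_val > 0.7:  # assumed matching threshold
        h, w = last_template.shape[:2]
        # Re-initialize tracking and re-acquire the 6-DoF info here.
        return (max_loc[0], max_loc[1], w, h)
    return None  # still lost; navigation stays stopped
```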
Example two
Referring to Fig. 3, a second embodiment of the present invention provides a landmark-based visual navigation device 300, which comprises a landmark determination module 301, a multi-degree-of-freedom information construction module 302, a change information acquisition module 303, and a visual navigation module 304.
The landmark determination module 301 determines a landmark in the visual scene. The landmark is a marked region used as a position and attitude reference during motion navigation, and its region in the visual scene can be pre-selected by the user. Landmarks may also be determined by intelligent algorithms, for example: the most salient object target in the visual scene is identified as the landmark using a subject recognition algorithm, or, in a particular scene, a specific area (e.g., a logo) is detected as the landmark using a target detection algorithm.
The multi-degree-of-freedom information construction module 302 acquires the multi-degree-of-freedom information of the agent relative to the landmark. After the multi-degree-of-freedom information is obtained, an image tracking algorithm is initialized with it so as to realize region tracking at the visual image level. The multi-degree-of-freedom information is six-degree-of-freedom information. The six degrees of freedom comprise the abscissa, the ordinate and the depth coordinate of the agent relative to the landmark in the visual scene, and the pitch angle, the yaw angle and the rotation angle of the agent in a spatial coordinate system; the visual scene is the scene in an image frame captured by the agent's camera.
The change information acquisition module 303 is configured to acquire the multi-degree-of-freedom change information of the agent relative to the landmark.
The visual navigation module 304 navigates the motion of the agent according to the six-degree-of-freedom change information of the agent relative to the landmark.
Preferably, the apparatus further comprises a re-detection module 305. The visual navigation module 304 is further configured to determine whether the current landmark is lost through an image tracking algorithm, and if the current landmark is lost, stop the motion navigation and start the re-detection module 305.
The re-detection module 305 is configured to detect the landmark by using the image of the last frame before the loss was determined as a template, and if the landmark is detected, to re-acquire the multi-degree-of-freedom information of the landmark.
Example three
Referring to Fig. 4, a third embodiment of the present invention provides an image acquisition method, comprising:
s401, an acquisition object in the visual scene is determined, wherein the acquisition object is at least one salient object or a specific area in the visual scene.
In addition to capturing specified objects, the present invention can also capture all objects in a visual scene. The types of objects contained differ between scenes: in a show-home (model room) scene, for example, the objects include furniture and decorations; in a museum scene, the objects include exhibits.
The acquisition object is determined by intelligent algorithms, for example: a salient object in the visual scene is identified as the acquisition object using an image subject recognition algorithm, or, in a particular scene, a specific area (such as a logo, a piece of furniture or a decoration) is detected as the acquisition object using a target detection algorithm. As shown in Fig. 2, a vertical cabinet in the visual scene is taken as the acquisition object.
S402, acquiring an image of the object. The collected images can help the user browse the scene space, such as a home decoration scene or a museum scene; besides browsing particular objects from multiple angles, the user can browse the whole image of the visual scene and/or images of the other objects.
S403, acquiring the multi-degree-of-freedom information of the agent relative to the acquisition object. After the multi-degree-of-freedom information is obtained, an image tracking algorithm is initialized with it so as to realize region tracking at the visual image level.
The multi-degree-of-freedom information is six-degree-of-freedom information. The six degrees of freedom comprise the abscissa, the ordinate and the depth coordinate of the agent relative to the acquisition object, and the pitch angle, the yaw angle and the rotation angle of the agent in a spatial coordinate system. The visual scene is the scene in an image frame captured by the agent's camera. Attitude angle information such as the pitch angle, the yaw angle and the rotation angle can be obtained from a gyroscope in the agent.
As shown in Fig. 2, the area center coordinates (x, y) of the acquisition object's image region are taken as its abscissa and ordinate, from which the abscissa and the ordinate of the agent relative to the acquisition object are obtained. The distance of the acquisition object from the agent's camera is taken as the depth coordinate. The depth coordinate is obtained as follows: the minimum circumscribed circle of the image region is acquired, and the product of its radius R and a prior coefficient k is taken as the depth coordinate, i.e. d = R × k, where k is an empirical value set according to the specific application and scene.
S404, associating the image of the acquired object with the multi-degree-of-freedom information. A mapping relation between the image of the acquired object and the multi-degree-of-freedom information is thus established.
Preferably, environment attribute information at the time the agent captures the image is also acquired, and the object in the visual scene is associated with the environment attribute information. The environment information includes the shooting time, the scene type, the season of shooting, and the like.
S405, storing the image of the acquired object and the associated multi-degree-of-freedom information; preferably, the associated environment attribute information is also stored. A sketch of such an association record follows.
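By way of illustration, a minimal sketch of the association record of steps S404 and S405; the record layout and the JSON-lines storage are assumptions, since the text requires only that the image, the six-degree-of-freedom information and the environment attributes be associated and stored.

```python
# Illustrative sketch, assuming a JSON-lines index file; the record
# layout is an assumption, not the patent's storage format.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class CaptureRecord:
    image_path: str              # where the captured image is stored
    x: float                     # abscissa relative to the object
    y: float                     # ordinate relative to the object
    d: float                     # depth coordinate (d = R * k)
    pitch: float                 # gyroscope attitude angles
    yaw: float
    roll: float
    env: dict = field(default_factory=dict)  # e.g. time, scene type, season

def store_record(record, index_path="captures.json"):
    # S404/S405: persist the image-to-information association.
    with open(index_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Retrieval for S406-style presentation could then filter these records by nearest six-degree-of-freedom pose and/or matching environment attributes.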
Through the above steps, images of objects in the visual scene at different angles and different distances relative to the agent are established. The method can also be used to present six-degree-of-freedom virtual objects/characters in VR/AR.
Preferably, the method further comprises: acquiring multi-degree-of-freedom information and/or environment attribute information of the current agent relative to a specified object; acquiring an image of the specified object according to the multi-degree-of-freedom information and/or the environment attribute information; and presenting the image of the specified object.
The method of the third embodiment of the invention can collect images of objects in the visual scene during visual navigation and establish an association between the agent's trajectory information and the collected images. Based on the trajectory information, the viewing position can be determined accurately, the viewing angle for a given specified object can be retrieved, and so on.
Taking image acquisition in a show home as an example: after acquisition is completed with the method of the third embodiment, a full 3D view of the show home and images of particular pieces of furniture/decorations from different viewing angles can be generated from the collected images. Other users (such as customers visiting the show home) can view images of a specified object from different viewing angles, and can also view the overall 3D effect of the show home as a reference for buying or decorating a house.
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the above-described apparatuses, modules and units, reference may be made to the corresponding processes of the foregoing method embodiments, which are not repeated here.
An embodiment of the invention also discloses a computer program product comprising computer program instructions which, when executed by a processor, implement the method of the first or the third embodiment.
An embodiment of the invention also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the method of the first or the third embodiment is implemented.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart and block diagrams may represent a module, segment, or portion of code, which comprises one or more computer-executable instructions for implementing the logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. It will also be noted that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention, and is provided by way of illustration only and not limitation. It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (20)

1. A landmark-based visual navigation method, the method comprising:
determining landmarks in a visual scene;
acquiring multi-degree-of-freedom information of the agent relative to the landmark;
acquiring multi-degree-of-freedom change information of the agent relative to the landmark;
and navigating the motion of the agent according to the multi-degree-of-freedom change information.
2. The method of claim 1, wherein the multi-degree-of-freedom information comprises coordinate information and angle information.
3. The method of claim 2, wherein the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate and a depth coordinate of the agent relative to the landmark in the visual scene, and a pitch angle, a yaw angle and a rotation angle of the agent in a spatial coordinate system; and wherein acquiring the multi-degree-of-freedom information of the agent relative to the landmark specifically comprises:
acquiring the visual scene captured by the agent's camera, analyzing the visual scene, and determining the abscissa, the ordinate and the depth coordinate of the agent relative to the landmark;
acquiring sensor data of the agent, and determining the pitch angle, the yaw angle and the rotation angle of the agent in the spatial coordinate system.
4. The method of claim 1, wherein determining landmarks in a visual scene specifically comprises: using a region of the visual scene pre-selected by the user as the landmark.
5. The method of claim 1, wherein determining landmarks in a visual scene specifically comprises: identifying a salient object target in the visual scene as the landmark by using a subject identification algorithm, or detecting a specific area as the landmark by using a target detection algorithm.
6. The method of claim 1, wherein the method further comprises: after the multi-degree-of-freedom information is obtained, initializing an image tracking algorithm by using the multi-degree-of-freedom information, wherein the image tracking algorithm is used for obtaining the position and/or the area of the landmark in the current visual scene.
7. The method of claim 1, wherein the method further comprises: judging whether the current landmark is lost, and if so, stopping the motion navigation and starting a re-detection step.
8. The method according to claim 7, wherein the re-detection step specifically comprises: detecting the landmark by taking the last frame before the loss as a template, and if the landmark is detected, re-acquiring the multi-degree-of-freedom information of the agent relative to the landmark.
9. The method of claim 3, wherein the coordinates of the center of the area of the landmark image are used as the abscissa and the ordinate of the landmark, from which the abscissa and the ordinate of the agent relative to the landmark are obtained, and the distance of the agent's camera from the landmark is used as the depth coordinate; the depth coordinate is obtained as follows: the minimum circumscribed circle of the landmark image region is acquired, and the product of its radius R and a prior coefficient k is taken as the depth coordinate of the landmark, thereby obtaining the depth coordinate of the agent relative to the landmark.
10. The method of claim 3, wherein the multi-degree-of-freedom change information of the agent with respect to the landmark comprises: change information for the pitch, yaw and rotation angles, the displacement of the agent relative to the landmark in the landmark plane, and the depth displacement of the agent relative to the landmark; wherein the displacement in the landmark plane is the difference between the coordinates of the landmark in the current image frame and the initial coordinates of the landmark.
11. The method of claim 10, wherein the depth displacement is determined based on a minimum circumscribed circle radius of the current landmark image region and a minimum circumscribed circle radius of the landmark image region when constructing the landmark.
12. A landmark based visual navigation device, comprising:
a landmark determination module to determine landmarks in a visual scene;
the multi-degree-of-freedom information construction module is used for acquiring multi-degree-of-freedom information of the agent relative to the landmark;
the change information acquisition module is used for acquiring the position change information of the agent relative to the landmark;
and the visual navigation module is used for navigating the motion of the agent according to the position change information.
13. A method of image acquisition, the method comprising:
determining an acquisition object in a visual scene, wherein the acquisition object is at least one salient object or a specific area in the visual scene;
acquiring an image of the object;
acquiring multi-degree-of-freedom information of the agent relative to the acquisition object;
associating the image of the acquired object with the multi-degree-of-freedom information;
storing the image of the acquisition object and the associated multiple degree of freedom information.
14. The method according to claim 13, wherein determining acquisition objects in the visual scene specifically comprises: identifying a salient object in the visual scene as the acquisition object by using an image subject identification algorithm, or detecting a specific area as the acquisition object by using a target detection algorithm.
15. The method of claim 13, wherein the multi-degree-of-freedom information includes coordinate information and angle information.
16. The method of claim 15, wherein the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate and a depth coordinate of the agent relative to the acquisition object in the visual scene, and a pitch angle, a yaw angle and a rotation angle of the agent in a spatial coordinate system; and wherein acquiring the multi-degree-of-freedom information of the agent relative to the acquisition object specifically comprises:
acquiring the visual scene captured by the agent's camera, analyzing the visual scene, and determining the abscissa, the ordinate and the depth coordinate of the agent relative to the acquisition object;
acquiring sensor data of the agent, and determining the pitch angle, the yaw angle and the rotation angle of the agent in the spatial coordinate system.
17. The method of claim 13, wherein the method further comprises:
acquiring environment attribute information at the time the agent acquires the image of the object;
associating the image of the object with the environmental attribute information;
storing the associated environment attribute information.
18. The method of claim 13, wherein the method further comprises:
acquiring multi-degree-of-freedom information and/or environment attribute information of the current agent relative to a specified object;
acquiring an image of the specified object according to the multi-degree-of-freedom information and/or the environment attribute information;
presenting an image of the specified object.
19. A computer program product comprising computer program instructions for implementing the visual navigation method of any one of claims 1-11 or the image acquisition method of any one of claims 13-18 when said instructions are executed by a processor.
20. A computer-readable storage medium having stored thereon a computer program which, when executed, implements the visual navigation method of any one of claims 1-11 or the image acquisition method of any one of claims 13-18.
CN202010652637.4A 2020-07-08 2020-07-08 Visual navigation based on landmarks Pending CN113280817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652637.4A CN113280817A (en) 2020-07-08 2020-07-08 Visual navigation based on landmarks


Publications (1)

Publication Number Publication Date
CN113280817A 2021-08-20

Family

ID=77275622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652637.4A Pending CN113280817A (en) 2020-07-08 2020-07-08 Visual navigation based on landmarks

Country Status (1)

Country Link
CN (1) CN113280817A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1024347A1 (en) * 1999-01-28 2000-08-02 International Business Machines Corporation Method and device for navigation
JP2004030445A (en) * 2002-06-27 2004-01-29 National Institute Of Advanced Industrial & Technology Method, system, and program for estimating self-position of moving robot
WO2006109527A1 (en) * 2005-03-30 2006-10-19 National University Corporation Kumamoto University Navigation device and navigation method
CN105241445A (en) * 2015-10-20 2016-01-13 深圳大学 Method and system for acquiring indoor navigation data based on intelligent mobile terminal
CN105910615A (en) * 2016-03-30 2016-08-31 宁波元鼎电子科技有限公司 Navigation method and system for walking based on virtual reality
US20170116783A1 (en) * 2015-10-26 2017-04-27 Institute Of Nuclear Energy Research Atomic Energy Council, Executive Yuan Navigation System Applying Augmented Reality
CN111197984A (en) * 2020-01-15 2020-05-26 重庆邮电大学 Vision-inertial motion estimation method based on environmental constraint


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination