CN114594768B

CN114594768B - Mobile robot navigation decision-making method based on visual feature map reconstruction

Info

Publication number: CN114594768B
Application number: CN202210207094.4A
Authority: CN
Inventors: 吴巧云; 曹翔; 赵东
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2022-08-23
Anticipated expiration: 2042-03-03
Also published as: CN114594768A

Abstract

The invention discloses a mobile robot navigation decision-making method based on visual feature map reconstruction, which comprises the following specific processes: at any time of the mobile robot navigation, generating a visual panoramic image with the mobile robot as the center based on an OpenPano algorithm for four images in different directions acquired by four monocular visual sensors carried on the mobile robot; based on the relevance measurement of the visual patches, extracting visual features related to a navigation target of the mobile robot in the visual panorama, and reconstructing a mobile robot visual image feature map guided by the navigation target; and constructing a mobile robot navigation decision model based on the reconstructed mobile robot visual image characteristic diagram, issuing a motion command required by the mobile robot to navigate to a target, and realizing intelligent navigation of the mobile robot driven by the navigation target. The invention improves the generalization capability of the mobile robot navigation to the complex and changeable environment and improves the efficiency of the mobile robot navigation.

Description

Mobile robot navigation decision-making method based on visual feature map reconstruction

Technical Field

The invention relates to the technical field of mobile robot navigation, in particular to a mobile robot navigation decision method based on visual feature map reconstruction.

Background

With the rapid development of sensors and artificial intelligence technologies, mobile robots are gradually developing toward practicality, serialization and intelligence. The basis of realizing the intellectualization and autonomy of the mobile robot is that the safe and accurate navigation can be realized, and the intelligent navigation technology has important use value in various intelligent activities of the mobile robot. Therefore, the smart navigation technology is one of the cores of the research of the mobile robot. The intelligent navigation technology of the mobile robot is researched, so that the mobile robot has the capabilities of effective exploration, efficient calculation, autonomous decision making, quick response and the like for unknown environments, and has wide application prospects in various fields of industrial manufacturing, ocean exploration, family service, medical care, resource development, inspection and risk elimination, aerospace, national defense and the like.

The traditional mobile robot navigation method mainly comprises two parts: firstly, a map is synchronously constructed on line or off line by utilizing an SLAM technology, a depth sensor, a stereo camera or a monocular camera based on structure motion; then, a collision-free trajectory is calculated based on the constructed map to reach the target point. However, in a complex and variable environment, due to occlusion or data noise, such as: the map established in the first stage cannot provide reliable information due to the position changes of the walking human, other mobile robots and various devices; moreover, the frequent re-establishment of the scene map requires a lot of manpower and time, which also limits the popularization and application of the mobile robot in complicated and variable indoor scenes. Recently, the success of the machine learning strategy based on data driving in various control and perception problems opens up a new way for overcoming the limitations of the traditional method. However, the intelligent navigation mode based on the learning strategy mostly realizes the navigation task based on a large amount of navigation experiences in similar environments, and has low data efficiency and poor generalization; most of the current test data come from a simulation synthetic scene, the navigation effect is poor in a complex real scene, the current test data cannot adapt to the change of the positions of various objects in a real environment, and therefore the popularization and the use of the mobile robot are limited.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a mobile robot navigation decision method based on visual feature map reconstruction.

In order to achieve the technical purpose, the invention adopts the following technical scheme: a mobile robot navigation decision-making method based on visual feature map reconstruction comprises the following steps:

(1) at any time t of the mobile robot navigation, generating a visual panoramic image with the mobile robot as the center based on an OpenPano algorithm for four images in different directions acquired by four monocular visual sensors carried on the mobile robot;

(2) based on the relevance measurement of the visual patches, extracting visual features related to a navigation target of the mobile robot in the visual panorama in the step (1), and reconstructing a mobile robot visual image feature map guided by the navigation target;

(3) and constructing a mobile robot navigation decision model based on the depth reinforcement learning based on the reconstructed mobile robot visual image characteristic diagram, and issuing a motion command required by the mobile robot to navigate to the target.

Further, the four monocular vision sensors mounted on the mobile robot in the step (1) are arranged at the same height, and the shooting angle between any two adjacent monocular vision sensors is 90 degrees.

Further, the step (2) comprises the following sub-steps:

(2.1) designing a feature map extraction framework of the current visual panorama based on the VGG19 network;

(2.2) extracting a feature map of a navigation target image and a feature map of a current visual image of the robot through a feature map extraction framework, and measuring the correlation between a piece on the current visual feature map of the robot and a piece on the feature map of the navigation target image;

and (2.3) constructing a mobile robot visual image feature map patch guided by a navigation target based on the measurement result of the correlation, and reconstructing a mobile robot visual image feature map.

Further, the structure of the VGG19 network in step (2.1) is specifically: the method is characterized in that 16 convolutional layers are provided, the number of channels of each convolutional layer is 64, 128, 256, 512 and 512, the size of a convolutional core of each convolutional layer is 3 multiplied by 3, and the step size is 1.

Further, the step (2.2) comprises the following specific processes: extracting navigation target image I through characteristic map extraction framework _g Navigation target image feature map Ψ (I) _g ) And a visual panorama I of the robot at the current moment _t The robot current visual image feature map Ψ (I) _t ) Respectively extracting a characteristic map psi (I) of the navigation target image _g ) Dough sheet on

And a robot current visual image feature map psi (I) _t ) Dough sheet on

Enabling the robot to obtain a current visual image feature map Ψ (I) by maximizing the correlation between feature patches _t ) Each characteristic patch Ψ _i (I _t ) Matching to the characteristic map Ψ (I) of the navigation target image _g ) The most relevant one of the feature patches Ψ _* (I _g ) The method comprises the following steps:

wherein: denotes a definitional symbol, n _g Image feature map psi (I) representing navigation target _g ) The number of patches above, j represents Ψ (I) _g ) The index of the upper feature patch,

characteristic diagram psi (I) representing current visual image of robot _t ) Number of patches on, I denotes Ψ (I) _t ) The index of the upper feature patch,<Ψ _i (I _t ),Ψ _j (I _g )>characteristic diagram psi (I) representing current visual image of robot _t ) Ith patch Ψ _i (I _t ) With the navigation target image feature map Ψ (I) _g ) Inner product of jth patch on, | | | Ψ _i (I _t ) I means Ψ _i (I _t ) Modulo, | | Ψ _j (I _g ) | | denotes Ψ _j (I _g ) Die (2).

Further, constructing a mobile robot visual image feature map patch Ψ guided by the navigation target in the step (2.3) _i (I _t ,I _g ) The process of (2) is as follows:

therein, Ψ _i (I _t ) Characteristic diagram psi (I) representing current visual image of robot _t ) The ith patch, Ψ _* (I _g ) Image characteristic diagram psi (I) representing navigation target _g ) Upper and Ψ _i (I _t ) The most relevant one of the feature patches,<Ψ _i (I _t ),Ψ _* (I _g )>denotes Ψ _i (I _t ) To Ψ _* (I _g ) Inner product of, | | Ψ _i (I _t ) I means Ψ _i (I _t ) Modulo, | | Ψ _* (I _g ) I means Ψ _* (I _g ) Die (2).

Further, the step (3) comprises the following sub-steps:

(3.1) determining an output decision action space of the navigation decision model and a reward function in the process of training the decision model;

(3.2) taking the reconstructed visual image feature map of the mobile robot as the front end of a navigation decision-making model, and establishing the navigation decision-making model based on a depth reinforcement learning framework A3C;

(3.3) training the navigation decision model based on the deep reinforcement learning framework A3C until the reward function obtained by the mobile robot in each round in the training process is not increased any more, stopping training to obtain the trained navigation decision model, and issuing a motion command required by the mobile robot to navigate to a target on the trained navigation decision model to realize intelligent navigation of the mobile robot.

Further, the step (3.1) is realized as follows:

(a) establishing a robot body coordinate system with the center of a mobile robot chassis as an origin, wherein the direction vertical to the chassis is a Z-axis forward direction, the direction right in front of the mobile robot is an X-axis forward direction, and the Y-axis forward direction is determined based on a right-hand rule;

(b) determining an output decision action space A of the navigation decision model as { forward translation, backward translation, leftward translation, rightward translation, leftward rotation, rightward rotation and stopping }, and specifying that the distance of each translation motion in the decision action space A is 0.5 m, and the angle of each rotation is 30 degrees;

(c) determining a reward function r in training a decision model _t ＝r _nav +r _g +r _c Wherein r is _nav Reward, r, representing effective navigation of the mobile robot _nav ＝Geo(I _t-1 ,I _g )-Geo(I _t ,I _g )，Geo(I _t ,I _g ) Geodesic distance, Geo (I), representing the current time of the mobile robot to the location of the target image _t-1 ,I _g ) The geodesic distance between the robot and the position of the target image at the previous moment is represented; r is _g A reward indicating that the mobile robot completed the navigation task, r when the mobile robot completed the navigation task _g Not greater than 10.0, otherwise r _g ＝0；r _c The penalty of collision in the navigation process of the mobile robot is shown, and when the mobile robot collides in the navigation process, r _c ＝-2.0 otherwise, r _c ＝0。

Further, the navigation decision model based on the deep reinforcement learning framework A3C is composed of 5 convolutional layers, 3 first fully-connected layers and 1 second fully-connected layer which are connected in sequence; the number of channels of the convolutional layers is 512, 256, 128 and 128 respectively, the size of a convolutional core of each convolutional layer is 3 multiplied by 3, and the step length is 1; the sizes of the first fully-connected layer are 512, 128 and 64 respectively, the second fully-connected layer is provided with two branches, one branch is used for outputting 7-dimensional decision commands in a decision action space, and the other branch is used for outputting 1-dimensional decision evaluation values.

Further, the step (3.3) is realized as follows: randomly initializing parameters of a navigation decision model based on a deep reinforcement learning framework A3C, updating the parameters of the navigation decision model by using a random gradient descent method SGD based on a loss function A3C, setting a learning rate to be 1e-4 and setting momentum to be 0.9; dividing 6 navigation tasks into 6 threads at random for execution in the same time period, and starting to calculate a loss function and a gradient and update navigation decision model parameters only after 10 navigation decisions are executed on the 6 navigation tasks each time; and iterating the process until the reward function obtained by the mobile robot in each round is not increased any more, completing the on-line training process to obtain a trained navigation decision model, and issuing a motion command required by the mobile robot to navigate to a target on the trained navigation decision model to realize the intelligent navigation of the mobile robot.

Compared with the prior art, the invention has the following beneficial effects: the mobile robot navigation decision method realizes that the mobile robot can autonomously navigate to the position of the target image under the condition of no environment map in a complex and changeable environment based on the navigation decision model of deep reinforcement learning, reduces manual intervention compared with the traditional navigation mode, and improves the navigation efficiency of the mobile robot; in addition, based on a visual panorama acquired by the mobile robot, the visual image feature map is reconstructed by combining with the navigation target image, the visual features closely related to the navigation target in the visual image feature map of the mobile robot are enhanced, the final navigation decision is guided, and the adaptability of the mobile robot navigation to the complex and changeable environment is improved. The mobile robot navigation decision method based on visual feature map reconstruction enables the mobile robot to navigate to the position near the target image without building an environment map, and improves adaptability to complex and variable environments and navigation efficiency.

Drawings

FIG. 1 is a flow chart of a navigation decision method based on visual feature map reconstruction in accordance with the present invention;

fig. 2 is a view showing four visual images and a generated visual panorama acquired by four monocular visual sensors mounted on the mobile robot in the present invention: a in fig. 2 is four visual images, and b in fig. 2 is a generated visual panorama;

FIG. 3 is a block diagram of a navigation decision model according to the present invention.

Detailed Description

The technical solution of the present invention is further explained below with reference to the accompanying drawings.

Compared with the traditional navigation method, the mobile robot navigation decision-making method based on visual feature map reconstruction can adapt to completely unknown scenes and the change of the positions of various objects in the environment, and improves the generalization capability of the mobile robot navigation to the complex and changeable environment, thereby further improving the navigation efficiency of the mobile robot. Fig. 1 shows a flow chart of a mobile robot navigation decision method of the present invention, which comprises the following steps:

(1) at any time t of the mobile robot navigation, generating a visual panoramic image taking the mobile robot as a center based on an OpenPano algorithm for four images in different directions acquired by four monocular visual sensors carried on the mobile robot so as to realize relatively continuous observation of the mobile robot on a local environment; the four monocular vision sensors carried on the mobile robot are arranged at the same height, and the shooting angle between any two adjacent monocular vision sensors is 90 degrees, so that the information of the surrounding environment taking the mobile robot as the center is obtained as comprehensively as possible, the final navigation decision is assisted, and the accuracy of the navigation decision of the mobile robot is improved. As a in fig. 2 is four visual images and b in fig. 2 is a generated visual panorama, it can be seen that: and (4) splicing the four visual images in the step (a) in the step (2) to obtain a panoramic image with the step (b) in the step (2) centered on the mobile robot.

(2) Based on the relevance measurement of the visual patches, extracting visual features related to a navigation target of the mobile robot in the visual panorama in the step (1), and reconstructing a mobile robot visual image feature map guided by the navigation target; the method specifically comprises the following substeps:

(2.1) designing a feature map extraction framework of the current visual panorama based on the VGG19 network; the structure of the VGG19 network is specifically as follows: the number of channels of the convolutional layers is 64, 128, 256, 512 and 512, the size of a convolutional core of each convolutional layer is 3 multiplied by 3, the step size is 1, and tensors of adjacent convolutional layers are guaranteed to keep the same width and height.

(2.2) extracting a feature map of a navigation target image and a feature map of a current visual image of the robot through a feature map extraction framework, and measuring the correlation between a piece on the current visual feature map of the robot and a piece on the feature map of the navigation target image; specifically, a navigation target image I is extracted through a feature map extraction framework _g Navigation target image feature map Ψ (I) _g ) And a visual panorama I of the robot at the current moment _t The robot current visual image feature map Ψ (I) _t ) Respectively extracting a characteristic map Ψ (I) of the navigation target image _g ) Dough sheet on

And a robot current visual image feature map psi (I) _t ) Dough sheet on

Enabling the robot to obtain a current visual image feature map Ψ (I) by maximizing the correlation between feature patches _t ) Each of the feature patches Ψ _i (I _t ) Matching to the characteristic map Ψ (I) of the navigation target image _g ) Is most relevant at the upperA characteristic patch Ψ of _* (I _g ) The correlation metric formula is:

wherein: denotes a definitional symbol, n _g Image characteristic diagram psi (I) representing navigation target _g ) The number of patches on, j representing Ψ (I) _g ) The index of the upper feature patch,

characteristic diagram psi (I) representing current visual image of robot _t ) Number of patches on, I denotes Ψ (I) _t ) The index of the upper feature patch,<Ψ _i (I _t ),Ψ _j (I _g )>characteristic diagram psi (I) representing current visual image of robot _t ) Ith patch Ψ _i (I _t ) With the navigation target image feature map Ψ (I) _g ) Inner product of the jth patch of above, | | Ψ _i (I _t ) I means Ψ _u (I _t ) Modulo, | | Ψ _j (I _g ) I means Ψ _j (I _g ) Die (2). In order to enhance the characteristics of the mobile robot visual image which are closely related to the navigation target, the correlation between the two tensors can be measured by adopting the correlation measurement formula, so that the correlation measurement formula is used for measuring the correlation between the visual image characteristic diagram characteristic patch and any one navigation target image characteristic diagram characteristic patch and finding the most relevant navigation target image characteristic diagram characteristic patch. By reconstructing the visual image feature map through the method, the visual features closely related to the navigation target in the visual image of the mobile robot can be enhanced, the final navigation decision is guided, and the adaptability of the mobile robot navigation to the complex and changeable environment is improved.

(2.3) constructing a visual image feature picture of the mobile robot guided by the navigation target based on the measurement result of the correlation

Reconfiguring mobile robot visionAn image feature map; finally reconstructed visual image characteristic map and target image I _g Closely related features are enhanced.

Therein, Ψ _i (I _t ) Characteristic diagram psi (I) representing current visual image of robot _t ) The ith upper patch, Ψ _* (I _g ) Image characteristic diagram psi (I) representing navigation target _g ) Upper and psi _i (I _t ) The most relevant one of the feature patches,<Ψ _i (I _t ),Ψ _* (I _g )>denotes Ψ _i (I _t ) With Ψ _* (I _g ) Inner product of, | | Ψ _i (I _t ) I means Ψ _i (I _t ) Mode, | | Ψ _* (I _g ) I means Ψ _* (I _g ) The die of (1).

(3) Based on the reconstructed visual image characteristic diagram of the mobile robot, a mobile robot navigation decision model based on deep reinforcement learning is constructed, a motion command required by the mobile robot to navigate to a target is issued, and intelligent navigation of the mobile robot driven by the navigation target in a complex indoor environment is realized; the method specifically comprises the following substeps:

(3.1) determining an output decision action space of the navigation decision model and a reward function in the process of training the decision model; the specific implementation process is as follows:

(b) determining an output decision action space A of the navigation decision model as { forward translation, backward translation, leftward translation, rightward translation, leftward rotation, rightward rotation, and stopping }, and specifying that the distance of each translation motion in the decision action space A is 0.5 m, and the angle of each rotation is 30 degrees; the safety of the mobile robot is difficult to ensure because the distance of the translational motion is too large or the rotation angle is too large; the distance of the translational motion is too large or the rotation angle is too small, or the navigation time is prolonged.

(c) Determining a reward function r in training a decision model _t ＝r _nav +r _g +r _c Wherein r is _nav Reward for representing effective navigation of mobile robot when robot is far from target image I _g The more recent, the positive value of the reward; the farther from the goal, the negative value of the reward, r _nav ＝Geo(I _t-1 ,I _g )-Geo(I _t ,I _g )，Geo(I _t ,I _g ) Geodesic distance, Geo (I), representing the current time of the mobile robot to the location of the target image _t-1 ,I _g ) A geodesic distance, r, representing the distance from the robot from the previous moment to the position of the target image _nav The arrangement of the device can ensure the effective movement of the mobile robot; r is _g A reward indicating that the mobile robot completed the navigation task, r when the mobile robot completed the navigation task _g Not greater than 10.0, otherwise _g ＝0，r _g The setting of the target image is mainly to train the mobile robot to reach the position of the target image by using as little time as possible; r is _c The penalty of collision in the navigation process of the mobile robot is shown, and when the mobile robot collides in the navigation process, r _c -2.0, otherwise, r _c ＝0，r _c The arrangement of (2) is to reduce the collision of the robot with the environment as much as possible in the navigation process. The determined reward function can improve the accuracy of a mobile robot navigation decision model, and further improve the success rate and efficiency of navigation.

(3.2) taking the reconstructed visual image feature map of the mobile robot as the front end of a navigation decision model, and establishing the navigation decision model based on a depth reinforcement learning framework A3C, wherein the navigation decision model based on the depth reinforcement learning framework A3C consists of 5 convolutional layers, 3 first full-connection layers and 1 second full-connection layer which are connected in sequence, as shown in the frame schematic diagram of the navigation decision model in the invention, the reconstructed visual image feature map of the mobile robot is taken as the front end of the navigation decision model; the number of channels of the convolutional layers is 512, 256, 128 and 128 respectively, the size of a convolutional core of each convolutional layer is 3 multiplied by 3, and the step length is 1; the sizes of the first full connection layer are 512, 128 and 64 respectively, and the second full connection layer is provided with two branches, wherein one branch is used for outputting decision commands with 7 dimensions in a decision action space, and the other branch is used for outputting decision evaluation values.

(3.3) training the navigation decision model based on the deep reinforcement learning framework A3C until the reward function obtained by the mobile robot in each round in the training process is not increased any more, stopping training to obtain the trained navigation decision model, and issuing a motion command required by the mobile robot to navigate to a target on the trained navigation decision model to realize intelligent navigation of the mobile robot; the specific implementation process is as follows: randomly initializing parameters of a navigation decision model based on a deep reinforcement learning framework A3C, updating the parameters of the navigation decision model by using a random gradient descent method SGD based on a loss function A3C, setting a learning rate to be 1e-4 and setting momentum to be 0.9; randomly dividing 6 navigation tasks into 6 threads to execute in the same time period, and only after the 6 navigation tasks execute 10 navigation decisions, starting to calculate loss functions and gradients and updating navigation decision model parameters; and iterating the process until the reward function obtained by the mobile robot in each round is not increased any more, completing the on-line training process to obtain a trained navigation decision model, and issuing a motion command required by the mobile robot to navigate to a target on the trained navigation decision model to realize the intelligent navigation of the mobile robot.

The mobile robot is arranged in a complex space scene, and a navigation target point is arranged, so that the mobile robot carries out intelligent navigation by adopting the navigation decision method based on visual characteristic diagram reconstruction, and the mobile robot can avoid environmental obstacles and successfully reach the target point. The mobile robot navigation decision method based on visual feature map reconstruction enables the mobile robot to navigate to the position near the target image without building an environment map, and improves adaptability to complex and variable environments and navigation efficiency.

The above are only preferred embodiments of the present invention, and the scope of the present invention is not limited to the above examples, and all technical solutions that fall under the spirit of the present invention belong to the scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims

1. A mobile robot navigation decision-making method based on visual feature map reconstruction is characterized by comprising the following steps:

(1) at any time t of the mobile robot navigation, generating a visual panorama centering on the mobile robot based on an OpenPano algorithm for four images in different directions acquired by four monocular visual sensors carried on the mobile robot;

(2) based on the relevance measurement of the visual patches, extracting visual features related to a navigation target of the mobile robot in the visual panorama in the step (1), and reconstructing a mobile robot visual image feature map guided by the navigation target; the method comprises the following substeps:

(2.2) extracting a navigation target image feature map and a robot current visual image feature map through a feature map extraction frame, and measuring the correlation between a piece on the robot current visual feature map and a piece on the navigation target image feature map;

(2.3) constructing a mobile robot visual image feature map patch guided by a navigation target based on the measurement result of the correlation, and reconstructing a mobile robot visual image feature map;

(3) based on the reconstructed visual image characteristic diagram of the mobile robot, a mobile robot navigation decision model based on deep reinforcement learning is constructed, and a motion command required by the mobile robot to navigate to a target is issued, wherein the method comprises the following substeps:

2. The mobile robot navigation decision method based on visual feature map reconstruction as claimed in claim 1, wherein four monocular visual sensors mounted on the mobile robot in step (1) are arranged at the same height, and a shooting angle between any two adjacent monocular visual sensors is 90 degrees.

3. The visual feature map reconstruction-based mobile robot navigation decision method according to claim 1, wherein the VGG19 network structure in step (2.1) is specifically: 16 convolutional layers are provided, the number of channels of the convolutional layers is 64, 128, 256, 512, and 512, the convolutional kernel size of each convolutional layer is 3 x 3, and the step size is 1.

4. The mobile robot navigation decision method based on visual feature map reconstruction as claimed in claim 1, wherein the step (2.2) is implemented by the following specific processes: extracting navigation target image I through characteristic map extraction framework _g Navigation target image feature map Ψ (I) _g ) And a visual panorama I of the robot at the current moment _t The robot current visual image feature map psi (I) _t ) Respectively extracting a characteristic map Ψ (I) of the navigation target image _g ) Dough sheet on

And a robot current visual image feature map psi (I) _t ) Dough sheet on

Enabling the robot to obtain a current visual image feature map Ψ (I) by maximizing the correlation between feature patches _t ) Each of the feature patches Ψ _i (I _t ) Matching to the characteristic map Ψ (I) of the navigation target image _g ) Most relevant in the aboveOf a characteristic patch Ψ _* (I _g ) The method comprises the following steps:

wherein the content of the first and second substances,

representing a definitional symbol, n _g Image characteristic diagram psi (I) representing navigation target _g ) The number of patches on, j representing Ψ (I) _g ) The index of the upper feature patch,

characteristic diagram psi (I) representing current visual image of robot _t ) Number of patches on, I denotes Ψ (I) _t ) Index of the above feature patch, < Ψ _i (I _t )，Ψ _j (I _g ) Indicating the robot's current visual image feature map Ψ (I) _t ) Ith patch Ψ _i (I _t ) With the navigation target image feature map Ψ (I) _g ) Inner product of the jth patch of above, | | Ψ _i (I _t ) I means Ψ _i (I _t ) Modulo, | | Ψ _j (I _g ) I means Ψ _j (I _g ) Die (2).

5. The visual feature map reconstruction-based mobile robot navigation decision method as claimed in claim 1, wherein in step (2.3), the mobile robot visual image feature map patch Ψ guided by the navigation target is constructed _i (I _t ，I _g ) The process of (2) is as follows:

therein, Ψ _i (I _t ) Characteristic diagram psi (I) representing current visual image of robot _t ) The ith upper patch, Ψ _* (I _g ) Representing navigation target image featuresSymbol diagram psi (I) _g ) Upper and psi _i (I _t ) One of the most relevant feature patches, < Ψ _i (I _t )，Ψ _* (I _g ) (> means Ψ) _i (I _t ) To Ψ _* (I _g ) Inner product of, | | Ψ _i (I _t ) I means Ψ _i (I _t ) Mode, | | Ψ _* (I _g ) I means Ψ _* (I _g ) The die of (1).

6. The visual feature map reconstruction-based mobile robot navigation decision method according to claim 1, wherein the step (3.1) is realized by the following steps:

(b) determining an output decision action space A of the navigation decision model as { forward translation, backward translation, leftward translation, rightward translation, leftward rotation, rightward rotation, and stopping }, and specifying that the distance of each translation motion in the decision action space A is 0.5 m, and the angle of each rotation is 30 degrees;

(c) determining a reward function r in training a decision model _t ＝r _nav +r _g +r _c Wherein r is _nav Reward, r, representing effective navigation of the mobile robot _nav ＝Geo(I _t-1 ，I _g )-Geo(I _t ，I _g )，Geo(I _t ，I _g ) Geodesic distance, Geo (I), representing the current time of the mobile robot to the location of the target image _t-1 ,I _g ) The geodesic distance between the robot and the position of the target image at the previous moment is represented; r is a radical of hydrogen _g A reward indicating that the mobile robot completed the navigation task, r when the mobile robot completed the navigation task _g Not greater than 10.0, otherwise _g ＝0；r _c The penalty of collision in the navigation process of the mobile robot is shown, and when the mobile robot collides in the navigation process, r _c -2.0, otherwise, r _c ＝0。

7. The visual feature map reconstruction-based mobile robot navigation decision method as claimed in claim 1, wherein the navigation decision model based on the deep reinforcement learning framework A3C is composed of 5 convolutional layers, 3 first fully-connected layers, and 1 second fully-connected layer, which are connected in sequence; the number of channels of the convolutional layers is 512, 256, 128 and 128 respectively, the size of a convolutional core of each convolutional layer is 3 multiplied by 3, and the step length is 1; the sizes of the first fully-connected layer are 512, 128 and 64 respectively, the second fully-connected layer is provided with two branches, one branch is used for outputting 7-dimensional decision commands in a decision action space, and the other branch is used for outputting 1-dimensional decision evaluation values.

8. The visual feature map reconstruction-based mobile robot navigation decision method according to claim 1, wherein the step (3.3) is realized by the following steps: randomly initializing parameters of a navigation decision model based on a deep reinforcement learning framework A3C, updating the parameters of the navigation decision model by using a random gradient descent method SGD based on a loss function A3C, setting a learning rate to be 1e-4 and setting momentum to be 0.9; dividing 6 navigation tasks into 6 threads at random for execution in the same time period, and starting to calculate a loss function and a gradient and update navigation decision model parameters only after 10 navigation decisions are executed on the 6 navigation tasks each time; and iterating the process until the reward function obtained by the mobile robot in each round is not increased any more, completing the on-line training process to obtain a trained navigation decision model, and issuing a motion command required by the mobile robot to navigate to a target on the trained navigation decision model to realize the intelligent navigation of the mobile robot.