CN116989800A

CN116989800A - Mobile robot visual navigation decision-making method based on pulse reinforcement learning

Info

Publication number: CN116989800A
Application number: CN202311263699.6A
Authority: CN
Inventors: 吴巧云; 周云; 赵冬; 谭春雨
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2023-11-03
Anticipated expiration: 2043-09-27
Also published as: CN116989800B

Abstract

The application relates to a mobile robot vision navigation decision method based on pulse reinforcement learning, which comprises the following steps of S1, collecting a robot vision image and a navigation target image, and extracting local characteristics of each image pixel point; s2, processing each element of the local characteristics to generate a time-step pulse sequence; s3, extracting a visual image pulse sequence and a navigation target image pulse sequence of the mobile robot; s4, designing a double-synaptic pulse nerve layer for realizing information fusion of the two pulse sequence sets; s5, designing a visual navigation reward function, and optimizing model parameters in a visual navigation simulator to realize the non-image visual navigation decision of the mobile robot. The visual navigation model is based on binary pulse sequence operation instead of floating point number characteristic operation, so that a large number of floating point number high-precision multiplication operations in the traditional reinforcement learning visual navigation model are avoided, and the visual navigation model is extremely low in energy consumption and easy to deploy into an embedded system of a mobile robot in the navigation process.

Description

Mobile robot visual navigation decision-making method based on pulse reinforcement learning

Technical Field

The application relates to the technical field of mobile robot navigation, in particular to a mobile robot visual navigation decision method based on pulse reinforcement learning.

Background

Mobile robot visual navigation is an important research direction in the fields of artificial intelligence and robot technology, and aims to enable a robot to navigate and position autonomously in an unknown or partially unknown environment by sensing visual information in the environment. With the continuous development of robot technology, visual navigation robots are increasingly used in various fields including industrial production, logistics distribution, medical care, agriculture and the like. These application scenarios require robots to be able to navigate in complex and diverse environments, while robot vision can provide a richer perception and understanding of the environment.

Traditional navigation methods have always relied on global information in the form of an environment map, however, a large number of applications requiring efficient navigation, considering real-time requirements and limited energy resources, are generally unable to afford the construction cost of global maps. Recently introduced deep reinforcement learning methods, such as: DDPG/deep deterministic policy gradient and A3C/Asynchronous Advantage Actor-Critic are able to learn the optimal control strategy for map-free visual navigation, where robots make navigation decisions based on their local visual inputs and limited global information. However, the optimization of these navigation methods comes at the cost of high energy consumption. Given that increasingly complex mobile robot applications are difficult to continuously offset by the equivalent growth of on-board energy sources, there will be an increasing need for low power solutions for map-free visual navigation of mobile robots.

The impulse neural network is an emerging alternative architecture to deep neural networks inspired by the brain, exhibiting extremely high energy efficiency advantages. The neuron computation is asynchronous, and communication is carried out through discrete events called pulses, so that a large number of floating point number high-precision multiplication operations in the traditional neural network are avoided, and therefore, the energy consumption is extremely low in the system operation process. At present, the impulse neural network is widely applied to the field of visual perception for detecting and classifying visual images, but the research of the impulse neural network on the problem of the mobile robot graphical-free visual navigation is quite limited, and more attention and investment are being gained.

Disclosure of Invention

Aiming at the defects of the prior art, the application provides a mobile robot visual navigation decision method based on pulse reinforcement learning, which solves the problem of high energy consumption in the conventional mobile robot visual navigation decision process, and aims at the problems in the prior art, the application provides a visual navigation decision method based on pulse reinforcement learning, wherein a visual navigation model is based on binary pulse sequence operation instead of floating point feature operation, and a large number of floating point high-precision multiplication operations in the conventional reinforcement learning visual navigation model are avoided, so that the energy consumption is extremely low in the navigation process and the method is extremely easy to be deployed in an embedded system of the mobile robot.

In order to solve the technical problems, the application provides the following technical scheme: a mobile robot visual navigation decision-making method based on pulse reinforcement learning comprises the following steps:

s1, acquiring a robot visual image p and a navigation target image q, and extracting local features of each image pixel point;

s2, processing each element of the local characteristics through the leakage-integration-discharge LIF neurons to generate a pulse sequence of a K time step;

s3, designing a pulse feature extraction module to respectively extract pulse sequences of the visual image p and the navigation target image q in the kth time step and />Thereby obtaining a K time step visual image pulse sequence set +.>NavigationTarget image pulse sequence set->；

S4, designing a double-synaptic pulse nerve layer based on a circuit leakage-integration-discharge working principle to realize the two pulse sequence sets and />Obtaining a fused pulse sequence set;

s5, constructing a pulse reinforcement learning visual navigation decision model, designing a visual navigation reward function, and optimizing model parameters in a visual navigation simulator to realize the mobile robot non-image visual navigation decision.

Further, the local feature of each image pixel point in S1 is extracted by two-dimensional image convolution.

Further, the step S3 extracts pulse sequences of the visual image p and the navigation target image q at the kth time step respectively and />Thereby obtaining a K time step visual image pulse sequence set +.>And navigation target image pulse sequence set +.>The method comprises the following steps:

s301: constructing a kernel-pixel-pulse neural layer calculation model, stacking 4 kernel-pixel-pulse neural layers, processing the pulse sequence of the K time steps in S2 to obtain 3-dimensional pulse sequence tensors, wherein any kth time stepAll kernel-pixel-pulses of layer 4 of (2)Pulse signal output by impulse neuron is +.>；

S302: pulse signals output by all kernel-pixel-pulse neurons of the 4 th layer of the kth time stepThe pulse sequence tensor which is unfolded into one dimension is the pulse sequence of the kth time step of the final input image, and the pulse sequence of the kth time step robot vision image p is expressed as +.>The pulse sequence set is expressed as +.>The pulse sequence of the kth time step navigation target image is expressed as +.>The pulse sequence set is expressed as +.>。

Further, the constructing a kernel-pixel-pulse neural layer calculation model in S301 includes: for any position on the imageIn the kth time step +.>The kernel-pixel-pulse neural layer calculation formula is:

；

wherein ,is an attenuation coefficient, is a trainable parameter of the module; d represents derivative; />Is the kth time step +.>Layer kernel-pixel-pulse neuron is +.>A body wall membrane potential is located; />Is a kernel matrix, which is a trainable parameter of the module, with a rank size of +.>, wherein />Respectively representing index numbers; />Is->Layer input potential, which is defined and +.>Output correlation of layers; />Represents the kth time step +.>Layer kernel-pixel-pulse neuron is +.>A pulse signal at; when->In the time-course of which the first and second contact surfaces,the method comprises the steps of carrying out a first treatment on the surface of the Otherwise->Remain unchanged, wherein->Is a set threshold value, and is generally 1.0 #>Is a reset voltage, typically takes a value of 0, based on which the kth time step +.>Layer kernel-pixel-pulse neuron is +.>A pulse signal at the same.

Further, the step S4 is based on the circuit leakage-integration-discharge working principle, and designs a double-synaptic pulse neural layer for realizing the two pulse sequence sets and />Comprises the following steps:

s401, designing a synaptic electric potential calculation model of a double-synaptic pulse neuron, and realizing pulse sequence set of a K-time-step robot visual image pAnd pulse sequence set of navigation target image q +.>To obtain a pulse sequence with the robot visual image p for any kth time step +.>Synaptic potential for input->And a pulse sequence for arbitrary kth time step to navigate the target image q>Synaptic potential for input->；

S402, designing a double-synaptic pulse neuron body wall membrane potential calculation model to calculate the double-synaptic pulse neuron body wall membrane potential；

S403, judging the wall membrane potential of the double-synaptic pulse neuronAnd threshold->Is as follows>When (I)>The method comprises the steps of carrying out a first treatment on the surface of the Otherwise->Remain unchanged, wherein->The value of 1.0 is generally taken to be,is reset voltage, and is generally 0, < ->Represents the kth time step +.>Pulse output of the double-synaptic pulse neurons;

s404, calculating the fused kth time step pulse signalRealize two by twoSeed pulse sequence information fusion to output a fused K time step pulse sequence set +.>, wherein />Is the number of double synaptic impulse neurons, here value 512.

Further, the two-synapse pulse neuron wall membrane potential in S402The calculation formula of (2) is as follows:

；

wherein ,is an attenuation coefficient, is a trainable parameter; />The dendrite electrical conductivity is shown as 0.25 and 0.25, respectively.

Further, the step S5 is to construct a pulse reinforcement learning visual navigation decision model, design a visual navigation reward function, optimize model parameters in a visual navigation simulator and realize the non-image visual navigation decision of the mobile robot, and specifically comprises the following steps:

s501, a pulse reinforcement learning visual navigation model consists of a strategy network and a value network, wherein the strategy network consists of the pulse characteristic extraction module, a double-synaptic pulse neural layer and a full-connection layer, wherein the input of the full-connection layer is a fused K time step pulse sequence setThe output is the probability of the different navigational actions that the robot should take at the next moment +.>Thus the output dimension isNavigation action space dimension, and navigation action space a= { forward translation, backward translation, left translation, right translation, left rotation, right rotation, stop }, therefore, here navigation action space a dimension is 7; the value network and the strategy network share a pulse feature extraction module and a double-synaptic pulse neural layer, and the output dimension of a subsequent full-connection layer is 1, so that the strategy value +.>；

S502, designing a pulse reinforcement learning visual navigation reward functionSetting a threshold +.>Judging the distance of the mobile robot to the navigation target +.>And at threshold->The magnitude relation of (1) if->Then->The method comprises the steps of carrying out a first treatment on the surface of the When the mobile robot collides with an object in the training scene, the robot is in the form of a ++>The method comprises the steps of carrying out a first treatment on the surface of the In other cases, in order to make the mobile robot reach the position of the navigation target as soon as possible, the mobile robot performs a navigation operation every time,/->；

S503, optimizing model parameters in a visual navigation simulation AI2THOR, updating navigation decision model parameters based on a loss function of a traditional reinforcement learning model Asynchronous Advantage Actor-Critic and A3C by utilizing a random gradient descent method SGD, setting a learning rate to be 1e-4 until a reward function curve converges, and completing an online training process to obtain an optimized strategy network and a value network;

s504, deploying the optimized strategy network to the mobile robot to realize the non-image visual navigation decision of the robot, so that the mobile robot predicts the navigation action at the next moment based on the acquired visual image and the navigation target image at any moment of the navigation task.

By means of the technical scheme, the application provides a mobile robot visual navigation decision method based on pulse reinforcement learning, which has at least the following beneficial effects:

the visual navigation model is based on binary pulse sequence operation instead of floating point number characteristic operation, so that a large number of floating point number high-precision multiplication operations in the traditional reinforcement learning visual navigation model are avoided, and the visual navigation model is extremely low in energy consumption and easy to deploy into an embedded system of a mobile robot in the navigation process; meanwhile, the problem of the pulse neural network in the mobile robot non-image visual navigation is well solved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a flow chart of a visual navigation decision of a mobile robot based on pulse reinforcement learning;

FIG. 2 is a diagram of a pulse reinforcement learning visual navigation model framework in the present application;

fig. 3 is a schematic diagram of a part of visual images obtained by a mobile robot in a navigation task process and a navigation action predicted by a strategy network based on the visual images.

Detailed Description

In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. Therefore, the realization process of how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented.

Those of ordinary skill in the art will appreciate that all or a portion of the steps in a method of implementing an embodiment described above may be implemented by a program to instruct related hardware, and thus, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Referring to fig. 1-3, a specific implementation manner of the present embodiment is shown, the visual navigation model of the present application is based on binary pulse sequence operation instead of floating point feature operation, so that a large number of floating point high-precision multiplication operations in the conventional reinforcement learning visual navigation model are avoided, and therefore, in the navigation process, the energy consumption is extremely low, and the present application is very easy to be deployed in an embedded system of a mobile robot; meanwhile, the problem of the pulse neural network in the mobile robot non-image visual navigation is well solved.

Referring to fig. 1, the present embodiment provides a mobile robot visual navigation decision method based on pulse reinforcement learning, which includes the following steps:

specifically, the local feature of each image pixel point in S1 is extracted by two-dimensional image convolution.

s3, designing a pulse feature extraction module to respectively extract pulse sequences of the visual image p and the navigation target image q in the kth time step and />Thereby obtaining a K time step visual image pulse sequence set +.>And navigation target image pulse sequence set +.>；

Specifically, in S3, pulse sequences of the visual image p and the navigation target image q at the kth time step are extracted respectivelyAndthereby obtaining a K time step visual image pulse sequence set +.>And navigating a set of target image pulse sequencesThe method comprises the following steps:

s301: constructing a kernel-pixel-pulse neural layer calculation model, stacking 4 kernel-pixel-pulse neural layers, processing the pulse sequence of the K time steps in S2 to obtain 3-dimensional pulse sequence tensors, wherein any kth time stepThe pulse signal output by all core-pixel-pulse neurons of layer 4 of (a) is +.>；

Specifically, constructing a kernel-pixel-pulse neural layer calculation model in S301 includes: for any position on the imageIn the kth time step +.>The kernel-pixel-pulse neural layer calculation formula is:

；

wherein ,is an attenuation coefficient, is a trainable parameter of the module; d represents derivative; />Is the kth time step +.>Layer kernel-pixel-pulse neuron is +.>A body wall membrane potential is located; />Is a kernel matrix, which is a trainable parameter of the module, with a rank size of +.>, wherein />Respectively representing index numbers; />Is->Layer input potential, which is defined and +.>Output correlation of layers; />Represents the kth time step +.>Layer kernel-pixel-pulse neuron is +.>A pulse signal at; when->In the time-course of which the first and second contact surfaces,the method comprises the steps of carrying out a first treatment on the surface of the Otherwise->Remain unchanged, wherein->Is a set threshold value, and is generally 1.0 #>Is a reset voltage, typically takes a value of 0, based on which the kth time step +.>Layer kernel-pixel-pulse neuron is +.>Pulse signal->The pulse sequence tensor which is unfolded into one dimension is the pulse sequence of the kth time step of the final input image, namely the pulse sequence of the kth time step robot vision image p is expressed as +.>The pulse sequence of the kth time step navigation target image q is expressed as +.>，/>Is a set of pulse sequences representing the visual image p of the robot at time step K, i.e.>，A set of pulse sequences representing a K time step navigation target image q.

S4, designing a double-synaptic pulse nerve layer based on a circuit leakage-integration-discharge working principle to realize the two pulse sequence sets and />The information of (2) is fused to obtain a fused pulse sequence set, as shown in figure 3;

specifically, the circuit leakage-integration-discharge working principle based on S4 designs a double-synaptic pulse neural layer for realizing the two pulse sequence sets and />Comprises the following steps:

s401, designing a synaptic electric potential calculation model of a double-synaptic pulse neuron, and realizing pulse sequence set of a K-time-step robot visual image pAnd pulse sequence set of navigation target image q +.>To obtain a pulse sequence with the robot visual image p for any kth time step +.>Synaptic potential for input->And a pulse sequence for arbitrary kth time step to navigate the target image q>Synaptic potential for input->The method comprises the steps of carrying out a first treatment on the surface of the Wherein for any kth time step a pulse sequence of robot vision image p is used +.>Synaptic potential for input->The calculation formula of (2) is as follows:

；

for any kth time step, to navigate the pulse sequence of the target image qSynaptic potential for input->The calculation formula of (2) is as follows:

；

wherein , and />Is an attenuation coefficient, is a trainable parameter; /> and />Is synaptic weight, also trainable parameter,/-> and />Respectively representing input potentials of two synapses above the kth time step;

s404, calculating the fused kth time step pulse signalRealizing the fusion of the two pulse sequence information to output the fused K time step pulse sequence set +.>, wherein />Is the number of double synaptic impulse neurons, here value 512.

Specifically, the two synapse pulse neuron wall membrane potential in S402The calculation formula of (2) is as follows:

；

Specifically, the step S5 of constructing a pulse reinforcement learning visual navigation decision model, designing a visual navigation reward function, optimizing model parameters in a visual navigation simulator, and implementing a mobile robot non-image visual navigation decision, specifically includes the following steps:

s501, a pulse reinforcement learning visual navigation model consists of a strategy network and a value network, wherein the strategy network consists of the pulse characteristic extraction module, a double-synaptic pulse neural layer and a full-connection layer, as shown in figure 2, wherein the input of the full-connection layer is a fused K time-step pulse sequence setThe output is the probability of the different navigational actions that the robot should take at the next moment +.>The output dimension is thus the navigation action space dimension, while the navigation action space a= { forward translation, backward translation, leftward translation, rightward translation, leftward rotation, rightward rotation, stop }, and therefore here the navigation action space a has a dimension of 7, as shown in fig. 3; the value network and the strategy network share a pulse feature extraction module and a double-synaptic pulse neural layer, and the output dimension of a subsequent full-connection layer is 1, so that the strategy value +.>As shown in fig. 2;

s502, designing a pulse reinforcement learning visual navigation reward functionSetting a threshold +.>Judging movementDistance of robot to navigation target +.>And at threshold->The magnitude relation of (1) if->Then->The method comprises the steps of carrying out a first treatment on the surface of the When the mobile robot collides with an object in the training scene, the robot is in the form of a ++>The method comprises the steps of carrying out a first treatment on the surface of the In other cases, in order to make the mobile robot reach the position of the navigation target as soon as possible, the mobile robot performs a navigation operation every time,/->；

s504, deploying the optimized strategy network on the mobile robot to realize the non-image visual navigation decision of the robot, so that the mobile robot predicts the navigation action at the next moment based on the acquired visual image and the navigation target image at any moment of the navigation task, as shown in fig. 3.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.

The foregoing embodiments have been presented in a detail description of the application, and are presented herein with a particular application to the understanding of the principles and embodiments of the application, the foregoing embodiments being merely intended to facilitate an understanding of the method of the application and its core concepts; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

Claims

1. The mobile robot visual navigation decision-making method based on pulse reinforcement learning is characterized by comprising the following steps of:

2. The method for determining the visual navigation of the mobile robot based on pulse reinforcement learning according to claim 1, wherein the local feature of each image pixel in S1 is extracted by two-dimensional image convolution.

3. A base according to claim 1The mobile robot visual navigation decision method for pulse reinforcement learning is characterized in that the pulse sequences of the visual image p and the navigation target image q in the kth time step are respectively extracted in the S3Andthereby obtaining a K time step visual image pulse sequence set +.>And navigating a set of target image pulse sequencesThe method comprises the following steps:

4. A method for determining a visual navigation of a mobile robot based on pulse reinforcement learning according to claim 3, wherein the constructing a kernel-pixel-pulse neural layer calculation model in S301 comprises: for any position on the imageIn the kth time step +.>The kernel-pixel-pulse neural layer calculation formula is:

；

5. The method for visual navigation decision-making of mobile robot based on pulse reinforcement learning as set forth in claim 1, wherein said step S4 is based on circuit leakage-integration-discharge theory of operation, and a double-synaptic pulse neural layer is designed for implementing the two pulse sequence sets and />Comprises the following steps:

S403, judging the wall membrane potential of the double-synaptic pulse neuronAnd threshold->Is as follows>In the time-course of which the first and second contact surfaces,the method comprises the steps of carrying out a first treatment on the surface of the Otherwise->Remain unchanged, wherein->The general value is 1.0,/o>Is reset voltage, and is generally 0, < ->Represents the kth time step +.>Pulse output of the double-synaptic pulse neurons;

6. The method for visual navigation decision of a mobile robot based on pulse reinforcement learning according to claim 5, wherein the two-synaptic pulse neuron wall membrane potential in S402The calculation formula of (2) is as follows:

；

7. The method for mobile robot visual navigation decision based on pulse reinforcement learning according to claim 1, wherein the step S5 is to construct a pulse reinforcement learning visual navigation decision model, design a visual navigation reward function, optimize model parameters in a visual navigation simulator, and realize a mobile robot graphical-free visual navigation decision, and specifically comprises the following steps:

s501, a pulse reinforcement learning visual navigation model consists of a strategy network and a value network, wherein the strategy network consists of the pulse characteristic extraction module, a double-synaptic pulse neural layer and a full-connection layer, wherein the input of the full-connection layer is a fused K time step pulse sequence setThe output is the probability of a different navigational action that the robot should take at the next timeThe output dimension is thus the navigation action space dimension, while the navigation action space a= { forward translation, backward translation, leftward translation, rightward translation, leftward rotation, rightward rotation, stop }, hence here the navigation action space a dimension is 7; the value network and the strategy network share a pulse feature extraction module and a double-synaptic pulse neural layer, and the output dimension of a subsequent full-connection layer is 1, so that the strategy value +.>；