CN116989800A - Mobile robot visual navigation decision-making method based on pulse reinforcement learning - Google Patents


Info

Publication number
CN116989800A
CN116989800A (application CN202311263699.6A)
Authority
CN
China
Prior art keywords
pulse
navigation
visual
image
pulse sequence
Prior art date
Legal status
Granted
Application number
CN202311263699.6A
Other languages
Chinese (zh)
Other versions
CN116989800B (en)
Inventor
吴巧云
周云
赵冬
谭春雨
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202311263699.6A priority Critical patent/CN116989800B/en
Publication of CN116989800A publication Critical patent/CN116989800A/en
Application granted granted Critical
Publication of CN116989800B publication Critical patent/CN116989800B/en
Status: Active


Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Abstract

The application relates to a mobile robot visual navigation decision method based on pulse reinforcement learning, comprising the following steps: S1, collecting a robot visual image and a navigation target image, and extracting local features at each image pixel point; S2, processing each element of the local features to generate a K-time-step pulse sequence; S3, extracting the visual image pulse sequence and the navigation target image pulse sequence of the mobile robot; S4, designing a double-synapse pulse neural layer to fuse the information of the two pulse sequence sets; S5, designing a visual navigation reward function and optimizing the model parameters in a visual navigation simulator to realize map-free visual navigation decisions for the mobile robot. The visual navigation model operates on binary pulse sequences instead of floating-point features, avoiding the large number of high-precision floating-point multiplications in traditional reinforcement-learning visual navigation models; its energy consumption during navigation is therefore extremely low, and it is easy to deploy in the embedded system of a mobile robot.

Description

Mobile robot visual navigation decision-making method based on pulse reinforcement learning
Technical Field
The application relates to the technical field of mobile robot navigation, in particular to a mobile robot visual navigation decision method based on pulse reinforcement learning.
Background
Mobile robot visual navigation is an important research direction in the fields of artificial intelligence and robot technology, and aims to enable a robot to navigate and position autonomously in an unknown or partially unknown environment by sensing visual information in the environment. With the continuous development of robot technology, visual navigation robots are increasingly used in various fields including industrial production, logistics distribution, medical care, agriculture and the like. These application scenarios require robots to be able to navigate in complex and diverse environments, while robot vision can provide a richer perception and understanding of the environment.
Traditional navigation methods have always relied on global information in the form of an environment map; however, many applications that require efficient navigation cannot, given real-time requirements and limited energy resources, afford the construction cost of global maps. Recently introduced deep reinforcement learning methods, such as DDPG (Deep Deterministic Policy Gradient) and A3C (Asynchronous Advantage Actor-Critic), are able to learn optimal control strategies for map-free visual navigation, where robots make navigation decisions based on their local visual inputs and limited global information. However, the optimization of these navigation methods comes at the cost of high energy consumption. Since increasingly complex mobile robot applications cannot be continuously offset by an equivalent growth of on-board energy, the need for low-power solutions for map-free visual navigation of mobile robots will keep growing.
The pulse (spiking) neural network is an emerging, brain-inspired alternative architecture to deep neural networks, exhibiting extremely high energy efficiency. Its neurons compute asynchronously and communicate through discrete events called pulses, avoiding the large number of high-precision floating-point multiplications in traditional neural networks; the energy consumption of system operation is therefore extremely low. At present, pulse neural networks are widely applied in visual perception for detecting and classifying visual images, but research on pulse neural networks for the map-free visual navigation problem of mobile robots remains quite limited and deserves more attention and investment.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a mobile robot visual navigation decision method based on pulse reinforcement learning, which solves the problem of high energy consumption in existing mobile robot visual navigation decision-making. The visual navigation model operates on binary pulse sequences instead of floating-point features, avoiding the large number of high-precision floating-point multiplications in traditional reinforcement-learning visual navigation models, so that energy consumption during navigation is extremely low and the model is very easy to deploy in the embedded system of a mobile robot.
In order to solve the technical problems, the application provides the following technical scheme: a mobile robot visual navigation decision-making method based on pulse reinforcement learning comprises the following steps:
s1, acquiring a robot visual image p and a navigation target image q, and extracting local features of each image pixel point;
S2, processing each element of the local features through leaky integrate-and-fire (LIF) neurons to generate a K-time-step pulse sequence;
S3, designing a pulse feature extraction module to extract, for each kth time step, the pulse sequences o_p^k and o_q^k of the visual image p and the navigation target image q respectively, thereby obtaining the K-time-step visual image pulse sequence set O_p = {o_p^k, k = 1, …, K} and the navigation target image pulse sequence set O_q = {o_q^k, k = 1, …, K};
S4, designing a double-synapse pulse neural layer based on the leaky integrate-and-fire circuit principle to fuse the information of the two pulse sequence sets O_p and O_q, obtaining a fused pulse sequence set;
S5, constructing a pulse reinforcement learning visual navigation decision model, designing a visual navigation reward function, and optimizing the model parameters in a visual navigation simulator to realize map-free visual navigation decisions for the mobile robot.
Further, the local feature of each image pixel point in S1 is extracted by two-dimensional image convolution.
Further, in S3 the pulse sequences o_p^k and o_q^k of the visual image p and the navigation target image q at the kth time step are extracted respectively, thereby obtaining the K-time-step visual image pulse sequence set O_p and the navigation target image pulse sequence set O_q; the method comprises the following steps:
S301: constructing a kernel-pixel-pulse neural layer calculation model and stacking 4 kernel-pixel-pulse neural layers to process the K-time-step pulse sequence from S2, obtaining 3-dimensional pulse sequence tensors, wherein at any kth time step the pulse signals output by all kernel-pixel-pulse neurons of the 4th layer are denoted s^{k,4}(i,j);
S302: pulse signals output by all kernel-pixel-pulse neurons of the 4 th layer of the kth time stepThe pulse sequence tensor which is unfolded into one dimension is the pulse sequence of the kth time step of the final input image, and the pulse sequence of the kth time step robot vision image p is expressed as +.>The pulse sequence set is expressed as +.>The pulse sequence of the kth time step navigation target image is expressed as +.>The pulse sequence set is expressed as +.>
Further, constructing the kernel-pixel-pulse neural layer calculation model in S301 comprises: for any position (i, j) on the image, at the kth time step and the lth layer, the kernel-pixel-pulse neural layer calculation formula is:

τ_l · du^{k,l}(i,j)/dt = −u^{k,l}(i,j) + I^{k,l}(i,j), with I^{k,l}(i,j) = Σ_{a,b} W^l(a,b) · s^{k,l−1}(i+a, j+b),

wherein τ_l is an attenuation coefficient and a trainable parameter of the module; d denotes the derivative; u^{k,l}(i,j) is the body membrane potential of the lth-layer kernel-pixel-pulse neuron at position (i, j) at the kth time step; W^l is a kernel matrix, a trainable parameter of the module, of size r × r, where a and b denote the index numbers; I^{k,l}(i,j) is the lth-layer input potential, defined in terms of the output of the (l−1)th layer; s^{k,l}(i,j) denotes the pulse signal of the lth-layer kernel-pixel-pulse neuron at position (i, j) at the kth time step: when u^{k,l}(i,j) ≥ u_th, s^{k,l}(i,j) = 1 and u^{k,l}(i,j) is reset to u_reset; otherwise s^{k,l}(i,j) = 0 and u^{k,l}(i,j) remains unchanged, where u_th is a set threshold, generally taken as 1.0, and u_reset is a reset voltage, generally taken as 0. On this basis the pulse signal s^{k,l}(i,j) of the lth-layer kernel-pixel-pulse neuron at position (i, j) at the kth time step is obtained.
Further, in S4, based on the leaky integrate-and-fire circuit principle, a double-synapse pulse neural layer is designed to fuse the information of the two pulse sequence sets O_p and O_q, comprising the following steps:
S401, designing the synaptic potential calculation model of the double-synapse pulse neuron to integrate the K-time-step pulse sequence set O_p of the robot visual image p and the pulse sequence set O_q of the navigation target image q, obtaining, for any kth time step, the synaptic potential a_p^k driven by the pulse sequence o_p^k of the robot visual image p and the synaptic potential a_q^k driven by the pulse sequence o_q^k of the navigation target image q;
S402, designing the body membrane potential calculation model of the double-synapse pulse neuron to calculate its body membrane potential v^k;
S403, comparing the body membrane potential v^k of the double-synapse pulse neuron with the threshold v_th: when v^k ≥ v_th, o_f^k = 1 and v^k is reset to v_reset; otherwise o_f^k = 0 and v^k remains unchanged, wherein v_th generally takes the value 1.0, v_reset is the reset voltage, generally taken as 0, and o_f^k denotes the pulse output of the double-synapse pulse neuron at the kth time step;
S404, calculating the fused kth-time-step pulse signal o_f^k to realize the fusion of the two kinds of pulse sequence information, and outputting the fused K-time-step pulse sequence set O_f = {o_f^k, k = 1, …, K}, where each o_f^k has dimension N, N being the number of double-synapse pulse neurons, here taken as 512.
Further, the body membrane potential v^k of the double-synapse pulse neuron in S402 is calculated as:

v^k = β · v^{k−1} + g_p · a_p^k + g_q · a_q^k,

wherein β is an attenuation coefficient and a trainable parameter; g_p and g_q denote the dendrite conductances, taking the values 0.25 and 0.25 respectively.
Further, in S5, constructing the pulse reinforcement learning visual navigation decision model, designing the visual navigation reward function, and optimizing the model parameters in the visual navigation simulator to realize map-free visual navigation decisions for the mobile robot specifically comprises the following steps:
S501, the pulse reinforcement learning visual navigation model consists of a policy network and a value network. The policy network consists of the pulse feature extraction module, the double-synapse pulse neural layer and a fully connected layer, where the input of the fully connected layer is the fused K-time-step pulse sequence set O_f and the output is the probability π(a) of each navigation action the robot should take at the next moment; the output dimension is therefore the navigation action space dimension, and the navigation action space A = {forward translation, backward translation, left translation, right translation, left rotation, right rotation, stop}, so the dimension of A here is 7. The value network shares the pulse feature extraction module and the double-synapse pulse neural layer with the policy network, and the output dimension of its subsequent fully connected layer is 1, giving the policy value V;
S502, designing the pulse reinforcement learning visual navigation reward function r: a threshold d_th is set, and the distance d from the mobile robot to the navigation target is compared with d_th; if d ≤ d_th, a positive arrival reward is given; when the mobile robot collides with an object in the training scene, a negative collision penalty is given; in other cases, to make the mobile robot reach the navigation target position as soon as possible, a small negative step cost is given each time the mobile robot performs a navigation action;
S503, optimizing the model parameters in the visual navigation simulator AI2-THOR: the navigation decision model parameters are updated with stochastic gradient descent (SGD) based on the loss function of the traditional reinforcement learning model A3C (Asynchronous Advantage Actor-Critic), with the learning rate set to 1e-4, until the reward function curve converges, completing the online training process and obtaining the optimized policy network and value network;
S504, deploying the optimized policy network to the mobile robot to realize map-free visual navigation decisions, so that at any moment of the navigation task the mobile robot predicts the next navigation action based on the acquired visual image and the navigation target image.
By means of the technical scheme, the application provides a mobile robot visual navigation decision method based on pulse reinforcement learning, which has at least the following beneficial effects:
the visual navigation model is based on binary pulse sequence operation instead of floating point number characteristic operation, so that a large number of floating point number high-precision multiplication operations in the traditional reinforcement learning visual navigation model are avoided, and the visual navigation model is extremely low in energy consumption and easy to deploy into an embedded system of a mobile robot in the navigation process; meanwhile, the problem of the pulse neural network in the mobile robot non-image visual navigation is well solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a visual navigation decision of a mobile robot based on pulse reinforcement learning;
FIG. 2 is a diagram of a pulse reinforcement learning visual navigation model framework in the present application;
fig. 3 is a schematic diagram of a part of visual images obtained by a mobile robot in a navigation task process and a navigation action predicted by a strategy network based on the visual images.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application is given below with reference to the accompanying drawings and detailed embodiments, so that how the technical means are applied to solve the technical problems and achieve the technical effects can be fully understood and implemented.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in a method of implementing an embodiment described above may be implemented by a program to instruct related hardware, and thus, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Referring to fig. 1-3, a specific implementation of the present embodiment is shown. The visual navigation model of the present application operates on binary pulse sequences instead of floating-point features, avoiding the large number of high-precision floating-point multiplications in traditional reinforcement-learning visual navigation models; energy consumption during navigation is therefore extremely low, and the model is very easy to deploy in the embedded system of a mobile robot. At the same time, the application of pulse neural networks to map-free visual navigation of mobile robots is well addressed.
Referring to fig. 1, the present embodiment provides a mobile robot visual navigation decision method based on pulse reinforcement learning, which includes the following steps:
s1, acquiring a robot visual image p and a navigation target image q, and extracting local features of each image pixel point;
specifically, the local feature of each image pixel point in S1 is extracted by two-dimensional image convolution.
S2, processing each element of the local features through leaky integrate-and-fire (LIF) neurons to generate a K-time-step pulse sequence;
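A minimal sketch of how S2 could be implemented (the step count K, decay value, and threshold below are illustrative assumptions, not values taken from the patent): each real-valued local feature drives one LIF neuron for K time steps, and the binary pulses it emits form that element's pulse sequence.

```python
import numpy as np

def lif_encode(features, K=8, decay=0.5, v_th=1.0, v_reset=0.0):
    """Encode real-valued features into a K-time-step binary pulse sequence
    with leaky integrate-and-fire (LIF) neurons (hypothetical parameters)."""
    v = np.zeros_like(features, dtype=float)   # membrane potentials
    spikes = []
    for _ in range(K):
        v = decay * v + features               # leak, then integrate the input
        fired = v >= v_th                      # fire where threshold is reached
        v = np.where(fired, v_reset, v)        # reset fired neurons only
        spikes.append(fired.astype(np.uint8))  # binary pulse at this time step
    return np.stack(spikes)                    # shape (K, *features.shape)

feats = np.array([0.1, 0.6, 1.2])              # toy per-pixel local features
seq = lif_encode(feats)
```

Stronger features cross the threshold earlier and spike more often, so the pulse sequence carries the feature magnitude as a firing rate.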
S3, designing a pulse feature extraction module to extract, for each kth time step, the pulse sequences o_p^k and o_q^k of the visual image p and the navigation target image q respectively, thereby obtaining the K-time-step visual image pulse sequence set O_p = {o_p^k, k = 1, …, K} and the navigation target image pulse sequence set O_q = {o_q^k, k = 1, …, K};
Specifically, in S3 the pulse sequences o_p^k and o_q^k of the visual image p and the navigation target image q at the kth time step are extracted respectively, thereby obtaining the K-time-step visual image pulse sequence set O_p and the navigation target image pulse sequence set O_q; the method comprises the following steps:
S301: constructing a kernel-pixel-pulse neural layer calculation model and stacking 4 kernel-pixel-pulse neural layers to process the K-time-step pulse sequence from S2, obtaining 3-dimensional pulse sequence tensors, wherein at any kth time step the pulse signals output by all kernel-pixel-pulse neurons of the 4th layer are denoted s^{k,4}(i,j);
S302: pulse signals output by all kernel-pixel-pulse neurons of the 4 th layer of the kth time stepThe pulse sequence tensor which is unfolded into one dimension is the pulse sequence of the kth time step of the final input image, and the pulse sequence of the kth time step robot vision image p is expressed as +.>The pulse sequence set is expressed as +.>The pulse sequence of the kth time step navigation target image is expressed as +.>The pulse sequence set is expressed as +.>
Specifically, constructing the kernel-pixel-pulse neural layer calculation model in S301 comprises: for any position (i, j) on the image, at the kth time step and the lth layer, the kernel-pixel-pulse neural layer calculation formula is:

τ_l · du^{k,l}(i,j)/dt = −u^{k,l}(i,j) + I^{k,l}(i,j), with I^{k,l}(i,j) = Σ_{a,b} W^l(a,b) · s^{k,l−1}(i+a, j+b),

wherein τ_l is an attenuation coefficient and a trainable parameter of the module; d denotes the derivative; u^{k,l}(i,j) is the body membrane potential of the lth-layer kernel-pixel-pulse neuron at position (i, j) at the kth time step; W^l is a kernel matrix, a trainable parameter of the module, of size r × r, where a and b denote the index numbers; I^{k,l}(i,j) is the lth-layer input potential, defined in terms of the output of the (l−1)th layer; s^{k,l}(i,j) denotes the pulse signal of the lth-layer kernel-pixel-pulse neuron at position (i, j) at the kth time step: when u^{k,l}(i,j) ≥ u_th, s^{k,l}(i,j) = 1 and u^{k,l}(i,j) is reset to u_reset; otherwise s^{k,l}(i,j) = 0 and u^{k,l}(i,j) remains unchanged, where u_th is a set threshold, generally taken as 1.0, and u_reset is a reset voltage, generally taken as 0. On this basis, the pulse signals s^{k,4}(i,j) of the 4th-layer kernel-pixel-pulse neurons at the kth time step are unfolded into a one-dimensional pulse sequence tensor, which is the kth-time-step pulse sequence of the final input image: the kth-time-step pulse sequence of the robot visual image p is denoted o_p^k, the kth-time-step pulse sequence of the navigation target image q is denoted o_q^k, O_p = {o_p^k, k = 1, …, K} denotes the pulse sequence set of the robot visual image p over K time steps, and O_q = {o_q^k, k = 1, …, K} denotes the pulse sequence set of the navigation target image q over K time steps.
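One kernel-pixel-pulse layer can be sketched as a 2-D convolution of the incoming binary pulse maps feeding LIF dynamics, matching the formula above. The kernel size, decay, and threshold used here are illustrative assumptions; only the "convolve, leak-integrate, threshold, reset" structure comes from the text.

```python
import numpy as np

def kernel_pixel_pulse_layer(spikes_in, kernel, decay=0.5, v_th=1.0, v_reset=0.0):
    """One kernel-pixel-pulse layer (hypothetical sketch).
    spikes_in: (K, H, W) binary pulse maps; kernel: (r, r) trainable matrix."""
    K, H, W = spikes_in.shape
    r = kernel.shape[0]
    Ho, Wo = H - r + 1, W - r + 1          # valid-convolution output size
    v = np.zeros((Ho, Wo))                 # membrane potentials u^{k,l}(i,j)
    out = np.zeros((K, Ho, Wo), dtype=np.uint8)
    for k in range(K):
        # input potential I^{k,l}(i,j) = sum_{a,b} W(a,b) * s^{k,l-1}(i+a, j+b)
        I = np.zeros((Ho, Wo))
        for a in range(r):
            for b in range(r):
                I += kernel[a, b] * spikes_in[k, a:a + Ho, b:b + Wo]
        v = decay * v + I                  # leaky integration
        fired = v >= v_th                  # threshold comparison
        v = np.where(fired, v_reset, v)    # reset only fired positions
        out[k] = fired                     # binary pulse map at step k
    return out

rng = np.random.default_rng(0)
s_in = (rng.random((8, 6, 6)) < 0.5).astype(np.uint8)   # toy input pulses
s_out = kernel_pixel_pulse_layer(s_in, kernel=np.full((3, 3), 0.2))
```

Stacking four such layers and flattening the last layer's output per time step yields the o_p^k and o_q^k sequences described in S301-S302.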
S4, designing a double-synapse pulse neural layer based on the leaky integrate-and-fire circuit principle to fuse the information of the two pulse sequence sets O_p and O_q, obtaining a fused pulse sequence set, as shown in figure 3;
Specifically, in S4, based on the leaky integrate-and-fire circuit principle, the double-synapse pulse neural layer designed to fuse the information of the two pulse sequence sets O_p and O_q comprises the following steps:
S401, designing the synaptic potential calculation model of the double-synapse pulse neuron to integrate the K-time-step pulse sequence set O_p of the robot visual image p and the pulse sequence set O_q of the navigation target image q, obtaining, for any kth time step, the synaptic potential a_p^k driven by the pulse sequence o_p^k of the robot visual image p and the synaptic potential a_q^k driven by the pulse sequence o_q^k of the navigation target image q. For any kth time step, the synaptic potential a_p^k with the pulse sequence o_p^k of the robot visual image p as input is calculated as:

a_p^k = α_p · a_p^{k−1} + W_p · o_p^k,

and the synaptic potential a_q^k with the pulse sequence o_q^k of the navigation target image q as input is calculated as:

a_q^k = α_q · a_q^{k−1} + W_q · o_q^k,

wherein α_p and α_q are attenuation coefficients and trainable parameters; W_p and W_q are synaptic weights, also trainable parameters; a_p^k and a_q^k respectively denote the input potentials of the two synapses at the kth time step;
S402, designing the body membrane potential calculation model of the double-synapse pulse neuron to calculate its body membrane potential v^k;
S403, comparing the body membrane potential v^k of the double-synapse pulse neuron with the threshold v_th: when v^k ≥ v_th, o_f^k = 1 and v^k is reset to v_reset; otherwise o_f^k = 0 and v^k remains unchanged, wherein v_th generally takes the value 1.0, v_reset is the reset voltage, generally taken as 0, and o_f^k denotes the pulse output of the double-synapse pulse neuron at the kth time step;
S404, calculating the fused kth-time-step pulse signal o_f^k to realize the fusion of the two kinds of pulse sequence information, and outputting the fused K-time-step pulse sequence set O_f = {o_f^k, k = 1, …, K}, where each o_f^k has dimension N, N being the number of double-synapse pulse neurons, here taken as 512.
Specifically, the body membrane potential v^k of the double-synapse pulse neuron in S402 is calculated as:

v^k = β · v^{k−1} + g_p · a_p^k + g_q · a_q^k,

wherein β is an attenuation coefficient and a trainable parameter; g_p and g_q denote the dendrite conductances, taking the values 0.25 and 0.25 respectively.
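The S401-S404 fusion can be sketched end to end as follows. The decay values and weight initialization are illustrative assumptions; the two-synapse structure, the 0.25/0.25 dendrite conductances, and N = 512 follow the text.

```python
import numpy as np

def dual_synapse_fuse(O_p, O_q, W_p, W_q, alpha_p=0.5, alpha_q=0.5,
                      beta=0.5, g_p=0.25, g_q=0.25, v_th=1.0, v_reset=0.0):
    """Fuse two K-time-step pulse sets with double-synapse pulse neurons.
    O_p, O_q: (K, D) binary pulse inputs; W_p, W_q: (N, D) synaptic weights.
    Decay values are assumed; conductances follow the 0.25/0.25 in the text."""
    K = O_p.shape[0]
    N = W_p.shape[0]
    a_p = np.zeros(N)                         # synaptic potential, visual image
    a_q = np.zeros(N)                         # synaptic potential, target image
    v = np.zeros(N)                           # body membrane potentials
    fused = np.zeros((K, N), dtype=np.uint8)
    for k in range(K):
        a_p = alpha_p * a_p + W_p @ O_p[k]    # a_p^k = α_p·a_p^{k-1} + W_p·o_p^k
        a_q = alpha_q * a_q + W_q @ O_q[k]    # a_q^k = α_q·a_q^{k-1} + W_q·o_q^k
        v = beta * v + g_p * a_p + g_q * a_q  # v^k = β·v^{k-1} + g_p·a_p^k + g_q·a_q^k
        fired = v >= v_th
        v = np.where(fired, v_reset, v)       # reset only fired neurons
        fused[k] = fired                      # fused pulse o_f^k
    return fused

rng = np.random.default_rng(1)
K, D, N = 8, 32, 512                          # N = 512 as in the patent
O_p = (rng.random((K, D)) < 0.3).astype(np.uint8)
O_q = (rng.random((K, D)) < 0.3).astype(np.uint8)
fused = dual_synapse_fuse(O_p, O_q, rng.normal(0, 0.3, (N, D)),
                          rng.normal(0, 0.3, (N, D)))
```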
S5, constructing a pulse reinforcement learning visual navigation decision model, designing a visual navigation reward function, and optimizing the model parameters in a visual navigation simulator to realize map-free visual navigation decisions for the mobile robot.
Specifically, in S5, constructing the pulse reinforcement learning visual navigation decision model, designing the visual navigation reward function, and optimizing the model parameters in the visual navigation simulator to realize map-free visual navigation decisions for the mobile robot comprises the following steps:
S501, the pulse reinforcement learning visual navigation model consists of a policy network and a value network. The policy network consists of the pulse feature extraction module, the double-synapse pulse neural layer and a fully connected layer, as shown in figure 2, where the input of the fully connected layer is the fused K-time-step pulse sequence set O_f and the output is the probability π(a) of each navigation action the robot should take at the next moment; the output dimension is therefore the navigation action space dimension, and the navigation action space A = {forward translation, backward translation, left translation, right translation, left rotation, right rotation, stop}, so the dimension of A here is 7, as shown in fig. 3. The value network shares the pulse feature extraction module and the double-synapse pulse neural layer with the policy network, and the output dimension of its subsequent fully connected layer is 1, giving the policy value V, as shown in fig. 2;
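The two network heads in S501 might be read out as follows. The patent fixes only the output dimensions (7 actions and 1 value); the rate-averaging readout of the fused pulse set and the softmax are assumptions in this sketch.

```python
import numpy as np

ACTIONS = ["forward", "backward", "translate_left", "translate_right",
           "rotate_left", "rotate_right", "stop"]   # |A| = 7, as in the text

def policy_value_heads(fused, Wa, ba, Wv, bv):
    """Readout sketch for S501: average the fused K-time-step pulse set over
    time, then apply two fully connected heads (rate readout is assumed)."""
    rate = fused.mean(axis=0)                 # (N,) firing rate per neuron
    logits = Wa @ rate + ba                   # policy head: 7 action logits
    z = np.exp(logits - logits.max())
    pi = z / z.sum()                          # action probabilities π(a)
    value = float(Wv @ rate + bv)             # value head: scalar V
    return pi, value

rng = np.random.default_rng(2)
fused = (rng.random((8, 512)) < 0.2).astype(np.uint8)  # toy fused pulses
pi, V = policy_value_heads(fused, rng.normal(0, 0.1, (7, 512)),
                           np.zeros(7), rng.normal(0, 0.1, 512), 0.0)
```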
S502, designing the pulse reinforcement learning visual navigation reward function r: a threshold d_th is set, and the distance d from the mobile robot to the navigation target is compared with d_th; if d ≤ d_th, a positive arrival reward is given; when the mobile robot collides with an object in the training scene, a negative collision penalty is given; in other cases, to make the mobile robot reach the navigation target position as soon as possible, a small negative step cost is given each time the mobile robot performs a navigation action;
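The reward structure of S502 can be sketched as below. The concrete numeric values (arrival reward, collision penalty, step cost, and the threshold d_th) were not legible in the source, so every number here is a placeholder assumption; only the three-case structure comes from the text.

```python
def navigation_reward(dist, collided, d_th=0.5,
                      r_goal=10.0, r_collision=-1.0, r_step=-0.01):
    """Reward shaping sketch for S502. Only the structure (arrival bonus,
    collision penalty, per-step cost) comes from the text; every numeric
    value here, including d_th, is a placeholder assumption."""
    if dist <= d_th:       # robot is within the arrival threshold
        return r_goal
    if collided:           # robot hit an object in the training scene
        return r_collision
    return r_step          # small per-action cost: reach the goal quickly
```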
S503, optimizing the model parameters in the visual navigation simulator AI2-THOR: the navigation decision model parameters are updated with stochastic gradient descent (SGD) based on the loss function of the traditional reinforcement learning model A3C (Asynchronous Advantage Actor-Critic), with the learning rate set to 1e-4, until the reward function curve converges, completing the online training process and obtaining the optimized policy network and value network;
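A schematic single-step version of the A3C loss that drives the SGD update in S503 is shown below. The value and entropy coefficients are common defaults, assumed here; the patent only states that the standard A3C loss is used.

```python
import numpy as np

def a3c_loss(pi, value, action, ret, value_coef=0.5, entropy_coef=0.01):
    """Schematic A3C loss for one transition. pi: action probabilities from
    the policy head; value: V(s) from the value head; ret: bootstrapped
    return. Coefficients are common defaults, assumed, not from the patent."""
    advantage = ret - value
    policy_loss = -np.log(pi[action]) * advantage   # actor (policy) term
    value_loss = value_coef * advantage ** 2        # critic (value) term
    entropy = -np.sum(pi * np.log(pi))              # exploration bonus
    return policy_loss + value_loss - entropy_coef * entropy

pi = np.array([0.1, 0.1, 0.4, 0.1, 0.1, 0.1, 0.1])  # 7 navigation actions
loss = a3c_loss(pi, value=0.2, action=2, ret=1.0)
```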
S504, deploying the optimized policy network on the mobile robot to realize map-free visual navigation decisions, so that at any moment of the navigation task the mobile robot predicts the next navigation action based on the acquired visual image and the navigation target image, as shown in fig. 3.
The visual navigation model operates on binary pulse sequences instead of floating-point features, avoiding the large number of high-precision floating-point multiplications in traditional reinforcement-learning visual navigation models, so that energy consumption during navigation is extremely low and the model is easy to deploy in the embedded system of a mobile robot; at the same time, the application of pulse neural networks to map-free visual navigation of mobile robots is well addressed.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them.
The foregoing is a detailed description of the application. Specific examples have been used herein to explain the principles and embodiments of the application, and the above description of the embodiments is merely intended to facilitate understanding of the method of the application and its core concepts. Meanwhile, those skilled in the art will make variations in specific embodiments and application scope in accordance with the ideas of the application; in view of the above, the contents of this description should not be construed as limiting the application.

Claims (7)

1. The mobile robot visual navigation decision-making method based on pulse reinforcement learning is characterized by comprising the following steps:
S1, acquiring a robot visual image p and a navigation target image q, and extracting local features of each image pixel point;
S2, processing each element of the local features through leaky integrate-and-fire (LIF) neurons to generate a pulse sequence of K time steps;
S3, designing a pulse feature extraction module to extract the pulse sequences O_p^k and O_q^k of the visual image p and the navigation target image q at the k-th time step, thereby obtaining the K-time-step visual image pulse sequence set {O_p^k}_{k=1}^K and the navigation target image pulse sequence set {O_q^k}_{k=1}^K;
S4, designing a double-synaptic pulse neural layer based on the circuit leaky-integrate-and-fire working principle to fuse the two pulse sequence sets {O_p^k}_{k=1}^K and {O_q^k}_{k=1}^K, obtaining a fused pulse sequence set;
S5, constructing a pulse reinforcement learning visual navigation decision model, designing a visual navigation reward function, and optimizing the model parameters in a visual navigation simulator, so as to realize mapless visual navigation decisions for the mobile robot.
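The LIF encoding of step S2 can be sketched as follows — a minimal NumPy illustration of turning a local-feature map into a K-step binary pulse sequence, one LIF neuron per feature element. The decay, threshold, and reset values here are hypothetical placeholders, not values fixed by the claims:

```python
import numpy as np

def lif_encode(features, K=8, decay=0.5, v_th=1.0, v_reset=0.0):
    """Encode a feature map into a K-step binary spike train with
    leaky integrate-and-fire (LIF) neurons, one neuron per element."""
    v = np.zeros_like(features, dtype=np.float64)
    spikes = []
    for _ in range(K):
        v = decay * v + features              # leak, then integrate the input current
        fired = v >= v_th                     # fire where the membrane crosses threshold
        v = np.where(fired, v_reset, v)       # reset only the neurons that fired
        spikes.append(fired.astype(np.uint8))
    return np.stack(spikes)                   # shape: (K, *features.shape)
```

A strong feature element drives its neuron above threshold every step, while a weak one never fires — the spike rate over the K steps encodes feature magnitude.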
2. The mobile robot visual navigation decision-making method based on pulse reinforcement learning according to claim 1, wherein the local features of each image pixel point in S1 are extracted by two-dimensional image convolution.
3. The mobile robot visual navigation decision-making method based on pulse reinforcement learning according to claim 1, wherein extracting the pulse sequences O_p^k and O_q^k of the visual image p and the navigation target image q at the k-th time step in S3, thereby obtaining the K-time-step visual image pulse sequence set {O_p^k}_{k=1}^K and the navigation target image pulse sequence set {O_q^k}_{k=1}^K, comprises:
S301: constructing a kernel-pixel-pulse neural layer calculation model, stacking 4 kernel-pixel-pulse neural layers, and processing the K-time-step pulse sequence from S2 to obtain a 3-dimensional pulse sequence tensor, wherein for any k-th time step the pulse signals output by all kernel-pixel-pulse neurons of the 4th layer are denoted S^{4,k};
S302: flattening the pulse signals S^{4,k} output by all kernel-pixel-pulse neurons of the 4th layer at the k-th time step into a one-dimensional pulse sequence tensor, which is the final k-th time-step pulse sequence of the input image; the k-th time-step pulse sequence of the robot visual image p is denoted O_p^k, with pulse sequence set {O_p^k}_{k=1}^K, and the k-th time-step pulse sequence of the navigation target image q is denoted O_q^k, with pulse sequence set {O_q^k}_{k=1}^K.
4. The mobile robot visual navigation decision-making method based on pulse reinforcement learning according to claim 3, wherein constructing the kernel-pixel-pulse neural layer calculation model in S301 comprises: for any position (i, j) on the image, at the k-th time step and the l-th layer, the kernel-pixel-pulse neural layer is calculated as:

lambda * d u_{i,j}^{l}(t) / dt = -u_{i,j}^{l}(t) + I_{i,j}^{l}(t), with I_{i,j}^{l}(t) = sum_{m,n} W_{m,n} * s_{i+m,j+n}^{l-1,k},

wherein lambda is an attenuation coefficient and a trainable parameter of the module; d denotes the derivative; u_{i,j}^{l,k} is the body-wall membrane potential of the kernel-pixel-pulse neuron located at (i, j) in the l-th layer at the k-th time step; W is the kernel matrix, a trainable parameter of the module whose rank size is given by the kernel dimensions, with m and n denoting the index numbers; I^{l} is the input potential of the l-th layer, whose definition depends on the output of the (l-1)-th layer; and s_{i,j}^{l,k} denotes the pulse signal at (i, j) in the l-th layer at the k-th time step. When u_{i,j}^{l,k} >= u_th, s_{i,j}^{l,k} = 1 and u_{i,j}^{l,k} is reset to u_reset; otherwise s_{i,j}^{l,k} = 0 and the membrane potential remains unchanged, where u_th is a set threshold, typically 1.0, and u_reset is a reset voltage, typically 0. On this basis, the pulse signal s_{i,j}^{l,k} at position (i, j) in the l-th layer at the k-th time step is obtained.
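A single time step of such a kernel-pixel-pulse layer can be sketched as below — a discretized illustration in which each pixel's neuron integrates a kernel-weighted sum of the previous layer's spikes, then fires and resets on crossing the threshold. The decay constant and the explicit double loop (rather than an optimized convolution) are illustrative choices, not part of the claim:

```python
import numpy as np

def kernel_pixel_pulse_step(v, prev_spikes, W, decay=0.8, v_th=1.0, v_reset=0.0):
    """One time step of a kernel-pixel-pulse layer: the neuron at pixel (i, j)
    integrates the kernel-weighted spikes of the previous layer around (i, j)."""
    H, Wd = prev_spikes.shape
    kh, kw = W.shape                           # odd kernel assumed
    pad = np.pad(prev_spikes, ((kh // 2,), (kw // 2,)), mode="constant")
    # input potential I: 2-D correlation of the kernel with incoming spikes
    I = np.zeros_like(v)
    for i in range(H):
        for j in range(Wd):
            I[i, j] = np.sum(W * pad[i:i + kh, j:j + kw])
    v = decay * v + I                          # leaky integration of the membrane
    spikes = (v >= v_th).astype(np.uint8)      # fire at or above threshold
    v = np.where(spikes == 1, v_reset, v)      # reset only neurons that fired
    return v, spikes
```

Stacking four such layers and flattening the last layer's spikes per time step yields the pulse sequences O_p^k and O_q^k of claim 3.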
5. The mobile robot visual navigation decision-making method based on pulse reinforcement learning according to claim 1, wherein designing a double-synaptic pulse neural layer in S4, based on the circuit leaky-integrate-and-fire working principle, to fuse the two pulse sequence sets {O_p^k}_{k=1}^K and {O_q^k}_{k=1}^K comprises the following steps:
S401, designing a synaptic potential calculation model of the double-synaptic pulse neurons, taking the pulse sequence set {O_p^k}_{k=1}^K of the K-time-step robot visual image p and the pulse sequence set {O_q^k}_{k=1}^K of the navigation target image q as inputs, to obtain, for any k-th time step, the synaptic potential I_p^k driven by the pulse sequence O_p^k of the robot visual image p and the synaptic potential I_q^k driven by the pulse sequence O_q^k of the navigation target image q;
S402, designing a double-synaptic pulse neuron body-wall membrane potential calculation model to compute the double-synaptic pulse neuron body-wall membrane potential u^k;
S403, comparing the double-synaptic pulse neuron body-wall membrane potential u^k with the threshold u_th: when u^k >= u_th, s^k = 1 and u^k is reset to u_reset; otherwise s^k = 0 and u^k remains unchanged, where u_th typically takes the value 1.0, u_reset is the reset voltage and typically takes the value 0, and s^k denotes the pulse output of the double-synaptic pulse neurons at the k-th time step;
S404, collecting the fused k-th time-step pulse signals s^k to fuse the information of the two pulse sequences and output the fused K-time-step pulse sequence set {s^k}_{k=1}^K, wherein the number of double-synaptic pulse neurons is N, here taking the value 512.
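Steps S401–S404 can be sketched as follows — an illustrative NumPy fusion in which each double-synaptic neuron receives one synaptic potential per input branch and integrates both through the claim-6 membrane update. The weight matrices, decay constant, and small neuron count are hypothetical placeholders (the claim fixes only the conductances 0.25 and the neuron count 512):

```python
import numpy as np

def dual_synapse_fuse(O_p, O_q, W_p, W_q, decay=0.6, g_p=0.25, g_q=0.25,
                      v_th=1.0, v_reset=0.0):
    """Fuse two K-step spike-train sets with double-synaptic LIF neurons.
    O_p, O_q: (K, D) binary spike trains; W_p, W_q: (D, N) synaptic weights."""
    K = O_p.shape[0]
    N = W_p.shape[1]
    v = np.zeros(N)
    fused = []
    for k in range(K):
        I_p = O_p[k] @ W_p                     # S401: synaptic potential, visual branch
        I_q = O_q[k] @ W_q                     # S401: synaptic potential, target branch
        v = decay * v + g_p * I_p + g_q * I_q  # S402: membrane update (claim 6 form)
        fired = v >= v_th                      # S403: threshold comparison
        v = np.where(fired, v_reset, v)        # S403: reset fired neurons
        fused.append(fired.astype(np.uint8))   # S404: collect fused pulses
    return np.stack(fused)                     # (K, N) fused pulse sequence set
```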
6. The mobile robot visual navigation decision-making method based on pulse reinforcement learning according to claim 5, wherein the double-synaptic pulse neuron body-wall membrane potential u^k in S402 is calculated as:

u^k = lambda * u^{k-1} + g_p * I_p^k + g_q * I_q^k,

wherein lambda is an attenuation coefficient and a trainable parameter, and g_p and g_q denote the dendrite conductances, taking the values 0.25 and 0.25, respectively.
7. The mobile robot visual navigation decision-making method based on pulse reinforcement learning according to claim 1, wherein S5, constructing a pulse reinforcement learning visual navigation decision model, designing a visual navigation reward function, and optimizing the model parameters in a visual navigation simulator to realize mapless visual navigation decisions for the mobile robot, specifically comprises the following steps:
S501, the pulse reinforcement learning visual navigation model consists of a policy network and a value network; the policy network consists of the pulse feature extraction module, the double-synaptic pulse neural layer, and a fully connected layer, where the input of the fully connected layer is the fused K-time-step pulse sequence set {s^k}_{k=1}^K and the output is the probability of each navigation action the robot should take at the next moment, so the output dimension equals the dimension of the navigation action space; the navigation action space is A = {forward translation, backward translation, leftward translation, rightward translation, leftward rotation, rightward rotation, stop}, hence the dimension of A is 7; the value network shares the pulse feature extraction module and the double-synaptic pulse neural layer with the policy network, and the output dimension of its subsequent fully connected layer is 1, yielding the policy value V;
S502, designing the pulse reinforcement learning visual navigation reward function r: setting a distance threshold d_th and judging the magnitude relation between the distance d from the mobile robot to the navigation target and the threshold d_th; if d <= d_th, then r takes an arrival reward r_goal; when the mobile robot collides with an object in the training scene, r takes a collision penalty r_collision; in other cases, in order to make the mobile robot reach the navigation target position as soon as possible, a small step penalty r_step is applied each time the mobile robot performs a navigation action;
S503, optimizing the model parameters in the visual navigation simulator AI2-THOR: updating the navigation decision model parameters by stochastic gradient descent (SGD) based on the loss function of the classical Asynchronous Advantage Actor-Critic (A3C) reinforcement learning model, with the learning rate set to 1e-4, until the reward function curve converges, thereby completing the online training process and obtaining the optimized policy network and value network;
S504, deploying the optimized policy network on the mobile robot to realize mapless visual navigation decisions, so that at any moment of a navigation task the mobile robot predicts the navigation action at the next moment based on the acquired visual image and the navigation target image.
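The three-case reward of S502 can be sketched as below. The numeric values r_goal = 10.0, r_collision = -1.0, and r_step = -0.01 are hypothetical placeholders — the claim defines the structure of the reward but the original magnitudes are not legible here:

```python
def navigation_reward(distance_to_goal, collided, d_th=0.5,
                      r_goal=10.0, r_collision=-1.0, r_step=-0.01):
    """Sketch of the S502 visual navigation reward: arrival bonus within
    d_th of the goal, penalty on collision, small per-step cost otherwise."""
    if distance_to_goal <= d_th:
        return r_goal          # reached the navigation target
    if collided:
        return r_collision     # hit an object in the training scene
    return r_step              # step penalty encourages short paths
```

The per-step penalty is what pushes the A3C-trained policy toward the shortest route, since lingering accumulates negative reward.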
CN202311263699.6A 2023-09-27 2023-09-27 Mobile robot visual navigation decision-making method based on pulse reinforcement learning Active CN116989800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311263699.6A CN116989800B (en) 2023-09-27 2023-09-27 Mobile robot visual navigation decision-making method based on pulse reinforcement learning


Publications (2)

Publication Number Publication Date
CN116989800A true CN116989800A (en) 2023-11-03
CN116989800B CN116989800B (en) 2023-12-15

Family

ID=88534260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311263699.6A Active CN116989800B (en) 2023-09-27 2023-09-27 Mobile robot visual navigation decision-making method based on pulse reinforcement learning

Country Status (1)

Country Link
CN (1) CN116989800B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990133B1 (en) * 2012-12-20 2015-03-24 Brain Corporation Apparatus and methods for state-dependent learning in spiking neuron networks
CN106845541A (en) * 2017-01-17 2017-06-13 杭州电子科技大学 A kind of image-recognizing method based on biological vision and precision pulse driving neutral net
CN106951923A (en) * 2017-03-21 2017-07-14 西北工业大学 A kind of robot three-dimensional shape recognition process based on multi-camera Vision Fusion
CN113375676A (en) * 2021-05-26 2021-09-10 南京航空航天大学 Detector landing point positioning method based on impulse neural network
CN113688980A (en) * 2020-05-19 2021-11-23 深圳忆海原识科技有限公司 Brain-like visual neural network with forward learning and meta learning functions
CN114594768A (en) * 2022-03-03 2022-06-07 安徽大学 Mobile robot navigation decision-making method based on visual feature map reconstruction
CN115147456A (en) * 2022-06-29 2022-10-04 华东师范大学 Target tracking method based on time sequence adaptive convolution and attention mechanism
WO2022253229A1 (en) * 2021-06-04 2022-12-08 北京灵汐科技有限公司 Synaptic weight training method, target recognition method, electronic device, and medium
CN115631343A (en) * 2022-09-22 2023-01-20 广东人工智能与先进计算研究院 Image generation method, device and equipment based on full pulse network and storage medium
CN115880324A (en) * 2021-09-28 2023-03-31 南京理工大学 Battlefield target image threshold segmentation method based on pulse convolution neural network
WO2023083121A1 (en) * 2021-11-09 2023-05-19 华为技术有限公司 Denoising method and related device
CN116295415A (en) * 2023-03-02 2023-06-23 之江实验室 Map-free maze navigation method and system based on pulse neural network reinforcement learning
CN116394264A (en) * 2023-06-07 2023-07-07 安徽大学 Group coding impulse neural network-based multi-mechanical arm cooperative motion planning method


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Katerina Maria Oikonomou; Ioannis Kansizoglou; Antonios Gasteratos: "A Hybrid Reinforcement Learning Approach With a Spiking Actor Network for Efficient Robotic Arm Target Reaching", IEEE Robotics and Automation Letters *
Ding Jianchuan: "Research on Spiking-Neural-Network-Based Obstacle-Avoidance Navigation Methods for Robots", China Masters' Theses Full-text Database *
Yu Naigong; Li Ti; Fang Lue: "A Goal-Oriented Bionic Navigation Model Based on Direct Reinforcement Learning", Scientia Sinica Informationis, no. 03 *
Zhou Yun: "A Simple Stochastic Neural Network for Improving Adversarial Robustness", 2023 IEEE International Conference on Multimedia and Expo (ICME) *
Zhang Mengwei: "Research on Online Supervised Learning Algorithms for Spiking Neural Networks Based on Local-Variable STDP", China Masters' Theses Full-text Database *
Luo Chao: "Research on Semantic Map Construction Methods for Cloud Mobile Robots", China Masters' Theses Full-text Database *

Also Published As

Publication number Publication date
CN116989800B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN109711529B (en) Cross-domain federated learning model and method based on value iterative network
Karkus et al. Qmdp-net: Deep learning for planning under partial observability
Cao et al. Target search control of AUV in underwater environment with deep reinforcement learning
CN112119409A (en) Neural network with relational memory
Zheng et al. Parameter identification of nonlinear dynamic systems using an improved particle swarm optimization
CN112819253A (en) Unmanned aerial vehicle obstacle avoidance and path planning device and method
Xie et al. Learning with stochastic guidance for robot navigation
CN115860107B (en) Multi-machine searching method and system based on multi-agent deep reinforcement learning
CN116700327A (en) Unmanned aerial vehicle track planning method based on continuous action dominant function learning
CN114355915B (en) AGV path planning based on deep reinforcement learning
Yang et al. Research and Application of Visual Object Recognition System Based on Deep Learning and Neural Morphological Computation
Othman et al. Deep reinforcement learning for path planning by cooperative robots: Existing approaches and challenges
Prasetyo et al. Spatial Based Deep Learning Autonomous Wheel Robot Using CNN
Liu et al. A hierarchical reinforcement learning algorithm based on attention mechanism for uav autonomous navigation
CN113232016A (en) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
Ejaz et al. Autonomous visual navigation using deep reinforcement learning: An overview
CN116989800B (en) Mobile robot visual navigation decision-making method based on pulse reinforcement learning
Bogdan et al. Toward enabling automated cognition and decision-making in complex cyber-physical systems
Wang et al. Behavioral decision-making of mobile robot in unknown environment with the cognitive transfer
Zhang et al. Mobile robot localization based on gradient propagation particle filter network
CN111221340B (en) Design method of migratable visual navigation based on coarse-grained features
Reinhart Reservoir computing with output feedback
Qin et al. A path planning algorithm based on deep reinforcement learning for mobile robots in unknown environment
Liu et al. Hierarchical Reinforcement Learning Integrating With Human Knowledge for Practical Robot Skill Learning in Complex Multi-Stage Manipulation
Ma et al. Collaborative planning algorithm for incomplete navigation graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant