CN110298440B - Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation - Google Patents

Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation

Info

Publication number
CN110298440B
CN110298440B
Authority
CN
China
Prior art keywords
cortical, pillar, neuron, cells, robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910268918.7A
Other languages
Chinese (zh)
Other versions
CN110298440A (en)
Inventor
阮晓钢
武悦
黄静
柴洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910268918.7A priority Critical patent/CN110298440B/en
Publication of CN110298440A publication Critical patent/CN110298440A/en
Application granted granted Critical
Publication of CN110298440B publication Critical patent/CN110298440B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation, belonging to the field of biomimetic navigation. The method is built on a cortical pillar network and uses a wave-front propagation algorithm for path planning and navigation. Through the STDP learning rule applied to the network weights, the system can compute the shortest path to the target and memorize changes in the environment. The wave-front propagation uses integrate-and-fire neurons and is a non-attenuating navigation algorithm, so it is suitable for navigation in environments of various scales. In navigation experiments, the method navigates successfully in environments of various scales that change in real time and shows good practical performance.

Description

Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation
Technical Field
The invention belongs to the field of robot biomimetic navigation, specifically to the branch of target-oriented navigation for robots that has developed in recent years. It relates to a neural-network organization method, and in particular to a cortical pillar network organization method and a method for propagating neuronal activity.
Background
The intellectualization of robots is a major future trend. Environmental cognition and target-oriented navigation are important parts of robot intelligence and are current research hot spots and difficulties. Inspired by the flexibility with which animals perform spatial tasks, current research attempts to understand environmental cognition from a physiological perspective and to apply it to robots, so that robots acquire human- or animal-like cognitive abilities and can be used in challenging spatial tasks such as autonomous driving. Since cells that encode spatial or motion information, such as place cells and grid cells, were discovered physiologically, a considerable number of systems have applied these cells to biomimetic navigation. RatSLAM feeds visual information into a hippocampal neural network, and the map it builds can be used for target-oriented navigation; the environmental maps it has built on real urban roads are satisfactory. Using realistic neuronal organization and parameters as far as possible makes a navigation system more biomimetic, and implementing it with basic computing units makes it easier to realize, which reduces equipment requirements and improves the utilization of computing resources. Martinet et al. constructed a navigation system with a cortical pillar network, but the system can only operate in a single-scale environment. Ponulak et al. proposed a wavefront transmission method between neurons, which addresses the scale problem of navigation. How to organize the artificial neural network structure and how to select the synaptic plasticity rule are the keys to obtaining good performance in a biomimetic navigation system.
Different from the prior art, the invention constructs a cortical pillar network and its connection relations, and applies the STDP (Spike-Timing-Dependent Plasticity) learning rule to the synaptic connections, so that the navigation system can adapt to environments of different scales and better simulates rodent navigation.
Disclosure of Invention
The invention aims to provide an application of a cortical pillar network to biomimetic navigation, comprising constructing a cognitive map and navigating toward a target. The basic unit of the cortical pillar network is the cortical pillar, which includes place cells, reward cells, interneurons, action cells, and readout cells.
The technical scheme adopted by the invention is a multi-scale target-oriented navigation method based on cortical pillar network wavefront propagation, which is characterized by comprising the following concrete implementation steps:
Step (1): constructing the cortical pillar network.
Step (1.1) dynamics of neurons;
the position cells in the cortical pillar unit adopt a Gaussian model Vs
Figure BDA0002017738940000021
Wherein x represents the current position of the robot, xcIs the location of the cell center, σ2Is the variance.
The other cells in the cortical pillar unit follow leaky-integrator dynamics:

τ · dV_j/dt = −V_j + I_j

where V_j is the membrane potential, τ is the membrane time constant, and I_j is the integrated input received by neuron j:

I_j = Σ_i w_ij · V_i

where w_ij is the synaptic connection weight from neuron i to neuron j.
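As a concrete illustration of step (1.1), the sketch below implements the Gaussian place-cell response and one Euler step of the leaky integration for the other cells. It is a minimal sketch under the equations above; the function and parameter names (place_cell_activity, sigma2, tau, dt) are illustrative choices, not identifiers from the invention.

```python
import numpy as np

def place_cell_activity(x, x_c, sigma2=0.5):
    """Gaussian place-cell response V_s(x) = exp(-||x - x_c||^2 / (2*sigma^2))."""
    d2 = float(np.sum((np.asarray(x, dtype=float) - np.asarray(x_c, dtype=float)) ** 2))
    return float(np.exp(-d2 / (2.0 * sigma2)))

def synaptic_input(w_col, V_pre):
    """Integrated input I_j = sum_i w_ij * V_i received by one postsynaptic neuron j."""
    return float(np.dot(w_col, V_pre))

def integrate_membrane(V, I, tau=5.0, dt=0.02):
    """One Euler step of the leaky integrator tau * dV/dt = -V + I."""
    return V + (dt / tau) * (-V + I)
```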
Step (1.2) activation and inhibition of neurons;
when the activity of the rewarding cell exceeds a threshold, the cell sends out an action potential and enters a state of inhibition for a period of time. During the suppression phase, the reward cells no longer receive input from other reward cells:
Figure BDA0002017738940000024
when t isf<t<tf+tdWhen the above formula holds, VrIs the membrane potential of the rewarding cell, tdIs the length of neuronal inhibition, tfIs the moment when the reward cells discharge.
Step (1.3) synaptic learning between cortical pillar units;
The reward cells r in the cortical pillar units obey the STDP learning law:

Δw_ij = (1 − λ)·(w_sat − w_ji) − λ·w_ji

where w_sat is the saturated synaptic connection weight and λ ∈ {0, 1} indicates whether the robot successfully reaches the next position (λ = 0 on success).
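The reward-cell rule can be written as a one-line update. The sketch below assumes w_sat = 1 (the value given in the detailed description) and encodes success as λ = 0 and failure as λ = 1; the function name is illustrative.

```python
def update_reward_weight(w_ji, success, w_sat=1.0):
    """Apply dw = (1 - lam) * (w_sat - w_ji) - lam * w_ji, with lam = 0 on success."""
    lam = 0.0 if success else 1.0
    dw = (1.0 - lam) * (w_sat - w_ji) - lam * w_ji
    return w_ji + dw
```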
Synaptic learning between the interneurons q of the cortical pillar units follows the STDP learning law:

Δw_ij = M · exp(−Δt / τ_ij),  Δt ≥ 0
Δw_ij = −M · exp(Δt / τ_ji),  Δt < 0

where M is the amplitude and

Δt = t_j − t_i

is the difference between the firing time of the postsynaptic neuron j and the firing time of the presynaptic neuron i. This time-difference-based synaptic learning between the interneurons q ensures that the robot can record the shortest path to the target position.
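A sketch of the interneuron STDP window, assuming a standard asymmetric double-exponential form with the amplitude M and the time constants τ_ij, τ_ji given in the detailed description; the exact sign convention of the patented rule may differ, so this is an illustrative reading rather than the definitive formula.

```python
import math

def stdp_delta_w(t_post, t_pre, M=1.0, tau_ij=0.04, tau_ji=0.06):
    """Weight change as a function of the spike-time difference dt = t_post - t_pre.

    Assumed convention: potentiation for dt >= 0, depression for dt < 0.
    """
    dt = t_post - t_pre
    if dt >= 0:
        return M * math.exp(-dt / tau_ij)
    return -M * math.exp(dt / tau_ji)
```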
Step (1.4) action decision;
Each cortical pillar contains a set of action neurons d representing different directions, which receive inputs from the neurons s, p and q. When a neuron d reaches the threshold, the robot moves according to the weight information stored between the interneurons q, and the head orientation is the weighted average of the inherent head orientation θ_d of each neuron d and the corresponding weight w_d:

θ = ( Σ_d w_d · θ_d ) / ( Σ_d w_d )
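A sketch of the head-orientation readout as a weighted average of the action neurons' inherent directions. The 0°, ±90°, 180° direction set follows the detailed description; implementing the average as a weighted vector (circular) mean is a choice made here to avoid wrap-around artifacts at ±180°, not something stated in the patent.

```python
import math

def head_orientation(weights, directions_deg=(0.0, 90.0, -90.0, 180.0)):
    """Weighted average of the action neurons' inherent head directions (degrees)."""
    sx = sy = 0.0
    for w, d in zip(weights, directions_deg):
        sx += w * math.cos(math.radians(d))
        sy += w * math.sin(math.radians(d))
    return math.degrees(math.atan2(sy, sx))
```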
step (2) exploring the environment;
step (2.1) establishing a cortical column unit;
the robot was placed in a 5m x 13m maze and allowed to perform a random exploration strategy. The initial orientation of the robot is 0. The head orientation changes by an angle theta (-15 deg. < theta < 15 deg.) every time the robot walks 0.1 m. To ensure that every position in the environment is effectively represented by a position cell in the cortical bar unit, every time V is representeds<VthrAnd when the current position is reached, a cortical pillar unit is added. Vthr-25mV is the threshold to determine whether a new cortical pillar unit is added. During migration, the weights between the reward cells change according to the LTP learning law (λ ═ 0).
Step (2.2) path planning;
the path planning relies on the neural computation function of the cortical pillar network to compute the overall neuronal activity at each discrete time according to the dynamical model. The cortical pillar network operates in a serial mode, i.e., the input of the neuron at the current time is the output of the neuron at the previous time multiplied by the weight. After the corresponding cortical pillar unit is established in the maze environment, the reward value is set at the target point, so that the reward cells in the cortical pillar unit at the target point receive a short activation input.
Existing biomimetic navigation methods can only be used on small-scale maps. Compared with the prior art, the present method can be applied to environments of any scale, which expands the application scenarios of robot navigation, and the adopted mechanism is closer to physiological reality.
Drawings
Fig. 1 is a diagram of the cortical pillar network.
FIG. 2 is a diagram of the interneuron synapse STDP learning rule.
Fig. 3 is a path randomly explored by the robot.
Fig. 4 is a simulation of the Tolman experiment using this navigation method.
FIG. 5 is a comparison of the navigation success rates of the present method and the gradient method in a large-scale environment.
Fig. 6 is a flow chart of the method implementation.
Detailed Description
A robot with learning ability simulates the rats in the Tolman maze experiment. The Tolman maze experiment, briefly: the Tolman maze, shown in FIG. 2, has three paths of different lengths from the starting point to the end point. Food is placed at the end point and a rat is placed in the maze; the rat explores the maze and finds the food, and after multiple trials it finally selects the middle path, which is the shortest. If an obstacle is placed at point A, after several further explorations the rat finally selects the left path, which is then the shortest. If the obstacle is placed at point B, the rat selects the right path. The Tolman maze experiment introduced the concept that cognitive maps exist in the rat brain. The invention reproduces the Tolman maze experiment by constructing a cortical pillar network modeled on the cerebral cortex. The robot has a pair of motors and encoders for locomotion and path integration, and a lidar for detecting the distance to obstacles.
The method comprises the following concrete implementation steps:
Step (1): constructing the cortical pillar network.
Step (1.1) dynamics of neurons
The place cells in the cortical pillar unit are modeled with a Gaussian:

V_s(x) = exp( −‖x − x_c‖² / (2σ²) )

where x is the current position of the robot, x_c is the position of the cell center, and σ² = 0.5 is the variance.
The other cells in the cortical pillar unit follow leaky-integrator dynamics:

τ · dV_j/dt = −V_j + I_j

where V_j is the membrane potential, τ = 5 is the membrane time constant, and I_j is the integrated input received by neuron j:

I_j = Σ_i w_ij · V_i

where w_ij is the synaptic connection weight from neuron i to neuron j.
Step (1.2) activation and inhibition of neurons
When the activity of a reward cell exceeds the threshold, the cell fires an action potential and enters an inhibited state for a period of time. During the inhibition phase the reward cell no longer receives input from other reward cells:

I_r(t) = 0,   t_f < t < t_f + t_d

where t_d = 0.2 s is the duration of neuronal inhibition and t_f is the moment at which the reward cell fires.
Step (1.3) synaptic learning between cortical pillar units
The reward cells r in the cortical pillar units obey the STDP learning law:

Δw_ij = (1 − λ)·(w_sat − w_ji) − λ·w_ji

where w_sat = 1 is the saturated synaptic connection weight and λ ∈ {0, 1} indicates whether the robot successfully reaches the next position (λ = 0 on success).
Synaptic learning between the interneurons q of the cortical pillar units follows the STDP learning law:

Δw_ij = M · exp(−Δt / τ_ij),  Δt ≥ 0
Δw_ij = −M · exp(Δt / τ_ji),  Δt < 0

where M = 1 is the amplitude, τ_ji = 3T = 0.06 s, τ_ij = 2T = 0.04 s, T is the discrete time interval, and

Δt = t_j − t_i

is the difference between the firing time of the postsynaptic neuron j and the firing time of the presynaptic neuron i. This time-difference-based synaptic learning between the interneurons q ensures that the robot can record the shortest path to the target position.
Step (1.4) action decision
Each cortical pillar contains a set of action neurons d representing different directions, which receive inputs from the neurons s, p and q. When a neuron d reaches the threshold, the robot moves according to the weight information stored between the interneurons q, and the head orientation is the weighted average of the inherent head orientation θ_d of each neuron d and the corresponding weight w_d:

θ = ( Σ_d w_d · θ_d ) / ( Σ_d w_d )
step (2) exploring the environment
Step (2.1) of establishing cortical pillar units
The robot is placed in a 5 m × 13 m maze and executes a random exploration strategy. The initial orientation of the robot is 0°. Every time the robot walks 0.1 m, its head orientation changes by an angle θ (−15° < θ < 15°). To ensure that every position in the environment is effectively represented by a place cell in a cortical pillar unit, a new cortical pillar unit is added whenever V_s < V_thr at the current position, where V_thr = −25 mV is the threshold for deciding whether a new cortical pillar unit is added. During the movement, the weights between the reward cells change according to the LTP learning law (λ = 0).
Step (2.2) Path planning
Path planning relies on the neural computation of the cortical pillar network: at each discrete time step the overall neuronal activity is computed according to the dynamical model, with the discrete time interval taken as T = 0.02 s. The cortical pillar network operates in a serial mode, i.e. the input of a neuron at time n + 1 is the output of the neurons at time n multiplied by the weights. After the corresponding cortical pillar units have been established in the maze environment, a reward value is set at the target point, so that the reward cells in the cortical pillar unit at the target point receive an activation input with a duration of 0.02 s and an amplitude of 70 mV. The specific calculation steps are as follows:
S1: set t = 0; initialize the activity V of the neurons in all cortical pillar units C_i to −60 mV and all connections w_ij = 0 (i, j ∈ C).
S2: according to the dynamical model of step (1.1), calculate the activity V_r of the reward cell r in each cortical pillar unit.
S3: according to the dynamical model of step (1.1), calculate the activity V_q of the interneuron q in each cortical pillar unit.
S4: update the weight connections between the interneurons q with the Δw_ij given by the STDP rule of step (1.3):

w_ij = w_ij + Δw_ij,   0 ≤ w_ij ≤ 1
S5: set t = t + 0.02 s. If the termination condition holds for all reward cells (i.e. every reward cell has fired), the cortical pillar wavefront propagation is complete; proceed to S6. Otherwise, return to S2 and continue the calculation.
S6: calculate the movement direction from the weights between the interneurons q using the weighted average of step (1.4):

θ = ( Σ_d w_d · θ_d ) / ( Σ_d w_d )

There are 4 orientations in each cortical pillar unit: 0°, ±90° and 180°. The robot travels 0.1 m in the current movement direction. While advancing, if the lidar detects that the distance to an obstacle is smaller than 0.3 m, the robot stops moving, the weights of the reward cells change according to the LTD rule Δw_ij = −w_ji (λ = 1), and the procedure returns to S1 to redo the path planning calculation.
If the robot does not leave its current position for 5 consecutive discrete time steps, it is trapped at a local maximum of the reward value; in that case the robot randomly explores 11 m according to step (2.1) and then performs the path planning calculation again according to step (2.2).
If there is no obstacle, S5 is repeated until the robot moves to the target point.
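The sketch below strings steps S1–S6 together into a single, deliberately simplified loop: an activity wave spreads from the target unit through the graph of cortical pillar units, spike times are recorded, STDP-like weights are derived from the spike-time differences, and a movement direction is read out for every unit. The graph-based bookkeeping, the weight formula, and the vector-average readout are assumptions made for illustration; they approximate, but are not, the patented membrane-potential implementation.

```python
import math

def wavefront_plan(neighbors, positions, target, dt=0.02):
    """Simplified S1-S6: wavefront propagation from the target and direction readout.

    neighbors: dict unit -> list of adjacent units (the cortical pillar graph)
    positions: dict unit -> (x, y) metric position of the unit's place field
    target:    unit whose reward cell receives the activation pulse (S1/S2)
    """
    # S1: reset; only the target unit fires at t = 0
    t_fire = {u: None for u in neighbors}
    t_fire[target] = 0.0
    frontier, t = [target], 0.0

    # S2/S3/S5: advance the wave one discrete interval per iteration until it
    # has reached every reachable unit (each unit fires once, then stays inhibited)
    while frontier:
        t += dt
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if t_fire[v] is None:
                    t_fire[v] = t
                    nxt.append(v)
        frontier = nxt

    # S4: STDP-like interneuron weights - the connection from a later-firing unit
    # toward an earlier-firing neighbour is potentiated, so the weights point
    # along the shortest path back to the target
    w = {}
    for u in neighbors:
        for v in neighbors[u]:
            if t_fire[u] is None or t_fire[v] is None:
                w[(u, v)] = 0.0
            else:
                d = t_fire[u] - t_fire[v]          # > 0 when v fired earlier than u
                w[(u, v)] = math.exp(-d / 0.04) if d > 0 else 0.0

    # S6: readout - weighted vector average of the directions toward the neighbours
    heading = {}
    for u in neighbors:
        sx = sy = 0.0
        for v in neighbors[u]:
            dx = positions[v][0] - positions[u][0]
            dy = positions[v][1] - positions[u][1]
            norm = math.hypot(dx, dy) or 1.0
            sx += w[(u, v)] * dx / norm
            sy += w[(u, v)] * dy / norm
        heading[u] = math.degrees(math.atan2(sy, sx)) if (sx or sy) else None
    return heading
```

On a 4-connected grid of pillar units this returns, for every unit, a heading that points along the shortest path toward the target, which is the behaviour the wavefront procedure is meant to encode; a faithful implementation would instead integrate the membrane potentials of steps (1.1)–(1.4).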

Claims (2)

1. The multi-scale target-oriented navigation method based on cortical pillar network wavefront propagation is characterized by comprising the following specific implementation steps of:
step (1), constructing a cortical column network;
step (1.1) dynamics of neurons;
the position cells in the cortical pillar unit adopt a Gaussian model Vs
V_s(x) = exp( −‖x − x_c‖² / (2σ²) )

wherein x represents the current position of the robot, x_c is the position of the cell center, and σ² is the variance;
other cell dynamics in the cortical column unit are as follows:
τ · dV_j/dt = −V_j + I_j

wherein V_j is the membrane potential, τ is the membrane time constant, and I_j is the integrated input received by neuron j:

I_j = Σ_i w_ij · V_i

wherein w_ij is the synaptic connection weight between neuron i and neuron j;
step (1.2) activation and inhibition of neurons;
when the activity of the rewarding cells exceeds a threshold value, the cells send out action potentials and enter a suppression state for a period of time; during the suppression phase, the reward cells no longer receive input from other reward cells:
I_r(t) = 0

which holds when t_f < t < t_f + t_d, wherein V_r is the membrane potential of the reward cell, t_d is the duration of neuronal inhibition, and t_f is the moment at which the reward cell fires;
step (1.3) synapse learning among cortical pillar units;
reward cells r in cortical pillar units obey the STDP learning law:
Δw_ij = (1 − λ)·(w_sat − w_ji) − λ·w_ji

wherein w_sat is the saturated synaptic connection weight, and λ ∈ {0, 1} represents whether the robot successfully reaches the next position (λ = 0 on success);
synaptic learning among interneurons q in cortical pillar units follows the STDP learning law:
Δw_ij = M · exp(−Δt / τ_ij),  Δt ≥ 0
Δw_ij = −M · exp(Δt / τ_ji),  Δt < 0

wherein M is the amplitude and

Δt = t_j − t_i

is the difference between the firing time of the postsynaptic neuron and the firing time of the presynaptic neuron; this time-difference-based synaptic learning between the interneurons q ensures that the robot can record the shortest path to the target position;
step (1.4) action decision making;
each cortical pillar comprises a group of action neurons d, which respectively represent different directions and receive input from the neurons s, p and q; when the neuron d reaches a threshold value, the robot moves according to the information of the weight values stored between the neurons q, and the head orientation is the weighted average of the inherent head orientation of each neuron d and the corresponding weight value:
θ = ( Σ_d w_d · θ_d ) / ( Σ_d w_d )
step (2) exploring the environment;
step (2.1) establishing a cortical column unit;
placing the robot in a 5 m × 13 m maze and letting the robot execute a random exploration strategy; the initial orientation of the robot is 0°; every time the robot walks 0.1 m, the head orientation changes by an angle θ, where −15° < θ < 15°; to ensure that every position in the environment is effectively represented by a place cell in a cortical pillar unit, a new cortical pillar unit is added whenever V_s < V_thr at the current position; V_thr = −25 mV is the threshold for determining whether a new cortical pillar unit is added; during the movement, the weights between the reward cells change according to the LTP learning rule with λ = 0;
step (2.2) path planning;
the path planning relies on the neural computation function of the cortical pillar network to compute the overall neuronal activity at each discrete time according to the dynamical model.
2. The method for multi-scale target-oriented navigation based on cortical pillar network wavefront propagation according to claim 1, wherein in step (2.2), the cortical pillar network operates in a serial mode, that is, the input of the neuron at the current time is the output of the neuron at the previous time multiplied by the weight; after the corresponding cortical pillar unit is established in the maze environment, the reward value is set at the target point, so that the reward cells in the cortical pillar unit at the target point receive a short activation input.
CN201910268918.7A 2019-04-04 2019-04-04 Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation Active CN110298440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910268918.7A CN110298440B (en) 2019-04-04 2019-04-04 Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910268918.7A CN110298440B (en) 2019-04-04 2019-04-04 Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation

Publications (2)

Publication Number Publication Date
CN110298440A CN110298440A (en) 2019-10-01
CN110298440B true CN110298440B (en) 2021-07-23

Family

ID=68026473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910268918.7A Active CN110298440B (en) 2019-04-04 2019-04-04 Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation

Country Status (1)

Country Link
CN (1) CN110298440B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111811532B (en) * 2020-07-02 2022-03-25 浙江大学 Path planning method and device based on impulse neural network
CN113743586B (en) * 2021-09-07 2024-04-26 中国人民解放军空军工程大学 Operation body autonomous positioning method based on hippocampal space cognition mechanism
CN114186675B (en) * 2021-11-14 2024-05-31 北京工业大学 Improved hippocampal-forehead cortex network space cognition method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE50211866D1 (en) * 2002-09-13 2008-04-17 Brainlab Ag Method for planning the stimulation of hyper / hypometabolic cortex areas
JP6061678B2 (en) * 2009-11-04 2017-01-18 アリゾナ・ボード・オブ・リージェンツ・オン・ビハーフ・オブ・アリゾナ・ステイト・ユニバーシティーArizona Board of Regents on behalf of Arizona State University Brain control interface device
CN106125730B (en) * 2016-07-10 2019-04-30 北京工业大学 A kind of robot navigation's map constructing method based on mouse cerebral hippocampal spatial cell
CN107092959B (en) * 2017-04-07 2020-04-10 武汉大学 Pulse neural network model construction method based on STDP unsupervised learning algorithm

Also Published As

Publication number Publication date
CN110298440A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
CN110298440B (en) Multi-scale target-oriented navigation method based on cortical pillar network wave-front propagation
CN111578940B (en) Indoor monocular navigation method and system based on cross-sensor transfer learning
Zhang et al. Spike-based indirect training of a spiking neural network-controlled virtual insect
CN109886384B (en) Bionic navigation method based on mouse brain hippocampus grid cell reconstruction
Salt et al. Parameter optimization and learning in a spiking neural network for UAV obstacle avoidance targeting neuromorphic processors
Bing et al. Supervised learning in SNN via reward-modulated spike-timing-dependent plasticity for a target reaching vehicle
Juang et al. Evolving gaits of a hexapod robot by recurrent neural networks with symbiotic species-based particle swarm optimization
CN108319293A (en) A kind of UUV Realtime collision free planing methods based on LSTM networks
CN113031528B (en) Multi-legged robot non-structural ground motion control method based on depth certainty strategy gradient
CN116382267B (en) Robot dynamic obstacle avoidance method based on multi-mode pulse neural network
Viereck et al. Learning a centroidal motion planner for legged locomotion
CN114037050B (en) Robot degradation environment obstacle avoidance method based on internal plasticity of pulse neural network
Yan et al. Real-world learning control for autonomous exploration of a biomimetic robotic shark
Mazumder et al. Digital implementation of a virtual insect trained by spike-timing dependent plasticity
CN106094817A (en) Intensified learning humanoid robot gait's planning method based on big data mode
Yu et al. Obstacle avoidance method based on double DQN for agricultural robots
Chao et al. Brain inspired path planning algorithms for drones
CN111190364A (en) Bionic dolphin intelligent control method based on sensory feedback CPG model
Ammar et al. Learning to walk using a recurrent neural network with time delay
Azimirad et al. Optimizing the parameters of spiking neural networks for mobile robot implementation
CN114609925B (en) Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
Sarim et al. An artificial brain mechanism to develop a learning paradigm for robot navigation
Ganesh et al. Deep reinforcement learning for simulated autonomous driving
Chen et al. Fully spiking actor network with intralayer connections for reinforcement learning
Nakagawa et al. A neural network model of the entorhinal cortex and hippocampus for event-order memory processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant