CN115994563B - Brain-like situation learning model construction and training method for intelligent auxiliary driving - Google Patents

Brain-like situation learning model construction and training method for intelligent auxiliary driving

Info

Publication number
CN115994563B
CN115994563B (application CN202211372963.5A)
Authority
CN
China
Prior art keywords
synaptic
neural network
brain
neurons
neuron
Prior art date
Legal status
Active
Application number
CN202211372963.5A
Other languages
Chinese (zh)
Other versions
CN115994563A (en)
Inventor
杨双鸣
周羿霏
唐馨怡
于改英
杨嘉禾
邹凯雯
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202211372963.5A
Publication of CN115994563A
Application granted
Publication of CN115994563B
Legal status: Active


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a brain-like scenario learning model construction and training method for intelligent auxiliary driving, and relates to the technical field of intelligent auxiliary driving. The construction method comprises the following steps: constructing neurons of a pulse neural network by adopting an LIF neuron model; setting membrane voltages of the neurons in different states by adopting a state function equation; introducing a synaptic model into the impulse neural network by adopting an STDP learning rule based on the Hebbian rule; and constructing a brain-like scenario learning model based on the impulse neural network according to the controller unit, the synaptic beam and the synaptic model. The invention learns the reward association between stimulus and response in the scenario learning task through a reinforcement learning mechanism, thereby not only saving cost but also significantly reducing power consumption.

Description

Brain-like situation learning model construction and training method for intelligent auxiliary driving
Technical Field
The invention relates to the technical field of intelligent auxiliary driving, in particular to a brain-like situation learning model construction and training method for intelligent auxiliary driving.
Background
Intelligent driving refers to a technique in which a machine assists a person in driving and, in special cases, replaces the person in driving. As an important product of the industrial revolution and informatization, intelligent driving is an important component of the strategic emerging industries, an important branch of the current artificial intelligence era, and is likely to become the next-generation intelligent terminal. At present, unmanned driving is developing along two main paths, led respectively by ADAS and by artificial intelligence. Intelligent driving systems at levels L1-L3 are mainly dominated by ADAS (advanced driver assistance systems), whose core technology is the automatic control system. With the continuous improvement and development of ADAS functions and technologies, intelligent driving can be realized under a complete supporting service system, based on rich vehicle manufacturing experience.
Among them, the ADAS functions need to rely on neural networks for real-time operation and learning. Impulse (spiking) neural networks are closer to biological neurons than traditional artificial neural networks and, owing to their event-based, asynchronous processing, have higher implementation efficiency. However, because spiking activations are discrete, event-based and non-differentiable, traditional supervised learning methods cannot be directly transplanted onto impulse neural networks.
Therefore, there is an urgent need for a network model that can adapt a learning method to an impulse neural network.
Disclosure of Invention
The invention aims to provide a brain-like situation learning model construction and training method for intelligent auxiliary driving, which learns the reward association between stimulus and response in a scenario learning task through a reinforcement learning mechanism, thereby saving cost and remarkably reducing power consumption.
In order to achieve the above object, the present invention provides the following solutions:
in a first aspect, the present invention provides a method for constructing a brain-like situation learning model for intelligent driving assistance, including:
constructing neurons of a pulse neural network by adopting an LIF neuron model;
setting membrane voltages of the neurons in different states by adopting a state function equation;
introducing a synaptic model into the impulse neural network by adopting an STDP learning rule based on the Hebbian rule;
constructing a pulse neural network based on the controller unit, the synaptic beam and the synaptic model.
Further, the LIF neuron model is specifically:
τ_m · du_m(t)/dt = −(u_m(t) − u_reset) + I(t)
wherein τ_m is the time constant of the neuronal membrane; u_reset is the resting potential of the neuron membrane, i.e. the membrane voltage value of the neuron in the resting state; u_m(t) is the membrane voltage value at time t; and I(t) is the input signal to the LIF neuron.
Further, the state function equation is specifically:
u_j^n = u_reset (resting state);
u_j^n = u_j^(n−1) + Σ_i A_i·w[i][j] + A_j^r (integration state);
u_j^n = u_j^(n−1) − u_l (waiting state);
u_j^n = u_reset (discharge state, after a spike is emitted);
wherein u_j^n is the membrane voltage of the j-th neuron at the n-th moment; u_reset is the resting potential of the neuronal membrane; u_j^(n−1) is the membrane voltage of the j-th neuron at the (n−1)-th moment; w[i][j] is the weight value of the interconnection between neuron i and neuron j; A_i is the input value of the i-th neuron; A_j^r is a direct input value given as a pulse rate instead of a single spike; and u_l is the leakage value of the membrane potential.
Further, the method for introducing a synaptic model into the impulse neural network by adopting an STDP learning rule based on the Hebbian rule specifically comprises the following steps:
simplifying the STDP learning rule based on the Hebbian rule to obtain a simplified STDP learning rule; the simplification removes the multipliers and exponential functions while preserving the timing correlation;
based on the simplified STDP learning rule, inhibitory static synapses and excitatory plastic synapses are introduced in the impulse neural network.
Further, the simplified STDP learning rule specifically includes:
wherein Δw represents the correction amount of the synaptic weight; W represents the current synaptic weight; W_max and W_min represent the maximum and minimum synaptic weights, respectively; and Δt is the difference in pulse arrival times between the pre- and post-synaptic spikes.
Further, based on the simplified STDP learning rule, an inhibitory static synapse and an excitatory plastic synapse are introduced in the impulse neural network, specifically:
where Δt is the difference in pulse arrival times between the pre- and post-synaptic spikes; τ_w represents the time constant of the synaptic weight update process; τ− and τ+ represent the time windows in which the strengthening and weakening phenomena occur, respectively; W represents the current synaptic weight; and W_max and W_min represent the set maximum and minimum synaptic weights, respectively.
Further, the construction of the impulse neural network based on the controller unit, the synaptic beam and the synaptic model specifically comprises:
constructing a connection relationship between neurons of the impulse neural network based on the synaptic beam and the synaptic model;
constructing an impulse neural network based on the controller unit and a synaptic crossover core; the synaptic crossover core is a crossover network formed by the connection relations between neurons of the impulse neural network.
Further, the constructing a connection relationship between neurons of the impulse neural network based on the synaptic beam and the synaptic model specifically includes:
all neurons in the impulse neural network are connected by adopting a synaptic beam;
constructing a WTA network in a pulsed neural network based on the synaptic model; the impulse neural network comprises an input layer, a hidden layer and an output layer; the hidden layer and the output layer are WTA networks;
connecting the neurons of each layer with the neurons of the previous layer in the input layer, the hidden layer and the output layer through excitatory plastic synapses;
in the hidden layer and the output layer, the winner of each layer is connected with the neighboring neurons of the winner of the layer through the inhibitory static synapses.
Further, the controller unit includes a scheduler, a behavior pattern block, a playback pattern block, a history sequence module, and an initialization synapse module;
the scheduler is respectively connected with the behavior mode block, the playback mode block and the initialization synaptic module, and is used for generating a control signal according to the received external control instruction and sending the control signal to the playback mode block and the initialization synaptic module; and receiving a first feedback signal of the behavior mode block and a second feedback signal of the playback mode block, controlling a synaptic crossover core according to the first feedback signal and the second feedback signal; the pulse signal represents a discrete pulse sequence at a certain moment;
the behavior mode block is respectively connected with the scheduler and the history sequence module, and is used for receiving pulse signals of the scheduler and providing input for the behavior stage of the neuron according to the pulse signals; for transmitting a first feedback signal to the scheduler; and a module for transmitting the activity sequence of neurons to the history sequence module;
the playback mode block is connected with the dispatcher and the history sequence module, and is used for receiving a pulse signal of the dispatcher and a third feedback signal of the history sequence module and providing input for a playback stage of the neuron according to the pulse signal and the third feedback signal; and for sending a second feedback signal to the scheduler;
the history sequence module is connected with the behavior pattern block and the playback pattern block, and is used for receiving the activity sequence of the behavior pattern block and storing the activity sequence of two neuron target moments; a third feedback signal for transmitting the stored active sequence of the two neuron target moments to the playback mode block;
the initialization synaptic module is respectively connected with the scheduler and the synaptic crossover core, and is used for receiving a control signal of the scheduler and performing initialization weight operation on the synaptic crossover core according to the control signal.
Further, the initialization synapse module generates the initialization weights by adopting a linear feedback shift register; each initialization weight is a positive random number around (W_max − W_min)/2.
In a second aspect, the present invention provides a method for training a brain-like situation learning model for intelligent driving assistance, including:
constructing a sample data set; the sample data set includes eight distinct triplets; each triplet includes: a context, a location and an item;
inputting the sample data set into a brain-like scenario learning model based on a pulse neural network to perform context-related task reinforcement learning training, so as to obtain a trained brain-like scenario learning model based on the pulse neural network; the context-dependent task reinforcement learning training includes both behavior and playback modes.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the pulse neural network of the invention sets membrane voltages in different states according to the LIF neuron model, adopts a WTA network model and a simplified STDP learning rule, does not use high cost functions such as indexes or multipliers and the like for neurons and synapses, can learn rewarding association between stimulus-reactions in a situation learning task through a reinforcement learning mechanism, is a pulse neural network architecture driven by a multiplicative event based on a reinforcement learning algorithm, can be used for context-related tasks, and can not only save cost, but also remarkably reduce power consumption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a brain-like situation learning model for intelligent driving assistance according to an embodiment of the present invention;
FIG. 2 is a state transition diagram of the LIF neuron model of the present invention;
FIG. 3 is a graph illustrating membrane voltage for different states of LIF neurons of the present invention;
FIG. 4a is a graph showing the synaptic weight correction versus the pulse arrival time difference under the original rule for three neurons with different initial weights according to the present invention;
FIG. 4b is a graph showing the correction versus pulse arrival time difference for the original rule and the simplified STDP rule;
FIG. 5 is a framework diagram of the brain-like situation learning model based on the impulse neural network of the invention;
FIG. 6 is a schematic illustration of the stimulus combination of the present invention used in an animal context-dependent task reinforcement learning experiment;
FIG. 7 is a schematic representation of replay operations in an animal context-dependent task reinforcement learning experiment in accordance with the present invention; wherein (a) is a diagram of replaying a rewarded action sequence and (b) is a diagram of replaying an un-rewarded action sequence;
FIG. 8 is a graph showing the firing timing of the three layers of neurons in accordance with the present invention; wherein (a) is a neuron firing diagram corresponding to FIG. 7(a) and (b) is a neuron firing diagram corresponding to FIG. 7(b).
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a brain-like scene learning model construction method for intelligent auxiliary driving, which is capable of saving cost and remarkably reducing power consumption by learning rewarding association between stimulus and reaction in a situation learning task through a reinforcement learning mechanism.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in FIG. 1, the invention models the hippocampal network involved in reinforcement learning in the context-dependent tasks of animals by constructing a brain-like scenario learning model based on a pulsed neural network.
The impulse neural network of the present invention mimics the CA1 region of the hippocampus. It comprises an input (sensory) layer, a hidden (hippocampal) layer and an output (motor) layer, with a total of 16 neurons (6 input, 8 hidden and 2 output), 64 plastic excitatory synapses (6×8 synapses between the input and hidden layers and 8×2 synapses between the hidden and output layers), 58 inhibitory non-plastic synapses (56 in the hidden layer, 2 in the output layer) and a controller unit.
Wherein the input layer comprises 6 neurons and provides context-location information, and adaptive unidirectional excitatory weights connect all these input neurons to all hippocampal neurons (solid lines); the hidden layer comprises 8 hippocampal neurons with inhibitory connections between them (dashed lines) but without self-inhibition, and a large number of plastic excitatory synapses connect all of the hippocampal neurons to the output layer neurons (solid lines). The output layer comprises 2 neurons, which also have inhibitory connections between them, similar to the hippocampal layer. The hippocampal (hidden) layer and the output layer are WTA networks.
In constructing the brain-like scenario learning model based on a pulsed neural network, the Hebbian rule is used as the basic learning mechanism, modeled by spike-timing-dependent plasticity (STDP) rules in combination with activity replay.
The learning method for the context-dependent task is reinforcement learning, whose aim is to examine, during training, the behavior generated by the hippocampal principal neurons according to the environmental attributes and to apply the corresponding reward. Finally, the "cost function" in the impulse neural network is updated by going through a series of state-behavior pairs. The training process involves two phases: a behavior phase and a replay phase.
Based on this, the embodiment of the invention provides a brain-like scenario learning model construction method for intelligent auxiliary driving, which comprises the following steps:
step 100: and constructing neurons of the impulse neural network by adopting an LIF neuron model.
The step 100 specifically includes:
the core of neurons of the impulse neural network is constructed based on leaky integral-and-fire (LIF) neurons.
The LIF neuron model specifically comprises:
τ_m · du_m(t)/dt = −(u_m(t) − u_reset) + I(t)
wherein τ_m is the time constant of the neuronal membrane; u_reset is the resting potential of the neuron membrane, i.e. the membrane voltage value of the neuron in the resting state; u_m(t) is the membrane voltage value at time t; and I(t) is the input signal to the LIF neuron.
Step 200: and setting the membrane voltages of the neurons in different states by adopting a state function equation.
The step 200 specifically includes:
As shown in FIG. 2 and FIG. 3, the output of the neuron depends on the state function equation it employs, so that in the actual learning process each execution step may enter a different state and generate a different membrane voltage output according to the different inputs.
The state function equation is specifically:
u_j^n = u_reset (resting state);
u_j^n = u_j^(n−1) + Σ_i A_i·w[i][j] + A_j^r (integration state);
u_j^n = u_j^(n−1) − u_l (waiting state);
u_j^n = u_reset (discharge state, after a spike is emitted);
wherein u_j^n is the membrane voltage of the j-th neuron at the n-th moment; u_reset is the resting potential of the neuronal membrane; u_j^(n−1) is the membrane voltage of the j-th neuron at the (n−1)-th moment; w[i][j] is the weight value of the interconnection between neuron i and neuron j; A_i is the input value of the i-th neuron; A_j^r is a direct input value given as a pulse rate instead of a single spike; and u_l is the leakage value of the membrane potential.
Specifically, in the resting state the neuron is inactive and its activity can be regarded as approximately absent; the membrane voltage is set to the resting potential.
In the integration state, the membrane voltage u_j^n is obtained from the membrane voltage u_j^(n−1) at time n−1, the weighted synaptic inputs A_i·w[i][j] (where w[i][j] is the weight of the connection between neuron i and neuron j), and a possible direct input potential A_j^r, which is a direct input given as a pulse rate instead of a single spike. After the integration state, the LIF neuron can transition to either the waiting state or the discharge state depending on the membrane voltage value.
In the waiting state, the membrane voltage u_j^n is obtained by subtracting the leakage value u_l of the membrane potential from the membrane voltage u_j^(n−1) at time n−1; when the membrane voltage does not reach the threshold, the neuron automatically leaks charge and the potential tends to return to the resting potential.
In the discharge state, when the membrane potential exceeds its threshold u_th a spike is emitted; the neuron then no longer integrates the stimulus, the membrane potential returns to the resting value, and the neuron enters the waiting state again.
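For illustration only (and not as a limitation of the hardware implementation), the following Python sketch shows one possible discrete-time realization of the four-state LIF update described above; the threshold, leak and reset values, as well as the variable and state names, are assumptions introduced for the example.

```python
# Illustrative sketch of the discrete four-state LIF update described above.
# Parameter values (u_reset, u_th, u_leak) and names are assumptions.

REST, INTEGRATE, WAIT, FIRE = "rest", "integrate", "wait", "fire"

class LIFNeuron:
    """Discrete-time LIF neuron following the rest/integrate/wait/fire description."""

    def __init__(self, u_reset=0.0, u_th=1.0, u_leak=0.05):
        self.u_reset = u_reset   # resting potential
        self.u_th = u_th         # firing threshold
        self.u_leak = u_leak     # leakage subtracted per waiting step
        self.u = u_reset         # membrane voltage u_j^n
        self.state = REST

    def step(self, weighted_inputs=(), direct_input=0.0):
        """Return 1 if the neuron fires in this step, otherwise 0."""
        total_input = sum(weighted_inputs) + direct_input
        if total_input:
            # integration state: u^n = u^(n-1) + sum_i A_i * w[i][j] + direct input
            self.u += total_input
            self.state = INTEGRATE
        else:
            # waiting state: u^n = u^(n-1) - u_leak, not below the resting potential
            self.u = max(self.u - self.u_leak, self.u_reset)
            self.state = WAIT
        if self.u >= self.u_th:
            # discharge state: emit a spike and return to the resting value
            self.u = self.u_reset
            self.state = FIRE
            return 1
        return 0
```

For example, calling neuron.step([0.6], 0.5) integrates the inputs and fires once the threshold is exceeded, while calling neuron.step() with no input leaks the membrane voltage back toward the resting potential.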
Step 300: introducing a synaptic model into the impulse neural network by adopting an STDP learning rule based on the Hebbian rule.
The step 300 specifically includes:
step 301: simplifying the STDP learning rule based on the Hubbard rule to obtain a simplified STDP learning rule; the simplified operation includes a reduction multiplier and an exponential function on a timing correlation basis.
In order to implement the synaptic model, i.e. the digital synapse, on hardware, the original learning rule needs to be adapted accordingly. Because the original rule uses many exponential functions and multipliers, a larger silicon area is required; moreover, the synaptic weight correction depends strongly on the current synaptic weight. The adapted rule therefore removes the multipliers and exponential functions while preserving the timing correlation.
The simplified STDP learning rule specifically comprises the following steps:
the comparison of the pulse arrival time difference of the simplified STDP learning rule and the original STDP rule (w=0.5) is shown in fig. 4 b.
The digital synapse model is calculated as follows: each time the scheduler sends a signal to initiate a learning phase (E-learning = 1), the synapse becomes eligible for adaptation. Whenever the synapse meets the learning condition (i.e. the scheduler signal is asserted), the amount of synaptic correction is calculated from the pulse arrival time difference. Before entering the algorithm, initializing the synaptic weights is an essential step; the initial weight of each excitatory plastic synapse in the model is a positive value around (W_max − W_min)/2. To generate such random values, a plurality of connected configurable linear feedback shift register (LFSR) blocks are used.
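As a software illustration of this initialization step, the following sketch draws positive pseudo-random weights around (W_max − W_min)/2 from a simple configurable LFSR; the register width, tap positions and scaling are assumptions for the example and are not taken from the hardware design.

```python
# Illustrative LFSR-based weight initialization (width, taps and spread are assumptions).

class LFSR:
    def __init__(self, seed=0xACE1, taps=(16, 14, 13, 11), width=16):
        self.state = seed
        self.taps = taps
        self.width = width

    def next(self):
        # XOR the tap bits to form the feedback bit, then shift it into the MSB.
        bit = 0
        for t in self.taps:
            bit ^= (self.state >> (self.width - t)) & 1
        self.state = ((self.state >> 1) | (bit << (self.width - 1))) & ((1 << self.width) - 1)
        return self.state

def init_weight(lfsr, w_min=0.0, w_max=1.0, spread=0.1):
    """Positive pseudo-random weight around (w_max - w_min) / 2."""
    center = (w_max - w_min) / 2.0
    frac = lfsr.next() / float((1 << lfsr.width) - 1)   # roughly uniform in (0, 1]
    return max(w_min, center + (frac - 0.5) * 2 * spread * center)

lfsr = LFSR()
initial_weights = [[init_weight(lfsr) for _ in range(8)] for _ in range(6)]  # 6x8 input->hidden
```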
Step 302: based on the simplified STDP learning rule, inhibitory static synapses and excitatory plastic synapses are introduced in the impulse neural network.
In particular, the synaptic model is another important component of the impulse neural network. There are two synaptic models in neural networks: inhibitory static synapses and excitatory plastic synapses;
the first type of synapses (inhibitory static synapses) provide negative effects on membrane potential while providing strong lateral inhibition between neurons, the weights of which do not change with the operation of the neural network;
the second type of synapse (excitatory plastic synapse) is able to strengthen the mode voltage of the postsynaptic neuron. The intensity of these synapses may be changed in replay mode according to the STDP learning algorithm.
Wherein, the dynamic change process of the plastic synapses is based on the STDP rule. This rule changes the weights according to the arrival time difference of the pre- and post-synaptic pulses. If the presynaptic pulse arrives shortly before the postsynaptic pulse, the time difference is positive and the synaptic weight is increased, corresponding to long-term potentiation (LTP) of the synapse; conversely, a negative time difference decreases the weight, corresponding to long-term depression (LTD) of the synapse.
The specific calculation formula of the change of the synaptic weight is as follows:
as shown in fig. 4a, the change of the synaptic weight correction of three neurons with different initial values to the pulse arrival time difference enhances the amplitude three times as much as the suppression amplitude.
Step 400: and constructing a brain-like scene learning model based on the impulse neural network according to the controller unit, the synaptic beam and the synaptic model.
The step 400 specifically includes:
step 401: and constructing a connection relationship between neurons of the impulse neural network based on the synaptic beam and the synaptic model.
Specifically, as shown in fig. 5, the step 401 specifically includes:
a: all neurons in the impulse neural network are connected by adopting a synaptic beam and are arranged in a row; in the synaptic beam, excitatory plastic synapses and inhibitory static synapses are represented by grey and black circles, respectively.
B: constructing a winner-take-all (WTA) network in a pulsed neural network based on the synaptic model; the impulse neural network comprises an input layer, a hidden layer and an output layer; the hidden layer and the output layer are both WTA networks.
C: in the input layer, the hidden layer and the output layer, neurons of each layer are connected with neurons of the previous layer through excitatory plastic synapses to amplify local activities thereof.
D: in the hidden layer and the output layer, the winner of each layer is connected with the neighboring neurons of the winner of the layer through the inhibitory static synapses.
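A minimal sketch of the 6-8-2 connectivity described in steps A-D above (excitatory plastic synapses between successive layers, inhibitory static lateral connections inside the hidden and output layers, and no self-inhibition) could look as follows; the placeholder initial weight values are assumptions.

```python
# Illustrative construction of the 6-8-2 topology (weight values are placeholders).
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 6, 8, 2
INHIB_WEIGHT = -1.0

# Excitatory plastic synapses: every neuron of a layer to every neuron of the next layer.
exc_in_hid = [[random.uniform(0.4, 0.6) for _ in range(N_HIDDEN)] for _ in range(N_INPUT)]    # 6x8 = 48
exc_hid_out = [[random.uniform(0.4, 0.6) for _ in range(N_OUTPUT)] for _ in range(N_HIDDEN)]  # 8x2 = 16

# Inhibitory static synapses: all-to-all lateral inhibition without self-inhibition (WTA).
inh_hidden = [[INHIB_WEIGHT if i != j else 0.0 for j in range(N_HIDDEN)] for i in range(N_HIDDEN)]  # 56
inh_output = [[INHIB_WEIGHT if i != j else 0.0 for j in range(N_OUTPUT)] for i in range(N_OUTPUT)]  # 2
```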
Step 402: based on the controller unit and the synaptic crossover core, constructing a brain-like scenario learning model based on a pulse neural network; the synaptic crossover core is a crossover network formed by the connection relations between neurons of the impulse neural network.
The invention realizes the management between sequences based on the controller core; the controller unit is responsible for controlling the behavior of the system, the sequence of states, data storage and preparing initial weights for synapses.
Wherein the controller unit includes a scheduler, a behavior pattern block, a playback pattern block, a history sequence block, and an initialization synapse module.
The scheduler module controls the network sequence; the behavior mode block controls the input of the behavior stage SNN; the playback mode block manages the network parameters of the learning and playback phases; the history sequence module stores a neuron activity sequence; the initialization synapse module prepares initial weights for synapses.
Specifically, the scheduler is respectively connected with the behavior mode block, the playback mode block and the initialization synapse module, and is used for generating a control signal according to a received external control instruction and sending the control signal to the playback mode block and the initialization synapse module; and receiving a first feedback signal of the behavior mode block and a second feedback signal of the playback mode block, controlling a synaptic crossover core according to the first feedback signal and the second feedback signal; the pulse signal represents a discrete pulse sequence at a certain moment;
the behavior mode block is respectively connected with the scheduler and the history sequence module and is used for receiving pulse signals of the scheduler and providing input for the behavior stage of the neuron according to the pulse signals; for transmitting a first feedback signal to the scheduler; and a module for transmitting the activity sequence of neurons to the history sequence module;
the playback mode block is connected with the scheduler and the history sequence module, and is used for receiving a pulse signal of the scheduler and a third feedback signal of the history sequence module, and providing input for a playback stage of the neuron according to the pulse signal and the third feedback signal; and for sending a second feedback signal to the scheduler;
the history sequence module is connected with the behavior pattern block and the playback pattern block and is used for receiving the activity sequence of the behavior pattern block and storing the activity sequence of two neuron target moments; a third feedback signal for transmitting the stored active sequence of the two neuron target moments to the playback mode block;
the initialization synaptic module is respectively connected with the scheduler and the synaptic crossover core, and is used for receiving a control signal of the scheduler and performing initialization weight operation on the synaptic crossover core according to the control signal. All initial suppression and excitation values for synaptic weights are generated using a Linear Feedback Shift Register (LFSR) block. A control signal is obtained from the scheduler and initial weights are sent to the synaptic crossover cores.
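As a structural illustration of the scheduler's role (the signal names follow the description above, while the class interface itself is an assumption), a minimal mode sequencer could be sketched as follows.

```python
# Illustrative scheduler mode sequence (signal names follow the description above;
# the class interface is an assumption).

class Scheduler:
    """Switches the network between behavior mode and playback mode for each trial."""

    def __init__(self):
        self.mode = "idle"
        self.e_learning = 0      # control signal enabling plastic-synapse adaptation

    def start_trial(self):
        self.mode = "behavior"   # the behavior mode block now drives the neuron inputs
        self.e_learning = 0

    def on_output_spike(self, output_neuron):
        # A spike from the "dig" output neuron ends the behavior phase.
        if output_neuron == "dig" and self.mode == "behavior":
            self.mode = "playback"
            self.e_learning = 1  # the playback mode block replays the stored sequences
```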
Based on the above, the invention also discloses a brain-like scenario learning model training method for intelligent auxiliary driving, which comprises the following steps:
step A: constructing a sample data set; the sample dataset includes eight distinct triples; the triplet includes: context, location, and item.
The step A specifically comprises the following steps: as shown in FIG. 6, two contexts A and B are set in the experiment, each with two positions (positions 1 and 2), thus yielding four spatial positions A1, A2, B1 and B2; for example, A1 refers to position 1 in context A. In each trial, two "cans" (items X and Y) are randomly placed in two different positions. These "cans" are filled with different materials, and only one of them contains a reward; for example, in context A item X contains the reward and in context B item Y contains the reward. Thus, owing to the positions and contexts of items X and Y, eight different triplets appear in the model of this experiment: the rewarded group includes A1X, A2X, B1Y and B2Y, and the non-rewarded group includes A1Y, A2Y, B1X and B2X (the triplet A1X indicates that, in context A, item X is located at position 1).
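A sketch of this eight-triplet sample set, using the reward rule stated above (item X is rewarded in context A and item Y in context B), is given below; the dictionary layout is an illustrative choice.

```python
# The eight (context, location, item) triplets with their reward labels.
from itertools import product

def build_dataset():
    dataset = []
    for context, location, item in product("AB", "12", "XY"):
        rewarded = (context == "A" and item == "X") or (context == "B" and item == "Y")
        dataset.append({"triplet": f"{context}{location}{item}", "rewarded": rewarded})
    return dataset

# build_dataset() -> rewarded: A1X, A2X, B1Y, B2Y; non-rewarded: A1Y, A2Y, B1X, B2X
```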
And (B) step (B): inputting the sample data set into a brain-like scenario learning model based on a pulse neural network to perform context-related task reinforcement learning training, so as to obtain a trained brain-like scenario learning model based on the pulse neural network; the context-dependent task reinforcement learning training includes both behavior and playback modes.
The step B specifically comprises the following steps: at the beginning of the experiment, all neurons were in a resting state. The initialized synaptic kernels provide random initial weights for the plastic synapses, while constructing WTA networks by assigning stronger suppression values to the corresponding synaptic connection points. Each test of the network was started with a random input stimulus value and repeated 100 times.
Each test consisted of two different modes: behavior mode and playback mode. As shown in fig. 7 and 8. FIG. 7 is a schematic representation of replay operations in an animal context-dependent task reinforcement learning experiment of the present invention, illustrating how the learning mechanism works on forward and reverse orders. FIG. 8 is a timing diagram of firing of three layers of neurons awarded in accordance with the present invention.
In FIG. 7(a), when a certain triplet triggers the rewarded action "dig", the first-layer neurons (neurons 1 and 5 in the figure), the second-layer neuron (neuron 8 in the figure) and the third-layer neuron (neuron 15 in the figure) fire in sequence. By the LTP rule, this forward firing order of the neurons enhances the synaptic weights.
FIG. 7(b) is a sequence diagram of replaying an un-rewarded action; when a certain triplet triggers the un-rewarded action "dig", the third-layer neuron (neuron 15 in the figure), the second-layer neuron (neuron 10 in the figure) and the first-layer neurons (neurons 1 and 6 in the figure) fire in sequence. By the LTD rule, this reverse firing order of the neurons weakens the synaptic weights.
Fig. 8 (a) is a diagram of a neuron firing situation corresponding to fig. 7 (a), in which the vertical axis corresponds to the neuron number, the horizontal axis represents the firing time, and the presynaptic neuron firing is described as preceding the postsynaptic neuron firing, i.e., Δt >0.
Fig. 8 (b) corresponds to the graph of the neuronal firing situation shown in fig. 7 (b), describing the situation where the pre-synaptic neuronal firing is later than the post-synaptic neuronal firing, i.e. Δt <0.
When the controller unit receives the control signal run, the whole experiment is started; a single trial is initiated upon receipt of the start-trial signal. The scheduler receives the pulse signal from the outer layer of the neural network.
At the beginning of each trial, the scheduler enables the behavior pattern; the scheduler then switches to playback mode after obtaining events from the "dig" output neurons based on pulses at the network's outer layer. The scheduler also assigns a select signal to the multiplexer, providing an input voltage to the neurons in each mode.
The operation of the network in two modes is described as follows:
behavior pattern: the network starts with randomly selected triplets (each triplet having an accompanying complementary combination of context, location and item, e.g., A2Y is a complementary triplet of A1X). Under the influence of synaptic weights, a behavioral phase may include multiple actions between triplets (if "move" output neurons are active) and eventually end with mining one of them (if "dig" output neurons are active). After each move or dig, the states of all neurons are sampled in a history sequence. The aim is to store a maximum of 2 up-to-date historical action sequences (stimulus-response pairs) at the end of the behavioral phase.
Playback mode: the scheduler triggers the control signal E-learning to the plastic synapses to qualify them for learning. The playback unit provides the appropriate inputs to all neurons based on the reward obtained and the action sequences obtained by sampling. The sequence is replayed within a specific time window. A rewarded action sequence is replayed in forward timing: according to the STDP learning rule, such replay strengthens the relevant synapses and encourages the rewarded action sequence to be selected in the future. An un-rewarded action sequence is replayed in reverse timing: according to the STDP learning rule, this suppresses the un-rewarded action sequence from being selected in the future.
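Putting the two modes together, a high-level sketch of one training trial might look as follows; the network and scheduler interfaces (step, apply_stdp, on_output_spike) are assumptions introduced for the example, and only the forward/reverse replay logic follows the description above.

```python
# Illustrative trial loop combining behavior mode and playback mode
# (the network interface and sampling details are assumptions).
import random

def run_trial(network, dataset, scheduler):
    scheduler.start_trial()
    sample = random.choice(dataset)            # a random triplet starts the trial

    # Behavior mode: stimulate until the "dig" output neuron wins the output WTA.
    action_sequence = []
    while True:
        winner, states = network.step(sample["triplet"])   # active output neuron + neuron states
        action_sequence.append(states)                      # history sequence module samples the states
        if winner == "dig":
            scheduler.on_output_spike("dig")
            break

    # Playback mode: rewarded sequences replayed forward (LTP), un-rewarded reversed (LTD).
    replay = action_sequence if sample["rewarded"] else list(reversed(action_sequence))
    for pre_states, post_states in zip(replay, replay[1:]):
        network.apply_stdp(pre_states, post_states)         # uses the simplified STDP rule
```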
In summary, the pulse neural network of the invention sets membrane voltages in different states according to the LIF neuron model, adopts a WTA network model and a simplified STDP learning rule, and does not use high-cost operations such as exponentials or multipliers for neurons and synapses. It can learn the reward association between stimulus and response in a scenario learning task through a reinforcement learning mechanism, is a multiplier-free, event-driven pulse neural network architecture based on a reinforcement learning algorithm, and can be used for context-related tasks.
The invention first realizes the impulse neural network on a Xilinx Kintex-7 FPGA device, then uses flashing LEDs at different heights to define the stimulus triplets, and then connects the network to a robot vehicle through the perception (sensory) network layer, the hippocampal layer and the motor output layer. The vehicle can not only distinguish between different stimulus triplets but also learn the reward association between item and context, independent of the location of the item in the environment, successfully verifying the performance of the network on hardware. The network architecture achieves higher efficiency and significantly reduces cost, and also facilitates research into hardware-implemented impulse neural networks using reinforcement learning algorithms.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, the present disclosure should not be construed as limiting the invention.

Claims (9)

1. The brain-like scenario learning model construction method for intelligent auxiliary driving is characterized by comprising the following steps of:
constructing neurons of a pulse neural network by adopting an LIF neuron model;
setting membrane voltages of the neurons in different states by adopting a state function equation;
introducing a synaptic model into the impulse neural network by adopting an STDP learning rule based on the Hebbian rule;
according to the controller unit, the synaptic beam and the synaptic model, a brain-like scene learning model based on a pulse neural network is constructed;
the controller unit comprises a scheduler, a behavior mode block, a playback mode block, a history sequence module and an initialization synapse module;
the scheduler is respectively connected with the behavior mode block, the playback mode block and the initialization synaptic module, and is used for generating a control signal according to the received external control instruction and sending the control signal to the playback mode block and the initialization synaptic module; and receiving a first feedback signal of the behavior mode block and a second feedback signal of the playback mode block, controlling a synaptic crossover core according to the first feedback signal and the second feedback signal; the pulse signal represents a discrete pulse sequence at a certain moment;
the behavior mode block is respectively connected with the scheduler and the history sequence module, and is used for receiving pulse signals of the scheduler and providing input for the behavior stage of the neuron according to the pulse signals; for transmitting a first feedback signal to the scheduler; and a module for transmitting the activity sequence of neurons to the history sequence module;
the playback mode block is connected with the dispatcher and the history sequence module, and is used for receiving a pulse signal of the dispatcher and a third feedback signal of the history sequence module and providing input for a playback stage of the neuron according to the pulse signal and the third feedback signal; and for sending a second feedback signal to the scheduler;
the history sequence module is connected with the behavior pattern block and the playback pattern block, and is used for receiving the activity sequence of the behavior pattern block and storing the activity sequence of two neuron target moments; a third feedback signal for transmitting the stored active sequence of the two neuron target moments to the playback mode block;
the initialization synaptic module is respectively connected with the scheduler and the synaptic crossover core, and is used for receiving a control signal of the scheduler and performing an initialization weight operation on the synaptic crossover core according to the control signal; the initialization synaptic module adopts a linear feedback shift register to generate the initialization weights; each initialization weight is a positive random number around (W_max − W_min)/2; W represents the current synaptic weight; W_max and W_min respectively represent the set maximum and minimum synaptic weights;
firstly, the impulse neural network is realized on a Xilinx Kintex-7 FPGA device, then flashing LEDs at different heights are used to define the stimulus triplets, and then the network is connected to a robot vehicle through the perception network layer, the hippocampal layer and the motor output layer; the vehicle can not only distinguish between different stimulus triplets, but also learn the reward association between items and contexts, independent of the location of the items in the environment.
2. The method for constructing brain-like scenario learning model for intelligent driving assistance according to claim 1, wherein,
the LIF neuron model specifically comprises:
τ_m · du_m(t)/dt = −(u_m(t) − u_reset) + I(t)
wherein τ_m is the time constant of the neuronal membrane; u_reset is the resting potential of the neuron membrane, i.e. the membrane voltage value of the neuron in the resting state; u_m(t) is the membrane voltage value at time t; and I(t) is the input signal to the LIF neuron.
3. The brain-like scenario learning model construction method for intelligent driving assistance according to claim 1, wherein the state function equation is specifically:
u_j^n = u_reset (resting state);
u_j^n = u_j^(n−1) + Σ_i A_i·w[i][j] + A_j^r (integration state);
u_j^n = u_j^(n−1) − u_l (waiting state);
u_j^n = u_reset (discharge state, after a spike is emitted);
wherein u_j^n is the membrane voltage of the j-th neuron at the n-th moment; u_reset is the resting potential of the neuronal membrane; u_j^(n−1) is the membrane voltage of the j-th neuron at the (n−1)-th moment; w[i][j] is the weight value of the interconnection between neuron i and neuron j; A_i is the input value of the i-th neuron; A_j^r is a direct input value given as a pulse rate instead of a single spike; and u_l is the leakage value of the membrane potential.
4. The method for constructing a brain-like scenario learning model for intelligent driving assistance according to claim 1, wherein the step of introducing a synaptic model into the impulse neural network by using an STDP learning rule based on the Hebbian rule comprises:
simplifying the STDP learning rule based on the Hebbian rule to obtain a simplified STDP learning rule; the simplification removes the multipliers and exponential functions while preserving the timing correlation;
based on the simplified STDP learning rule, inhibitory static synapses and excitatory plastic synapses are introduced in the impulse neural network.
5. The method for constructing a brain-like scenario learning model for intelligent driving assistance according to claim 4, wherein the simplified STDP learning rule specifically comprises:
wherein Δw represents the correction amount of the synaptic weight; W represents the current synaptic weight; W_max and W_min respectively represent the set maximum and minimum synaptic weights; and Δt is the difference in pulse arrival times between the pre- and post-synaptic spikes.
6. The method for constructing a brain-like scenario learning model for intelligent driving assistance according to claim 4, wherein the method for introducing inhibitory static synapses and excitatory plastic synapses into the impulse neural network based on the simplified STDP learning rule is as follows:
wherein Δt is the difference in pulse arrival times between the pre- and post-synaptic spikes; τ_w represents the time constant of the synaptic weight update process; τ− and τ+ respectively represent the time windows in which the strengthening and weakening phenomena occur; W represents the current synaptic weight; and W_max and W_min respectively represent the set maximum and minimum synaptic weights.
7. The method for constructing a brain-like scenario learning model for intelligent driving assistance according to claim 4, wherein the constructing a brain-like scenario learning model based on a pulse neural network according to a controller unit, a synaptic beam and the synaptic model specifically comprises:
constructing a connection relationship between neurons of the impulse neural network based on the synaptic beam and the synaptic model;
based on the controller unit and the synaptic crossover core, constructing a brain-like scenario learning model based on a pulse neural network; the synaptic crossover core is a crossover network formed by the connection relations between neurons of the impulse neural network.
8. The method for constructing a brain-like scenario learning model for intelligent driving assistance according to claim 7, wherein the constructing a connection relationship between neurons of the impulse neural network based on the synaptic beam and the synaptic model specifically comprises:
all neurons in the impulse neural network are connected by adopting a synaptic beam;
constructing a WTA network in a pulsed neural network based on the synaptic model; the impulse neural network comprises an input layer, a hidden layer and an output layer; the hidden layer and the output layer are WTA networks;
connecting the neurons of each layer with the neurons of the previous layer in the input layer, the hidden layer and the output layer through excitatory plastic synapses;
in the hidden layer and the output layer, the winner of each layer is connected with the neighboring neurons of the winner of the layer through the inhibitory static synapses.
9. The method for constructing a brain-like scenario learning model for intelligent driving assistance according to claim 1, wherein a sample data set is constructed; the sample data set includes eight distinct triplets; each triplet includes: a context, a location and an item;
inputting the sample data set into a brain-like scenario learning model based on a pulse neural network to perform context-related task reinforcement learning training, so as to obtain a trained brain-like scenario learning model based on the pulse neural network; the context-dependent task reinforcement learning training includes both behavior and playback modes.
CN202211372963.5A 2022-10-31 2022-10-31 Brain-like situation learning model construction and training method for intelligent auxiliary driving Active CN115994563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211372963.5A CN115994563B (en) 2022-10-31 2022-10-31 Brain-like situation learning model construction and training method for intelligent auxiliary driving


Publications (2)

Publication Number Publication Date
CN115994563A CN115994563A (en) 2023-04-21
CN115994563B (en) 2023-08-18

Family

ID=85989478



Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092959A (en) * 2017-04-07 2017-08-25 武汉大学 Hardware friendly impulsive neural networks model based on STDP unsupervised-learning algorithms
CN108985447A (en) * 2018-06-15 2018-12-11 华中科技大学 A kind of hardware pulse nerve network system
CN110210563A (en) * 2019-06-04 2019-09-06 北京大学 The study of pattern pulse data space time information and recognition methods based on Spike cube SNN
CN110245389A (en) * 2019-05-25 2019-09-17 天津大学 Spiking feedforward network hippocampus function emulation system based on FPGA
CN111639754A (en) * 2020-06-05 2020-09-08 四川大学 Neural network construction, training and recognition method and system, and storage medium
WO2022045961A1 (en) * 2020-08-31 2022-03-03 Agency For Science, Technology And Research Neurosynaptic processing core with spike time dependent plasticity (stdp) learning for a spiking neural network
CN114600127A (en) * 2019-09-10 2022-06-07 辉达公司 Architecture searching method based on machine learning for neural network
CN115063597A (en) * 2022-07-11 2022-09-16 北京理工大学 Image identification method based on brain-like learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10169701B2 (en) * 2015-05-26 2019-01-01 International Business Machines Corporation Neuron peripheral circuits for neuromorphic synaptic memory array based on neuron models
US10552731B2 (en) * 2015-12-28 2020-02-04 International Business Machines Corporation Digital STDP synapse and LIF neuron-based neuromorphic system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Brain function simulation and mechanism analysis based on a spiking neural network error back-propagation algorithm; 洪朝飞; China Master's Theses Full-text Database; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant