CN102723112A - Q learning system based on memristor intersection array


Info

Publication number
CN102723112A
Authority
CN
China
Prior art keywords
state
memristor
switch
resistance
detection module
Prior art date
Legal status
Granted
Application number
CN2012101885732A
Other languages
Chinese (zh)
Other versions
CN102723112B (en)
Inventor
王丽丹 (Wang Lidan)
何朋飞 (He Pengfei)
段书凯 (Duan Shukai)
钟宇平 (Zhong Yuping)
Current Assignee
Southwest University
Original Assignee
Southwest University
Priority date
2012-06-08
Filing date
2012-06-08
Publication date
2012-10-10
Application filed by Southwest University
Priority to CN201210188573.2A
Publication of CN102723112A
Application granted
Publication of CN102723112B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Manipulator (AREA)

Abstract

The invention discloses a Q learning system based on a memristor crossbar (intersection) array. The system comprises the memristor crossbar array and is characterized in that it further comprises a read/write selection switch, a state selection switch, a column selection switch, a delay unit, and a state detection module. The read/write selection switch controls read and write operations on the memristor crossbar array; the state detection module detects the current environment state $s_t$, and the state selection switch selects the corresponding row line; when a memristor value of the crossbar array, i.e., a Q value, is updated, the column selection switch selects the column line corresponding to action $a_t$; the delay unit delays the voltage of the selected column line by one time step; and the state detection module detects the current environment state and stores the previous environment state. The system successfully applies a new circuit element, the memristor, to reinforcement learning, solving the problem that reinforcement learning requires a large amount of storage space, and provides a new direction for future research on reinforcement learning.

Description

Q learning system based on a memristor crossbar array
Technical field
The present invention relates to storage matrices and intelligent learning algorithms.
Background art
Reinforcement learning is an advanced intelligent learning algorithm that has been widely used in intelligent robotics in recent years and has become a research focus. In 1954, Minsky proposed SNARC, a computational model of reinforcement learning. Sutton later proposed the AHC algorithm and the TD learning algorithm in his PhD dissertation. Building on TD learning, Watkins et al. then proposed Q learning, the classic algorithm among current reinforcement learning methods and an important milestone in the development of reinforcement learning. After Q learning was proposed, many researchers applied it to mobile robot navigation, robot soccer systems, and intelligent I/O scheduling. Reinforcement learning nevertheless has its own limitation: for complex problems it requires a large amount of state-action storage space. In 1971, based on the completeness of circuit theory, Chua proposed the fourth fundamental circuit element, the memristor (L. O. Chua. Memristor-the missing circuit element. IEEE Trans. Circuit Theory, 1971, 18(5): 507-519).
In 2008, HP Labs successfully fabricated the first physical memristor, which has since attracted wide attention. The memristor is nanoscale and nonlinear; its resistance changes with the input stimulus, and this change is non-volatile, so the memristor is well suited to designing large-scale memories. The memristor crossbar array is one kind of memristor memory; its structure is simple and easy to design. Hu Xiaofang et al. used a memristor crossbar array to store images (Hu Xiaofang, Duan Shukai, Wang Lidan, et al. Memristor crossbar array and its application in image processing. Science China (Series F): Information Sciences, 2011, 41(4): 500-512). Because memristors are nanoscale, a memristor crossbar array can be built as a large-scale memory, which addresses the large state-action storage that reinforcement learning requires on complex problems. Implementing Q learning with a memristor crossbar array is therefore a good choice.
The physical model of the HP memristor is shown in Fig. 1. The memristor consists of two parts, a doped region and an undoped region, where $w$ and $D$ denote the width of the doped region and the total width of the memristor, respectively. Its mathematical model is as follows:
$$M(t) = R_{ON}\,\frac{w(t)}{D} + R_{OFF}\left(1 - \frac{w(t)}{D}\right) \tag{1}$$
where $R_{OFF}$ and $R_{ON}$ are the memristor resistances when $w = 0$ and $w = D$, respectively.
$$\frac{dw(t)}{dt} = \mu_v\,\frac{R_{ON}}{D}\,i(t) \tag{2}$$
Here, $\mu_v$ is the average ion mobility, in $\mathrm{cm^2\,s^{-1}\,V^{-1}}$.
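The linear ion-drift model can be explored numerically. The following is a minimal Python sketch that assumes illustrative parameter values the patent does not give: $R_{ON}$ = 100 Ω, $R_{OFF}$ = 16 kΩ, $D$ = 10 nm, and $\mu_v$ = 10⁻¹⁴ m² s⁻¹ V⁻¹ (10⁻¹⁰ cm² s⁻¹ V⁻¹). The function names `memristance` and `drift` are hypothetical. The sketch evaluates $M$ from $w$ and integrates $dw/dt$ with forward-Euler steps, clamping $w$ to $[0, D]$.

```python
# Minimal sketch of the HP linear ion-drift memristor model.
# All parameter values are illustrative assumptions, not taken from the patent.
R_ON = 100.0      # ohms, resistance when fully doped (w = D)
R_OFF = 16e3      # ohms, resistance when undoped (w = 0)
D = 10e-9         # m, total device width
MU_V = 1e-14      # m^2 s^-1 V^-1, average ion mobility (1e-10 cm^2 s^-1 V^-1)


def memristance(w: float) -> float:
    """M = R_ON * w/D + R_OFF * (1 - w/D)."""
    x = w / D
    return R_ON * x + R_OFF * (1.0 - x)


def drift(v_of_t, w0: float, t_end: float, steps: int = 10_000) -> float:
    """Forward-Euler integration of dw/dt = MU_V * R_ON / D * i(t) for a voltage waveform v_of_t."""
    dt = t_end / steps
    w = w0
    for k in range(steps):
        i = v_of_t(k * dt) / memristance(w)   # current through the device (Ohm's law)
        w += MU_V * R_ON / D * i * dt         # ion drift moves the doped/undoped boundary
        w = min(max(w, 0.0), D)               # the boundary cannot leave the device
    return w


w0 = 0.5 * D
w_after = drift(lambda t: 1.0, w0, t_end=0.1)   # 0.1 s of +1 V
print(memristance(w0), memristance(w_after))     # resistance drops under a positive bias
```

A positive bias widens the doped region and lowers the resistance. Reversing the sign of the bias narrows the doped region, and this directional, state-retaining behaviour is what the memory relies on.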
$$T_w = \frac{\Phi_D}{V_A R_{OFF}^2}\left[\left(R(w_0)\right)^2 - \left(R(w)\right)^2\right] \tag{3}$$
where
$$\Phi_D = \frac{(\beta D)^2}{2\mu_v(\beta - 1)} \tag{4}$$
Here, $T_w$ is the width of the voltage pulse applied across the memristor, $V_A$ is the pulse amplitude, $R(w_0)$ is the initial resistance of the memristor, $R(w)$ is the resistance the memristor reaches, and $\beta = R_{OFF}/R_{ON}$.
For a positive pulse ($V_A > 0$), which lowers the resistance so that $R(w) \le R(w_0)$, solving for $R(w)$ gives
$$R(w) = \sqrt{\left(R(w_0)\right)^2 - \frac{V_A T_w R_{OFF}^2}{\Phi_D}}, \qquad R_{ON} \le R(w) \le R_{OFF} \tag{5}$$
Therefore, for a fixed $T_w$, the resistance of the memristor changes with $V_A$, and this change is non-volatile.
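The closed-form pulse relations can be checked against a direct integration of the model. Under a constant amplitude $V_A$, eliminating $w$ from the memristance and drift equations gives $dM/dt = -(R_{OFF} - R_{ON})\,\mu_v R_{ON} V_A / (D^2 M)$. The exact solution of that equation is the $R(w)$ expression above.

The following is a minimal sketch under assumed, illustrative parameters ($R_{ON}$ = 100 Ω, $R_{OFF}$ = 16 kΩ, $D$ = 10 nm, $\mu_v$ = 10⁻¹⁴ m² s⁻¹ V⁻¹). The function names `pulse_width`, `resistance_after_pulse`, and `resistance_euler` are hypothetical.

```python
import math

# Illustrative parameters (assumptions, not from the patent).
R_ON, R_OFF = 100.0, 16e3        # ohms
D, MU_V = 10e-9, 1e-14           # m, m^2 s^-1 V^-1
BETA = R_OFF / R_ON
PHI_D = (BETA * D) ** 2 / (2 * MU_V * (BETA - 1))


def pulse_width(r0: float, r_target: float, v_a: float) -> float:
    """T_w = PHI_D / (V_A * R_OFF^2) * (R(w0)^2 - R(w)^2)."""
    return PHI_D / (v_a * R_OFF ** 2) * (r0 ** 2 - r_target ** 2)


def resistance_after_pulse(r0: float, v_a: float, t_w: float) -> float:
    """R(w) = sqrt(R(w0)^2 - V_A * T_w * R_OFF^2 / PHI_D), clamped to [R_ON, R_OFF]."""
    r_sq = r0 ** 2 - v_a * t_w * R_OFF ** 2 / PHI_D
    return min(max(math.sqrt(max(r_sq, 0.0)), R_ON), R_OFF)


def resistance_euler(r0: float, v_a: float, t_w: float, steps: int = 20_000) -> float:
    """Forward-Euler integration of dM/dt = -(R_OFF - R_ON) * MU_V * R_ON * V_A / (D^2 * M)."""
    dt = t_w / steps
    m = r0
    for _ in range(steps):
        m -= (R_OFF - R_ON) * MU_V * R_ON * v_a / (D ** 2 * m) * dt
    return m


t_w = pulse_width(8000.0, 6000.0, v_a=1.0)   # pulse needed to go from 8 kohm to 6 kohm at 1 V
print(t_w, resistance_after_pulse(8000.0, 1.0, t_w), resistance_euler(8000.0, 1.0, t_w))
```

With $T_w$ held fixed, a larger $V_A$ drives the resistance lower. This is the amplitude-controlled programming described in the text.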
The memristor memory circuits are shown in Figs. 2 and 3: Fig. 2 shows the circuit for writing data, and Fig. 3 shows the circuit for reading data. When writing data, a positive voltage pulse is applied to the memristor and $R(w)$ decreases, so the memristor remembers the applied voltage pulse. When reading data, different memristor resistances produce different output voltages $V_{out}$. Because $V_{out}$ corresponds one-to-one with the resistance, it correctly reflects the resistance of the memristor, i.e., the value stored in it.
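The read-out principle depends on a one-to-one mapping between resistance and $V_{out}$, which a minimal sketch can illustrate. The actual read circuit of Fig. 3 is not reproduced here. Instead, this sketch assumes a hypothetical series sense resistor `R_SENSE` that forms a voltage divider with the memristor under a small read voltage `V_READ`, so that $V_{out} = V_{read} R_S / (R_S + M)$. This output falls monotonically as $M$ rises and can therefore be inverted.

```python
R_SENSE = 1e3   # ohms, hypothetical sense resistor (assumption, not from the patent)
V_READ = 0.1    # volts, hypothetical read voltage, kept small so reading barely disturbs the state


def read_voltage(m: float) -> float:
    """Divider output for memristance m: V_out = V_READ * R_SENSE / (R_SENSE + m)."""
    return V_READ * R_SENSE / (R_SENSE + m)


def decode_resistance(v_out: float) -> float:
    """Invert the divider: m = R_SENSE * (V_READ / v_out - 1)."""
    return R_SENSE * (V_READ / v_out - 1.0)
```

In this assumed circuit, a lower resistance reads as a higher $V_{out}$. A lower resistance is what a positive write pulse produces.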
The resistance of a memristor changes with the input stimulus, and this change is non-volatile; the memristor therefore has excellent storage characteristics. Being nanoscale, it is also well suited to large-scale memory, and the memristor crossbar array is one example of using memristors to build a memory.
The structure of the memristor crossbar array is shown in Fig. 4, and the circuit represented by each circular region is shown in Fig. 5. In Fig. 5, the read/write switch is the control switch for writing and reading data. When writing data to a memristor, the switch connects to the left contact, and the corresponding row line receives the write voltage $V_{in}$. When reading data from a memristor, the switch connects to the right contact; the corresponding row line receives the read voltage $V_{in}$, and the corresponding column line outputs the voltage $V_{out}$.
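The read/write behaviour of the crossbar can be sketched as a small class. This minimal sketch assumes an idealized array with no sneak-path currents and no wire resistance. Writes follow the $R(w)$ pulse relation above, and a hypothetical series-resistor divider on each column line stands in for the read circuit of Fig. 5. `Crossbar`, `write`, `read_row`, and all parameter values are hypothetical.

```python
import math
from typing import List

# Illustrative device parameters (assumptions, not from the patent).
R_ON, R_OFF = 100.0, 16e3          # ohms
D, MU_V = 10e-9, 1e-14             # m, m^2 s^-1 V^-1
BETA = R_OFF / R_ON
PHI_D = (BETA * D) ** 2 / (2 * MU_V * (BETA - 1))
# Hypothetical read-out: each column line is sensed through its own series resistor.
R_SENSE, V_READ = 1e3, 0.1         # ohms, volts


class Crossbar:
    """Idealized memristor crossbar: one memristor at each row/column crossing.
    Sneak-path currents and wire resistance are ignored (assumption)."""

    def __init__(self, rows: int, cols: int, r_init: float = 8e3) -> None:
        self.r: List[List[float]] = [[r_init] * cols for _ in range(rows)]

    def write(self, row: int, col: int, v_a: float, t_w: float) -> None:
        """Write mode: pulse of amplitude v_a and width t_w on one cell, via the R(w) relation."""
        r_sq = self.r[row][col] ** 2 - v_a * t_w * R_OFF ** 2 / PHI_D
        r_new = math.sqrt(max(r_sq, 0.0))
        self.r[row][col] = min(max(r_new, R_ON), R_OFF)   # stay within [R_ON, R_OFF]

    def read_row(self, row: int) -> List[float]:
        """Read mode: drive one row line with V_READ and sense every column line."""
        return [V_READ * R_SENSE / (R_SENSE + m) for m in self.r[row]]


xb = Crossbar(rows=2, cols=3)
before = xb.read_row(0)
xb.write(0, 1, v_a=1.0, t_w=0.05)   # positive pulse on the cell at row 0, column 1
after = xb.read_row(0)
```

Only the addressed cell changes. A read of one row returns one voltage per column line, and the Q learning system compares these voltages across actions.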
Summary of the invention
The purpose of the present invention is to provide a Q learning system, based on a memristor crossbar array, that implements the Q learning algorithm.
To achieve this purpose, the following technical scheme is adopted: a Q learning system based on a memristor crossbar array, comprising the memristor crossbar array, characterized in that the system further comprises:
Read/write selection switch: controls the read/write operation of the memristor crossbar array;
State selection switch: the state detection module detects the current environment state $s_t$, and the state selection switch selects the corresponding row line;
Column selection switch: when a Q value, i.e., a memristor of the memristor crossbar array, needs to be updated, the column selection switch selects the column line corresponding to action $a_t$;
Delay unit: delays the voltage of the selected column line by one time step;
State detection module: detects the current environment state and stores the previous environment state. When an action needs to be selected according to the state, the state detection module detects the current environment state and provides it to the state selection switch and the state control switch. After the action is executed, the state detection module detects the environment state at that moment, stores the previous environment state, and provides the current environment state to the state selection switch and the state control switch. When the Q value is updated, the state detection module outputs the environment state of the previous moment to the state selection switch, which selects the corresponding row line.
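The delay unit and the state detection module each act as a one-step register. The following is a minimal sketch: the classes `DelayUnit` and `StateDetectionModule` and the `sense` callable are hypothetical software stand-ins for the hardware blocks.

```python
from typing import Callable, Optional


class DelayUnit:
    """One-step delay: step() returns the value that was fed in on the previous call."""

    def __init__(self) -> None:
        self._held: Optional[float] = None

    def step(self, value: float) -> Optional[float]:
        out, self._held = self._held, value
        return out


class StateDetectionModule:
    """Keeps the current environment state and the previous one."""

    def __init__(self, sense: Callable[[], int]) -> None:
        self._sense = sense                  # hypothetical sensor read-out returning a state index
        self.current: Optional[int] = None
        self.previous: Optional[int] = None

    def detect(self) -> int:
        """Sample the environment; the old current state becomes the previous state."""
        self.previous, self.current = self.current, self._sense()
        return self.current


# Usage: the previous state selects the row line when the Q value is written back.
readings = iter([4, 7])
sdm = StateDetectionModule(lambda: next(readings))
sdm.detect()
sdm.detect()
row_to_update = sdm.previous   # 4
```

The previous state is retained because the Q update happens only after the robot has moved. The delay unit keeps $V(s_t, a_t)$ available for the same reason.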
The present invention successfully applies a new circuit element, the memristor, to reinforcement learning, solving the problem that reinforcement learning requires a large amount of storage space and providing a new direction for future research on reinforcement learning.
Brief description of the drawings
Fig. 1 is a structural diagram of the physical model of the HP memristor;
Fig. 2 is the circuit diagram for writing data to a memristor;
Fig. 3 is the circuit diagram for reading data from a memristor;
Fig. 4 is a structural schematic of the memristor crossbar array;
Fig. 5 is the circuit diagram of a single memristor in the memristor crossbar array;
Fig. 6 is a structural schematic of the present invention;
Fig. 7 is a structural schematic of the robot and obstacles in the embodiment of the invention;
Fig. 8 shows the simulation results of this embodiment.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and a specific embodiment.
Q learning is a classic reinforcement learning algorithm. Its simplest form is one-step Q learning, whose Q-value update formula is
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left(r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right) \tag{6}$$
where $\alpha$ is the learning rate and $\gamma$ is the discount rate; $r_{t+1}$ is the reward received from the environment after executing action $a_t$ in state $s_t$; and $Q(s_t, a_t)$ is the state-action value function, i.e., the value obtained by executing action $a_t$ in state $s_t$.
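The following is a worked instance of the update. It uses the embodiment's values α = 0.8 and γ = 0.98, together with the hypothetical values $Q(s_t,a_t) = 0.5$, $r_{t+1} = 1$, and $\max_a Q(s_{t+1},a) = 0.6$:

$$
\begin{aligned}
Q(s_t,a_t) &\leftarrow 0.5 + 0.8\,(1 + 0.98 \times 0.6 - 0.5) \\
&= 0.5 + 0.8 \times 1.088 \\
&= 0.5 + 0.8704 = 1.3704
\end{aligned}
$$

The estimate moves 80% of the way from 0.5 toward the target $1 + 0.98 \times 0.6 = 1.588$.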
The limitation of reinforcement learning is that it requires a large amount of storage space. The memristor, a new circuit element, is nanoscale and has storage characteristics. A memristor-based crossbar array offers large storage capacity and parallel processing capability, which makes it well suited to solving this problem.
In Q learning, each executed action yields a reward from the environment. The maximum Q value of the current state, together with the reward obtained, is then used to update the Q value of the previous state and the selected action. When Q learning is implemented with a memristor crossbar array, the output voltage of each memristor represents the Q value of the corresponding state-action pair. Because a memristor retains its resistance after power is removed, it is only necessary to apply the following write voltage across the memristor:
$$V_i = \alpha\left(r + \gamma \max_{a} V(s_{t+1},a) - V(s_t,a_t)\right) \tag{7}$$
to update the resistance of the memristor corresponding to $s_t$ and $a_t$, thereby changing its output voltage $V(s_t, a_t)$, i.e., the value $Q(s_t, a_t)$.
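The write-voltage formula above maps directly onto a short function. This is a minimal sketch: `write_voltage` is a hypothetical name, and the defaults α = 0.8 and γ = 0.98 are taken from the embodiment. The final step assumes an idealized device in which a write of amplitude $V_i$ shifts the stored output voltage by exactly $V_i$; a real memristor only approximates this.

```python
from typing import Sequence


def write_voltage(v_next_row: Sequence[float], v_current: float, reward: float,
                  alpha: float = 0.8, gamma: float = 0.98) -> float:
    """V_i = alpha * (r + gamma * max_a V(s_{t+1}, a) - V(s_t, a_t))."""
    return alpha * (reward + gamma * max(v_next_row) - v_current)


# Idealized write (assumption): the stored voltage moves by exactly V_i,
# which reproduces the one-step Q update with Q = V.
v_current = 0.5
v_i = write_voltage([0.2, 0.6, 0.1], v_current, reward=1.0)
v_new = v_current + v_i
```

The same numbers as a tabular update give 0.5 + 0.8704 = 1.3704. In the sign convention assumed here, a negative $V_i$ (an overestimated entry) corresponds to a pulse of opposite polarity.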
The process by which the memristor crossbar array implements Q learning is shown in Fig. 6. In the memristor crossbar array, each row line corresponds to a state $s$ and each column line corresponds to an action $a$. The implementation proceeds as follows:
(1) The read/write selection switch selects read mode. The state detection module on the robot detects the current environment state $s_t$, and the state selection switch selects the corresponding row line;
(2) The column selection switch selects all columns, and the state control switch connects the column lines to the random selection module. The random selection module picks one column line at random according to the column-line voltages: the higher a column line's voltage, the more likely it is to be selected. The selected column line gives the action $a_t$ to execute, and the robot executes $a_t$. Alternatively, in certain preset states, the state control switch connects the column lines to the comparator module, which selects the column line with the highest voltage; the connection selection switch then connects this column line to the delay unit. Together, the state control switch, the random selection module, the comparator, and the connection selection module implement the ε-greedy strategy of reinforcement learning;
(3) The selected column line is connected to the delay unit, which delays the column-line voltage by one time step;
(4) The state detection module detects the current environment state: the robot has entered state $s_{t+1}$. The state control switch now connects the column lines to the comparator, which selects the column line with the highest voltage, and the connection selection module connects this column line to the Q-value update module. From this voltage, the output voltage of the delay unit, and the reward obtained from the environment, the Q-value update module computes the write voltage $V_i$ according to formula (7);
(5) The read/write selection switch selects write mode, and the write voltage $V_i$ is applied across the memristor for a duration $T_w$;
(6) The above process is repeated until the set number of iterations is reached.
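Steps (1)-(6) can be mirrored in software to show how the pieces interact. This minimal sketch rests on several assumptions that do not come from the patent:

- The crossbar is idealized as a table of stored column-line voltages.
- A write of $V_i$ shifts a stored voltage by exactly $V_i$.
- The environment is a hypothetical three-state corridor whose goal lies one step past state 2; reaching it pays reward 1 and restarts the robot.
- The $\gamma \max$ term is dropped when the goal is reached.

Only α = 0.8 and γ = 0.98 come from the embodiment. Here, selection is always proportional (the random selection module) so that every action keeps being tried; the comparator supplies only the maximum needed in step (4).

```python
import random
from typing import List, Tuple

ALPHA, GAMMA = 0.8, 0.98                 # values from the embodiment
N_STATES, N_ACTIONS, GOAL = 3, 2, 3      # hypothetical corridor: states 0..2, goal at 3


def env_step(s: int, a: int) -> Tuple[int, float, bool]:
    """Hypothetical environment: action 1 moves right, action 0 moves left (floor at 0).
    Reaching GOAL pays reward 1 and restarts the robot at state 0."""
    nxt = s + 1 if a == 1 else max(s - 1, 0)
    if nxt == GOAL:
        return 0, 1.0, True
    return nxt, 0.0, False


def roulette(voltages: List[float], rng: random.Random) -> int:
    """Random selection module: a column is chosen with probability proportional to its voltage."""
    return rng.choices(range(len(voltages)), weights=voltages, k=1)[0]


def comparator(voltages: List[float]) -> int:
    """Comparator module: index of the column line with the highest voltage."""
    return max(range(len(voltages)), key=lambda a: voltages[a])


def train(steps: int = 2000, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    # Idealized crossbar: v[row][col] is the stored output voltage, i.e. Q(s, a).
    v = [[0.1] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        a = roulette(v[s], rng)                        # (1)-(2) read row s_t, pick column a_t
        delayed = v[s][a]                              # (3) delay unit holds V(s_t, a_t)
        s_next, r, done = env_step(s, a)               # (4) robot acts, s_{t+1} is detected
        v_max = 0.0 if done else v[s_next][comparator(v[s_next])]
        v_i = ALPHA * (r + GAMMA * v_max - delayed)    # write voltage, formula (7)
        v[s][a] += v_i                                 # (5) idealized write: shift by V_i
        s = s_next                                     # (6) repeat
    return v


policy = [comparator(row) for row in train()]   # greedy column per state after training
```

All stored voltages stay positive, because each write is a convex mix of the old value and a non-negative target. This keeps proportional selection well defined throughout training.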
The robot obstacle-avoidance experiment requires the robot to move without collisions in an environment containing obstacles. In this experiment, Q learning based on the memristor crossbar array drives the robot's learning, and collision-free movement is ultimately achieved. The experiment uses the MobotSim software.
In Fig. 7, border circular areas is represented robot, and three sensors are arranged in the robot, respectively corresponding 3 sensors of digital 0-2, and the ultimate range that each sensor can detect is 1.5 meters, black region is represented barrier.
In this experiment, the distance from each sensor to an obstacle is divided into three segments, as follows:
[Formula images not reproduced: segmentation of the sensor distances dist0-dist2 into three ranges]
Here dist0-dist2 are the obstacle distances measured by the respective sensors. Combining the segment indices s0-s2 gives 27 cases, which serve as the 27 states of the robot's environment and are stored in a three-dimensional array state[s0, s1, s2]. On this experimental platform, a sensor returns -1 both when the robot collides with an obstacle and when the sensor detects no obstacle. The state in which the robot collides with an obstacle is therefore classified as state 0, i.e., the case where s0-s2 are all 0.
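The 27-state encoding can be made concrete by flattening state[s0, s1, s2] into a single row index, since each state drives its own row line. This is a minimal sketch: the row-major ordering and the name `state_index` are assumptions. The thresholds that map dist0-dist2 to s0-s2 are left out, so the function takes the segment indices directly.

```python
from itertools import product

N_SEGMENTS = 3  # each sensor distance falls into one of three segments


def state_index(s0: int, s1: int, s2: int) -> int:
    """Flatten state[s0, s1, s2] into a row index 0..26 (row-major order; an assumption)."""
    for s in (s0, s1, s2):
        if not 0 <= s < N_SEGMENTS:
            raise ValueError(f"segment index out of range: {s}")
    return (s0 * N_SEGMENTS + s1) * N_SEGMENTS + s2


ALL_STATES = list(product(range(N_SEGMENTS), repeat=3))  # the 27 combinations
```

Under this ordering, the collision state [0, 0, 0] maps to row 0 and state [2, 2, 2] maps to row 26.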
The reward function r is defined as:
[Formula image not reproduced: definition of the reward function r]
In this experiment, the robot performs three actions: move forward, turn left, and turn right. When the robot is in state [2, 2, 2], the action is selected at random in proportion to the Q values; in all other states, the action with the largest Q value is executed.
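The embodiment's action-selection rule can be sketched as follows. This assumes that the three actions occupy columns 0-2 in the order listed and that the stored Q values are non-negative, which proportional selection requires. `select_action`, `ACTIONS`, and `EXPLORE_STATE` are hypothetical names.

```python
import random
from typing import Sequence, Tuple

ACTIONS = ("forward", "turn_left", "turn_right")  # assumed column order
EXPLORE_STATE = (2, 2, 2)   # the state in which the action is drawn in proportion to Q


def select_action(state: Tuple[int, int, int], q_row: Sequence[float],
                  rng: random.Random) -> int:
    """Proportional (roulette) choice in EXPLORE_STATE, greedy choice everywhere else."""
    if tuple(state) == EXPLORE_STATE:
        return rng.choices(range(len(q_row)), weights=q_row, k=1)[0]
    return max(range(len(q_row)), key=lambda a: q_row[a])


rng = random.Random(0)
print(ACTIONS[select_action((0, 1, 2), [0.1, 0.7, 0.2], rng)])
```

Exploration is thus confined to a single state, while every other state acts greedily. In hardware, this corresponds to routing the column lines to either the random selection module or the comparator.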
With $\alpha = 0.8$ and $\gamma = 0.98$, 500 simulation runs of 2000 steps each were performed; the simulation results are shown in Fig. 8.

Claims (1)

  1. A Q learning system based on a memristor crossbar array, comprising the memristor crossbar array, characterized in that the system further comprises:
    Read/write selection switch: controls the read/write operation of the memristor crossbar array;
    State selection switch: the state detection module detects the current environment state $s_t$, and the state selection switch selects the corresponding row line;
    Column selection switch: when a Q value, i.e., a memristor of the memristor crossbar array, needs to be updated, the column selection switch selects the column line corresponding to action $a_t$;
    Delay unit: delays the voltage of the selected column line by one time step;
    State detection module: detects the current environment state and stores the previous environment state. When an action needs to be selected according to the state, the state detection module detects the current environment state and provides it to the state selection switch and the state control switch. After the action is executed, the state detection module detects the environment state at that moment, stores the previous environment state, and provides the current environment state to the state selection switch and the state control switch. When the Q value is updated, the state detection module outputs the environment state of the previous moment to the state selection switch, which selects the corresponding row line.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210188573.2A CN102723112B (en) 2012-06-08 2012-06-08 Q learning system based on memristor intersection array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210188573.2A CN102723112B (en) 2012-06-08 2012-06-08 Q learning system based on memristor intersection array

Publications (2)

Publication Number Publication Date
CN102723112A 2012-10-10
CN102723112B 2015-06-17

Family

ID=46948846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210188573.2A Expired - Fee Related CN102723112B (en) 2012-06-08 2012-06-08 Q learning system based on memristor intersection array

Country Status (1)

Country Link
CN (1) CN102723112B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105897585A (en) * 2016-04-11 2016-08-24 University of Electronic Science and Technology of China Q learning packet transmission method based on delay constraints for ad hoc network
CN106373611A (en) * 2016-09-29 2017-02-01 Huazhong University of Science and Technology Storage and calculation array structure and operation method thereof
CN106844223A (en) * 2016-12-20 2017-06-13 Peking University Data search system and method
CN107085429A (en) * 2017-05-23 2017-08-22 Southwest University Robot path planning system based on memristor crossbar array and Q learning
CN107533862A (en) * 2015-08-07 2018-01-02 Hewlett Packard Enterprise Development LP Crossbar array for calculating matrix multiplication
CN109214048A (en) * 2018-07-27 2019-01-15 Southwest University Hybrid CMOS-memristor fuzzy logic gate circuit and its design method
CN109255435A (en) * 2017-07-13 2019-01-22 SK Hynix Inc. Neuromorphic device with multiple synapse blocks
CN110842915A (en) * 2019-10-18 2020-02-28 Nanjing University Robot control system and method based on memristor cross array
CN113314178A (en) * 2021-05-07 2021-08-27 Zhejiang Shuren College (Zhejiang Shuren University) Memristor reading and writing method

Citations (3)

Publication number Priority date Publication date Assignee Title
US20110004579A1 (en) * 2008-03-14 2011-01-06 Greg Snider Neuromorphic Circuit
CN101951258A (en) * 2010-09-27 2011-01-19 PLA National University of Defense Technology Multi-digit variable-radix asynchronous counting circuit based on memristor
CN102354128A (en) * 2011-06-02 2012-02-15 Peking University Circuit for emotional simulation of robot and control method thereof


Non-Patent Citations (2)

Title
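胡柏林 et al.: "Simulink modeling of memristors and graphical user interface design", Journal of Southwest University (Natural Science Edition) *
高士咏 et al.: "Memristive cellular neural network and its application in image denoising and edge extraction", Journal of Southwest University (Natural Science Edition) *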

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN107533862A (en) * 2015-08-07 2018-01-02 Hewlett Packard Enterprise Development LP Crossbar array for calculating matrix multiplication
CN105897585A (en) * 2016-04-11 2016-08-24 University of Electronic Science and Technology of China Q learning packet transmission method based on delay constraints for ad hoc network
CN105897585B (en) * 2016-04-11 2019-07-23 University of Electronic Science and Technology of China Delay-constraint-based Q learning packet transmission method for self-organizing networks
CN106373611A (en) * 2016-09-29 2017-02-01 Huazhong University of Science and Technology Storage and calculation array structure and operation method thereof
CN106844223A (en) * 2016-12-20 2017-06-13 Peking University Data search system and method
CN106844223B (en) * 2016-12-20 2021-04-09 Peking University Data search system and method
CN107085429B (en) * 2017-05-23 2019-07-26 Southwest University Robot path planning system based on memristor crossbar array and Q learning
CN107085429A (en) * 2017-05-23 2017-08-22 Southwest University Robot path planning system based on memristor crossbar array and Q learning
CN109255435A (en) * 2017-07-13 2019-01-22 SK Hynix Inc. Neuromorphic device with multiple synapse blocks
US11205117B2 (en) 2017-07-13 2021-12-21 SK Hynix Inc. Neuromorphic device having a plurality of synapses blocks
CN109214048A (en) * 2018-07-27 2019-01-15 Southwest University Hybrid CMOS-memristor fuzzy logic gate circuit and its design method
CN110842915A (en) * 2019-10-18 2020-02-28 Nanjing University Robot control system and method based on memristor cross array
WO2021072817A1 (en) * 2019-10-18 2021-04-22 Nanjing University Memristor cross array-based robot control system and method
CN113314178A (en) * 2021-05-07 2021-08-27 Zhejiang Shuren College (Zhejiang Shuren University) Memristor reading and writing method

Also Published As

Publication number Publication date
CN102723112B (en) 2015-06-17

Similar Documents

Publication Publication Date Title
CN102723112A (en) Q learning system based on memristor intersection array
US20220277199A1 (en) Method for data processing in neural network system and neural network system
CN108475519A (en) Including memory and its device and method of operation
CN109657787B (en) Two-value memristor neural network chip
US10825509B2 (en) Full-rail digital read compute-in-memory circuit
CN105474323B (en) The Memory Controller and method of the voltage value of refreshing memory cells
CN108304922A (en) Computing device and computational methods for neural computing
CN106847335A (en) Convolutional calculation storage integration apparatus and method based on resistance-change memory array
KR102567160B1 (en) Neural network circuit with non-volatile synaptic array
KR102618546B1 (en) 2-dimensional array based neuromorphic processor and operating method for the same
CN106158017A (en) The method and apparatus realizing logic and arithmetical operation based on resistance computing
US11829730B2 (en) Elements for in-memory compute
US11468305B2 (en) Hybrid memory artificial neural network hardware accelerator
US11756610B2 (en) Apparatus and method with in-memory delay dependent processing
CN108701485A (en) Mitigate the technology of the offset drift of memory device
WO2023130725A1 (en) Hardware implementation method and apparatus for reservoir computing model based on random resistor array, and electronic device
CN113643175A (en) Data processing method and electronic device
US9977603B2 (en) Memory devices for detecting known initial states and related methods and electronic systems
Chen et al. Low power convolutional architectures: Three operator switching systems based on forgetting memristor bridge
US11031079B1 (en) Dynamic digital perceptron
CN110597487B (en) Matrix vector multiplication circuit and calculation method
Shaarawy et al. 2T2M memristor-based memory cell for higher stability RRAM modules
US20230186086A1 (en) Neural network device and electronic system including the same
CN114861900A (en) Weight updating method for memristor array and processing unit
US20210183437A1 (en) Perpectual digital perceptron

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150617

Termination date: 20170608