CN102723112B - Q learning system based on memristor intersection array - Google Patents


Publication number
CN102723112B
Authority
CN
China
Prior art keywords
state
memristor
switch
value
resistance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210188573.2A
Other languages
Chinese (zh)
Other versions
CN102723112A (en)
Inventor
王丽丹
何朋飞
段书凯
钟宇平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University
Priority to CN201210188573.2A
Publication of CN102723112A
Application granted
Publication of CN102723112B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a Q-learning system based on a memristor crossbar array. The system comprises the memristor crossbar array and is characterized in that it further comprises a read/write selection switch, a state selection switch, a column selection switch, a delay unit, and a state detection module. The read/write selection switch controls the read and write operations of the memristor crossbar array. The state selection switch selects the row line corresponding to the detected current environment state $s_t$. The column selection switch selects the column line corresponding to action $a_t$ when the resistance of a particular memristor in the array, i.e. a Q value, is updated. The delay unit delays the voltage of the selected column line by one time step. The state detection module detects the current environment state and stores the previous environment state. The system successfully applies a new circuit element, the memristor, to reinforcement learning, which addresses the large storage requirement of reinforcement learning and offers a new approach for future reinforcement learning research.

Description

A Q-learning system based on a memristor crossbar array
Technical field
The present invention relates to a storage matrix and an intelligent learning algorithm.
Background
Reinforcement learning is an advanced intelligent learning algorithm that has been widely used in intelligent robotics in recent years and has become a research focus. In 1954, Minsky proposed the SNARCs reinforcement learning computation model. Sutton then proposed the AHC algorithm and the TD learning algorithm in his PhD dissertation. Later, Watkins et al., building on the TD learning algorithm, proposed the Q-learning algorithm, now a classic reinforcement learning algorithm and an important milestone in the development of reinforcement learning. Since then, many researchers have applied Q-learning to mobile robot navigation and to the scheduling of robot soccer systems and intelligent I/O. Reinforcement learning has its own limitation, however: when the problem is complex, it needs a large state-action storage space. In 1971, Chua, based on the completeness of circuit theory, proposed the fourth circuit element, the memristor (L. O. Chua. Memristor-the missing circuit element. IEEE Trans. Circuit Theory. 1971, 18(5): 507-519.).
In 2008, HP Labs successfully fabricated the first physical memristor, after which memristors attracted wide attention. A memristor is nanoscale and nonlinear; its resistance changes with the input stimulus, and the change is non-volatile, so memristors are well suited to large-scale memory design. The memristor crossbar array is one kind of memristor memory; its structure is simple and it is easy to design. Hu Xiaofang et al. used a memristor crossbar array to store images (Hu Xiaofang, Duan Shukai, Wang Lidan, et al. Memristor crossbar array and its application in image processing. Science China Series F: Information Sciences. 2011, 41(4): 500-512.). Because memristors are nanoscale, a memristor crossbar array can serve as a large-scale memory, which addresses the need for a large state-action storage space when reinforcement learning is applied to complex problems. Using a memristor crossbar array to implement Q-learning is therefore a good choice.
The physical model of the HP memristor is shown in Fig. 1. The memristor consists of a doped region and an undoped region, where $w$ and $D$ denote the width of the doped region and the total width of the memristor, respectively. Its mathematical model is as follows:
$$M(t) = R_{ON}\,\frac{w(t)}{D} + R_{OFF}\left(1 - \frac{w(t)}{D}\right) \tag{1}$$
where $R_{OFF}$ and $R_{ON}$ are the resistances of the memristor when $w$ equals $0$ and $D$, respectively.
$$\frac{dw(t)}{dt} = \mu_v\,\frac{R_{ON}}{D}\,i(t) \tag{2}$$
Here $\mu_v$ is the average ion mobility, in units of cm$^2$ s$^{-1}$ V$^{-1}$.
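The two model equations above can be explored numerically. The following is a minimal sketch that integrates the boundary position $w$ with a forward-Euler step; the parameter values, the function names `memristance` and `simulate`, the clamping of $w$ to $[0, D]$, and the use of SI units (so the mobility is given in m$^2$ s$^{-1}$ V$^{-1}$ rather than cm$^2$ s$^{-1}$ V$^{-1}$) are illustrative assumptions and are not taken from the patent.

```python
# Illustrative device parameters (assumptions, not values from the patent).
R_ON = 100.0      # ohms, resistance when w = D
R_OFF = 16000.0   # ohms, resistance when w = 0
D = 10e-9         # metres, total width of the memristor
MU_V = 1e-14      # m^2 s^-1 V^-1, average ion mobility (1e-10 cm^2 s^-1 V^-1)


def memristance(w):
    """Memristance for a doped-region width w: R_ON*w/D + R_OFF*(1 - w/D)."""
    x = w / D
    return R_ON * x + R_OFF * (1.0 - x)


def simulate(v_of_t, w0, t_end, dt=1e-5):
    """Forward-Euler integration of dw/dt = mu_v * R_ON / D * i(t).

    v_of_t is a function returning the applied voltage at time t.
    Clamping w to [0, D] is an added assumption to keep the state physical.
    """
    w, t = w0, 0.0
    for _ in range(int(round(t_end / dt))):
        i = v_of_t(t) / memristance(w)      # current through the device
        w += MU_V * R_ON / D * i * dt       # boundary drift
        w = min(max(w, 0.0), D)
        t += dt
    return w
```

With these assumed values, a constant positive voltage widens the doped region and lowers the memristance, while a negative voltage does the opposite.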
$$T_w = \frac{\Phi_D}{V_A R_{OFF}^2}\left[\left(R(w_0)\right)^2 - \left(R(w)\right)^2\right] \tag{3}$$
where
$$\Phi_D = \frac{(\beta D)^2}{2\mu_v(\beta - 1)} \tag{4}$$
Here $T_w$ is the width of the voltage pulse applied across the memristor, $V_A$ is the pulse amplitude, $R(w_0)$ is the initial resistance of the memristor, $R(w)$ is the resistance the memristor reaches, and $\beta = R_{OFF}/R_{ON}$.
When $R(w_0)$ is less than or equal to $R(w)$, we obtain
$$R(w) = \sqrt{\left(R(w_0)\right)^2 - \frac{V_A T_w R_{OFF}^2}{\Phi_D}}, \qquad R_{ON} \le R(w) \le R_{OFF} \tag{5}$$
Therefore, for a fixed $T_w$, the resistance of the memristor changes with $V_A$, and this change is non-volatile.
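The closed-form relation between pulse and resistance can be checked with a few lines of code. The sketch below evaluates the resistance reached after one pulse and, conversely, the pulse width needed to reach a target resistance. The parameter values and the helper names `resistance_after_pulse` and `pulse_width_for` are hypothetical, and clamping the result to $[R_{ON}, R_{OFF}]$ is an assumption made to respect the stated bounds.

```python
import math

# Illustrative device parameters (assumptions, not values from the patent).
R_ON = 100.0      # ohms
R_OFF = 16000.0   # ohms
D = 10e-9         # metres
MU_V = 1e-14      # m^2 s^-1 V^-1
BETA = R_OFF / R_ON
PHI_D = (BETA * D) ** 2 / (2.0 * MU_V * (BETA - 1.0))


def resistance_after_pulse(r0, v_a, t_w):
    """Resistance after a pulse of amplitude v_a and width t_w, starting at r0."""
    r_sq = r0 ** 2 - v_a * t_w * R_OFF ** 2 / PHI_D
    r = math.sqrt(max(r_sq, 0.0))
    return min(max(r, R_ON), R_OFF)   # keep R_ON <= R(w) <= R_OFF


def pulse_width_for(r0, r_target, v_a):
    """Pulse width that moves the resistance from r0 to r_target at amplitude v_a."""
    return PHI_D / (v_a * R_OFF ** 2) * (r0 ** 2 - r_target ** 2)
```

The two functions are inverses of each other as long as the result stays strictly inside the resistance bounds, which is a quick sanity check on the reconstructed formulas.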
The memristor storage circuits are shown in Figs. 2 and 3: Fig. 2 shows the circuit for writing data and Fig. 3 the circuit for reading data. When data is written, a positive voltage pulse is applied to the memristor and $R(w)$ decreases, so the memristor remembers the applied voltage pulse. When data is read, different memristor resistances give different values of $V_{out}$. A correspondence exists between $V_{out}$ and the resistance of the memristor, so $V_{out}$ correctly reflects the resistance, which is the stored value.
The resistance of a memristor changes with the input stimulus, and the change is non-volatile, so the memristor has very good storage characteristics. Moreover, being nanoscale, the memristor is well suited to large-scale memory. The memristor crossbar array is one example of a memory built from memristors.
The structure of the memristor crossbar array is shown in Fig. 4; each circular region represents the circuit shown in Fig. 5. In Fig. 5, the read/write switch controls whether data is written or read. To write data to a memristor, the switch connects to the left contact and the write voltage $V_{in}$ is applied to the corresponding row line. To read the data of a memristor, the switch connects to the right contact, the read voltage $V_{in}$ is applied to the corresponding row line, and the corresponding column line outputs the voltage $V_{out}$.
Summary of the invention
The object of the invention is to provide a Q-learning system, based on a memristor crossbar array, that implements the Q-learning algorithm.
To achieve this object, the following technical solution is adopted: a Q-learning system based on a memristor crossbar array, comprising a memristor crossbar array, characterized in that the system further comprises:
Read/write selection switch: controls the read and write operations of the memristor crossbar array;
State selection switch: the state detection module detects the current environment state $s_t$, and the state selection switch selects the corresponding row line;
Column selection switch: when a Q value, i.e. the resistance of a particular memristor in the crossbar array, needs to be updated, the column selection switch selects the column line corresponding to action $a_t$;
Delay unit: delays the voltage of the selected column line by one time step;
State detection module: detects the current environment state and stores the previous environment state. When an action must be selected according to the state, the state detection module detects the current environment state and supplies it to the state selection switch and the state control switch. After the action is performed, the state selection switch detects the environment state at that moment, the previous environment state is stored, and the new environment state is supplied to the state selection switch and the state control switch. When a Q value is updated, the state detection module outputs the environment state of the previous moment and supplies it to the state selection switch, which selects the corresponding row line.
The invention successfully applies a new circuit element, the memristor, to reinforcement learning, addresses the large storage requirement of reinforcement learning, and offers a new approach for future reinforcement learning research.
Brief description of the drawings
Fig. 1 is the physical model of the HP memristor;
Fig. 2 is the circuit diagram of the memristor when writing data;
Fig. 3 is the circuit diagram of the memristor when reading data;
Fig. 4 is a structural schematic of the memristor crossbar array;
Fig. 5 is the circuit diagram of a single memristor in the memristor crossbar array;
Fig. 6 is a structural schematic of the invention;
Fig. 7 is a schematic of the robot and obstacles in the embodiment of the invention;
Fig. 8 shows the simulation results of the embodiment.
Specific embodiment
The invention is further described below with reference to the drawings and a specific embodiment.
Q-learning is a classic reinforcement learning algorithm. Its simplest form is one-step Q-learning, whose Q-value update formula is
$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha\left(r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right) \tag{6}$$
where $\alpha$ is the learning rate and $\gamma$ is the discount rate. $r_{t+1}$ is the reward obtained from the environment for performing action $a_t$ in state $s_t$. $Q(s_t, a_t)$ is the state-action value function, i.e. the value obtained by performing action $a_t$ in state $s_t$.
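As a software reference point for the one-step update, the following minimal sketch applies it to a plain table of Q values. The table layout (a list of rows, one row per state) and the function name `q_update` are assumptions for illustration; the default values of `alpha` and `gamma` are the ones used later in the experiment.

```python
def q_update(q, s, a, r, s_next, alpha=0.8, gamma=0.98):
    """Apply one one-step Q-learning update in place and return the new Q(s, a).

    q is a list of rows, q[state][action]; r is the reward for taking a in s.
    """
    best_next = max(q[s_next])                        # max over actions in s_{t+1}
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q[s][a]
```

For example, with `q = [[0.0, 0.0], [1.0, 2.0]]`, calling `q_update(q, 0, 1, 1.0, 1)` moves `q[0][1]` from 0 toward the target `1.0 + 0.98 * 2.0`.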
The limitation of reinforcement learning is that it needs a large amount of storage. The memristor, a new circuit element, is nanoscale and has storage capability; a memristor-based crossbar array offers a large storage space and parallel processing, and is well suited to this problem.
In Q-learning, each time an action is performed, a reward is obtained from the environment, and the maximum Q value of the current state's state-action pairs, together with the reward, is used to update the Q value of the previous state and the selected action. When Q-learning is implemented with a memristor crossbar array, the output voltage of each memristor represents the Q value of the corresponding state-action pair. By the storage principle of the memristor, its resistance does not change after power-off, so it suffices to apply across the memristor the write voltage
$$V_i = \alpha\left(r + \gamma \max_a V(s_{t+1}, a) - V(s_t, a_t)\right) \tag{7}$$
to update the resistance of the memristor corresponding to $s_t$ and $a_t$, thereby changing that memristor's output voltage $V(s_t, a_t)$, which is the value $Q(s_t, a_t)$.
The process by which the memristor crossbar array implements Q-learning is shown in Fig. 6. In the crossbar array, each row line corresponds to a state $s$ and each column line corresponds to an action $a$. The procedure is as follows:
(1) The read/write selection switch selects read. The state detection module on the robot detects the current environment state $s_t$, and the state selection switch selects the corresponding row line;
(2) The column selection switch selects all columns. The state control switch connects the column lines to the random selection module, which selects randomly according to the column-line voltages: a column line with a larger voltage is more likely to be selected. One column line is finally selected, the action $a_t$ to perform is obtained from it, and the robot performs $a_t$. Alternatively, in certain designated states, the state control switch can connect the column lines to the comparator module, which selects the column line with the largest voltage; the connection selection switch then connects this column line to the delay unit. The state selection switch, the random selection module, the comparator, and the connection selection module together implement the ε-greedy policy of reinforcement learning.
(3) The selected column line is connected to the delay unit, which delays the column-line voltage by one time step;
(4) The state detection module detects the current environment state as the robot enters state $s_{t+1}$. The state control switch now connects the column lines to the comparator, which selects the column line with the largest voltage. The connection selection module connects this column line to the Q-value update module, which computes the write voltage $V_i$ from this voltage, the output voltage of the delay unit, and the reward obtained from the environment, according to formula (7).
(5) The read/write selection switch selects write, and the write voltage $V_i$ is applied across the memristor for a duration $T_w$.
(6) The above process is repeated until the set number of iterations is reached.
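Steps (1) to (6) can be mirrored in software to see how the modules interact. The sketch below is a behavioural stand-in, not a circuit model: the class `CrossbarQ`, the functions `select_column` and `run`, and the `env_step` callback are hypothetical names, and the write in step (5) is idealized as shifting the stored read-out voltage by exactly $V_i$, which is an assumption that ignores the memristor's nonlinear pulse response.

```python
import random


class CrossbarQ:
    """Idealized crossbar: one read-out voltage per (row line, column line)."""

    def __init__(self, n_states, n_actions, v_init=1.0):
        self.v = [[v_init] * n_actions for _ in range(n_states)]

    def read_row(self, s):
        """Steps (1)-(2): select row s in read mode and read all columns."""
        return list(self.v[s])

    def write(self, s, a, v_i):
        """Step (5): idealized write; assumes the read-out shifts by v_i."""
        self.v[s][a] += v_i


def select_column(voltages, greedy, rng):
    """Comparator (greedy) or random selection weighted by column voltage."""
    if greedy:
        return max(range(len(voltages)), key=voltages.__getitem__)
    weights = [max(v, 0.0) for v in voltages]
    if sum(weights) == 0.0:
        return rng.randrange(len(voltages))
    return rng.choices(range(len(voltages)), weights=weights)[0]


def run(env_step, array, s0, n_steps, alpha=0.8, gamma=0.98, greedy=False, seed=0):
    """env_step(s, a) is a hypothetical callback returning (next_state, reward)."""
    rng = random.Random(seed)
    s = s0
    for _ in range(n_steps):                      # step (6): repeat
        row = array.read_row(s)
        a = select_column(row, greedy, rng)
        v_delayed = row[a]                        # step (3): delay unit holds V(s_t, a_t)
        s_next, r = env_step(s, a)                # robot acts, then s_{t+1} is detected
        v_max = max(array.read_row(s_next))       # step (4): comparator output
        v_i = alpha * (r + gamma * v_max - v_delayed)   # write voltage, formula (7)
        array.write(s, a, v_i)
        s = s_next
    return array
```

Keeping `v_delayed` as a separate variable plays the role of the delay unit: the voltage read before the action is still available one time step later, when the comparator output for $s_{t+1}$ arrives.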
The robot obstacle-avoidance experiment aims to have the robot walk without collisions in an environment containing obstacles. The experiment uses Q-learning based on the memristor crossbar array for the robot's learning, ultimately achieving collision-free walking, and is run in the MobotSim software.
In Fig. 7, the circular region represents the robot. The robot has three sensors, numbered 0-2, and each sensor can detect obstacles up to a maximum distance of 1.5 m. Black regions represent obstacles.
In this experiment, the distance to the obstacle detected by each sensor is divided into 3 sections, as follows:
where dist0-dist2 denote the distances to the obstacle detected by the three sensors. Combining s0-s2 yields 27 cases, which are taken as the 27 states of the robot's environment and are stored in a three-dimensional array state[s0, s1, s2]. Because on this experimental platform the sensor returns -1 both when the robot collides with an obstacle and when the sensor cannot detect an obstacle, the state in which the robot collides with an obstacle is classified as state 0, i.e. the case where s0-s2 are all 0.
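The mapping from three sensor readings to one of the 27 states can be sketched as follows. The section boundaries are not reproduced in this text, so the thresholds `NEAR` and `FAR` below are purely hypothetical, as are the function names; treating every -1 reading as section 0 follows the collision convention described above and is also an assumption about readings where no obstacle is detected.

```python
# Hypothetical section boundaries in metres (the actual thresholds are not
# given in this text); the sensor range is 1.5 m.
NEAR, FAR = 0.5, 1.0


def section(dist):
    """Map one sensor distance to a section index 0, 1 or 2."""
    if dist < 0:          # the simulator returns -1; assigned to section 0 here
        return 0
    if dist < NEAR:
        return 0
    if dist < FAR:
        return 1
    return 2


def encode_state(dist0, dist1, dist2):
    """Return the triple (s0, s1, s2) and a row index in 0..26 for the crossbar."""
    s = (section(dist0), section(dist1), section(dist2))
    row = s[0] * 9 + s[1] * 3 + s[2]
    return s, row
```

The base-3 row index is one convenient way to assign each of the 27 states to a row line; any one-to-one assignment would serve equally well.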
The reward function r is defined as:
In this experiment, the robot performs three actions: move forward, turn left, and turn right. When the robot is in state [2,2,2], the action is chosen randomly in proportion to the Q values; in all other states, the action with the largest Q value is performed.
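The state-dependent action rule can be written compactly. In this sketch, the action names, the function `choose_action`, the clipping of negative Q values to zero weight, and the uniform fallback when all weights are zero are illustrative assumptions; only the rule itself (proportional random choice in state [2,2,2], greedy choice elsewhere) comes from the text.

```python
import random

ACTIONS = ("forward", "turn_left", "turn_right")


def choose_action(state, q_row, rng=random):
    """Return the index of the action to perform for the given state.

    state is the triple (s0, s1, s2); q_row holds the Q values of the
    three actions for that state.
    """
    if tuple(state) == (2, 2, 2):
        # Random choice with probability proportional to the Q values.
        weights = [max(q, 0.0) for q in q_row]
        if sum(weights) > 0.0:
            return rng.choices(range(len(q_row)), weights=weights)[0]
        return rng.randrange(len(q_row))
    # All other states: perform the action with the largest Q value.
    return max(range(len(q_row)), key=q_row.__getitem__)
```

For example, `ACTIONS[choose_action((0, 1, 2), [0.1, 0.9, 0.3])]` selects the action with the largest Q value, here turning left.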
With α = 0.8 and γ = 0.98, the number of simulations is set to 500, each of 2000 steps. The simulation results are shown in Fig. 8.

Claims (1)

1. A Q-learning system based on a memristor crossbar array, comprising a memristor crossbar array, characterized in that the system further comprises:
Read/write selection switch: controls the read and write operations of the memristor crossbar array;
State selection switch: the state detection module detects the current environment state $s_t$, and the state selection switch selects the corresponding row line;
Column selection switch: when a Q value, i.e. the resistance of a particular memristor in the crossbar array, needs to be updated, the column selection switch selects the column line corresponding to action $a_t$;
Delay unit: delays the voltage of the selected column line by one time step;
State detection module: detects the current environment state and stores the previous environment state; when an action must be selected according to the state, the state detection module detects the current environment state and supplies it to the state selection switch and the state control switch; after the action is performed, the state selection switch detects the environment state at that moment, the previous environment state is stored, and the new environment state is supplied to the state selection switch and the state control switch; when a Q value is updated, the state detection module outputs the environment state of the previous moment and supplies it to the state selection switch, which selects the corresponding row line; applying across the memristor the write voltage
$$V_i = \alpha\left(r + \gamma \max_a V(s_{t+1}, a) - V(s_t, a_t)\right)$$
updates the resistance of the memristor corresponding to $s_t$ and $a_t$, thereby changing that memristor's output voltage $V(s_t, a_t)$, which is also the value $Q(s_t, a_t)$; here the value of $V(s_t, a_t)$ equals the value of $Q(s_t, a_t)$;
where α is the learning rate, r is the reward function, and γ is the discount rate.
CN201210188573.2A 2012-06-08 2012-06-08 Q learning system based on memristor intersection array Expired - Fee Related CN102723112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210188573.2A CN102723112B (en) 2012-06-08 2012-06-08 Q learning system based on memristor intersection array

Publications (2)

Publication Number Publication Date
CN102723112A CN102723112A (en) 2012-10-10
CN102723112B true CN102723112B (en) 2015-06-17

Family

ID=46948846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210188573.2A Expired - Fee Related CN102723112B (en) 2012-06-08 2012-06-08 Q learning system based on memristor intersection array

Country Status (1)

Country Link
CN (1) CN102723112B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150617

Termination date: 20170608
