WO2017124237A1 - Memory device and data processing method based on multi-layer RRAM cross array - Google Patents

Memory device and data processing method based on multi-layer RRAM cross array

Info

Publication number
WO2017124237A1
WO2017124237A1 PCT/CN2016/071254 CN2016071254W WO2017124237A1 WO 2017124237 A1 WO2017124237 A1 WO 2017124237A1 CN 2016071254 W CN2016071254 W CN 2016071254W WO 2017124237 A1 WO2017124237 A1 WO 2017124237A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
cross array
rram cross
rram
boolean
Prior art date
Application number
PCT/CN2016/071254
Other languages
English (en)
French (fr)
Inventor
余浩
王雨豪
赵俊峰
杨伟
肖世海
倪磊滨
Original Assignee
华为技术有限公司
南洋理工大学
Application filed by 华为技术有限公司 and 南洋理工大学
Priority to JP2018537499A (patent JP6702596B2, ja)
Priority to PCT/CN2016/071254 (patent WO2017124237A1, zh)
Priority to SG11201805489YA (patent SG11201805489YA, en)
Priority to CN201680058624.6A (patent CN108475522B, zh)
Priority to EP16885510.4A (patent EP3389051B1, en)
Publication of WO2017124237A1 (patent WO2017124237A1, zh)
Priority to US16/037,767 (patent US10459724B2, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30025 Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06J HYBRID COMPUTING ARRANGEMENTS
    • G06J1/00 Hybrid computing arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021 Auxiliary circuits
    • G11C13/003 Cell access
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C5/00 Details of stores covered by group G11C11/00
    • G11C5/02 Disposition of storage elements, e.g. in the form of a matrix array
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1006 Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021 Auxiliary circuits
    • G11C13/0023 Address circuits or decoders
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021 Auxiliary circuits
    • G11C13/004 Reading or sensing circuits or methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C2213/00 Indexing scheme relating to G11C13/00 for features not covered by this group
    • G11C2213/70 Resistive array aspects
    • G11C2213/71 Three dimensional array
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C2213/00 Indexing scheme relating to G11C13/00 for features not covered by this group
    • G11C2213/70 Resistive array aspects
    • G11C2213/77 Array wherein the memory element being directly connected to the bit lines and word lines without any access device being used

Definitions

  • Embodiments of the present invention relate to the field of storage, and more particularly, to a memory device and a data processing method based on a multi-layer RRAM cross array.
  • Memory wall: As technology advances, processor performance continues to increase, but memory performance grows slowly. Memory performance therefore becomes the bottleneck (the "short board") for overall system performance, the so-called memory wall. Specifically, the processor and the memory communicate through an input/output (I/O) bus. Due to hardware limitations, the bandwidth of the I/O bus is limited, so the processor spends most of its time idle, waiting for the memory.
  • I/O input/output
  • Memory wall solution: Logic units (or logic circuits) can be added to the memory so that data is computed directly in memory, that is, in-memory computing. Taking the sum of 10 numbers as an example, if the memory can only store data, the processor must read the 10 numbers from the memory over the I/O bus and then sum them; if the memory has a logic operation function, the memory can compute the sum of the 10 numbers directly and send only the result to the processor over the I/O bus. Comparing the two implementations, the memory with a logic operation function reduces the traffic on the I/O bus by 90%, effectively easing the memory wall.
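  • The 90% figure follows from counting the values that cross the I/O bus. A minimal sketch of that arithmetic, assuming every operand and the result are the same size (the function name `bus_transfers` is hypothetical and used only for illustration):

```python
def bus_transfers(num_operands: int, in_memory: bool) -> int:
    """Number of values that cross the I/O bus to obtain one sum."""
    # Without in-memory logic every operand is shipped to the processor;
    # with it, only the single result crosses the bus.
    return 1 if in_memory else num_operands


baseline = bus_transfers(10, in_memory=False)   # processor reads 10 numbers
offloaded = bus_transfers(10, in_memory=True)   # memory returns 1 result
reduction = 1 - offloaded / baseline

print(f"bus traffic reduced by {reduction:.0%}")
```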
  • Non-volatile memory can be used instead of volatile memory. Because the memory is non-volatile, the data in it is not lost when power is removed. Therefore, during data processing, the entire memory does not need to be powered at all times, which can effectively reduce power consumption.
  • the core device of the RRAM is a memristor (or the resistor in the RRAM is a memristor), which is non-volatile and can reduce power consumption.
  • the RRAM has a cross array structure (hence, the RRAM is generally referred to as an RRAM cross array, or an RRAM cross, or an RRAM crossbar).
  • the RRAM crossbar can have one layer or multiple layers. In a multi-layer RRAM crossbar, the output of the upper layer can be used as the input to the next layer, and each layer of the RRAM crossbar is provided with a resistor array.
  • If each resistor in the RRAM is viewed as a neuron in a neural network, the RRAM crossbar closely resembles the structure of a neural network. Such a structure is well suited to logic operations. Specifically, various logic operations can be implemented by configuring the number of layers of the RRAM crossbar, the size of the resistor array in each layer of the RRAM crossbar, and the resistance of each resistor.
  • FIG. 2 shows the circuit structure of a conventional RRAM crossbar capable of logical operations.
  • the resistance of the resistor in the RRAM crossbar can be stored in the RRAM crossbar.
  • The resistor G ij in FIG. 2 corresponds to the element in the i-th row and j-th column of Φ, and the resistance of G ij represents the value of that element.
  • The elements of the matrix X are first converted from the digital quantities x 1 ... x n into analog quantities (analog voltage signals) and input into the rows of the RRAM. The point multiplication of the matrix elements is then carried out through the voltage, current, and resistance relationships between the rows and columns of the RRAM crossbar, giving the calculation results V 1 to V m . Finally, the calculation results (analog voltages) are converted into digital quantities (for example, y1, y2, etc.) and output.
  • DACs digital to analog converters
  • ADCs analog to digital converters
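  • The analog computation performed by the conventional crossbar can be pictured with an idealized model. The sketch below is an illustration under stated assumptions, not the circuit of FIG. 2: it assumes ideal wires, a small sense resistor `r_s` on each column that does not disturb the column current, and conductances `g[i][j]` standing in for the resistors G ij. All names and numeric values are hypothetical.

```python
def crossbar_column_voltages(v_in, g, r_s):
    """Ideal crossbar: v_in[i] drives row i, g[i][j] is the conductance at
    row i / column j, and each column ends in a small sense resistor r_s."""
    n_rows, n_cols = len(g), len(g[0])
    out = []
    for j in range(n_cols):
        # Kirchhoff's current law: the currents from all rows add on the column.
        i_col = sum(v_in[i] * g[i][j] for i in range(n_rows))
        # Ohm's law across the sense resistor gives the column voltage.
        out.append(i_col * r_s)
    return out


# Illustrative values only: two rows, two columns.
v_in = [1.0, 0.5]                      # analog row voltages (after the DACs)
g = [[1e-3, 2e-3],
     [4e-3, 0.0]]                      # conductances in siemens
v_out = crossbar_column_voltages(v_in, g, r_s=10.0)
print(v_out)                           # analog column voltages (before the ADCs)
```

  • Each column voltage is proportional to the dot product of the input vector with that column of conductances, which is why the precision of the result depends directly on the precision of every analog resistance value.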
  • the resistance of the resistor in the RRAM is determined by the integral of the current flowing through the resistor.
  • the characteristics of the resistor elements in the RRAM are not constant, and some fluctuations may occur, resulting in different resistance values obtained by the same current integration operation.
  • Due to fluctuations in the characteristics of the device, the resistor exhibits different state transition curves (from the R on state, also referred to as the low-resistance state or on-state, through the intermediate state, to the R off state, also known as the high-resistance state or off-state).
  • This inaccuracy is particularly prominent in the intermediate state of the resistor.
  • the application provides a memory device to improve the accuracy of the logic operation of the conventional RRAM crossbar.
  • a memory device including a control bus and a plurality of memory units, wherein the plurality of memory units are connected by the control bus, and each of the plurality of memory units includes a control module and a logic module
  • the control module is connected to the processor through the control bus, and receives and parses instructions of the processor through the control bus, where the instructions of the processor include logic operation instructions
  • the logic module is connected to the control module and includes at least one layer of RRAM cross array (that is, RRAM crossbar), and the resistance of each resistor in the at least one layer of RRAM cross array is R on or R off .
  • R on represents a Boolean value of 1
  • R off represents a Boolean value of 0.
  • the control module performs a Boolean operation through the at least one layer of the RRAM cross array according to the logic operation instruction.
  • the logic operation instruction is used to instruct the logic module to perform a point multiplication (dot product) of a Boolean vector A and a Boolean vector B, where A and B are both N-dimensional Boolean vectors and N is a positive integer not less than 2,
  • the logic module includes a multi-layer RRAM cross array, and the first layer RRAM cross array in the multi-layer RRAM cross array includes a resistor array of N rows × N columns. The input of the resistors in each row of the first layer RRAM cross array is connected to a word line, and the output of the resistors in each column of the first layer RRAM cross array is connected to a bit line. The N word lines of the first layer RRAM cross array are connected to the control module, and the N bit lines of the first layer RRAM cross array are respectively connected to the other RRAM cross arrays in the multi-layer RRAM cross array through N comparison circuits; the first layer RRAM cross array generates N current signals on the N bit lines according to
  • the point multiplication of the Boolean vector is implemented by a multi-layer RRAM cross array.
  • the j-th comparison circuit of the N comparison circuits includes a fixed-value resistor R s and a comparator; one end of the resistor R s is connected to the j-th bit line of the N bit lines and to the comparator, and the other end of the resistor R s is connected to ground; the voltage threshold of the j-th comparison circuit is V r *g on *R s *(2j+1)/2, where V r represents the voltage value corresponding to Boolean value 1, and g on represents the reciprocal of R on .
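  • One way to read the threshold V r *g on *R s *(2j+1)/2 is sketched below. The derivation is an interpretation under stated assumptions rather than text from the source: it assumes R s ≪ R on ≪ R off (so off-state resistors contribute negligible current and R s does not disturb the column), that j is counted from 0, and that k denotes the number of rows in which both the input bit and the stored bit are 1.

```latex
% Column voltage across R_s when k rows contribute an on-state current
V_{\mathrm{col}} \;\approx\; R_s \sum_{i=0}^{N-1} V_i\, g_i \;\approx\; k\, V_r\, g_{on}\, R_s
\qquad (R_s \ll R_{on} \ll R_{off})

% Threshold of the j-th comparison circuit, as given in the text
V_{\mathrm{th},j} \;=\; V_r\, g_{on}\, R_s\, \frac{2j+1}{2} \;=\; \left(j + \tfrac{1}{2}\right) V_r\, g_{on}\, R_s

% The j-th comparator therefore fires exactly when at least j+1 rows contribute
V_{\mathrm{col}} > V_{\mathrm{th},j} \iff k > j + \tfrac{1}{2} \iff k \ge j + 1
```

  • Under these assumptions each threshold sits halfway between two adjacent achievable column voltages, so the N comparator outputs form a thermometer code of the dot product k.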
  • the logic module includes at least three layers of RRAM cross arrays, the other RRAM cross arrays including a second layer RRAM cross array and a third layer RRAM cross array; the second layer RRAM cross array includes a resistor array of (2N-1) rows × N columns, and the input of the resistors in each row of the second layer RRAM cross array is connected to a word line
  • the output of the resistors in each column of the second layer RRAM cross array is connected to a bit line, and the word lines of the second layer RRAM cross array are connected to the output ends of the bit lines of the first layer RRAM cross array
  • the second layer RRAM cross array receives the voltage signal corresponding to the first calculation result from the output ends of the bit lines of the first layer RRAM cross array through the 2N-1 word lines, and performs a logic operation according to the voltage signal corresponding to the first calculation result and the resistances of the resistors in the second layer RRAM cross array:
  • the third layer RRAM cross array includes a resistor array of N rows × n columns, the input of the resistors in each row of the third layer RRAM cross array is connected to a word line, and the output of the resistors in each column of the third layer RRAM cross array is connected to a bit line, where n is greater than or equal to the minimum number of bits required to represent the integer N in binary; the third layer RRAM cross array receives the voltage signal corresponding to the intermediate calculation result from the N bit lines of the second layer RRAM cross array through the N word lines of the third layer RRAM cross array
  • the j-th word line of the third layer RRAM cross array is connected to the j-th bit line of the second layer RRAM cross array, and the resistances of the resistors in the j-th row of the third layer RRAM cross array correspond to the binary representation of the integer j+1.
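  • The role of the third layer can be illustrated with a small functional model. This is a sketch under assumptions that are not stated in the text above: it assumes the second layer drives at most one of the N word lines high (a one-hot selection), that j is counted from 0, and that the bits in each row are stored most significant bit first. The function names are hypothetical.

```python
def third_layer_rows(n_dim):
    """Row j holds the binary representation of j + 1 (MSB first, assumed)."""
    n_bits = n_dim.bit_length()   # minimum number of bits needed to write N
    return [[((j + 1) >> (n_bits - 1 - b)) & 1 for b in range(n_bits)]
            for j in range(n_dim)]


def third_layer_output(one_hot, rows):
    """A column reads 1 if the selected row stores a 1 (R on) in that column."""
    n_bits = len(rows[0])
    return [int(any(sel and row[b] for sel, row in zip(one_hot, rows)))
            for b in range(n_bits)]


rows = third_layer_rows(8)                 # N = 8 gives n = 4 columns
one_hot = [0, 0, 0, 0, 1, 0, 0, 0]         # word line j = 4 is high
bits = third_layer_output(one_hot, rows)   # row 4 stores 5 = 0b0101
print(bits)
```

  • With no word line selected the output is all zeros, which in this model corresponds to a dot product of 0.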
  • the Boolean vector A is any one of the row vectors of the Boolean matrix Φ
  • the Boolean vector B is any one of the column vectors of the Boolean matrix X
  • each of the plurality of logic modules in the memory device is responsible for the point multiplication of some of the row vectors of the Boolean matrix Φ with some of the column vectors of the Boolean matrix X
  • the plurality of logic modules collectively implement the Boolean matrix multiplication of the Boolean matrix Φ and the Boolean matrix X.
  • the instructions of the processor further include data read/write instructions, and each of the memory units further includes a storage module connected to the control module; the control module reads and writes data through the storage module according to the data read/write instructions.
  • a data processing method based on a multi-layer RRAM cross array, wherein the resistance of each resistor in the multi-layer RRAM cross array is R on or R off , R on represents Boolean value 1, and R off represents Boolean value 0.
  • the multi-layer RRAM cross array is used for point multiplication of Boolean vector A and Boolean vector B, where A and B are both N-dimensional Boolean vectors and N is a positive integer not less than 2
  • the first layer RRAM cross array in the multi-layer RRAM cross array includes a resistor array of N rows × N columns, the input of the resistors in each row of the first layer RRAM cross array is connected to a word line, the output of the resistors in each column of the first layer RRAM cross array is connected to a bit line, and the N bit lines of the first layer RRAM cross array are respectively connected to the other RRAM cross arrays in the multi-layer RRAM cross array through N comparison circuits;
  • the method includes: the first layer RRAM cross array generates N current signals on the N bit lines according to the voltage signals input on the N word lines of the first layer RRAM cross array and the resistances of the resistors in the first layer RRAM cross array, wherein the N The voltage value of the
  • the logic module comprises at least three layers of RRAM cross arrays, the other RRAM cross arrays comprising a second layer RRAM cross array and a third layer RRAM cross array
  • the second layer RRAM cross array includes a resistor array of (2N-1) rows × N columns, the input of the resistors in each row of the second layer RRAM cross array is connected to a word line, the output of the resistors in each column of the second layer RRAM cross array is connected to a bit line, and the word lines of the second layer RRAM cross array are connected to the output ends of the bit lines of the first layer RRAM cross array;
  • the third layer RRAM cross array includes a resistor array of N rows × n columns, the input of the resistors in each row of the third layer RRAM cross array is connected to a word line, and the output of the resistors in each column of the third layer RRAM cross array is connected to a bit line, where n is greater than or equal to the minimum number of bits required to represent the
  • the j-th comparison circuit of the N comparison circuits includes a fixed-value resistor R s and a comparator; one end of the resistor R s is connected to the j-th bit line of the N bit lines and to the comparator, and the other end of the resistor R s is connected to ground; the voltage threshold of the j-th comparison circuit is V r *g on *R s *(2j+1)/2, where V r represents the voltage value corresponding to Boolean value 1, and g on represents the reciprocal of R on .
  • the storage module is an RRAM cross array based storage module.
  • RRAM cross array based storage modules reduce memory power consumption.
  • the control module includes: an instruction queue for buffering instructions of the processor; and an instruction decoder for parsing instructions of the processor and performing the corresponding operations according to the parsed instructions.
  • control module includes a static random access memory (SRAM) for storing result data obtained from the logic module and/or the storage module.
  • SRAM static random access memory
  • the control module is further configured to send the result data to the processor.
  • control module is a Complementary Metal Oxide Semiconductor (CMOS) based control module.
  • CMOS Complementary Metal Oxide Semiconductor
  • This application improves the accuracy of the logic operation of the RRAM crossbar.
  • Figure 1 is a schematic diagram of the physical structure of the RRAM crossbar.
  • FIG. 2 is a structural diagram of a conventional RRAM crossbar based logic circuit.
  • Figure 3 is a graph showing the resistance characteristics of the resistors in the RRAM crossbar.
  • FIG. 4 is a schematic structural diagram of a memory device according to an embodiment of the present invention.
  • FIG. 5 is an exemplary diagram of a bus design manner of the control module 45.
  • Figure 6 is a schematic diagram showing the circuit structure of one layer of the RRAM crossbar.
  • Figure 7 is a circuit diagram of a first layer RRAM crossbar of a three layer RRAM crossbar for implementing Boolean matrix multiplication.
  • Figure 8 is a circuit diagram of a second layer RRAM crossbar of a three layer RRAM crossbar for implementing Boolean matrix multiplication.
  • Figure 9 is a circuit diagram of a third layer RRAM crossbar of a three layer RRAM crossbar for implementing Boolean matrix multiplication.
  • FIG. 4 is a schematic structural diagram of a memory device according to an embodiment of the present invention.
  • the memory device 40 is generally H-tree structured.
  • the storage module 43 (also called a data module, data array, etc.)
  • the logic module 44 (also called a logic circuit, logic operation circuit, in-memory logic, etc.)
  • both the storage module 43 and the logic module 44 may be RRAM crossbars.
  • the storage module 43 may be an RRAM crossbar having a single-layer structure
  • the logic module 44 may be an RRAM crossbar having a multi-layer structure.
  • the type of the storage module 43 is not specifically limited in the embodiment of the present invention, and other types of storage media may also be used.
  • the embodiment of the present invention does not specifically limit the number of layers of the RRAM crossbar.
  • the storage module 43 can also be designed as a multi-layer RRAM crossbar, and the logic module 44 can also be designed as a single-layer RRAM crossbar (a single-layer RRAM crossbar can also implement simple logic operations).
  • the memory device 40 mainly includes four parts: a block decoder 41, a storage module 43, a logic module 44, and a control module 45.
  • Control module 45 can be a CMOS based control module.
  • the storage module 43, the logic module 44 and the control module 45 may form a memory unit 42 (or a Data/Logic pair).
  • The control module 45 is connected to the storage module 43, the logic module 44, and the block decoder 41, respectively, and is connected to the processor 47 (which may be, for example, a CPU) through the block decoder 41.
  • Control module 45 can receive and parse the instructions of processor 47, and can also be responsible for the exchange or transfer of data between storage module 43 and logic module 44.
  • the block decoder 41 can transfer the instructions issued by the processor 47 to the control module 45 of the memory unit 42 corresponding to the instruction.
  • the address lookup function of memory unit 42 can be integrated into block decoder 41, with the remaining control functions integrated into control module 45.
  • control module 45 can be connected to the block decoder 41, but the block decoder 41 can only be responsible for transmitting the instructions issued by the processor 47 to the control module 45 of the corresponding memory unit 42.
  • control module 45 can be considered to be the subject of receiving and parsing processor instructions.
  • the internal structure of the control module and the processing method of the instructions are described in detail below with reference to FIG. 5 as an example.
  • FIG. 5 is an exemplary diagram of a bus design manner of a control module.
  • the control module 45 can include an instruction queue 451, an instruction decoder 452, an address decoder 453, and an SRAM array 454, the manner of which can be connected as shown in FIG.
  • an instruction queue 451 can be designed within the control module 45 for caching instructions issued by the processor 47 to reduce the latency of the processor 47.
  • the instruction decoder 452 parses the instructions to be executed (for example, instructions such as data reading, data storage, and memory calculation, which will be described in detail later), and then performs corresponding operations.
  • the address decoder 453 can decompose the address information in the instruction into row/column information of the storage module 43 or row/column information of the logic module 44.
  • the SRAM array 454 can be used to temporarily store data read in the memory module 43 or logic module 44 and write the data to the memory module 43 or logic module 44 in accordance with the instructions, or to transfer the data back to the external processor 47.
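  • The division of work inside the control module can be pictured with a toy model. The sketch below is illustrative only: the `Instruction` fields, the flat row-major address encoding, and the class and method names are assumptions made for the example, not the instruction format of the source (which this section does not specify).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str    # e.g. "SW" or "ST", the opcodes named in the text
    target: str    # "storage" or "logic" (hypothetical field)
    address: int   # flat cell address (hypothetical encoding)


def decode_address(address, n_cols):
    """Address decoder: split a flat address into (row, column).
    The row-major layout is an assumption made for this sketch."""
    return divmod(address, n_cols)


class ControlModule:
    def __init__(self, n_cols):
        self.queue = deque()   # plays the role of instruction queue 451
        self.n_cols = n_cols

    def receive(self, instr):
        # Buffering lets the processor issue instructions without waiting.
        self.queue.append(instr)

    def step(self):
        instr = self.queue.popleft()   # oldest instruction first
        row, col = decode_address(instr.address, self.n_cols)
        return (instr.opcode, instr.target, row, col)


cm = ControlModule(n_cols=8)
cm.receive(Instruction("SW", "logic", 19))
decoded = cm.step()
print(decoded)
```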
  • the format and type of the instructions issued by the processor 47 to the memory device 40 are not specifically limited in the embodiment of the present invention.
  • the type of instruction that the processor 47 sends to the memory device 40, or the type of instruction that the control module 45 needs to parse may include the four types of instructions shown in Table 1.
  • First, the logic module 44 is configured: the resistances of the resistors in the logic module 44 are set so that it implements specific operational logic.
  • The logic module 44 is then provided with an input signal, which carries the data on which the logic operation is to be performed.
  • the memory calculation can then be performed in logic module 44 based on the input signal and the already configured operational logic.
  • the memory calculation process is described in detail below in conjunction with the instructions in Table 1.
  • the processor 47 can issue the following instructions to the memory device 40.
  • Instruction 1: an SW instruction writes data from the processor 47 or the storage module 43 into the logic module 44 to configure the resistances of the RRAM in the logic module 44 so that it implements specific logic, such as summation, OR, or multiplication.
  • Instruction 2: an SW instruction writes data from the processor 47 or the storage module 43 into the input of the logic module 44 (the voltages input on the word lines in FIG. 6).
  • The input data can then be used to perform specific logic operations in the logic module based on the previously configured operational logic.
  • For example, if the logic module 44 implements the summation logic a+b, b can be stored in the logic module by instruction 1, and a is then input through instruction 2 and summed with b.
  • Instruction 3: an ST instruction turns on all row/column switches of the logic module 44 so that current flows through all of the rows/columns of the logic module 44.
  • Instruction 4: when complex logic operations are implemented in the RRAM crossbar, the RRAM crossbar in the logic module 44 needs multiple layers, and its calculation may take some time. Therefore, the ST command can instruct the control module 45 to wait for the memory calculation of the logic module 44 to complete before executing subsequent instructions.
  • Instruction 5: an SW instruction, after the memory calculation is completed, writes the data calculated by the logic module 44 back to the storage module 43.
  • How the control module 45 performs memory calculations according to instructions is described in detail above. It should be noted that the control module 45 can also perform normal data reading and writing according to the instructions. This process is similar to the prior art and is not described in detail here.
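  • The five-step instruction sequence can be traced with a toy simulation of the a+b example. Everything in the sketch is hypothetical scaffolding: `ToyLogicModule` only stands in for logic module 44, the dictionary stands in for storage module 43, and ordinary integer addition replaces the crossbar computation.

```python
class ToyLogicModule:
    """Stand-in for logic module 44 configured for a + b (hypothetical model)."""

    def __init__(self):
        self.stored = None    # operand written when the logic is configured
        self.inputs = None    # operand written to the inputs
        self.result = None
        self.busy = False

    def start(self):
        self.busy = True
        self.result = self.stored + self.inputs   # the "in-memory" operation
        self.busy = False


def run_in_memory_sum(storage, logic):
    trace = []
    logic.stored = storage["b"]            # instruction 1: configure with b
    trace.append("SW: configure logic with b")
    logic.inputs = storage["a"]            # instruction 2: write input a
    trace.append("SW: write input a")
    logic.start()                          # instruction 3: start computation
    trace.append("ST: start computation")
    while logic.busy:                      # instruction 4: wait for completion
        pass
    trace.append("wait: computation finished")
    storage["sum"] = logic.result          # instruction 5: write result back
    trace.append("SW: write result back")
    return trace


storage = {"a": 3, "b": 4}
trace = run_in_memory_sum(storage, ToyLogicModule())
print(storage["sum"], trace)
```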
  • the logic operation process of the logic module 44 based on the RRAM crossbar is briefly described below with reference to FIG. 6 as an example.
  • a comparison circuit is disposed at the bottom of each bit line.
  • the comparison circuit can employ a Sense Amplifier (SA).
  • SA Sense Amplifier
  • The SA includes a fixed resistor R s with a small resistance (e.g., R s << R on << R off ) and an operational amplifier. It converts the current signal on the column (bit line) into a voltage and compares this voltage with the voltage threshold of the column to obtain the calculation result for this column. For each column, the formula is as follows:
  • FIG. 6 shows only the circuit structure of one layer of RRAM crossbar. If the logic module 44 includes a multi-layer RRAM crossbar, the column output of the previous layer can be used as the input of the next layer. The structure of each layer can be the same or different. The column output of the last layer can be used as the final output of the logic module 44.
  • To implement specific logic, the resistances of the resistors in the RRAM crossbar and the voltage threshold of each column can be configured in the following steps (in other words, the resistance values in the RRAM crossbar and the voltage threshold of each column determine what logic the RRAM crossbar implements):
  • Step 1: Using software (MATLAB, Octave, etc.), determine the number of layers of RRAM crossbar required to implement the specific logic and the number of rows and columns in each layer.
  • Step 2: Calculate the resistance of each resistor in each layer of the RRAM crossbar and the voltage thresholds of the comparison circuits.
  • Step 3: Using instructions, write the calculated resistances into the corresponding resistors of the RRAM crossbar, and set the voltage thresholds of the comparison circuits.
  • R off may represent a Boolean value "0" (or a digital "0")
  • R on may represent a Boolean value "1" (or a digital "1").
  • the RRAM crossbar's row input interface can be designed as a pure digital interface without AD conversion.
  • the RRAM crossbar's row input is a voltage signal, where the input voltage signal includes high and low voltage.
  • the high level corresponds to a Boolean value of 1
  • the low level corresponds to a Boolean value of 0.
  • the embodiment of the present invention does not specifically limit the voltage values of the high level and the low level.
  • the resistor R s in the SA of each column may be a small resistor having a fixed resistance, and R off , R on , and R s may satisfy R off >> R on >> R s .
  • the SA compares the voltage of the column with the voltage threshold of the column to obtain the output voltage of the column; the output voltage is either the voltage corresponding to Boolean 0 (low level) or the voltage corresponding to Boolean 1 (high level).
  • the product of the matrix Φ and the matrix X can also be regarded as a column vector.
  • each element of matrix Y is the result of multiplying a row of matrix Φ by a column of matrix X (that is, taking their inner product).
  • the embodiment of the invention firstly proposes a logic module, which can implement a point multiplication logic of a Boolean vector (so-called Boolean vector, that is, elements in the vector are 0 or 1) based on the multi-layer RRAM crossbar.
  • the embodiment of the present invention further proposes a memory device capable of realizing a multiplication operation of a Boolean matrix (so-called Boolean matrix, that is, each element in the matrix is 0 or 1).
  • the memory device can include one or more logic modules capable of performing Boolean vector multiplication.
  • the memory device can be decomposed into a plurality of Boolean vector point multiplication operations by Boolean matrix multiplication, and then the multiplication of the Boolean vectors by the multiplication operation is distributed.
  • the multiplication of the Boolean matrix is implemented by the one or more logic modules.
  • Boolean vector [ ⁇ 0,j , ⁇ 1,j ... ⁇ N-1,j ] (can be regarded as a vector composed of any row element in the Boolean matrix ⁇ , corresponding to the Boolean vector A above) and Boolean vector [x i,0 ,x i,1 ...x i,N-1 ] (can be seen as a Boolean vector of any column element in matrix X, corresponding to the Boolean vector B above)
  • a multi-layer RRAM crossbar that implements Boolean vector point multiplication is described in detail.
  • the multi-layer RRAM crossbar may specifically include a three-layer RRAM crossbar.
  • the first layer RRAM crossbar of the three-layer RRAM crossbar can use the circuit shown in FIG. 7.
  • the first layer RRAM crossbar includes an N × N array of resistors.
  • the N resistors in each column of the N × N resistor array represent the Boolean vector [φ0,j, φ1,j ... φN-1,j].
  • for example, if N is 8 and the eight elements of [φ0,j, φ1,j ... φN-1,j] are 10101010 in order, the resistance values in the first column of the N × N resistor array of the first layer RRAM crossbar are, in order, Ron (corresponding to 1), Roff (corresponding to 0), Ron, Roff, Ron, Roff, Ron, Roff.
  • each column in the N × N resistor array is configured with the same resistances as the first column.
  • a comparison circuit is provided at the bottom end of each bit line of the N ⁇ N resistance array (hereinafter, the comparison circuit is SA as an example).
  • the comparison circuit may include a fixed resistor Rs with a small resistance and a comparator.
  • the function of the comparison circuit is to convert the current signal on each column into a voltage signal and compare the voltage signal to the voltage threshold Vth1 of the column to determine whether the result of this column is 0 or 1.
  • the voltage thresholds of the columns in the N × N resistor array can be set in sequence to Vr*gon*Rs*(2j+1)/2, where j is an integer from 0 to N-1.
  • V r represents the actual voltage at which the X input is 1 (ie, a high level)
  • g on represents the admittance corresponding to the resistance R on
  • R s represents the resistance of the sampling resistor.
  • the logic functions that the first layer RRAM crossbar can implement are described below.
  • the first layer RRAM crossbar receives the voltage signal corresponding to the Boolean vector [xi,0, xi,1 ... xi,N-1]: a word line corresponding to an element 1 of the vector is driven to the high level, and a word line corresponding to an element 0 is driven to the low level.
  • the resistance values of each column of the first layer RRAM crossbar are those corresponding to the Boolean vector [φ0,j, φ1,j ... φN-1,j].
  • when all row and column switches of the first layer RRAM crossbar are turned on, the first layer RRAM crossbar implements, on each bit line and based on the relationship between voltage and current, the dot-product logic of the Boolean vector [φ0,j, φ1,j ... φN-1,j] and the Boolean vector [xi,0, xi,1 ... xi,N-1]; the result of the dot-product logic can be represented by the current on each bit line.
  • the SAs connected to the bit lines of the first layer RRAM crossbar, using the stepped voltage thresholds described above, output at the bit-line outputs a voltage signal corresponding to the first calculation result, where the first calculation result is an N-dimensional Boolean vector whose first K elements are 1 and whose remaining elements are 0, and K is the dot product of the Boolean vector [φ0,j, φ1,j ... φN-1,j] and the Boolean vector [xi,0, xi,1 ... xi,N-1].
  • for example, with N = 8 and K = 3, the output of the first layer RRAM crossbar, O1,j (0 ≤ j ≤ N-1), is 11100000 after the logic operation of the first layer RRAM crossbar.
  • that is, the SAs in columns 0 to 2 find the column voltage greater than the voltage threshold, and the SAs in columns 3 to 7 find the column voltage smaller than the voltage threshold.
  • the output of the first layer RRAM crossbar is 11100000.
  • the logical task of the second layer RRAM crossbar and the third layer RRAM crossbar is to convert 11100000 into 11, ie binary 3.
  • the second layer RRAM crossbar can adopt the structure shown in FIG. 8.
  • the second layer RRAM crossbar includes a (2N-1) x N resistor array.
  • the resistance of the 2jth and 2j+1th resistors in the jth column resistance of the second layer RRAM crossbar is R on , and the resistance of the remaining resistors is R off , where 0 ⁇ j ⁇ N-2.
  • the resistance of the 2N-1th resistor in the N-1th column resistance of the second layer RRAM crossbar is R on , and the resistance of the remaining resistors is R off .
  • the 2N-1 word lines of the second layer RRAM crossbar and the bit line output terminals O1,j of the first layer RRAM crossbar may adopt the connection relationship shown in FIG. 8.
  • a comparison circuit is connected to the end of each bit line of the second layer RRAM crossbar; the voltage threshold Vth2 of this comparison circuit can be set to Vr*gon*Rs/2, and the polarity of the comparator inputs must be swapped (that is, the threshold voltage Vth2 is applied to the non-inverting input of the comparator, and the inverting input is connected to the resistor Rs and the bit line) to obtain the second-layer outputs O2,j, 0 ≤ j ≤ N-1.
  • Equation (5) is the logic function to be implemented by the second layer of the RRAM crossbar.
  • the logic expressed by equation (5) is in effect an XOR logic: adjacent elements of the first calculation result output by the first layer RRAM crossbar are XORed pairwise to obtain an intermediate calculation result, which is an N-dimensional vector whose (K-1)-th element is 1 and whose remaining elements are 0.
  • K is the dot product of the Boolean vector [φ0,j, φ1,j ... φN-1,j] and the Boolean vector [xi,0, xi,1 ... xi,N-1].
  • taking 11100000 as the output of the first layer RRAM crossbar, the logic operation of the second layer yields 00100000.
  • the structure of the RRAM crossbar for implementing the XOR logic is not specifically limited.
  • FIG. 8 is only an example; in practice, other configurations of resistance values and voltage thresholds may also be used to implement the XOR logic.
  • the second layer RRAM crossbar passes the voltage signal corresponding to the intermediate calculation result to the word lines of the third layer RRAM crossbar, where the output O2,j of the j-th bit line of the second layer RRAM crossbar is connected to the input of the j-th word line of the third layer RRAM crossbar.
  • the logic circuit of the third layer RRAM crossbar is shown in Figure 9.
  • the binary representation of the integer corresponding to the K-1th row resistance of the crossbar (it is known from the above that the K-1 row resistance corresponds to the binary representation of the integer K) is selected as the final operation result.
  • the output of the third layer RRAM crossbar is the voltage signal corresponding to the above final operation result.
  • the logic output of the second layer RRAM crossbar is 00100000.
  • the inputs and outputs of the third layer RRAM crossbar have the logical correspondence shown in the following table:
  • Table 2 Input and output comparison table of the third layer RRAM crossbar
  • the output corresponding to 00100000 is 0011, which is 3 in binary.
  • if the input matrix is a non-Boolean matrix (for example, a positive real matrix), it can be decomposed by means of linear algebra into a linear combination of multiple Boolean matrices; the Boolean matrices are then processed separately in the manner described above, and their results are linearly combined to obtain the product for the real matrix, which is not described in detail in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Automation & Control Theory (AREA)
  • Computer Hardware Design (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Semiconductor Memories (AREA)
  • Logic Circuits (AREA)

Abstract

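An embodiment of the present invention provides a memory device. The memory device includes an RRAM crossbar array for performing logic operations, and the resistance of every resistor in the RRAM crossbar array is set to Ron or Roff to represent the Boolean value 1 or 0. On this basis, Boolean operations are implemented by the RRAM crossbar array, which improves the reliability of the logic operations of the RRAM crossbar array.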

Description

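Memory device and data processing method based on multi-layer RRAM crossbar array
Technical Field
Embodiments of the present invention relate to the storage field, and more specifically, to a memory device and a data processing method based on a multi-layer RRAM crossbar array.
Background
Many applications today involve complex big-data computation, for example fingerprint recognition and machine learning. For current big-data computation, the performance bottleneck of a computing system lies mainly in the following two aspects:
Memory wall: As technology develops, processor performance keeps improving, but memory performance grows slowly. Memory performance has therefore become the weak point limiting overall system performance, the so-called memory wall. Specifically, the processor and the memory are connected and communicate through an input/output (I/O) bus. Because of hardware limitations, the bandwidth of the I/O bus is limited, so the processor spends most of its time idle, waiting for memory.
Power wall: Most current memory is volatile. To keep data from being lost, volatile memory must be powered continuously, so both the dynamic and the static power consumption of the memory are high.
The following solutions are commonly used for these two problems:
Solution to the memory wall: Logic units (or logic circuits) can be added to the memory so that data is computed directly in the memory, that is, in-memory computing. Take summing 10 numbers as an example. If the memory can only store data, the processor must read the 10 numbers from the memory over the I/O bus and sum them. If the memory has a logic operation function, the memory can compute the sum of the 10 numbers directly and then send the result to the processor over the I/O bus. Comparing the two approaches shows that memory with a logic operation function reduces the traffic on the I/O bus by 90% (one value is transferred instead of ten), which effectively lowers the memory wall.
Solution to the power wall: Non-volatile memory can be used in place of volatile memory. Because the memory is non-volatile, the data in it is not lost when power is removed. During data processing, therefore, the entire memory need not be powered at all times, which effectively reduces power consumption.
The development of resistive random access memory (RRAM) technology makes it possible to solve both problems at once. First, the core device of RRAM is the memristor (in other words, the resistors in RRAM are memristors), which is non-volatile and can reduce power consumption. Further, as shown in FIG. 1, RRAM has a crossbar array structure (RRAM is therefore often called an RRAM crossbar array, or RRAM crossbar). An RRAM crossbar may have one layer or multiple layers. In a multi-layer RRAM crossbar, the output of one layer can serve as the input of the next layer, and each layer is provided with a resistor array. If the resistors in RRAM are regarded as neurons in a neural network, the structure of an RRAM crossbar closely resembles that of a neural network, and such a structure is well suited to logic operations. Specifically, a wide variety of logic operations can be implemented by configuring the number of layers of the RRAM crossbar, the size of the resistor array in each layer, and the resistance of each resistor.
In the prior art, the logic operation capability of the RRAM crossbar has already been developed and used to some extent. FIG. 2 shows the circuit structure of a conventional RRAM crossbar capable of logic operations. First, the resistances of the resistors in the RRAM crossbar must be configured according to the desired logic operation (such as summation, XOR, or matrix multiplication). Taking the matrix multiplication Y = ΦX as an example, the elements of the matrix Φ are first stored in the RRAM crossbar; for example, the resistor Gij in FIG. 2 corresponds to the element in row i, column j of Φ, and the resistance of Gij represents the value of that element. Next, when the matrix multiplication is actually performed, the elements of the matrix X are converted from digital quantities x1…xn into analog quantities (analog voltage signals) and input to the rows of the RRAM. The relationships among voltage, current, and resistance between the rows and columns of the RRAM crossbar are then used to perform the dot products of the matrix elements, yielding the results V1 to Vm. Finally, the results (analog voltages) are converted into digital quantities (for example y1, y2, and so on) and output.
As the above description shows, a conventional RRAM crossbar performs logic operations with analog quantities. This approach has two main drawbacks:
First, a large number of digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are needed to perform DA and AD conversion on the signals. These converters and conversion operations consume both time and power.
Second, to implement a given operation logic, the resistors in the RRAM must be configured or programmed in advance. In practice, the resistance of a resistor in RRAM is determined by the integral of the current flowing through it. However, the characteristics of the resistive elements in RRAM are not constant; they fluctuate, so the same current-integration operation may yield different resistances. Specifically, part (a) of FIG. 3 shows that, under the influence of fluctuations in element characteristics, a resistor exhibits different state-transition curves (from the Ron state (also called the low-resistance state or on-state), through the intermediate state, to the Roff state (also called the high-resistance state or off-state)), which makes resistor programming inaccurate. Moreover, part (b) of FIG. 3 shows that this inaccuracy is especially pronounced in the intermediate state of the resistor.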
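Summary
This application provides a memory device to improve the accuracy of the logic operations of a conventional RRAM crossbar.
According to a first aspect, a memory device is provided. The memory device includes a control bus and multiple memory units, and the multiple memory units are connected to one another through the control bus. Each of the multiple memory units includes: a control module, which is connected to a processor through the control bus and which receives and parses instructions of the processor through the control bus, where the instructions of the processor include a logic operation instruction; and a logic module, which is connected to the control module and includes at least one layer of RRAM crossbar array (that is, RRAM crossbar), where the resistance of each resistor in the at least one layer of RRAM crossbar array is Ron or Roff, Ron represents the Boolean value 1, and Roff represents the Boolean value 0. The control module performs a Boolean operation through the at least one layer of RRAM crossbar array according to the logic operation instruction.
By configuring the resistors in the RRAM crossbar array as Ron or Roff, and using Ron and Roff to represent the Boolean values 1 and 0 respectively, the RRAM crossbar array performs Boolean operations, which improves the accuracy of its logic operations.
With reference to the first aspect, in a first implementation of the first aspect, the logic operation instruction instructs the logic module to perform a dot product of a Boolean vector A and a Boolean vector B, where A and B are both N-dimensional Boolean vectors and N is a positive integer not less than 2. The logic module includes a multi-layer RRAM crossbar array. The first-layer RRAM crossbar array of the multi-layer RRAM crossbar array includes a resistor array of N rows × N columns; the input ends of the resistors in each row of the first-layer RRAM crossbar array are connected to one word line, and the output ends of the resistors in each column are connected to one bit line. The N word lines of the first-layer RRAM crossbar array are connected to the control module, and the N bit lines of the first-layer RRAM crossbar array are connected, through N comparison circuits respectively, to the other RRAM crossbar arrays of the multi-layer RRAM crossbar array. The first-layer RRAM crossbar array generates N current signals on the N bit lines according to the voltage signals input on the N word lines and the resistances of the resistors in the first-layer RRAM crossbar array, where the voltage of the signal input on the j-th of the N word lines is the voltage corresponding to Bj, the resistances of the resistors in the j-th row of the first-layer RRAM crossbar array are the resistance corresponding to Aj, Bj is the j-th element of the Boolean vector B, Aj is the j-th element of the Boolean vector A, and j ranges from 0 to N-1. The N comparison circuits convert the N current signals into N voltage signals respectively and compare the N voltage signals with the voltage thresholds of the respective comparison circuits, so that the output ends of the N bit lines output a voltage signal corresponding to a first calculation result, where the first calculation result is an N-dimensional Boolean vector whose first K elements are 1 and whose remaining elements are 0, and K is the result of the dot product of A and B. The other RRAM crossbar arrays receive the voltage signal corresponding to the first calculation result from the output ends of the N bit lines and, according to that voltage signal and the resistances of the resistors in the other RRAM crossbar arrays, obtain a voltage signal corresponding to a second calculation result, where the second calculation result is the binary representation of K.
The dot product of Boolean vectors is thus implemented by the multi-layer RRAM crossbar array.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the j-th of the N comparison circuits includes a resistor Rs of fixed resistance and a comparator. One end of the resistor Rs is connected to the j-th of the N bit lines and to the comparator, and the other end of the resistor Rs is connected to ground. The voltage threshold of the j-th comparison circuit is Vr*gon*Rs*(2j+1)/2, where Vr represents the voltage corresponding to the Boolean value 1 and gon represents the reciprocal of Ron.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect, the logic module includes at least three layers of RRAM crossbar arrays, and the other RRAM crossbar arrays include a second-layer RRAM crossbar array and a third-layer RRAM crossbar array. The second-layer RRAM crossbar array includes a resistor array of (2N-1) rows × N columns; the input ends of the resistors in each row of the second-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column are connected to one bit line, and the word lines of the second-layer RRAM crossbar array are connected to the output ends of the bit lines of the first-layer RRAM crossbar array. The second-layer RRAM crossbar array receives, through its 2N-1 word lines, the voltage signal corresponding to the first calculation result from the output ends of the bit lines of the first-layer RRAM crossbar array, and performs the following logic operation according to that voltage signal and the resistances of the resistors in the second-layer RRAM crossbar array: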
Figure PCTCN2016071254-appb-000001
to obtain a voltage signal corresponding to an intermediate calculation result, where
Figure PCTCN2016071254-appb-000002
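is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the first-layer RRAM crossbar array, O1,j+1 is the Boolean value corresponding to the voltage signal output by the (j+1)-th bit line of the first-layer RRAM crossbar array, and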
Figure PCTCN2016071254-appb-000003
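is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the second-layer RRAM crossbar array. The third-layer RRAM crossbar array includes a resistor array of N rows × n columns; the input ends of the resistors in each row of the third-layer RRAM crossbar array are connected to one word line, and the output ends of the resistors in each column are connected to one bit line, where n is greater than or equal to the minimum number of bits needed to represent the integer N in binary. The third-layer RRAM crossbar array receives, through its N word lines, the voltage signal corresponding to the intermediate calculation result from the N bit lines of the second-layer RRAM crossbar array and, according to that voltage signal and the resistances of the resistors in the third-layer RRAM crossbar array, encodes the intermediate calculation result to obtain the voltage signal corresponding to the second calculation result.
The statement that n is greater than or equal to the minimum number of bits needed to represent the integer N in binary can be understood as follows: if N = 8, at least 4 bits are needed to represent N in binary (1000 represents N), so n ≥ 4.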
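The lower bound on n can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `min_bits` is hypothetical and does not appear in the application; it relies on the standard `int.bit_length()` method.

```python
def min_bits(N: int) -> int:
    """Minimum number of binary digits needed to write the positive integer N."""
    if N < 1:
        raise ValueError("N must be a positive integer")
    return N.bit_length()


# For each array size N, show its binary form and the smallest admissible n.
for N in (2, 7, 8, 15, 16):
    print(N, format(N, "b"), min_bits(N))
```

For N = 8 the helper returns 4, matching the example in the text, so the third-layer array would need at least 4 columns.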
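With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, the j-th word line of the third-layer RRAM crossbar array is connected to the j-th bit line of the second-layer RRAM crossbar array, and the resistances of the resistors in the j-th row of the third-layer RRAM crossbar array correspond to the binary representation of the integer j+1.
With reference to any one of the first to fourth implementations of the first aspect, in a fifth implementation of the first aspect, the Boolean vector A is any row vector of a Boolean matrix Φ, the Boolean vector B is any column vector of a Boolean matrix X, each of multiple logic modules in the memory device is responsible for the dot products of some of the row vectors of the Boolean matrix Φ with some of the column vectors of the Boolean matrix X, and the multiple logic modules jointly implement the Boolean matrix multiplication of Φ and X.
With reference to the first aspect or any one of its foregoing implementations, in a sixth implementation of the first aspect, the instructions of the processor further include a data read/write instruction, and each memory unit further includes a storage module. The storage module is connected to the control module, and the control module reads and writes data through the storage module according to the data read/write instruction.
According to a second aspect, a data processing method based on a multi-layer RRAM crossbar array is provided. The resistance of each resistor in the multi-layer RRAM crossbar array is Ron or Roff, Ron represents the Boolean value 1, and Roff represents the Boolean value 0. The multi-layer RRAM crossbar array is used to perform a dot product of a Boolean vector A and a Boolean vector B, where A and B are both N-dimensional Boolean vectors and N is a positive integer not less than 2. The first-layer RRAM crossbar array of the multi-layer RRAM crossbar array includes a resistor array of N rows × N columns; the input ends of the resistors in each row of the first-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column are connected to one bit line, and the N bit lines of the first-layer RRAM crossbar array are connected, through N comparison circuits respectively, to the other RRAM crossbar arrays of the multi-layer RRAM crossbar array. The method includes the following. The first-layer RRAM crossbar array generates N current signals on the N bit lines according to the voltage signals input on the N word lines of the first-layer RRAM crossbar array and the resistances of the resistors in the first-layer RRAM crossbar array, where the voltage of the signal input on the j-th of the N word lines is the voltage corresponding to Bj, the resistances of the resistors in the j-th row of the first-layer RRAM crossbar array are the resistance corresponding to Aj, Bj is the j-th element of the Boolean vector B, Aj is the j-th element of the Boolean vector A, and j ranges from 0 to N-1. The N comparison circuits convert the N current signals into N voltage signals and compare the N voltage signals with the voltage thresholds of the respective comparison circuits, so that the output ends of the N bit lines output a voltage signal corresponding to a first calculation result, where the first calculation result is an N-dimensional Boolean vector whose first K elements are 1 and whose remaining elements are 0, and K is the result of the dot product of A and B. The other RRAM crossbar arrays receive the voltage signal corresponding to the first calculation result from the output ends of the N bit lines and, according to that voltage signal and the resistances of the resistors in the other RRAM crossbar arrays, obtain a voltage signal corresponding to a second calculation result, where the second calculation result is the binary representation of K.
By configuring the resistors in the RRAM crossbar array as Ron or Roff, and using Ron and Roff to represent the Boolean values 1 and 0 respectively, the RRAM crossbar array performs Boolean operations, which improves the accuracy of its logic operations.
With reference to the second aspect, in a first implementation of the second aspect, the logic module includes at least three layers of RRAM crossbar arrays, and the other RRAM crossbar arrays include a second-layer RRAM crossbar array and a third-layer RRAM crossbar array. The second-layer RRAM crossbar array includes a resistor array of (2N-1) rows × N columns; the input ends of the resistors in each row of the second-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column are connected to one bit line, and the word lines of the second-layer RRAM crossbar array are connected to the output ends of the bit lines of the first-layer RRAM crossbar array. The third-layer RRAM crossbar array includes a resistor array of N rows × n columns; the input ends of the resistors in each row of the third-layer RRAM crossbar array are connected to one word line, and the output ends of the resistors in each column are connected to one bit line, where n is greater than or equal to the minimum number of bits needed to represent the integer N in binary. That the other RRAM crossbar arrays receive the voltage signal corresponding to the first calculation result from the output ends of the N bit lines and obtain the voltage signal corresponding to the second calculation result according to that voltage signal and the resistances of the resistors in the other RRAM crossbar arrays includes: the second-layer RRAM crossbar array receives, through its 2N-1 word lines, the voltage signal corresponding to the first calculation result from the output ends of the bit lines of the first-layer RRAM crossbar array, and performs the following logic operation according to that voltage signal and the resistances of the resistors in the second-layer RRAM crossbar array: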
Figure PCTCN2016071254-appb-000004
to obtain a voltage signal corresponding to an intermediate calculation result, where
Figure PCTCN2016071254-appb-000005
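is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the first-layer RRAM crossbar array, O1,j+1 is the Boolean value corresponding to the voltage signal output by the (j+1)-th bit line of the first-layer RRAM crossbar array, and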
Figure PCTCN2016071254-appb-000006
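is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the second-layer RRAM crossbar array. The third-layer RRAM crossbar array receives, through its N word lines, the voltage signal corresponding to the intermediate calculation result from the N bit lines of the second-layer RRAM crossbar array and, according to that voltage signal and the resistances of the resistors in the third-layer RRAM crossbar array, encodes the intermediate calculation result to obtain the voltage signal corresponding to the second calculation result.
With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect, the j-th of the N comparison circuits includes a resistor Rs of fixed resistance and a comparator. One end of the resistor Rs is connected to the j-th of the N bit lines and to the comparator, and the other end of the resistor Rs is connected to ground. The voltage threshold of the j-th comparison circuit is Vr*gon*Rs*(2j+1)/2, where Vr represents the voltage corresponding to the Boolean value 1 and gon represents the reciprocal of Ron.
In some of the foregoing implementations, the storage module is a storage module based on an RRAM crossbar array. A storage module based on an RRAM crossbar array can reduce the power consumption of the memory.
In some of the foregoing implementations, the control module includes: an instruction queue, configured to buffer the instructions of the processor; and an instruction decoder, configured to parse the instructions of the processor and perform the corresponding operations according to the parsed instructions. Providing a buffer queue in the control module can reduce the waiting time of the processor.
In some of the foregoing implementations, the control module includes a static random access memory (SRAM), configured to store result data obtained from the logic module and/or the storage module, and the control module is further configured to send the result data to the processor.
In some of the foregoing implementations, the control module is a control module based on complementary metal oxide semiconductor (CMOS) technology.
This application improves the accuracy of the logic operations of the RRAM crossbar.
Brief Description of Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the following briefly introduces the drawings used in the embodiments. Evidently, the drawings described below are merely drawings of some embodiments of the present invention.
FIG. 1 is a schematic diagram of the physical structure of an RRAM crossbar.
FIG. 2 is a structural diagram of a conventional logic circuit based on an RRAM crossbar.
FIG. 3 shows resistance characteristic curves of a resistor in an RRAM crossbar.
FIG. 4 is a schematic structural diagram of a memory device according to an embodiment of the present invention.
FIG. 5 is an example diagram of the bus design of the control module 45.
FIG. 6 is a schematic diagram of the circuit structure of one layer of an RRAM crossbar.
FIG. 7 is a circuit diagram of the first-layer RRAM crossbar of a three-layer RRAM crossbar used to implement Boolean matrix multiplication.
FIG. 8 is a circuit diagram of the second-layer RRAM crossbar of a three-layer RRAM crossbar used to implement Boolean matrix multiplication.
FIG. 9 is a circuit diagram of the third-layer RRAM crossbar of a three-layer RRAM crossbar used to implement Boolean matrix multiplication.
Description of Embodiments
FIG. 4 is a schematic structural diagram of a memory device according to an embodiment of the present invention. As FIG. 4 shows, the memory device 40 has an overall H-tree structure. In this structure, a storage module 43 (also called a data module, data array, and so on) is paired with a logic module 44 (also called a logic circuit, logic operation circuit, in-memory logic, and so on).
Optionally, in one embodiment, both the storage module 43 and the logic module 44 may be RRAM crossbars. Further, the storage module 43 may be a single-layer RRAM crossbar, and the logic module 44 may be a multi-layer RRAM crossbar. Of course, the embodiments of the present invention do not specifically limit the type of the storage module 43, and other types of storage media may also be used. In addition, even if both the storage module 43 and the logic module 44 use RRAM crossbars, the embodiments of the present invention do not specifically limit the number of layers of the RRAM crossbar. For example, the storage module 43 may also be designed as a multi-layer RRAM crossbar, and the logic module 44 may also be designed as a single-layer RRAM crossbar (a single-layer RRAM crossbar can also implement simple logic operations).
Still referring to FIG. 4, the memory device 40 mainly includes four parts: a block decoder 41, the storage module 43, the logic module 44, and a control module 45. The control module 45 may be a CMOS-based control module. The storage module 43, the logic module 44, and the control module 45 may form a memory unit 42 (also called a data/logic pair).
In FIG. 4, the control module 45 is connected to the storage module 43, the logic module 44, and the block decoder 41, and is connected to the processor 47 (for example, a CPU) through the block decoder 41. The control module 45 can receive and parse instructions of the processor 47, and can also handle the exchange or transfer of data between the storage module 43 and the logic module 44. The block decoder 41 can forward an instruction issued by the processor 47 to the control module 45 of the memory unit 42 to which the instruction is addressed. In addition, in one embodiment, the address lookup function for the memory units 42 can be integrated in the block decoder 41, and all other control functions can be integrated in the control module 45.
As described above, the control module 45 can be connected to the block decoder 41, but the block decoder 41 may merely forward instructions issued by the processor 47 to the control module 45 of the corresponding memory unit 42. From another perspective, therefore, the control module 45 can be regarded as the entity that receives and parses processor instructions. The following uses FIG. 5 as an example to describe in detail the internal structure of the control module and how it processes instructions.
FIG. 5 is an example diagram of the bus design of the control module. The control module 45 may include an instruction queue 451, an instruction decoder 452, an address decoder 453, and an SRAM array 454, which may be connected as shown in FIG. 5. Because the operating frequency of the memory device is lower than that of the processor, the instruction queue 451 can be provided inside the control module 45 to buffer instructions issued by the processor 47 and thereby reduce the waiting time of the processor 47. The instruction decoder 452 parses the instruction to be executed (for example, a data read, data store, or in-memory computing instruction, described in detail later) and then performs the corresponding operation. The address decoder 453 can split the address information in an instruction into row/column information of the storage module 43 or row/column information of the logic module 44. The SRAM array 454 can temporarily hold data read from the storage module 43 or the logic module 44 and, according to the instruction, write the data into the storage module 43 or the logic module 44 or return it to the external processor 47.
The embodiments of the present invention do not specifically limit the format or type of the instructions issued by the processor 47 to the memory device 40. For example, the types of instructions issued by the processor 47 to the memory device 40, or the types of instructions that the control module 45 needs to parse, may include the four types of instructions shown in Table 1.
Table 1. Instruction types and parameters to be parsed by the control module 45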
Figure PCTCN2016071254-appb-000007
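Taking the in-memory computing process as an example: first, the logic module 44 is logically configured, that is, the resistances of the resistors in the logic module 44 are configured so that it can implement a particular operation logic. Then an input signal is provided to the logic module 44, that is, the data on which the logic operation is to be performed is input to the logic module 44. In-memory computing can then be carried out in the logic module 44 according to the input signal and the configured operation logic. The in-memory computing process is described in detail below with reference to the instructions in Table 1.
When in-memory computing is required, the processor 47 can issue the following instructions to the memory device 40.
Instruction 1: SW instruction. Data from the processor 47 or the storage module 43 is written into the logic module 44 to configure the resistances of the RRAM in the logic module 44 so that it can implement a particular logic, such as summation, XOR, or multiplication.
Instruction 2: SW instruction. Data from the processor 47 or the storage module 43 is written into the input column of the logic module 44 (the word-line input voltage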
Figure PCTCN2016071254-appb-000008
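in FIG. 6). Based on the previously configured operation logic, the input data then undergoes the particular logic operation in the logic module. For example, if the logic module 44 implements the summation logic a+b, b can be stored in the logic module by instruction 1, and a can then be input by instruction 2 and summed with b.
Instruction 3: ST instruction. The switches of all rows/columns of the logic module 44 are turned on so that current flows through all rows/columns of the logic module 44.
Instruction 4: WT instruction. When a complex logic operation is implemented with an RRAM crossbar, the RRAM crossbar in the logic module 44 needs multiple layers. In that case, the computation of the RRAM crossbar may take some time. The WT instruction can therefore be used to make the control module 45 wait until the in-memory computation of the logic module 44 is complete before executing subsequent instructions.
Instruction 5: SW instruction. After the in-memory computation is complete, the data computed by the logic module 44 can be written back to the storage module 43.
Note that, for a particular logic, the logic module 44 needs to be logically configured only once; instruction 1 need not be executed before every in-memory computation. In other words, the same logic operation can be applied to different data simply by changing the data on the input column of the logic module 44.
The foregoing described in detail how the control module 45 performs in-memory computing according to instructions. Note that the control module 45 can also perform ordinary data reads and writes according to instructions; that process is similar to the prior art and is not described in detail here. The following uses FIG. 6 as an example to briefly introduce the logic operation process of the logic module 44 based on an RRAM crossbar.
In FIG. 6, a comparison circuit is provided at the bottom end of each bit line; in practice, the comparison circuit may be a sense amplifier (SA). The SA includes a fixed resistor Rs of small resistance (for example, Rs < Ron < Roff) and an operational amplifier. Its function is to convert the current signal on the column (that is, the bit line) into a voltage and compare this voltage with the voltage threshold of the first-layer RRAM crossbar to obtain the calculation result of the column. For each column, the calculation formulas are as follows: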
Figure PCTCN2016071254-appb-000009
Figure PCTCN2016071254-appb-000010
In formula (1),
Figure PCTCN2016071254-appb-000011
represents the voltage of the word line of row i,
Figure PCTCN2016071254-appb-000012
represents the voltage of the bit line of column j, gij represents the admittance corresponding to the resistor Rij (the reciprocal of Rij),
Figure PCTCN2016071254-appb-000013
represents the voltage threshold corresponding to column j, and
Figure PCTCN2016071254-appb-000014
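represents the output voltage of column j. In addition, FIG. 6 shows the circuit structure of only one layer of RRAM crossbar. If the logic module 44 includes a multi-layer RRAM crossbar, the column output of one layer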
Figure PCTCN2016071254-appb-000015
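can serve as the row input of the next layer, and the layers may have the same or different structures. The column output of the last layer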
Figure PCTCN2016071254-appb-000016
then serves as the final output of the logic module 44.
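The two per-column formulas are present only as figure placeholders in this copy of the text. One plausible form, consistent with the symbol definitions given in this paragraph, is sketched below. The symbol names $V_{WL,i}$, $V_{BL,j}$, $V_{th,j}$, $V_{O,j}$, $V_{\text{high}}$ and $V_{\text{low}}$ are assumptions chosen for illustration, and the approximation in the first line relies on Rs being much smaller than every cell resistance; the exact expressions in the original figures may differ.

```latex
% Assumed reading of the per-column relations (not copied from the original figures).
\begin{align*}
V_{BL,j} &\approx R_s \sum_{i} g_{ij}\, V_{WL,i}
  && \text{(bit-line voltage developed across } R_s\text{)} \\
V_{O,j} &=
  \begin{cases}
    V_{\text{high}}, & V_{BL,j} \ge V_{th,j} \\
    V_{\text{low}},  & V_{BL,j} < V_{th,j}
  \end{cases}
  && \text{(comparison against the column threshold)}
\end{align*}
```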
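To implement a particular logic operation (or function) with an RRAM crossbar, the resistances of the resistors in the RRAM crossbar and the voltage threshold of each column can be configured through the following steps (in other words, the resistance values in the RRAM crossbar and the voltage threshold of each column determine exactly which logic the RRAM crossbar implements):
Step 1: In software (MATLAB, Octave, etc.), determine the number of RRAM crossbar layers required to implement the particular logic and the numbers of rows and columns in each layer.
Step 2: Calculate the resistance of each resistor in each layer of the RRAM crossbar and the voltage thresholds of the comparison circuits.
Step 3: By means of instructions, write the calculated resistance values into the corresponding resistors of the RRAM crossbar, and set the voltage thresholds of the comparison circuits.
Step 4: Carry out the particular logic computation in hardware (the circuit of the logic module 44).
The foregoing described in detail, with reference to FIG. 2 and FIG. 3, some drawbacks of an RRAM crossbar based on analog signals, such as excessive AD and DA conversion operations and errors in the resistance configuration process. To overcome these drawbacks, the following describes, with reference to specific embodiments, a specific implementation of an RRAM crossbar based on digital signals.
First, part (b) of FIG. 3 shows that, compared with the intermediate state, configuring the resistors in the RRAM crossbar (the configuration process can be called RRAM programming) as Ron or Roff is more reliable. In other words, if the resistors in the RRAM crossbar are configured only as Ron or Roff, the error of resistor configuration can be reduced and the reliability of logic operations improved. On this basis, the embodiments of the present invention configure each resistor in the RRAM crossbar either in the Ron state or in the Roff state, where Roff can represent the Boolean value "0" (or the digit "0") and Ron can represent the Boolean value "1" (or the digit "1"). The row input interface of the RRAM crossbar can then be designed as a purely digital interface that needs no AD conversion (as FIG. 6 shows, the row input of the RRAM crossbar is a voltage signal; here it suffices for the input voltage signal to take a high level and a low level, with the high level corresponding to the Boolean value 1 and the low level to the Boolean value 0, but the embodiments of the present invention do not specifically limit the voltage values of the high and low levels). With these settings, the analog-signal-based logic module shown in FIG. 2 is converted into a digital-signal-based logic module.
Referring to FIG. 6, the resistor Rs in the SA of each column may be a small resistor with a fixed resistance, and Roff, Ron, and Rs may satisfy Roff >> Ron >> Rs. A comparison circuit (an SA in the figure) is connected at the end of each column. Referring to formula (2), the SA compares the voltage of the column with the voltage threshold of the column to obtain the output voltage of the column, which is either the voltage corresponding to the Boolean value 0 (low level) or the voltage corresponding to the Boolean value 1 (high level).
It should be understood that a particular logic can be implemented by configuring the resistors in the RRAM crossbar and the voltage threshold of each column, but the embodiments of the present invention do not specifically limit the type of logic. The following uses Boolean matrix multiplication (all elements of the matrices are 0 or 1) as an example to describe in detail how the resistances of the resistors in the RRAM crossbar and the threshold voltages of its columns should be configured to implement Boolean matrix multiplication.
For ease of understanding, the calculation of the matrix multiplication Y = ΦX is explained first.
The general form and vector form of the matrices X and Φ are as follows:
Figure PCTCN2016071254-appb-000017
Figure PCTCN2016071254-appb-000018
The product of the matrix Φ and the matrix X can also be regarded as the product of the column vector
Figure PCTCN2016071254-appb-000019
and the row vector [X1 X2 X3 ...]; see formula (4):
Figure PCTCN2016071254-appb-000020
Formulas (3) and (4) show that each element of the matrix Y is the dot product (inner product) of a row of the matrix Φ and a column of the matrix X.
The embodiments of the present invention first propose a logic module that can implement the dot-product logic of Boolean vectors (a Boolean vector is a vector whose elements are all 0 or 1) based on a multi-layer RRAM crossbar. On this basis, the embodiments of the present invention further propose a memory device capable of multiplying Boolean matrices (a Boolean matrix is a matrix in which every element is 0 or 1). The memory device may include one or more logic modules capable of performing Boolean vector dot products. Because the multiplication of Boolean matrices can be decomposed into multiple Boolean vector dot products, the memory device can decompose a Boolean matrix multiplication into multiple Boolean vector dot products and then distribute those dot-product tasks to the one or more logic modules, which jointly implement the Boolean matrix multiplication.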
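The decomposition described above can be sketched in software. The following Python sketch is illustrative only: the function names are hypothetical, each "logic module" is modeled simply as a function call that returns the integer dot product K, and the round-robin assignment of tasks to modules is an assumption, since the text does not specify a scheduling policy.

```python
from itertools import product


def dot_product_tasks(phi, x):
    """One task per element of Y: (i, j, row i of phi, column j of x)."""
    cols = list(zip(*x))
    return [(i, j, phi[i], cols[j])
            for i, j in product(range(len(phi)), range(len(cols)))]


def multiply_on_modules(phi, x, n_modules):
    """Compute Y = phi * x by handing each dot product to a logic module.

    The round-robin assignment is an assumed policy for illustration only.
    """
    y = [[0] * len(x[0]) for _ in phi]
    module_of = {}
    for t, (i, j, row, col) in enumerate(dot_product_tasks(phi, x)):
        module_of[(i, j)] = t % n_modules              # which module gets this task
        y[i][j] = sum(a & b for a, b in zip(row, col))  # Boolean dot product K
    return y, module_of


phi = [[1, 0, 1], [0, 1, 1]]
x = [[1, 1], [0, 1], [1, 0]]
y, module_of = multiply_on_modules(phi, x, n_modules=2)
print(y)          # elements of Y are the integer dot products
print(module_of)  # assumed task-to-module mapping
```

Each entry of the result is the count K that a three-layer crossbar would report in binary for the corresponding row and column pair.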
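The following takes the dot product of the Boolean vector [φ0,j, φ1,j ... φN-1,j] (which can be regarded as a vector composed of the elements of any row of the Boolean matrix Φ, corresponding to the Boolean vector A above) and the Boolean vector [xi,0, xi,1 ... xi,N-1] (which can be regarded as a Boolean vector composed of the elements of any column of the matrix X, corresponding to the Boolean vector B above) as an example to describe in detail the structure and function of a multi-layer RRAM crossbar that implements the Boolean vector dot product.
The multi-layer RRAM crossbar may specifically include a three-layer RRAM crossbar. The first-layer RRAM crossbar of the three-layer RRAM crossbar may use the circuit shown in FIG. 7. In FIG. 7, the first-layer RRAM crossbar includes an N × N resistor array. The N resistors in each column of the N × N resistor array represent the Boolean vector [φ0,j, φ1,j ... φN-1,j]. For example, if N is 8 and the 8 elements of [φ0,j, φ1,j ... φN-1,j] are 10101010 in order, the resistances in the first column of the N × N resistor array of the first-layer RRAM crossbar are, in order, Ron (corresponding to 1), Roff (corresponding to 0), Ron, Roff, Ron, Roff, Ron, Roff, and every column of the N × N resistor array uses the same resistor configuration as the first column.
A comparison circuit is provided at the bottom end of each column (bit line) of the N × N resistor array (an SA is used as the example below). The comparison circuit may include a fixed resistor Rs of small resistance and a comparator. Its function is to convert the current signal on each column into a voltage signal and compare that voltage signal with the voltage threshold Vth1 of the column to determine whether the calculation result of the column is 0 or 1. The voltage thresholds of the columns of the N × N resistor array can be set in sequence to Vr*gon*Rs*(2j+1)/2, where j is an integer from 0 to N-1, Vr represents the actual voltage when an X input is 1 (that is, the high level), gon represents the admittance corresponding to the resistance Ron, and Rs represents the resistance of the sampling resistor. This expression shows that the thresholds of the columns of the N × N resistor array increase in sequence, forming a staircase (as shown in FIG. 7).
The logic function that the first-layer RRAM crossbar can implement is described below.
The first-layer RRAM crossbar receives the voltage signal corresponding to the Boolean vector [xi,0, xi,1 ... xi,N-1] (that is, a word line corresponding to an element 1 of the vector is driven to the high level, and a word line corresponding to an element 0 is driven to the low level). As described above, the resistances of each column of the first-layer RRAM crossbar are those corresponding to the Boolean vector [φ0,j, φ1,j ... φN-1,j]. When all row and column switches of the first-layer RRAM crossbar are turned on, the first-layer RRAM crossbar implements, on each bit line and based on the relationship between voltage and current, the dot-product logic of the Boolean vector [φ0,j, φ1,j ... φN-1,j] and the Boolean vector [xi,0, xi,1 ... xi,N-1]; the result of the dot-product logic can be represented by the current on each bit line. Then the SAs connected to the bit lines of the first-layer RRAM crossbar, using the staircase voltage thresholds described above, output at the bit-line outputs a voltage signal corresponding to the first calculation result, where the first calculation result is an N-dimensional Boolean vector whose first K elements are 1 and whose remaining elements are 0, and K is the dot product of the Boolean vector [φ0,j, φ1,j ... φN-1,j] and the Boolean vector [xi,0, xi,1 ... xi,N-1]. For example, with N = 8 and K = 3, the output O1,j (0 ≤ j ≤ N-1) of the first-layer RRAM crossbar after its logic operation is 11100000. This can be understood as follows: the SAs of columns 0 to 2 all find the column voltage greater than the voltage threshold, and the SAs of columns 3 to 7 all find the column voltage smaller than the voltage threshold.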
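The staircase thresholds can be checked numerically. With Roff >> Ron >> Rs, every column develops a voltage of roughly K*Vr*gon*Rs, while the threshold of column j sits at (j + 1/2)*Vr*gon*Rs, so exactly the first K columns exceed their thresholds. The Python sketch below models this under stated assumptions: the component values are arbitrary illustrative numbers, the low level is taken as 0 V, and the column voltage uses the small-Rs approximation Rs*sum(g*V); none of these values come from the application.

```python
R_ON = 1e3    # low-resistance state, encodes Boolean 1 (illustrative value)
R_OFF = 1e6   # high-resistance state, encodes Boolean 0 (illustrative value)
R_S = 1.0     # sense resistor, chosen so that R_OFF >> R_ON >> R_S
V_R = 1.0     # word-line voltage for Boolean 1; Boolean 0 is taken as 0 V


def first_layer(phi, x):
    """Thermometer-coded dot product of two equal-length 0/1 vectors."""
    n = len(phi)
    g_on = 1.0 / R_ON
    # Every column stores the same pattern phi, so all columns see one voltage.
    g = [1.0 / R_ON if bit else 1.0 / R_OFF for bit in phi]
    v_in = [V_R if bit else 0.0 for bit in x]
    v_col = R_S * sum(gi * vi for gi, vi in zip(g, v_in))
    # Column j compares against the staircase threshold Vr*gon*Rs*(2j+1)/2.
    thresholds = [V_R * g_on * R_S * (2 * j + 1) / 2 for j in range(n)]
    return [1 if v_col > v_th else 0 for v_th in thresholds]


phi = [1, 0, 1, 0, 1, 0, 1, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]   # overlaps phi in positions 0, 2 and 4, so K = 3
print(first_layer(phi, x))
```

The small leakage through Roff cells shifts the column voltage only slightly, which is why the half-step margin in the thresholds is enough in this model.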
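Next, the logic task of the second-layer RRAM crossbar and the third-layer RRAM crossbar of the three-layer RRAM crossbar is to convert the output of the first-layer RRAM crossbar into the binary representation of K. Still taking K = 3 as an example, the output of the first-layer RRAM crossbar is 11100000, and the task of the second and third layers is to convert 11100000 into 11, that is, binary 3. The structure and logic functions of the second-layer and third-layer RRAM crossbars are described below. (Here the second-layer and third-layer RRAM crossbars jointly complete this logic task, but the embodiments of the present invention are not limited thereto; the task may also be completed by one layer or by more than three layers of RRAM crossbar.)
To implement this logic task, the second-layer RRAM crossbar may use the structure shown in FIG. 8. In FIG. 8, the second-layer RRAM crossbar includes a (2N-1) × N resistor array. In the j-th column of the second-layer RRAM crossbar, the resistances of the 2j-th and (2j+1)-th resistors are Ron and the resistances of the remaining resistors are Roff, where 0 ≤ j ≤ N-2. In the (N-1)-th column of the second-layer RRAM crossbar, the resistance of the (2N-1)-th resistor is Ron and the resistances of the remaining resistors are Roff. The 2N-1 word lines of the second-layer RRAM crossbar and the bit-line outputs O1,j of the first-layer RRAM crossbar may be connected as shown in FIG. 8. Note that the outputs of some bit lines of the first-layer RRAM crossbar must be inverted before being connected to the word lines of the second-layer RRAM crossbar, such as the following in FIG. 7: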
Figure PCTCN2016071254-appb-000021
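Such inversion can be implemented by connecting devices such as comparators or inverters, which is not specifically limited in the embodiments of the present invention. A comparison circuit is connected to the end of each bit line of the second layer of the RRAM crossbar. The voltage threshold Vth2 of this comparison circuit can be set to Vr*gon*Rs/2, and the polarity of the comparator inputs in this comparison circuit must be swapped (that is, the threshold voltage Vth2 is applied to the non-inverting input of the comparator, and the inverting input is connected to the resistor Rs and the bit line) to obtain the second-layer outputs O2,j, 0 ≤ j ≤ N-1.
The relationship between the outputs O2,j of the second-layer RRAM crossbar and the outputs of the first-layer RRAM crossbar can be expressed by formula (5); that is, formula (5) is the logic function to be implemented by the second layer of the RRAM crossbar.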
Figure PCTCN2016071254-appb-000022
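The logic expressed by formula (5) is in effect an XOR logic: adjacent elements of the first calculation result output by the first-layer RRAM crossbar are XORed pairwise to obtain an intermediate calculation result. The intermediate calculation result is an N-dimensional vector whose (K-1)-th element is 1 and whose remaining elements are 0, where K is the dot product of the Boolean vector [φ0,j, φ1,j ... φN-1,j] and the Boolean vector [xi,0, xi,1 ... xi,N-1]. Taking 11100000 as the output of the first layer of the RRAM crossbar, the logic operation of the second layer yields 00100000. Note, however, that the embodiments of the present invention do not specifically limit the structure of the RRAM crossbar that implements this XOR logic. FIG. 8 is merely an example; in practice, the XOR logic can also be implemented with other configurations of resistance values and voltage thresholds.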
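At the logic level, the second layer can be sketched as an edge detector on the thermometer code. The Python sketch below assumes one reading of formula (5), which is present only as a figure placeholder here: the negated output of column j is the OR of the negated O1,j and O1,j+1, so O2,j is 1 only when O1,j is 1 and O1,j+1 is 0. For the last column, which has no right-hand neighbour, the missing input is treated as 0; that boundary rule is an assumption inferred from the single Ron resistor in the last column. The function name is hypothetical, and the sketch models Boolean values only, not the resistor array or comparator.

```python
def second_layer(o1):
    """Mark the last 1 of a thermometer code (logic-level model only).

    o1 is the first-layer output, e.g. [1, 1, 1, 0, 0, 0, 0, 0] for K = 3.
    """
    n = len(o1)
    o2 = []
    for j in range(n):
        right = o1[j + 1] if j + 1 < n else 0   # assumed boundary value
        # not O2 = (not O1[j]) or O1[j+1], i.e. O2 = O1[j] and not O1[j+1]
        o2.append(o1[j] & (1 - right))
    return o2


print(second_layer([1, 1, 1, 0, 0, 0, 0, 0]))
```

For a valid thermometer code this coincides with XORing each pair of adjacent elements, and an all-zero input (K = 0) produces an all-zero output.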
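The second-layer RRAM crossbar passes the voltage signal corresponding to the intermediate calculation result to the word lines of the third-layer RRAM crossbar, where the output O2,j of the j-th bit line of the second-layer RRAM crossbar is connected to the input of the j-th word line of the third-layer RRAM crossbar. The logic circuit of the third-layer RRAM crossbar is shown in FIG. 9. The third-layer RRAM crossbar includes an N × n resistor array, where n is greater than or equal to the minimum number of bits needed to represent the integer N in binary (for example, if N = 8, binary 8 is 1000, so representing 8 in binary needs 4 bits, and n is greater than or equal to 4). The resistances of the N rows of resistors are set so that, from top to bottom, the rows represent binary 1 to N in sequence. In this way, when the voltage signal corresponding to the intermediate calculation result output by the second-layer RRAM crossbar is used as the input voltage signal of the third-layer RRAM crossbar, because the (K-1)-th element of the intermediate calculation result is the Boolean value 1 and the remaining elements are the Boolean value 0, the (K-1)-th word line of the third-layer RRAM crossbar receives the voltage signal corresponding to the Boolean value 1 (high level) and the remaining word lines receive the voltage signal corresponding to the Boolean value 0 (low level). From a logic point of view, this configuration makes the result finally output by the third-layer RRAM crossbar (corresponding to the second calculation result above) the binary representation of K, which is equivalent to selecting as the final result the binary representation of the integer corresponding to the (K-1)-th row of resistors of the third-layer RRAM crossbar (as described above, the (K-1)-th row of resistors corresponds to the binary representation of the integer K). The output of the third-layer RRAM crossbar is the voltage signal corresponding to this final result.
Still taking N = 8 and K = 3 as an example, the logic output of the second-layer RRAM crossbar is 00100000. The inputs and outputs of the third-layer RRAM crossbar have the logical correspondence shown in the following table:
Table 2: Input/output correspondence of the third-layer RRAM crossbar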
Figure PCTCN2016071254-appb-000023
As the table above shows, the output corresponding to 00100000 is 0011, that is, 3 in binary.
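The encoding performed by the third layer can be sketched at the logic level as a lookup in which row j stores the binary form of j+1 and the single active word line selects its row. The Python sketch below is illustrative: the function name and the most-significant-bit-first column order are assumptions, and the column threshold of the third layer, which the text does not state, is not modeled.

```python
def third_layer(o2, n_bits=None):
    """Encode a one-hot intermediate result as the binary representation of K.

    Row j of the modeled array holds the binary digits of j + 1 (MSB first),
    with 1 standing for Ron and 0 for Roff. Logic-level model only.
    """
    n = len(o2)
    if n_bits is None:
        n_bits = n.bit_length()   # smallest n that can represent N
    rows = [[((j + 1) >> (n_bits - 1 - c)) & 1 for c in range(n_bits)]
            for j in range(n)]
    # A column outputs 1 if any active word line meets a 1 (Ron) cell in it.
    return [int(any(o2[j] and rows[j][c] for j in range(n)))
            for c in range(n_bits)]


print(third_layer([0, 0, 1, 0, 0, 0, 0, 0]))
```

With the input 00100000 the model selects row 2, which stores binary 3, in agreement with the 0011 entry discussed above; an all-zero input yields 0000, which corresponds to K = 0.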
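Note that if the input matrix is a non-Boolean matrix (for example, a positive real matrix), it can be decomposed by means of linear algebra into a linear combination of multiple Boolean matrices. The multiple Boolean matrices are then processed separately in the manner described above, and their results are linearly combined to obtain the matrix product for the real matrix, which is not described in detail in the embodiments of the present invention.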
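The text leaves the decomposition unspecified. One common choice, shown here purely as an assumed illustration, is a bit-plane decomposition for matrices of non-negative integers: each matrix is written as a sum of Boolean matrices weighted by powers of two, the Boolean products are computed separately, and the partial results are recombined with the matching weights. The function names are hypothetical, and real-valued entries would first need to be quantized, which is outside this sketch.

```python
def bit_planes(m, bits):
    """Split a matrix of non-negative integers into `bits` Boolean matrices.

    Plane b holds bit b of every entry, so m equals sum(2**b * plane_b).
    """
    return [[[(v >> b) & 1 for v in row] for row in m] for b in range(bits)]


def bool_matmul(phi, x):
    """Product of two 0/1 matrices; each entry is a Boolean dot product."""
    cols = list(zip(*x))
    return [[sum(p & q for p, q in zip(row, col)) for col in cols] for row in phi]


def matmul_via_bit_planes(a, b, bits):
    """Multiply integer matrices using only Boolean matrix products."""
    rows, cols = len(a), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for p, a_plane in enumerate(bit_planes(a, bits)):
        for q, b_plane in enumerate(bit_planes(b, bits)):
            partial = bool_matmul(a_plane, b_plane)
            for i in range(rows):
                for j in range(cols):
                    # Weight of this pair of planes is 2**(p + q).
                    result[i][j] += (1 << (p + q)) * partial[i][j]
    return result


a = [[3, 1], [2, 0]]
b = [[1, 2], [3, 1]]
print(matmul_via_bit_planes(a, b, bits=2))
```

Here `bits` must be large enough to cover the largest entry of either matrix; with 2 bits the example handles entries up to 3.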
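A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention. The foregoing describes merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto.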

Claims (10)

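  1. A memory device, wherein the memory device comprises a control bus and multiple memory units, the multiple memory units are connected to one another through the control bus, and each of the multiple memory units comprises:
    a control module, wherein the control module is connected to a processor through the control bus, and receives and parses instructions of the processor through the control bus, wherein the instructions of the processor comprise a logic operation instruction; and
    a logic module, wherein the logic module is connected to the control module, the logic module comprises at least one layer of resistive random access memory (RRAM) crossbar array, the resistance of each resistor in the at least one layer of RRAM crossbar array is Ron or Roff, Ron represents the Boolean value 1, Roff represents the Boolean value 0, and the control module performs a Boolean operation through the at least one layer of RRAM crossbar array according to the logic operation instruction.
  2. The memory device according to claim 1, wherein the logic operation instruction instructs the logic module to perform a dot product of a Boolean vector A and a Boolean vector B, A and B are both N-dimensional Boolean vectors, and N is a positive integer not less than 2;
    the logic module comprises a multi-layer RRAM crossbar array, a first-layer RRAM crossbar array of the multi-layer RRAM crossbar array comprises a resistor array of N rows × N columns, the input ends of the resistors in each row of the first-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column of the first-layer RRAM crossbar array are connected to one bit line, the N word lines of the first-layer RRAM crossbar array are connected to the control module, and the N bit lines of the first-layer RRAM crossbar array are connected, through N comparison circuits respectively, to the other RRAM crossbar arrays of the multi-layer RRAM crossbar array;
    the first-layer RRAM crossbar array generates N current signals on the N bit lines according to the voltage signals input on the N word lines and the resistances of the resistors in the first-layer RRAM crossbar array, wherein the voltage of the signal input on the j-th of the N word lines is the voltage corresponding to Bj, the resistances of the resistors in the j-th row of the first-layer RRAM crossbar array are the resistance corresponding to Aj, Bj is the j-th element of the Boolean vector B, Aj is the j-th element of the Boolean vector A, and j ranges from 0 to N-1;
    the N comparison circuits convert the N current signals into N voltage signals respectively, and compare the N voltage signals with the voltage thresholds of the respective comparison circuits, so that the output ends of the N bit lines output a voltage signal corresponding to a first calculation result, wherein the first calculation result is an N-dimensional Boolean vector, the first K elements of the first calculation result are 1, the remaining elements are 0, and K is the result of the dot product of A and B; and
    the other RRAM crossbar arrays receive the voltage signal corresponding to the first calculation result from the output ends of the N bit lines, and obtain, according to the voltage signal corresponding to the first calculation result and the resistances of the resistors in the other RRAM crossbar arrays, a voltage signal corresponding to a second calculation result, wherein the second calculation result is the binary representation of K.
  3. The memory device according to claim 2, wherein the j-th of the N comparison circuits comprises a resistor Rs of fixed resistance and a comparator, one end of the resistor Rs is connected to the j-th of the N bit lines and to the comparator, the other end of the resistor Rs is connected to ground, and the voltage threshold of the j-th comparison circuit is Vr*gon*Rs*(2j+1)/2, wherein Vr represents the voltage corresponding to the Boolean value 1 and gon represents the reciprocal of Ron.
  4. The memory device according to claim 3, wherein the logic module comprises at least three layers of RRAM crossbar arrays, and the other RRAM crossbar arrays comprise a second-layer RRAM crossbar array and a third-layer RRAM crossbar array;
    the second-layer RRAM crossbar array comprises a resistor array of (2N-1) rows × N columns, the input ends of the resistors in each row of the second-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column of the second-layer RRAM crossbar array are connected to one bit line, and the word lines of the second-layer RRAM crossbar array are connected to the output ends of the bit lines of the first-layer RRAM crossbar array;
    the second-layer RRAM crossbar array receives, through the 2N-1 word lines, the voltage signal corresponding to the first calculation result from the output ends of the bit lines of the first-layer RRAM crossbar array, and performs the following logic operation according to the voltage signal corresponding to the first calculation result and the resistances of the resistors in the second-layer RRAM crossbar array: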
    Figure PCTCN2016071254-appb-100001
    to obtain a voltage signal corresponding to an intermediate calculation result, wherein
    Figure PCTCN2016071254-appb-100002
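    is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the first-layer RRAM crossbar array, O1,j+1 is the Boolean value corresponding to the voltage signal output by the (j+1)-th bit line of the first-layer RRAM crossbar array, and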
    Figure PCTCN2016071254-appb-100003
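    is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the second-layer RRAM crossbar array;
    the third-layer RRAM crossbar array comprises a resistor array of N rows × n columns, the input ends of the resistors in each row of the third-layer RRAM crossbar array are connected to one word line, and the output ends of the resistors in each column of the third-layer RRAM crossbar array are connected to one bit line, wherein n is greater than or equal to the minimum number of bits needed to represent the integer N in binary; and
    the third-layer RRAM crossbar array receives, through the N word lines of the third-layer RRAM crossbar array, the voltage signal corresponding to the intermediate calculation result from the N bit lines of the second-layer RRAM crossbar array, and encodes the intermediate calculation result according to the voltage signal corresponding to the intermediate calculation result and the resistances of the resistors in the third-layer RRAM crossbar array, to obtain the voltage signal corresponding to the second calculation result.
  5. The memory device according to claim 4, wherein the j-th word line of the third-layer RRAM crossbar array is connected to the j-th bit line of the second-layer RRAM crossbar array, and the resistances of the resistors in the j-th row of the third-layer RRAM crossbar array correspond to the binary representation of the integer j+1.
  6. The memory device according to any one of claims 2 to 5, wherein the Boolean vector A is any row vector of a Boolean matrix Φ, the Boolean vector B is any column vector of a Boolean matrix X, each of multiple logic modules in the memory device is responsible for the dot products of some of the row vectors of the Boolean matrix Φ with some of the column vectors of the Boolean matrix X, and the multiple logic modules jointly implement the Boolean matrix multiplication of the Boolean matrix Φ and the Boolean matrix X.
  7. The memory device according to any one of claims 1 to 6, wherein the instructions of the processor further comprise a data read/write instruction, and each memory unit further comprises:
    a storage module, wherein the storage module is connected to the control module, and the control module reads and writes data through the storage module according to the data read/write instruction.
  8. A data processing method based on a multi-layer resistive random access memory (RRAM) crossbar array, wherein the resistance of each resistor in the multi-layer RRAM crossbar array is Ron or Roff, Ron represents the Boolean value 1, Roff represents the Boolean value 0, the multi-layer RRAM crossbar array is used to perform a dot product of a Boolean vector A and a Boolean vector B, A and B are both N-dimensional Boolean vectors, N is a positive integer not less than 2, a first-layer RRAM crossbar array of the multi-layer RRAM crossbar array comprises a resistor array of N rows × N columns, the input ends of the resistors in each row of the first-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column of the first-layer RRAM crossbar array are connected to one bit line, and the N bit lines of the first-layer RRAM crossbar array are connected, through N comparison circuits respectively, to the other RRAM crossbar arrays of the multi-layer RRAM crossbar array;
    the method comprises:
    generating, by the first-layer RRAM crossbar array, N current signals on the N bit lines according to the voltage signals input on the N word lines of the first-layer RRAM crossbar array and the resistances of the resistors in the first-layer RRAM crossbar array, wherein the voltage of the signal input on the j-th of the N word lines is the voltage corresponding to Bj, the resistances of the resistors in the j-th row of the first-layer RRAM crossbar array are the resistance corresponding to Aj, Bj is the j-th element of the Boolean vector B, Aj is the j-th element of the Boolean vector A, and j ranges from 0 to N-1;
    converting, by the N comparison circuits, the N current signals into N voltage signals, and comparing the N voltage signals with the voltage thresholds of the respective comparison circuits, so that the output ends of the N bit lines output a voltage signal corresponding to a first calculation result, wherein the first calculation result is an N-dimensional Boolean vector, the first K elements of the first calculation result are 1, the remaining elements are 0, and K is the result of the dot product of A and B; and
    receiving, by the other RRAM crossbar arrays, the voltage signal corresponding to the first calculation result from the output ends of the N bit lines, and obtaining, according to the voltage signal corresponding to the first calculation result and the resistances of the resistors in the other RRAM crossbar arrays, a voltage signal corresponding to a second calculation result, wherein the second calculation result is the binary representation of K.
  9. The method according to claim 8, wherein the logic module comprises at least three layers of RRAM crossbar arrays, and the other RRAM crossbar arrays comprise a second-layer RRAM crossbar array and a third-layer RRAM crossbar array;
    the second-layer RRAM crossbar array comprises a resistor array of (2N-1) rows × N columns, the input ends of the resistors in each row of the second-layer RRAM crossbar array are connected to one word line, the output ends of the resistors in each column of the second-layer RRAM crossbar array are connected to one bit line, and the word lines of the second-layer RRAM crossbar array are connected to the output ends of the bit lines of the first-layer RRAM crossbar array;
    the third-layer RRAM crossbar array comprises a resistor array of N rows × n columns, the input ends of the resistors in each row of the third-layer RRAM crossbar array are connected to one word line, and the output ends of the resistors in each column of the third-layer RRAM crossbar array are connected to one bit line, wherein n is greater than or equal to the minimum number of bits needed to represent the integer N in binary;
    the receiving, by the other RRAM crossbar arrays, the voltage signal corresponding to the first calculation result from the output ends of the N bit lines, and obtaining, according to the voltage signal corresponding to the first calculation result and the resistances of the resistors in the other RRAM crossbar arrays, a voltage signal corresponding to a second calculation result comprises:
    receiving, by the second-layer RRAM crossbar array through the 2N-1 word lines, the voltage signal corresponding to the first calculation result from the output ends of the bit lines of the first-layer RRAM crossbar array, and performing the following logic operation according to the voltage signal corresponding to the first calculation result and the resistances of the resistors in the second-layer RRAM crossbar array: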
    Figure PCTCN2016071254-appb-100004
    to obtain a voltage signal corresponding to an intermediate calculation result, wherein
    Figure PCTCN2016071254-appb-100005
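    is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the first-layer RRAM crossbar array, O1,j+1 is the Boolean value corresponding to the voltage signal output by the (j+1)-th bit line of the first-layer RRAM crossbar array, and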
    Figure PCTCN2016071254-appb-100006
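    is the negation of the Boolean value corresponding to the voltage signal output by the j-th bit line of the second-layer RRAM crossbar array; and
    receiving, by the third-layer RRAM crossbar array through the N word lines of the third-layer RRAM crossbar array, the voltage signal corresponding to the intermediate calculation result from the N bit lines of the second-layer RRAM crossbar array, and encoding the intermediate calculation result according to the voltage signal corresponding to the intermediate calculation result and the resistances of the resistors in the third-layer RRAM crossbar array, to obtain the voltage signal corresponding to the second calculation result.
  10. The method according to claim 8 or 9, wherein the j-th of the N comparison circuits comprises a resistor Rs of fixed resistance and a comparator, one end of the resistor Rs is connected to the j-th of the N bit lines and to the comparator, the other end of the resistor Rs is connected to ground, and the voltage threshold of the j-th comparison circuit is Vr*gon*Rs*(2j+1)/2, wherein Vr represents the voltage corresponding to the Boolean value 1 and gon represents the reciprocal of Ron.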
PCT/CN2016/071254 2016-01-18 2016-01-18 内存设备及基于多层rram交叉阵列的数据处理方法 WO2017124237A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
JP2018537499A JP6702596B2 (ja) 2016-01-18 2016-01-18 多層rramクロスバー・アレイに基づくメモリデバイス、およびデータ処理方法
PCT/CN2016/071254 WO2017124237A1 (zh) 2016-01-18 2016-01-18 内存设备及基于多层rram交叉阵列的数据处理方法
SG11201805489YA SG11201805489YA (en) 2016-01-18 2016-01-18 Memory device, and data processing method based on multi-layer rram crossbar array
CN201680058624.6A CN108475522B (zh) 2016-01-18 2016-01-18 内存设备及基于多层rram交叉阵列的数据处理方法
EP16885510.4A EP3389051B1 (en) 2016-01-18 2016-01-18 Memory device and data-processing method based on multi-layer rram crossbar array
US16/037,767 US10459724B2 (en) 2016-01-18 2018-07-17 Memory device, and data processing method based on multi-layer RRAM crossbar array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/071254 WO2017124237A1 (zh) 2016-01-18 2016-01-18 内存设备及基于多层rram交叉阵列的数据处理方法

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/037,767 Continuation US10459724B2 (en) 2016-01-18 2018-07-17 Memory device, and data processing method based on multi-layer RRAM crossbar array

Publications (1)

Publication Number Publication Date
WO2017124237A1 true WO2017124237A1 (zh) 2017-07-27

Family

ID=59361091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/071254 WO2017124237A1 (zh) 2016-01-18 2016-01-18 Memory device, and data processing method based on multi-layer RRAM crossbar array

Country Status (6)

Country Link
US (1) US10459724B2 (zh)
EP (1) EP3389051B1 (zh)
JP (1) JP6702596B2 (zh)
CN (1) CN108475522B (zh)
SG (1) SG11201805489YA (zh)
WO (1) WO2017124237A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114594819A (zh) * 2022-01-19 2022-06-07 之江实验室 Circuit and method for adaptively adjusting operating voltage capable of tracking the ferroelectric capacitor process

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878317B2 (en) * 2017-09-11 2020-12-29 Samsung Electronics Co., Ltd. Method and system for performing analog complex vector-matrix multiplication
IT201700108281A1 (it) * 2017-09-27 2019-03-27 Milano Politecnico Circuit for solving mathematical problems comprising resistive elements
US11189345B2 (en) * 2018-01-22 2021-11-30 Institute of Microelectronics, Chinese Academy of Sciences Method for implementing logic calculation based on a crossbar array structure of resistive switching device
JP7070190B2 (ja) * 2018-07-18 2022-05-18 株式会社デンソー Neural network circuit
US10643119B2 (en) * 2018-07-24 2020-05-05 Sandisk Technologies Llc Differential non-volatile memory cell for artificial neural network
US11170290B2 (en) 2019-03-28 2021-11-09 Sandisk Technologies Llc Realization of neural networks with ternary inputs and binary weights in NAND memory arrays
US10643705B2 (en) 2018-07-24 2020-05-05 Sandisk Technologies Llc Configurable precision neural network with differential binary non-volatile memory cell structure
US11328204B2 (en) 2018-07-24 2022-05-10 Sandisk Technologies Llc Realization of binary neural networks in NAND memory arrays
CN108763163B (zh) * 2018-08-02 2023-10-20 北京知存科技有限公司 Analog vector-matrix multiplication circuit
US11410025B2 (en) * 2018-09-07 2022-08-09 Tetramem Inc. Implementing a multi-layer neural network using crossbar array
US10489483B1 (en) * 2018-09-21 2019-11-26 National Technology & Engineering Solutions Of Sandia, Llc Circuit arrangement and technique for setting matrix values in three-terminal memory cells
CN109327219B (zh) * 2018-10-18 2022-05-03 中国科学院微电子研究所 Logic operation system based on memristor RRAM
US11636325B2 (en) 2018-10-24 2023-04-25 Macronix International Co., Ltd. In-memory data pooling for machine learning
CN109521995B (zh) * 2018-11-02 2023-05-12 上海交通大学 Computation method of a logic operation device embedded in a memristor array
US11562229B2 (en) 2018-11-30 2023-01-24 Macronix International Co., Ltd. Convolution accelerator using in-memory computation
US11074318B2 (en) 2018-12-14 2021-07-27 Western Digital Technologies, Inc. Hardware accelerated discretized neural network
US11934480B2 (en) * 2018-12-18 2024-03-19 Macronix International Co., Ltd. NAND block architecture for in-memory multiply-and-accumulate operations
US11114158B1 (en) * 2019-01-23 2021-09-07 Tetramem Inc. Reducing column switch resistance errors in RRAM-based crossbar array circuits
US20200349217A1 (en) * 2019-05-03 2020-11-05 Micron Technology, Inc. Methods and apparatus for performing matrix transformations within a memory array
IT201900014688A1 (it) * 2019-08-12 2019-11-12 Univ Degli Studi Di Modena E Reggio Emilia Reading method for Logic-in-Memory circuits and related circuit architecture
US20220286303A1 (en) * 2019-08-22 2022-09-08 Quantumciel Pte. Ltd. Device, system and method for providing information security
US10726331B1 (en) 2019-08-26 2020-07-28 International Business Machines Corporation Neural network circuits providing early integration before analog-to-digital conversion
US11625586B2 (en) 2019-10-15 2023-04-11 Sandisk Technologies Llc Realization of neural networks with ternary inputs and ternary weights in NAND memory arrays
US11568200B2 (en) 2019-10-15 2023-01-31 Sandisk Technologies Llc Accelerating sparse matrix multiplication in storage class memory-based convolutional neural network inference
JP6818116B1 (ja) * 2019-11-22 2021-01-20 ウィンボンド エレクトロニクス コーポレーション Electronic device using crossbar array and data processing method
US11657259B2 (en) 2019-12-20 2023-05-23 Sandisk Technologies Llc Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine
CN111478703B (zh) * 2020-04-14 2023-08-22 中国人民解放军国防科技大学 Processing circuit based on memristive crossbar array and output current compensation method
US11397885B2 (en) 2020-04-29 2022-07-26 Sandisk Technologies Llc Vertical mapping and computing for deep neural networks in non-volatile memory
US11544547B2 (en) 2020-06-22 2023-01-03 Western Digital Technologies, Inc. Accelerating binary neural networks within latch structure of non-volatile memory devices
US11568228B2 (en) 2020-06-23 2023-01-31 Sandisk Technologies Llc Recurrent neural network inference engine with gated recurrent unit cell and non-volatile memory arrays
US11663471B2 (en) 2020-06-26 2023-05-30 Sandisk Technologies Llc Compute-in-memory deep neural network inference engine using low-rank approximation technique
US11544061B2 (en) * 2020-12-22 2023-01-03 International Business Machines Corporation Analog hardware matrix computation
CN113934398A (zh) * 2021-10-09 2022-01-14 广东工业大学 Memristor-based design method for efficient adders and multipliers

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1138719A (zh) * 1995-01-31 1996-12-25 三菱电机株式会社 Microcomputer
US20100046275A1 (en) * 2008-08-25 2010-02-25 Kabushiki Kaisha Toshiba Nonvolatile semiconductor storage apparatus and data programming method thereof
CN101840995A (zh) * 2009-01-13 2010-09-22 三星电子株式会社 Resistive random access memory and method of manufacturing the same
US7835174B2 (en) * 2007-11-09 2010-11-16 Kabushiki Kaisha Toshiba Non-volatile memory device and method of reading data therefrom
CN102169720A (zh) * 2010-02-25 2011-08-31 复旦大学 Resistive random access memory that eliminates over-write and mis-write phenomena

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2501932B2 (ja) 1990-03-19 1996-05-29 富士通株式会社 Center-of-gravity determining element output device using a neural network
US8675391B2 (en) * 2010-04-19 2014-03-18 Hewlett-Packard Development Company, L.P. Refreshing memristive systems
CN102412827B (zh) 2011-11-02 2014-06-11 北京大学 Method for implementing logic operations using RRAM devices
JP2014081842A (ja) 2012-10-17 2014-05-08 Sharp Corp Arithmetic device
US8982647B2 (en) * 2012-11-14 2015-03-17 Crossbar, Inc. Resistive random access memory equalization and sensing
US9152827B2 (en) * 2012-12-19 2015-10-06 The United States Of America As Represented By The Secretary Of The Air Force Apparatus for performing matrix vector multiplication approximation using crossbar arrays of resistive memory devices
CN104756193B (zh) * 2013-01-14 2018-11-06 慧与发展有限责任合伙企业 Nonvolatile memory array logic
CN104240753B (zh) 2013-06-10 2018-08-28 三星电子株式会社 Synapse array, pulse shaping circuit and neuromorphic system
US9489997B2 (en) * 2013-07-03 2016-11-08 Crossbar, Inc. Hardware assisted meta data lookup
CN103716038B (zh) 2013-12-25 2016-05-25 华中科技大学 Nonvolatile logic gate circuit based on phase change memory
US10025704B2 (en) * 2013-12-27 2018-07-17 Crossbar, Inc. Memory system including PE count circuit and method of operating the same
US20150213884A1 (en) * 2014-01-30 2015-07-30 University Of Dayton Partitioned resistive memory array
CN104124960B (zh) * 2014-06-20 2018-02-23 华中科技大学 Nonvolatile Boolean logic operation circuit and operating method thereof
JP6333477B2 (ja) 2014-10-23 2018-05-30 ヒューレット パッカード エンタープライズ デベロップメント エル ピーHewlett Packard Enterprise Development LP Memristive crossbar array for determining a dot product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3389051A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
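CN114594819A (zh) * 2022-01-19 2022-06-07 之江实验室 Circuit and method for adaptively adjusting operating voltage capable of tracking the ferroelectric capacitor process
CN114594819B (zh) * 2022-01-19 2023-12-05 之江实验室 Circuit and method for adaptively adjusting operating voltage capable of tracking the ferroelectric capacitor process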

Also Published As

Publication number Publication date
JP6702596B2 (ja) 2020-06-03
EP3389051A4 (en) 2019-01-09
CN108475522B (zh) 2020-12-15
EP3389051B1 (en) 2020-09-09
EP3389051A1 (en) 2018-10-17
JP2019502225A (ja) 2019-01-24
CN108475522A (zh) 2018-08-31
US20180321942A1 (en) 2018-11-08
US10459724B2 (en) 2019-10-29
SG11201805489YA (en) 2018-07-30

Similar Documents

Publication Publication Date Title
WO2017124237A1 (zh) Memory device, and data processing method based on multi-layer RRAM crossbar array
US10073733B1 (en) System and method for in-memory computing
US20220374688A1 (en) Training method of neural network based on memristor and training device thereof
Ni et al. An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar
WO2018130029A1 (zh) Computing device and computing method for neural network computation
US10936941B2 (en) Efficient data access control device for neural network hardware acceleration system
TWI792682B (zh) Memory system with error correction and data scrubbing circuits
CN109146070B (zh) Peripheral circuit and system supporting RRAM-based neural network training
US11424764B2 (en) Recurrent neural networks and systems for decoding encoded data
WO2017127086A1 (en) Analog sub-matrix computing from input matrixes
CN111478703B (zh) Processing circuit based on memristive crossbar array and output current compensation method
US11853594B2 (en) Neural network computing chip and computing method
Singh et al. Low-power memristor-based computing for edge-ai applications
CN110211617B (zh) Hash hardware processing device and method
Tsai et al. RePIM: Joint exploitation of activation and weight repetitions for in-ReRAM DNN acceleration
Alam et al. Stochastic computing in beyond von-neumann era: Processing bit-streams in memristive memory
EP4086816A1 (en) Neural network circuit and neural network system
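CN110211618B (zh) Processing device and method for blockchain
WO2016049862A1 (zh) Circuit for implementing shift operation, and array circuit
CN113326914A (zh) Neural network computing method and neural network computing device
WO2024032220A1 (zh) Neural network compensation method, device and circuit based on compute-in-memory circuit
KR20210013397A (ko) Storage device
RU2708956C2 (ru) Processor with increased operational reliability
RU186547U1 (ru) Processor with increased operational reliability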
US8078657B2 (en) Multi-source dual-port linked list purger

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16885510; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 11201805489Y; Country of ref document: SG
WWE WIPO information: entry into national phase. Ref document number: 2016885510; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2018537499; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2016885510; Country of ref document: EP; Effective date: 20180709