WO2020088005A1

WO2020088005A1 - Embedded quick adder apparatus based on memristor array underflow path and calculation method

Info

Publication number: WO2020088005A1
Application number: PCT/CN2019/097848
Authority: WO
Inventors: 景乃锋 (Jing Naifeng); 李桃中 (Li Taozhong); 李彤 (Li Tong); 王琴 (Wang Qin); 蒋剑飞 (Jiang Jianfei); 贺光辉 (He Guanghui); 毛志刚 (Mao Zhigang)
Original assignee: 上海交通大学 (Shanghai Jiao Tong University)
Priority date: 2018-11-02
Filing date: 2019-07-26
Publication date: 2020-05-07
Also published as: CN109521993B; CN109521993A

Abstract

The present invention relates to an adder apparatus based on a memristor array underflow path. Compared with existing memristor adders, which perform addition by logic iteration, the invention is characterized by: (1) separating the serial and parallel parts of the addition operation, namely serial carry and parallel summation; (2) for the serial carry, constructing a corresponding underflow path with a customized carry propagation path, and using current propagation to simulate carry behavior, thereby greatly accelerating the carry calculation; and (3) after the carry of each bit is obtained, using existing memristor logic operations and the parallel structure of the crossbar array to complete the summation of all bits at the same time.

Description

Embedded fast adder device and calculation method based on memristor array underflow path

Technical field

The invention belongs to the field of nonvolatile memory based on new materials and relates to in-memory computing technology.

Background art

At present there are three main methods for addition based on memristor storage arrays: Boolean logic, lookup tables (LUT) and programmable logic arrays (PLA).

The method based on Boolean logic is the simplest and most intuitive: the adder is assembled from the basic logic operations supported by the circuit according to the logic expression of addition. Typical implementations include IMPLY circuits and MAGIC circuits. The drawback of this method is equally obvious: its computational efficiency is low. A 1-bit full adder takes 29 steps with an IMPLY circuit and 12 steps with a MAGIC circuit. Contemporary computing systems are usually 32 bits wide, so even without considering carry movement, the calculation alone requires 928 and 384 steps, respectively. Given that memristor writes are usually slow, such a computational overhead is unacceptable.

The lookup table (LUT) method imitates the design idea of the FPGA: it exploits the programmability of the memristor, computes the result of a specific logic function in advance by means of IMPLY or MAGIC, and stores it in the memristor array. Because this pre-calculation is done offline, the method is computationally efficient, and one calculation amounts to a single read operation. For example, if the lookup table stores a 1-bit full adder, 32 steps are required to complete a 32-bit addition; if it stores the result of a 2-bit adder, only 16 steps are required. However, this method is not true in-memory computing: it merely uses the memristor storage array as a programmable arithmetic unit, and once the array is configured as such a unit, it can no longer be used as data memory to store operands and results. The method is also expensive in hardware resources: the lookup table of a 1-bit full adder alone occupies 8 × 14 cells of array space, and the consumed array area grows nonlinearly as the bit width increases.
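The step counts quoted for the lookup table method can be illustrated with a small software model. This is only a sketch under stated assumptions: the table is a Python dictionary rather than a memristor array, and the names `build_adder_lut` and `lut_add` are hypothetical, not taken from the patent.

```python
from itertools import product

def build_adder_lut(k):
    """Precompute a k-bit adder offline: (a, b, carry_in) -> (sum, carry_out)."""
    mask = (1 << k) - 1
    lut = {}
    for a, b, cin in product(range(1 << k), range(1 << k), (0, 1)):
        total = a + b + cin
        lut[(a, b, cin)] = (total & mask, total >> k)
    return lut

def lut_add(a, b, width=32, k=1):
    """Add two width-bit numbers using one table read per k-bit slice."""
    lut = build_adder_lut(k)
    mask = (1 << k) - 1
    result, carry, reads = 0, 0, 0
    for shift in range(0, width, k):
        s, carry = lut[((a >> shift) & mask, (b >> shift) & mask, carry)]
        result |= s << shift
        reads += 1  # each slice costs exactly one read
    return result, carry, reads

print(lut_add(0xFFFFFFFF, 1, width=32, k=1))  # 32 reads with a 1-bit table
print(lut_add(0xFFFFFFFF, 1, width=32, k=2))  # 16 reads with a 2-bit table
print(len(build_adder_lut(1)), len(build_adder_lut(2)))  # table entries grow quickly
```

In this model the number of table entries is 2^(2k+1), which is one way to see why the array area grows nonlinearly with the slice width; the actual cell count per entry depends on the array encoding and is not modeled here.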

The programmable logic array (PLA) method relies on the fact that any digital logic function can be expressed in product-of-sums or sum-of-products form, and uses the memristor array to build a customized minterm (maxterm) plane. In this plane the structure of the minterms (maxterms) is fixed; once these basic terms are formed, different logic functions can be implemented by activating different rows or columns, which achieves programmability. The shortcomings of this implementation are the same as those of the lookup table approach: it is not true in-memory computing, and it consumes a large amount of the memristor array's hardware resources.

Summary of the invention

Memristive memory is a candidate platform for in-memory computing, which is used to solve the memory wall problem. In view of the structure of the memristive memory array, the present invention proposes an adder implementation scheme that improves the efficiency of in-memory computing in memristive memory.

The logic of the 1-bit full adder can be expressed by the following two formulas:

S = A ⊕ B ⊕ C_i

C_o = AB + AC_i + BC_i
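As a quick check of the full adder logic, the carry expression above can be paired with the standard sum expression S = A XOR B XOR C_i and verified against integer addition over all eight input combinations. The function name `full_adder` is an illustrative choice, not something defined in the patent.

```python
from itertools import product

def full_adder(a, b, c_in):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in                            # S = A xor B xor C_i
    c_out = (a & b) | (a & c_in) | (b & c_in)   # C_o = AB + AC_i + BC_i
    return s, c_out

# Exhaustive check against ordinary integer addition.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == 2 * c_out + s
```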

For multi-bit adders, computing the result step by step by logic iteration is inefficient and forfeits the advantage that in-memory computing gains from reduced data movement. However, if the carries of all bits can be obtained efficiently, the parallel nature of the array structure can be used to obtain the sums of all bits at once. Following this idea, Figure 1 shows the principle of memristor-based carry calculation. Note that a memristor represents a logic value through its resistance: the low resistance state (LRS) represents logic 1, and the high resistance state (HRS) represents logic 0. Figure 1 (a) shows the three underflow paths involved in carry generation:

1) Carry generation path, R_G = A · B: when both addends are 1, the path is in a low-resistance state, the supply charges the capacitor through the path, and a carry is eventually produced;

2) Carry cancellation path, R_D = ¬A · ¬B: when both addends are 0, the path is in a low-resistance state, the capacitor discharges through the path, and ultimately no carry is formed;

3) Carry propagation path, R_P = A ⊕ B: when the two addends differ, the path is in a low-resistance state, and the carry of the previous bit is propagated to the current bit through the path.

Figure 1 (b) shows a schematic diagram of 4-bit carry calculation based on memristors, where the carry of each bit is calculated according to the three cases above. Since the states of R_G, R_D, and R_P can be obtained in advance by logic operations, the circuit can quickly complete the carry calculation through the underflow path of each bit. These logic operations can be implemented with existing memristor logic techniques such as IMPLY or MAGIC. However, unlike R_G and R_D, which are connected in parallel, the R_P elements are connected in series to form the carry chain. A series connection is not directly compatible with the array structure, so the carry propagation path must be customized on top of the array structure. Clearly, the delay of the carry calculation is longest when the carry propagates along the R_P chain from the lowest bit to the highest bit. Although the circuit is still bound by the serial nature of carry propagation, carry calculation is much faster than with logic iteration, because the present invention uses current propagation and charging to emulate the carry propagation process. Logic iteration, by contrast, computes the carry of each bit step by step, one clock cycle at a time, according to the logical expression, and every step requires a write operation, which is generally slow for memristive devices. Carry calculation by logic iteration is therefore very slow. After the carries of all bits are obtained, the final result is calculated according to the summation expression. In this step the bits are independent of each other, so the summation of different bits can be performed in parallel within the array structure.
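The three carry cases can be mimicked in software to see how the per-bit path states determine the carries. The following is a behavioral sketch only, assuming R_D = NOT A AND NOT B and R_P = A XOR B, with a value of 1 standing for a conducting (low-resistance) path; the loop stands in for the physical current propagation, and the helper names are hypothetical.

```python
def to_bits(value, width):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]

def carry_paths(a_bits, b_bits):
    """Per-bit path states; 1 means the path is in its low-resistance state."""
    r_g = [a & b for a, b in zip(a_bits, b_bits)]              # generate
    r_d = [(1 - a) & (1 - b) for a, b in zip(a_bits, b_bits)]  # cancel
    r_p = [a ^ b for a, b in zip(a_bits, b_bits)]              # propagate
    return r_g, r_d, r_p

def resolve_carries(r_g, r_d, r_p, c_in=0):
    """Carry out of each bit, resolved from the lowest bit upward."""
    carries, carry = [], c_in
    for g, d, p in zip(r_g, r_d, r_p):
        assert g + d + p == 1  # exactly one of the three paths conducts
        if g:
            carry = 1          # node charged through R_G
        elif d:
            carry = 0          # node discharged through R_D
        # otherwise R_P conducts and the previous carry passes through unchanged
        carries.append(carry)
    return carries

# 4-bit example: 5 + 3
paths = carry_paths(to_bits(5, 4), to_bits(3, 4))
print(paths)                    # R_G, R_D, R_P per bit, LSB first
print(resolve_carries(*paths))  # carry out of each bit, LSB first
```

For 5 + 3 the generate path conducts at bit 0, the propagate path at bits 1 and 2, and the cancel path at bit 3, so the carry produced at bit 0 travels two positions before it is stopped.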

In summary, the adder implementation based on the memristive array comprises the following three points:

Carry underflow path mapping: pre-calculate the states of R_G, R_D and R_P, which determine how the carry of each bit is calculated;

Construct a serial carry chain: since the array structure cannot itself form a carry propagation path, a carry propagation path controlled by R_P must be customized to handle the third case of carry calculation described above;

Summation calculation: After the carry calculation of each bit is completed, the sum calculation of all bits is completed in parallel through the corresponding logic.

This patent proposes an adder design based on a memristor storage array. The design was evaluated using HSPICE and NVSim, a simulation tool for emerging non-volatile memories, and its technical effects are shown in three respects: computing performance, area overhead and power consumption overhead:

Computational performance: although the carry propagation delay still depends on the operand bit width, the present invention forms the carry by letting current flow and propagate along the constructed path, so the number of clock cycles consumed by the carry calculation grows sub-linearly with the operand width. For example, for a 32-bit adder, the IMPLY and MAGIC implementations require 928 and 384 clock cycles respectively, whereas the present invention requires only 13 clock cycles, improving calculation performance by about 70 and 28 times, respectively;

Area overhead: the area overhead is evaluated in two respects. The first is the number of array cells occupied by intermediate data generated during the addition. These cells must be reserved for buffering intermediate data during the addition operation and cannot be used to store other data; the larger their share, the lower the utilization of the array and the greater the overhead of the addition operation. Again taking the 32-bit adder as an example, IMPLY and MAGIC require 2 and 352 additional cells, respectively, and this design requires 64 additional cells. Compared with the IMPLY design, the array overhead increases by 31 times; compared with the MAGIC design, it decreases by 4.5 times. The second is the overhead of the carry chain and the control circuit relative to a conventional memristive memory, which is about 12.4%;

Power consumption overhead: the additional power consumption is likewise caused by the control circuit and the introduced carry chain. Relative to the power consumption of the peripheral circuits during memory operation, the overhead of these two parts is about 19.5%.
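The ratios quoted above can be reproduced from the raw figures in the text. The sketch below assumes that "by N times" means the difference divided by the smaller quantity, which is the reading that matches the quoted values of 70, 28, 31 and 4.5; the function and variable names are illustrative only.

```python
WIDTH = 32

# Steps per 1-bit full adder scaled to 32 bits (figures quoted in the text).
baseline_cycles = {"IMPLY": 29 * WIDTH, "MAGIC": 12 * WIDTH}
PROPOSED_CYCLES = 13

# Additional array cells needed for intermediate data (figures quoted in the text).
baseline_cells = {"IMPLY": 2, "MAGIC": 352}
PROPOSED_CELLS = 64

def times_better(baseline, proposed):
    """Assumed reading of 'improved/reduced by N times': (baseline - proposed) / proposed."""
    return (baseline - proposed) / proposed

def times_worse(baseline, proposed):
    """Assumed reading of 'increased by N times': (proposed - baseline) / baseline."""
    return (proposed - baseline) / baseline

cycle_gain = {name: times_better(c, PROPOSED_CYCLES) for name, c in baseline_cycles.items()}
area_vs_imply = times_worse(baseline_cells["IMPLY"], PROPOSED_CELLS)
area_vs_magic = times_better(baseline_cells["MAGIC"], PROPOSED_CELLS)

print(baseline_cycles)               # 32-bit step counts for IMPLY and MAGIC
print(cycle_gain)                    # roughly 70 and 28
print(area_vs_imply, area_vs_magic)  # 31 and 4.5
```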

Brief description of the drawings

By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, objects, and advantages of the present invention will become more apparent:

FIG. 1 is a schematic diagram of a system framework according to an embodiment of the present invention (a) 1-bit (b) 4-bit;

FIG. 2 is a schematic diagram of carry underflow path mapping according to an embodiment of the present invention (a) voltage-divided read operation (b) current sensing read operation;

FIG. 3 is a schematic diagram of a carry chain of a carry propagation path according to an embodiment of the present invention (a) based on ReRAM (b) based on CMOS.

Detailed description

The addition implementation based on the memristor storage array mainly includes three points: carry underflow path mapping, construction of the serial carry chain, and sum calculation. The embodiment is explained in detail below.

Carry underflow path mapping: Figure 1 shows the principle of memristor-based addition. The main task of carry underflow path mapping is to map this schematic onto the real array structure. Since the R_G of different bits are independent of each other, they can be mapped to the same row of the array, and the mapping calculation is performed for all bits at the same time. R_D is mapped in the same way as R_G, to another row of the array. However, for storage arrays that use voltage-divided read operations, R_D mapping is unnecessary, because a voltage-divided read usually requires a load resistance connected at the bottom of the array to sense the resistance state of the selected cell during reading. In the carry calculation this load resistance plays the same role as R_D, so the R_D mapping step can be omitted, further reducing calculation delay and mapping overhead. R_P cannot be mapped directly into the array structure, and a serial carry chain must be constructed to realize its function. When all mapping is complete, the carry calculation is started by applying the operating voltage to the row where R_G is located. In addition, if the array uses a voltage-divided read operation, the rest of the array is simply turned off; if the array uses another type of read operation, such as current sensing, the row where R_D is located must be grounded and the rest of the array turned off. Figure 2 shows schematic diagrams of these two mapping methods.
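The claim that the load resistance can stand in for R_D in a voltage-divided read can be illustrated with a simple voltage divider. This is a toy model, not the patent's circuit: the resistance values, the operating voltage and the decision threshold are all hypothetical, and the carry chain contribution is ignored.

```python
# Hypothetical device values chosen only for illustration.
R_LRS = 10e3     # low resistance state, on the order of kilo-ohms
R_HRS = 1e6      # high resistance state
R_LOAD = 100e3   # load resistance at the bottom of the bit line
V_OP = 1.0       # operating voltage applied to the R_G row
V_REF = 0.5      # assumed decision threshold of the sensing circuit

def bit_line_voltage(r_cell, r_load=R_LOAD, v_op=V_OP):
    """Voltage divider formed by the selected R_G cell and the load resistance."""
    return v_op * r_load / (r_cell + r_load)

for label, r_cell in (("R_G in LRS (A = B = 1)", R_LRS), ("R_G in HRS", R_HRS)):
    v = bit_line_voltage(r_cell)
    print(f"{label}: {v:.3f} V -> carry {int(v > V_REF)}")
```

With these assumed values the bit line is pulled high only when R_G is in the low resistance state; otherwise the load resistance pulls it toward ground, which is the discharge role that a separately mapped R_D row would otherwise play.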

Construct a serial carry chain: the purpose of the serial carry chain is to provide a customized current path for R_P, which cannot be mapped directly into the array structure, so as to realize serial carry propagation. However, circuit implementation showed that building the carry propagation path directly from memristors (Figure 3a) raises two problems. First, serially connected memristors cannot be programmed in parallel, so the duration of the mapping process depends on the operand bit width, which greatly reduces the efficiency of the adder. Second, the current drive is insufficient during carry calculation. In general, even the low resistance state of a memristor is on the order of kilo-ohms, so even if all memristors on the carry path are in the low-resistance state, the current may still weaken progressively during propagation because of the large path resistance, and the calculation of subsequent carries cannot be completed. To solve these two problems, the present invention uses conventional MOS transistors to build the carry chain, as shown in FIG. 3b. First, to solve the serial mapping problem, the present invention uses the line buffer to temporarily store the control signal of the carry propagation path, R_P = A ⊕ B, and uses this signal to directly control the turning on and off of the transistors in the carry chain. Because these control signals are calculated independently on each bit line, the entire calculation can be performed in parallel, and the time for writing to memristors is also saved. Second, to solve the problem of insufficient current drive during carry propagation, the present invention does not connect each transistor directly between adjacent bit lines, but connects its lower-bit end to the output of the corresponding sense amplifier, using the strong drive capability of the sense amplifier to drive carry propagation. In this way, however long the carry chain is, carry propagation is not impaired by lack of drive, because the carry of any bit is driven directly by the sense amplifier of the previous bit, so the drive available for carry propagation is independent of the length of the carry chain. A serial carry chain constructed in this way can quickly obtain the carry of each bit once the carry calculation is activated.
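The drive problem described above can be made concrete with a toy resistive model. The sketch compares a chain of series memristors in the low resistance state with a transistor chain in which each hop is re-driven by a sense amplifier. Every value here (resistances, supply, threshold) is a hypothetical choice for illustration, and the model places a single load at the far end of the memristor chain, which is a simplification.

```python
def memristor_chain_level(n_stages, r_lrs=10e3, r_load=100e3, v_dd=1.0):
    """Voltage at the far end of n series LRS memristors feeding one load resistance."""
    return v_dd * r_load / (n_stages * r_lrs + r_load)

def regenerated_chain_level(n_stages, r_on=1e3, r_load=100e3, v_dd=1.0, v_ref=0.5):
    """Level after n hops when a sense amplifier restores full swing at every bit."""
    level = v_dd
    for _ in range(n_stages):
        seen = level * r_load / (r_on + r_load)  # drop across one pass transistor
        level = v_dd if seen > v_ref else 0.0    # sense amplifier re-drives the next hop
    return level

for n in (1, 8, 16, 31):
    print(n, round(memristor_chain_level(n), 3), regenerated_chain_level(n))
```

Under these assumptions the level at the end of the memristor chain falls below the threshold once the chain exceeds about ten stages, while the re-driven chain delivers a full-swing level at any length. The model says nothing about delay, which still grows with the number of bits the carry has to cross.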

Sum calculation: after the carry calculation is completed, the carry values of all bits are available. The sum calculations of all bits are then independent of each other, so the final operation can be performed in parallel according to the summation expression.
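Once all carries are known, the summation reduces to a bitwise XOR across the whole word, which is what makes it parallel. The sketch below packs the carries into an integer and uses Python's word-wide XOR as a stand-in for the parallel array operation; the carry loop is a plain software substitute for the carry chain, and the function names are hypothetical.

```python
def carry_out_vector(a, b, width):
    """Carry out of every bit, packed into an integer (bit i = carry out of bit i)."""
    carries, carry = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | ((ai ^ bi) & carry)  # generate, or propagate the previous carry
        carries |= carry << i
    return carries

def parallel_sum(a, b, width):
    """All sum bits at once: S = A xor B xor C_in, applied across the whole word."""
    mask = (1 << width) - 1
    c_out = carry_out_vector(a, b, width)
    c_in = (c_out << 1) & mask  # carry into bit i is the carry out of bit i - 1
    return (a ^ b ^ c_in) & mask, (c_out >> (width - 1)) & 1

print(parallel_sum(5, 3, 4))             # sum word and final carry out
print(parallel_sum(0xFFFFFFFF, 1, 32))   # wraps to zero with a carry out
```

The only serial part is the carry vector; the final XOR touches every bit position independently, mirroring the separation of serial carry and parallel summation.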

In the memristor-array adder scheme described above, all steps involving logic calculation use existing memristor-based logic operation techniques, such as IMPLY and MAGIC. The adder design is therefore independent of the specific logic operation technique: any logic operation that can be executed within the array structure and can realize the specific logic functions required by the present invention can be used to complete the carry path mapping and the final sum operation.

Claims

1. An adder design based on a memristor storage array, characterized by:

Carry path mapping;

Construction of a serial carry chain;

Summation calculation.

2. The adder design based on a memristor storage array according to claim 1, wherein the carry path mapping includes:

For R_G mapping, parallel mapping is performed on the memristor storage array;

For R_D mapping, parallel mapping is performed on the memristor storage array.

3. The adder design based on a memristor storage array according to claim 1, wherein the construction of the serial carry chain includes:

The carry propagation path is formed by MOS transistors;

The array line buffer is used to calculate and store the carry chain control signals in parallel;

The current drive is improved by means of a sense amplifier.

4. The adder design based on a memristor storage array according to claim 1, wherein the summation calculation includes:

Using the result of the carry calculation to complete the summation of all bits in parallel.

5. The adder design based on a memristor storage array according to claim 2, wherein, for R_G mapping, performing parallel mapping on the memristor storage array includes:

When starting the carry calculation, applying the operating voltage to the row where R_G is located.

6. The adder design based on a memristor storage array according to claim 2, wherein, for R_D mapping, performing parallel mapping on the memristor storage array includes:

For a voltage-divided read operation, R_D mapping is not required;

For other read operations, when starting the carry calculation, grounding the row where R_D is located.

7. The adder design based on a memristor storage array according to claim 3, wherein using the array line buffer to calculate and store the carry chain control signals in parallel includes:

The line buffer temporarily stores the control signal R_P = A ⊕ B of the carry propagation path, and this signal directly controls the turning on and off of the transistors in the carry chain.

8. The adder design based on a memristor storage array according to claim 3, wherein improving the current drive by means of a sense amplifier includes:

Connecting the end of the transistor nearer the lower bit to the output of the sense amplifier, and the end nearer the higher bit to the bit line corresponding to the next bit.