WO2020238889A1 - Multiply-accumulate circuit based on Radix-4 encoding and differential weights - Google Patents

Multiply-accumulate circuit based on Radix-4 encoding and differential weights

Info

Publication number
WO2020238889A1
Authority
WO
WIPO (PCT)
Prior art keywords
circuit
differential
weight
encoding
switch
Prior art date
Application number
PCT/CN2020/092286
Other languages
English (en)
French (fr)
Inventor
黄科杰
肖蕊
沈海斌
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Priority to US17/274,962 priority Critical patent/US20210365241A1/en
Publication of WO2020238889A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443 Sum of products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021 Auxiliary circuits
    • G11C13/003 Cell access
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1006 Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M3/00 Conversion of analogue values to or from differential modulation
    • H03M3/04 Differential modulation with several bits, e.g. differential pulse code modulation [DPCM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/48 Indexing scheme relating to groups G06F7/48 - G06F7/575
    • G06F2207/4802 Special implementations
    • G06F2207/4814 Non-logic devices, e.g. operational amplifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06G ANALOGUE COMPUTERS
    • G06G7/00 Devices in which the computing operation is performed by varying electric or magnetic quantities
    • G06G7/12 Arrangements for performing computing operations, e.g. operational amplifiers
    • G06G7/14 Arrangements for performing computing operations, e.g. operational amplifiers for addition or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06G ANALOGUE COMPUTERS
    • G06G7/00 Devices in which the computing operation is performed by varying electric or magnetic quantities
    • G06G7/12 Arrangements for performing computing operations, e.g. operational amplifiers
    • G06G7/16 Arrangements for performing computing operations, e.g. operational amplifiers for multiplication or division
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C2213/00 Indexing scheme relating to G11C13/00 for features not covered by this group
    • G11C2213/70 Resistive array aspects
    • G11C2213/79 Array wherein the access device being a transistor
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/22 Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
    • G11C7/222 Clock generating, synchronizing or distributing circuits within memory device

Definitions

  • This application relates to a resistive non-volatile memory circuit, and in particular to a multiply-accumulate circuit based on Radix-4 encoding and differential weights. The circuit is used in the core calculation circuit array of neuromorphic chips and belongs to the field of neuromorphic circuits.
  • Neuromorphic computing integrates the storage unit and the computing unit by imitating the structure of the human brain, which can solve the bottleneck problem encountered by the von Neumann structure mentioned above, and can greatly improve the computational efficiency of the artificial neural network.
  • RRAM: Resistive Random-Access Memory
  • MAC: Matrix multiplication and accumulation
  • This application uses radix-4 Booth encoding to encode the input data, which halves the bit width of the input data, and proposes a new and efficient way of combining in-memory multiplication with integration. This enables large-scale parallel computing, greatly reduces the power consumption of the multiply-accumulate operation, and improves the computational performance of the neural network processor.
  • The weights in a neural network follow a normal distribution, and "1" bits account for about 1/3 of the weight bits.
  • The weights in most current neuromorphic processors are stored in complement form. In complement form, the proportion of "1" bits in the weights rises to 1/2, and the power consumed by computation increases. This application therefore proposes a multiply-accumulate array with differential weight storage, which stores the weights differentially as positive and negative values. This greatly reduces the proportion of "1" bits in the weights and thereby reduces the power consumption of the neural network circuit during calculation.
  • DACs digital-to-analog converters
  • This application provides a multiply-accumulate circuit and a multiply-accumulate method based on Radix-4 encoding and differential weights.
  • The multiply-accumulate operation of the neuromorphic computing core in a neuromorphic chip can achieve large-scale parallel computing purely in hardware and at low energy consumption, and can be used in high-precision, high-performance deep neural networks.
  • The structure of the traditional multiply-accumulate circuit is shown in Figure 1.
  • A single multiplier can process only one row of data at a time, and the result calculated by the multiplier is fed into an adder to complete the accumulation. Assuming n inputs and n outputs for the multiplication, n multipliers and n-1 two-input adders are needed to get the final result. The whole circuit is very large, and its power consumption is also very high.
  • This application includes an input data encoding circuit, a differential weight storage circuit, an integral calculation circuit, and a differential ADC circuit.
  • The output of the input data encoding circuit is connected through the differential weight storage circuit and the integral calculation circuit to the differential ADC circuit, and the differential ADC circuit outputs the multiply-accumulate result.
  • The original input data is encoded by the input data encoding circuit and multiplied by the weight values stored in the differential weight storage circuit. The positive and negative values of the multiplication results are accumulated separately by the integral calculation circuit, and the differential ADC circuit then performs analog-to-digital conversion on the difference between the positive and negative accumulation results, yielding the final multiply-accumulate result.
  • The input data encoding circuit includes a memory, an encoder, and a data selector connected in sequence. The original input data is binary-coded data stored in the memory in advance, and the data in the memory is output according to the timing sequence, three bits at a time, into the encoder.
  • The encoder receives the low three bits of data from the memory and encodes them into a radix-4 code.
  • The 4-to-1 selector gates the encoding result to the differential weight storage circuit as the encoded data value. The output is produced in sequence over four clock cycles: each clock cycle outputs one radix-4 code, so the four clock cycles output the four different radix-4 codes.
  • The differential weight storage circuit includes two weight storage circuits of equal size and scale. Each weight storage circuit is an array of connected basic units, and each basic unit consists of a field-effect transistor connected to a binary RRAM.
  • The basic units of the same column are connected in series, and the basic units of the same row are connected in series. The basic units of each row of a weight storage circuit together store one weight value.
  • The two weight storage circuits are a positive value storage circuit and a negative value storage circuit: the positive value storage circuit stores the positive weight values, and the negative value storage circuit stores the negative weight values. Starting from the lower three bits of the original input data, every three bits are input to the input data encoding circuit, with one bit of overlap between successive three-bit groups.
  • The encoded data value output by the selector is sent to the basic units a_{1,n-1} of each row. The basic units of each row receive the encoded data value, which is multiplied by each weight in the positive value storage circuit or negative value storage circuit, and the products are added by column to obtain the multiply-add result for the current bit.
  • In-memory calculation is performed in the binary RRAM, and the calculation result for each bit is obtained by an AND operation.
  • The non-volatile resistive random access memory is read bit by bit, so the traditional method of multiplying and accumulating a single input data value with a single weight value is abandoned, and all input data is fed into the weight storage circuit simultaneously.
  • The RRAM array used in this application has high storage density and can read multiple weight values in a single read operation, which greatly reduces the power consumption and area of the multiply-add array.
  • The positive and negative weight values in this application are stored in original code (a negative number is stored as the original code of its absolute value). When a weight value is stored in one array, the corresponding row in the other array is zero. Most of the memory cells in the entire weight storage array are therefore in the zero state and do not participate in the calculation, so the power consumption of the whole circuit is very low.
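The differential storage idea can be illustrated with a short Python sketch. It is a minimal model under stated assumptions, not the circuit itself: the function names (`split_differential`, `ones_differential`, `ones_twos_complement`) are hypothetical, weights are small signed integers, and the complement form is assumed to be 8-bit two's complement.

```python
def split_differential(weights):
    """Split signed weights into a positive array and a negative array.

    Each weight appears as its magnitude in exactly one array;
    the matching entry in the other array is zero.
    """
    pos = [w if w > 0 else 0 for w in weights]
    neg = [-w if w < 0 else 0 for w in weights]
    return pos, neg


def ones_differential(weights):
    """Count '1' bits stored across both magnitude arrays."""
    pos, neg = split_differential(weights)
    return sum(bin(v).count("1") for v in pos + neg)


def ones_twos_complement(weights, bits=8):
    """Count '1' bits if the same weights were stored in two's complement."""
    mask = (1 << bits) - 1
    return sum(bin(w & mask).count("1") for w in weights)


weights = [3, -2, 0, -1, 5]          # illustrative values only
pos, neg = split_differential(weights)
print(pos, neg)                       # [3, 0, 0, 0, 5] [0, 2, 0, 1, 0]
print(ones_differential(weights))     # 6
print(ones_twos_complement(weights))  # 19
```

Small negative weights are where the two schemes differ most: in two's complement they are almost all "1" bits, while their magnitudes contain very few.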
  • The basic unit is composed of a field-effect transistor and a binary RRAM connected in series.
  • One end of the binary RRAM is connected to the source of the field-effect transistor, the other end of the binary RRAM is grounded, and the gate of the field-effect transistor is connected to the output terminal of the input data encoding circuit.
  • The drain of the field-effect transistor serves as the output terminal of the basic unit and is connected to the input terminal of the integral calculation circuit. The drains of the field-effect transistors of the basic units in the same column are connected in series, and the gates of the field-effect transistors of the basic units in the same row are connected in series and serve as the input terminal of the basic units.
  • The integral calculation circuit includes multiple groups of integration sub-circuits and a compensation circuit. Each integration sub-circuit is mainly composed of a selection switch group and an integrator. Each selection switch group consists of three connected MOS transistors: two selection MOS transistors Q_{1,n-1} and Q_{2,n-1} and one control MOS transistor S_{1,n-1}. The sources of the two selection MOS transistors serve as the input terminals of the integral calculation circuit and are connected respectively to the output terminals of a column of basic units of the positive value storage circuit and of the negative value storage circuit. The drains of the two selection MOS transistors are connected to the source of the control MOS transistor, the gate of the control MOS transistor is connected to an external control switch circuit, and the drain of the control MOS transistor serves as the output terminal of the integral calculation circuit and is connected to the differential ADC circuit.
  • Each integrator includes a first integration capacitor C_{n-1}, a second integration capacitor (C_f - C_{n-1}), a reset switch S_4, and an integration work switch S_2.
  • A charge halving switch S_3 is connected on the line led out from between the first integration capacitor C_{n-1} and the integration work switch S_2 to the ADC sampling line. The ADC sampling capacitor C_f and the ADC sampling switch S_adc are connected in series between the voltage Vdd and the output terminal of the integration sub-circuit.
  • The first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}) integrate the output of a column of basic units and operate when the integration work switch S_2 is closed. The reset switch S_4 resets the second integration capacitor (C_f - C_{n-1}) and operates after the charge stored in all the first integration capacitors C_{n-1} and the charge stored in the ADC sampling capacitor C_f have been equally divided. The integration work switch S_2 controls the operation of the first and second integration capacitors and operates when data is input at the output terminal of a basic unit. The compensation capacitor C_p ensures that the sum of the first integration capacitors C_{n-1} and the compensation capacitor C_p equals the sampling capacitor C_f, and operates when the charge halving switch S_3 is closed. The compensation capacitor reset switch S_p resets the compensation capacitor and operates when the charge halving switch S_3 is open.
  • The charge halving switch S_3 adds all the charge stored in the first integration capacitors C_{n-1} to the charge stored in the ADC sampling capacitor C_f and divides the total equally, and operates after two inputs at the output terminal of a basic unit.
  • The ADC sampling capacitor C_f and the ADC sampling switch S_adc are used to equally divide all the charge stored in the first integration capacitors C_{n-1}.
  • Positive and negative values are judged and output separately, so that the positive value storage circuit and the negative value storage circuit are each turned on for the corresponding output, and neither is turned on when the output is zero.
  • The integration work switch S_2 is turned on, and the control MOS transistors S_{1,n-1} pass the data to the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}); the compensation capacitor reset switch S_p is turned on so that the compensation capacitor C_p is reset.
  • The ADC sampling switch S_adc is turned on, so that the ADC sampling capacitor C_f equally divides all the charge stored in the first integration capacitors C_{n-1}.
  • Compared with current multiply-add circuits, this application has the advantages of small area, fewer calculation steps, fast calculation, large-scale parallel calculation, and low power consumption. It can be used in the core calculation circuit of neuromorphic chips, configured for deep neural network applications, and is suitable for deployment on edge computing devices with strict power consumption requirements.
  • The integral calculation circuit and analog voltage conversion circuit proposed in this application use the principle of charge redistribution to complete the weighting of different weight positions and different input positions.
  • The structure is simple, the error is small and easy to control, and high integration accuracy and network accuracy can be achieved. This solves the complex structure, large circuit error, and large power consumption common to existing multiply-accumulate circuits, such as those based on mirror current sources or dynamic thresholds, and removes the limitation that such circuits can only be applied to small-scale neural networks. This application can be applied to large-scale neural networks.
  • Figure 1 is a structural diagram of a traditional multiply-accumulate array;
  • Figure 2 is a structural diagram of the multiply-accumulate array based on radix-4 encoding and differential weight storage proposed in this application;
  • Figure 3 is the input data encoding circuit proposed in this application;
  • Figure 4 is a structural diagram of the 1T1R memory used in the specific implementation of this application;
  • Figure 5 is the differential weight storage circuit proposed in this application;
  • Figure 6 is a logical block diagram of the weight storage proposed in this application;
  • Figure 7 is a structural diagram of the integral calculation circuit proposed in this application;
  • Figure 7(a) is the positive value integral circuit;
  • Figure 7(b) is the negative value integral circuit;
  • Figure 8 is a block diagram of the integral calculation process proposed in this application;
  • Figure 9 is a structural diagram of the analog voltage conversion circuit proposed in this application.
  • the specific implementation includes an input data encoding circuit, a differential weight storage circuit, an integral calculation circuit, and a differential ADC circuit.
  • The output of the input data encoding circuit is connected through the differential weight storage circuit and the integral calculation circuit to the differential ADC circuit.
  • The differential ADC circuit outputs the multiply-accumulate result. The original input data is encoded by the input data encoding circuit and multiplied by the weight values stored in the differential weight storage circuit. The positive and negative values of the multiplication results are accumulated separately by the integral calculation circuit, giving the equivalent analog voltage for the input bit currently being processed. The differential ADC circuit then performs analog-to-digital conversion on the difference between the positive and negative accumulated results and applies the weighting between different input bits, yielding the final multiply-accumulate result and completing the multiply-accumulate operation in circuitry.
  • The input data encoding circuit includes a memory, an encoder, and a data selector connected in sequence. The input data is stored in the memory in advance, fed into the encoder according to the timing sequence for encoding, and then selected in turn by the selector, which outputs the different encoding results in sequence through four different output channels.
  • The original input data is binary-coded data stored in the memory in advance, and the data in the memory is output according to the timing sequence. The encoder receives the low three bits of data from the memory, which form a three-bit binary number, and encodes them into a radix-4 code. The 4-to-1 selector gates the encoding result to the differential weight storage circuit as the encoded data value. The output is produced in sequence over four clock cycles: each clock cycle outputs one radix-4 code, so the four clock cycles output the four different radix-4 codes 1, -1, 2, and -2.
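The encoding of overlapping three-bit groups can be sketched in Python. This is a minimal illustration of standard radix-4 Booth recoding, assuming an 8-bit two's-complement input and an implicit 0 below the least significant bit; the function names `booth_radix4_digits` and `booth_value` are hypothetical, and the sketch does not model the four-cycle selector timing.

```python
def booth_radix4_digits(x: int, bits: int = 8) -> list[int]:
    """Return the radix-4 Booth digits of x, least significant digit first.

    Each digit is in {-2, -1, 0, 1, 2} and is derived from a three-bit
    window; neighbouring windows overlap by one bit.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= x < (1 << (bits - 1)):
        raise ValueError("x does not fit in the given bit width")

    pattern = x & ((1 << bits) - 1)   # two's-complement bit pattern
    padded = pattern << 1             # implicit 0 below the LSB

    # Window (b[2k+1], b[2k], b[2k-1]) -> digit
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

    digits = []
    for k in range(bits // 2):
        window = (padded >> (2 * k)) & 0b111
        digits.append(table[window])
    return digits


def booth_value(digits: list[int]) -> int:
    """Reconstruct the integer: digit k carries weight 4**k."""
    return sum(d * 4 ** k for k, d in enumerate(digits))


print(booth_radix4_digits(7))    # [-1, 2, 0, 0]  since -1 + 2*4 = 7
print(booth_radix4_digits(-3))   # [1, -1, 0, 0]  since 1 - 4 = -3
```

An 8-bit input yields four digits, which matches the halving of the input width described above, and every nonzero digit is one of 1, -1, 2, or -2.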
  • the differential weight storage circuit uses a non-volatile resistive random access memory (RRAM) as a storage unit. Compared with the traditional SRAM unit, power consumption is only generated during the calculation process.
  • The differential weight storage circuit includes two weight storage circuits of equal size and scale.
  • Each weight storage circuit is an array of connected basic units, and each basic unit consists of a field-effect transistor connected to a binary RRAM. The basic units of the same column are connected in series, and the basic units of the same row are connected in series. As shown in Figure 6, each basic unit of the weight storage circuit stores one bit of a weight, and the basic units of each row together store one weight value.
  • The two weight storage circuits are a positive value storage circuit and a negative value storage circuit, which store the positive and negative weight values respectively: the positive value storage circuit stores the positive weight values, and the negative value storage circuit stores the negative weight values.
  • For each weight, one circuit holds the positive or negative weight value and the other holds zero.
  • Every three bits are input in sequence to the input data encoding circuit, with one bit of overlap between successive three-bit groups.
  • The encoded data value output by the selector is sent to the basic units a_{1,n-1} of each row. The basic units of the same row receive the same encoded data value from the input data encoding circuit, and the basic units of different rows can receive different encoded data values.
  • The basic units of each row receive the encoded data value, which is multiplied by each weight in the positive value storage circuit or negative value storage circuit, and the products are added by column to obtain the multiply-add result for the current bit.
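The column-wise multiply-add can be modelled in software as a bitwise AND followed by a per-column sum. The sketch below is an idealized digital model under stated assumptions: each row is either gated on (1) or off (0) in a given cycle, each cell holds one weight bit, and the column index runs from most to least significant bit. The names `to_bits`, `column_sums`, and `weighted_total` are hypothetical.

```python
def to_bits(value: int, n_bits: int) -> list[int]:
    """Unsigned magnitude as a list of bits, most significant bit first."""
    return [(value >> (n_bits - 1 - j)) & 1 for j in range(n_bits)]


def column_sums(gates: list[int], weight_bits: list[list[int]]) -> list[int]:
    """AND each row's gate with its stored bits, then add down each column."""
    n_cols = len(weight_bits[0])
    return [sum(g & row[j] for g, row in zip(gates, weight_bits))
            for j in range(n_cols)]


def weighted_total(col_sums: list[int]) -> int:
    """Combine column sums using binary place values (MSB first)."""
    n = len(col_sums)
    return sum(s << (n - 1 - j) for j, s in enumerate(col_sums))


magnitudes = [5, 3, 6]                       # illustrative 3-bit weights
cells = [to_bits(w, 3) for w in magnitudes]  # one row of cells per weight
gates = [1, 0, 1]                            # rows 0 and 2 are turned on

sums = column_sums(gates, cells)
print(sums)                  # [2, 1, 1]
print(weighted_total(sums))  # 11, i.e. 5 + 6
```

All rows contribute to every column sum at once, which is the parallelism that a row-by-row multiplier followed by an adder tree does not have.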
  • The basic unit is composed of a field-effect transistor and a binary RRAM connected in series.
  • One end of the binary RRAM is connected to the source of the field-effect transistor, and the other end of the binary RRAM is grounded.
  • The gate of the field-effect transistor is connected to the output terminal (data selector) of the input data encoding circuit.
  • The drain of the field-effect transistor serves as the output terminal of the basic unit and is connected to the input terminal of the integral calculation circuit.
  • The drains of the field-effect transistors of the basic units in the same column are connected in series, and the gates of the field-effect transistors of the basic units in the same row are connected in series and serve as the input terminal of the basic units.
  • A one-transistor, one-RRAM (1T1R) structure is used as the basic unit, in which the transistor controls the switching behavior, and eight 1-bit RRAMs represent a weight in fixed-point binary form.
  • This application models the RRAM using experimental data from a nitrogen-doped alumina structure. Each RRAM has two resistance states: a low-resistance state (about 10 MΩ) and a high-resistance state (about 10 GΩ to 100 GΩ).
  • the RRAM array is turned on only during the integration phase and is in the off state most of the time, thereby greatly reducing the power consumption of the synapse array.
  • the input signal from the input circuit is directly used as the control voltage Vc of the 1T1R unit in the RRAM array, thereby eliminating the use of the input interface DAC, and greatly reducing energy consumption and area occupation.
  • this application changes the traditional calculation method of multiplying by row and then accumulating to the method of multiplying and accumulating by column, which can realize large-scale parallel calculation.
  • the number of basic units in the positive value storage circuit and the negative value storage circuit are the same, and the number of rows and columns are the same.
  • The integral calculation circuit includes multiple groups of integration sub-circuits and a compensation circuit.
  • Each integration sub-circuit is mainly composed of a selection switch group and an integrator.
  • Each selection switch group consists of three connected MOS transistors: two selection MOS transistors Q_{1,n-1} and Q_{2,n-1} and one control MOS transistor S_{1,n-1}.
  • The sources of the two selection MOS transistors serve as the input terminals of the integral calculation circuit and are connected respectively to the output terminals (FET drains) of a column of basic units of the positive value storage circuit and of the negative value storage circuit. That is, the source of one selection MOS transistor is connected to the drain outputs of the field-effect transistors of one column of basic units in the positive value storage circuit, and the source of the other selection MOS transistor is connected to the drain outputs of the field-effect transistors of the same column of basic units in the negative value storage circuit. The drains of the two selection MOS transistors are connected to the source of the control MOS transistor, and the gate of the control MOS transistor is connected to an external control switch circuit.
  • The drain of the control MOS transistor serves as the output terminal of the integral calculation circuit and is connected to the differential ADC circuit.
  • Each integrator includes a first integration capacitor C_{n-1}, a second integration capacitor (C_f - C_{n-1}), a reset switch S_4, and an integration work switch S_2. The first integration capacitor C_{n-1}, the second integration capacitor (C_f - C_{n-1}), and the reset switch S_4 are connected in parallel between the voltage Vdd and the output terminal of the integration sub-circuit.
  • The integration work switch S_2 is connected in series with the first integration capacitor C_{n-1}, and a connection is led out from between them to the ADC sampling line.
  • The compensation circuit comprises a compensation capacitor C_p and a compensation capacitor reset switch S_p, which are connected in parallel between the ADC sampling line and the voltage Vdd.
  • The differential ADC circuit includes a charge halving switch S_3, an ADC sampling capacitor C_f, and an ADC sampling switch S_adc connected to the ADC sampling line. The charge halving switch S_3 is connected on the line led out from between the first integration capacitor C_{n-1} and the integration work switch S_2 to the ADC sampling line.
  • The ADC sampling capacitor C_f and the ADC sampling switch S_adc are connected in series between the voltage Vdd and the output terminal of the integration sub-circuit.
  • The number of integration sub-circuits, and of charge halving switches S_3 in the differential ADC circuit, equals the number of columns of basic units in the positive value storage circuit or negative value storage circuit.
  • Each integration sub-circuit corresponds to one column of basic units in the positive value storage circuit and the negative value storage circuit, and different integration sub-circuits are connected to different columns of basic units.
  • The integral circuit is divided into a positive value integral circuit and a negative value integral circuit, which integrate positive and negative numbers respectively.
  • The positive value storage circuit and its corresponding part of the integral calculation circuit and differential ADC circuit constitute the positive value integral circuit, as shown in Figure 7(a).
  • The negative value storage circuit and its corresponding part of the integral calculation circuit and differential ADC circuit constitute the negative value integral circuit, as shown in Figure 7(b).
  • The 8-bit input signal is encoded as a four-digit radix-4 code. Using multi-input parallel computing and an axon-priority strategy, each digit of the input data is fed into the integral circuit in a fixed order, and the principle of charge redistribution completes the integral calculation process shown in Figure 8.
  • The result of multiplying the input signal by the weight value is obtained as a current signal.
  • The current signal is routed to the positive or negative integral circuit according to the signs of the input signal and of the weight array, and is integrated there, which completes the multiply-accumulate process for one data input.
  • the analog voltage conversion circuit proposed in this application is shown in FIG. 9.
  • under timing control, the analog-to-digital conversion circuit performs one charge distribution after the inputs 1 and -1, and another after the inputs 2 and -2.
  • this completes the weighting between different input bits and yields two analog output voltages, positive and negative. Feeding these two voltages into the positive and negative input terminals of a differential SAR ADC (successive-approximation analog-to-digital converter) gives the digital multiply-accumulate output and forms the complete neuromorphic computing circuit.
  • This application uses multiple integrators to integrate the multiplication results. Because radix-4 encoding is used, the code values 1/-1 and 2/-2 differ in weight.
  • the capacitors used by different integrators differ by factors of two; connecting all integrators in parallel and redistributing their charge gives the multiply-accumulate result for one input digit.
  • the code value and the weight value can each be positive or negative.
  • the positive value integration circuit integrates only positive values.
  • the negative value integration circuit integrates only negative values.
  • after the data for one digit has been integrated, the analog voltage conversion operation closes the compensation capacitor reset switch S_p twice, once for the radix-4 inputs 1/-1 and once for the inputs 2/-2, and performs two charge-halving operations (the current value × 1/4), which realizes the weighting between different input data bits.
  • Fig. 3 shows the input data encoding circuit proposed in this application, which includes a memory (mem), an encoder (encoder) and a data selector (mux4to1, a 4-to-1 data selector).
  • the clock port (clk) of the memory is connected to the clock signal, the input port (in) receives the original input data, and the flag bit (ing_done) indicates that the output of one original input datum is complete.
  • the three output ports of the memory are connected to the input ports (A, B, C) of the encoder; the four output ports (S02, S01, S2, S1) of the encoder, corresponding to (-2, -1, 2, 1), are connected to the four input ports (11, 10, 01, 00) of the data selector; the two control signals of the data selector are control signal 1 (s_np) and control signal 2 (s_21).
  • the memory stores the original input data. At each rising clock edge, the memory reads the original input data internally and outputs it. Assuming that the original input data is the six-bit binary number 011010, the output process is:
  • initially the flag bit is 0. From the first clock to the fourth clock, the outputs from top to bottom are the second-lowest bit 1, the lowest bit 0 and an extra 0 of the original input data, and then the original input data is logically shifted right by one bit to become 001101; from the fifth clock to the eighth clock, the outputs from top to bottom are the low three bits 1, 0, 1, and then the data is logically shifted right by two bits to become 000011; from the ninth clock to the twelfth clock, the outputs from top to bottom are the low three bits 0, 1, 1, and then the data is logically shifted right by two bits to become 000000; at the thirteenth clock the data is all zeros, the output ends, and the flag bit changes from 0 to 1.
  • the encoder accepts the three-bit input signal from the memory and encodes it in radix-4.
  • the encoding rule is: result = (-2) × A + B + C. If the result is -2, output terminal -2 is high and the other output terminals are low; if the result is -1, output terminal -1 is high and the other output terminals are low; if the result is 2, output terminal 2 is high and the other output terminals are low; if the result is 1, output terminal 1 is high and the other output terminals are low; if the result is 0, all output terminals are low.
  • taking the original input data 011010 again, the encoder outputs follow the memory outputs:
  • from the first clock to the fourth clock, output terminal -2 is high and the other output terminals are low
  • from the fifth clock to the eighth clock, output terminal -1 is high and the other output terminals are low
  • from the ninth clock to the twelfth clock, output terminal 2 is high and the other output terminals are low
  • at the thirteenth clock, all output terminals are low.
  • the output of the encoder is fed to the data selector, which is gated by control signal 1 (s_np) and control signal 2 (s_21).
  • when control signal 1 (s_np) is high, terminal 11 or terminal 10 is gated, and when it is low, terminal 01 or terminal 00 is gated; when control signal 2 (s_21) is high, terminal 11 or terminal 01 is gated, and when it is low, terminal 10 or terminal 00 is gated.
  • by changing the timing of control signal 1 (s_np) and control signal 2 (s_21), terminals 00, 10, 01 and 11 are gated in sequence, so that over four clock cycles the output of the input data encoding circuit corresponds in turn to the radix-4 code values 1, -1, 2, -2.
  • taking the original input data 011010 again, the -2 output terminal of the encoder is 1 from the first clock cycle to the fourth clock cycle, and that port is gated in the fourth clock cycle, so the output of the input data encoding circuit is 0 from the first to the third clock cycle and 1 in the fourth clock cycle.
  • likewise, from the fifth clock cycle to the eighth clock cycle, the output of the input data encoding circuit is 1 in the sixth clock cycle and 0 in the other clock cycles;
  • from the ninth clock cycle to the twelfth clock cycle, the output of the input data encoding circuit is 1 in the eleventh clock cycle, when terminal 01 (code value 2) is gated, and 0 in the other clock cycles.
  • FIG. 5 shows the differential weight storage circuit proposed in this application.
  • the positive value storage circuit and the negative value storage circuit have identical scale and structure, and store the positive and negative weight values, respectively.
  • the weight storage mechanism adopted in this application is shown in Figure 6. If a weight wx is positive, the binary sequence of its value wx is stored in a row of the positive value storage array, and the corresponding row of the negative value storage array stores 0; likewise, if a weight wy is negative, the binary sequence of its absolute value |wy| is stored in a row of the negative value storage array, and the corresponding row of the positive value storage array stores 0.
  • the differential weight array is divided into a positive value storage circuit and a negative value storage circuit.
  • the difference between the data stored in the same row of the two circuits is the current weight value, and in every row at least one of the two circuits stores 0.
  • Fig. 7 shows the structure of the integration calculation circuit proposed in this application, where 1 is the positive value integration circuit and 2 is the negative value integration circuit. Since this application uses radix-4 encoding, the input data can be positive or negative, and the weight values are also stored separately by sign. The input of the integration calculation circuit is therefore controlled by control signal 1 (s_np) in Figure 3: the positive value integration circuit accepts only positive products, and the negative value integration circuit accepts only negative products.
  • the reset switch S_4 resets the second integration capacitor (C_f - C_{n-1}); the integration work switch S_2 controls the operation of the first and second integration capacitors; the compensation capacitor C_p ensures that the sum of the first integration capacitors C_{n-1} and the compensation capacitor C_p equals the sampling capacitor C_f; the compensation capacitor reset switch S_p resets the sampling capacitor C_f; the charge-halving switch S_3 adds the charge stored in all first integration capacitors C_{n-1} to the charge stored in the ADC sampling capacitor C_f and divides it equally.
  • first, the control MOS transistor S_{1,n-1} is externally controlled so that one of the two selection MOS transistors Q_{1,n-1}, Q_{2,n-1} is turned on, and the data output by the basic units is thereby routed according to its sign: the path conducts separately for output from the positive value storage circuit and for output from the negative value storage circuit, and does not conduct when the outputs of both storage circuits are zero.
  • next, the integration work switch S_2 is turned on so that the data from the control MOS transistor S_{1,n-1} is input to the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}), and the compensation capacitor reset switch S_p is turned on to reset the compensation capacitor C_p.
  • finally, the ADC sampling switch S_adc is turned on, so that the ADC sampling capacitor C_f shares equally the charge stored in all first integration capacitors C_{n-1}.
  • the input data are input simultaneously starting from the lowest bit, multiplied by the weights, and fed into the integration circuit; after the integration is complete, the next bit is input. That is, A_{0,0}, A_{1,0}, ..., A_{p-1,0} are input at the same time.
  • the output voltage Vo of the integration circuit is:
  • T is the integration time
  • G_i is the conductance of the 1-bit binary weight (1/RH in the high-resistance state, 1/RL in the low-resistance state)
  • C_f is the sum of all integration capacitors.
  • switch S_1 is opened to reduce the energy loss of the memory circuit
  • S_2 is opened to keep the charge on the integration capacitor C_i unchanged
  • when the input data has multiple bits, the input moves to the next bit after each 1-bit input is complete. The equivalent analog voltage must then be weighted, as shown in Figure 9.
  • throughout the integration process, the ADC sampling switch is closed.
  • when S_3 is closed, the previous V_out and V_s are added and divided equally; the previous V_out is the output voltage before S_3 is closed, that is, the equivalent analog voltage after all previous bits have been input.
  • the analog voltage conversion circuit proposed in this application is shown in FIG. 9.
  • radix-4 encoding is used, so the radix between adjacent digits is 4, and the input signal may be 1/-1 or 2/-2.
  • to simplify the hardware, the weighting between different digits is realized by controlling the timing:
  • V_out represents the output voltage after the computation for one radix-4 input digit is complete
  • V_s2 represents the voltage at the end of the first integration capacitor connected to switch S_3 after the inputs 2 and -2 have been computed.
  • after the two charge-halving processes are over, the input encoding circuit outputs the next radix-4 code, and the multiply-add computation for the next input digit is performed.
  • V_{s,0}, V_{s,1}, ... can be calculated according to the following formula:
  • $V_{s,i} = V_{s1,i} + 2V_{s2,i}$      (10)
  • V_{s,i} represents the output voltage of the integration circuit when the i-th radix-4 code is input
  • V_{s1,i} represents the output voltage after the integration circuit has computed 1 and -1 when the i-th radix-4 code is input
  • V_{s2,i} represents the output voltage after the integration circuit has computed 2 and -2 when the i-th radix-4 code is input
  • m represents the length of the radix-4 code.
  • the multiplication and accumulation operation of the digital radix-4 encoding input and the weight of the differential storage is completed, and the output voltage in the analog form is obtained.
  • the two voltages are differentially input to the ADC for A-D conversion.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Neurology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Analogue/Digital Conversion (AREA)

Abstract

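A multiply-accumulate circuit based on radix-4 encoding and differential weight storage, comprising an input data encoding circuit (1), a differential weight storage circuit (2), an integration calculation circuit (3) and a differential ADC circuit. The output of the input data encoding circuit (1) is connected through the differential weight storage circuit (2) and the integration calculation circuit (3) in turn to the differential ADC circuit, which outputs the multiply-accumulate result. The original input data is encoded by the input data encoding circuit (1) and multiplied by the weight values stored in the differential weight storage circuit (2); the integration calculation circuit (3) accumulates the positive and negative products separately; the differential ADC circuit then performs analog-to-digital conversion on the difference between the positive and negative accumulation results, giving the final multiply-accumulate result. The circuit has a small area, high computing speed and low power consumption, and can be used in neuromorphic chips, especially in edge computing devices with strict energy requirements, to realize large-scale parallel computing.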

Description

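Multiply-accumulate circuit based on Radix-4 encoding and differential weights
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 201910462374.8, filed on May 30, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to a resistive non-volatile memory circuit, and in particular to a multiply-accumulate circuit based on Radix-4 encoding and differential weights, used in the core computing circuit array of a neuromorphic chip; it belongs to the field of neuromorphic circuits.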
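Background
In recent years artificial intelligence technology has developed rapidly, and algorithms and applications are iterating at high speed, which places higher demands on the storage capacity, computing power and energy efficiency of hardware. Traditional computers use the von Neumann architecture. Because the storage unit and the computing unit are separate, data must first be read from the storage unit and then fed to the processing unit for computation. Since the storage unit uses a cache (SRAM), main memory (DRAM), flash (FLASH) hierarchy, communication between the storage unit and the computing unit is too slow and consumes too much power, which severely limits the efficiency of neural networks.
Neuromorphic computing integrates storage and computing units by imitating the structure of the human brain. It can resolve the von Neumann bottleneck described above and greatly improve the computational efficiency of artificial neural networks. Using emerging resistive non-volatile memory (RRAM, Resistive Random-Access-Memory) to convert a weighted combination of input signals into an output voltage accomplishes the basic operation of artificial neural networks, matrix multiplication and accumulation (MAC, Multiplication-and-Accumulation), thereby realizing ultra-low-power in-memory parallel computing.
Traditional multipliers usually use binary encoding and implement multiplication by shift-and-add. This generates a large number of partial products and shift operations, requires many shift registers for storage, and finally sums the partial products with adders. Such a multiplier is large in both area and power and performs only one multiplication at a time, so its computational efficiency is low, which greatly limits the computational efficiency of neural networks. This application therefore encodes the input data with radix-4 Booth encoding, halving the bit width of the input data, and proposes a new, efficient combination of in-memory multiplication and integration that enables large-scale parallel computing, greatly reduces the power consumption of multiply-accumulate operations, and improves the computing performance of neural network processors.
The weight values in neural networks follow a normal distribution, in which "1" bits account for about 1/3. Most current neuromorphic processors store weights in two's complement form; stored that way, the proportion of "1" bits rises to 1/2 and the computation power increases. This application therefore proposes a multiply-accumulate array with differential weight storage, which stores weights differentially by positive and negative value, greatly reducing the proportion of "1" bits in the weights and thus the power consumed by the neural network circuit during computation.
Most currently proposed neuromorphic computing circuits require high-precision digital-to-analog converters (DACs, Digital-to-Analog Converters) as interface devices at the input, so that the interface devices account for more than 80% of total energy consumption, which hinders use in edge computing devices. Moreover, current neuromorphic computing solutions achieve low weight and activation quantization precision, can only target simple networks such as LeNet, and suffer obvious performance loss on larger deep neural networks such as AlexNet, which greatly limits their range of application.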
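Summary of the invention
To address the large area overhead, high power consumption, low computational efficiency and difficulty of large-scale parallel computing of current multipliers, this application provides a multiply-accumulate circuit based on Radix-4 encoding and differential weights, and a corresponding multiply-accumulate method, for the multiply-accumulate operations of the neuromorphic computing core in a neuromorphic chip. It realizes large-scale parallel computing in hardware alone at low energy consumption, and can be used in high-precision, high-performance deep neural networks.
The structure of a traditional multiply-accumulate circuit is shown in Figure 1. A single multiplier can process only one row of data at a time, and the multiplier results are fed into adders to complete the accumulation. To multiply n inputs by n outputs, n multipliers and n-1 two-input adders are needed to obtain the final result; the whole circuit is very large and its power consumption is very high.
Some existing circuits need an analog-to-digital converter at the input and a digital-to-analog converter at the output, so their power consumption and area are high. This application instead uses RRAM as the basic unit of the differential weight storage circuit. The multiplication of input data by weight values takes place as current flows through the basic units, so no multiplier is needed as in a traditional multiply-accumulate circuit. Integration sub-circuits integrate the data output at the output of each column of basic units, which implements the accumulation without the adders of a traditional circuit. The circuit and computing method of this application enable large-scale parallel computing with very low overall energy consumption, and can realize high-precision, high-performance deep neural networks at low energy consumption.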
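The technical solution adopted in this application is:
This application comprises an input data encoding circuit, a differential weight storage circuit, an integration calculation circuit and a differential ADC circuit. The output of the input data encoding circuit is connected through the differential weight storage circuit and the integration calculation circuit in turn to the differential ADC circuit, which outputs the multiply-accumulate result. The original input data is encoded by the input data encoding circuit and multiplied by the weight values stored in the differential weight storage circuit; the integration calculation circuit accumulates the positive and negative products separately; the differential ADC circuit then performs analog-to-digital conversion on the difference between the positive and negative accumulation results, giving the final multiply-accumulate result.
The input data encoding circuit comprises a memory, an encoder and a data selector connected in sequence. The original input data is binary-coded and stored in the memory in advance. The memory outputs its data according to the timing, each time passing the low three bits of the current data to the encoder. The encoder receives these three bits and encodes them as a radix-4 code. The encoded result is gated through the 4-to-1 selector, which outputs the encoded data value to the differential weight storage circuit over 4 clock cycles: each clock cycle outputs one radix-4 code value, so the four clock cycles output the four different radix-4 code values.
The differential weight storage circuit comprises two weight storage circuits of equal size. Each weight storage circuit is an array of interconnected basic units, each consisting of a field-effect transistor connected to a binary RRAM; the basic units in the same column are connected in series, and the basic units in the same row are connected in series. Each row of basic units jointly stores one weight value. The two weight storage circuits are the positive value storage circuit and the negative value storage circuit, which store the positive and the negative weight values respectively. Starting from the low three bits of the original input data, three bits at a time are input to the input data encoding circuit, with one bit of overlap between successive three-bit groups. After processing by the input data encoding circuit, the selector outputs the encoded data value to each row of basic units a_{1,n-1}. Each row of basic units receives the encoded data value and multiplies it by each weight in the positive value storage circuit/negative value storage circuit, and the products are summed by column to give the multiply-add result for the current digit.
In this application, in-memory computation is performed in the binary RRAM, and the result for each bit is obtained by an AND operation. Non-volatile resistive memory is read bit by bit, so the traditional approach of multiplying a single input by a weight and then accumulating is abandoned, and all input data are fed into the weight storage circuit simultaneously. Compared with a traditional SRAM array, the RRAM array used here has high storage density and can be read in one pass, so multiple weight values can be read at once, which greatly reduces the power and area of the multiply-add array. Compared with the two's complement storage of a traditional single memory array, both positive and negative weight values here are stored in true (sign-magnitude) form; for a negative number, the true form of its absolute value is stored. When a weight value is stored in one array, the corresponding row of the other array is zero, so the great majority of storage cells in the whole weight storage array are in the zero state and take no part in the computation, and the power consumption of the whole circuit is very low.
Each basic unit consists of a field-effect transistor in series with a binary RRAM. One end of the binary RRAM is connected to the source of the field-effect transistor and the other end is grounded. The gate of the field-effect transistor is connected to the output of the input data encoding circuit. The drain of the field-effect transistor serves as the output of the basic unit and is connected to the input of the integration calculation circuit. The drains of the field-effect transistors in the same column of basic units are connected in series; the gates of the field-effect transistors in the same row of basic units are connected in series and serve as the input of the basic units.
The integration calculation circuit comprises multiple integration sub-circuits and one compensation circuit. Each integration sub-circuit mainly consists of a selection switch group and an integrator. Each selection switch group consists of three MOS transistors: two selection MOS transistors Q_{1,n-1}, Q_{2,n-1} and one control MOS transistor S_{1,n-1}. The sources of the two selection MOS transistors serve as inputs of the integration calculation circuit and are connected to the outputs of one column of basic units in the positive value storage circuit and the negative value storage circuit respectively. The drains of both selection MOS transistors are connected to the source of the control MOS transistor; the gate of the control MOS transistor is connected to an external control switch circuit; the drain of the control MOS transistor serves as the output of the integration calculation circuit and is connected to the differential ADC circuit. Each integrator comprises a first integration capacitor C_{n-1}, a second integration capacitor (C_f - C_{n-1}), a reset switch S_4 and an integration work switch S_2. The first integration capacitor C_{n-1}, the second integration capacitor (C_f - C_{n-1}) and the reset switch S_4 are connected in parallel between the voltage Vdd and the output of the integration sub-circuit; the integration work switch S_2 is connected in series between the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}); a tap between the first integration capacitor C_{n-1} and the integration work switch S_2 is led out to the ADC sampling line. The compensation circuit comprises a compensation capacitor C_p and a compensation capacitor reset switch S_p, connected in parallel between the voltage Vdd and the ADC sampling line. The differential ADC circuit comprises charge-halving switches S_3 on the ADC sampling line, an ADC sampling capacitor C_f and an ADC sampling switch S_adc. One charge-halving switch S_3 is connected between the parallel combination of C_p and S_p and the ADC sampling line, and one charge-halving switch S_3 is connected between each tap (between a first integration capacitor C_{n-1} and its integration work switch S_2) and the ADC sampling line. The ADC sampling capacitor C_f and the ADC sampling switch S_adc are connected in series between the voltage Vdd and the output of the integration sub-circuit.
The first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}) integrate the output of one column of basic units and operate when the integration work switch S_2 is closed. The reset switch S_4 resets the second integration capacitor (C_f - C_{n-1}) and operates after the charge stored in all first integration capacitors C_{n-1} has been shared equally with the charge stored in the ADC sampling capacitor C_f. The integration work switch S_2 controls the operation of the first and second integration capacitors and operates while the output of a column of basic units is supplying data. The compensation capacitor C_p ensures that the sum of the first integration capacitors C_{n-1} and the compensation capacitor C_p equals the sampling capacitor C_f, and operates when the charge-halving switch S_3 is closed. The compensation capacitor reset switch S_p resets the sampling capacitor C_f and operates when the charge-halving switch S_3 is open. The charge-halving switch S_3 adds the charge stored in all first integration capacitors C_{n-1} to the charge stored in the ADC sampling capacitor C_f and divides it equally, and operates after the output of a column of basic units has supplied data twice. The ADC sampling capacitor C_f and the ADC sampling switch S_adc share equally the charge stored in all first integration capacitors C_{n-1} while in operation.
First, the control MOS transistor S_{1,n-1} is externally controlled so that one of the two selection MOS transistors Q_{1,n-1}, Q_{2,n-1} is turned on, and the data output by the basic units is thereby routed according to its sign: the path conducts separately for output from the positive value storage circuit and for output from the negative value storage circuit, and does not conduct when the outputs of both storage circuits are zero.
Second, the integration work switch S_2 is turned on so that the data from the control MOS transistor S_{1,n-1} is input to the two integration capacitors, the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}); the compensation capacitor reset switch S_p is turned on to reset the compensation capacitor C_p.
Next, with all charge-halving switches S_3 turned on, the charge is divided equally: the compensation capacitor C_p and the first integration capacitors C_{n-1} of all integrators share charge, all first integration capacitors share equally with the ADC sampling capacitor, and an equivalent voltage is finally obtained.
Finally, the ADC sampling switch S_adc is turned on, so that the ADC sampling capacitor C_f shares equally the charge stored in all first integration capacitors C_{n-1}.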
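The beneficial effects of this application are:
Compared with current multiply-add circuits, this application has a small area, fewer computations, high computing speed, large-scale parallelism and low power consumption. It can be used in the core computing circuit of a neuromorphic chip, configured for deep neural network applications, and is suited to deployment in edge computing devices with strict power requirements.
The integration calculation circuit and analog voltage conversion circuit proposed in this application use charge redistribution to weight the different weight bits and the different input bits. The structure is simple, the error is small and easy to control, and high integration accuracy and network accuracy can be achieved. This resolves the complex structure, large circuit error and high power consumption common to the current-mirror and dynamic-threshold schemes proposed for existing multiply-accumulate circuits, which restrict them to small-scale neural networks; this application can be applied to large-scale neural networks.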
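Brief description of the drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments with reference to the drawings, in which:
Figure 1 is a structural diagram of a traditional multiply-accumulate array;
Figure 2 is a structural diagram of the multiply-accumulate array based on radix-4 encoding and differential weight storage proposed in this application;
In the figure: 1, input data encoding circuit; 2, differential weight storage circuit; 3, integration calculation circuit;
Figure 3 is the input data encoding circuit proposed in this application;
Figure 4 is a structural diagram of the 1T1R memory used in the specific implementation of this application;
Figure 5 is the differential weight storage circuit proposed in this application;
Figure 6 is a logic block diagram of the weight storage proposed in this application;
Figure 7 is a structural diagram of the integration calculation circuit proposed in this application;
Figure 7(a) is the positive value integration circuit, and Figure 7(b) is the negative value integration circuit;
Figure 8 is a block diagram of the integration calculation process proposed in this application;
Figure 9 is a structural diagram of the analog voltage conversion circuit proposed in this application.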
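Detailed description
This application is described in further detail below with reference to the drawings and specific embodiments.
As shown in Figure 2, the implementation comprises an input data encoding circuit, a differential weight storage circuit, an integration calculation circuit and a differential ADC circuit. The output of the input data encoding circuit is connected through the differential weight storage circuit and the integration calculation circuit in turn to the differential ADC circuit, which outputs the multiply-accumulate result. The original input data is encoded by the input data encoding circuit and multiplied by the weight values stored in the differential weight storage circuit. The integration calculation circuit accumulates the positive and negative products separately, giving the equivalent analog voltage for the input digit of the original input data currently being processed. The differential ADC circuit then performs analog-to-digital conversion on the difference between the positive and negative accumulation results, realizing the weighting between different input bits and giving the final multiply-accumulate result.
As shown in Figure 3, the input data encoding circuit comprises a memory, an encoder and a data selector connected in sequence. The input data is stored in the memory in advance and fed to the encoder according to the timing; the selector then gates four different output channels in order, outputting the different encoded results in turn.
Specifically, the original input data is binary-coded and stored in the memory in advance. The memory outputs its data according to the timing, each time passing the low three bits of the current data to the encoder. The encoder receives these three bits, i.e. a three-bit binary number, and encodes them as a radix-4 code. The encoded result is gated through the 4-to-1 selector, which outputs the encoded data value to the differential weight storage circuit over 4 clock cycles: each clock cycle outputs one radix-4 code value, so the four clock cycles output the four different radix-4 code values, namely 1, -1, 2 and -2.
The differential weight storage circuit uses non-volatile resistive memory (RRAM) as its storage cells; compared with traditional SRAM cells, it consumes power only during computation.
As shown in Figure 5, the differential weight storage circuit comprises two weight storage circuits of equal size. Each weight storage circuit is an array of interconnected basic units, each consisting of a field-effect transistor connected to a binary RRAM; the basic units in the same column are connected in series, and the basic units in the same row are connected in series. As shown in Figure 6, each basic unit of a weight storage circuit stores one bit of a weight, and each row of basic units jointly stores one weight value. The two weight storage circuits are the positive value storage circuit and the negative value storage circuit, which store the positive and the negative weight values respectively.
In the implementation, of the same row in the two weight storage circuits, one holds the positive or negative weight and the other holds zero.
Subtracting the same row of the two weight storage circuits gives one weight value: when the weight value is positive, it is stored in the positive value storage circuit and the negative value storage circuit is set to zero; when the weight value is negative, it is stored in the negative value storage circuit and the positive value storage circuit is set to zero.
Starting from the low three bits of the original input data, three bits at a time are input to the input data encoding circuit, with one bit of overlap between successive three-bit groups. After processing by the input data encoding circuit, the selector outputs the encoded data value to each row of basic units a_{1,n-1}. All basic units in a row receive the same encoded data value from the input data encoding circuit, while different rows may receive different encoded data values. Each row of basic units receives the encoded data value and multiplies it by each weight in the positive value storage circuit/negative value storage circuit, and the products are summed by column to give the multiply-add result for the current digit.
Starting from the lowest bit of the original input data, once the computation for one digit of the original input data is complete, the original input data moves to the next digit for further input and computation.
As shown in Figure 4, in the implementation each basic unit consists of a field-effect transistor in series with a binary RRAM. One end of the binary RRAM is connected to the source of the field-effect transistor and the other end is grounded. The gate of the field-effect transistor is connected to the output of the input data encoding circuit (the data selector). The drain of the field-effect transistor serves as the output of the basic unit and is connected to the input of the integration calculation circuit. The drains of the field-effect transistors in the same column of basic units are connected in series; the gates of the field-effect transistors in the same row of basic units are connected in series and serve as the input of the basic units.
In the implementation, a 1-transistor 1-RRAM (1T1R, 1Transistor 1RRAM) structure is used as the basic unit, in which the transistor controls the switching behavior, and eight 1-bit RRAM cells represent the binary digits of a weight in fixed-point form. This application models the RRAM using experimental data from a nitrogen-doped aluminum oxide structure; each RRAM has two resistance states: a low-resistance state (about 10 MΩ) and a high-resistance state (about 10 GΩ to 100 GΩ). Through timing, the RRAM array conducts only during the integration phase and is off most of the time, which greatly reduces the power consumption of the synapse array. The input signal from the input circuit is used directly as the control voltage Vc of the 1T1R cells in the RRAM array, which eliminates the input-interface DAC and greatly reduces energy consumption and area.
In short, this application changes the traditional row-wise multiply-then-accumulate method into a column-wise multiply-then-accumulate method, enabling large-scale parallel computing.
The positive value storage circuit and the negative value storage circuit have the same number of basic units, with equal numbers of rows and columns.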
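As shown in Figures 7 and 9, the integration calculation circuit comprises multiple integration sub-circuits and one compensation circuit;
Each integration sub-circuit mainly consists of a selection switch group and an integrator. Each selection switch group consists of three MOS transistors: two selection MOS transistors Q_{1,n-1}, Q_{2,n-1} and one control MOS transistor S_{1,n-1}. The sources of the two selection MOS transistors serve as inputs of the integration calculation circuit and are connected to the outputs (field-effect transistor drains) of one column of basic units in the positive value storage circuit and the negative value storage circuit respectively; that is, the source of one selection MOS transistor is connected to the drain outputs of the field-effect transistors in one column of basic units in the positive value storage circuit, and the source of the other selection MOS transistor is connected to the drain outputs of the field-effect transistors in the same column of basic units in the negative value storage circuit. The drains of both selection MOS transistors are connected to the source of the control MOS transistor; the gate of the control MOS transistor is connected to an external control switch circuit; the drain of the control MOS transistor serves as the output of the integration calculation circuit and is connected to the differential ADC circuit;
Each integrator comprises a first integration capacitor C_{n-1}, a second integration capacitor (C_f - C_{n-1}), a reset switch S_4 and an integration work switch S_2. The first integration capacitor C_{n-1}, the second integration capacitor (C_f - C_{n-1}) and the reset switch S_4 are connected in parallel between the voltage Vdd and the output of the integration sub-circuit; the integration work switch S_2 is connected in series between the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}); a tap between the first integration capacitor C_{n-1} and the integration work switch S_2 is led out to the ADC sampling line;
The compensation circuit comprises a compensation capacitor C_p and a compensation capacitor reset switch S_p, connected in parallel between the voltage Vdd and the ADC sampling line.
As shown in Figure 9, the differential ADC circuit comprises charge-halving switches S_3 on the ADC sampling line, an ADC sampling capacitor C_f and an ADC sampling switch S_adc. One charge-halving switch S_3 is connected between the parallel combination of C_p and S_p and the ADC sampling line, and one charge-halving switch S_3 is connected between each tap (between a first integration capacitor C_{n-1} and its integration work switch S_2) and the ADC sampling line. The ADC sampling capacitor C_f and the ADC sampling switch S_adc are connected in series between the voltage Vdd and the output of the integration sub-circuit.
The number of charge-halving switches S_3 in the integration sub-circuits and the differential ADC circuit equals the number of columns of basic units in the positive value storage circuit/negative value storage circuit. Each integration sub-circuit is connected to one column of basic units in the positive value storage circuit and the negative value storage circuit, and different integration sub-circuits are connected to different columns of basic units.
The integration circuit is divided into a positive value integration circuit and a negative value integration circuit, which integrate positive and negative numbers respectively. The positive value storage circuit, together with its corresponding part of the integration calculation circuit and differential ADC circuit, constitutes the positive value integration circuit, as shown in Figure 7(a). The negative value storage circuit, together with its corresponding part of the integration calculation circuit and differential ADC circuit, constitutes the negative value integration circuit, as shown in Figure 7(b).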
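In this application, the 8-bit input signal is encoded as a four-digit radix-4 code. Using multi-input parallel computing and an axon-first strategy, each digit of the input data is fed into the integration circuit in a fixed order, and the charge-redistribution principle is used to complete the integration process shown in Figure 8. The input signal can be written in digital form as x = A_{n/2-1} A_{n/2-2} … A_0, where each digit A_i (i = n/2-1, n/2-2, …, 0) of the input signal, under control of the time-division multiplexing circuit, gates in turn the input rows with code values 1, -1, 2, -2; the weight value can be written in digital form as w = a_{n-1} a_{n-2} … a_0. By Ohm's law, the product of the input signal and the weight value is obtained as a current signal. The current signal is routed to the positive or negative integration circuit according to the signs of the input signal and the weight array, which completes the multiply-accumulate process for one data input.
The analog voltage conversion circuit proposed in this application is shown in Figure 9. After the integration calculation circuit has produced the multiply-accumulate result for one data input, the analog-to-digital conversion circuit, under timing control, performs one charge distribution after the inputs 1 and -1 and another after the inputs 2 and -2. This completes the weighting between different input bits and yields two analog output voltages, positive and negative. Feeding these two voltages into the positive and negative input terminals of a differential SAR ADC (successive-approximation analog-to-digital converter) gives the digital multiply-accumulate output and forms the complete neuromorphic computing circuit.
This application uses multiple integrators to integrate the multiplication results. Because radix-4 encoding is used, the code values 1/-1 and 2/-2 differ in weight, and the capacitors used by different integrators differ by factors of two; connecting all integrators in parallel and redistributing their charge gives the multiply-accumulate result for one input digit.
The code value and the weight value can each be positive or negative. The signal input is controlled by switches so that the positive value integration circuit integrates only positive values and the negative value integration circuit integrates only negative values. After the data for one digit has been integrated, the analog voltage conversion operation closes the compensation capacitor reset switch S_p twice, once for the radix-4 inputs 1/-1 and once for the inputs 2/-2, and performs two charge-halving operations (the current value × 1/4), which realizes the weighting between different input data bits.
After all data have been input, the difference between the analog output voltages of the positive and negative integration circuits is the final multiply-accumulate result.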
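The sign routing and final subtraction described above can be checked with a purely numerical sketch. This is an idealized digital model, not the analog circuit: the function name `differential_mac` is hypothetical, the radix-4 digits are supplied directly (least significant first), and the analog 1/4 scaling per digit is replaced by exact powers of 4.

```python
def differential_mac(input_digits, weights):
    """Accumulate positive and negative products separately.

    input_digits[j] holds the radix-4 digits (each in {-2, -1, 0, 1, 2}) of
    input j, least significant first; weights[j] is a signed integer weight.
    Returns (positive_sum, negative_sum); the MAC result is their difference.
    """
    positive_sum = 0  # models the positive value integration circuit
    negative_sum = 0  # models the negative value integration circuit
    for digits, weight in zip(input_digits, weights):
        for position, digit in enumerate(digits):
            product = digit * weight * 4 ** position
            if product > 0:
                positive_sum += product
            else:
                negative_sum += -product
    return positive_sum, negative_sum


# 26 -> digits [-2, -1, 2]; 5 -> digits [1, 1, 0]; weights 3 and -2.
pos, neg = differential_mac([[-2, -1, 2], [1, 1, 0]], [3, -2])
print(pos, neg, pos - neg)
```

With these inputs the difference equals 26 × 3 + 5 × (-2), which is the ordinary dot product.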
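An embodiment of this application and its working process are as follows:
Figure 3 shows the input data encoding circuit proposed in this application, which includes a memory (mem), an encoder (encoder) and a data selector (mux4to1, a 4-to-1 data selector). The clock port (clk) of the memory is connected to the clock signal, the input port (in) receives the original input data, and the flag bit (ing_done) indicates that the output of one original input datum is complete. The three output ports of the memory are connected to the input ports (A, B, C) of the encoder; the four output ports (S02, S01, S2, S1) of the encoder, corresponding to (-2, -1, 2, 1), are connected to the four input ports (11, 10, 01, 00) of the data selector; the two control signals of the data selector are control signal 1 (s_np) and control signal 2 (s_21).
The memory stores the original input data. At each rising clock edge, the memory reads the original input data internally and outputs it. Assuming the original input data is the six-bit binary number 011010, the output process is:
Initially the flag bit is 0. From the first clock to the fourth clock, the outputs from top to bottom are the second-lowest bit 1, the lowest bit 0 and an extra 0 of the original input data, and then the original input data is logically shifted right by one bit to become 001101. From the fifth clock to the eighth clock, the outputs from top to bottom are the low three bits 1, 0, 1, and then the data is logically shifted right by two bits to become 000011. From the ninth clock to the twelfth clock, the outputs from top to bottom are the low three bits 0, 1, 1, and then the data is logically shifted right by two bits to become 000000. At the thirteenth clock the data is all zeros, the output ends, and the flag bit changes from 0 to 1.
The encoder accepts the three-bit input signal from the memory and encodes it in radix-4. The encoding rule is: result = (-2) × A + B + C. If the result is -2, output terminal -2 is high and the other output terminals are low; if the result is -1, output terminal -1 is high and the other output terminals are low; if the result is 2, output terminal 2 is high and the other output terminals are low; if the result is 1, output terminal 1 is high and the other output terminals are low; if the result is 0, all output terminals are low. Taking the original input data 011010 again and following the memory outputs: from the first clock to the fourth clock, output terminal -2 is high and the other output terminals are low; from the fifth clock to the eighth clock, output terminal -1 is high and the other output terminals are low; from the ninth clock to the twelfth clock, output terminal 2 is high and the other output terminals are low; at the thirteenth clock, all output terminals are low.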
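The windowing and the rule result = (-2) × A + B + C can be reproduced in a few lines. The sketch below is a software illustration only; the function name `booth_radix4_digits` is hypothetical, and it assumes an even bit width with the input read as a two's complement number (the example 011010 has a leading 0, so it is simply 26).

```python
def booth_radix4_digits(value: int, width: int) -> list[int]:
    """Return the radix-4 Booth digits of `value`, least significant first."""
    if width % 2:
        raise ValueError("width must be even")
    # Append the extra 0 below the lowest bit, as the memory does for the
    # first three-bit window.
    extended = (value & ((1 << width) - 1)) << 1
    digits = []
    for _ in range(width // 2):
        c = extended & 1          # bit shared with the previous window
        b = (extended >> 1) & 1   # middle bit of the window
        a = (extended >> 2) & 1   # top bit of the window, weight -2
        digits.append(-2 * a + b + c)
        extended >>= 2            # successive windows overlap by one bit
    return digits


digits = booth_radix4_digits(0b011010, 6)
print(digits)                                        # [-2, -1, 2]
print(sum(d * 4 ** i for i, d in enumerate(digits)))  # 26
```

The digits come out in the same order as the encoder outputs in the example (-2, then -1, then 2), and weighting them by powers of 4 recovers the original value.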
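The output of the encoder is fed to the data selector, which is gated by control signal 1 (s_np) and control signal 2 (s_21). When control signal 1 (s_np) is high, terminal 11 or terminal 10 is gated, and when it is low, terminal 01 or terminal 00 is gated; when control signal 2 (s_21) is high, terminal 11 or terminal 01 is gated, and when it is low, terminal 10 or terminal 00 is gated. By changing the timing of control signal 1 (s_np) and control signal 2 (s_21), terminals 00, 10, 01 and 11 are gated in sequence, so that over four clock cycles the output of the input data encoding circuit corresponds in turn to the radix-4 code values 1, -1, 2, -2. Taking the original input data 011010 again, the -2 output terminal of the encoder is 1 from the first clock cycle to the fourth clock cycle, and that port is gated in the fourth clock cycle, so the output of the input data encoding circuit is 0 from the first to the third clock cycle and 1 in the fourth clock cycle. Likewise, from the fifth clock cycle to the eighth clock cycle, the output of the input data encoding circuit is 1 in the sixth clock cycle and 0 in the other clock cycles; from the ninth clock cycle to the twelfth clock cycle, the output is 1 in the eleventh clock cycle, when terminal 01 (code value 2) is gated, and 0 in the other clock cycles.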
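The gating schedule can be sketched as a lookup from the two control signals to the selected port. The names `GATE_ORDER`, `PORT_VALUE` and `mux_output` are hypothetical, and the model assumes the port label is read as (s_np, s_21), which matches the statement that s_np high selects 11 or 10 and s_21 high selects 11 or 01.

```python
# (s_np, s_21) applied in each of the four clock cycles: ports 00, 10, 01, 11.
GATE_ORDER = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Code value wired to each selector port: 00 <- 1, 10 <- -1, 01 <- 2, 11 <- -2.
PORT_VALUE = {(0, 0): 1, (1, 0): -1, (0, 1): 2, (1, 1): -2}


def mux_output(digit: int) -> list[int]:
    """Selector output over four clock cycles for one encoder digit.

    The output is 1 only in the cycle whose gated port carries the digit;
    a digit of 0 drives no encoder terminal high, so every cycle outputs 0.
    """
    return [1 if PORT_VALUE[select] == digit else 0 for select in GATE_ORDER]


for digit in (-2, -1, 2, 1, 0):
    print(digit, mux_output(digit))
```

Each nonzero digit therefore produces a single pulse in one of the four cycles, and the position of the pulse identifies the code value.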
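Figure 5 shows the differential weight storage circuit proposed in this application, comprising the positive value storage circuit and the negative value storage circuit, which are identical in scale and structure and store the positive and the negative weight values respectively. In this storage array, 8 basic units store one binary weight, so an 8-bit fixed-point weight can be written as w = a_{n-1} a_{n-2} … a_0, where a_{n-1} is one binary digit of the weight value, represented by a basic unit as shown in Figure 4. When the basic unit is in the high-resistance state, its resistance is 100 GΩ, representing a_{n-1} = 0; when the basic unit is in the low-resistance state, its resistance is 10 MΩ, representing a_{n-1} = 1.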
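To illustrate how a stored bit turns into a current, the following sketch applies Ohm's law to one column of cells using the two resistance values quoted above. The read voltage `v_read` and the function name `column_current` are assumptions for illustration only; transistor on-resistance and all other non-idealities are ignored.

```python
R_LOW = 10e6    # low-resistance state in ohms, represents a stored 1
R_HIGH = 100e9  # high-resistance state in ohms, represents a stored 0


def column_current(v_read: float, gate_inputs: list[int], stored_bits: list[int]) -> float:
    """Sum the currents of all cells in one column whose gate input is 1.

    Each conducting cell contributes v_read / R by Ohm's law; cells whose
    gate input is 0 are switched off and contribute nothing.
    """
    total = 0.0
    for gate, bit in zip(gate_inputs, stored_bits):
        if gate:
            total += v_read / (R_LOW if bit else R_HIGH)
    return total


# Two rows selected: one stores 1 (about 100 nA at 1 V), one stores 0 (about 10 pA).
print(column_current(1.0, [1, 1, 0], [1, 0, 1]))
```

The four orders of magnitude between the two states are what let a stored 0 be treated as contributing almost no current.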
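The weight storage mechanism adopted in this application is shown in Figure 6. If a weight wx is positive, the binary sequence of its value wx is stored in a row of the positive value storage array, and the corresponding row of the negative value storage array stores 0; likewise, if a weight wy is negative, the binary sequence of its absolute value |wy| is stored in a row of the negative value storage array, and the corresponding row of the positive value storage array stores 0. Note that when a weight value is negative, what is stored is its negation, i.e. its absolute value; this guarantees that all sign bits are 0 and simplifies the subsequent computation.
As Figure 6 shows, the differential weight array is divided into a positive value storage circuit and a negative value storage circuit. The difference between the data stored in the same row of the two circuits is the current weight value, and in every row at least one of the two circuits stores 0. That is, when a row of the positive value storage circuit stores w_x, the same row of the negative value storage circuit should store 0, and the stored weight value is w_x - 0 = w_x; likewise, when a row of the negative value storage circuit stores w_y, the same row of the positive value storage circuit should store 0, and the weight value of that row is 0 - w_y = -w_y. This storage mechanism guarantees that all sign bits are 0 and simplifies the subsequent computation.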
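The storage rule can be sketched as a pair of helper functions. The names `split_weight` and `stored_weight` are hypothetical, and the 8-bit row width follows the 8 basic units per weight mentioned in the text.

```python
def split_weight(weight: int, bits: int = 8) -> tuple[list[int], list[int]]:
    """Return (positive_row, negative_row) for one signed weight.

    The magnitude is written MSB first into the row that matches the sign;
    the same row of the other array is all zeros.
    """
    magnitude = abs(weight)
    if magnitude >= 1 << bits:
        raise ValueError("weight magnitude does not fit in the row")
    row = [(magnitude >> k) & 1 for k in reversed(range(bits))]
    zeros = [0] * bits
    return (row, zeros) if weight >= 0 else (zeros, row)


def stored_weight(positive_row: list[int], negative_row: list[int]) -> int:
    """Recover the weight as the difference of the two rows."""
    def to_int(row: list[int]) -> int:
        return int("".join(str(bit) for bit in row), 2)
    return to_int(positive_row) - to_int(negative_row)


print(split_weight(5))
print(split_weight(-3))
print(stored_weight(*split_weight(-3)))
```

Because one of the two rows is always zero, a small-magnitude weight leaves most cells in both arrays in the zero state, which is the property the text relies on for low power.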
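Figure 7 shows the structure of the integration calculation circuit proposed in this application, where 1 is the positive value integration circuit and 2 is the negative value integration circuit. Because this application uses radix-4 encoding, the input data may be positive or negative, and the weight values are also stored separately by sign. Control signal 1 (s_np) in Figure 3 therefore controls the input of the integration calculation circuit: the positive value integration circuit accepts only positive products, and the negative value integration circuit accepts only negative products. That is, when control signal 1 (s_np) is high, the input data encoding circuit outputs a negative value; the positive value integration circuit accepts the result from the negative value storage circuit (negative input × negative weight = positive result), and the negative value integration circuit accepts the result from the positive value storage circuit (negative input × positive weight = negative result). When control signal 1 (s_np) is low, the input data encoding circuit outputs a positive value; the positive value integration circuit accepts the result from the positive value storage circuit (positive input × positive weight = positive result), and the negative value integration circuit accepts the result from the negative value storage circuit (positive input × negative weight = negative result). For an n-bit weight value, the first integration capacitors are C_{n-1}, C_{n-2}, ..., C_0, related by C_{n-1} = 2C_{n-2} = ... = 2^{n-1}C_0; they store the charge representing the output of one column of basic units. The second integration capacitors are C_f - C_{n-1}, C_f - C_{n-2}, …, C_f - C_0; they ensure that the total integration capacitance of every integrator during integration equals the ADC sampling capacitor (C_f). The reset switch S_4 resets the second integration capacitor (C_f - C_{n-1}); the integration work switch S_2 controls the operation of the first and second integration capacitors; the compensation capacitor C_p ensures that the sum of the first integration capacitors C_{n-1} and the compensation capacitor C_p equals the sampling capacitor C_f; the compensation capacitor reset switch S_p resets the sampling capacitor C_f; the charge-halving switch S_3 adds the charge stored in all first integration capacitors C_{n-1} to the charge stored in the ADC sampling capacitor C_f and divides it equally. First, the control MOS transistor S_{1,n-1} is externally controlled so that one of the two selection MOS transistors Q_{1,n-1}, Q_{2,n-1} is turned on, and the data output by the basic units is thereby routed according to its sign: the path conducts separately for output from the positive value storage circuit and for output from the negative value storage circuit, and does not conduct when the outputs of both storage circuits are zero.
Second, the integration work switch S_2 is turned on so that the data from the control MOS transistor S_{1,n-1} is input to the two integration capacitors, the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}); the compensation capacitor reset switch S_p is turned on to reset the compensation capacitor C_p.
Next, with all charge-halving switches S_3 turned on, the charge is divided equally, and the compensation capacitor C_p and the first integration capacitors C_{n-1} of all integrators redistribute their charge.
Finally, the ADC sampling switch S_adc is turned on, so that the ADC sampling capacitor C_f shares equally the charge stored in all first integration capacitors C_{n-1}.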
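In this application, the 8-bit input signal is encoded as a four-digit radix-4 code. Using multi-input parallel computing and an axon-first strategy, each digit of the input data is fed into the integration circuit in a fixed order, and the charge-redistribution principle is used to complete the integration process shown in Figure 8. The input signal can be written in digital form as x = A_{n/2-1} A_{n/2-2} … A_0, where each digit A_i, under control of the time-division multiplexing circuit, gates in turn the input rows with code values 1, -1, 2, -2; the weight value can be written in digital form as w = a_{n-1} a_{n-2} … a_0. By Ohm's law, the product of the input signal and the weight value is obtained as a current signal. The current signal is routed to the positive or negative integration circuit according to the signs of the input signal and the weight array, which completes the multiply-accumulate process for one data input.
The analog voltage conversion circuit proposed in this application is shown in Figure 9. After the integration calculation circuit has produced the multiply-accumulate result for one data input, the analog-to-digital conversion circuit, under timing control, performs one charge distribution after the inputs 1 and -1 and another after the inputs 2 and -2. This completes the weighting between different input bits and yields two analog output voltages, positive and negative. Feeding these two voltages into the positive and negative input terminals of a differential SAR ADC gives the digital multiply-accumulate output and forms the complete neuromorphic computing circuit.
In this application, the input data are input simultaneously starting from the lowest bit, multiplied by the weights, and fed into the integration circuit; after the integration is complete, the next bit is input. That is, A_{0,0}, A_{1,0}, ..., A_{p-1,0} are input simultaneously; when A_i = 1 the input voltage is V_A, and when A_i = 0 the input voltage is 0.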
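During integration, switches S_1 and S_2 are on and switch S_3 is off so that the integrators are isolated from one another. The output voltage Vo of the integration circuit is:
Figure PCTCN2020092286-appb-000001
where
Figure PCTCN2020092286-appb-000002
is the output voltage of the integrator in its previous state, T is the integration time, G_i is the conductance of the 1-bit binary weight (1/RH in the high-resistance state, 1/RL in the low-resistance state), and C_f is the sum of all integration capacitors. After integration ends, switch S_1 is opened to reduce the energy loss of the memory circuit, S_2 is opened to keep the charge on the integration capacitor C_i unchanged, and S_3 is closed to obtain the equivalent voltage of
Figure PCTCN2020092286-appb-000003
Since the integration capacitors are related by C_{n-1} = 2C_{n-2} = ... = 2^{n-1}C_0, when S_3 is closed the capacitor charge is redistributed according to the capacitance values, which gives the equivalent analog voltage.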
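The effect of binary-weighted capacitors on charge sharing can be illustrated with ideal capacitors. This is a sketch under stated assumptions, not the circuit's actual transfer function: the function name `redistribute` is hypothetical, the compensation capacitor is assumed to hold no charge after reset, and its default value C_p = C_0 is an assumption chosen so that the total capacitance is 2^n × C_0.

```python
from fractions import Fraction


def redistribute(voltages, c0=Fraction(1), c_p=None):
    """Equivalent voltage after ideal charge sharing.

    voltages[k] is the voltage held on the first integration capacitor
    C_k = 2**k * c0 (index 0 is the least significant weight bit).
    """
    caps = [c0 * 2 ** k for k in range(len(voltages))]
    if c_p is None:
        c_p = c0  # assumption: pads the total capacitance to 2**n * c0
    total_charge = sum(c * v for c, v in zip(caps, voltages))
    return total_charge / (sum(caps) + c_p)


# Three weight bits holding 1 V, 0 V, 1 V on C_0, C_1, C_2.
print(redistribute([1, 0, 1]))  # 5/8
```

Under these assumptions the shared voltage is the binary-weighted sum of the per-column voltages divided by 2^n, which is how the capacitor ratios implement the weighting between weight bits.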
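When the input data has multiple bits, the input moves to the next bit after each 1-bit input is complete, and the equivalent analog voltage must then be weighted, as shown in Figure 9. Throughout the integration process the ADC sampling switch is closed. When S_3 is closed, the previous V_out and V_s are added and then divided equally, that is,
Figure PCTCN2020092286-appb-000004
where
Figure PCTCN2020092286-appb-000005
is the output voltage before S_3 is closed, i.e. the equivalent analog voltage after all previous bits have been input.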
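The analog voltage conversion circuit proposed in this application is shown in Figure 9. This application uses radix-4 encoding, so the radix between adjacent digits is 4, and the input signal may be 1/-1 or 2/-2. To simplify the hardware, the weighting between different digits is realized by controlling the timing:
(1) When the input signal is 1, the positive and negative integration circuits each integrate and hold the integrated voltage; when the input signal is -1, the positive and negative integration circuits swap inputs and integrate again. After the two integrations, S_3 is closed for one charge halving, and the output voltage is then
Figure PCTCN2020092286-appb-000006
In this expression, the leading symbol (shown in the figure) denotes the output voltage before switch S_3 is closed, V_s1 denotes the voltage at the end of the first integration capacitor connected to switch S_3 after the inputs 1 and -1 have been computed, and V_out,1 denotes the output voltage after the inputs 1 and -1 have been computed and the weighting between digits has been applied. After the charge halving, S_3 is opened, and S_2 and S_4 are closed to reset the first and second integration capacitors.
(2) When the input signal is 2, the positive and negative integration circuits each integrate and hold the integrated voltage; when the input signal is -2, the positive and negative integration circuits swap inputs and integrate again. After the two integrations, another charge halving is performed together with the previous integrated voltage, and the output voltage is then
Figure PCTCN2020092286-appb-000007
V_out denotes the output voltage after the computation for one radix-4 input digit is complete, and V_s2 denotes the voltage at the end of the first integration capacitor connected to switch S_3 after the inputs 2 and -2 have been computed.
(3) After the two charge-halving processes are over, the input encoding circuit outputs the next radix-4 code, and the multiply-add computation for the next input digit is performed.
The final equivalent analog voltage is therefore
Figure PCTCN2020092286-appb-000008
where V_{s,0}, V_{s,1}, … can be calculated according to the following formula:
$V_{s,i} = V_{s1,i} + 2V_{s2,i}$      (10)
where V_{s,i} is the output voltage of the integration circuit when the i-th radix-4 code is input, V_{s1,i} is the output voltage after the integration circuit has computed 1 and -1 when the i-th radix-4 code is input, V_{s2,i} is the output voltage after the integration circuit has computed 2 and -2 when the i-th radix-4 code is input, and m is the length of the radix-4 code.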
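The two halvings per digit and formula (10) can be cross-checked numerically. The sketch below assumes each closing of S_3 ideally averages the held output with the new integrated voltage, as the text describes; the function names are hypothetical, and the closed form used for comparison is derived here from that averaging rule rather than taken from the figure.

```python
from fractions import Fraction


def weighted_output(vs1, vs2):
    """Apply two charge halvings per radix-4 digit, least significant first."""
    v_out = Fraction(0)
    for s1, s2 in zip(vs1, vs2):
        v_out = (v_out + s1) / 2  # halving after the 1 / -1 phase
        v_out = (v_out + s2) / 2  # halving after the 2 / -2 phase
    return v_out


def closed_form(vs1, vs2):
    """Sum of V_{s,i} / 4**(m - i) with V_{s,i} = V_{s1,i} + 2 * V_{s2,i}."""
    m = len(vs1)
    return sum(
        (Fraction(s1) + 2 * Fraction(s2)) / Fraction(4) ** (m - i)
        for i, (s1, s2) in enumerate(zip(vs1, vs2))
    )


print(weighted_output([1, 0, 1], [0, 1, 0]))  # 25/64
print(closed_form([1, 0, 1], [0, 1, 0]))      # 25/64
```

Expanding one digit gives (V_prev + V_s1 + 2·V_s2) / 4, so the second halving is what gives the 2/-2 phase twice the weight of the 1/-1 phase, and each completed digit scales everything before it by 1/4.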
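Subtracting the positive and negative equivalent analog voltages gives the final analog voltage output.
Through the integration and charge redistribution processes above, the multiply-accumulate operation between the digital radix-4 encoded input and the differentially stored weights is completed, and an output voltage in analog form is obtained. If further computation is needed, the two voltages are input differentially to the ADC for A-D conversion.
The above embodiments are only intended to illustrate the technical solution of this application, not to limit it. Although this application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of this application shall fall within the scope of its claims.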

Claims (5)

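  1. A multiply-accumulate circuit based on radix-4 encoding and differential weight storage, characterized in that it comprises an input data encoding circuit, a differential weight storage circuit, an integration calculation circuit and a differential ADC circuit; the output of the input data encoding circuit is connected through the differential weight storage circuit and the integration calculation circuit in turn to the differential ADC circuit, which outputs the multiply-accumulate result; the original input data is encoded by the input data encoding circuit and multiplied by the weight values stored in the differential weight storage circuit, the integration calculation circuit accumulates the positive and negative products separately, and the differential ADC circuit then performs analog-to-digital conversion on the difference between the positive and negative accumulation results, giving the final multiply-accumulate result.
  2. The multiply-accumulate circuit based on radix-4 encoding and differential weight storage according to claim 1, characterized in that the input data encoding circuit comprises a memory, an encoder and a data selector connected in sequence; the original input data is binary-coded and stored in the memory in advance; the memory outputs its data according to the timing, each time passing the low three bits of the current data to the encoder; the encoder receives the low three bits from the memory and encodes them as a radix-4 code; the encoded result is gated through the 4-to-1 selector, which outputs the encoded data value to the differential weight storage circuit over 4 clock cycles, each clock cycle outputting one radix-4 code value, so that the four clock cycles output the four different radix-4 code values.
  3. The multiply-accumulate circuit based on radix-4 encoding and differential weight storage according to claim 1, characterized in that the differential weight storage circuit comprises two weight storage circuits of equal size; each weight storage circuit is an array of interconnected basic units, each consisting of a field-effect transistor connected to a binary RRAM; the basic units in the same column are connected in series, and the basic units in the same row are connected in series; each row of basic units jointly stores one weight value; the two weight storage circuits are the positive value storage circuit and the negative value storage circuit, which store the positive and the negative weight values respectively;
    starting from the low three bits of the original input data, three bits at a time are input to the input data encoding circuit, with one bit of overlap between successive three-bit groups; after processing by the input data encoding circuit, the selector outputs the encoded data value to each row of basic units a_{1,n-1}; each row of basic units receives the encoded data value and multiplies it by each weight in the positive value storage circuit/negative value storage circuit, and the products are summed by column to give the multiply-add result for the current digit.
  4. The multiply-accumulate circuit based on radix-4 encoding and differential weight storage according to claim 1, characterized in that each basic unit consists of a field-effect transistor in series with a binary RRAM; one end of the binary RRAM is connected to the source of the field-effect transistor and the other end is grounded; the gate of the field-effect transistor is connected to the output of the input data encoding circuit; the drain of the field-effect transistor serves as the output of the basic unit and is connected to the input of the integration calculation circuit; the drains of the field-effect transistors in the same column of basic units are connected in series; the gates of the field-effect transistors in the same row of basic units are connected in series and serve as the input of the basic units.
  5. The multiply-accumulate circuit based on radix-4 encoding and differential weight storage according to claim 1, characterized in that the integration calculation circuit comprises multiple integration sub-circuits and one compensation circuit;
    each integration sub-circuit mainly consists of a selection switch group and an integrator; each selection switch group consists of three MOS transistors, namely two selection MOS transistors Q_{1,n-1}, Q_{2,n-1} and one control MOS transistor S_{1,n-1}; the sources of the two selection MOS transistors serve as inputs of the integration calculation circuit and are connected to the outputs of one column of basic units in the positive value storage circuit and the negative value storage circuit respectively; the drains of both selection MOS transistors are connected to the source of the control MOS transistor; the gate of the control MOS transistor is connected to an external control switch circuit; the drain of the control MOS transistor serves as the output of the integration calculation circuit and is connected to the differential ADC circuit; each integrator comprises a first integration capacitor C_{n-1}, a second integration capacitor (C_f - C_{n-1}), a reset switch S_4 and an integration work switch S_2; the first integration capacitor C_{n-1}, the second integration capacitor (C_f - C_{n-1}) and the reset switch S_4 are connected in parallel between the voltage Vdd and the output of the integration sub-circuit; the integration work switch S_2 is connected in series between the first integration capacitor C_{n-1} and the second integration capacitor (C_f - C_{n-1}); a tap between the first integration capacitor C_{n-1} and the integration work switch S_2 is led out to the ADC sampling line; the compensation circuit comprises a compensation capacitor C_p and a compensation capacitor reset switch S_p, connected in parallel between the voltage Vdd and the ADC sampling line;
    the differential ADC circuit comprises charge-halving switches S_3 on the ADC sampling line, an ADC sampling capacitor C_f and an ADC sampling switch S_adc; one charge-halving switch S_3 is connected between the parallel combination of the compensation capacitor C_p and the compensation capacitor reset switch S_p and the ADC sampling line, and one charge-halving switch S_3 is connected between each tap (between a first integration capacitor C_{n-1} and its integration work switch S_2) and the ADC sampling line; the ADC sampling capacitor C_f and the ADC sampling switch S_adc are connected in series between the voltage Vdd and the output of the integration sub-circuit.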
PCT/CN2020/092286 2019-05-30 2020-05-26 一种基于Radix-4编码和差分权重的乘累加电路 WO2020238889A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/274,962 US20210365241A1 (en) 2019-05-30 2020-05-26 Multiplication and accumulation circuit based on radix-4 booth code and differential weight

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910462374.8A CN110209375B (zh) 2019-05-30 2019-05-30 一种基于radix-4编码和差分权重存储的乘累加电路
CN201910462374.8 2019-05-30

Publications (1)

Publication Number Publication Date
WO2020238889A1 true WO2020238889A1 (zh) 2020-12-03

Family

ID=67789543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092286 WO2020238889A1 (zh) 2019-05-30 2020-05-26 一种基于Radix-4编码和差分权重的乘累加电路

Country Status (3)

Country Link
US (1) US20210365241A1 (zh)
CN (1) CN110209375B (zh)
WO (1) WO2020238889A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113314174A (zh) * 2021-05-06 2021-08-27 安徽大学 Circuit structure for column-shift multi-bit multiplication binary decomposition operation of an SRAM array
EP4180946A1 (en) * 2021-11-15 2023-05-17 Samsung Electronics Co., Ltd. Computing device for performing digital pulse-based crossbar operation and method of operating the computing device

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
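CN110209375B (zh) * 2019-05-30 2021-03-26 浙江大学 Multiply-accumulate circuit based on radix-4 encoding and differential weight storage
CN112558917B (zh) * 2019-09-10 2021-07-27 珠海博雅科技有限公司 Integrated storage-and-computation circuit and data operation method based on the integrated storage-and-computation circuit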
US11915125B2 (en) 2019-10-31 2024-02-27 SK Hynix Inc. Arithmetic devices for neural network
US20210132953A1 (en) * 2019-10-31 2021-05-06 SK Hynix Inc. Arithmetic devices for neural network
US11574173B2 (en) * 2019-12-19 2023-02-07 Qualcomm Incorporated Power efficient near memory analog multiply-and-accumulate (MAC)
CN110941185B (zh) * 2019-12-20 2022-06-07 安徽大学 Dual-word-line 6T SRAM cell circuit for binary neural networks
US11663455B2 (en) * 2020-02-12 2023-05-30 Ememory Technology Inc. Resistive random-access memory cell and associated cell array structure
CN112071343B (zh) * 2020-08-18 2022-09-13 安徽大学 SRAM circuit structure for implementing multiplication in memory in combination with capacitors
US11688457B2 (en) * 2020-12-26 2023-06-27 International Business Machines Corporation Using ferroelectric field-effect transistors (FeFETs) as capacitive processing units for in-memory computing
US20220206754A1 (en) * 2020-12-31 2022-06-30 Redpine Signals, Inc. Scaleable Analog Multiplier-Accumulator with Shared Result Bus
US11886835B2 (en) * 2020-12-31 2024-01-30 Ceremorphic, Inc. Cascade multiplier using unit element analog multiplier-accumulator
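CN113157034B (zh) * 2021-01-19 2022-06-03 浙江大学 High-linearity neuromorphic computing circuit implemented with a passive voltage-stabilizing circuit
CN113328818B (zh) * 2021-05-14 2022-06-21 南京大学 Device and method for parallelized analog in-memory computing based on frequency-division multiplexing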
US20220382516A1 (en) * 2021-05-31 2022-12-01 Redpine Signals, Inc. Chopper Stabilized Analog Multiplier Accumulator with Binary Weighted Charge Transfer Capacitors
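CN113949385B (zh) * 2021-12-21 2022-05-10 之江实验室 Analog-to-digital conversion circuit for two's-complement quantization of an RRAM compute-in-memory chip
CN114400031B (zh) * 2022-03-24 2022-07-08 之江实验室 RRAM compute-in-memory chip with two's-complement mapping, and electronic device
CN114913895B (zh) * 2022-07-19 2022-11-01 中科南京智能技术研究院 In-memory computing macro unit implementing two-bit input and single-bit weight
KR20240014147A (ko) 2022-07-25 2024-02-01 에스케이하이닉스 주식회사 Semiconductor device performing MAC operation
CN116126282B (zh) * 2022-12-21 2023-08-18 辉羲智能科技(上海)有限公司 Automatic driving assistance control method and system, and AI computing method and device thereof
CN116185334B (zh) * 2023-04-19 2023-07-04 上海登临科技有限公司 Input control method for array circuit, and array circuit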

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5796645A (en) * 1996-08-27 1998-08-18 Tritech Microelectronics International Ltd. Multiply accumulate computation unit
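CN101010665A (zh) * 2004-08-26 2007-08-01 松下电器产业株式会社 Multiplication device
CN102662624A (zh) * 2012-04-13 2012-09-12 钜泉光电科技(上海)股份有限公司 Multiplier
CN109146070A (zh) * 2017-06-16 2019-01-04 华为技术有限公司 Peripheral circuit and system supporting RRAM-based neural network training
CN109196528A (zh) * 2016-05-17 2019-01-11 硅存储技术公司 Deep learning neural network classifier using non-volatile memory array
CN110209375A (zh) * 2019-05-30 2019-09-06 浙江大学 Multiply-accumulate circuit based on radix-4 encoding and differential weight storage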

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0001517D0 (en) * 2000-01-25 2000-03-15 Jaber Marwan Computational method and structure for fast fourier transform analizers
JP2004032609A (ja) * 2002-06-28 2004-01-29 Nec Corp Nonlinear distortion compensation circuit
US7809783B2 (en) * 2006-02-15 2010-10-05 Qualcomm Incorporated Booth multiplier with enhanced reduction tree circuitry
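CN101201731B (zh) * 2008-02-15 2010-08-18 刘杰 Binary digital subtractor
CN101382882B (zh) * 2008-09-28 2010-08-11 宁波大学 CTGAL-based Booth encoder and adiabatic two's-complement multiply-accumulator
CN102595060B (zh) * 2012-03-15 2013-06-19 天津大学 Analog accumulator implementing TDI function inside a CMOS image sensor
CN103176767B (zh) * 2013-03-01 2016-08-03 浙江大学 Implementation method of a low-power, high-throughput floating-point multiply-accumulate unit
CN103176948B (zh) * 2013-03-04 2016-06-29 浙江大学 Low-cost single-precision elementary function operation accelerator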
US9721332B2 (en) * 2015-02-16 2017-08-01 Hrl Laboratories, Llc Spike domain convolution circuit
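CN105005057B (zh) * 2015-08-03 2017-02-22 北京理工大学 Acquisition method for the D1 navigation message of the BeiDou navigation system
CN109104197B (zh) * 2018-11-12 2022-02-11 合肥工业大学 Encoding/decoding circuit for non-restoring sparse data applied to convolutional neural networks, and encoding/decoding method thereof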

Also Published As

Publication number Publication date
CN110209375B (zh) 2021-03-26
CN110209375A (zh) 2019-09-06
US20210365241A1 (en) 2021-11-25

Similar Documents

Publication Publication Date Title
WO2020238889A1 (zh) Multiply-accumulate circuit based on Radix-4 encoding and differential weights
US11714749B2 (en) Efficient reset and evaluation operation of multiplying bit-cells for in-memory computing
WO2021004466A1 (zh) Neuromorphic computing circuit based on a multi-bit parallel binary synapse array
Jiang et al. Analog-to-digital converter design exploration for compute-in-memory accelerators
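CN107636640A (zh) Dot product engine with negation indicator
CN113157034B (zh) High-linearity neuromorphic computing circuit implemented with a passive voltage-stabilizing circuit
CN111260048A (zh) Implementation method of the activation function in a memristor-based neural network accelerator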
US11018687B1 (en) Power-efficient compute-in-memory analog-to-digital converters
Wei et al. Trends and challenges in the circuit and macro of RRAM-based computing-in-memory systems
CN115390789A (zh) Analog-domain full-precision in-memory computing circuit and method based on magnetic tunnel junction computing units
Liu et al. Sme: Reram-based sparse-multiplication-engine to squeeze-out bit sparsity of neural network
CN114499538A (zh) Multi-bit input data encoding method and apparatus, electronic device, and storage medium
Tsai et al. RePIM: Joint exploitation of activation and weight repetitions for in-ReRAM DNN acceleration
Meng et al. Exploring compute-in-memory architecture granularity for structured pruning of neural networks
Peng et al. Inference engine benchmarking across technological platforms from CMOS to RRAM
US11544037B2 (en) Low area multiply and accumulate unit
Chen et al. RIMAC: An array-level ADC/DAC-free ReRAM-based in-memory DNN processor with analog cache and computation
Xiao et al. A Low-Power In-Memory Multiplication and Accumulation Array With Modified Radix-4 Input and Canonical Signed Digit Weights
Ogbogu et al. Energy-Efficient ReRAM-Based ML Training via Mixed Pruning and Reconfigurable ADC
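CN114168107A (zh) Vector-matrix multiplication method and operator with adjustable in-memory precision
CN111988031B (zh) Memristive in-memory vector-matrix operator and operation method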
Liu et al. AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC
Guo et al. VCCIM: a voltage coupling based computing-in-memory architecture in 28 nm for edge AI applications
Xuan et al. HPSW-CIM: A novel ReRAM-based computing-in-memory architecture with constant-term circuit for full parallel hybrid-precision-signed-weight MAC operation
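CN113672854A (zh) In-memory operation method based on a current mirror and memory cells, convolution operation method, device, and applications thereof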

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 20815230; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 20815230; Country of ref document: EP; Kind code of ref document: A1