WO2021197073A1 - Multi-bit convolution operation module based on time-variable current integration and charge sharing - Google Patents
- Publication number
- WO2021197073A1 (PCT/CN2021/081322)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- convolution operation
- bit
- current
- capacitor
- input
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/065—Analogue means
Definitions
- the invention relates to an analog operation module, in particular to an analog operation module related to convolution operation.
- the invention also relates to an analog calculation method for convolution operation.
- analog operations have higher efficiency than traditional digital operations. Therefore, digital quantities are usually converted into analog quantities and then operated.
- in medium- and large-scale hardware implementations of neural networks, memory access costs more energy than the computation itself: because data is traditionally stored on disk, it must be loaded into memory before an operation is performed. This process requires a great deal of I/O, and the storage connected to conventional memory often consumes more power.
- the calculation process can be sent to the data for local execution, which greatly improves the calculation speed, saves storage area, and reduces data transmission and calculation power consumption.
- the present invention proposes an effective method for realizing ultra-low power consumption analog memory or near-memory operation.
- existing analog operation circuits do not handle the change of bit weight of the multiplier or the multiplicand. They are limited to 1-bit inputs at the multiplication stage and cannot be used for analog convolution of multi-bit binary numbers.
- multi-bit operations have so far been implemented by modulating the control bus in the current domain, by capacitive charge sharing, by pulse-width-modulated (PWM) control of SRAM reads and writes, by modifying the SRAM cell, or by complex digital matrix-vector processing in near-memory operation.
- PWM pulse-width-modulated
- multi-bit analog multipliers and accumulators have always been controlled by very complex digital processing.
- traditional digital operations consume a lot of power compared with analog operations. Therefore, multi-bit operations under the control of digital processing consume a lot of energy.
- the purpose of the present invention is to provide a multi-bit binary convolutional analog operation module based on time-variable current integration and charge sharing with ultra-low power consumption, compact structure and fast operation speed.
- the module supports general convolution of two or more inputs, and the number of binary bits can be adjusted. It is especially suited as an analog in-memory operation unit in the hardware of a neural network convolution operation unit or arithmetic accelerator.
- the present invention proposes a multi-bit convolution operation module based on time-adjustable current integration and charge sharing.
- the module includes: at least one digital input x_i; at least one digital-to-analog converter (DAC) that converts the digital input into a current for transmission in the circuit; at least one weight w_ji, where w_ji,k is the value of the k-th bit when the weight is expressed as a binary number; a convolution operation array composed of multiple convolution operation units, each unit (i,j,k) multiplying one bit-weighted binary bit w_ji,k by one multi-bit binary input x_i, and the array completing the multiplication and addition operations of the convolution; and at least one output y_j;
- the current Ix_i is produced by the DAC, which converts the digital input x_i according to a given number of bits.
- the current Ix_i is mirrored or copied to the convolution operation array.
- the current on the same j*k plane is the same. This allows multi-bit input signals and scaling of the current in the DAC, so that the time for the current to reach the switch is the same.
- each operation unit (i,j,k) includes a current Ix_i, a switch, an integral control module, a node a_ji,k, and at least one capacitor.
- for the weight w_ji, w_ji,k is the value of the k-th bit in the binary representation of w_ji, with k ∈ [1,B].
- each bit w_ji,k corresponds to one convolution operation unit, and the convolution operation units in the k direction are arranged from the low bit to the high bit of w_ji.
- the output of the AND gate, whose inputs are w_ji,k and the PWM signal in the control module, controls the switch: when the output is 1, the switch is closed.
- the change of bit weight of the multiplicand or the multiplier in the multiplication stage is realized in the module by the PWM signal controlling the integration time of the current in the capacitor. For units corresponding to the same bit k of different weights w_ji, the duration of the PWM signal is the same; within the units corresponding to one weight, the PWM duration of each bit is twice that of the next lower bit. One end of the capacitor is grounded, so the voltage across the capacitor is the voltage at the upper plate of the capacitor.
- time is controlled with a PWM signal because doing so improves the flexibility of the system.
- SRAM Static Random-Access Memory
- the PWM signal duration refers to the duration of a high level
- the PWM signal duration refers to the duration of a low level.
- the voltage at node a_ji,k is the multiplication result x_i*w_ji,k*2^(k-1). Its value depends on how long the node is connected to the upper plate of the capacitor, which is determined by the bit value w_ji,k of the weight and by the duration of the PWM signal; the combined voltage of the 1*k convolution operation units corresponding to x_i is the result of x_i*w_ji.
- y_j is the voltage of the combined node obtained by connecting all nodes a_ji,k of one i*k plane for a given j. Owing to the discharge characteristics of capacitors, the capacitors in the different operation units share charge through the connected nodes. After charge sharing, the amount of charge in each capacitor is the same, while the total charge obtained by current integration in the multiplication stage remains unchanged.
- the accumulated voltage at the combined node is the result of Σ x_i·w_ji, which completes the convolution of the convolution kernel with the input matrix;
- the bias b_j is realized as a fixed current I_b that is an additional input alongside the given currents Ix_i, and it is calculated separately by adding a bias operation unit.
- the scale of the bias unit array is j*k.
- each bias operation unit (j,k) includes a current I_b, a switch, an integral control module, a node a_j,k, and a capacitor with a value of C_u.
- the bias b_j of y_j is the cumulative voltage sum of all nodes a_j,k of the 1*k group of units.
- a counter or a clock divider is used to generate a PWM signal based on the clock at the maximum speed, which speeds up the integration speed of the capacitor.
- the switch is a virtual switch or a current device or a non-switching element.
- the present invention also includes a multi-bit convolutional analog operation method based on time-variable current integration and charge sharing, including:
- the DAC converts the digital input x_i into an analog current Ix_i according to the given number of bits and transmits it in the circuit;
- when the current Ix_i arrives at the switch, a logic operation is performed in the integral control module.
- the inputs of the logic operation are the k-th bit w_ji,k of the weight w_ji and the PWM signal modulated according to the bit weight.
- in the k-direction convolution operation units, the duration of the PWM signal doubles from each bit to the next higher bit.
- the duration of the PWM signal for the k-th bit is 2^(k-1)*τ, where τ is the clock period of the PWM signal, and the output of this logic operation controls the closing of the switch;
- when the switch is closed, the current Ix_i flows through the node a_ji,k, which is connected to the upper plate of the capacitor, and is integrated in the capacitor; after the integration period the voltage across the capacitor is obtained. When the switch is open, the current does not pass through the node a_ji,k, and the voltage across the capacitor after the integration period is 0. The integration time is the duration of the PWM signal, and the voltage at node a_ji,k is the multiplication result x_i*w_ji,k*2^(k-1) of the convolution operation;
- FIG. 1 is a schematic diagram of a circuit implementation of the multiplication stage of a convolution operation in an embodiment of the present invention;
- FIG. 2 is a schematic diagram of an integral control module in an embodiment of the present invention;
- FIG. 3 is a schematic diagram of the output realization of the addition stage of the convolution operation in an embodiment of the present invention (the ADC is not shown in the figure; it can be added before each output y_j when y_j must be converted into a digital output);
- FIG. 5 is a schematic diagram of the multiplication stage with a bias operation unit added to the convolution operation according to an embodiment of the present invention;
- FIG. 6 is a schematic diagram of the output after biasing according to an embodiment of the present invention.
- the result of the convolution operation is the feature extraction of a layer of neural network;
- when w_ji is expressed as a multi-bit binary number, w_ji,k is the value of the k-th bit of w_ji; the convolution Σ x_i*w_ji of two multi-bit binary numbers is calculated in two stages:
- Multiplication stage: the input x_i is multiplied by each bit of the weight w_ji and then by the bit weight 2^(k-1), that is, x_i*w_ji,k*2^(k-1), where w_ji,k is 0 or 1.
- Addition stage: the results of the multiplication stage are accumulated and summed to give the output y_j.
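The two stages can be checked in plain integer arithmetic. The following is a minimal sketch, not the circuit itself; the function names `bit_serial_product` and `convolve_window` are hypothetical and only illustrate the decomposition of a multi-bit weight into its bits.

```python
def bit_serial_product(x: int, w: int, num_bits: int) -> int:
    """Multiply x by w one weight bit at a time.

    Bits are indexed k = 1..num_bits from least to most significant,
    so bit k carries the bit weight 2**(k-1).
    """
    total = 0
    for k in range(1, num_bits + 1):
        w_k = (w >> (k - 1)) & 1          # value of the k-th weight bit, 0 or 1
        total += x * w_k * 2 ** (k - 1)   # multiplication stage for bit k
    return total                           # accumulated over all bits


def convolve_window(xs, ws, num_bits: int) -> int:
    """Addition stage over one window: sum of x_i * w_ji over i."""
    return sum(bit_serial_product(x, w, num_bits) for x, w in zip(xs, ws))
```

As long as every weight fits in `num_bits` bits, the result equals the ordinary dot product of the inputs and weights.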
- the weight matrix formed by the weight w ji of the multiplication stage is shared, that is, j changes from 1 to n-m+1
- the present invention needs to handle the change of bit weight when the multiplicand is multiplied by each bit of the multiplier in the multiplication stage, and the accumulation of the multiplication results in the addition stage.
- the embodiment of the present invention proposes an arithmetic module 10 for realizing the above-mentioned multi-bit convolution operation based on the time adjustment of current integration and charge accumulation.
- the module 10 includes: at least one digital input x_i; at least one digital-to-analog converter 101 (DAC) that converts the digital input into a current Ix_i for transmission in the circuit; at least one weight w_ji, where w_ji,k is the value of the k-th bit when the weight is expressed as a binary number; and a convolution operation array composed of multiple convolution operation units 102. The size of the convolution operation array is i*j*k. Each convolution operation unit 102 (i,j,k) includes a current Ix_i, a switch 1021, an integral control module 103, a node a_ji,k, and a capacitor 1022 with a value of C_u, one end of the capacitor 1022 being grounded. The capacitor
- the multiplication stage performs an AND operation with the PWM signal 1031 to realize bit-weighted multi-bit multiplication.
- this embodiment realizes the operation unit for in-memory or near-memory convolution on the basis of matrix units, which not only reduces the power of memory-access-related processes but also makes the physical realization of the matrix more compact.
- the digital-to-analog converter 101 converts the digital input x_i into an analog current Ix_i according to a given number of bits; the resolution of the DAC is the same as the number of bits of the digital input x_i.
- the current Ix_i is mirrored or copied by the current mirror to the j*k convolution operation units 102 corresponding to the same i.
- the current integration of the convolution operation units 102 in the j direction can be performed at the same time.
- the number of bits of the weight w ji in the k direction increases, and the corresponding convolution operation unit 102 is arranged from low to high for each bit w ji,k.
- the current Ix i that needs to be converted by the DAC can be scaled in the DAC first and then transmitted in the circuit to control the current value not to exceed a certain threshold and reduce the transmission power loss.
- the switch 1021 may be a virtual switch or a non-switching element such as a current device or a virtual load.
- the integral control module 103 controls the on-off and on-off time of the switch 1021.
- the logic operation included in the module may be an AND gate 1033.
- the module includes a Static Random-Access Memory (SRAM) cell 1032; across the entire convolution operation array the SRAM cells may be the same or different. A 6T SRAM cell stores one binary bit w_ji,k of w_ji, the k direction running from the low bit to the high bit of the weight w_ji; the inputs of the AND gate 1033 are w_ji,k and the PWM signal 1031 modulated according to the bit weight, and the output signal of the AND gate 1033 turns the switch 1021 on and off, thereby realizing the binary multiplication stage.
- one input of the AND gate 1033, the PWM signal 1031, varies with the bit of the weight w_ji to which the unit corresponds, and the duration of the PWM signal 1031 of the i*j units corresponding to adjacent bits is within
- the duration of the PWM signal 1031 is 2^(k-1)*τ, where τ is the clock period of the PWM signal 1031.
- the duration of the PWM signal 1031 refers to the duration of the high level; when the bit w_ji,k is 1 and the PWM signal 1031 is high, the AND gate 1033 outputs 1. The switch 1021 is then closed, the current Ix_i flows through the switch 1021 into the capacitor 1022 and is integrated, and the capacitor 1022 begins to store charge. When the high-level duration of the PWM signal 1031 has elapsed, the signal goes low, the switch 1021 opens, the current Ix_i no longer passes, and integration in the capacitor 1022 stops.
- the logic operation of the integral control module 103 can also be an OR gate.
- the duration of the PWM signal 1031 is the duration of the low level, and the PWM signal 1031 and w_ji,k are combined by the OR operation.
- a counter or clock divider is used to generate the PWM signal 1031 from the maximum-speed clock, that is, to make τ as small as possible, which speeds up the integration in the capacitor 1022 and hence each step of the multiplication operation.
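One way to picture counter-based PWM generation is a tick-level simulation. This is a sketch under assumptions: the counter range, the comparison rule, and the names `pwm_from_counter` and `switch_control` are illustrative choices, not details given in the text. It models the active-high AND-gate variant only.

```python
def pwm_from_counter(k: int, num_bits: int):
    """Return the PWM level at each clock tick for bit position k (1-based).

    A free-running counter counts 0 .. 2**(num_bits-1) - 1. The PWM output
    for bit k is high while the count is below 2**(k-1), so its high time
    is 2**(k-1) clock periods.
    """
    if not 1 <= k <= num_bits:
        raise ValueError("k must be in 1..num_bits")
    frame = 2 ** (num_bits - 1)          # longest pulse, in clock ticks
    return [1 if count < 2 ** (k - 1) else 0 for count in range(frame)]


def switch_control(w_bit: int, pwm_level: int) -> int:
    """AND-gate control: the switch closes only when both inputs are 1."""
    return w_bit & pwm_level
```

Summing `switch_control(w_bit, level)` over the ticks of one frame gives the number of clock periods for which the switch is closed, namely `w_bit * 2**(k-1)`.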
- the PWM signal 1031 is used to control the time because doing so improves the flexibility of the system.
- when the switch 1021 is in the closed state, the current Ix_i reaches the node a_ji,k through the switch 1021; the node a_ji,k is connected to the upper plate of the capacitor 1022, so the current Ix_i then enters the capacitor 1022.
- in the convolution operation, the capacitor 1022 needs to be reset to a given DC voltage before the current Ix_i flows in, so that the previous operation result is cleared.
- one end of the capacitor 1022 is grounded, so the voltage across the capacitor 1022 is the voltage at the node a_ji,k.
- the amount of charge stored in the capacitor 1022 increases as the integration time elapses. That is, while the switch 1021 is closed, the current is continuously integrated and the voltage across the capacitor 1022 gradually increases.
- the integration time is the time for which the switch 1021 is closed.
- for the convolution operation units corresponding to the bits w_ji,k with k = 1, 2, 3, ..., the durations of the PWM signal 1031 are τ, 2τ, 4τ, ...; the duration of the PWM signal 1031 for the k-th bit is 2^(k-1)*τ, and the duration of the PWM signal 1031 for the highest bit is 2^(B-1)*τ.
- the voltage at the node a_ji,k in each convolution operation unit 102 is the voltage across the capacitor 1022, and this voltage is defined as the multiplication result x_i*w_ji,k*2^(k-1).
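The relation between integration time and node voltage follows from V = Q/C with Q = I·t. The sketch below assumes an ideal constant current source, an ideal capacitor, and a reset voltage of 0 V by default; the function name `node_voltage` and its parameters are hypothetical.

```python
def node_voltage(i_x: float, w_bit: int, k: int, tau: float, c_u: float,
                 v_reset: float = 0.0) -> float:
    """Ideal voltage at the unit's node after the multiplication stage.

    A constant current i_x (amperes) charges capacitor c_u (farads) for
    w_bit * 2**(k-1) * tau seconds, starting from v_reset (volts).
    """
    t_on = w_bit * 2 ** (k - 1) * tau    # time the switch is closed
    return v_reset + i_x * t_on / c_u    # V = Q / C with Q = I * t
```

With the current proportional to x_i, the voltage is proportional to x_i * w_ji,k * 2^(k-1); a weight bit of 0 leaves the capacitor at its reset voltage.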
- the addition stage obtains the convolution output through charge sharing.
- the k operation units corresponding to x_1 together complete one multiplication x_1*w_11. Broken down, the operation x_1*w_11 multiplies the input x_1 by each bit w_11,k of the weight w_11 and by the bit weight 2^(k-1) of that bit, that is, x_1*w_11,k*2^(k-1), and then adds the separate results.
- the voltage at the node a_ji,k of each convolution operation unit 102 of the i*1*k array is the result of the multiplication.
- at this time, all the capacitors in the corresponding array are connected in parallel.
- the capacitors 1022 in the short-circuited array share charge; the amount of charge stored in each capacitor 1022 becomes the same, while the total charge remains unchanged.
- the voltage of the combined node is the cumulative sum of the voltages of the multiplication-result nodes a_ji,k from the multiplication stage, which is the output y_1.
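Charge sharing can be modeled by conservation of charge. This is a minimal sketch assuming ideal capacitors and lossless switches; the function name `share_charge` is hypothetical.

```python
def share_charge(voltages, capacitances):
    """Voltage of the combined node after ideal charge sharing.

    Total charge is conserved: V = sum(C_n * V_n) / sum(C_n).
    """
    if len(voltages) != len(capacitances):
        raise ValueError("need one capacitance per voltage")
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / sum(capacitances)
```

With n equal capacitors, the result is the mean of the node voltages, that is, their sum scaled by 1/n, so it stays proportional to the accumulated sum described above.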
- the convolution kernels corresponding to different windows are the same, that is, when the convolution results of different windows are calculated, the weight formed by the multiplicand (weight w ji)
- the other corresponding output y j can be obtained by short-circuiting the corresponding array of other j, as shown in the following equation 1:
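The equation itself is not reproduced in this text. A form consistent with the definitions above (an assumption, written with B as the number of weight bits) would be:

```latex
y_j = \sum_{i} x_i \cdot w_{ji}
    = \sum_{i} \sum_{k=1}^{B} x_i \cdot w_{ji,k} \cdot 2^{k-1}
```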
- the output y j is an analog signal.
- the output y j is a digital signal
- an analog-to-digital converter Analog-to-Digital Converter, ADC
- the output y j obtained is a digital signal.
- the convolution operation module is applied to a convolutional neural network, and the digital output y j can be used as a digital input into the convolution operation array to perform the convolution operation of the neural network of the second layer.
- increasing the number of capacitors needed by each group of convolution operation units 102 requires more physical area, which is not conducive to miniaturization of components. Therefore, when the combined node is connected, an additional attenuation capacitor 105 with a value of C_att is connected to the combined node at the same time to adjust the full-scale range of the accumulated voltage, so that the accumulated voltage is scaled into a range that meets the input range of the analog-to-digital converter.
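The scaling effect of the attenuation capacitor can be estimated with the same charge-conservation argument. The sketch assumes n equal unit capacitors and an attenuation capacitor that starts uncharged at 0 V; the function name `attenuated_voltage` is hypothetical.

```python
def attenuated_voltage(voltages, c_u: float, c_att: float) -> float:
    """Combined-node voltage when an uncharged capacitor c_att joins the share.

    The n unit capacitors c_u hold the given voltages; c_att starts at 0 V,
    so it adds capacitance but no charge.
    """
    n = len(voltages)
    total_charge = c_u * sum(voltages)
    return total_charge / (n * c_u + c_att)
```

Under these assumptions a larger C_att lowers the combined voltage by the factor n·C_u / (n·C_u + C_att), which is how the full-scale range can be brought inside a converter's input range.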
- the convolution operation module meets the needs of unit reuse.
- the number of digits of the weight w ji is generally fixed, that is, the size of k is fixed.
- the units for the higher bits do not participate in the operation.
- if the convolution operation units 102 corresponding to the higher bits remain connected to the circuit, the power consumption of the circuit increases.
- a group of cells associated with the k-th bit of the weight is reused for input x_i or input x_ii; the corresponding current is Ix_i or Ix_ii, respectively, and the voltage signal corresponding to the current is Vgx_i or Vgx_ii.
- the multiplexer control signal, determined by the bit k, selects the voltage signal for the unused cells according to the remaining unused bits to which the cells correspond; that is, the selected voltage V'gx_i equals either Vgx_i or Vgx_ii, and the cell current I'x_i corresponding to bit k then equals Ix_i or Ix_ii, respectively.
- the current can be controlled through the diode in the current mirror by the voltage V'gx_i. The DAC can be reconfigured for a given number of input bits, and an ADC may be used to quantize the output y_j. The resolution of the DAC or ADC can be matched with the number of bits of the corresponding output.
- the duration of the array PWM signal 1031 ranges from τ to 2^(B-1)*τ.
- the duration of the PWM signal 1031 is τ, and all weights are quantized as a single bit, instead of quantizing each bit of the 8-bit weight w_ji as in the previous case.
- FIGS. 5 and 6 show an embodiment in which a bias operation unit 1061 is added when the convolution operation unit 102 of the present invention is used for convolutional neural network operation.
- adding a bias b to a convolution operation makes the convolution more efficient and accurate; typically, a binary bias b_j is added for a given output y_j. The corresponding convolution output y_j then changes from Equation 1 to Equation 2 below.
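Neither equation is reproduced in this text. Forms consistent with the surrounding definitions (an assumption, with b_{j,k} denoting the k-th bit of b_j and B the number of bits) would be:

```latex
\text{Equation 1:}\quad y_j = \sum_{i} x_i \cdot w_{ji}
\qquad
\text{Equation 2:}\quad y_j = \sum_{i} x_i \cdot w_{ji} + b_j,
\quad b_j = \sum_{k=1}^{B} b_{j,k} \cdot 2^{k-1}
```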
- FIG. 5 illustrates how to add this extra function in the multiplication stage. Since the quantization of the bias bits is implemented in the same way as for the weights in FIG. 1 or FIG. 2, the bias is realized as a fixed current I_b supplied as an additional input alongside the given currents Ix_i.
- each bias operation unit 1061 (j,k) includes a current I_b, a switch 1021, a bias operation unit integral control module 1062, a node a_j,k, and a capacitor 1022 with a value of C_u;
- I_b is integrated in the capacitor 1022 as in the convolution operation stage, with the weight w_ji replaced by b_j: the inputs of the bias AND gate in the integral control module 1062 of the bias operation unit are b_j,k and the PWM signal 1031 modulated according to the bit weight of b_j,k, and the output of the bias AND gate controls the closing time of the switch.
- the PWM signal 1031 is the same as the PWM signal 1031 at the weight bit w_ji,k in the convolution operation unit 102.
- the duration of the PWM signal 1031 refers to the duration of the high level; when the bit b_j,k is 1 and the PWM signal 1031 is high, the bias AND gate outputs 1. The switch 1021 is then closed, the current I_b flows through the switch into the capacitor 1022 and is integrated, and the capacitor stores charge. When the high-level duration of the PWM signal 1031 has elapsed, the signal goes low, the switch 1021 opens, the current I_b no longer passes, and integration in the capacitor 1022 stops.
- after the switch 1021 is turned off, no new charge accumulates in the capacitor 1022, and the stored charge is that accumulated during the high-level state; when b_j,k is 0, the bias AND gate outputs 0, the switch 1021 is off, the current I_b does not pass, no current is integrated in the capacitor 1022, and the stored charge is 0. Likewise, the voltage across the capacitor 1022 is the result computed by the bias operation unit 1061 in the multiplication stage.
- FIG. 6 illustrates that during the accumulation phase, an additional capacitor 1022 needs to be added for charge sharing and node accumulation.
- the k unit nodes a_j,k corresponding to a given j are short-circuited. Owing to the discharge characteristics of the capacitor 1022, the capacitors 1022 in the short-circuited array share charge; the amount of charge stored in each capacitor 1022 becomes the same while the total charge is unchanged, and the voltage of the combined node is the cumulative sum of the voltages of the multiplication-result nodes a_j,k from the multiplication stage. That is, the bias b_j of y_j is the cumulative voltage sum of all nodes a_j,k of the 1*k group of units. As shown in FIG. 6, the operation units for convolution and for bias are physically independent, but when the biased convolution result is finally output, the corresponding nodes of the convolution operation units 102 and the bias operation units 1061 can be connected, and the voltage of the combined node is the result of the convolution operation with the bias added.
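The combined effect of the convolution and bias units can be sketched end to end. The model below is idealized and rests on assumptions not stated in the text: each input drives a current x * i_lsb, the bias current I_b equals i_lsb, all capacitors equal c_u and start at 0 V, and charge sharing is lossless. The function name `conv_with_bias_voltage` and its parameters are hypothetical.

```python
def conv_with_bias_voltage(xs, ws, b, num_bits, i_lsb, tau, c_u):
    """Ideal combined-node voltage for one output, bias included.

    xs, ws: integer inputs and weights for one window; b: integer bias.
    Assumes input x drives a current x * i_lsb and the bias unit uses
    a fixed current equal to i_lsb.
    """
    voltages = []
    for x, w in zip(xs, ws):
        for k in range(1, num_bits + 1):
            w_k = (w >> (k - 1)) & 1     # k-th weight bit
            voltages.append(x * i_lsb * w_k * 2 ** (k - 1) * tau / c_u)
    for k in range(1, num_bits + 1):
        b_k = (b >> (k - 1)) & 1         # k-th bias bit
        voltages.append(i_lsb * b_k * 2 ** (k - 1) * tau / c_u)
    # Equal capacitors: charge sharing yields the mean of the node voltages.
    return sum(voltages) / len(voltages)
```

In this model the combined voltage equals (Σ x_i·w_ji + b_j) multiplied by i_lsb·tau/c_u and divided by the number of connected capacitors, so it is proportional to the biased convolution result.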
- the present invention also includes a multi-bit convolutional analog operation method based on time-variable current integration and charge sharing, including:
- the digital-to-analog converter 101 converts the digital input x_i into an analog current Ix_i according to the given number of bits and transmits it in the circuit.
- the logic operation is performed in the integral control module 103.
- the inputs of the logic operation are the k-th bit w_ji,k of the weight w_ji and the PWM signal 1031 modulated according to the bit weight.
- the duration of the PWM signal 1031 in the k-direction convolution operation units doubles from each bit to the next higher bit.
- the duration of the PWM signal 1031 for the k-th bit is 2^(k-1)*τ, where τ is the clock period of the PWM signal.
- the output of the logic operation controls the closing of the switch 1021.
- when the switch is closed, the current Ix_i enters the capacitor 1022 through the node a_ji,k connected to the upper plate of the capacitor and is integrated; after the integration period the voltage across the capacitor is obtained. When the switch is open, the current does not pass through the node a_ji,k, and the voltage across the capacitor 1022 after the integration period is 0. The integration time is the duration of the PWM signal 1031, and the voltage at the node a_ji,k is the multiplication result x_i*w_ji,k*2^(k-1) of the convolution operation.
- the modules included are divided only according to functional logic; the division is not limited to the one above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Analogue/Digital Conversion (AREA)
- Complex Calculations (AREA)
Abstract
Description
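Module | 10 |
Digital-to-analog converter | 101 |
Convolution operation unit | 102 |
Integral control module | 103 |
PWM signal | 1031 |
Static random-access memory | 1032 |
AND gate | 1033 |
Switch | 1021 |
Capacitor | 1022 |
Multiplexer | 104 |
Attenuation capacitor | 105 |
Bias unit array | 106 |
Bias operation unit | 1061 |
Bias integral control module | 1062 |
Digital input | x_i |
Current | Ix_i |
Weight | w_ji |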
Claims (16)
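- A multi-bit convolution operation module based on time-variable current integration and charge sharing, characterized by comprising: at least one digital input x_i, at least one digital-to-analog converter (DAC), at least one binary weight w_ji, a convolution operation array composed of a plurality of convolution operation units, and at least one output y_j; the digital input x_i is converted by the DAC, according to a given number of bits, into an analog current Ix_i that is transmitted in the circuit; for the binary weight w_ji, j indicates that the weight belongs to the j-th window, w_ji,k is the value of the k-th bit of the weight w_ji, w_ji,k is 0 or 1, and k ∈ [1,B], where B is the highest binary bit; each bit w_ji,k corresponds to one convolution operation unit; the size of the convolution operation array is i*j*k, where the i direction is the input direction, the j direction is the convolution window direction, and in the k direction the convolution operation units are arranged in order from the low bit to the high bit of the weight w_ji; each convolution operation unit comprises an input current Ix_i, a switch, an integral control module, a node a_ji,k, and at least one capacitor, one end of the capacitor being grounded; the integral control module is a given logic operation whose inputs are w_ji,k and a PWM signal modulated according to the bit weight of w_ji,k; within the k-direction convolution operation units the PWM signal duration doubles from each bit to the next higher bit, the PWM signal duration of the k-th bit being 2^(k-1)*τ, where τ is the clock period of the PWM signal; the output of the integral control module controls the closing of the switch; when the switch is closed, the current Ix_i enters the capacitor through the node a_ji,k connected to the upper plate of the capacitor and is integrated; when the switch is open, the current Ix_i does not pass through the node a_ji,k; the integration time is the duration of the PWM signal, and the voltage at node a_ji,k is the multiplication result x_i*w_ji,k*2^(k-1) of the convolution operation; y_j is the voltage of the combined node obtained by short-circuiting the nodes a_ji,k of all convolution operation units in one i*k plane so that the capacitors of the convolution operation units share charge, and this voltage is the result of the convolution operation.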
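- The module according to claim 1, characterized in that the combined voltage of the 1*k convolution operation units corresponding to x_i is the result of one x_i*w_ji, and the voltage of the combined node of the convolution operation units of one i*k plane is the result of Σ x_i·w_ji, completing one convolution of the convolution kernel with the input matrix.
- The module according to claim 2, characterized in that the input x_i is a binary number of at least one bit, and the resolution of the DAC that converts the input x_i is adjustable.
- The module according to claim 3, characterized in that the current Ix_i is mirrored or copied by a current mirror into the convolution operation array, the current in the same j*k plane is the same, and the current Ix_i can be scaled in the digital-to-analog converter.
- The module according to claim 4, characterized in that the logic operation of the integral control module is an AND gate; one input of the AND gate is the bit w_ji,k stored in an SRAM cell, and the other is a PWM signal whose duration, with τ as the base, doubles with each increment of k; the output of the AND gate controls the closing of the switch; for convolution operation units corresponding to the same bit k of different weights w_ji, the PWM signal duration is the same; for convolution operation units corresponding to different bits of the same weight w_ji, the PWM signal durations differ, each being 2^(k-1)*τ.
- The module according to claim 5, characterized in that a counter or clock divider is used to generate the PWM clock signal at the fastest speed, speeding up the integration in the capacitor.
- The module according to any one of claims 1 to 6, characterized in that the switch in the convolution operation unit is a virtual switch or a current device, which reduces kickback or transient effects on the current mirror.
- The module according to claim 7, characterized in that the number of bits of the digital input x_i and of the weight w_ji can be reconfigured for re-entering the digital input x_i or for a new input x_ii, including: a multiplexer receives the re-entered input x_i and x_ii and, according to the convolution operation units corresponding to the remaining unused bits of the weight w_ji, selects the input voltage signal matching the unused units, and the output voltage signal enters the convolution operation units; the PWM signal duration corresponding to the bit weight in the unused convolution operation units to be reused is reconfigured.
- The module according to claim 8, characterized in that, in the reuse stage, the number of bits of at least one multiplexer is adapted to the number of bits of the weight encoding, and the output of the multiplexer is controlled by the weight bit k.
- The module according to claim 9, characterized in that the convolution operation array further includes a bias module, the bias module including: a bias unit array composed of a plurality of bias operation units, the size of the bias unit array being j*k; each bias operation unit (j,k) includes a current I_b, a switch, an integral control module, a node a_j,k, and a capacitor with a value of C_u; the current I_b is a fixed current additional to the current Ix_i; b_j,k is the k-th bit of the multi-bit binary bias b_j, and the current integration time in the capacitor of bias operation unit (j,k) is b_j,k*2^(k-1)*τ; in the integral control module, b_j,k and a PWM signal modulated according to the bit weight of b_j,k are combined by an AND gate whose output controls the closing of the switch and thus the integration time of the current I_b in the capacitor of the bias operation unit; the bias of y_j is the cumulative voltage sum of all nodes a_j,k of the 1*k group of units.
- The module according to claim 10, characterized in that, when the accumulated voltage swing of the combined node exceeds the input range of the analog-to-digital converter or exceeds a threshold, an attenuation capacitor is connected in parallel at the output y_j before the analog-to-digital converter to adjust the full-scale range of the accumulated voltage.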
- 基于时间可变的电流积分和电荷共享的多位卷积运算方法,其特征在于,包括如下步骤:DAC按照给定位数将数字输入x i转换为模拟信号的电流Ix i在电路中传输,i∈[1,N],N为正整数;电流Ix i到达开关时,进行一个逻辑运算,逻辑运算的输入是权重w ji的第k位w ji,k和依据w ji,k位权调制的PWM信号,j表示当i确定后,相应的第j窗口,k方向卷积运算单元内PWM信号持续时间依低位到高位2倍递增,第k位的PWM信号持续时间2 (k-1)*τ,τ为PWM信号的时钟周期,该逻辑运算的输出控制所述开关的闭合;开关闭合后,电流Ix i通过与电容上极板连接的节点a ji,k进入电容内积分,积分一段时间后得到电容两端的电压,开关断开后,电流不通过节点a ji,k,积分一段时间后得到电容两端的电压为0,所述积分时间是PWM信号的持续时间,节点a ji,k的电压是卷积运算的x i*w ji,k*2 (k-1)的乘法结果;
- 如权利要求12所述的运算方法,其特征在于DAC在转换数字输入x i前,对DAC的分辨率进行调整。
- 如权利要求13所述的运算方法,其特征在于,在进行所述逻辑运算之前,使用计数器或时钟分频器生成最快速度的PWM时钟信号,加快电流的积分速度。
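- The operation method according to claim 14, characterized in that, after $x_i$ has been input once, the unused convolution operation units are reused, comprising: a multiplexer receives the re-applied input $x_i$ and the input $x_{ii}$ and, according to the convolution operation units corresponding to the remaining unused bits of the weight $w_{ji}$, selects the input voltage signal that matches the unused units, and the output voltage signal enters the convolution operation units; after the input voltage signal is selected, the PWM signal durations corresponding to the bit weights in the unused convolution operation units are reconfigured.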
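- The operation method according to claim 15, characterized in that, before the output $y_j$ is connected to the ADC, an attenuation capacitor is connected in parallel to adjust the full-scale range of the accumulated voltage, so that the accumulated voltage swing at the combined node stays below the input range of the ADC.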
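The arithmetic described in the claims can be illustrated with a small behavioral model. The following is a minimal sketch, not the patented circuit: it assumes ideal switches, no parasitics, and identical unit capacitors `c_u` in every convolution operation unit, and all function names (`weight_bits`, `cell_voltage`, `window_output`, `attenuate`) are hypothetical. Under those assumptions each unit integrates to $V = I_{x_i} \cdot w_{ji,k} \cdot 2^{k-1}\tau / C_u$, and shorting the nodes of one $i \times k$ plane yields the total charge divided by the total capacitance.

```python
from typing import List, Sequence


def weight_bits(w: int, n_bits: int) -> List[int]:
    """Split a non-negative integer weight into bits, LSB first (index 0 is k = 1)."""
    return [(w >> shift) & 1 for shift in range(n_bits)]


def cell_voltage(i_x: float, w_bit: int, k: int, tau: float, c_u: float) -> float:
    """Voltage at node a_{ji,k} after integration, V = I * t / C.

    The switch is closed only while (w_bit AND PWM) is high, so the
    integration time is w_bit * 2^(k-1) * tau.
    """
    t_on = w_bit * (2 ** (k - 1)) * tau
    return i_x * t_on / c_u


def window_output(currents: Sequence[float], weights: Sequence[int],
                  n_bits: int, tau: float, c_u: float) -> float:
    """Charge-shared voltage y_j for one window (one i x k plane).

    Assumption: every unit capacitor equals c_u, so the shared voltage is
    total charge over total capacitance, i.e. the mean of the unit voltages.
    """
    voltages = [
        cell_voltage(i_x, bit, k, tau, c_u)
        for i_x, w in zip(currents, weights)
        for k, bit in enumerate(weight_bits(w, n_bits), start=1)
    ]
    total_charge = c_u * sum(voltages)
    total_capacitance = c_u * len(voltages)
    return total_charge / total_capacitance


def attenuate(v: float, c_total: float, c_att: float) -> float:
    """Voltage after an initially uncharged capacitor c_att is added in parallel."""
    return v * c_total / (c_total + c_att)


if __name__ == "__main__":
    # Two inputs with 3-bit weights 5 and 3, in normalized units.
    y = window_output([1.0, 2.0], [5, 3], n_bits=3, tau=1.0, c_u=1.0)
    print(y, attenuate(y, c_total=6.0, c_att=6.0))
```

In this model the output is proportional to $\sum_i I_{x_i} \cdot w_{ji}$, scaled by $\tau / (N \cdot B \cdot C_u)$ for $N$ inputs and $B$ weight bits, which is why the result must be rescaled or attenuated to fit the ADC input range.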
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010257151.0A CN111144558B (zh) | 2020-04-03 | 2020-04-03 | Multi-bit convolution operation module based on time-variable current integration and charge sharing |
CN202010257151.0 | 2020-04-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021197073A1 (zh) | 2021-10-07 |
Family
ID=70528805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/081322 WO2021197073A1 (zh) | 2020-04-03 | 2021-03-17 | Multi-bit convolution operation module based on time-variable current integration and charge sharing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111144558B (zh) |
WO (1) | WO2021197073A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11899518B2 (en) | 2021-12-15 | 2024-02-13 | Microsoft Technology Licensing, Llc | Analog MAC aware DNN improvement |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
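CN111144558B (zh) * | 2020-04-03 | 2020-08-18 | 深圳市九天睿芯科技有限公司 | Multi-bit convolution operation module based on time-variable current integration and charge sharing |
CN111431536B (zh) | 2020-05-18 | 2023-05-02 | 深圳市九天睿芯科技有限公司 | Sub-cell, MAC array, and bit-width reconfigurable mixed-signal in-memory computing module |
CN112232501B (zh) * | 2020-12-11 | 2021-09-28 | 中科南京智能技术研究院 | In-memory computing device |
CN113516172B (zh) * | 2021-05-19 | 2023-05-12 | 电子科技大学 | Image classification method based on error injection in stochastic-computing Bayesian neural networks |
CN115048075A (zh) * | 2022-04-27 | 2022-09-13 | 北京大学 | SRAM compute-in-memory chip based on capacitive coupling |
CN114723031B (zh) * | 2022-05-06 | 2023-10-20 | 苏州宽温电子科技有限公司 | Computing device |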
US20230386565A1 (en) * | 2022-05-25 | 2023-11-30 | Stmicroelectronics International N.V. | In-memory computation circuit using static random access memory (sram) array segmentation and local compute tile read based on weighted current |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5521857A (en) * | 1992-12-15 | 1996-05-28 | France Telecom | Process and device for the analog convolution of images |
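CN108629411A (zh) * | 2018-05-07 | 2018-10-09 | 济南浪潮高新科技投资发展有限公司 | Hardware implementation device and method for convolution operation |
CN108805270A (zh) * | 2018-05-08 | 2018-11-13 | 华中科技大学 | Memory-based convolutional neural network system |
CN110008440A (zh) * | 2019-04-15 | 2019-07-12 | 合肥恒烁半导体有限公司 | Convolution operation based on an analog matrix operation unit and application thereof |
TW201935266A (zh) * | 2018-02-12 | 2019-09-01 | 美商耐能股份有限公司 | Convolution operation device and method for scaling the convolution input of a convolutional neural network |
CN111144558A (zh) * | 2020-04-03 | 2020-05-12 | 深圳市九天睿芯科技有限公司 | Multi-bit convolution operation module based on time-variable current integration and charge sharing |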
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629406B (zh) * | 2017-03-24 | 2020-12-18 | 展讯通信(上海)有限公司 | Operation device for convolutional neural networks |
GB2568102B (en) * | 2017-11-06 | 2021-04-14 | Imagination Tech Ltd | Exploiting sparsity in a neural network |
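CN108764467B (zh) * | 2018-04-04 | 2021-08-17 | 北京大学深圳研究生院 | Circuit for convolution and fully connected operations of convolutional neural networks |
CN109460817B (zh) * | 2018-09-11 | 2021-08-03 | 华中科技大学 | On-chip learning system for convolutional neural networks based on non-volatile memory |
CN109104197B (zh) * | 2018-11-12 | 2022-02-11 | 合肥工业大学 | Encoding and decoding circuit for non-restored sparse data in convolutional neural networks, and encoding and decoding method thereof |
CN109800876B (zh) * | 2019-01-18 | 2021-06-01 | 合肥恒烁半导体有限公司 | Data operation method for a neural network based on NOR Flash modules |
CN110378193B (zh) * | 2019-05-06 | 2022-09-06 | 南京邮电大学 | Cashmere and wool identification method based on a memristor neural network |
CN110543933B (zh) * | 2019-08-12 | 2022-10-21 | 北京大学 | Spiking convolutional neural network based on a flash compute-in-memory array |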
- 2020
  - 2020-04-03 CN CN202010257151.0A patent/CN111144558B/zh active Active
- 2021
  - 2021-03-17 WO PCT/CN2021/081322 patent/WO2021197073A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111144558B (zh) | 2020-08-18 |
CN111144558A (zh) | 2020-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021197073A1 (zh) | Multi-bit convolution operation module based on time-variable current integration and charge sharing | |
US20210365241A1 (en) | Multiplication and accumulation circuit based on radix-4 booth code and differential weight | |
US11948659B2 (en) | Sub-cell, mac array and bit-width reconfigurable mixed-signal in-memory computing module | |
CN111448573B (zh) | Systems and methods for mixed-signal computing | |
US11809837B2 (en) | Integer matrix multiplication based on mixed signal circuits | |
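WO2021004466A1 (zh) | Neuromorphic computing circuit based on a multi-bit parallel binary synapse array | |
CN111611529B (zh) | Multi-bit convolution operation module with current integration and charge sharing based on variable capacitance | |
CN115048075A (zh) | SRAM compute-in-memory chip based on capacitive coupling | |
WO2023123973A1 (zh) | Circuit for implementing convolution operation and method thereof | |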
Seo et al. | ARCHON: A 332.7 TOPS/W 5b variation-tolerant analog CNN processor featuring analog neuronal computation unit and analog memory | |
CN111611528B (zh) | Multi-bit convolution operation module with current integration and charge sharing based on variable current value | |
Al Maharmeh et al. | A comparative analysis of time-domain and digital-domain hardware accelerators for neural networks | |
Al Maharmeh et al. | Compute-in-time for deep neural network accelerators: Challenges and prospects | |
Lim et al. | AA-ResNet: Energy efficient all-analog ResNet accelerator | |
US20220416801A1 (en) | Computing-in-memory circuit | |
Lin et al. | A reconfigurable in-SRAM computing architecture for DCNN applications | |
CN113741857A (zh) | Multiply-accumulate operation circuit | |
CN112784971A (zh) | Neural network operation circuit based on mixed digital-analog neurons | |
Youssefi et al. | Hardware realization of mixed-signal neural networks with modular synapse-neuron arrays | |
Mirhassani et al. | Robust low-sensitivity adaline neuron based on continuous valued number system | |
US20240036525A1 (en) | Energy efficient digital to time converter (dtc) for edge computing | |
Youssefi et al. | Efficient mixed-signal synapse multipliers for multi-layer feed-forward neural networks | |
Yin et al. | A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface | |
Musah et al. | A DCSRO Based Time Domain MAC Core | |
CN116434802A (zh) | SRAM multi-level cell with sign bit and compute-in-memory chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21780953; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21780953; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2023) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21780953; Country of ref document: EP; Kind code of ref document: A1 |