WO2023160735A2 - Operation method and operation unit - Google Patents

Operation method and operation unit

Publication number
WO2023160735A2
Authority
WO
WIPO (PCT)
Prior art keywords
signal
component
input
output
mapping
Prior art date
Application number
PCT/CN2023/099468
Other languages
English (en)
Chinese (zh)
Other versions
WO2023160735A3 (fr)
Inventor
闫锋
吴天泽
岳涛
王凯
Original Assignee
南京大学
Priority date
Filing date
Publication date
Application filed by 南京大学
Priority to PCT/CN2023/099468
Publication of WO2023160735A2
Publication of WO2023160735A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the invention relates to a calculation method and a calculation unit, belonging to the field of integrated circuits.
  • the convolution acceleration chips of the traditional convolutional neural network need to use digital modules such as multipliers and accumulators to realize linear operations such as convolution and full connection.
  • the calculation method and device based on the multiplication and addition operation can realize the MAC operation in the traditional neural network based on the linear neuron circuit.
  • the neuromorphic circuit of the present invention can be realized with only about 10 transistors, whereas an 8-bit multiplier requires tens of thousands of transistors. Therefore, in terms of power consumption, area, etc., the calculation method and chip device for multiply-accumulate operations disclosed by the present invention have great advantages over the multipliers and accumulators used in traditional convolutional neural network acceleration chips to realize MAC operations.
  • the relatively mature and mainstream implementation methods of "integrated storage and computing" in the industry include Flash-based integrated storage and computing, RRAM and MRAM-based integrated storage and computing, etc.
  • a single MAC operation between the off-chip weight and the input excitation can be completed by a single floating gate device, which is also the key to the concept of "integration of storage and calculation". This method has certain limitations, however: the floating gate device completes a single MAC operation and outputs the result in the form of an analog current, which means that the current must be fed into an I-V conversion module and then into an ADC before the signal can be read out.
  • the multiplication and addition operation in the analog domain of the floating gate device array often brings problems such as low signal-to-noise ratio and IR drop; the design of a high-precision ADC is relatively complicated, and the ADC also brings constraints such as large area and high power consumption. By comparison, the ASIC chip device for multiplication and addition operations disclosed by the present invention also uses off-chip digital excitation input and off-chip weight input, but reads out the analog-domain pulse voltage output by the neurons digitally, thereby completely avoiding the above limitations.
  • the object of the present invention is to provide a method and device for computing.
  • in the field of hardware acceleration of artificial intelligence algorithms, common hardware-based multiply-accumulate solutions are as follows: CPU operations, GPU operations, FPGA-accelerated operations, and new "integrated storage and calculation" architecture operations such as RRAM and Flash.
  • the CPU has no obvious advantage in speed over other types of operations in accelerating neural network operations, which is related to the working principle of the CPU itself.
  • when GPU and FPGA operations are used to accelerate linear operations in neural networks, multiply-accumulators designed with digital logic circuits are used. Therefore, when pursuing lower power consumption and higher computing power, the chip design places certain requirements on the process: the smaller the process node, the greater the improvement in computing performance. The drawback is that the research and development cost also keeps rising.
  • the neural network accelerator using the digital-analog hybrid design is superior to the neural network accelerator designed only using digital logic circuits in terms of the key index of computing energy efficiency (computing power/power consumption).
  • therefore, new "storage-computing integration" architecture operations such as RRAM and Flash have already occupied a place in the field of AI algorithm acceleration. Taking floating-gate device storage-computing integration operations as an example of this "storage-computing integration" computing architecture:
  • the input is a digital control signal
  • the digital control signal is converted into a corresponding analog voltage control signal through the level shift module, which is applied to the port of the device, so that the device operates according to its own electrical and physical characteristics.
  • the result of the operation is output in the form of analog current, which means that the current is input into the I-V conversion module and connected to the ADC to realize the readout of the signal.
  • the multiplication and addition operation in the analog domain of the floating gate array often brings problems such as low signal-to-noise ratio and IR-Drop, and the design of high-precision ADC is relatively complicated, and the ADC will also bring constraints such as large area and large power consumption.
  • the present invention provides a technical solution for multiply-accumulate operations and a corresponding digital-analog hybrid circuit computing device, which avoids, at the level of scheme design, the limitations of the current "integration of storage and calculation" digital-analog hybrid computing architecture.
  • the input of the device can generally use the grayscale binary data of the image as the excitation input and the output waveform of the pulse width modulator with the binary signal as the input as the weight input.
  • This is also the core of the computing method provided by the present invention, and the working principle on which the corresponding digital-analog hybrid computing device is based. The following is a detailed introduction of an embodiment of the computing device corresponding to the computing method provided by the present invention.
  • an operation method, characterized in that it uses at least one of an electrical signal input component and an optical signal generation component, as well as a mapping component, an interconnection component, a calculation component, and an output component, wherein: the electrical signal input component includes at least one electrical signal A input terminal for inputting the electrical signal A;
  • the optical signal generating component is used to shoot the target field of view, or to receive the optical signal of another optical signal shooting or generating component, so as to generate the optical signal B;
  • the mapping component is connected to the electrical signal input component and/or the optical signal generating component, respectively, to receive the electrical signal A of the electrical signal input component and/or the optical signal B of the optical signal generating component, and maps the electrical signal A and/or optical signal B to generate digital signals respectively representing the values of the electrical signal A and the optical signal B.
  • the interconnection component selects part of the signals from the digital signals output by the electrical signal A and/or optical signal B through the mapping component as the excitation input part, and the rest
  • the excitation input part and at least part of the weight input part are cross-linked to form a multiply-accumulate (MAC) architecture, and the required subset is taken for the fully connected structure in the neural network algorithm of machine learning, to generate an operation corresponding to the digital signals obtained after the electrical signal A and the optical signal B pass through the mapping component; the output of the subset includes the required operation results of the excitation input part and the weight input part.
  • the calculation component includes a switch unit, an integration unit, a reset unit, an integral discharge unit, and a pulse release unit, which are connected to the interconnection component and receive the calculation results of the input part and the weight part from the interconnection component, to obtain the required target amount of results.
  • the weight value is input, and a fully-connected sub-architecture corresponding to the required operation under the multiply-accumulate architecture is generated from the set of weight values.
  • the output signal of the mapping component of the excitation input signal of the full connection operation is correspondingly subjected to a logical AND operation,
  • the matrix of P and the sub-matrix of size ⁇ P ⁇ P in the matrix of ⁇ Q ⁇ Q are multiplied according to the one-to-one correspondence of all elements in the matrix
  • the method of the embodiment of the present invention is used to perform the multiplication and addition operation of Q input signals under the condition of Q n-bit weight units, wherein: 1) the electrical signal input components and/or the optical signal capturing components number Q in total and are used to input the expected Q excitation input signals; 2) the weight units number Q in total and are used to input the expected Q weight input signals, the value of each weight being represented by n bits; 3) an AND operation is performed on the Q binary excitation signals and the Q n-bit weight signals respectively, yielding the values of the Q binary excitation signals modulated by the Q pulse width modulation waveforms, and these values are input into the calculation component (linear neuron counting circuit) to obtain the result of multiplying and accumulating the Q input signals by the Q n-bit weight signals.
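As a minimal sketch of item 3), the Python below models each n-bit weight as a pulse-width-modulated frame and each excitation as a single bit, ANDs them cycle by cycle, and counts the high cycles. The frame length of 2**n cycles and the names `pwm_frame` and `mac_by_and` are illustrative assumptions, not taken from the patent text.

```python
def pwm_frame(weight: int, n_bits: int) -> list[int]:
    """Assumed PWM mapping: `weight` consecutive high cycles in a 2**n_bits frame."""
    if not 0 <= weight < 2 ** n_bits:
        raise ValueError("weight does not fit in n_bits")
    return [1 if t < weight else 0 for t in range(2 ** n_bits)]


def mac_by_and(excitations: list[int], weights: list[int], n_bits: int) -> int:
    """AND each 1-bit excitation with its PWM weight and count the high cycles."""
    total_high_cycles = 0
    for x, w in zip(excitations, weights):
        total_high_cycles += sum(x & level for level in pwm_frame(w, n_bits))
    return total_high_cycles


excitations = [1, 0, 1, 1]   # Q = 4 binary excitation signals
weights = [3, 7, 0, 5]       # Q = 4 weights, n = 3 bits each
result = mac_by_and(excitations, weights, n_bits=3)
print(result)  # 8, i.e. 1*3 + 0*7 + 1*0 + 1*5
```

In this model the total number of high cycles equals the multiply-accumulate result, which is the quantity the linear neuron counting circuit is described as turning into a pulse count.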
  • the mapping method of the mapping component is a binary format, a pulse width modulation format, a format that maintains the output of the original input component and/or the camera component, an analog-to-digital conversion format, a random number format, or the like.
  • the electrical signal A or the optical signal B and the weight part are mapped in binary form, and the operation is a bitwise AND operation of binary numbers.
  • the result of the target quantity obtained by the calculation component is: a voltage pulse signal with a certain frequency and a duty ratio; The counted number of pulses.
  • the electrical signal A or the optical signal B is mapped in binary form, and a shift operation is performed on the mapped binary signal before and after the logical AND operation with the weight part.
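One possible reading of the shift operation, sketched in Python under stated assumptions: a multi-bit excitation is split into bit planes by right shifts, each bit is ANDed with the weight's pulse-width-modulated frame, and the resulting high-cycle counts are recombined by left shifts. The function names and the 2**n-cycle frame are hypothetical and are not specified in the source.

```python
def and_with_pwm(bit: int, weight: int, w_bits: int) -> int:
    """High-cycle count after ANDing one excitation bit with a PWM weight frame."""
    frame = [1 if t < weight else 0 for t in range(2 ** w_bits)]
    return sum(bit & level for level in frame)


def multiply_by_bit_planes(x: int, weight: int, x_bits: int, w_bits: int) -> int:
    """Rebuild x * weight from per-bit AND results recombined by shifts."""
    acc = 0
    for b in range(x_bits):
        bit = (x >> b) & 1                               # shift before the AND
        acc += and_with_pwm(bit, weight, w_bits) << b    # shift after the AND
    return acc


product = multiply_by_bit_planes(x=11, weight=6, x_bits=4, w_bits=4)
print(product)  # 66
```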
  • the high-level duration of the pulse signal is determined by the size of the signal itself, and the high-level signal is a continuous signal without low-level signal interruption in the middle.
  • the electrical signal A or the optical signal B is mapped in the form of random numbers, and the signal is mapped to a random pulse sequence that releases a high-level signal subject to a certain probability, wherein the probability is determined by the size of the signal itself.
  • the electrical signal A or the optical signal B is an analog current signal, and the signal directly skips the mapping component and the interconnection component, and is connected to the input end of the integration unit of the computing component.
  • the output result is used to characterize the electrical signal A or optical signal B output by the input component.
  • an operation method is provided, characterized in that the multiply-accumulate operation is taken as the basic operation module: the linear operations involved in the machine learning algorithm to be carried out are decomposed into an operation paradigm in units of multiply-accumulate operations; the nonlinear operations involved are expressed by the superposition of linear operations, and the linear operation units involved are further decomposed into the operation paradigm in units of multiply-accumulate operations; and the connection structure of the computing components is set according to the operation paradigm. Each multiply-accumulate operation module in the connection structure 1) adopts an excitation signal input component and 2) adopts a weight signal input component, and further adopts a mapping component, a calculation component, and an output component. The mapping component is connected to the excitation signal input component and the weight signal input component respectively and maps the excitation signal and the weight signal; a logical AND operation is performed on the mapped excitation signal and the mapped weight signal, the operation result is input to the calculation component, and the calculation component performs combined operation with multi
  • a computing unit including at least one of an electrical signal input component and an optical signal generating component, characterized in that: the electrical signal input component includes at least one electrical signal A input terminal for inputting an analog quantity expressing the electrical signal A; the optical signal generation component is used for shooting a target field of view, or receiving an optical signal from another optical signal component, and generating an analog quantity of the optical signal B, where the optical signal B includes image data with a certain bit width, a voltage pulse sequence in the time domain, or an analog current of a certain magnitude; the computing unit also includes a mapping component, which is connected to the electrical signal input component and/or the optical signal generation component and maps the electrical signal A and/or optical signal B to generate digital signals respectively representing the values of the electrical signal A and optical signal B; the interconnection component adopts a plurality of electrical and/or optical paths and a cross-linked structure, which corresponds to the calculation performed on the digital signals of the electrical signal A and/or optical signal B after the mapping component; the calculation component is connected to
  • the computing unit of the embodiment of the present invention is used to perform a multiply-accumulate (MAC) operation, and it is characterized in that: the interconnection component adopts the fully connected structure in the algorithm on the machine learning neural network to combine the data from the mapping component
  • the number of output signals, N is the number of interconnection components connected to the calculation components in the computing device; 2) the mapping component includes N × M output terminals; the M elements in each row of the N × M output matrix are combined in a "point-to-point" manner with the output signals of the M mapping components that are set as excitation input signals of the fully connected operation in the machine learning algorithm, and logical AND operations are performed; and 3)
  • the output terminal of the interconnection component is connected to the input terminal of a corresponding row of the calculation components, so that the output signal of the interconnection component is
  • the computing unit of the embodiment of the present invention is used to perform multiplication and addition operations of M input signals under the condition of M n-bit weight units, and is characterized in that: 1) the electrical signal input components and the optical signal capture components number M in total, and the M input terminals are respectively used to input the desired M excitation input signals; 2) there are M weight input terminals, whose inputs respectively represent n-bit weight units; and the M binary excitation signals are logically ANDed with the M n-bit weight units respectively, yielding the values of the M binary excitation signals modulated by the M pulse width modulation waveforms, and these values are input into
  • the multiplication and addition operation results obtained by multiplying and accumulating M excitation input signals by M n-bit weight units are obtained.
  • the type of the mapping component is one of binary type, pulse width modulation type, analog-to-digital conversion type, and random number type.
  • the electrical signal input component can be a general-purpose input/output interface of an SoC, ASIC, or FPGA chip; a storage unit such as a register file, SRAM, DRAM, or RRAM; or another component that accepts an analog current signal or an analog voltage signal as input.
  • the optical signal generating component is an image sensor chip, such as a CIS chip, a CCD chip, or a single-photon avalanche diode array, which senses the optical signal and converts it into a binary image signal with a certain bit width, a time-domain pulse sequence, or an analog current signal.
  • the mapping component is used to map the output signal of the input component as described in the embodiment of the present invention.
  • the mapping component is equivalent to the input component, that is, no change is made to the input component at this time.
  • the mapping component is replaced by the input component; when the output signal of the input component is an analog voltage signal, the mapping component can be an analog-to-digital conversion device connected to the subsequent interconnection component, or the mapping component can be the input component itself, in which case the output signal of the mapping component is the output signal of the input component and is used directly as the input signal of the computing component of the computing device; when the output signal of the input component is an analog current signal, the analog current signal skips the mapping component and the interconnection component and is connected to the integration unit of the computing component in the computing unit according to the embodiment of the present invention, and the output of the computing component represents the magnitude of the input analog current signal.
  • the mapping component can be a pulse width modulation waveform generating device, which outputs a number of high-level cycles corresponding to the value of the binary signal output by the input component; the high level is continuous, with no low-level signal interrupting it; the duration of the output signal of the device is determined by the bit width of the binary signal output by the input component.
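The pulse width modulation mapping can be illustrated with a short Python sketch. It assumes, beyond what the text states, that an n-bit value occupies a frame of 2**n clock cycles and that the high level starts at the beginning of the frame; `pwm_waveform` is a hypothetical name.

```python
def pwm_waveform(value: int, n_bits: int) -> str:
    """Return one PWM frame as a string of '1' (high) and '0' (low) cycles."""
    period = 2 ** n_bits            # assumed frame length for an n-bit input
    if not 0 <= value < period:
        raise ValueError("value does not fit in n_bits")
    # One continuous high run of `value` cycles, then low for the rest.
    return "1" * value + "0" * (period - value)


wave = pwm_waveform(5, n_bits=4)
print(wave)  # 1111100000000000
```

The number of high cycles tracks the input value, while the frame length depends only on the bit width, matching the two properties described for the waveform generating device.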
  • the mapping component is a random pulse sequence generating device
  • the random pulse sequence generating device includes a linear feedback shift register and a comparator. When the output signal of the input component is a binary signal, the binary signal is used as the initial value of the linear feedback shift register. In each clock cycle, the output value of the linear feedback shift register and its initial value are input to the comparator: if the initial value is greater than the current output value of the linear feedback shift register, the output of the comparator is 1; otherwise the output of the comparator is 0. The comparator thus outputs a random pulse sequence in which the occurrence probability of a high-level signal is determined by the binary signal output by the input component, and the length of the pulse sequence is determined by the bit width of that binary signal.
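A minimal Python sketch of the shift-register-plus-comparator scheme follows. The 4-bit width, the feedback polynomial x^4 + x^3 + 1, and the sequence length of 2**4 - 1 cycles are assumptions chosen for illustration; the source specifies none of them.

```python
def lfsr4_step(state: int) -> int:
    """One step of a 4-bit Fibonacci LFSR (polynomial x^4 + x^3 + 1, assumed)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def random_pulse_sequence(value: int) -> list[int]:
    """Seed the LFSR with `value`; emit 1 whenever value > current LFSR output."""
    state = value
    pulses = []
    for _ in range(15):                 # one full period of the 4-bit LFSR
        pulses.append(1 if value > state else 0)
        state = lfsr4_step(state)
    return pulses


seq = random_pulse_sequence(9)
print(sum(seq))  # 8
```

Over one full period this LFSR visits every nonzero 4-bit state once, so a seed of 9 yields 8 high cycles out of 15 and the share of high cycles grows linearly with the input value. A seed of 0 keeps the register stuck at 0 and produces no pulses, which a real design would need to handle separately.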
  • the switch unit of the calculation component is set to accept the output signal of the interconnection component, and output a signal determined by the switch state according to the output signal of the interconnection component;
  • the reset unit is responsible for the reset function of the calculation component, that is, when the reset signal is active, the output signal of the calculation component is zero; the integration unit is used to integrate the output signal of the switch unit.
  • the pulse release unit starts to release the pulse, and at the same time, the discharge unit discharges the integration signal.
  • the computing component exhibits a linear relationship between the output signal and the input signal; when the input signal is a binary signal, a binary "1" sets the switch to one state (on or off) and a binary "0" sets it to the opposite state (off or on); the number of switches turned on is linearly related to the output signal, where the output signal is a pulse signal with a certain frequency and duty cycle.
  • the calculation component is characterized in that the output signal has a linear relationship with the input signal; when the input signal is an analog current signal, it does not need to pass through the switch unit but enters the integration unit directly, and the pulse frequency of the output signal has a linear relationship with the magnitude of the input analog current signal.
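The linear behaviour of the calculation component can be sketched as a discrete-time integrate-and-fire model in Python. The unit charge per closed switch, the firing threshold, and the assumption that each discharge removes exactly one threshold's worth of charge are illustrative choices, not values from the source.

```python
def integrate_and_fire(closed_switches_per_cycle, unit_charge=1, threshold=4):
    """Count output pulses for a sequence of per-cycle closed-switch counts."""
    charge = 0
    pulses = 0
    for n_closed in closed_switches_per_cycle:
        charge += n_closed * unit_charge     # integration unit accumulates charge
        while charge >= threshold:           # pulse release unit fires
            pulses += 1
            charge -= threshold              # discharge unit removes the fired charge
    return pulses


low = integrate_and_fire([3] * 8)    # 3 switches closed for 8 cycles
high = integrate_and_fire([9] * 8)   # 9 switches closed for 8 cycles
print(low, high)  # 6 18
```

Tripling the number of closed switches triples the pulse count, which is the linear input-output relationship the text attributes to the component, up to the quantization set by the threshold.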
  • the output component is a digital counter whose input signal is a clock signal and/or a reset signal, or a pulse releasing unit of the calculation component.
  • any operation unit in which the number of high levels (or low levels) of the input signal of the calculation component has a linear relationship with the number of pulses counted by the counting unit of the output component is also applicable as the arithmetic unit of the embodiment of the present invention, even if its calculation component does not include all the units contained in the calculation component described in the embodiment of the present invention.
  • step 1, mapping signal X and signal Y to obtain the mapped forms of signal X and signal Y; step 2, performing an operation on the mapped forms of signal X and signal Y to obtain the signal Q, wherein the signal Q is set as the algebraic operation result of the signal X and the signal Y.
  • signal X is mapped in binary form and signal Y is mapped in pulse width modulated form.
  • a logical AND operation is performed on the mapped signals of the signal X and the signal Y.
  • the signal Q is in the form of a pulse signal with a certain frequency and a duty cycle.
  • when the signal X is mapped in a binary form, the operation further includes a shift operation of the binary signal itself before and after the execution of the logical AND operation.
  • the high-level duration of the pulse signal is determined by the signal itself, wherein the high-level signal is a continuous signal, and there is no interruption of the low-level signal in the middle.
  • the signal after the operation of the signal X and the signal Y is subjected to a mapping to obtain the signal Q.
  • the computing device includes: an input component, used to input signal X or signal Y and connected to the mapping component in the device; a shooting component, used to shoot and quantize the target field of view to obtain image data with a certain bit width, a time-domain pulse sequence, or an analog current of a certain magnitude as signal X or signal Y. When the shooting component outputs image data with a certain bit width or a time-domain pulse sequence, it is connected to the mapping component in the device; when it outputs an analog current signal of a certain magnitude, the analog current signal skips the mapping component and the interconnection component and is connected to the integration unit of the calculation component, so that the output of the calculation component directly represents the magnitude of the analog current signal output by the shooting component; the mapping component, used to map signal X or signal Y in the mapping form described in the embodiment of the present invention; the interconnection component is used
  • when there are multiple mapping components, their outputs need to be combined according to a certain combination scheme as described in the embodiment of the present invention, and the interconnection component is connected to the computing components of the computing device; the computing components are used to perform calculations on the signals output by the interconnection components.
  • the assembly generally includes a switch unit, an integration unit, a reset unit, a discharge unit and a pulse release unit, the assembly is connected to the output assembly for the computing device;
  • the output component is used to collect the output signal of the calculation component, and generally consists of a counting unit.
  • the input component of signal X is a register file
  • the input component of signal Y is an ASIC chip I/O.
  • the signal X is used as an excitation input unit with a bit width of 4 bits, and the mapping component of the signal X is directly replaced by the input component of the signal X, that is, the output signal of the mapping component of the signal X is the output signal of the input component of the signal X; the signal Y is used as a weight unit with a bit width of 4 bits.
  • the mapping component of the signal Y is a pulse width modulation waveform generator, which converts the 4-bit weight unit into pulse-width-modulated waveform form.
  • the signal X is 1024 4-bit register files, which are input as the excitation signal of the interconnection components; the signal Y is 9 pulse width modulation generators, which are input as the weight signals of the interconnection components, and the interconnection scheme uses a machine learning algorithm
  • the output signals of the 1024 mapping components of signal X are arranged into a 32 × 32 matrix, and the output signals of the 9 mapping components of signal Y are arranged into a 3 × 3 matrix.
  • the arranged matrices are combined "point-to-point" with full coverage of all elements, a window size of 3 × 3, and a stride of 1, and logical AND operations are performed; the 1024 × 9 output signals obtained are used as input signals of the computing components of the computing device.
  • the signal X is nine 4-bit register files, which are input as the excitation signal of the interconnection components;
  • the signal Y is 36 pulse width modulation generators, which are input as the weight signals of the interconnection components, and the interconnection scheme uses a machine learning algorithm
  • the output signals of the nine signal X mapping components and the output signals of nine signal Y mapping components are logically ANDed. Since the signal Y mapping components have 36 output signals, the logical AND operation needs to be executed 4 times in parallel within the same time period, and the 36 output signals obtained from the 4 logical AND operations are used as input signals of the computing component of the computing device.
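The fully connected example with 9 excitations and 36 weights can be sketched in Python as four parallel 9-input multiply-accumulate groups. The grouping of weights into contiguous blocks of nine and the use of ordinary integer multiplication as a stand-in for the AND-and-count hardware are assumptions made for illustration.

```python
def fully_connected(excitations: list[int], weights: list[int]) -> list[int]:
    """Split the weights into groups of len(excitations) and MAC each group."""
    n = len(excitations)
    if len(weights) % n != 0:
        raise ValueError("weight count must be a multiple of the excitation count")
    outputs = []
    for g in range(len(weights) // n):
        group = weights[g * n:(g + 1) * n]          # one output neuron's weights
        outputs.append(sum(x * w for x, w in zip(excitations, group)))
    return outputs


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]                     # nine 4-bit excitations
w = [g + 1 for g in range(4) for _ in range(9)]     # 36 weights, group g holds g + 1
out = fully_connected(x, w)
print(out)  # [45, 90, 135, 180]
```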
  • a single computing component is composed of 9 switching units, an integrating unit, a reset unit, a bleeding unit and a pulse releasing unit.
  • there are 1024 computing components in total, that is, 1024 × 9 input signals, which are the 1024 × 9 output signals of the interconnection components.
  • the output signals of 1024 calculation components are 1024 kinds of pulse signals with their own frequencies and duty ratios.
  • the 1024 pulse signals with their respective frequencies and duty ratios output by the 1024 computing components are input to 1024 output components. The output components are digital counters, which count the input pulse signals within the time period during which the pulse width modulation waveform generator generates waveforms.
  • the data counted by 1024 digital counters is the result of convolution operation between signal X and signal Y.
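The convolution example can be sketched end to end in Python. To obtain 1024 outputs from a 32 × 32 input with a 3 × 3 window and stride 1, the sketch assumes zero padding at the borders, which the text implies by "full coverage of all elements" but does not state; each output value stands for the count held by one digital counter. The kernel is applied without flipping, as is usual in neural network convolution.

```python
def conv_same(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    """Stride-1 convolution with zero padding; one output per input element."""
    height, width = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            count = 0                                   # one digital counter
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < height and 0 <= cc < width:
                        count += image[rr][cc] * kernel[i][j]
            out[r][c] = count
    return out


image = [[1] * 32 for _ in range(32)]     # 1024 excitations (signal X)
kernel = [[1] * 3 for _ in range(3)]      # 9 weights (signal Y)
counts = conv_same(image, kernel)
print(len(counts) * len(counts[0]), counts[0][0], counts[16][16])  # 1024 4 9
```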
  • the invention provides a calculation method and device, which belong to the field of integrated circuits, and can realize operations such as full connection and convolution in neural network operations.
  • the ASIC chip used for computing provided by the present invention has great advantages in area and power consumption compared with the multipliers and accumulators used by traditional convolutional neural network acceleration chips to implement MAC operations.
  • Fig. 1 is a block diagram of a device for computing in an embodiment of the present invention
  • Fig. 2 is an effect diagram of performing pulse width modulation form mapping on the output signal of the input component in the present invention
  • Fig. 3 is a circuit block diagram of the computing component in Embodiment 1 of the present invention.
  • Fig. 4 is a circuit block diagram of the computing component in Embodiment 2 of the present invention.
  • Fig. 5 is a circuit block diagram of a photographing component for realizing output of a time-domain pulse sequence in Embodiment 3 of the present invention
  • Fig. 6 is a schematic diagram of a full-connection connection method used to realize the weight signal input and the excitation signal input of interconnection components in Embodiment 4 of the present invention.
  • a specific implementation circuit block diagram of the calculation component in the above-mentioned apparatus for calculation is provided, as shown in FIG. 3.
  • the circuit uses 9 PMOS transistors as switch units. The gate terminals of the PMOS transistors are connected to the output signals of the interconnection components in the computing device; the source terminals of the 9 PMOS transistors are given a fixed bias, generally set as IDC; and the drain terminals of the 9 PMOS transistors are connected to the inverting input terminal of an operational amplifier. The non-inverting input terminal of the operational amplifier is connected to the reference voltage V ref, its output terminal is connected to the gate terminal of a further PMOS transistor, and the source terminal of that PMOS transistor is connected to the inverting input terminal of the operational amplifier. Here the operational amplifier clamps the source-terminal voltage of that PMOS transistor at a constant value.
  • the drain terminal of the PMOS tube is connected to the drain terminal of the NMOS tube on a branch of a Cascode current mirror. Through the copy effect of the current mirror, the current on the branch is copied to the upper plate of the capacitor used as the integration unit, and the lower plate of the capacitor Connect to VSS.
  • the upper plate of the capacitor is connected to the drain end of an NMOS reset transistor used to reset the unit, the source end of the transistor is connected to VSS, and the gate end of the transistor is connected to the external reset signal. When the reset signal is high, the computing component Perform a reset operation.
  • the gate of the NMOS discharge tube used for the integral discharge unit is connected to the upper plate of a stable capacitor, the source terminal is connected to VSS, the source terminal of the PMOS extraction tube used to extract the charge of the upper plate of the integral unit capacitor is connected to VDD, and the drain The end is connected with the drain end of the NMOS transistor discharge pipe.
  • the drain terminal of the NMOS discharge tube is connected to two-stage inverters, wherein the output of the first-stage inverter is connected to the gate terminal of the PMOS extraction tube, and the output of the second-stage inverter is connected to the upper plate of the stabilizing capacitor.
  • The upper plate of the integrating capacitor is additionally connected to a two-stage inverter that serves as the pulse release unit. The output of this two-stage inverter is the output of this embodiment of the computing component, which realizes the pulse release function.
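The charge, fire and discharge cycle described above can be illustrated with a purely behavioural sketch. It is not a circuit simulation: the function name `simulate_linear_neuron`, the integer charge units, and the subtractive discharge (the capacitor loses exactly one threshold's worth of charge per pulse) are assumptions made for illustration and are not taken from the source.

```python
def simulate_linear_neuron(active_per_step, threshold):
    """Count output pulses of an idealised integrate-and-fire unit.

    active_per_step: for each time step, how many of the 9 switch units conduct.
    threshold: charge needed to trigger one pulse, measured in units of the
               charge that one conducting switch delivers in one time step.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    charge = 0   # charge on the integrating capacitor (integer units)
    pulses = 0   # number of pulses released so far

    for active in active_per_step:
        if not 0 <= active <= 9:
            raise ValueError("between 0 and 9 switch units can conduct")
        # The mirrored current charges the capacitor in proportion to the
        # number of conducting switch units.
        charge += active
        # Assumption: each pulse removes exactly `threshold` units of charge,
        # which keeps the pulse count linear in the total delivered charge.
        while charge >= threshold:
            charge -= threshold
            pulses += 1

    return pulses
```

In this model, tripling the number of conducting switch units over the same interval triples the pulse count, which is the linear behaviour the counting scheme relies on.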
  • In Embodiment 2, a specific circuit implementation of the computing component in the above-mentioned computing apparatus is provided, as shown in FIG. 4.
  • This embodiment replaces only the circuitry used in Embodiment 1 to charge the upper plate of the integration unit's capacitor (the cascode current mirror, the operational amplifier, the operational amplifier's output transistor, the 9 PMOS switch transistors, and so on) and retains the other parts of the circuit of Embodiment 1.
  • The current mirror in the figure has 9 copy branches, each with a transmission gate as a switch unit that controls whether the branch current flows.
  • The drain terminal of the PMOS transistor is connected to the upper plate of the integrating capacitor to charge the capacitor.
  • The structures and functions of the remaining circuit parts are the same as those in Embodiment 1.
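The relationship between the number of conducting branches and the output pulse rate can be made explicit with an idealised estimate. The symbols are assumptions introduced for illustration and do not appear in the source: $I_0$ is the current of one branch, $k$ is the number of switch units that conduct ($0 \le k \le 9$), $C$ is the integrating capacitance, and $V_{th}$ is the capacitor voltage at which a discharge is triggered. Discharge time and leakage are neglected.

```latex
\begin{aligned}
I_{\mathrm{charge}} &= k\, I_0, \\
t_{\mathrm{fire}}   &= \frac{C\, V_{th}}{k\, I_0}, \\
f_{\mathrm{out}}    &= \frac{1}{t_{\mathrm{fire}}} = \frac{k\, I_0}{C\, V_{th}}, \qquad k \ge 1 .
\end{aligned}
```

Under these assumptions the pulse frequency is proportional to $k$, so the number of pulses counted over a fixed window grows linearly with the number of active inputs.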
  • In Embodiment 3, a specific circuit implementation of the above-mentioned photographing component that outputs a time-domain pulse sequence is provided.
  • When the Gate signal is high, the SPAD device is in an overvoltage state, and the Gate signal, applied at the gate terminal of transistor N1, holds the drain voltage of N1 at 0 V. When the SPAD device receives a photon, an avalanche occurs in the device.
  • When the Gate signal goes low, transistor N1 is turned off and can be regarded as a resistor, so its drain terminal sits at a divided voltage V.
  • The drain voltage of transistor N1 changes from the divided voltage V to 0, so that a pulse voltage signal is formed at the drain terminal of N1, and the signal passes through N2, P2
  • The pulse voltage signal at the drain terminal of transistor N1 is converted into a square wave signal S with a certain pulse width. The square wave signal and a copy of itself passed through an inverter undergo an OR logic operation, yielding a square wave signal OUT with adjustable pulse width, which is the output signal of the circuit.
  • The pulse width of the OUT signal is controlled by the gate bias voltage Vm of transistor P1. Therefore, when the circuit operates at a given clock frequency, it can output a time-domain pulse sequence according to the photon signal sensed by the SPAD device.
  • In Embodiment 4, a connection method is provided between the signals Ai of the above-mentioned nine 4-bit register files, which are input as the excitation signals of the interconnection component, and the signals Wj,k of the 36 pulse width modulation generators, which are input as the weight signals of the interconnection component.
  • The 9 signals Ai and 9 signals Wj,k undergo logical AND operations to obtain 9 output signals. Since there are 36 signals Wj,k in total, there are 4 × 9 output signals of logical AND operations.
  • These 36 output signals are the input signals of the subsequent computing components. Four computing components are required, and each computing component is provided with 9 signal input ports to connect to the output signals of the interconnection component.
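The fan-out described above can be sketched for a single sampling instant. The function name `interconnect`, the constants, and the index convention (j selects one of the four computing components, k selects one of its nine input ports, and excitation k is shared by all four components) are assumptions made for illustration; the source does not spell out the meaning of j and k.

```python
NUM_INPUTS = 9       # excitation signals A_i
NUM_COMPONENTS = 4   # computing components fed by the interconnection


def interconnect(excitations, weights):
    """AND each excitation bit with the matching weight bit of every component.

    excitations: list of 9 bits (0 or 1), one per signal A_i.
    weights: 4 rows of 9 bits, the instantaneous levels of the PWM signals W_j,k.
    Returns 4 rows of 9 bits, i.e. the 36 inputs of the computing components.
    """
    if len(excitations) != NUM_INPUTS:
        raise ValueError("expected 9 excitation signals")
    if len(weights) != NUM_COMPONENTS or any(len(row) != NUM_INPUTS for row in weights):
        raise ValueError("expected a 4 x 9 grid of weight signals")

    return [
        [excitations[k] & weights[j][k] for k in range(NUM_INPUTS)]
        for j in range(NUM_COMPONENTS)
    ]
```

Each returned row would drive the nine input ports of one computing component, so four components consume all 36 AND outputs.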
  • The connection mode of the circuit and the process of signal input and output are illustrated above mainly by taking a particular number of switch units as an example.
  • Any number of switch units can be set according to the needs of the actual application scenario; no limitation is set here.
  • In one example, the circuit of the embodiment of the present invention may include 4 or 5 switch units; in another example, it may include 8 switch units.
  • The specific structure of the circuit and the connection manner between the switch units may be similar to the examples shown in Embodiments 1-4 above, and will not be repeated here.
  • The steps and devices in the various embodiments herein are not limited to implementation in a particular embodiment. In fact, some steps and some devices from the various embodiments herein can be combined according to the concept of the present invention to form new embodiments, which are also within the scope of the invention.
  • the means may comprise various hardware and/or software components and/or modules, including but not limited to circuits, application specific integrated circuits (ASICs) or processors.
  • A general purpose processor, digital signal processor (DSP), ASIC, field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof may be used to implement or perform each of the illustrated logic blocks, modules and circuits.
  • a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine.
  • A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.
  • a software module may reside in any form of tangible storage medium.
  • storage media include random access memory (RAM), read only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disks, removable disks, CD-ROMs, and the like.
  • a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
  • a software module may be a single instruction, or many instructions, and may be distributed over several different code segments, between different programs and across multiple storage media.
  • Methods of the invention herein comprise one or more acts for carrying out the described methods.
  • the methods and/or actions may be interchanged with one another without departing from the scope of the claims.
  • the order and/or use of specific actions may be modified without departing from the scope of the claims.
  • A storage medium may be any available tangible medium that can be accessed by a computer.
  • Such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • As used herein, disc includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc.
  • a computer program product can perform the operations presented herein.
  • a computer program product may be a computer-readable tangible medium having instructions tangibly stored (and/or encoded) thereon that are executable by one or more processors to perform the operations described herein.
  • a computer program product may include packaging materials.
  • Software or instructions may also be transmitted via transmission media.
  • software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, or microwave.
  • modules and/or other appropriate means for performing the methods and techniques described herein may be downloaded and/or otherwise obtained by user terminals and/or base stations as appropriate.
  • a device can be coupled to a server to facilitate the transfer of the means for performing the methods described herein.
  • Alternatively, the various methods described herein can be provided via storage means (e.g., RAM, ROM, or a physical storage medium such as a CD or floppy disk), so that user terminals and/or base stations can obtain the various methods upon coupling or providing the storage means to the device.
  • any other suitable technique for providing the methods and techniques described herein to a device may be utilized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Analogue/Digital Conversion (AREA)

Abstract

The present invention discloses an operation method and an operation unit. According to the method, the input signals are M binary excitation signals, and M N-bit weight units are provided, each weight unit consisting of a pulse-width-modulation (PWM) waveform generator; M PWM waveforms are obtained after an AND operation is performed on the input signals, and the M PWM waveforms are input into a linear-neuron counting circuit; the linear-neuron counting circuit has M input terminals; when the linear-neuron counting circuit receives the M PWM waveforms, a linear neuron releases pulses of corresponding frequencies according to the number of high levels (or low levels) of the input signals; a counter counts the pulses released within a certain time T; and the counting result is the operation result of a multiply-accumulate operation unit.
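The counting scheme in the abstract can be checked with an idealised software model. The following sketch makes several assumptions that are not stated in the source: the PWM period is divided into 2^N equal time slots, a weight of value w is high for the first w slots, and the neuron releases exactly one pulse per high input per slot. The function names are hypothetical.

```python
def pwm_waveform(weight, n_bits):
    """Return one PWM period as a list of bits, high for `weight` time slots."""
    period = 1 << n_bits
    if not 0 <= weight < period:
        raise ValueError("weight does not fit in n_bits")
    return [1 if t < weight else 0 for t in range(period)]


def mac_by_counting(excitations, weights, n_bits):
    """Model the multiply-accumulate result as a pulse count over one period."""
    if len(excitations) != len(weights):
        raise ValueError("need one weight per excitation signal")

    # AND each binary excitation with its weight's PWM waveform.
    gated = [
        [a & level for level in pwm_waveform(w, n_bits)]
        for a, w in zip(excitations, weights)
    ]

    count = 0
    for t in range(1 << n_bits):
        high_inputs = sum(wave[t] for wave in gated)
        # Assumption: the pulse rate is proportional to the number of high
        # inputs, idealised here as one pulse per high input per time slot.
        count += high_inputs
    return count


excitations = [1, 0, 1, 1]
weights = [3, 15, 7, 0]
result = mac_by_counting(excitations, weights, n_bits=4)
```

With these inputs the count is 10, the same as the dot product 1·3 + 0·15 + 1·7 + 1·0, which is why the counter value can stand in for the multiply-accumulate result in this idealised setting.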
PCT/CN2023/099468 2023-06-09 2023-06-09 Procédé de fonctionnement et unité de fonctionnement WO2023160735A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/099468 WO2023160735A2 (fr) 2023-06-09 2023-06-09 Procédé de fonctionnement et unité de fonctionnement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/099468 WO2023160735A2 (fr) 2023-06-09 2023-06-09 Procédé de fonctionnement et unité de fonctionnement

Publications (2)

Publication Number Publication Date
WO2023160735A2 true WO2023160735A2 (fr) 2023-08-31
WO2023160735A3 WO2023160735A3 (fr) 2024-03-21

Family

ID=87764894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/099468 WO2023160735A2 (fr) 2023-06-09 2023-06-09 Procédé de fonctionnement et unité de fonctionnement

Country Status (1)

Country Link
WO (1) WO2023160735A2 (fr)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033759A (zh) * 2019-12-09 2021-06-25 南京惟心光电系统有限公司 脉冲卷积神经网络算法、集成电路、运算装置及存储介质
CN113033792A (zh) * 2019-12-24 2021-06-25 财团法人工业技术研究院 神经网络运算装置及方法
CN111445012B (zh) * 2020-04-28 2023-04-18 南京大学 一种基于fpga的分组卷积硬件加速器及其方法
CN113902097A (zh) * 2021-09-30 2022-01-07 南京大学 针对稀疏化cnn神经网络模型的游程编码加速器及方法

Also Published As

Publication number Publication date
WO2023160735A3 (fr) 2024-03-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23759356

Country of ref document: EP

Kind code of ref document: A2