WO2023160735A2

WO2023160735A2 - Operation method and operation unit

Info

Publication number: WO2023160735A2
Application number: PCT/CN2023/099468
Authority: WO
Inventors: Yan Feng; Wu Tianze; Yue Tao; Wang Kai
Original assignee: Nanjing University
Priority date: 2023-06-09
Filing date: 2023-06-09
Publication date: 2023-08-31
Also published as: WO2023160735A3

Abstract

Disclosed in the present invention are an operation method and an operation unit. In the method, the inputs are M binary excitation signals and M N-bit weight units, each weight unit being implemented by a pulse width modulation (PWM) waveform generator. ANDing the excitation signals with the weight units yields M PWM waveforms, which are input into a linear neuron counting circuit with M input terminals. On receiving the M PWM waveforms, the linear neuron releases pulses at a frequency corresponding to the number of input signals at high level (or low level). A counter counts the released pulses within a fixed time T, and the count is the operation result of the multiply-accumulate operation unit.
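The data path in the abstract can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the patented circuit: each N-bit weight w is assumed to map to a PWM frame of 2**N time slots whose first w slots are high (the helpers `pwm_waveform` and `pwm_mac` are hypothetical names), the binary excitations are held constant over the frame, and the linear neuron plus counter is idealized as adding exactly one count per high input per slot.

```python
from typing import List, Sequence


def pwm_waveform(value: int, n_bits: int) -> List[int]:
    """Hypothetical PWM mapping: `value` contiguous high slots in a 2**n_bits-slot frame."""
    frame = 1 << n_bits
    if not 0 <= value < frame:
        raise ValueError("weight does not fit in n_bits")
    return [1] * value + [0] * (frame - value)


def pwm_mac(excitations: Sequence[int], weights: Sequence[int], n_bits: int) -> int:
    """Idealized MAC: AND each binary excitation with its PWM weight and count high inputs."""
    if len(excitations) != len(weights):
        raise ValueError("need exactly one weight per excitation")
    waves = [pwm_waveform(w, n_bits) for w in weights]
    total = 0
    for t in range(1 << n_bits):  # one iteration per PWM time slot
        # Number of neuron inputs at high level in slot t. The linear neuron and
        # counter are idealized here as adding exactly that many counts.
        total += sum(x & wave[t] for x, wave in zip(excitations, waves))
    return total


if __name__ == "__main__":
    x = [1, 0, 1, 1]    # M = 4 binary excitation signals
    w = [5, 9, 3, 15]   # M = 4 weight units, N = 4 bits each
    print(pwm_mac(x, w, 4))                   # 23
    print(sum(a * b for a, b in zip(x, w)))   # 23, the exact dot product
```

Because each PWM frame contains exactly w high slots, summing the per-slot high-input counts over one frame reproduces the dot product; in the real circuit the count would be this value scaled by the neuron's gain.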

Description

A computing method and computing unit

Technical Field

The invention relates to a computing method and a computing unit, and belongs to the field of integrated circuits.

Background Art

Convolution acceleration chips for conventional convolutional neural networks, such as FPGAs, GPUs and NPUs, rely on digital modules such as multipliers and accumulators to implement linear operations such as convolution and full connection. The computing method and device disclosed here, which are based on multiply-add operations, implement the MAC operations of a conventional neural network with a linear neuron circuit. In silicon technology at the same process node, the neuromorphic circuit of the present invention needs only about 10 transistors, whereas an 8-bit multiplier requires tens of thousands of transistors. Therefore, compared with the multipliers and accumulators used for MAC operations in conventional convolutional neural network acceleration chips, the disclosed multiply-accumulate computing method and chip device offer substantial advantages in power consumption and area.

In addition, as silicon technology scales down, the speed gap between computing and memory keeps widening, and much energy and time are lost in moving data. This is the memory wall problem caused by the von Neumann bottleneck. The traditional von Neumann architecture will struggle to meet the speed and power requirements of future massive data processing, and a new "integrated storage and computing" (computing-in-memory) architecture has emerged in response. In such an architecture, computation takes place where the data are stored, so the cost of moving large amounts of data is avoided; the architecture is expected to break through the memory wall and greatly improve high-throughput data processing. The relatively mature, mainstream implementations in industry currently include Flash-based, RRAM-based and MRAM-based computing-in-memory. Taking the mature floating-gate approach as an example, a single floating-gate device can complete one MAC operation between an off-chip weight and an input excitation, which is the key to the computing-in-memory concept. This approach has limitations, however: the result of each MAC operation is output as an analog current, which must be fed into an I-V conversion module and then an ADC before the signal can be read out.
Moreover, multiply-add operations in the analog domain of a floating-gate array often suffer from low signal-to-noise ratio and IR drop; high-precision ADCs are relatively complex to design and impose large area and power consumption. By comparison, the multiply-add ASIC device disclosed by the present invention likewise uses off-chip digital excitation inputs and off-chip weight inputs, but reads out the analog-domain voltage pulses output by the neurons digitally, thereby avoiding the above limitations entirely.

Summary of the Invention

In view of the limitations of the background art described above, an object of the present invention is to provide a computing method and a computing device.

At present, common hardware schemes for multiply-accumulate operations in AI algorithm acceleration include CPU computing, GPU computing, FPGA-accelerated computing, and computing on new "integrated storage and computing" architectures such as RRAM and Flash. Owing to its working principle, the CPU has no clear speed advantage over the other options for accelerating neural networks. GPUs and FPGAs generally accelerate the linear operations of neural networks with multiply-accumulators built from digital logic, so pursuing lower power and higher computing power places demands on the chip fabrication process: the smaller the process node, the greater the gain in computing performance, but also the higher the research and development cost. Industry data have shown that, at the same process node, neural network accelerators with a mixed digital-analog design outperform accelerators designed purely with digital logic on the key metric of computing energy efficiency (computing power/power consumption). New "integrated storage and computing" architectures based on RRAM, Flash and similar devices have therefore already gained a place in AI algorithm acceleration. For floating-gate computing-in-memory, however, this architecture has the following limitation.
When a single device completes a single MAC operation, the input is a digital control signal, which a level-shift module converts into a corresponding analog voltage applied to the device terminals, so that the device computes according to its own electrical and physical characteristics. The result is output as an analog current, which must pass through an I-V conversion module and an ADC to be read out. Analog-domain multiply-add in floating-gate arrays often suffers from low signal-to-noise ratio and IR drop, and high-precision ADCs are relatively complex to design and bring large area and power costs. The present invention therefore provides a multiply-accumulate scheme and a corresponding mixed digital-analog computing device that, at the design level, avoids the limitations faced by current mixed digital-analog "integrated storage and computing" architectures. The device can generally take the grayscale binary data of an image as the excitation input and, as the weight input, the output waveform of a pulse width modulator driven by a binary signal, i.e. a binary signal with a certain pulse width ("0" and "1" in the digital domain). The analog linear neuron circuit in the device only needs to accept digital "0"/"1" switching signals; it then integrates and discharges internally in the analog domain and outputs pulses with a certain frequency and duty cycle.
Within the unit multiply-add time, the number of pulses counted by a digital counter is then mapped to the number of digital "1"s (or "0"s) at the neuron input. This mapping is the core of the computing method provided by the present invention and the working principle of the corresponding mixed digital-analog computing device. An embodiment of the computing device corresponding to the computing method is described in detail below.

According to an embodiment of the present invention, an operation method is provided, characterized in that it involves at least one of an electrical signal input component and an optical signal generating component, together with a mapping component, an interconnection component, a calculation component and an output component, wherein: the electrical signal input component includes at least one input terminal for inputting an electrical signal A; the optical signal generating component captures a target field of view, or accepts an optical signal captured or generated by another optical component, to generate an optical signal B; the mapping component is connected to the electrical signal input component and/or the optical signal generating component, receives the electrical signal A and/or the optical signal B, and maps them to generate digital signals representing the values of A and/or B; the interconnection component selects part of the digital signals output by the mapping component as the excitation input part and uses the rest as the weight input part, and cross-links the excitation input part with at least part of the weight input part to form a multiply-accumulate (MAC) architecture. For the fully connected structure of a machine-learning neural network algorithm, the required subset is taken to generate the operation corresponding to the mapped digital signals of A and B, and the output of the subset contains the required operation results of the excitation input part and the weight input part.

The calculation component includes a switch unit, an integration unit, a reset unit, an integral discharge unit and a pulse release unit. It is connected to the interconnection component, receives from it the operation results of the excitation input part and the weight part, and produces the required target result.

Based on the fully connected architecture, weight values are input, and a fully connected sub-architecture corresponding to the required operation under the MAC architecture is generated from the set of weight values; under this sub-architecture, a set of excitation signals is processed to obtain the required target value, wherein: the operation is performed on K weight signals and the excitation signals, with K=N×M, where M (0 or a positive integer) is the number of output signals of the mapping components that carry the excitation inputs of the fully connected operation in the algorithm, and N (0 or a positive integer) is the number of interconnection components connected to calculation components in the computing device; the output signals of the mapping components carrying the weight inputs of the fully connected operation are arranged as an N×M matrix; the M elements of each row of the matrix are logically ANDed one-to-one with the M excitation output signals; and the output signals of the interconnection component serve as the input signals of the calculation component of the corresponding row.
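The fully connected wiring can be sketched for a single time slot, where every signal is binary. This is an illustrative sketch, assuming the weight outputs are given as an N×M list of 0/1 values; the function names `fc_interconnect` and `high_level_counts` are hypothetical. Row r of the result is what would drive calculation component r.

```python
from typing import List


def fc_interconnect(weight_matrix: List[List[int]], excitation: List[int]) -> List[List[int]]:
    """Wire an N x M weight matrix to M excitation signals (one time slot).

    Row r of the returned list holds the M AND-ed signals that would drive
    calculation component r; all values are binary (0 or 1).
    """
    m = len(excitation)
    if any(len(row) != m for row in weight_matrix):
        raise ValueError("every weight row must have exactly M elements")
    # Element j of every row is paired point-to-point with excitation j.
    return [[w & x for w, x in zip(row, excitation)] for row in weight_matrix]


def high_level_counts(row_signals: List[List[int]]) -> List[int]:
    """Number of high inputs seen by each row's calculation component."""
    return [sum(row) for row in row_signals]


if __name__ == "__main__":
    W = [[1, 0, 1, 1],
         [0, 1, 1, 0],
         [1, 1, 1, 1]]   # N = 3 rows, M = 4 columns, K = 12 weight signals
    x = [1, 1, 0, 1]     # M = 4 binary excitation signals
    print(high_level_counts(fc_interconnect(W, x)))  # [2, 1, 3]
```

Repeating this per slot of the weights' PWM frames and accumulating the counts would give one multiply-accumulate result per row, i.e. a matrix-vector product.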

The interconnection component combines the data from the mapping components using the convolutional connection of the algorithm, and: when there are multiple mapping-component output signals, let their number be Z and divide the Z signals into two groups, Q and P, with Z=Q+P; the output signals of the Q mapping components serve as the excitation inputs of the convolution operation in the machine-learning algorithm, and the output signals of the P mapping components serve as its weight inputs. The method further includes arranging the Q excitation outputs as a √Q×√Q matrix (√Q an integer) and the P weight outputs as a √P×√P matrix (√P an integer, √P≤√Q); multiplying the √P×√P matrix element by element (for binary signals, a logical AND) with each √P×√P sub-matrix of the √Q×√Q matrix, all these operations being performed in parallel in the time domain; and outputting the results of all operations as the output signals of the interconnection component, which are input to the calculation component as its input signals.

The method of this embodiment performs the multiply-add operation of Q input signals with Q n-bit weight units, wherein: 1) there are Q electrical signal input components and/or optical signal capturing components in total, used to input the Q desired excitation input signals; 2) there are Q weight units in total, used to input the Q desired weight input signals, each weight value being represented by n bits; 3) the Q binary excitation signals are ANDed with the Q n-bit weight signals respectively, yielding the values of the Q binary excitation signals modulated by the Q pulse width modulation waveforms, and these values are input into the calculation component (linear neuron counting circuit) to obtain the result of multiplying the Q input signals by the Q n-bit weight signals and accumulating.
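Why counting high levels yields the multiply-accumulate result can be shown in a short derivation. It assumes, for illustration only, that each n-bit weight $w_i$ is mapped to a PWM frame of $2^n$ slots whose first $w_i$ slots are high, that the binary excitations $x_i$ are held over the frame, and that the linear neuron and counter together have an idealized gain $k$ (all of these symbols are introduced here, not taken from the text above).

```latex
\begin{aligned}
&x_i \in \{0,1\},\quad w_i \in \{0,1,\dots,2^{n}-1\},\quad i = 1,\dots,Q,\\
&p_i(t) = \begin{cases}1, & 0 \le t < w_i\\ 0, & w_i \le t \le 2^{n}-1\end{cases}
  \qquad\text{(PWM waveform of weight } w_i\text{)},\\
&h(t) = \sum_{i=1}^{Q} \bigl(x_i \wedge p_i(t)\bigr) = \sum_{i=1}^{Q} x_i\,p_i(t)
  \qquad\text{(inputs at high level in slot } t\text{)},\\
&\sum_{t=0}^{2^{n}-1} h(t) = \sum_{i=1}^{Q} x_i \sum_{t=0}^{2^{n}-1} p_i(t)
  = \sum_{i=1}^{Q} x_i\,w_i ,\\
&N_{\text{count}} \approx k \sum_{i=1}^{Q} x_i\,w_i .
\end{aligned}
```

The key step is exchanging the order of summation: the AND of a binary value with a 0/1 waveform equals their product, and each waveform contributes exactly $w_i$ high slots per frame.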

The mapping component may use a binary format, a pulse width modulation format, a format that keeps the original output of the input component and/or capturing component, an analog-to-digital conversion format, a random-number format, and the like.

Wherein, the electrical signal A or the optical signal B and the weight part are mapped in binary form, and the operation is a bitwise OR operation of binary numbers.

The result obtained by the calculation component is a voltage pulse signal with a certain frequency and duty cycle; the final result is the counted number of these pulses.

When the electrical signal A or optical signal B is mapped in binary form, a shift operation is performed on the mapped binary signal before and after the logical AND with the weight part.
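One plausible reading of the shift operation is bit-serial processing of multi-bit excitations: shift each excitation right to expose one bit as a binary signal, AND that bit plane with the PWM weights, then shift the partial count left by the bit position before accumulating. The sketch below assumes this interpretation and the same 2**n-slot PWM mapping as an illustration; `bit_serial_mac` and `pwm_waveform` are hypothetical names, not taken from the text.

```python
from typing import List, Sequence


def pwm_waveform(value: int, n_bits: int) -> List[int]:
    """Assumed PWM mapping: `value` contiguous high slots in a 2**n_bits-slot frame."""
    frame = 1 << n_bits
    if not 0 <= value < frame:
        raise ValueError("weight does not fit in n_bits")
    return [1] * value + [0] * (frame - value)


def bit_serial_mac(excitations: Sequence[int], weights: Sequence[int],
                   x_bits: int, w_bits: int) -> int:
    """Multi-bit excitations handled one bit plane at a time using shifts."""
    if len(excitations) != len(weights):
        raise ValueError("need exactly one weight per excitation")
    if any(not 0 <= x < (1 << x_bits) for x in excitations):
        raise ValueError("excitation does not fit in x_bits")
    waves = [pwm_waveform(w, w_bits) for w in weights]
    result = 0
    for b in range(x_bits):
        # Shift before the AND: move bit b of every excitation into the LSB.
        plane = [(x >> b) & 1 for x in excitations]
        # AND each binary plane bit with its PWM weight and count the high levels.
        count = sum(p & wave[t]
                    for p, wave in zip(plane, waves)
                    for t in range(len(wave)))
        # Shift after the AND: weight the partial count by 2**b.
        result += count << b
    return result


if __name__ == "__main__":
    x = [3, 7, 12]   # 4-bit excitations
    w = [2, 5, 1]    # 4-bit weights
    print(bit_serial_mac(x, w, 4, 4))   # 53 == 3*2 + 7*5 + 12*1
```

Each bit plane reuses the binary-excitation MAC unchanged, so the neuron circuit never needs to see more than "0"/"1" inputs; the shifts only rescale the digital counts.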

When the electrical signal A or optical signal B is mapped in pulse width modulation form, the high-level duration of the pulse signal is determined by the magnitude of the signal itself, and the high level is continuous, with no intervening low level.

When the electrical signal A or optical signal B is mapped in random-number form, the signal is mapped to a random pulse sequence in which high levels occur with a certain probability, the probability being determined by the magnitude of the signal itself.

When the electrical signal A or optical signal B is an analog current signal, it bypasses the mapping component and the interconnection component and is connected directly to the input of the integration unit of the calculation component; the output then characterizes the electrical signal A or optical signal B output by the input component.

According to another embodiment of the present invention, an operation method is provided, characterized in that the multiply-accumulate operation is taken as the basic operation module: the linear operations of the machine-learning algorithm to be executed are decomposed into an operation paradigm whose unit is the multiply-accumulate operation; the nonlinear operations involved are expressed as superpositions of linear operations, which are in turn decomposed into the same MAC-based paradigm; and the connection structure of the calculation components is set according to this paradigm. Each MAC module in the connection structure 1) uses an excitation signal input component and 2) uses a weight signal input component, and further uses a mapping component, a calculation component and an output component. The mapping component is connected to the excitation signal input component and to the weight signal input component, maps the excitation signal and the weight signal, and performs a logical AND on the mapped excitation and weight signals; the result is input to the calculation component, which combines the MAC operations according to the paradigm to obtain the required target result.

According to an embodiment of the present invention, a computing unit is provided, including at least one of an electrical signal input component and an optical signal generating component, characterized in that: the electrical signal input component includes at least one input terminal for an electrical signal A, used to input an analog quantity expressing the electrical signal A; the optical signal generating component captures a target field of view, or receives an optical signal from another optical component, and generates an analog quantity of an optical signal B, where B comprises image data with a certain bit width, a voltage pulse sequence in the time domain, or an analog current of a certain magnitude. The unit further includes a mapping component, connected to the electrical signal input component and/or the optical signal generating component, that maps the electrical signal A and/or optical signal B into digital signals representing their values; an interconnection component, which uses multiple electrical and/or optical paths and cross-linked structures corresponding to the computation performed on the mapped digital signals of A and/or B; and a calculation component, connected to the interconnection component, which accepts the digital signals corresponding to A and/or B from the interconnection component. The calculation component includes electrical and/or optical connection structures comprising appropriate switch units together with integration units, reset units, discharge units and pulse release units, which compute on the signals output by the interconnection component and output the result.

The computing unit of this embodiment is used to perform multiply-accumulate (MAC) operations, characterized in that the interconnection component combines the data from the mapping components using the fully connected structure of a machine-learning neural network algorithm, and further: 1) the number of weight output signals of the mapping components is K, with K=N×M, where M is the number of mapping-component output signals carrying the excitation inputs of the fully connected operation in the machine-learning algorithm, and N is the number of interconnection components connected to calculation components in the computing device; 2) the mapping components provide N×M output terminals whose outputs form an N×M matrix, and the M elements of each row are combined "point-to-point" with the output signals of the M mapping components carrying the excitation inputs of the fully connected operation, with a logical AND performed on each pair; and 3) each output terminal of the interconnection component is connected to the input terminal of the calculation component of the corresponding row, so that the output signals of the interconnection component serve as the input signals of that row's calculation component.

The computing unit of this embodiment is used to perform multiply-add (MAC) operations, characterized in that the interconnection component combines the data from the mapping components using the convolutional connection of the machine-learning algorithm, and: 1) multiple mapping components have corresponding output terminals and output corresponding signals; the number of mapping-component output signals is Z, and the mapping components associated with these Z signals are divided into two groups, Q and P, with Z=Q+P; the output signals of the Q mapping components are the excitation inputs of the convolution operation, and those of the P mapping components are its weight inputs; 2) the Q excitation outputs are arranged as a √Q×√Q matrix (√Q an integer) and the P weight outputs as a √P×√P matrix (√P an integer, √P≤√Q), and the √P×√P matrix is multiplied element by element with each √P×√P sub-matrix of the √Q×√Q matrix, all operations being performed in parallel in the time domain; 3) the horizontal or vertical interval between two adjacent operations is the stride S, where S is a non-negative integer; and 4) the interconnection component includes output terminals whose output signals are used as the input signals of the calculation component.
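The convolutional interconnection with stride S can be sketched as follows, for one time slot with binary signals. This is an illustrative sketch; `conv_interconnect` is a hypothetical name, and it assumes S ≥ 1, because a stride of 0 would keep selecting the same sub-matrix.

```python
from typing import List


def conv_interconnect(act: List[List[int]], kernel: List[List[int]],
                      stride: int = 1) -> List[List[int]]:
    """Return one group of AND-ed signals per kernel position (row-major order).

    act    : sqrt(Q) x sqrt(Q) binary excitation outputs
    kernel : sqrt(P) x sqrt(P) binary weight outputs, sqrt(P) <= sqrt(Q)
    stride : step S between adjacent positions; this sketch requires S >= 1
    """
    q, p = len(act), len(kernel)
    if any(len(r) != q for r in act) or any(len(r) != p for r in kernel):
        raise ValueError("both inputs must be square matrices")
    if p > q:
        raise ValueError("kernel must not be larger than the excitation matrix")
    if stride < 1:
        raise ValueError("this sketch needs a positive stride")
    groups = []
    for r in range(0, q - p + 1, stride):
        for c in range(0, q - p + 1, stride):
            # Point-to-point AND of the kernel with one sqrt(P) x sqrt(P) sub-matrix.
            groups.append([kernel[i][j] & act[r + i][c + j]
                           for i in range(p) for j in range(p)])
    return groups


if __name__ == "__main__":
    act = [[1, 0, 1, 1],
           [0, 1, 1, 0],
           [1, 1, 0, 0],
           [0, 1, 1, 1]]
    kernel = [[1, 1],
              [0, 1]]
    print([sum(g) for g in conv_interconnect(act, kernel, stride=2)])  # [2, 2, 3, 1]
    print(len(conv_interconnect(act, kernel, stride=1)))               # 9
```

Each returned group corresponds to the input bundle of one calculation component; all groups are formed at once, which mirrors the parallel-in-time evaluation described above.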

The computing unit of this embodiment performs the multiply-add operation of M input signals with M n-bit weight units, characterized in that: 1) there are M electrical signal input components and optical signal capture components in total, whose M input terminals are used to input the M desired excitation input signals; 2) there are M weight input terminals, whose inputs represent the n-bit weight units; and the M binary excitation signals are logically ANDed with the M n-bit weight units respectively, yielding the values of the M binary excitation signals modulated by the M pulse width modulation waveforms. These values are input into the calculation component, which consists of a linear neuron counting circuit, to obtain the result of multiplying the M excitation inputs by the M n-bit weight units and accumulating.

Wherein, the type of the mapping component is one of binary type, pulse width modulation type, analog-to-digital conversion type, and random number type.

The electrical signal input component may be a general-purpose input/output interface of an SoC, ASIC or FPGA chip; a storage unit such as a register file, SRAM, DRAM or RRAM; or another component that accepts an analog current or analog voltage input.

The optical signal generating component is an image sensor chip, such as a CIS chip, a CCD chip or a single-photon avalanche diode (SPAD) array, used to sense the optical signal and convert it into a binary image signal with a certain bit width, a time-domain pulse sequence, or an analog current signal.

The mapping component maps the output signal of the input component as described in the embodiments of the present invention. When the input component outputs a binary signal, the mapping component is equivalent to the input component, i.e. it leaves the signal unchanged and can be replaced by the input component itself. When the input component outputs an analog voltage signal, the mapping component may be an analog-to-digital converter connected to the subsequent interconnection component; alternatively, the mapping component may be the input component itself, in which case its output is the input component's output and is used directly as the input signal of the calculation component. When the input component outputs an analog current signal, that signal bypasses the mapping and interconnection components and is connected to the integration unit of the calculation component in the computing unit, whose output then represents the magnitude of the input analog current.

The mapping component may be a pulse width modulation waveform generator that outputs a number of high-level slots corresponding to the value of the binary signal output by the input component, the high level being continuous with no intervening low level; the duration of the generator's output is determined by the bit width of the input component's binary output.
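A PWM waveform generator with these properties is commonly built from a free-running counter and a comparator; the sketch below assumes that structure (the text does not specify the generator's internals), and `pwm_generator` is a hypothetical name. An n-bit counter gives a frame of 2**n slots, so the bit width sets the frame duration, and the output stays high while the counter is below the input value.

```python
from typing import Iterator, List


def pwm_generator(value: int, n_bits: int) -> Iterator[int]:
    """Counter-comparator PWM generator (assumed implementation).

    A free-running n-bit counter is compared with `value`; the output is high
    while counter < value, so one frame has 2**n_bits slots, exactly `value`
    of which are high, all contiguous at the start of the frame.
    """
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value must fit in n_bits")
    for counter in range(1 << n_bits):
        yield 1 if counter < value else 0


if __name__ == "__main__":
    wave: List[int] = list(pwm_generator(5, 3))
    print(wave)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Because the comparison is monotone in the counter, the high level can never be split by a low slot, which matches the "no intervening low level" requirement.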

The mapping component may be a random pulse sequence generator comprising a linear feedback shift register (LFSR) and a comparator. When the input component outputs a binary signal, that binary value is used as the initial value of the LFSR. In each clock cycle, the LFSR output and the initial value are fed to the comparator: if the initial value is greater than the current LFSR output, the comparator outputs 1; otherwise it outputs 0. The comparator thus outputs a random pulse sequence in which the probability of a high level is determined by the binary output of the input component, and the length of the sequence is determined by the bit width of that output.
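The LFSR-plus-comparator generator can be sketched for a 4-bit input, matching the 4-bit signals in the later embodiment. The tap choice (polynomial x^4 + x^3 + 1, which gives the maximal period of 15) and the function names are assumptions for illustration; other bit widths would need their own taps.

```python
from typing import List


def lfsr4_next(state: int) -> int:
    """One step of a 4-bit Fibonacci LFSR (taps x^4 + x^3 + 1, maximal length 15)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def random_pulse_sequence(value: int) -> List[int]:
    """Map a 4-bit value to a pulse sequence via an LFSR and a comparator (sketch).

    The LFSR is seeded with `value`; each clock cycle the comparator outputs 1
    if the seed is greater than the current LFSR state, else 0. The sequence
    length, 15 cycles, is one LFSR period for a 4-bit register.
    """
    if not 0 <= value <= 0xF:
        raise ValueError("value must be a 4-bit number")
    state, out = value, []
    for _ in range(15):
        state = lfsr4_next(state)
        out.append(1 if value > state else 0)
    # A seed of 0 locks the XOR-feedback LFSR at 0, so the sequence is all zeros.
    return out


if __name__ == "__main__":
    print([sum(random_pulse_sequence(v)) for v in (0, 1, 8, 15)])  # [0, 0, 7, 14]
```

Over one full period a maximal-length LFSR visits every nonzero 4-bit state exactly once, so for a seed v ≥ 1 the sequence contains exactly v - 1 high levels; the high-level probability (v - 1)/15 therefore grows linearly with the input, while the order of the pulses looks random.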

The switch unit of the calculation component accepts the output signal of the interconnection component and outputs a signal determined by the switch state that this signal sets. The reset unit provides the reset function of the calculation component: while the reset signal is active, the output of the calculation component is zero. The integration unit integrates the output of the switch unit; when the integrated signal reaches the threshold of the integration unit, the pulse release unit releases a pulse and, at the same time, the discharge unit discharges the integrated signal.
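The interplay of the switch, integration, reset, discharge and pulse release units can be sketched as a discrete-time integrate-and-fire model. This is an idealization, not the circuit: it assumes each input at "1" closes one switch that adds one unit of charge per time slot, and that the discharge unit removes exactly one threshold's worth of charge per pulse (reset by subtraction; discharging fully to zero would be the alternative). The class and function names are hypothetical.

```python
from typing import Sequence


class IntegrateAndFireSketch:
    """Discrete-time idealization of the calculation component (assumption, not the circuit)."""

    def __init__(self, threshold: int = 4) -> None:
        self.threshold = threshold  # integration-unit threshold, in charge units
        self.v = 0                  # integrated signal held by the integration unit
        self.in_reset = False       # state of the reset unit

    def set_reset(self, active: bool) -> None:
        # Reset unit: while active, clear the integrator and force the output to zero.
        self.in_reset = active
        if active:
            self.v = 0

    def step(self, switch_inputs: Sequence[int]) -> int:
        """Advance one time slot and return the number of pulses released in it."""
        if self.in_reset:
            return 0
        # Switch unit: assume each input at "1" closes one switch adding one charge unit.
        self.v += sum(1 for s in switch_inputs if s)
        # Pulse release unit fires on each threshold crossing; the discharge unit
        # removes exactly that much charge (reset by subtraction, an assumption).
        pulses = self.v // self.threshold
        self.v -= pulses * self.threshold
        return pulses


def count_pulses(n_high: int, n_inputs: int, slots: int, threshold: int = 4) -> int:
    """Output component: count pulses over `slots` time slots with n_high inputs held at 1."""
    neuron = IntegrateAndFireSketch(threshold)
    inputs = [1] * n_high + [0] * (n_inputs - n_high)
    return sum(neuron.step(inputs) for _ in range(slots))


if __name__ == "__main__":
    print([count_pulses(n, 4, 100) for n in range(5)])  # [0, 25, 50, 75, 100]
```

The pulse count rises in equal steps with the number of closed switches, which is the linear input-output relationship the counter relies on.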

The calculation component exhibits a linear relationship between its output signal and its input signal. When the input signal is binary, a "1" turns the switch on (or off) and a "0" turns it off (or on), the two switch states being opposite; the number of switches turned on is linearly related to the output signal, which is a pulse signal with a certain frequency and duty cycle.
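The linearity can be made explicit for an idealized integrator. The symbols below are assumptions introduced for illustration: each of the $n$ closed switches sources a constant current $I_0$ into an integration capacitance $C_{\mathrm{int}}$, the threshold is $V_{\mathrm{th}}$, the discharge is instantaneous, and leakage is neglected.

```latex
\begin{aligned}
V(t) &= \frac{n I_0}{C_{\mathrm{int}}}\, t
  &&\text{(integrator voltage with } n \text{ switches on)}\\
t_{\mathrm{fire}} &= \frac{C_{\mathrm{int}} V_{\mathrm{th}}}{n I_0}
  &&\text{(time to reach the threshold)}\\
f &= \frac{1}{t_{\mathrm{fire}}} = \frac{I_0}{C_{\mathrm{int}} V_{\mathrm{th}}}\, n
  &&\text{(pulse frequency, linear in } n\text{)}\\
N_T &\approx f\,T = \frac{I_0 T}{C_{\mathrm{int}} V_{\mathrm{th}}}\, n
  &&\text{(pulses counted in window } T\text{)}
\end{aligned}
```

Under these assumptions the proportionality constant $I_0 T/(C_{\mathrm{int}} V_{\mathrm{th}})$ is fixed by the circuit and the counting window, so the count is a linear function of the number of high inputs.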

The calculation component is further characterized by a linear relationship between its output and input signals: when the input signal is an analog current, it does not pass through the switch unit but enters the integration unit directly, and the pulse frequency of the output signal is linearly related to the magnitude of the input analog current.

The output component is a digital counter whose input signals are a clock signal and/or a reset signal, or the output of the pulse release unit of the calculation component.

The operation unit requires that the number of high levels (or low levels) at the input of the calculation component be linearly related to the number of pulses counted by the counting unit of the output component. A calculation component that does not include every unit described for the calculation component of the computing unit in the embodiments is also applicable to the operation unit of the embodiments, provided it satisfies this relationship.

The technical scheme of the present invention is as follows:

An operation method and an operation device are provided. The operation method has two steps: step 1, map signal X and signal Y to obtain their mapped forms; step 2, compute on the mapped forms of signal X and signal Y to obtain signal Q, where signal Q is the result of an algebraic operation on signal X and signal Y.

According to one embodiment, signal X is mapped in binary form and signal Y is mapped in pulse width modulated form.

According to one embodiment, a logical AND operation is performed on the mapped signals of the signal X and the signal Y.

According to an embodiment, the signal Q is in the form of a pulse signal with a certain frequency and a duty cycle.

According to an embodiment, when signal X is mapped in binary form, the operation further includes shifting the binary signal itself before and after the logical AND operation.

According to an embodiment, when signal Y is mapped in pulse width modulation form, the high-level duration of the pulse signal is determined by the signal itself, the high level being continuous with no intervening low level.

According to an embodiment, the result of the operation on signal X and signal Y is further mapped to obtain signal Q.

The computing device includes: an input component for inputting signal X or signal Y, connected to the mapping component of the computing device; a capturing component for capturing and quantizing the target field of view to obtain, as signal X or signal Y, image data with a certain bit width, a time-domain pulse sequence, or an analog current of a certain magnitude (when the capturing component outputs image data or a pulse sequence, it is connected to the mapping component of the device; when it outputs an analog current, that current bypasses the mapping and interconnection components and is connected to the integration unit of the calculation component, whose output then directly represents the magnitude of the current); a mapping component for mapping signal X or signal Y into the forms described in the embodiments of the present invention; an interconnection component for performing the operations described in the embodiments on the mapped signals of X and Y, where multiple mapping components are combined according to a combination scheme as described in the embodiments, the interconnection component being connected to the calculation components of the device; and a calculation component for computing on the signals output by the interconnection component, generally comprising a switch unit, an integration unit, a reset unit, a discharge unit and a pulse release unit, and connected to the output component of the device;

The output component is used to collect the output signal of the calculation component, and generally consists of a counting unit.

According to one embodiment, the input component of signal X is a register file, and the input component of signal Y is the I/O of the ASIC chip.

According to one embodiment, signal X serves as the excitation input unit with a bit width of 4 bits, and its mapping component is simply its input component; that is, the output signal of the mapping component of signal X is the output signal of the input component of signal X. Signal Y serves as the weight unit, also with a bit width of 4 bits; its mapping component is a pulse width modulation waveform generator, which converts each 4-bit weight into a PWM waveform.
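Why ANDing a binary excitation with a PWM weight yields a product can be checked with the sketch below. It reuses an assumed 15-slot frame for 4-bit weights. The bit-serial treatment of a 4-bit excitation (one frame per bit plane, each count shifted by the bit position) is a hypothetical illustration in the spirit of the shift operation mentioned in the claims, not the circuit described here; all function names are illustrative.

```python
N_BITS = 4
FRAME_LEN = 2 ** N_BITS - 1  # assumed frame length for 4-bit PWM weights


def pwm_map(w: int) -> list[int]:
    """4-bit weight -> PWM frame with one continuous high run of w slots."""
    return [1] * w + [0] * (FRAME_LEN - w)


def and_count(x_bit: int, w: int) -> int:
    """Hold the excitation bit for the whole frame, AND it slot by slot with
    the PWM weight and count the high slots; the count equals x_bit * w."""
    excitation = [x_bit] * FRAME_LEN
    return sum(e & p for e, p in zip(excitation, pwm_map(w)))


def bit_serial_product(x: int, w: int) -> int:
    """Hypothetical bit-serial scheme for a 4-bit excitation: one frame per
    bit of x, each count shifted left by the bit position and summed."""
    return sum(and_count((x >> b) & 1, w) << b for b in range(N_BITS))


if __name__ == "__main__":
    print(and_count(1, 9), and_count(0, 9))  # 9 0
    print(bit_serial_product(11, 13))        # 143, i.e. 11 * 13
```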

According to one embodiment, signal X consists of 1024 4-bit register files, which serve as the excitation inputs of the interconnection component, and signal Y consists of 9 pulse width modulation generators, which serve as its weight inputs. The interconnection scheme follows the convolutional connection used in machine learning algorithms: the output signals of the 1024 mapping components of signal X are arranged into a 32×32 matrix, and the output signals of the 9 mapping components of signal Y are arranged into a 3×3 matrix. After "zero padding" is applied around the 32×32 matrix of signal X, the 3×3 matrix of signal Y is combined with it in a "point-to-point" scheme that covers all elements, with a 3×3 window and a stride of 1, and each corresponding pair of signals is logically ANDed. The resulting 1024×9 output signals are used as the input signals of the computing components of the device.
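The convolutional interconnection can be sketched in Python as follows. For readability the excitation map is assumed to be binary (one bit per register file for the duration of a frame); the helper names, the 15-slot PWM frame and the row-major numbering of the 3×3 weights are illustrative assumptions. Each output position of the zero-padded 32×32 map receives 9 ANDed pairs, giving the 1024×9 signals mentioned above, and counting high slots per position reproduces a stride-1 convolution with "same" padding (in the cross-correlation sense used by machine learning frameworks).

```python
FRAME_LEN = 15  # assumed 4-bit PWM frame length


def pwm_map(w: int) -> list[int]:
    return [1] * w + [0] * (FRAME_LEN - w)


def conv_pairs(x: list[list[int]], k: int = 3) -> list[list[tuple[int, int]]]:
    """For each output position, list the k*k (excitation, weight index) pairs
    routed to one computing component: zero padding k // 2, stride 1."""
    n, m, p = len(x), len(x[0]), k // 2
    padded = [[0] * (m + 2 * p) for _ in range(n + 2 * p)]
    for i in range(n):
        for j in range(m):
            padded[i + p][j + p] = x[i][j]
    return [
        [(padded[r + u][c + v], u * k + v) for u in range(k) for v in range(k)]
        for r in range(n)
        for c in range(m)
    ]


def conv_by_counting(x: list[list[int]], w: list[int], k: int = 3) -> list[int]:
    """Count high slots of (excitation AND PWM weight) over all pairs of each
    window; one count per output position, in row-major order."""
    results = []
    for group in conv_pairs(x, k):
        count = 0
        for x_bit, idx in group:
            count += sum(x_bit & s for s in pwm_map(w[idx]))
        results.append(count)
    return results


if __name__ == "__main__":
    ones = [[1] * 32 for _ in range(32)]
    pairs = conv_pairs(ones)
    print(len(pairs), len(pairs[0]))  # 1024 9
    out = conv_by_counting(ones, [1] * 9)
    print(out[0], out[1], out[33])    # 4 6 9 (corner, edge, interior)
```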

According to one embodiment, signal X consists of nine 4-bit register files, which serve as the excitation inputs of the interconnection component, and signal Y consists of 36 pulse width modulation generators, which serve as its weight inputs. The interconnection scheme follows the fully connected mode used in machine learning algorithms: the output signals of the nine mapping components of signal X are logically ANDed with the output signals of nine mapping components of signal Y. Because signal Y has 36 mapping-component outputs, the logical AND is executed 4 times in parallel within the same time period, and the resulting 36 output signals are used as the input signals of the computing components of the computing device.
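The fully connected case can be sketched the same way. Here the 36 weights are assumed to be stored row-major as W[j][k] (j = 0..3 selects the computing component, k = 0..8 the input), the excitations are again taken as binary, and the 15-slot frame and function names are illustrative assumptions.

```python
FRAME_LEN = 15  # assumed 4-bit PWM frame length
N_IN = 9        # excitation signals A_1..A_9


def pwm_map(w: int) -> list[int]:
    return [1] * w + [0] * (FRAME_LEN - w)


def fully_connected_by_counting(a: list[int], w_flat: list[int]) -> list[int]:
    """AND each excitation bit a[k] with PWM weight W[j][k]; the four groups of
    nine AND gates work in parallel in the same frame, and each group's total
    high-slot count is one fully connected output."""
    rows = [w_flat[j * N_IN:(j + 1) * N_IN] for j in range(len(w_flat) // N_IN)]
    outputs = []
    for row in rows:
        outputs.append(
            sum(a_k & s for a_k, w_jk in zip(a, row) for s in pwm_map(w_jk))
        )
    return outputs


if __name__ == "__main__":
    a = [1, 0, 1, 1, 0, 0, 1, 0, 1]
    w_flat = [k % 16 for k in range(36)]
    print(fully_connected_by_counting(a, w_flat))
```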

According to an embodiment, a single computing component consists of 9 switch units, an integration unit, a reset unit, a discharge unit and a pulse release unit. In this embodiment there are 1024 computing components in total, i.e., 1024×9 input signals, which are the 1024×9 output signals of the interconnection component. The outputs of the 1024 computing components are 1024 pulse signals, each with its own frequency and duty cycle.

According to one embodiment, the 1024 pulse signals output by the 1024 computing components, each with its own frequency and duty cycle, are input to 1024 output components, which are digital counters. Each counter counts the pulses of its input signal during the time period in which the pulse width modulation waveform generators produce their waveforms. The counts of the 1024 digital counters are the result of the convolution of signal X with signal Y.
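The computing component and its counter can be modelled behaviourally as an integrate-and-fire neuron. The sketch below is an idealization, not the transistor-level circuit: each closed switch unit is assumed to add one unit of charge per time slot, a pulse is released whenever the accumulated charge reaches the threshold, and firing subtracts the threshold charge (the real reset and discharge behaviour may add quantization error). The names `neuron_and_counter`, `q_unit` and `threshold` are hypothetical.

```python
FRAME_LEN = 15  # assumed 4-bit PWM frame length


def pwm_map(w: int) -> list[int]:
    return [1] * w + [0] * (FRAME_LEN - w)


def neuron_and_counter(slots: list[list[int]], q_unit: int = 1, threshold: int = 1) -> int:
    """slots[t] holds the binary levels on the switch-unit gates in slot t.

    Integration: charge grows by q_unit per high input.  Pulse release: while
    the charge is at or above threshold, emit a pulse and remove threshold
    charge.  The returned value is what a counter enabled for the whole frame
    would read.
    """
    charge = 0
    count = 0
    for levels in slots:
        charge += q_unit * sum(levels)
        while charge >= threshold:
            charge -= threshold
            count += 1
    return count


if __name__ == "__main__":
    a = [1, 1, 0, 1, 0, 1, 1, 1, 0]             # binary excitations
    w = [3, 15, 7, 0, 9, 2, 5, 11, 1]           # 4-bit weights
    waves = [[a_k & s for s in pwm_map(w_k)] for a_k, w_k in zip(a, w)]
    slots = [list(col) for col in zip(*waves)]  # 9 gate levels per time slot
    print(neuron_and_counter(slots), sum(x * y for x, y in zip(a, w)))  # 36 36
```

With `threshold` equal to `q_unit` the count equals the dot product of excitations and weights exactly; a larger threshold scales the count to the floor of that product times `q_unit / threshold`, which is the linear relationship the counting circuit relies on.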

The present invention provides a computing method and device in the field of integrated circuits that can implement operations such as full connection and convolution in neural networks. Compared with the multipliers and accumulators that traditional convolutional neural network acceleration chips use to implement MAC operations, the computing ASIC chip provided by the present invention has significant advantages in area and power consumption.

Description of drawings

Fig. 1 is a block diagram of a device for computing in an embodiment of the present invention;

Fig. 2 illustrates the mapping of the output signal of the input component into pulse width modulation form in the present invention;

Fig. 3 is a circuit block diagram of the computing component in Embodiment 1 of the present invention;

Fig. 4 is a circuit block diagram of the computing component in Embodiment 2 of the present invention;

Fig. 5 is a circuit block diagram of the photographing component that outputs a time-domain pulse sequence in Embodiment 3 of the present invention;

Fig. 6 is a schematic diagram of the fully connected scheme connecting the weight signal inputs and the excitation signal inputs of the interconnection component in Embodiment 4 of the present invention.

Detailed description of embodiments

The calculation method and device according to the embodiments of the present invention will be described below with reference to the accompanying drawings. In the drawings, like reference numerals denote like elements throughout. It should be understood that the embodiments described herein are illustrative only and should not be construed as limiting the scope of the present invention.

Example 1

This embodiment provides a specific circuit block diagram of the computing component in the computing device described above, as shown in Fig. 3.

The circuit uses 9 PMOS transistors as switch units. Their gate terminals are connected to the output signals of the interconnection component of the computing device, their source terminals are fixedly biased (generally at I_DC), and their drain terminals are connected to the inverting input of an operational amplifier. The non-inverting input of the operational amplifier is connected to the reference voltage V_ref, and its output is connected to the gate terminal of a PMOS transistor whose source terminal is connected to the inverting input of the operational amplifier. The operational amplifier thus clamps the source voltage of this PMOS transistor at a fixed value. The drain terminal of this PMOS transistor is connected to the drain terminal of the NMOS transistor in one branch of a cascode current mirror. Through the copying action of the current mirror, the current in that branch is replicated onto the upper plate of the capacitor that serves as the integration unit; the lower plate of the capacitor is connected to VSS. The upper plate of the capacitor is also connected to the drain terminal of an NMOS reset transistor that serves as the reset unit; the source terminal of this transistor is connected to VSS and its gate terminal to an external reset signal. When the reset signal is high, the computing component is reset.
The gate of the NMOS discharge transistor of the integration discharge unit is connected to the upper plate of a stabilizing capacitor, and its source terminal is connected to VSS. The PMOS extraction transistor, which extracts charge from the upper plate of the integration capacitor, has its source terminal connected to VDD and its drain terminal connected to the drain terminal of the NMOS discharge transistor. The drain terminal of the NMOS discharge transistor drives a two-stage inverter chain: the output of the first inverter is connected to the gate terminal of the PMOS extraction transistor, and the output of the second inverter is connected to the upper plate of the stabilizing capacitor. In addition, the upper plate of the integration capacitor is connected to a further two-stage inverter chain that serves as the pulse release unit; its output is the output of this embodiment of the computing component, thereby realizing the pulse release function.
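An idealized estimate of the pulse rate illustrates why this component behaves as a linear neuron. It assumes that each conducting switch unit contributes the same mirrored current I_u, that a pulse is released each time the integration capacitor C charges through a fixed voltage swing ΔV, and that the reset/discharge time t_d is small; the symbols I_u, ΔV, t_d and τ (the duration of one PWM time slot) are introduced here and are not values from the specification.

```latex
% k(t): number of switch units conducting at time t
I(t) = k(t)\, I_u
% time to charge C through \Delta V at constant k
t_c = \frac{C\,\Delta V}{k\, I_u}
% pulse frequency, including the reset/discharge time t_d
f = \frac{1}{t_c + t_d} \approx \frac{k\, I_u}{C\,\Delta V} \qquad (t_d \ll t_c)
% counter reading over a window T, with binary excitations x_i and PWM
% weights w_i that stay high for w_i slots of duration \tau each
N \approx \frac{I_u}{C\,\Delta V}\int_0^T k(t)\,dt
  = \frac{I_u\,\tau}{C\,\Delta V}\sum_{i} x_i\, w_i
```

When t_c is no longer much larger than t_d, f saturates below the linear estimate.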

Example 2

This embodiment provides another specific circuit block diagram of the computing component in the computing device described above, as shown in Fig. 4. It replaces only the circuit parts of Embodiment 1 that charge the upper plate of the integration capacitor of the integration unit (the cascode current mirror, the operational amplifier, the operational amplifier's output transistor, the 9 PMOS switch transistors and so on) and retains the other circuit components of Embodiment 1. The current mirror in the figure has 9 copy branches, and each branch has a transmission gate that serves as a switch unit controlling whether the branch current flows. The drain terminals of the PMOS transistors are connected to the upper plate of the integration capacitor to charge it. The structure and function of the remaining circuit parts are the same as in Embodiment 1.

Example 3

This embodiment provides a specific circuit implementation of the photographing component described above that outputs a time-domain pulse sequence, as shown in Fig. 5. When the Gate signal is high, the SPAD device is in the overvoltage state, and the Gate signal, applied to the gate terminal of transistor N1, holds the drain voltage of N1 at 0 V. When the SPAD device receives a photon, an avalanche occurs in the device. When the Gate signal goes low, N1 is turned off and can then be regarded as a resistor, so a divided voltage V appears at its drain terminal. When this divided voltage takes the SPAD out of the overvoltage operating condition, the drain voltage of N1 returns from V to 0, so that a voltage pulse is formed at the drain terminal of N1. After passing through a buffer composed of N2, P2, N3 and P3, this pulse is converted into a square-wave signal S with a certain pulse width. The square-wave signal S is then ORed with a copy of itself that has passed through an inverter, yielding a square-wave signal OUT with adjustable pulse width, which is the output signal of the circuit. The pulse width of OUT is controlled by the gate bias voltage Vm of transistor P1. Therefore, when the circuit operates at a given clock frequency, it outputs a time-domain pulse sequence according to the photon signal sensed by the SPAD device.
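The time-domain pulse sequence produced by this circuit can be mimicked with a simple statistical model, which is an assumption rather than part of the embodiment: photons are taken to arrive as a Poisson process of rate λ, at most one output pulse is produced per clock cycle, and a cycle yields a pulse if at least one photon is detected during a gate-on window of length t_gate, which happens with probability 1 - exp(-λ·t_gate). Dead time, dark counts and detection efficiency are ignored (or folded into λ), and the function name is hypothetical.

```python
import math
import random


def spad_pulse_sequence(photon_rate: float, t_gate: float, n_cycles: int, seed: int = 0) -> list[int]:
    """Return one bit per clock cycle: 1 if the SPAD fired during that cycle's
    gate window (Poisson arrivals assumed), else 0."""
    rng = random.Random(seed)
    p_fire = 1.0 - math.exp(-photon_rate * t_gate)
    return [1 if rng.random() < p_fire else 0 for _ in range(n_cycles)]


if __name__ == "__main__":
    dim = spad_pulse_sequence(photon_rate=1e5, t_gate=1e-6, n_cycles=10_000)
    bright = spad_pulse_sequence(photon_rate=2e6, t_gate=1e-6, n_cycles=10_000)
    # A counter over the same window reads more pulses for the brighter scene.
    print(sum(dim), sum(bright))
```

Such a sequence can be passed to the mapping component unchanged, which corresponds to the mapping form in which the output of the photographing component is kept as-is.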

Example 4

This embodiment provides the connection scheme between the signals A_i of the nine 4-bit register files described above, which serve as the excitation inputs of the interconnection component, and the signals W_{j,k} of the 36 pulse width modulation generators, which serve as its weight inputs. The 9 signals A_i are logically ANDed with 9 signals W_{j,k} to obtain 9 output signals. Since there are 36 signals W_{j,k} in total, the logical AND produces 4×9 output signals. These 36 output signals are the input signals of the subsequent computing components: four computing components are required, each provided with 9 signal input ports connected to the outputs of the interconnection component.

In the embodiments above, the circuit connections and the signal input and output process are illustrated mainly using a particular number of switch units. In practical applications, any number of switch units may be provided in the circuit of the embodiments of the present invention according to the needs of the application scenario, without limitation. For example, in one example the circuit may include 4 or 5 switch units; in another example it may include 8 switch units. In these examples, the specific circuit structure and the connections between switch units may be similar to those shown in Embodiments 1 to 4 and are not repeated here.

Of course, the specific embodiments above are examples and not limitations. Those skilled in the art may, according to the concept of the present invention, combine steps and devices from the separately described embodiments to achieve the effects of the present invention; embodiments formed by such combinations are also included in the present invention and are not described here one by one.

Note that the merits, advantages, effects, etc. mentioned in the present invention are only examples rather than limitations, and they should not be considered to be necessarily possessed by every embodiment of the present invention. In addition, the specific details disclosed above are only for the purposes of illustration and ease of understanding, not limitation, and they do not require that the present invention be implemented using those specific details.

The block diagrams of components, apparatuses, devices and systems involved in the present invention are only illustrative examples and are not intended to require or imply that they must be connected, arranged or configured in the manner shown. As those skilled in the art will appreciate, these components, apparatuses, devices and systems may be connected, arranged and configured in any manner. Words such as "including", "comprising" and "having" are open-ended, mean "including but not limited to", and may be used interchangeably with it. As used herein, the words "or" and "and" mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. As used herein, the phrase "such as" means "such as but not limited to" and may be used interchangeably with it.

The flow charts of the steps in the present invention and the above method descriptions are only illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by those skilled in the art, the steps in the above embodiments may be performed in any order. Words such as "thereafter," "then," and "next" are not intended to limit the order of the steps; they are simply used to guide the reader through the description of the methods. In addition, any reference to an element in the singular, e.g., using the articles "a," "an," or "the," is not to be construed as limiting that element to the singular.

In addition, the steps and devices in the various embodiments herein are not limited to implementation in any one embodiment. In fact, some steps and devices of the various embodiments herein may be combined according to the concept of the present invention to conceive new embodiments, which also fall within the scope of the present invention.

Each operation of the method described above may be performed by any suitable means capable of performing the corresponding function. The means may comprise various hardware and/or software components and/or modules, including but not limited to circuits, application specific integrated circuits (ASICs) or processors.

A general purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof may be used to implement or perform the various illustrated logic blocks, modules and circuits. A general-purpose processor may be a microprocessor, but in the alternative the processor may be any commercially available processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the present invention may be embedded directly in hardware, in a software module executed by a processor, or in a combination of both. A software module may reside in any form of tangible storage medium. Some examples of storage media that may be used include random access memory (RAM), read only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disks, removable disks, CD-ROMs, and the like. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A software module may be a single instruction, or many instructions, and may be distributed over several different code segments, between different programs and across multiple storage media.

Methods of the invention herein comprise one or more acts for carrying out the described methods. The methods and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and/or use of specific actions may be modified without departing from the scope of the claims.

The described functions may be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions may be stored as one or more instructions on a tangible computer-readable medium. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc.

Accordingly, a computer program product can perform the operations presented herein. For example, such a computer program product may be a computer-readable tangible medium having instructions tangibly stored (and/or encoded) thereon that are executable by one or more processors to perform the operations described herein. A computer program product may include packaging materials.

Software or instructions may also be transmitted via transmission media. For example, software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, or microwave.

In addition, modules and/or other appropriate means for performing the methods and techniques described herein may be downloaded and/or otherwise obtained by user terminals and/or base stations as appropriate. For example, such a device can be coupled to a server to facilitate the transfer of the means for performing the methods described herein. Alternatively, the various methods described herein can be provided via storage means (e.g., RAM, ROM, or a physical storage medium such as a CD or floppy disk), so that a user terminal and/or base station can obtain the various methods when the storage means is coupled to or provided to the device. In addition, any other suitable technique for providing the methods and techniques described herein to a device may be utilized.

Other examples and implementations are within the scope and spirit of the invention and the appended claims. For example, due to the nature of software, the functions described above can be implemented using software executed by a processor, hardware, firmware, hardwiring or any combination of these. Features implementing functions may also be physically located at various positions, including being distributed so that portions of functions are implemented at different physical locations. Also, as used herein, including in the claims, "or" used in a list of items beginning with "at least one of" indicates a disjunctive list, such that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.

Various changes, substitutions and alterations to the technology described herein can be made without departing from the teaching of the technology as defined by the appended claims. Furthermore, the scope of the claims of the present invention is not limited to the specific aspects of the process, machine, manufacture, composition of matter, means, method and acts described above. Any process, machine, manufacture, composition of matter, means, method or act, currently existing or later developed, which performs substantially the same function or achieves substantially the same result as the corresponding aspect described herein may be utilized. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.

The above description of aspects of the invention is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features of the invention herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the invention to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims

An operation method, characterized in that it includes at least one of an electrical signal input component and an optical signal generation component, as well as a mapping component, an interconnection component, a calculation component, and an output component, and:

The electrical signal input component includes at least one electrical signal A input terminal for inputting the electrical signal A;

The optical signal generating component photographs the target field of view, or receives the optical signal of another optical signal photographing or generating component, to generate the optical signal B;

The mapping component is connected to the electrical signal input component and/or the optical signal generating component respectively, receives the electrical signal A of the electrical signal input component and/or the optical signal B of the optical signal generating component, and maps the electrical signal A and/or optical signal B to generate digital signals respectively representing the values of electrical signal A and/or optical signal B;

The interconnection component selects some of the digital signals output by the mapping component for electrical signal A and/or optical signal B as the excitation input part and the rest as the weight input part, and cross-links the excitation input part with at least part of the weight input part to form a multiply-accumulate (MAC) architecture; for the fully connected structure in machine learning neural network algorithms, the required subset is taken, corresponding to the operation to be performed on the digital signals of electrical signal A and optical signal B after the mapping component, and the output of the subset includes the required operation results of the excitation input part and the weight input part.
The method according to claim 1, wherein the calculation component includes a switch unit, an integration unit, a reset unit, an integration discharge unit and a pulse release unit, is connected to the interconnection component, and receives from the interconnection component the operation results of the excitation input part and the weight input part to obtain the required target quantity result.
The method according to claim 1, characterized in that, based on the fully connected architecture, weight values are input and a fully connected sub-architecture corresponding to the required operation under the multiply-accumulate architecture is generated from the set of weight values; under this sub-architecture, a set of excitation signals is processed to obtain the required target value, wherein the operation is performed on K weight signals and the excitation signals, and:

The number of weight signals is K = N×M, where M is the number of output signals of the mapping components of the excitation input signals of the fully connected operation in the algorithm, M being 0 or a positive integer, and N is the number of interconnection components connected to the computing components in the computing device, N being 0 or a positive integer;

The output signals of the mapping components of the weight input signals of the fully connected operation are arranged into an N×M matrix, each row of which has M elements; each row is logically ANDed, element by element, with the M output signals of the mapping components of the excitation input signals of the fully connected operation in the algorithm, and the output signals of the interconnection component are used as the input signals of the calculation component of the corresponding row.
The method of claim 1, wherein the interconnection component combines data from the mapping component using convolutional connections in the algorithm, and:

When there are multiple mapping-component output signals, their number is denoted Z, and the Z signals are divided into two groups, one of Q signals and the other of P signals, with Z = Q + P; the output signals of the Q mapping components are set as the excitation input signals of the convolution operation in the machine learning algorithm, and the output signals of the P mapping components are set as the weight input signals of that convolution operation; and the method further includes:

arranging the output signals of the Q mapping components into a √Q×√Q matrix (√Q an integer) and the output signals of the P mapping components into a √P×√P matrix (√P an integer, √P ≤ √Q); performing a logical AND between the √P×√P matrix and each √P×√P sub-matrix of the √Q×√Q matrix, element by element in one-to-one correspondence, with all operations performed simultaneously and in parallel in the time domain; and outputting the results of all operations as the output signals of the interconnection component, which are input into the calculation component as its input signals.
The computing method as claimed in claim 4, for performing the multiply-accumulate operation of Q input signals with Q n-bit weight units, characterized in that:

1) There are Q electrical signal input components and/or optical signal capture components in total, used to input the desired Q excitation input signals;

2) There are Q weight units in total, used to input the desired Q weight input signals, the value of each weight being represented by n bits;

3) The Q binary excitation signals are respectively ANDed with the Q n-bit weight signals to obtain the values of the Q binary excitation signals modulated by the Q pulse width modulation waveforms; these values are input into the calculation component (linear neuron counting circuit), yielding the result of multiplying the Q input signals by the Q n-bit weight signals and accumulating the products.
The method according to claim 1, characterized in that the mapping form of the mapping component is a binary form, a pulse width modulation form, a form in which the output of the original input component and/or the photographing component is kept unchanged, an analog-to-digital conversion form, a random number form, or the like.
The computing method according to any one of claims 1 to 5, characterized in that the electrical signal A or the optical signal B and the weight part are mapped in binary form, and the operation is a bitwise OR operation of binary numbers.
The calculation method according to any one of claims 1 to 5, wherein the target quantity result obtained by the calculation component is a voltage pulse signal with a certain frequency and duty cycle, or the number of pulses of such a voltage pulse signal counted by a counter within a certain time T.
The computing method according to any one of claims 1 to 5, wherein the electrical signal A or the optical signal B is mapped in binary form, and the mapped binary signal undergoes a shift operation before and/or after the logical AND operation with the weight part.
The computing method according to any one of claims 1 to 6, wherein, when the electrical signal A or the optical signal B is mapped in pulse width modulation form, the duration of the high level of the pulse signal is determined by the magnitude of the signal itself, the high-level signal being continuous with no low-level interruption in the middle.
The computing method according to any one of claims 1 to 6, wherein the electrical signal A or the optical signal B is mapped in random number form, the signal being mapped to a random pulse sequence that emits a high level with a certain probability, the probability being determined by the magnitude of the signal itself.
The computing method according to claim 1, wherein the electrical signal A or the optical signal B is an analog current signal, which directly bypasses the mapping component and the interconnection component and is connected to the input terminal of the integration unit of the calculation component; in this case, the output of the calculation component represents the electrical signal A or optical signal B output by the input component.
A kind of operation method, it is characterized in that, take multiplication and accumulation operation as basic operation module, and the linear operation involved in the machine learning algorithm to be carried out is decomposed into the operation pattern with multiplication and accumulation operation as unit, and the involved nonlinear The operation is represented by the superposition of linear operations, and the involved linear operation units are further decomposed into an operation paradigm with multiplication and accumulation operations as the unit.

And, according to the operation paradigm, the connection structure of the operation components is set, each multiply-accumulate operation module in the connection structure being built by:

1) using an excitation signal input component, and

2) using a weight signal input component,

and further using a mapping component, a calculation component and an output component, wherein the mapping component is connected to the excitation signal input component and to the weight signal input component to map the excitation signal and the weight signal, respectively; a logical AND operation is performed on the mapped excitation signal and the mapped weight signal, and the result is input to the calculation component; and the calculation component performs combined operations in units of multiply-accumulate operations according to the operation paradigm to obtain the required result.
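The decomposition into a multiply-accumulate operation paradigm can be illustrated with a small Python sketch that breaks a matrix product, the core of a typical linear layer, into independent MAC tasks, one per output element. The names `mac`, `mac_tasks` and `matmul_via_mac` are hypothetical, the sketch assumes non-empty integer matrices, and it covers only the linear part of the method, not the representation of nonlinear operations.

```python
from typing import Iterator, List, Sequence, Tuple

Matrix = List[List[int]]


def mac(excitations: Sequence[int], weights: Sequence[int]) -> int:
    """Basic operation module: one multiply-accumulate over equal-length vectors."""
    if len(excitations) != len(weights):
        raise ValueError("excitations and weights must have the same length")
    return sum(x * w for x, w in zip(excitations, weights))


def mac_tasks(a: Matrix, b: Matrix) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Decompose C = A x B into one independent MAC task per output element."""
    inner = len(b)
    cols = len(b[0])
    for i, row in enumerate(a):
        if len(row) != inner:
            raise ValueError("inner dimensions do not match")
        for j in range(cols):
            column = [b[k][j] for k in range(inner)]
            yield i, j, list(row), column


def matmul_via_mac(a: Matrix, b: Matrix) -> Matrix:
    """Run every MAC task and place each result at its output position."""
    c = [[0] * len(b[0]) for _ in a]
    for i, j, x, w in mac_tasks(a, b):
        c[i][j] = mac(x, w)
    return c
```

Because the tasks are independent, each one could in principle be assigned to its own multiply-accumulate module in the connection structure.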
An arithmetic unit, including at least one of an electrical signal input component and an optical signal generation component, characterized in that:

An electrical signal input component, including at least one input terminal for the electrical signal A, for inputting an analog quantity expressing the electrical signal A;

The optical signal generating component is used to capture the target field of view, or to receive the optical signal of another optical signal component, and to generate an analog quantity of the optical signal B; the optical signal B includes image data of a certain bit width, or a voltage pulse sequence over a time domain, or an analog current of a certain magnitude; the unit also includes:

A mapping component, connected to the electrical signal input component and/or the optical signal generating component, which maps the electrical signal A and/or the optical signal B to generate digital signals representing the values of the electrical signal A and the optical signal B, respectively;

An interconnection component, which adopts a plurality of electrical and/or optical paths arranged in a cross-linked structure, the structure corresponding to the operation to be performed on the digital signals of the electrical signal A and/or the optical signal B output by the mapping component;

A computing component, connected to the interconnection component, which receives from the interconnection component the digital signals corresponding to the electrical signal A and/or the optical signal B; the computing component includes an electrical and/or optical connection structure comprising a suitable switch unit, an integration unit, a reset unit, a discharge unit and a pulse release unit, which together compute on the signal output by the interconnection component and output the result.
The arithmetic unit according to claim 14, used to perform multiply-accumulate (MAC) operations, wherein the interconnection component adopts the fully connected structure of machine learning neural network algorithms to combine the data from the mapping component, and wherein:

1) The number of output signals of the mapping component is set to K, with K=N×M, where M is the number of mapping-component output signals serving as the excitation input signals of the fully connected operation in the machine learning algorithm, and N is the number of interconnection components connected to the computing components in the computing device;

2) The mapping component includes N×M output terminals; the M output elements in each row of the N×M matrix are paired, in a "point-to-point" manner, with the M mapping-component output signals that serve as the excitation inputs of the fully connected operation in the machine learning algorithm, and each pair is logically ANDed; and

3) The output terminal of the interconnection component is connected to the input terminal of the corresponding row of the calculation component, so that the output signal of the interconnection component serves as the input signal of the calculation component of that row.
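The fully connected arrangement above can be sketched in Python for the simplest case of 1-bit excitations and 1-bit weights (an assumption; the claim does not fix the weight width). Each of the N weight rows is ANDed point-to-point with the M excitation signals, and the computing component of that row is idealized as simply counting high levels. The function name `fully_connected_and` is hypothetical.

```python
from typing import List, Sequence


def fully_connected_and(excitation: Sequence[int], weights: Sequence[Sequence[int]]) -> List[int]:
    """One output per row: AND the row with the excitations, then count the highs."""
    m = len(excitation)
    if any(bit not in (0, 1) for bit in excitation):
        raise ValueError("excitation signals must be binary")
    outputs = []
    for row in weights:  # N rows, so the mapping component supplies K = N * M weights
        if len(row) != m or any(bit not in (0, 1) for bit in row):
            raise ValueError("each weight row must have M binary elements")
        # Point-to-point pairing: element k of the row meets excitation k.
        anded = [x & w for x, w in zip(excitation, row)]
        # Idealized computing component of that row: output equals the count of highs.
        outputs.append(sum(anded))
    return outputs
```

With M = 4 excitations and N = 3 rows, `fully_connected_and([1, 0, 1, 1], [[1, 1, 1, 1], [0, 1, 0, 1], [1, 0, 0, 0]])` yields one count per row.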
The computing unit as claimed in claim 14, used for multiply-accumulate (MAC) operations, characterized in that:

The interconnection component combines the data from the mapping component using the convolutional connections of a machine learning algorithm, wherein:

1) A plurality of mapping components have corresponding output terminals and output corresponding output signals; the number of mapping-component output signals is Z, and the mapping components associated with the Z signals are divided into two groups, one of Q and the other of P, with Z=Q+P; the output signals of the Q mapping components are the excitation input signals of the convolution operation, and the output signals of the P mapping components are the weight input signals of the convolution operation;

2) The output signals of the Q mapping components are arranged into a √Q×√Q matrix (√Q an integer), and the output signals of the P mapping components are arranged into a √P×√P matrix (√P an integer, √P≤√Q); the √P×√P matrix and each √P×√P sub-matrix of the √Q×√Q matrix are logically ANDed element by element, in one-to-one correspondence; all operations are performed in parallel, simultaneously in the time domain; and between adjacent operations, the horizontal or vertical offset of the sub-matrix is the stride S, S being a non-negative integer; and

3) The interconnection component includes an output terminal whose output signal serves as the input signal of the computing component.
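The convolutional arrangement can be sketched in Python with binary signals: a √P×√P kernel is ANDed element by element with each √P×√P window of the √Q×√Q input, the window moving by stride S, and each window's high levels are counted. This is a hypothetical illustration (`conv_and` is not named in the claims); it runs the windows sequentially, whereas the claim performs them in parallel, and it assumes S ≥ 1 because a zero stride would not move the window.

```python
from typing import List, Sequence


def conv_and(image: Sequence[Sequence[int]], kernel: Sequence[Sequence[int]], stride: int = 1) -> List[List[int]]:
    """Binary convolution: AND the kernel with each window, count the highs."""
    q, p = len(image), len(kernel)
    if any(len(r) != q for r in image) or any(len(r) != p for r in kernel):
        raise ValueError("image and kernel must be square")
    if p > q:
        raise ValueError("kernel must not be larger than the image")
    if stride < 1:
        raise ValueError("this sketch requires stride >= 1")
    positions = range(0, q - p + 1, stride)  # window offsets along each axis
    out = []
    for top in positions:
        out_row = []
        for left in positions:
            # One-to-one elementwise AND between the kernel and the current window.
            count = sum(image[top + r][left + c] & kernel[r][c]
                        for r in range(p) for c in range(p))
            out_row.append(count)
        out.append(out_row)
    return out
```

For a 3×3 input and a 2×2 kernel with S = 1 the output is 2×2; with a 5×5 input, a 3×3 kernel and S = 2 it is also 2×2.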
The computing unit according to any one of claims 14-16, used for multiply-accumulate operations on M input signals with M n-bit weight units, characterized in that:

1) The total number of electrical signal input components and optical signal capture components is M, and the M input terminals are used to input the M desired excitation input signals, respectively;

2) There are M weight input terminals, whose input quantities represent the M n-bit weight units, respectively; the M binary excitation signals are logically ANDed with the M n-bit weight units, respectively, and the resulting M pulse-width-modulated waveforms are input into the calculation component, which is composed of a linear neuron counting circuit, to obtain the result of multiplying the M excitation input signals by the M n-bit weight units and accumulating the products.
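The multiply-accumulate flow of this claim can be sketched in Python. Assumptions not fixed by the claim: each n-bit weight w becomes a PWM waveform of 2^n - 1 time slots whose first w slots are high, each binary excitation is held constant over the window, and the linear neuron is idealized as releasing `gain` pulses per high input in each slot, so the counter total is proportional to the number of high levels. All names are hypothetical.

```python
from typing import List, Sequence


def pwm_waveform(value: int, n_bits: int) -> List[int]:
    """Continuous high level of `value` slots in a window of 2**n_bits - 1 slots."""
    slots = (1 << n_bits) - 1
    if not 0 <= value <= slots:
        raise ValueError("weight must fit in n_bits")
    return [1] * value + [0] * (slots - value)


def linear_neuron_mac(excitations: Sequence[int], weights: Sequence[int],
                      n_bits: int = 4, gain: int = 1) -> int:
    """AND binary excitations with PWM weights, then count the neuron's pulses."""
    if len(excitations) != len(weights):
        raise ValueError("need one weight per excitation")
    if any(x not in (0, 1) for x in excitations):
        raise ValueError("excitations must be binary")
    # AND stage: an excitation of 1 passes its weight's waveform, 0 blocks it.
    waves = [[x & level for level in pwm_waveform(w, n_bits)]
             for x, w in zip(excitations, weights)]
    counter = 0
    for slot in zip(*waves):  # walk the time window slot by slot
        high_inputs = sum(slot)
        # Idealized linear neuron: pulse count proportional to the number of highs.
        counter += gain * high_inputs
    return counter
```

For example, `linear_neuron_mac([1, 0, 1], [5, 9, 3])` counts 5 + 3 = 8 pulses, which equals 1×5 + 0×9 + 1×3.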
The computing unit according to claim 14, wherein the type of the mapping component is one of binary type, pulse width modulation type, analog-to-digital conversion type, and random number type.
The computing unit according to claim 14, wherein the electrical signal input component can be a general-purpose input/output interface of an SOC, ASIC or FPGA chip; a storage unit such as a register file, SRAM, DRAM or RRAM; or another component for inputting analog current signals or analog voltage signals.
The computing unit according to claim 14, wherein the optical signal generation component is an image sensor chip, such as a CIS chip, a CCD chip or a single-photon avalanche diode array, for sensing optical signals and converting them into a binary image signal of a certain bit width, a time-domain pulse sequence, or an analog current signal.
The computing unit according to claim 19 or 20, wherein the mapping component is used to map the output signal of the input component as claimed in claim 18; when the output signal of the input component is a binary signal, the mapping component is equivalent to the input component, that is, the signal is passed through unchanged and the mapping component is replaced by the input component;

When the output signal of the input component is an analog voltage signal, the mapping component can be an analog-to-digital conversion device connected to the subsequent interconnection component; alternatively, the mapping component is replaced by the input component, so that the output signal of the input component serves as the output signal of the mapping component and is used directly as the input signal of the computing component of the computing device;

When the output signal of the input component is an analog current signal, the analog current signal bypasses the mapping component and the interconnection component and is connected to the integration unit of the computing component in the computing unit as claimed in claim 14, and the output of the computing component then characterizes the magnitude of the input analog current signal.
The arithmetic unit according to claim 18, wherein the mapping component can be a pulse width modulation waveform generating device that outputs a number of high-level signals corresponding to the value of the binary signal output by the input component; the high-level signal is continuous, with no low-level interruption in between; and the duration of the device's output signal is determined by the bit width of the binary signal output by the input component.
The computing unit according to claim 18, characterized in that,

The mapping component is a random pulse sequence generating device comprising a linear feedback shift register and a comparator, configured such that, when the output signal of the input component is a binary signal, that binary signal is used as the initial value of the linear feedback shift register; and

In each clock cycle, the linear feedback shift register is clocked and its output value is input to the comparator to be compared with the initial value: if the initial value is greater than the current output value of the linear feedback shift register, the comparator outputs 1; otherwise, it outputs 0;

The comparator thus outputs a random pulse sequence in which the probability of a high-level signal is determined by the binary signal output by the input component, and the length of the pulse sequence is determined by the bit width of that binary signal.
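The LFSR-plus-comparator generator can be sketched in Python. Assumptions: a maximal-length Galois LFSR (toggle masks 0xC for 4 bits and 0xB8 for 8 bits), one full period of 2^n - 1 cycles as the sequence length, and a non-zero seed, since an all-zero LFSR never leaves state 0. The names `lfsr_step` and `lfsr_pulse_sequence` are hypothetical.

```python
from typing import List

# Toggle masks of maximal-length Galois LFSRs (period 2**n - 1).
GALOIS_MASKS = {4: 0xC, 8: 0xB8}


def lfsr_step(state: int, mask: int) -> int:
    """Advance a right-shifting Galois LFSR by one clock cycle."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= mask
    return state


def lfsr_pulse_sequence(value: int, n_bits: int = 4) -> List[int]:
    """Seed the LFSR with `value`; emit 1 whenever value > current LFSR output."""
    if n_bits not in GALOIS_MASKS:
        raise ValueError("no mask configured for this bit width")
    if not 1 <= value < (1 << n_bits):
        raise ValueError("seed must be a non-zero n-bit value")
    mask = GALOIS_MASKS[n_bits]
    state = value  # the input binary signal is the initial value
    pulses = []
    for _ in range((1 << n_bits) - 1):  # length set by the bit width
        state = lfsr_step(state, mask)
        pulses.append(1 if value > state else 0)  # comparator
    return pulses
```

Because a maximal-length register visits every non-zero state exactly once per period, the sequence for seed v contains exactly v - 1 high levels, so the high-level probability is (v - 1) / (2^n - 1).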
The computing unit according to claim 14, wherein the switch unit of the calculation component is configured to receive the output signal of the interconnection component and to output a signal determined by the switch state that this signal sets;

The reset unit is responsible for resetting the calculation component, that is, while the reset signal is active, the output signal of the calculation component is zero; the integration unit integrates the output signal of the switch unit; and when the integrated signal reaches the threshold of the integration unit, the pulse release unit releases a pulse while the discharge unit discharges the integrated signal.
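The cooperation of the switch, integration, reset, discharge and pulse release units can be sketched as a discrete-time integrate-and-fire model in Python. Assumptions: each closed switch adds a fixed integer charge per clock step, and discharge subtracts exactly one threshold per released pulse, a choice that keeps the pulse count linear in the number of closed switches. The class `IntegrateAndFireUnit` and its parameters are hypothetical.

```python
from typing import Sequence


class IntegrateAndFireUnit:
    """Discrete-time model of the calculation component (idealized)."""

    def __init__(self, threshold: int, charge_per_switch: int = 1):
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.charge_per_switch = charge_per_switch
        self.level = 0          # integrated signal
        self.in_reset = False

    def set_reset(self, active: bool) -> None:
        """Reset unit: while active, the integrated signal and the output are zero."""
        self.in_reset = active
        if active:
            self.level = 0

    def step(self, switch_inputs: Sequence[int]) -> int:
        """Advance one clock step and return the number of pulses released."""
        if self.in_reset:
            return 0
        closed = sum(1 for s in switch_inputs if s)      # switch unit
        self.level += closed * self.charge_per_switch   # integration unit
        pulses = self.level // self.threshold           # pulse release unit
        self.level -= pulses * self.threshold           # discharge unit
        return pulses
```

With threshold 4, holding three switches closed for 100 steps releases 75 pulses, and holding two closed releases 50, showing the linear relation to the number of closed switches.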
The computing unit according to claim 14, wherein the output signal of the computing component is linear in its input signal: when the input signal is a binary signal, a "1" puts the switch in one state (on or off) and a "0" puts it in the opposite state; the number of turned-on switches is linearly related to the output signal, which is a pulse signal of a certain frequency and duty cycle.
The computing unit according to claim 14, wherein the output signal of the computing component is linear in its input signal, and when the input signal is an analog current signal, the input signal does not pass through the switch unit but enters the integration unit directly, and the pulse frequency of the output signal is linearly related to the magnitude of the input analog current.
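Under idealized assumptions that the claim does not state (a constant input current I charging an integration capacitance C, a firing threshold V_th, and an instantaneous discharge after each pulse), the linear relation between pulse frequency and input current, and hence the linear pulse count N_T over a counting window T, follows directly:

```latex
V(t) = \frac{1}{C}\int_0^{t} I\,d\tau = \frac{I\,t}{C}, \qquad
t_{\mathrm{fire}} = \frac{C\,V_{\mathrm{th}}}{I}, \qquad
f = \frac{1}{t_{\mathrm{fire}}} = \frac{I}{C\,V_{\mathrm{th}}}, \qquad
N_T \approx f\,T = \frac{T}{C\,V_{\mathrm{th}}}\,I
```

A finite discharge time adds a fixed dead time to each period, so in a real circuit the relation is only approximately linear at high currents.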
The computing unit according to claim 14, wherein the output component is a digital counter whose input signals are a clock signal and/or a reset signal, or the output of the pulse release unit of the computing component.
The computing unit according to claim 14, wherein the number of high levels (or low levels) of the input signal of the calculation component is linearly related to the number of pulses counted per unit time by the counting unit of the output component; a calculation component satisfying this relation need not include all of the units of the calculation component of the computing unit as claimed in claim 14, and is still applicable to the computing unit as claimed in claim 14.