WO2023160735A2 - Operation method and operation unit - Google Patents

Operation method and operation unit

Publication number
WO2023160735A2
Authority
WO
WIPO (PCT)
Prior art keywords
signal
component
input
output
mapping
Prior art date
Application number
PCT/CN2023/099468
Other languages
French (fr)
Chinese (zh)
Other versions
WO2023160735A3 (en
Inventor
闫锋
吴天泽
岳涛
王凯
Original Assignee
南京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京大学
Priority to PCT/CN2023/099468
Publication of WO2023160735A2
Publication of WO2023160735A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the invention relates to a calculation method and a calculation unit, belonging to the field of integrated circuits.
  • the convolution acceleration chips of the traditional convolutional neural network need to use digital modules such as multipliers and accumulators to realize linear operations such as convolution and full connection.
  • the calculation method and device based on the multiplication and addition operation can realize the MAC operation in the traditional neural network based on the linear neuron circuit.
  • the neuromorphic circuit of the present invention can be realized with only about 10 transistors, whereas an 8-bit multiplier requires tens of thousands of transistors. Therefore, in terms of power consumption, area, etc., the disclosed calculation method and chip device for multiply-accumulate operations have great advantages over the multipliers and accumulators used in traditional convolutional neural network acceleration chips to realize MAC operations.
  • the relatively mature and mainstream implementation methods of "integrated storage and computing" in the industry include Flash-based integrated storage and computing, RRAM and MRAM-based integrated storage and computing, etc.
  • a single MAC operation between the off-chip weight and the input excitation can be completed by a single floating gate device, which is also the key to the concept of "integration of storage and calculation". However, this method has certain limitations: the floating gate device completes a single MAC operation and outputs the result as an analog current, which means that the current must be fed into an I-V conversion module connected to an ADC before the signal can be read out.
  • the multiplication and addition operation in the analog domain of the floating gate device array often brings problems such as low signal-to-noise ratio and IR-Drop, the design of a high-precision ADC is relatively complicated, and the ADC also brings constraints such as large area and high power consumption. In comparison, the ASIC chip device for multiplication and addition operations disclosed by the present invention also uses off-chip digital excitation input and off-chip weight input, but realizes a digital readout of the analog-domain pulse voltage output by the neurons, thereby completely avoiding the above limitations.
  • the object of the present invention is to provide a method and device for computing.
  • in the field of hardware acceleration of artificial intelligence algorithms, common hardware-based multiply-accumulate solutions are as follows: CPU operations, GPU operations, FPGA-accelerated operations, and operations on new "integrated storage and calculation" architectures such as RRAM and Flash.
  • the CPU has no obvious advantage in speed over other types of operations in accelerating neural network operations, which is related to the working principle of the CPU itself.
  • when GPU and FPGA operations are used to accelerate linear operations in neural networks, multiply-accumulators designed with digital logic circuits are used. Therefore, when pursuing lower power consumption and higher computing power, there are certain requirements on the chip process node: the smaller the process node, the greater the improvement in computing performance. The drawback is that research and development costs become higher and higher.
  • the neural network accelerator using the digital-analog hybrid design is superior to the neural network accelerator designed only using digital logic circuits in terms of the key index of computing energy efficiency (computing power/power consumption).
  • therefore, new "storage-computing integration" architectures such as RRAM and Flash already occupy a place in the field of AI algorithm acceleration. However, taking the floating-gate-device form of this "storage-computing integration" architecture as an example:
  • the input is a digital control signal
  • the digital control signal is converted into a corresponding analog voltage control signal through the level shift module, which is applied to the port of the device, so that the device operates according to its own electrical and physical characteristics.
  • the result of the operation is output in the form of analog current, which means that the current is input into the I-V conversion module and connected to the ADC to realize the readout of the signal.
  • the multiplication and addition operation in the analog domain of the floating gate array often brings problems such as low signal-to-noise ratio and IR-Drop, and the design of high-precision ADC is relatively complicated, and the ADC will also bring constraints such as large area and large power consumption.
  • the present invention provides a technical solution for multiplication and accumulation operations and the corresponding digital-analog hybrid circuit computing device, which avoids, at the level of scheme design, the limitations of the current digital-analog hybrid "integration of storage and calculation" computing architecture.
  • the device can generally take binary grayscale image data as the excitation input, and the output waveform of a pulse width modulator driven by a binary signal as the weight input.
  • This is also the core of the computing method provided by the present invention, and the working principle on which the corresponding digital-analog hybrid computing device is based. The following is a detailed introduction of an embodiment of the computing device corresponding to the computing method provided by the present invention.
  • an operation method, characterized in that it uses at least one of an electrical signal input component and an optical signal generation component, as well as a mapping component, an interconnection component, a calculation component, and an output component, and: the electrical signal input component includes at least one electrical signal A input terminal for inputting the electrical signal A;
  • the optical signal generating component is used to shoot the target field of view, or to accept the optical signal of another optical signal shooting or generating component, so as to generate the optical signal B;
  • the mapping component is connected to the electrical signal input component and/or the optical signal generating component, respectively, to receive the electrical signal A of the electrical signal input component and/or the optical signal B of the optical signal generating component, and maps the electrical signal A and/or optical signal B to generate digital signals respectively representing them.
  • the interconnection component selects part of the digital signals output by the mapping component for the electrical signal A and/or optical signal B as the excitation input part, and the rest as the weight input part; the excitation input part and at least part of the weight input part are cross-linked to form a multiply-accumulate (MAC) architecture, and the required subset of the fully connected structure in a machine learning neural network algorithm is taken, so as to generate an operation on the digital signals obtained after the electrical signal A and the optical signal B pass through the mapping component; the output of the subset includes the required operation results of the excitation input part and the weight input part.
  • the calculation component includes a switch unit, an integration unit, a reset unit, an integral discharge unit, and a pulse release unit; it is connected to the interconnection component and receives from it the operation results of the excitation input part and the weight input part, so as to obtain the required target result.
  • the weight value is input, and a fully-connected sub-architecture corresponding to the required operation under the multiply-accumulate architecture is generated from the set of weight values.
  • the output signal of the mapping component of the excitation input signal of the full connection operation is correspondingly subjected to a logical AND operation,
  • the √P × √P matrix and each sub-matrix of size √P × √P in the √Q × √Q matrix are multiplied according to the one-to-one correspondence of all elements in the matrix
  • the method of the embodiment of the present invention is used to perform the multiplication and addition operation of Q input signals with Q n-bit weight units, wherein: 1) there are Q electrical signal input components and/or optical signal capturing components in total, used to input the expected Q excitation input signals; 2) there are Q weight units in total, used to input the expected Q weight input signals, the value of each weight being represented by n bits; 3) an AND operation is performed between the Q binary excitation signals and the Q n-bit weight signals respectively, yielding the values of the Q binary excitation signals modulated by the Q pulse width modulation waveforms; these values are input into the calculation component (a linear neuron counting circuit), and the result of multiplying the Q input signals by the Q n-bit weight signals and accumulating is obtained.
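The AND-then-count scheme described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented circuit: the function names `pwm_waveform` and `mac_by_and_and_count` are hypothetical, each excitation is taken to be a single bit, and one PWM period is assumed to last 2^n - 1 time slots (the source only says the duration depends on the bit width).

```python
def pwm_waveform(weight: int, n_bits: int) -> list[int]:
    """Map an n-bit weight to one PWM period: `weight` contiguous high slots, then low."""
    period = 2 ** n_bits - 1  # assumed period length, not stated in the source
    if not 0 <= weight <= period:
        raise ValueError("weight does not fit in n_bits")
    return [1] * weight + [0] * (period - weight)


def mac_by_and_and_count(excitations: list[int], weights: list[int], n_bits: int) -> int:
    """Accumulate the AND of each 1-bit excitation with its weight's PWM waveform."""
    if len(excitations) != len(weights):
        raise ValueError("need one weight per excitation")
    if any(x not in (0, 1) for x in excitations):
        raise ValueError("excitations must be single bits")
    waves = [pwm_waveform(w, n_bits) for w in weights]
    total = 0
    for t in range(2 ** n_bits - 1):
        # Number of AND outputs that are high (switches turned on) in this time slot.
        total += sum(x & wave[t] for x, wave in zip(excitations, waves))
    return total
```

Under these assumptions, counting the high AND outputs over one PWM period gives the same number as the dot product of the excitation bits and the weights, which is the quantity the linear neuron counting circuit is meant to report.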
  • mapping method of the mapping component is a binary format, a pulse width modulation format, a format that maintains the output of the original input component and/or the camera component, an analog-to-digital conversion format, a random number format, and the like.
  • the electrical signal A or the optical signal B and the weight part are mapped in binary form, and the operation is a bitwise OR operation of binary numbers.
  • the result of the target quantity obtained by the calculation component is: a voltage pulse signal with a certain frequency and duty ratio; the counted number of pulses.
  • the electrical signal A or the optical signal B is mapped in a binary form, and before and after the logical AND operation with the weight part is performed, a shift operation is performed on the mapped binary signal.
  • the high-level duration of the pulse signal is determined by the size of the signal itself, and the high-level signal is a continuous signal without low-level signal interruption in the middle.
  • the electrical signal A or the optical signal B is mapped in the form of random numbers, and the signal is mapped to a random pulse sequence that releases a high-level signal subject to a certain probability, wherein the probability is determined by the size of the signal itself.
  • the electrical signal A or the optical signal B is an analog current signal, and the signal directly skips the mapping component and the interconnection component, and is connected to the input end of the integration unit of the computing component.
  • the output result is used to characterize the electrical signal A or optical signal B output by the input component.
  • an operation method is provided, characterized in that: the multiply-accumulate operation is taken as the basic operation module; the linear operations involved in the machine learning algorithm to be carried out are decomposed into an operation paradigm whose unit is the multiply-accumulate operation; the nonlinear operations involved are expressed as a superposition of linear operations, and the linear operation units involved are further decomposed into the operation paradigm whose unit is the multiply-accumulate operation; and the connection structure of the computing components is set according to the operation paradigm
  • in the connection structure of computing components, each multiply-accumulate operation module 1) adopts an excitation signal input component and 2) adopts a weight signal input component, and further adopts a mapping component, a calculation component, and an output component; the mapping component is connected to the excitation signal input component and the weight signal input component respectively and maps the excitation signal and the weight signal; a logical AND operation is performed on the mapped excitation signal and the mapped weight signal, and the operation result is input to the calculation component; the calculation component performs combined operation with multi
  • a computing unit including at least one of an electrical signal input component and an optical signal generating component, characterized in that: the electrical signal input component includes at least one electrical signal A input terminal for inputting an analog electrical quantity expressing the electrical signal A; the optical signal generation component is used for shooting a target field of view, or receiving an optical signal from another optical signal component, and generating an analog quantity of the optical signal B, where the optical signal B includes image data with a certain bit width, or a voltage pulse sequence in the time domain, or an analog current of a certain magnitude; the computing unit also includes a mapping component, which is respectively connected to the electrical signal input component and/or the optical signal generation component, and maps the electrical signal A and/or optical signal B to generate digital signals respectively representing the values of the electrical signal A and optical signal B; the interconnection component adopts a plurality of electrical and/or optical paths and a cross-linked structure, the structure corresponding to the calculation performed on the digital signals of the electrical signal A and/or optical signal B after the mapping component; the calculation component is connected to
  • the computing unit of the embodiment of the present invention is used to perform a multiply-accumulate (MAC) operation, and it is characterized in that: the interconnection component adopts the fully connected structure in the algorithm on the machine learning neural network to combine the data from the mapping component
  • the number of output signals, N is the number of interconnection components connected to the calculation components in the computing device; 2) the mapping component includes N × M output terminals, and the M elements in each row of the N × M output matrix are combined in a "point-to-point" manner with the output signals of the M mapping components that are set as excitation input signals of the fully connected operation in the machine learning algorithm, and logical AND operations are performed, and 3)
  • the output terminal of the interconnection component is connected to the input terminal of a corresponding row of the calculation components, so that the output signal of the interconnection component is
  • the computing unit of the embodiment of the present invention is used to perform multiplication and addition operations of M input signals with M n-bit weight units, and is characterized in that: 1) the total number of the electrical signal input components and the optical signal capture components is M, and the M input terminals are respectively used to input the desired M excitation input signals; 2) there are M weight input terminals, whose inputs respectively represent n-bit weight units; and the M binary excitation signals are logically ANDed with the M n-bit weight units respectively, yielding the values of the M binary excitation signals modulated by the N pulse width modulation waveforms, and these values are input into
  • the multiplication and addition operation results obtained by multiplying and accumulating M excitation input signals by M n-bit weight units are obtained.
  • the type of the mapping component is one of binary type, pulse width modulation type, analog-to-digital conversion type, and random number type.
  • the electrical signal input component can be a general-purpose input/output interface of an SoC, ASIC, or FPGA chip, or a storage unit such as a register file, SRAM, DRAM, or RRAM, or another component used to input an analog current signal or an analog voltage signal.
  • the optical signal generating component is an image sensor chip, such as a CIS chip, a CCD chip, or a single-photon avalanche diode array, which is used to sense the optical signal and convert it into a binary image signal with a certain bit width, a time-domain pulse sequence, or an analog current signal.
  • the mapping component is used to map the output signal of the input component as described in the embodiment of the present invention.
  • the mapping component is equivalent to the input component, that is, no change is made to the input component at this time.
  • the mapping component is replaced by the input component; when the output signal of the input component is an analog voltage signal, the mapping component can be an analog-to-digital conversion device connected to the subsequent interconnection component, or the mapping component can be the input component itself, in which case the output signal of the mapping component is the output signal of the input component and is used directly as the input signal of the computing component of the computing device; when the output signal of the input component is an analog current signal, the analog current signal skips the mapping component and the interconnection component and is connected to the integration unit of the computing component in the computing unit according to the embodiment of the present invention, and the output of the computing component can represent the magnitude of the input analog current signal.
  • the mapping component can be a pulse width modulation waveform generating device, which outputs a number of high-level signals corresponding to the value of the binary output signal of the input component; the high-level signal is continuous, with no low-level signal interrupting it; the duration of the output signal of the device is determined by the bit width of the binary output signal of the input component.
  • the mapping component is a random pulse sequence generating device; the random pulse sequence generating device includes a linear feedback shift register and a comparator, and is set so that, when the output signal of the input component is a binary signal, the binary signal is used as the initial value of the linear feedback shift register; in each clock cycle, the output value of the linear feedback shift register and its initial value are input to the comparator and compared; if the initial value is greater than the current output value of the linear feedback shift register, the output of the comparator is 1, otherwise the output of the comparator is 0; further, the comparator is set to output a random pulse sequence in which the probability of a high-level signal is determined by the binary output signal of the input component, and the length of the pulse sequence is determined by the bit width of the binary output signal of the input component.
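The linear feedback shift register and comparator arrangement described above can be modelled with a short sketch. The details here are assumptions made for illustration: a 4-bit register, the feedback taps chosen in `lfsr4_step` (a maximal-length configuration), a sequence length of 15 cycles, and the substitution of seed 1 when the input value is 0 (an all-zero register would otherwise never change). None of these specifics come from the source.

```python
def lfsr4_step(state: int) -> int:
    """Advance a 4-bit Fibonacci LFSR by one clock (assumed taps: bits 3 and 2)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def random_pulse_sequence(value: int, length: int = 15) -> list[int]:
    """Emit 1 whenever the input value exceeds the current LFSR state, else 0."""
    if not 0 <= value <= 15:
        raise ValueError("value must fit in 4 bits")
    # The text seeds the register with the input value; 0 is replaced by 1
    # here (an assumption) because the all-zero state is a lock-up state.
    state = value if value != 0 else 1
    pulses = []
    for _ in range(length):
        pulses.append(1 if value > state else 0)  # comparator output
        state = lfsr4_step(state)
    return pulses
```

With these assumed taps the register visits every non-zero 4-bit state once per 15 clocks, so a larger input value produces proportionally more high pulses over one full period.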
  • the switch unit of the calculation component is set to accept the output signal of the interconnection component, and output a signal determined by the switch state according to the output signal of the interconnection component;
  • the reset unit is responsible for the reset function of the calculation component, that is, when the reset signal is active, the output signal of the calculation component is zero; the integration unit is used to integrate the output signal of the switch unit.
  • the pulse release unit starts to release the pulse, and at the same time, the discharge unit discharges the integration signal.
  • the computing component exhibits a linear relationship between the output signal and the input signal; when the input signal is a binary signal, a binary "1" puts the switch in one state (on or off) and a binary "0" puts it in the opposite state (off or on); the number of switches turned on is linearly related to the output signal, where the output signal is a pulse signal with a certain frequency and duty cycle.
  • the characteristic of the calculation component is that the output signal has a linear relationship with the input signal, and when the input signal is an analog current signal, the input signal does not need to pass through the switch unit, but directly enters the integration unit, and the pulse frequency of the output signal is the same as The magnitude of the input analog current signal has a linear relationship.
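The linear relationship between the number of switches turned on and the number of output pulses can be illustrated with a simple integrate-and-fire model. This is a behavioural sketch only: the name `count_output_pulses`, the unit charge per switch per time step, the threshold of 16, and the idealised discharge that removes exactly one threshold's worth of charge are all assumptions, not values from the source.

```python
def count_output_pulses(switches_on: int, steps: int = 32, threshold: int = 16) -> int:
    """Count pulses released while `switches_on` switches feed the integrator."""
    if not 0 <= switches_on < threshold:
        raise ValueError("model assumes fewer active switches than the threshold")
    charge = 0
    pulses = 0
    for _ in range(steps):
        charge += switches_on      # integration unit: one unit of charge per active switch
        if charge >= threshold:    # pulse release unit fires
            pulses += 1
            charge -= threshold    # discharge unit (idealised, charge-conserving)
    return pulses
```

In this model the pulse count over a fixed window grows in direct proportion to the number of active switches, which is the property a downstream digital counter relies on.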
  • the output component is a digital counter whose input signal is a clock signal and/or a reset signal, or a pulse releasing unit of the calculation component.
  • the operation unit satisfies the condition that the number of high levels (low levels) of the input signal of the calculation component has a linear relationship with the number of pulses counted by the counting unit of the output component; a calculation component that does not include all the units contained in the calculation component of the calculation unit described in the embodiment of the present invention is also applicable to the operation unit of the embodiment of the present invention.
  • step 1: mapping signal X and signal Y to obtain the mapped forms of signal X and signal Y; step 2: performing an operation on the mapped forms of signal X and signal Y to obtain the signal Q, wherein the signal Q is set as the result of an algebraic operation on signal X and signal Y.
  • signal X is mapped in binary form and signal Y is mapped in pulse width modulated form.
  • a logical AND operation is performed on the mapped signals of the signal X and the signal Y.
  • the signal Q is in the form of a pulse signal with a certain frequency and a duty cycle.
  • when the signal X is mapped in a binary form, the operation further includes a shift operation on the binary signal itself before and after the execution of the logical AND operation.
  • the high-level duration of the pulse signal is determined by the signal itself, wherein the high-level signal is a continuous signal, and there is no interruption of the low-level signal in the middle.
  • the signal after the operation of the signal X and the signal Y is subjected to a mapping to obtain the signal Q.
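One way to read the combination of binary mapping for signal X, pulse width modulation mapping for signal Y, the logical AND, and the shift operation is a bit-plane scheme, sketched below. This interpretation is an assumption: the source does not spell out how the shift is applied, and the function name `multiply_by_bitplanes` and the PWM period of 2^n - 1 slots are hypothetical.

```python
def multiply_by_bitplanes(x: int, y: int, n_bits: int = 4) -> int:
    """Compute Q = X * Y from AND operations, pulse counting, and shifts."""
    period = 2 ** n_bits - 1  # assumed PWM period length
    if not (0 <= x <= period and 0 <= y <= period):
        raise ValueError("x and y must fit in n_bits")
    pwm_y = [1] * y + [0] * (period - y)  # signal Y in pulse width modulated form
    result = 0
    for bit in range(n_bits):
        x_bit = (x >> bit) & 1                       # one binary digit of signal X
        count = sum(x_bit & slot for slot in pwm_y)  # AND with the PWM wave, then count
        result += count << bit                       # shift by the bit's binary weight
    return result
```

Each bit of X gates the PWM waveform of Y, the gated high slots are counted, and the count is weighted by the bit position, so the accumulated value equals the ordinary product of X and Y.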
  • the computing device includes: an input component, used for the input of signal X or signal Y and connected to the mapping component in the device for calculation; a shooting component, used for shooting and quantizing the target field of view to obtain image data with a certain bit width, a time-domain pulse sequence, or an analog current of a certain magnitude as signal X or signal Y; when the shooting component outputs image data with a certain bit width or a time-domain pulse sequence, it is connected to the mapping component in the device for calculation; when the shooting component outputs an analog current signal of a certain magnitude, the analog current signal skips the mapping component and the interconnection component and is connected to the integration unit of the calculation component, in which case the output of the calculation component directly represents the magnitude of the analog current signal output by the shooting component; a mapping component, used for mapping signal X or signal Y in the mapping form described in the embodiment of the present invention; the interconnection component is used
  • when there are multiple mapping components, these mapping components need to be operated according to a certain combination scheme as described in the embodiment of the present invention, and this component is connected to the computing component of the computing device; the computing component is used to perform calculations on the signals output by the interconnection component.
  • the component generally includes a switch unit, an integration unit, a reset unit, a discharge unit and a pulse release unit, and is connected to the output component of the computing device;
  • the output component is used to collect the output signal of the calculation component, and generally consists of a counting unit.
  • the input component of signal X is a register file
  • the input component of signal Y is an ASIC chip I/O.
  • the signal X is used as an excitation input unit with a bit width of 4 bits, and the mapping component of the signal X is directly replaced by the input component of the signal X, that is, the output signal of the mapping component of the signal X is the output signal of the input component of the signal X; the signal Y is used as a weight unit with a bit width of 4 bits.
  • the mapping component of the signal Y is a pulse width modulation waveform generator, and the waveform of the 4-bit weight unit is generated by the pulse width modulation waveform generator.
  • the signal X is 1024 4-bit register files, which are input as the excitation signal of the interconnection components; the signal Y is 9 pulse width modulation generators, which are input as the weight signals of the interconnection components, and the interconnection scheme uses a machine learning algorithm
  • the output signals of the 1024 mapping components of signal X are arranged into a 32 × 32 matrix, and the output signals of the 9 mapping components of signal Y are arranged into a 3 × 3 matrix.
  • the arranged matrices are combined "point-to-point" with full coverage of all elements, a window of 3 × 3, and a stride of 1, and logical AND operations are performed; the 1024 × 9 output signals obtained from these operations are used as the input signals of the computing components of the computing device.
  • the signal X is nine 4-bit register files, which are input as the excitation signal of the interconnection components;
  • the signal Y is 36 pulse width modulation generators, which are input as the weight signals of the interconnection components, and the interconnection scheme uses a machine learning algorithm
  • the output signals of the nine signal X mapping components and the output signals of nine signal Y mapping components are logically ANDed. Since the number of output signals of the signal Y mapping components is 36, the logical AND operation needs to be executed 4 times in parallel within the same time period, and the 36 output signals obtained from the 4 logical AND operations are used as input signals of the computing component of the computing device.
  • a single computing component is composed of 9 switching units, an integrating unit, a reset unit, a discharge unit and a pulse releasing unit.
  • there are 1024 computing components in total, that is, there are 1024 × 9 input signals, which are the 1024 × 9 output signals of the interconnection components.
  • the output signals of the 1024 calculation components are 1024 pulse signals, each with its own frequency and duty ratio.
  • the 1024 pulse signals output by the 1024 computing components, each with its own frequency and duty ratio, are input to 1024 output components; the output components are digital counters, which count the input pulse signals during the time period in which the pulse width modulation waveform generator generates its waveform.
  • the data counted by 1024 digital counters is the result of convolution operation between signal X and signal Y.
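As a software reference for the result that the 1024 counters are said to hold, the sketch below slides a 3 × 3 weight window over a 32 × 32 input with stride 1. Zero padding at the borders is an assumption made so that the output has 32 × 32 = 1024 entries with 9 inputs each, matching the 1024 × 9 signals in the text; the source only speaks of "full coverage of all elements". The function name `convolve_same` is hypothetical, and the kernel is applied without flipping, as is usual in neural network convolution.

```python
def convolve_same(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    """Stride-1 sliding-window multiply-accumulate with assumed zero padding."""
    rows, cols = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0  # one computing component: k * k products accumulated
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < rows and 0 <= cc < cols:  # outside the image counts as 0
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out
```

Each output element corresponds to one computing component with nine switch inputs, so comparing the counters against a reference like this is one way to check a hardware model of the embodiment.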
  • the invention provides a calculation method and device, which belong to the field of integrated circuits, and can realize operations such as full connection and convolution in neural network operations.
  • compared with the multipliers and accumulators used by traditional convolutional neural network acceleration chips to implement MAC operations, the ASIC chip for computing provided by the present invention has great advantages in area and power consumption.
  • Fig. 1 is a block diagram of a device for computing in an embodiment of the present invention
  • Fig. 2 is an effect diagram of performing pulse width modulation form mapping on the output signal of the input component in the present invention
  • Fig. 3 is a circuit block diagram of the computing component in Embodiment 1 of the present invention.
  • Fig. 4 is a circuit block diagram of the computing component in Embodiment 2 of the present invention.
  • Fig. 5 is a circuit block diagram of a photographing component for realizing output of a time-domain pulse sequence in Embodiment 3 of the present invention
  • Fig. 6 is a schematic diagram of a full-connection connection method used to realize the weight signal input and the excitation signal input of interconnection components in Embodiment 4 of the present invention.
  • a specific implementation circuit block diagram of the calculation component in the above-mentioned apparatus for calculation is provided, as shown in FIG. 3.
  • the circuit uses 9 PMOS transistors as switching units, wherein the gate terminals of the PMOS transistors are connected to the output signals of the interconnection components in the computing device, the source terminals of the 9 PMOS transistors are fixedly biased, generally set as IDC, and the drain terminals of the 9 PMOS transistors are connected to the inverting input terminal of an operational amplifier; the non-inverting input terminal of the operational amplifier is connected to the reference voltage V ref, the output terminal of the operational amplifier is connected to the gate terminal of a further PMOS transistor, and the source terminal of that PMOS transistor is connected to the inverting input terminal of the operational amplifier; here the function of the operational amplifier is to hold the voltage at the source terminal of that PMOS transistor constant, that is, to act as a clamp.
  • the drain terminal of the PMOS transistor is connected to the drain terminal of the NMOS transistor on one branch of a Cascode current mirror. Through the copying action of the current mirror, the current on that branch is copied to the upper plate of the capacitor used as the integration unit, and the lower plate of the capacitor is connected to VSS.
  • the upper plate of the capacitor is connected to the drain terminal of an NMOS reset transistor that serves as the reset unit; the source terminal of this transistor is connected to VSS, and its gate terminal is connected to the external reset signal. When the reset signal is high, the computing component performs a reset operation.
  • the gate of the NMOS discharge transistor used for the integration discharge unit is connected to the upper plate of a stabilizing capacitor, and its source terminal is connected to VSS; the source terminal of the PMOS extraction transistor, which extracts the charge on the upper plate of the integration unit capacitor, is connected to VDD, and its drain terminal is connected to the drain terminal of the NMOS discharge transistor.
  • the drain terminal of the NMOS discharge transistor is connected to a two-stage inverter, wherein the output of the first-stage inverter is connected to the gate terminal of the PMOS extraction transistor, and the output of the second-stage inverter is connected to the upper plate of the stabilizing capacitor.
  • the upper plate of the integrating capacitor is additionally connected to a second-level inverter, which serves as the pulse release unit; the output of this inverter is the output of this embodiment of the calculation component, thereby realizing the pulse release function of the pulse release unit.
  • a specific implementation circuit block diagram of the calculation component in the above-mentioned apparatus for calculation is provided, as shown in FIG. 4.
  • This embodiment only replaces the Cascode current mirror, operational amplifier, operational amplifier output transistor, 9 PMOS switch transistors, and other circuit parts used to charge the upper plate of the integrating capacitor of the integration unit in Embodiment 1 above, and retains the other circuit components of Embodiment 1.
  • the current mirror in the figure has 9 copy branches, and each branch has a transmission gate as a switch unit to control whether the current is turned on or not.
  • the drain terminal of the PMOS transistor is connected to the upper plate of the integrating capacitor to charge the capacitor.
  • the structures and functions of the remaining circuit parts are the same as those in Embodiment 1.
  • a specific circuit implementation scheme of the above-mentioned photographing component that outputs a time-domain pulse sequence is provided.
  • When the Gate signal is high, the SPAD device is in an overvoltage state, and the Gate signal, applied to the gate terminal of the N1 transistor, holds the drain voltage of N1 at 0 V. When the SPAD device receives photons, an avalanche occurs in the device.
  • When the Gate signal becomes low, the N1 transistor is turned off and behaves as a resistor, so its drain terminal sits at a divided voltage V.
  • the drain terminal voltage of the N1 transistor changes from the divided voltage V to 0, so that a pulse voltage signal is formed at the drain terminal of the N1 transistor, and the signal passes through N2, P2
  • the pulse voltage signal at the drain terminal of the N1 transistor is converted into a square wave signal S with a certain pulse width; the square wave signal, passed through an inverter, undergoes an OR logic operation with itself to give a square wave signal OUT with adjustable pulse width, which is the output signal of the circuit.
  • the pulse width of the OUT signal is controlled by the gate bias voltage Vm of the P1 transistor. Therefore, when the circuit works at a certain clock frequency, it can output a pulse sequence in the time domain according to the photon signal sensed by the SPAD device.
  • a connection method is provided between the signals Ai of the above-mentioned nine 4-bit register files, input as the excitation signals of the interconnection component, and the signals Wj,k of the 36 pulse width modulation generators, input as the weight signals of the interconnection component.
  • 9 signals Ai and 9 signals Wj,k undergo logical AND operations to obtain 9 output signals. Since there are 36 signals Wj,k in total, there are 4 × 9 output signals of logical AND operations.
  • these 36 output signals are the input signals of the subsequent computing components. Four computing components are required, and each computing component is provided with 9 signal input ports to receive the signals from the interconnection component.
  • connection mode of the circuit and the process of signal input and output are exemplified mainly by taking the number of switch units as an example.
  • any number of switch units can be set according to the needs of the actual application scene, and no limitation is set here.
  • the circuit of the embodiment of the present invention may include 4 or 5 switch units; in another example, the circuit of the embodiment of the present invention can include 8 switch units.
  • the specific structure of the circuit and the connection manner between the switch units may be similar to the examples shown in the above embodiments 1-4, and will not be repeated here.
  • steps and devices in the various embodiments herein are not limited to implementation in a certain embodiment. In fact, some steps and some devices related to the various embodiments herein can be combined according to the concept of the present invention to conceive new embodiments, and these new embodiments are also within the scope of the invention.
  • the means may comprise various hardware and/or software components and/or modules, including but not limited to circuits, application specific integrated circuits (ASICs) or processors.
  • a general purpose processor, digital signal processor (DSP), ASIC, field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof may be used to implement or perform each of the illustrated logic blocks, modules and circuits.
  • a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.
  • a software module may reside in any form of tangible storage medium.
  • storage media include random access memory (RAM), read only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disks, removable disks, CD-ROMs, and the like.
  • a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
  • a software module may be a single instruction, or many instructions, and may be distributed over several different code segments, between different programs and across multiple storage media.
  • Methods of the invention herein comprise one or more acts for carrying out the described methods.
  • the methods and/or actions may be interchanged with one another without departing from the scope of the claims.
  • the order and/or use of specific actions may be modified without departing from the scope of the claims.
  • a storage medium may be any available tangible medium that can be accessed by a computer.
  • such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • disc includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc.
  • a computer program product can perform the operations presented herein.
  • a computer program product may be a computer-readable tangible medium having instructions tangibly stored (and/or encoded) thereon that are executable by one or more processors to perform the operations described herein.
  • a computer program product may include packaging materials.
  • Software or instructions may also be transmitted via transmission media.
  • software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, or microwave.
  • modules and/or other appropriate means for performing the methods and techniques described herein may be downloaded and/or otherwise obtained by user terminals and/or base stations as appropriate.
  • a device can be coupled to a server to facilitate the transfer of the means for performing the methods described herein.
  • the various methods described herein can be provided via storage means (e.g., RAM, ROM, or a physical storage medium such as a CD or floppy disk) so that user terminals and/or base stations can obtain the various methods when the storage means is coupled or provided to the device.
  • any other suitable technique for providing the methods and techniques described herein to a device may be utilized.

Abstract

Disclosed in the present invention are an operation method and an operation unit. According to the method, input signals are M binary excitation signals, and M N-bit weight units are outputted, wherein each weight unit is composed of a pulse width modulation (PWM) waveform generator; M PWM waveforms are obtained after an AND operation is performed on the input signals, and the M PWM waveforms are inputted into a linear neuron counting circuit; the linear neuron counting circuit has M input ends; when the linear neuron counting circuit receives the M PWM waveforms, a linear neuron releases pulses of corresponding frequencies according to the number of high levels (or low levels) of the input signals; a counter counts the released pulses within a certain time T; and a counting result is an operation result of a multiply-accumulate operation unit.
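
The counting scheme in the abstract can be illustrated with a minimal Python sketch. It models each N-bit weight as a PWM waveform that stays high for `weight` time slots, ANDs it with a binary excitation, and lets an idealized neuron release pulses in proportion to the number of high inputs in each slot. The slot-based time model, the period of 2**N - 1 slots, the names `pwm_waveform` and `mac_by_counting`, and the assumption of exactly one pulse per high input per slot are all illustrative assumptions, not details taken from the patent.

```python
def pwm_waveform(weight: int, n_bits: int) -> list[int]:
    """Return one PWM period: high for `weight` slots, then low.

    Assumption: one period has 2**n_bits - 1 slots, so every n-bit
    weight value maps to a distinct high-level duration.
    """
    period = (1 << n_bits) - 1
    if not 0 <= weight <= period:
        raise ValueError("weight does not fit in n_bits")
    return [1] * weight + [0] * (period - weight)


def mac_by_counting(excitations: list[int], weights: list[int], n_bits: int) -> int:
    """Count the pulses an idealized linear neuron releases in one period.

    Each excitation is a single bit. In every time slot the neuron is
    assumed to release one pulse per input line that is high after the
    AND operation, so the total count equals sum(a_i * w_i).
    """
    if len(excitations) != len(weights):
        raise ValueError("need one weight per excitation")
    waves = [pwm_waveform(w, n_bits) for w in weights]
    period = (1 << n_bits) - 1
    count = 0
    for t in range(period):
        # Logical AND of each excitation bit with its PWM weight waveform.
        highs = sum(a & wave[t] for a, wave in zip(excitations, waves))
        count += highs  # idealized: pulse rate proportional to number of highs
    return count
```

Under these assumptions, the counter value after one PWM period equals the dot product of the binary excitations and the integer weights, which is the multiply-accumulate result the abstract refers to.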

Description

A computing method and computing unit
Technical field
The invention relates to a calculation method and a calculation unit, belonging to the field of integrated circuits.
Background
Convolution acceleration chips for traditional convolutional neural networks, such as FPGA, GPU, and NPU chips, must use digital modules such as multipliers and accumulators to implement linear operations such as convolution and full connection. In contrast, the calculation method and device for multiply-accumulate operations disclosed by the present invention can implement the MAC operation of traditional neural networks using a linear neuron circuit. On a silicon-based process at the same node, the neuromorphic circuit of the present invention can be implemented with only about 10 transistors, whereas an 8-bit multiplier requires tens of thousands of transistors. Therefore, in terms of power consumption and area, the disclosed calculation method and chip device for multiply-accumulate operations have a great advantage over the multipliers and accumulators that traditional convolutional neural network acceleration chips use to implement MAC operations.
In addition, as silicon-based processes scale down, the speed gap between computing and storage keeps widening, and a large amount of energy and time is lost in moving data. This is the memory wall problem caused by the von Neumann bottleneck. The traditional von Neumann computing architecture struggles to meet the speed and power requirements of future massive data processing, so new "integrated storage and computing" architectures have emerged, in which storage is completed at the same time as computation. Such an architecture does not incur heavy data-movement costs and is expected to break through the memory wall and greatly improve high-throughput data processing. The relatively mature mainstream implementations in the industry include Flash-based integrated storage and computing and RRAM- or MRAM-based integrated storage and computing. Taking the mature approach based on floating-gate devices as an example, the off-chip weight and the input excitation pass through a single floating-gate device to complete one MAC operation, which is the key to the "integrated storage and computing" concept. This approach has limitations, however: the floating-gate device completes a single MAC operation and outputs the result as an analog current, which means the current must be fed into an I-V conversion module and then an ADC to read out the signal. Analog-domain multiply-accumulate operations in a floating-gate device array often suffer from low signal-to-noise ratio, IR drop, and similar problems; high-precision ADC design is complex, and the ADC adds constraints such as large area and high power consumption. By comparison, the ASIC chip device for multiply-accumulate operations disclosed by the present invention likewise takes off-chip digital excitation input and off-chip weight input, but reads out the analog-domain pulse voltage output by the neuron digitally, thereby completely avoiding the above limitations.
Summary of the invention
In view of the limitations of the background art described above, the object of the present invention is to provide a method and a device for computing.
At present, in the field of hardware acceleration of artificial intelligence algorithms, the common hardware schemes for implementing multiply-accumulate operations are CPU computation, GPU computation, FPGA-accelerated computation, and computation on new "integrated storage and computing" architectures such as RRAM and Flash. When accelerating neural networks, the CPU has no clear speed advantage over the other options, which is related to the working principle of the CPU itself. When GPUs and FPGAs are used to accelerate the linear operations in neural networks, they rely on multiply-accumulators built from digital logic circuits. Pursuing lower power consumption and higher computing power therefore places demands on the chip process: the smaller the process node, the greater the gain in computing performance, but the research and development cost also rises. Industry data already indicate that, at the same process node, neural network accelerators using a digital-analog hybrid design outperform accelerators designed with digital logic circuits alone on the key metric of energy efficiency (computing power/power consumption). New "integrated storage and computing" architectures such as RRAM and Flash have therefore gained a foothold in AI algorithm acceleration. However, taking floating-gate integrated storage and computing as an example, when a single device completes a single MAC operation, the input is a digital control signal; a level-shift module converts it into the corresponding analog voltage control signal, which is applied to the device ports so that the device computes according to its own electrical and physical characteristics. The result is output as an analog current, which means the current must be fed into an I-V conversion module and then an ADC to read out the signal. Analog-domain multiply-accumulate operations in a floating-gate array often suffer from low signal-to-noise ratio, IR drop, and similar problems; high-precision ADC design is complex, and the ADC adds constraints such as large area and high power consumption. The technical solution for multiply-accumulate operations provided by the present invention, together with its corresponding digital-analog hybrid computing device, avoids at the design level the limitations currently faced by existing "integrated storage and computing" digital-analog hybrid architectures. The device can generally take the grayscale binary data of an image as the excitation input, and the output waveform of a pulse width modulator driven by a binary signal as the weight input. After a logical AND operation, the two inputs yield a binary signal with a certain pulse width, that is, "0" and "1" in the digital domain. The analog linear neuron circuit in the device only needs to accept these digital-domain "0" or "1" switching signals; it then performs integration, discharge, and related operations inside the analog circuit and outputs pulses with a certain frequency and duty cycle. Within the unit multiply-accumulate operation time, the number of pulses counted by the digital counter represents the number of digital "1" (or "0") inputs, which forms a mapping relationship. This is the core of the computing method provided by the present invention and the working principle of the corresponding digital-analog hybrid computing device. An embodiment of the computing device corresponding to the computing method provided by the present invention is described in detail below.
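
The mapping between the number of high inputs and the pulse count can be illustrated with a small integrate-and-fire model. This is a sketch under simplifying assumptions, not a model of the patented circuit: time is discretized, each closed switch adds one unit of charge per step, and on firing the neuron removes exactly `threshold` units of charge, an idealization that keeps the response linear. The function name `count_pulses` and the default threshold of 10 are hypothetical.

```python
def count_pulses(active_inputs: int, steps: int, threshold: int = 10) -> int:
    """Count output pulses of an idealized integrate-and-fire neuron.

    active_inputs: number of input lines that are high (closed switches).
    steps:         number of discrete time steps in the counting window T.
    threshold:     charge level at which the neuron releases one pulse.

    Assumption: active_inputs <= threshold, so at most one pulse is
    released per step and no charge is lost.
    """
    if not 0 <= active_inputs <= threshold:
        raise ValueError("active_inputs must be between 0 and threshold")
    charge = 0
    pulses = 0
    for _ in range(steps):
        charge += active_inputs      # integration: one unit per high input
        if charge >= threshold:      # pulse release and discharge
            charge -= threshold
            pulses += 1
    return pulses
```

With 1000 steps and the default threshold, `count_pulses(k, 1000)` returns `100 * k` for every `k` from 0 to 9, so in this idealized model the counter value is proportional to the number of high inputs.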
According to an embodiment of the present invention, an operation method is provided, characterized in that it uses at least one of an electrical signal input component and an optical signal generation component, together with a mapping component, an interconnection component, a calculation component, and an output component, wherein: the electrical signal input component includes at least one electrical signal A input terminal for inputting the electrical signal A; the optical signal generation component captures a target field of view, or receives the optical signal of another optical signal capturing or generation component, to generate an optical signal B; the mapping component is connected to the electrical signal input component and/or the optical signal generation component, receives the electrical signal A and/or the optical signal B, and maps them to generate digital signals representing the values of the electrical signal A and/or the optical signal B, respectively; the interconnection component selects, from the digital signals that the mapping component outputs for the multiple electrical signals A and/or optical signals B, some signals as the excitation input part and the rest as the weight input part, and cross-links the excitation input part with at least part of the weight input part to form a multiply-accumulate (MAC) architecture; for the fully connected structure in the neural network algorithms of machine learning, it takes the required subset, producing the operation to be performed on the digital signals obtained from the electrical signal A and the optical signal B after the mapping component, where the output of the subset includes the required operation results of the excitation input part and the weight input part.
Wherein, the calculation component includes a switch unit, an integration unit, a reset unit, an integration discharge unit, and a pulse release unit; it is connected to the interconnection component, receives from the interconnection component the calculation results of the input part and the weight part, and obtains the result of the required target quantity.
Wherein, based on the fully connected architecture, the weight values are input, and the set of weight values generates, under the multiply-accumulate architecture, the fully connected sub-architecture corresponding to the required operation; under this sub-architecture, a group of excitation signals is processed to obtain the required target value, wherein: the operation is performed on the weight signals K and the excitation signals; the number of weight signals is K = N×M, where M is the number of mapping-component output signals set as the excitation input signals of the fully connected operation in the algorithm, and M is 0 or a positive integer; N is the number of calculation components in the computing device to which the interconnection component is connected, and N is 0 or a positive integer; the mapping-component output signals serving as the weight input signals of the fully connected operation are arranged as an N×M matrix, the M elements of each row of this matrix undergo a logical AND operation with the corresponding M mapping-component output signals set as the excitation input signals, and the resulting output signals of the interconnection component are used as the input signals of the calculation component for the corresponding row.
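
The fully connected wiring described in this paragraph can be sketched in Python for a single time instant. The function name `interconnect_fully_connected` and the representation of each signal as one sampled bit are illustrative assumptions; in the device the signals are waveforms and the AND gates operate continuously.

```python
def interconnect_fully_connected(
    excitation: list[int], weights: list[list[int]]
) -> list[list[int]]:
    """AND M excitation bits with each row of an N x M weight-bit matrix.

    Row r of the result holds the M signals fed to calculation
    component r at the sampled instant.
    """
    m = len(excitation)
    if any(len(row) != m for row in weights):
        raise ValueError("every weight row must have M entries")
    return [[a & w for a, w in zip(excitation, row)] for row in weights]


# Example with M = 3 excitation signals and N = 2 calculation components.
outputs = interconnect_fully_connected([1, 0, 1], [[1, 1, 0], [0, 1, 1]])
# Number of high inputs seen by each calculation component at this instant.
highs_per_component = [sum(row) for row in outputs]
```

Each calculation component then only has to respond to how many of its M inputs are high, which is the quantity `highs_per_component` holds in this sketch.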
Wherein, the interconnection component uses the convolution connection of the algorithm to combine the data from the mapping component, and: when the mapping component has multiple output signals, let their number be Z and divide the Z signals into two groups, one of Q signals and the other of P signals, with Z = Q + P; the Q mapping-component output signals are set as the excitation input signals of the convolution operation in the machine learning algorithm, and the P mapping-component output signals are set as its weight input signals; the method further includes arranging the Q output signals into a √Q×√Q matrix (√Q is an integer) and the P output signals into a √P×√P matrix (√P is an integer, √P ≤ √Q); the √P×√P matrix undergoes a logical AND operation with the sub-matrices of size √P×√P in the √Q×√Q matrix, element by element in one-to-one correspondence, and all of these operations are performed simultaneously and in parallel in the time domain; the output signals of all the operations are output as the output signals of the interconnection component and are input to the calculation component as its input signals.
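
A minimal sketch of the convolution connection follows, again for one sampled instant. The stride of 1, the absence of padding, and the function name `interconnect_convolution` are assumptions made for illustration; the paragraph above only states that the √P×√P weight matrix is ANDed element by element with √P×√P sub-matrices of the excitation matrix. The loops run sequentially here, whereas the text describes all windows being processed in parallel.

```python
def interconnect_convolution(
    excitation: list[list[int]], kernel: list[list[int]]
) -> list[list[int]]:
    """AND a p x p weight-bit kernel with every p x p window of a q x q
    excitation-bit matrix (assumed stride 1, no padding).

    Returns one flattened list of p*p AND outputs per window, in
    row-major window order; each list feeds one calculation component.
    """
    q, p = len(excitation), len(kernel)
    if p > q:
        raise ValueError("kernel must not be larger than the excitation matrix")
    windows = []
    for r in range(q - p + 1):
        for c in range(q - p + 1):
            windows.append(
                [excitation[r + i][c + j] & kernel[i][j]
                 for i in range(p) for j in range(p)]
            )
    return windows


example = interconnect_convolution(
    [[1, 0, 1], [0, 1, 1], [1, 1, 0]],
    [[1, 0], [1, 1]],
)
```

For this 3×3 excitation matrix and 2×2 kernel there are four windows, so four calculation components would each receive four AND outputs.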
Wherein, the method of the embodiment of the present invention is used to perform the multiply-accumulate operation of Q input signals with Q n-bit weight units, wherein: 1) there are Q electrical signal input components and/or optical signal capturing components in total, for inputting the desired Q excitation input signals; 2) there are Q weight units in total, for inputting the desired Q weight input signals, and each weight value is represented by n bits; 3) the Q binary excitation signals are ANDed with the Q n-bit weight signals respectively, giving the values of the Q binary excitation signals after modulation by the Q pulse width modulation waveforms, and these values are input to the calculation component (linear neuron counting circuit) to obtain the multiply-accumulate result of multiplying the Q input signals by the Q n-bit weight signals and accumulating.
Wherein, the mapping mode of the mapping component is a binary form, a pulse width modulation form, a form that leaves the output of the original input component and/or capturing component unchanged, an analog-to-digital conversion form, a random number form, or the like.
Wherein, the electrical signal A or optical signal B and the weight part are all mapped in binary form, and the operation performs a bitwise OR of the binary numbers.
Wherein, the result of the target quantity obtained by the calculation component is: a voltage pulse signal with a certain frequency and duty cycle; or the number of pulses of that voltage pulse signal counted by a counter within a certain time T.
Wherein, the electrical signal A or optical signal B is mapped in binary form, and a shift operation is applied to the mapped binary signal before and after the logical AND operation with the weight part is performed.
Wherein, when the electrical signal A or optical signal B is mapped in pulse width modulation form, the high-level duration of the pulse signal is determined by the magnitude of the signal itself, and the high level is one continuous interval that is not interrupted by any low level.
Wherein, when the electrical signal A or optical signal B is mapped in random number form, the signal is mapped to a random pulse sequence that releases a high level with a certain probability, where the probability is determined by the magnitude of the signal itself.
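
The random number form of mapping can be sketched as a Bernoulli pulse train. The sketch assumes the signal has been normalized to the range [0, 1] and uses Python's `random` module as a stand-in for whatever random source the hardware would use; the names `stochastic_map` and `estimate` are hypothetical. It also shows a property of independent random pulse trains: the fraction of highs after a logical AND approximates the product of the two mapped values.

```python
import random


def stochastic_map(value: float, length: int, rng: random.Random) -> list[int]:
    """Map a normalized value to a random pulse sequence.

    Each slot is high with probability `value` (assumed to lie in [0, 1]).
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be normalized to [0, 1]")
    return [1 if rng.random() < value else 0 for _ in range(length)]


def estimate(stream: list[int]) -> float:
    """Recover the encoded value as the fraction of high slots."""
    return sum(stream) / len(stream)


rng = random.Random(0)                     # fixed seed for repeatability
excitation = stochastic_map(0.3, 20000, rng)
weight = stochastic_map(0.5, 20000, rng)
product = [a & w for a, w in zip(excitation, weight)]
```

With sequences of this length, `estimate(product)` lands close to 0.3 × 0.5 = 0.15; shorter sequences give a noisier estimate, which is the usual trade-off of this kind of encoding.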
Wherein, the electrical signal A or optical signal B is an analog current signal, which bypasses the mapping component and the interconnection component and is fed directly to the input of the integration unit of the calculation component; in this case, the output of the calculation component characterizes the electrical signal A or optical signal B output by the input component.
According to another embodiment of the present invention, an operation method is provided, characterized in that the multiply-accumulate operation is taken as the basic operation module; the linear operations involved in the machine learning algorithm to be executed are decomposed into an operation paradigm whose unit is the multiply-accumulate operation; the nonlinear operations involved are expressed as superpositions of linear operations, and the linear operation units involved are further decomposed into the same multiply-accumulate paradigm; the connection architecture of the operation components is set according to this paradigm, and each multiply-accumulate module in the connection architecture 1) uses one excitation signal input component and 2) uses one weight signal input component; the method further uses a mapping component, a calculation component, and an output component, wherein the mapping component is connected to the excitation signal input component and the weight signal input component respectively and maps the excitation signal and the weight signal; the mapped excitation signal and the mapped weight signal undergo a logical AND operation, and the result is input to the calculation component, which performs combined operations in units of multiply-accumulate operations according to the operation paradigm to obtain the result of the required target quantity.
According to one embodiment of the present invention, an operation unit is provided, comprising at least one of an electrical signal input component and an optical signal generation component, characterized in that: the electrical signal input component comprises at least one input terminal for electrical signal A, used to input an analog electrical quantity expressing the electrical signal A; the optical signal generation component is used to capture a target field of view, or to receive an optical signal from another optical signal component, and to generate an analog quantity for optical signal B, the optical signal B comprising image data with a certain bit width, or a voltage pulse sequence over a period of time, or an analog current of a certain magnitude; the operation unit further comprises: a mapping component, connected to the electrical signal input component and/or the optical signal generation component, which maps the electrical signal A and/or the optical signal B and generates digital signals representing the values of the electrical signal A and the optical signal B respectively; an interconnection component, having a structure of multiple electrical and/or optical paths and cross-connections, the structure corresponding to the operation performed on the digital signals obtained from the electrical signal A and/or the optical signal B after the mapping component; and a computing component, connected to the interconnection component, which receives from the interconnection component the digital signals corresponding to the electrical signal A and/or the optical signal B, the computing component comprising an electrical and/or optical connection structure that includes appropriate switch units as well as an integration unit, a reset unit, a discharge unit, and a pulse release unit, used to perform calculation on the signals output by the interconnection component and to output the result.
The operation unit of this embodiment is used to perform multiply-accumulate (MAC) operations, characterized in that the interconnection component adopts the fully connected structure used in machine learning neural network algorithms to combine the data from the mapping components, and further: 1) the number of output signals of the mapping component is set to K, with K = N×M, where M is the number of output signals of the mapping components configured as excitation input signals of the fully connected operation in the machine learning algorithm, and N is the number of computing components in the computing device to which the interconnection component is connected; 2) the mapping component comprises N×M output terminals, and the M elements in each row of the output N×M matrix are combined in a "point-to-point" manner with the M output signals of the mapping components configured as excitation input signals of the fully connected operation, and a logical AND operation is performed; and 3) the output terminals of the interconnection component are connected to the input terminals of the computing component of the corresponding row, so that the output signals of the interconnection component serve as the input signals of the computing component of that row.
The operation unit of this embodiment is used to perform multiply-add (MAC) operations, characterized in that the interconnection component adopts the convolution connection used in machine learning algorithms to combine the data from the mapping components, and: 1) a plurality of mapping components have respective output terminals and output respective output signals; the number of output signals of the mapping components is Z, and the mapping components associated with the Z signals are divided into two groups, one of Q components and the other of P components, with Z = Q + P; the output signals of the Q mapping components are the excitation input signals of the convolution operation, and the output signals of the P mapping components are the weight input signals of the convolution operation; 2) the output signals of the Q mapping components are arranged into a √Q×√Q matrix (√Q being an integer), and the output signals of the P mapping components are arranged into a √P×√P matrix (√P being an integer, √P ≤ √Q); a logical AND operation is performed between the √P×√P matrix and each √P×√P submatrix of the √Q×√Q matrix, element by element in one-to-one correspondence, all of these operations being carried out simultaneously and in parallel in the time domain; the horizontal or vertical offset between two adjacent operations is the stride S, S being a non-negative integer; and 4) the interconnection component comprises output terminals, and the output signals serve as the input signals of the computing component.
The operation unit of this embodiment is used to perform the multiply-add operation of M input signals with M n-bit weight units, characterized in that: 1) the total number of electrical signal input components and optical signal capture components is M, their M input terminals being used to input the desired M excitation input signals; 2) there are M weight input terminals, whose inputs each represent an n-bit weight unit; and a logical AND operation is performed between the M binary excitation signals and the M n-bit weight units respectively, yielding the values of the M binary excitation signals as modulated by the N pulse-width-modulated waveforms; these values are input to the computing component, which is formed by a linear neuron counting circuit, to obtain the multiply-add result in which the M excitation input signals are multiplied by the M n-bit weight units and accumulated.
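As a hedged illustration of the scheme just described, the following Python sketch models M one-bit excitation signals gated by M pulse-width-modulated n-bit weights, with an idealized counter standing in for the linear neuron counting circuit. The function name `mac_binary_inputs`, the frame length of 2^n - 1 clock cycles, and the "one count per high line per clock cycle" behaviour are assumptions made for illustration; they are not details taken from the embodiment.

```python
def mac_binary_inputs(x_bits, weights, n_bits):
    """Idealized model: count AND-gate highs over one PWM frame.

    x_bits  -- list of 0/1 excitation signals (one per input)
    weights -- list of integers in [0, 2**n_bits - 1] (one per input)
    """
    frame_len = (1 << n_bits) - 1  # assumed PWM frame length in clock cycles
    if len(x_bits) != len(weights):
        raise ValueError("need exactly one weight per excitation signal")
    if any(x not in (0, 1) for x in x_bits):
        raise ValueError("excitation signals must be 0 or 1")
    if any(not 0 <= w <= frame_len for w in weights):
        raise ValueError("weight does not fit in n_bits")

    count = 0
    for t in range(frame_len):
        for x, w in zip(x_bits, weights):
            pwm_level = 1 if t < w else 0  # weight w is high for its first w cycles
            count += x & pwm_level         # logical AND of excitation and weight
    return count
```

Under these assumptions the returned count equals the sum of x_i * w_i, which is the multiply-add result the paragraph above refers to.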
The type of the mapping component is one of: binary, pulse width modulation, analog-to-digital conversion, or random number.
The electrical signal input component may be a general-purpose input/output interface of an SOC, ASIC, or FPGA chip, or a storage unit such as a register file, SRAM, DRAM, or RRAM, or another component used to input an analog current signal or an analog voltage signal.
The optical signal generation component is an image sensor chip, such as a CIS chip, a CCD chip, or a single-photon avalanche diode array, which senses an optical signal and converts it into a binary image signal with a certain bit width, a time-domain pulse sequence, or an analog current signal.
The mapping component maps the output signal of the input component as described in the embodiments of the present invention. When the output signal of the input component is a binary signal, the mapping component is equivalent to the input component; that is, the input component is left unchanged and takes the place of the mapping component. When the output signal of the input component is an analog voltage signal, the mapping component may be an analog-to-digital conversion device connected to the subsequent interconnection component; alternatively, the mapping component is replaced by the input component, in which case the output signal of the mapping component is simply the output signal of the input component, and that output signal is used directly as the input signal of the computing component of the computing device. When the output signal of the input component is an analog current signal, the analog current signal bypasses the mapping component and the interconnection component and is fed to the integration unit of the computing component in the operation unit described in the embodiments, so that the output of the computing component characterizes the magnitude of the input analog current signal.
The mapping component may be a pulse-width-modulation waveform generator. Given the value of the binary signal output by the input component, the generator outputs a corresponding number of high-level signals; the high level is continuous, with no low-level interruption in between. The duration of the generator's output signal is determined by the bit width of the binary signal output by the input component.
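A minimal Python sketch of this pulse-width-modulation mapping is given below. It assumes, for illustration only, that the frame lasts 2^n - 1 clock cycles for an n-bit input; the text states only that the duration is determined by the bit width. The name `pwm_encode` is hypothetical.

```python
def pwm_encode(value: int, bit_width: int) -> list:
    """Map a binary value to one PWM frame: `value` contiguous 1s, then 0s."""
    frame_len = (1 << bit_width) - 1  # assumed frame length in clock cycles
    if not 0 <= value <= frame_len:
        raise ValueError("value does not fit in bit_width")
    # The high level is one uninterrupted run at the start of the frame.
    return [1] * value + [0] * (frame_len - value)
```

For example, `pwm_encode(5, 4)` produces a 15-cycle frame whose first five samples are high, so the number of high cycles carries the value.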
The mapping component may be a random pulse sequence generator comprising a linear feedback shift register and a comparator, configured as follows. When the output signal of the input component is a binary signal, that binary signal serves as the initial value of the linear feedback shift register. In each clock cycle, the output value of the linear feedback shift register and its initial value are fed to the comparator; if the initial value is greater than the current output value of the linear feedback shift register, the comparator outputs 1, otherwise it outputs 0. The comparator thus outputs a random pulse sequence in which the probability of a high level is determined by the binary signal output by the input component, and the length of the pulse sequence is determined by the bit width of that binary signal.
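The behaviour of such a generator can be sketched in Python as follows. The sketch fixes several things the text leaves open, all of which are assumptions: a 4-bit Fibonacci register with feedback from its two lowest bits (which cycles through all 15 non-zero states), a sequence length of 15 clock cycles, and a comparison made before the register is clocked. The names `lfsr4_step` and `random_pulse_sequence` are hypothetical.

```python
def lfsr4_step(state: int) -> int:
    """Advance a 4-bit Fibonacci LFSR by one clock (period 15 if state != 0)."""
    feedback = (state ^ (state >> 1)) & 1   # XOR of the two lowest bits
    return (state >> 1) | (feedback << 3)   # shift right, feed back into bit 3


def random_pulse_sequence(value: int, n_cycles: int = 15) -> list:
    """Emit 1 whenever the seed value exceeds the current LFSR output."""
    if not 0 <= value <= 15:
        raise ValueError("value must fit in 4 bits")
    state = value                # the input binary value seeds the register
    pulses = []
    for _ in range(n_cycles):
        pulses.append(1 if value > state else 0)  # comparator
        state = lfsr4_step(state)
    return pulses
```

In this particular sketch a non-zero seed v yields v - 1 high cycles over one full 15-cycle period, so the density of high levels grows linearly with the input value; a seed of 0 keeps the register at 0 and the output low.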
The switch unit of the computing component is configured to receive the output signal of the interconnection component and to output a signal determined by the resulting switch state. The reset unit is responsible for resetting the computing component: while the reset signal is active, the output signal of the computing component is zero. The integration unit integrates the output signal of the switch unit; when the integrated signal reaches the threshold of the integration unit, the pulse release unit begins to release a pulse, and at the same time the discharge unit discharges the integrated signal.
The computing component exhibits a linear relationship between its output signal and its input signal. When the input signal is a binary signal, a value of "1" sets the switch to one state (on or off) and a value of "0" sets it to the opposite state (off or on); the number of switches that are on is linearly related to the output signal, the output signal here being a pulse signal with a certain frequency and duty cycle.
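The switch, integration, discharge, and pulse release behaviour described above can be approximated by a discrete-time integrate-and-fire model. The Python sketch below is an idealization under stated assumptions: each conducting switch adds a fixed `unit_charge` per clock cycle, and the discharge unit removes exactly one `threshold` of charge per released pulse. Neither value, nor the name `integrate_and_fire`, comes from the embodiment.

```python
def integrate_and_fire(switch_states_per_cycle, unit_charge=1, threshold=4):
    """Count pulses released by an idealized integrate-and-fire unit.

    switch_states_per_cycle -- iterable of lists of 0/1, one list per clock
                               cycle, one entry per switch (1 = conducting)
    """
    integral = 0
    pulses = 0
    for states in switch_states_per_cycle:
        integral += unit_charge * sum(states)  # integration unit
        while integral >= threshold:           # threshold reached
            pulses += 1                        # pulse release unit fires
            integral -= threshold              # discharge unit (assumed behaviour)
    return pulses
```

With these assumptions the pulse count over a fixed window is proportional to the number of conducting switches, which is the linear relationship the text relies on: eight cycles with one switch on give 2 pulses, and eight cycles with two switches on give 4.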
The computing component exhibits a linear relationship between its output signal and its input signal; when the input signal is an analog current signal, it does not pass through the switch unit but enters the integration unit directly, and the pulse frequency of the output signal is linearly related to the magnitude of the input analog current signal.
The output component is a digital counter whose input signals are a clock signal and/or a reset signal, or it is the pulse release unit of the computing component.
The operation unit satisfies the condition that the number of high-level (or low-level) input signals of the computing component is linearly related to the number of pulses counted per unit time by the counting unit of the output component. A computing component need not include all of the units listed for the computing component of the operation unit described in the embodiments of the present invention; such a computing component is likewise applicable to that operation unit.
The technical solution of the present invention is as follows:
An operation method and an operation device, the operation method comprising two steps: step one, mapping signal X and signal Y to obtain their mapped forms; step two, performing a calculation on the mapped forms of signal X and signal Y to obtain signal Q, where signal Q is the result of an algebraic operation on signal X and signal Y.
According to one embodiment, signal X is mapped in binary form and signal Y is mapped in pulse-width-modulated form.
According to one embodiment, a logical AND operation is performed on the mapped signal X and the mapped signal Y.
According to one embodiment, signal Q takes the form of a pulse signal with a certain frequency and duty cycle.
According to one embodiment, when signal X is mapped in binary form, the operation further includes shifting the binary signal itself before or after the logical AND operation.
According to one embodiment, when signal Y is mapped in pulse-width-modulated form, the high-level duration of the pulse signal is determined by the signal itself, the high level being continuous with no low-level interruption in between.
According to one embodiment, the signal obtained by operating on signal X and signal Y undergoes a further mapping to yield signal Q.
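The embodiments listed above (binary mapping of X, pulse-width-modulated mapping of Y, logical AND, and shifting of the binary signal) can be combined into a single product, sketched here in Python. This is an illustrative model under stated assumptions rather than the patented circuit: the PWM frame is taken to last 2^n - 1 cycles, each bit of X is processed as a separate pass, and the shift is applied to the per-bit count after the AND. The name `and_count_multiply` is hypothetical.

```python
def and_count_multiply(x: int, y: int, n_bits: int = 4) -> int:
    """Multiply x by y using only AND, counting, and shifting."""
    frame_len = (1 << n_bits) - 1          # assumed PWM frame length
    if not (0 <= x <= frame_len and 0 <= y <= frame_len):
        raise ValueError("operands must fit in n_bits")

    # Step one: map Y to a PWM frame (y contiguous high cycles).
    y_frame = [1 if t < y else 0 for t in range(frame_len)]

    total = 0
    for b in range(n_bits):
        x_bit = (x >> b) & 1               # binary mapping of X, one bit at a time
        # Step two: AND the bit with the PWM frame and count the high cycles.
        high_cycles = sum(x_bit & level for level in y_frame)
        total += high_cycles << b          # shift restores the bit's weight
    return total
```

Each pass contributes x_b * y high cycles, and the shift by b weights that count by 2^b, so the accumulated total is x * y under these assumptions.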
The operation device comprises: an input component, used to input signal X or signal Y and connected to the mapping component of the computing device; a capture component, which captures and quantizes a target field of view to obtain image data with a certain bit width, a time-domain pulse sequence, or an analog current of a certain magnitude as signal X or signal Y; when the capture component outputs image data with a certain bit width or a time-domain pulse sequence, it is connected to the mapping component of the computing device; when it outputs an analog current signal of a certain magnitude, that signal bypasses the mapping component and the interconnection component and is fed to the integration unit of the computing component, so that the output of the computing component directly characterizes the magnitude of the analog current signal output by the capture component; a mapping component, used to map signal X or signal Y in the mapping form described in the embodiments of the present invention; an interconnection component, used to perform the operation described in the embodiments on the signals obtained from signal X and signal Y after the mapping components; when there are multiple mapping components, they must be combined according to some combination scheme to perform that operation; the interconnection component is connected to the computing component of the computing device; a computing component, used to perform calculation on the signals output by the interconnection component, generally comprising a switch unit, an integration unit, a reset unit, a discharge unit, and a pulse release unit, and connected to the output component of the computing device; and an output component, used to collect the output signal of the computing component and generally formed by a counting unit.
According to one embodiment, the input component for signal X is a register file, and the input component for signal Y is the I/O of an ASIC chip.
According to one embodiment, signal X serves as an excitation input unit with a bit width of 4 bits, and the mapping component for signal X is replaced directly by the input component for signal X; that is, the output signal of the mapping component for signal X is the output signal of the input component for signal X. Signal Y serves as a weight unit with a bit width of 4 bits, and the mapping component for signal Y is a pulse-width-modulation waveform generator; the 4-bit weight unit is represented by the pulse-width-modulated waveform that the generator produces.
According to one embodiment, signal X consists of 1024 4-bit register files serving as the excitation signal input of the interconnection component, and signal Y consists of 9 pulse-width-modulation generators serving as the weight signal input of the interconnection component. The interconnection scheme is the convolution connection used in machine learning algorithms: the output signals of the 1024 mapping components for signal X are arranged into a 32×32 matrix, and the output signals of the 9 mapping components for signal Y are arranged into a 3×3 matrix. A ring of zero padding is added around the 32×32 matrix, and the 3×3 matrix is combined with the signal X matrix in a "point-to-point" scheme that covers every element, with a 3×3 window and a stride of 1, a logical AND operation being performed for each pairing. The resulting 1024×9 output signals serve as the input signals of the computing components of the computing device.
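The pairing performed by this convolution interconnect can be enumerated in a few lines of Python. The sketch below only lists which excitation position is ANDed with which weight position for each window; it does not model the signals themselves. The function name `conv_and_pairs` and the use of `None` to mark a zero-padded position are illustrative choices, not part of the embodiment.

```python
def conv_and_pairs(q_side=32, p_side=3, stride=1, pad=1):
    """List, per window, the (excitation index, weight index) pairs to AND.

    An excitation index of None marks a zero-padded position.
    """
    windows = []
    last_start = q_side + 2 * pad - p_side      # last valid window origin
    for r in range(0, last_start + 1, stride):
        for c in range(0, last_start + 1, stride):
            pairs = []
            for i in range(p_side):
                for j in range(p_side):
                    rr, cc = r + i - pad, c + j - pad   # position in the unpadded matrix
                    inside = 0 <= rr < q_side and 0 <= cc < q_side
                    pairs.append(((rr, cc) if inside else None, (i, j)))
            windows.append(pairs)
    return windows
```

With the default arguments this yields 1024 windows of 9 pairs each, matching the 1024×9 AND outputs stated in the embodiment; the first window has its centre weight aligned with excitation element (0, 0).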
According to one embodiment, signal X consists of 9 4-bit register files serving as the excitation signal input of the interconnection component, and signal Y consists of 36 pulse-width-modulation generators serving as the weight signal input of the interconnection component. The interconnection scheme is the fully connected scheme used in machine learning algorithms: a logical AND operation is performed between the output signals of the 9 mapping components for signal X and the output signals of 9 mapping components for signal Y. Because the mapping components for signal Y have 36 output signals in total, this logical AND operation is executed 4 times in parallel at the same time, and the 36 output signals obtained from the 4 operations serve as the input signals of the computing components of the computing device.
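The fully connected wiring in this embodiment can be sketched as a simple index table in Python. The grouping of the 36 weights into four consecutive blocks of nine is an assumption about ordering that the text does not specify, and `fully_connected_wiring` is a hypothetical name.

```python
def fully_connected_wiring(n_inputs=9, n_weights=36):
    """Return, per computing component, the (excitation, weight) index pairs.

    Assumes weights are stored in consecutive blocks of n_inputs, one block
    per computing component.
    """
    if n_weights % n_inputs != 0:
        raise ValueError("n_weights must be a multiple of n_inputs")
    n_components = n_weights // n_inputs   # number of parallel AND groups
    return [
        [(i, j * n_inputs + i) for i in range(n_inputs)]
        for j in range(n_components)
    ]
```

For 9 excitations and 36 weights this produces 4 groups of 9 AND pairings, so every weight is used exactly once and every excitation is shared by all 4 groups.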
According to one embodiment, a single computing component consists of 9 switch units, one integration unit, one reset unit, one discharge unit, and one pulse release unit. In this embodiment there are 1024 computing components in total, and hence 1024×9 input signals, which are the 1024×9 output signals of the interconnection component. The 1024 computing components output 1024 pulse signals, each with its own frequency and duty cycle.
According to one embodiment, the 1024 pulse signals output by the 1024 computing components, each with its own frequency and duty cycle, are input to 1024 output components. Each output component is a digital counter that counts the incoming pulse signal during the time in which the pulse-width-modulation waveform generator produces its waveform. The values counted by the 1024 digital counters are the result of the convolution of signal X with signal Y.
The present invention provides a method and a device for computation, belonging to the field of integrated circuits, which can implement operations such as fully connected and convolution operations in neural networks. Compared with conventional convolutional neural network accelerator chips that implement MAC operations with multipliers and accumulators, the ASIC chip for computation provided by the invention offers substantial advantages in area and power consumption.
Brief Description of the Drawings
Fig. 1 is a block diagram of the device for computation in an embodiment of the present invention;
Fig. 2 illustrates the result of mapping the output signal of the input component in pulse-width-modulated form according to the present invention;
Fig. 3 is a circuit block diagram of the computing component in Embodiment 1 of the present invention;
Fig. 4 is a circuit block diagram of the computing component in Embodiment 2 of the present invention;
Fig. 5 is a circuit block diagram of the capture component that outputs a time-domain pulse sequence in Embodiment 3 of the present invention;
Fig. 6 is a schematic diagram of the fully connected scheme between the weight signal input and the excitation signal input of the interconnection component in Embodiment 4 of the present invention.
Detailed Description
The computation method and device according to embodiments of the present invention are described below with reference to the accompanying drawings. In the drawings, like reference numerals denote like elements throughout. It should be understood that the embodiments described here are merely illustrative and should not be construed as limiting the scope of the present invention.
Embodiment 1
This embodiment provides a specific circuit block diagram of the computing component in the device for computation described above, as shown in Fig. 3.
The circuit uses 9 PMOS transistors as the switch units. The gates of the PMOS transistors are connected to the output signals of the interconnection component of the computing device; the sources of the 9 PMOS transistors receive a fixed bias, generally denoted IDC; and the drains of the 9 PMOS transistors are connected to the inverting input of an operational amplifier. The non-inverting input of the operational amplifier is connected to the reference voltage Vref, and its output is connected to the gate of a further PMOS transistor whose source is connected to the inverting input of the operational amplifier. The operational amplifier thus holds the source voltage of this PMOS transistor constant; that is, it acts as a clamp. The drain of this PMOS transistor is connected to the drain of an NMOS transistor on one branch of a cascode current mirror; through the copying action of the current mirror, the current in that branch is copied onto the upper plate of the capacitor that serves as the integration unit, the lower plate of the capacitor being connected to VSS. The upper plate of the capacitor is connected to the drain of an NMOS reset transistor that serves as the reset unit; the source of this transistor is connected to VSS and its gate receives a reset signal supplied from off chip. When the reset signal is high, the computing component is reset. The gate of the NMOS discharge transistor used for the discharge unit is connected to the upper plate of a stabilizing capacitor, and its source is connected to VSS. The source of the PMOS extraction transistor, which extracts charge from the upper plate of the integration capacitor, is connected to VDD, and its drain is connected to the drain of the NMOS discharge transistor. The drain of the NMOS discharge transistor is connected to a two-stage inverter: the output of the first stage is connected to the gate of the PMOS extraction transistor, and the output of the second stage is connected to the upper plate of the stabilizing capacitor. In addition, the upper plate of the integration capacitor is connected to a second two-stage inverter that serves as the pulse release unit; the output of this two-stage inverter is the output of the computing component in this embodiment, thereby implementing the pulse release function of the pulse release unit.
Embodiment 2
This embodiment provides a specific circuit block diagram of the computing component in the device for computation described above, as shown in Fig. 4. It replaces only the part of Embodiment 1 that charges the upper plate of the integration capacitor of the integration unit, namely the cascode current mirror, the operational amplifier, the transistor at the operational amplifier output, and the 9 PMOS switch transistors; the rest of the circuit of Embodiment 1 is retained. In the figure, the current mirror has 9 copy branches; each branch has a transmission gate as its switch unit, controlling whether the current flows, and the drain of the PMOS transistor is connected to the upper plate of the integration capacitor to charge the capacitor. The structure and function of the remaining circuit parts are the same as in Embodiment 1.
Embodiment 3
This embodiment provides a specific circuit implementation of the capture component that outputs a time-domain pulse sequence, as shown in Fig. 5. When the Gate signal is high, the SPAD device is in an overvoltage state, and the drain voltage of transistor N1, whose gate is controlled by the Gate signal, is 0 V. After the SPAD device receives a photon, an avalanche occurs in the device. When the Gate signal goes low, N1 turns off and can be regarded as a resistor, and a divided voltage V appears at its drain. When this divided voltage causes the SPAD to no longer satisfy the overvoltage operating condition, the drain voltage of N1 changes from V back to 0, so that a voltage pulse is formed at the drain of N1. After passing through a buffer formed by N2, P2, N3, and P3, this pulse is converted into a square-wave signal S with a certain pulse width. A NOR operation between S and a copy of S that has passed through an inverter yields a square-wave signal OUT with adjustable pulse width, which is the output signal of the circuit. The pulse width of OUT is controlled by the gate bias voltage Vm of transistor P1. Therefore, when the circuit operates at a given clock frequency, it can output a time-domain pulse sequence according to the photon signal sensed by the SPAD device.
Embodiment 4
In this embodiment, the connection scheme for the fully connected operation is provided between the signals Ai of the nine 4-bit register files, which serve as the excitation signal inputs of the interconnection component, and the signals Wj,k of the 36 pulse-width-modulation generators, which serve as its weight signal inputs. The nine signals Ai are ANDed with nine signals Wj,k to give nine output signals. Because there are 36 signals Wj,k in total, the AND operations produce 4×9 output signals. These 36 output signals are the input signals of the subsequent computing components: four computing components are required, each with nine signal input ports to receive the output signals of the interconnection component.
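The 4×9 AND interconnection can be sketched as follows. This is a minimal model under stated assumptions: each Ai is treated as a single bit during the frame shown (for example, one bit of its 4-bit register), each Wj,k is an integer weight mapped to a PWM waveform that is high for that many ticks, and the names `pwm`, `interconnect`, and `on_time` as well as the sample data are hypothetical.

```python
def pwm(weight, period):
    """PWM waveform: high for the first `weight` ticks of a `period`-tick frame."""
    return [1 if t < weight else 0 for t in range(period)]


def interconnect(a_bits, weights, period):
    """AND each excitation bit A_i with the PWM waveform of W[j][i].

    Returns gated[j][i], the waveform on input port i of computing component j.
    """
    return [
        [[a & w_t for w_t in pwm(w, period)] for a, w in zip(a_bits, row)]
        for row in weights
    ]


def on_time(ports):
    """Total high time over the nine input ports of one computing component."""
    return sum(sum(wave) for wave in ports)


a_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]                                # nine excitation bits A_i
weights = [[(3 * j + i) % 16 for i in range(9)] for j in range(4)]  # 4 x 9 weights W[j][i]
gated = interconnect(a_bits, weights, period=16)
totals = [on_time(ports) for ports in gated]
print(totals)
```

For each of the four rows, the total high time equals the dot product of the excitation bits with that row of weights, which is the quantity a computing component would then accumulate.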
The embodiments above mainly use nine switch units as an example to illustrate the circuit connections and the signal input and output process. In practice, the specific circuit structure of the embodiments of the present invention may contain any number of switch units as required by the application scenario; no limit is imposed here. For example, the circuit may include 4 or 5 switch units in one example, and 8 switch units in another. In these examples, the specific circuit structure and the connections between switch units may be similar to those shown in Embodiments 1-4 and are not described again here.
Of course, the specific embodiments above are examples only and are not limiting. Those skilled in the art may, in accordance with the concept of the present invention, merge and combine steps and devices from the separately described embodiments to achieve the effects of the invention. Embodiments formed by such merging and combination are also included in the present invention and are not described one by one here.
Note that the merits, advantages, and effects mentioned in the present invention are examples only and are not limiting; they are not to be regarded as required of every embodiment of the invention. In addition, the specific details disclosed above serve only for illustration and ease of understanding and are not limiting; they do not require that the invention be implemented using those specific details.
The block diagrams of components, apparatuses, devices, and systems in the present invention are illustrative examples only and are not intended to require or imply that connection, arrangement, or configuration must follow the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended, mean "including but not limited to", and may be used interchangeably with that phrase. The words "or" and "and" as used here mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The words "such as" as used here mean "such as but not limited to" and may be used interchangeably with that phrase.
The step flowcharts in the present invention and the method descriptions above are illustrative examples only and are not intended to require or imply that the steps of the embodiments must be performed in the order given. As those skilled in the art will recognize, the steps in the above embodiments may be performed in any order. Words such as "thereafter", "then", and "next" are not intended to limit the order of the steps; they serve only to guide the reader through the description of the methods. In addition, any reference to an element in the singular, for example using the articles "a", "an", or "the", is not to be construed as limiting that element to the singular.
In addition, the steps and devices of the embodiments herein are not limited to being practiced in any one embodiment. In fact, related steps and devices from the various embodiments herein may be combined in accordance with the concept of the present invention to conceive new embodiments, and these new embodiments are also within the scope of the present invention.
Each operation of the methods described above may be performed by any suitable means capable of performing the corresponding function. Such means may include various hardware and/or software components and/or modules, including but not limited to circuits, application-specific integrated circuits (ASICs), or processors.
The illustrative logic blocks, modules, and circuits described may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but alternatively the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in any form of tangible storage medium. Examples of storage media that may be used include random access memory (RAM), read-only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disks, removable disks, CD-ROMs, and the like. A storage medium may be coupled to a processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. A software module may be a single instruction or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media.
The methods disclosed herein include one or more actions for carrying out the described method. The methods and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and/or use of specific actions may be modified without departing from the scope of the claims.
The functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions on a tangible computer-readable medium. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disc includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc.
Accordingly, a computer program product may perform the operations presented herein. For example, such a computer program product may be a tangible computer-readable medium having instructions tangibly stored (and/or encoded) on it, the instructions being executable by one or more processors to perform the operations described herein. The computer program product may include packaging material.
Software or instructions may also be transmitted over a transmission medium. For example, software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, or microwave.
In addition, modules and/or other appropriate means for performing the methods and techniques described herein may be downloaded and/or otherwise obtained by a user terminal and/or base station as appropriate. For example, such a device may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, the various methods described herein may be provided via a storage component (for example RAM, ROM, or a physical storage medium such as a CD or floppy disk), so that the user terminal and/or base station can obtain the methods upon coupling the storage component to the device or providing it to the device. Furthermore, any other suitable technique for providing the methods and techniques described herein to a device may be used.
Other examples and implementations are within the scope and spirit of the present invention and the appended claims. For example, owing to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwiring, or any combination of these. Features implementing functions may also be physically located at various positions, including being distributed so that portions of the functions are implemented at different physical locations. Also, as used herein, including in the claims, "or" used in a list of items beginning with "at least one" indicates a disjunctive list, so that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (that is, A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
Various changes, substitutions, and alterations to the techniques described herein may be made without departing from the techniques taught as defined by the appended claims. Furthermore, the scope of the claims of the present invention is not limited to the specific aspects of the processes, machines, manufacture, compositions of matter, means, methods, and actions described above. Processes, machines, manufacture, compositions of matter, means, methods, or actions, currently existing or later developed, that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be used. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or actions.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the invention. Therefore, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present invention to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (28)

  1. An operation method, characterized by using at least one of an electrical signal input component and an optical signal generation component, together with a mapping component, an interconnection component, a computing component, and an output component, wherein:
    the electrical signal input component includes at least one input terminal for electrical signal A, used to input the electrical signal A;
    the optical signal generation component photographs a target field of view, or receives an optical signal from another optical signal capturing or generating component, to generate an optical signal B;
    the mapping component is connected to the electrical signal input component and/or the optical signal generation component, receives electrical signal A from the electrical signal input component and/or optical signal B from the optical signal generation component, and maps electrical signal A and/or optical signal B to generate digital signals representing the values of electrical signal A and/or optical signal B, respectively;
    the interconnection component selects, from the plurality of digital signals output by the mapping component for the electrical signals A and/or optical signals B, some signals as an excitation input part and the remainder as a weight input part, and cross-connects the excitation input part with at least part of the weight input part to form a multiply-accumulate (MAC) architecture; it takes the required subset of the fully connected structure of a machine-learning neural network algorithm, thereby yielding the connections corresponding to the operation to be performed on the digital signals obtained from electrical signal A and optical signal B via the mapping component, the output of the subset comprising the required operation results of the excitation input part and the weight input part.
  2. The method of claim 1, characterized in that the computing component includes a switch unit, an integration unit, a reset unit, an integration discharge unit, and a pulse release unit, is connected to the interconnection component, and receives from the interconnection component the computation results of the input part and the weight part to obtain the result for the required target quantity.
  3. The method of claim 1, characterized in that, based on the fully connected architecture, the weight values are input, and the set of weight values generates, under the multiply-accumulate architecture, a fully connected sub-architecture corresponding to the required operation; under this sub-architecture a group of excitation signals is processed to obtain the required target value, wherein the operation is performed on weight signals K and excitation signals, and:
    the number of weight signals K is K=N×M, where M is the number of mapping-component output signals set as excitation input signals of the fully connected operation in the algorithm, M being 0 or a positive integer; and N is the number of computing components in the computing apparatus to which the interconnection component is connected, N being 0 or a positive integer;
    the mapping-component output signals serving as weight input signals of the fully connected operation are arranged as an N×M matrix; the M elements of each row of the matrix are ANDed, element by element, with the M mapping-component output signals set as excitation input signals of the fully connected operation in the algorithm, and the output signals of the interconnection component are used as the input signals of the computing component of the corresponding row.
  4. The method of claim 1, characterized in that the interconnection component uses the convolutional connection of the algorithm to combine the data from the mapping component, and:
    where there are multiple mapping-component output signals, the number of mapping-component output signals is denoted Z, and the Z signals are divided into two groups, one of Q signals and the other of P signals, with Z=Q+P; the output signals of the Q mapping components are set as the excitation input signals of the convolution operation in the machine learning algorithm, and the output signals of the P mapping components are set as the weight input signals of that convolution operation; and the method further includes:
    arranging the output signals of the Q mapping components into a √Q×√Q matrix (√Q being an integer) and the output signals of the P mapping components into a √P×√P matrix (√P being an integer, √P≤√Q); ANDing the √P×√P matrix with the √P×√P submatrices of the √Q×√Q matrix, element by element in one-to-one correspondence, all of these operations being performed simultaneously and in parallel in the time domain; outputting the output signals of all the operations as the output signals of the interconnection component; and inputting the output signals of the interconnection component into the computing component as its input signals.
  5. The operation method of claim 4, used to perform a multiply-add operation on Q input signals with Q n-bit weight units, characterized in that:
    1) there are Q electrical signal input components and/or optical signal photographing components in total, used to input the desired Q excitation input signals;
    2) there are Q weight units in total, used to input the desired Q weight input signals, the value of each weight being represented by n bits;
    3) the Q binary excitation signals are ANDed with the Q n-bit weight signals, respectively, to obtain the values of the Q binary excitation signals as modulated by the Q pulse-width-modulated waveforms, and these values are input to the computing component (a linear neuron counting circuit) to obtain the multiply-add result in which the Q input signals are multiplied by the Q n-bit weight signals and accumulated.
  6. The method of claim 1, characterized in that the mapping mode of the mapping component is a binary form, a pulse-width-modulation form, a form that leaves the output of the original input component and/or photographing component unchanged, an analog-to-digital conversion form, a random-number form, or the like.
  7. The operation method of any one of claims 1 to 5, characterized in that electrical signal A or optical signal B and the weight part are all mapped in binary form, and the operation is a bitwise OR of binary numbers.
  8. The operation method of any one of claims 1 to 5, characterized in that the result for the target quantity obtained by the computing component is: a voltage pulse signal with a certain frequency and duty cycle; or the number of pulses of that voltage pulse signal counted by a counter within a certain time T.
  9. The operation method of any one of claims 1 to 5, characterized in that electrical signal A or optical signal B is mapped in binary form, and a shift operation is applied to the mapped binary signal before or after the AND operation with the weight part.
  10. The operation method of any one of claims 1 to 6, characterized in that, when electrical signal A or optical signal B is mapped in pulse-width-modulation form, the duration of the high level of the pulse signal is determined by the magnitude of the signal itself, the high level being continuous with no low-level interruption.
  11. The operation method of any one of claims 1 to 6, characterized in that, when electrical signal A or optical signal B is mapped in random-number form, the signal is mapped to a random pulse sequence that emits a high level with a certain probability, the probability being determined by the magnitude of the signal itself.
  12. The operation method of claim 1, characterized in that electrical signal A or optical signal B is an analog current signal that bypasses the mapping component and the interconnection component and is fed directly to the input of the integration unit of the computing component, in which case the output of the computing component represents the electrical signal A or optical signal B output by the input component.
  13. An operation method, characterized in that the multiply-accumulate operation is taken as the basic operation module; the linear operations involved in the machine learning algorithm to be executed are decomposed into an operation paradigm whose unit is the multiply-accumulate operation; the nonlinear operations involved are expressed as superpositions of linear operations, and the linear operation units involved are further decomposed into an operation paradigm whose unit is the multiply-accumulate operation,
    and the connection architecture of the operation components is set according to the operation paradigm, each multiply-accumulate operation module in the connection architecture
    1) using one excitation signal input component, and
    2) using one weight signal input component,
    and further using a mapping component, a computing component, and an output component, wherein the mapping component is connected to the excitation signal input component and to the weight signal input component and maps the excitation signal and the weight signal; the mapped excitation signal and the mapped weight signal are ANDed, and the result is input to the computing component, which performs a combined operation in units of multiply-add operations according to the operation paradigm to obtain the result for the required target quantity.
  14. An operation unit, including at least one of an electrical signal input component and an optical signal generation component, characterized in that:
    the electrical signal input component includes at least one input terminal for electrical signal A, used to input an analog electrical quantity expressing the electrical signal A;
    the optical signal generation component is used to photograph a target field of view, or to receive an optical signal from another optical signal component, and to generate an analog quantity of optical signal B, the optical signal B comprising image data with a certain bit width, a voltage pulse sequence in the time domain, or an analog current of a certain magnitude; and the operation unit further includes:
    a mapping component, connected to the electrical signal input component and/or the optical signal generation component, which maps the electrical signal A and/or optical signal B to generate digital signals representing the values of electrical signal A and optical signal B, respectively;
    an interconnection component, which uses a structure of multiple electrical and/or optical paths and cross-connections, the structure corresponding to the operation performed on the digital signals obtained from electrical signal A and/or optical signal B via the mapping component;
    a computing component, connected to the interconnection component, which receives from the interconnection component the digital signals corresponding to electrical signal A and/or optical signal B, the computing component including an electrical and/or optical connection structure that comprises suitable switch units together with an integration unit, a reset unit, a discharge unit, and a pulse release unit, used to perform computation on the signals output by the interconnection component and to output the result.
  15. The operation unit of claim 14, used to perform multiply-accumulate (MAC) operations, characterized in that the interconnection component uses the fully connected structure of a machine-learning neural network algorithm to combine the data from the mapping component, and further:
    1) the number of output signals of the mapping component is set to K, with K=N×M, where M is the number of mapping-component output signals set as excitation input signals of the fully connected operation in the machine learning algorithm, and N is the number of computing components in the computing apparatus to which the interconnection component is connected;
    2) the mapping component includes N×M output terminals, and the M elements of each row of the output N×M matrix are combined "point to point" with the M mapping-component output signals set as excitation input signals of the fully connected operation in the machine learning algorithm, and are ANDed with them; and
    3) the output of the interconnection component is connected to the input of the computing component of the corresponding row, so that the output signals of the interconnection component serve as the input signals of the computing component of that row.
  16. The operation unit according to claim 14, used for performing multiply-accumulate (MAC) operations, characterized in that:
    the interconnection component adopts the convolutional connection used in machine learning algorithms to combine the data from the mapping components, and:
    1) a plurality of mapping components have respective output terminals and output respective output signals; the number of mapping-component output signals is Z, and the mapping components associated with the Z signals are divided into two groups, one group of Q and the other of P, with Z = Q + P; the output signals of the Q mapping components are the excitation input signals of the convolution operation, and the output signals of the P mapping components are the weight input signals of the convolution operation;
    2) the output signals of the Q mapping components are arranged into a √Q×√Q matrix (√Q being an integer), and the output signals of the P mapping components are arranged into a √P×√P matrix (√P being an integer, √P ≤ √Q); the √P×√P matrix and each sub-matrix of size √P×√P within the √Q×√Q matrix undergo a logical AND operation in the manner of an element-wise product, with all elements in one-to-one correspondence, all such operations being performed simultaneously and in parallel in the time domain; the horizontal or vertical interval between two adjacent operations is the stride S, S being a non-negative integer; and
    3) the interconnection component comprises output terminals, and the signals output are used as the input signals of the computing component.
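The convolutional connection in claim 16 can be sketched in software as follows. This is an illustration under stated assumptions, not the claimed circuit: signals are single bits, the windows are visited sequentially rather than in parallel, the high bits of each window are counted as a stand-in for the computing component, and the stride is required to be at least 1 (the claim allows any non-negative S, but a zero stride would never advance the window in this sketch). The function name `conv_and` is hypothetical.

```python
def conv_and(activations, kernel, stride=1):
    """Slide a p x p binary kernel over a q x q binary matrix using bitwise AND.

    Assumptions: q = sqrt(Q), p = sqrt(P), p <= q, all entries are 0 or 1,
    and stride >= 1. Returns the count of high bits for every window.
    """
    q, p = len(activations), len(kernel)
    if p > q:
        raise ValueError("kernel must not be larger than the activation matrix")
    if stride < 1:
        raise ValueError("this sketch requires stride >= 1")
    outputs = []
    for top in range(0, q - p + 1, stride):
        out_row = []
        for left in range(0, q - p + 1, stride):
            # Element-wise AND between the kernel and the current sub-matrix.
            count = sum(
                activations[top + i][left + j] & kernel[i][j]
                for i in range(p)
                for j in range(p)
            )
            out_row.append(count)
        outputs.append(out_row)
    return outputs


if __name__ == "__main__":
    acts = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # Q = 9
    kern = [[1, 1], [0, 1]]                   # P = 4
    print(conv_and(acts, kern, stride=1))
```

In hardware, every window would be evaluated at the same time; the nested loops here only enumerate the same set of windows.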
  17. The operation unit according to any one of claims 14 to 16, used for performing a multiply-accumulate operation of M input signals with M n-bit weight units, characterized in that:
    1) the total number of electrical signal input components and optical signal capture components is M, and the M input terminals are respectively used to input the desired M excitation input signals;
    2) the quantities input at the M weight input terminals respectively represent n-bit weight units; the M binary excitation signals are respectively ANDed with the M n-bit weight units, yielding the values of the M binary excitation signals after modulation by the N pulse-width-modulated waveforms; these values are input to the computing component, which consists of a linear neuron counting circuit, to obtain the multiply-accumulate result of multiplying the M excitation input signals by the M n-bit weight units and accumulating the products.
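The multiply-accumulate scheme of claim 17 can be checked with a short sketch. It assumes that each n-bit weight is mapped to a pulse-width-modulated waveform of 2^n − 1 time slots whose number of leading high slots equals the weight value, that each excitation is a single bit, and that the linear neuron counting circuit can be modeled as a plain counter of high slots. The names `pwm` and `mac_by_counting` are hypothetical.

```python
def pwm(value, n_bits):
    """Map an n-bit value to a waveform with `value` contiguous high slots.

    Assumption: the waveform has 2**n_bits - 1 slots in total.
    """
    slots = 2 ** n_bits - 1
    if not 0 <= value <= slots:
        raise ValueError("value does not fit in n_bits")
    return [1] * value + [0] * (slots - value)


def mac_by_counting(activations, weights, n_bits):
    """AND each binary excitation with its weight waveform and count high slots."""
    if len(activations) != len(weights):
        raise ValueError("need one weight per excitation")
    total = 0
    for a, w in zip(activations, weights):
        gated = [a & bit for bit in pwm(w, n_bits)]
        total += sum(gated)  # the counter sees one pulse per high slot
    return total


if __name__ == "__main__":
    acts = [1, 0, 1, 1]
    wts = [3, 7, 0, 5]  # 3-bit weights
    print(mac_by_counting(acts, wts, n_bits=3))
```

Under these assumptions the count equals the ordinary sum of products of the excitations and weights, which is the linear behavior the claim relies on.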
  18. The operation unit according to claim 14, characterized in that the type of the mapping component is one of: binary, pulse-width modulation, analog-to-digital conversion, and random number.
  19. The operation unit according to claim 14, characterized in that the electrical signal input component may be a general-purpose input/output interface of an SoC, ASIC, or FPGA chip, or a memory unit such as a register file, SRAM, DRAM, or RRAM, or another component for inputting an analog current signal or an analog voltage signal.
  20. The operation unit according to claim 14, characterized in that the optical signal generation component is an image sensor chip, including a CIS chip, a CCD chip, a single-photon avalanche diode array, or the like, used to sense an optical signal and convert it into a binary image signal of a certain bit width, a time-domain pulse sequence, or an analog current signal.
  21. The operation unit according to claim 19 or 20, characterized in that the mapping component is used to map the output signal of the input component as described in claim 18; when the output signal of the input component is a binary signal, the mapping component is equivalent to the input component, that is, the input component is left unchanged and the mapping component is replaced by the input component;
    when the output signal of the input component is an analog voltage signal, the mapping component may be an analog-to-digital conversion device connected to the subsequent interconnection component, or the mapping component is replaced by the input component, in which case the output signal of the mapping component is the output signal of the input component, and this output signal is used directly as the input signal of the computing component of the computing device;
    when the output signal of the input component is an analog current signal, the analog current signal bypasses the mapping component and the interconnection component and is fed into the integration unit of the computing component of the operation unit according to claim 14, so that the output of the computing component characterizes the magnitude of the input analog current signal.
  22. The operation unit according to claim 18, characterized in that the mapping component may be a pulse-width-modulation waveform generator which, according to the magnitude of the binary signal output by the input component, outputs a number of high-level signals corresponding to that magnitude, the high-level signals being continuous with no low-level signal interrupting them; the duration of the generator's output signal is determined by the bit width of the binary signal output by the input component.
  23. The operation unit according to claim 18, characterized in that
    the mapping component is a random pulse sequence generator comprising a linear feedback shift register and a comparator, configured such that, when the output signal of the input component is a binary signal, the binary signal serves as the initial value of the linear feedback shift register; and
    in each clock cycle, the output value of the linear feedback shift register and its initial value are input to the comparator for comparison; if the initial value is greater than the current output value of the linear feedback shift register, the comparator outputs 1; otherwise the comparator outputs 0;
    further, the comparator is configured to output a random pulse sequence in which the probability of a high-level signal is determined by the binary signal output by the input component, the length of the pulse sequence being determined by the bit width of the binary signal output by the input component.
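The random pulse sequence generator of claim 23 can be modeled in a few lines. The sketch below is an illustration under assumptions that the claim does not fix: a 4-bit Fibonacci linear feedback shift register with feedback polynomial x^4 + x^3 + 1 (a maximal-length choice), a sequence length of 2^4 − 1 = 15 clock cycles, and a comparison made after the register advances in each cycle. The function names are hypothetical.

```python
def lfsr4_step(state):
    """Advance a 4-bit Fibonacci LFSR by one clock (polynomial x^4 + x^3 + 1)."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 3)


def random_pulse_sequence(value, cycles=15):
    """Emit 1 whenever the initial value exceeds the current LFSR output.

    Assumptions: `value` is a 4-bit number that seeds the LFSR, and the
    sequence runs for 15 cycles, one full period of the register.
    """
    if not 0 <= value <= 15:
        raise ValueError("value must fit in 4 bits")
    state = value
    pulses = []
    for _ in range(cycles):
        state = lfsr4_step(state)
        pulses.append(1 if value > state else 0)  # comparator
    return pulses


if __name__ == "__main__":
    for v in (0, 4, 8, 15):
        seq = random_pulse_sequence(v)
        print(v, seq, sum(seq))
```

With a nonzero seed, this register visits every state from 1 to 15 exactly once per period, so a full 15-cycle sequence contains value − 1 high slots: the density of high levels grows linearly with the input value, while their positions look scattered. A seed of 0 keeps the register at 0 and yields an all-low sequence.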
  24. The operation unit according to claim 14, characterized in that the switch unit of the computing component is configured to receive the output signal of the interconnection component and, according to that signal, to output a signal determined by the switch state;
    the reset unit is responsible for the reset function of the computing component, that is, while the reset signal is active the output signal of the computing component is zero; the integration unit integrates the output signal of the switch unit; when the integrated signal reaches the threshold of the integration unit, the pulse release unit begins to release a pulse, and at the same time the discharge unit discharges the integrated signal.
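The interplay of the integration, pulse release, discharge, and reset units in claim 24 resembles an integrate-and-fire loop, which can be sketched as follows. This is a discrete-time model under assumptions not stated in the claim: the switch-unit output is a non-negative number per time step, and the discharge removes exactly one threshold's worth of integrated signal per pulse. The function name `integrate_and_fire` is hypothetical.

```python
def integrate_and_fire(inputs, threshold, reset=False):
    """Discrete-time model of the computing component.

    inputs: per-step output of the switch unit (assumed non-negative numbers).
    threshold: level at which the pulse release unit fires.
    reset: while active, the output is held at zero.
    Assumption: each pulse discharges exactly `threshold` from the integral.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if reset:
        return [0] * len(inputs)
    integral = 0
    pulses = []
    for x in inputs:
        integral += x                 # integration unit
        if integral >= threshold:
            pulses.append(1)          # pulse release unit
            integral -= threshold     # discharge unit
        else:
            pulses.append(0)
    return pulses


if __name__ == "__main__":
    print(integrate_and_fire([1] * 10, threshold=3))
    print(integrate_and_fire([2] * 6, threshold=3))
```

Subtracting the threshold, rather than clearing the integral to zero, keeps the pulse count proportional to the total input, which is consistent with the linear input-output relationship described in claims 25 and 26.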
  25. The operation unit according to claim 14, characterized in that the computing component exhibits a linear relationship between its output signal and its input signal; when the input signal is a binary signal, a binary "1" turns the switch on (or off) and a binary "0" turns the switch off (or on), the two switch states being opposite; and the number of switches turned on is linearly related to the output signal, the output signal being a pulse signal with a certain frequency and duty cycle.
  26. The operation unit according to claim 14, characterized in that the computing component exhibits a linear relationship between its output signal and its input signal, and when the input signal is an analog current signal, the input signal does not pass through the switch unit but enters the integration unit directly, and the pulse frequency of the output signal is linearly related to the magnitude of the input analog current signal.
  27. The operation unit according to claim 14, characterized in that the output component is a digital counter whose input signals are a clock signal and/or a reset signal, or is the pulse release unit of the computing component.
  28. The operation unit according to claim 14, characterized in that the operation unit satisfies the condition that the number of high-level (or low-level) input signals of the computing component is linearly related to the number of pulses counted per unit time by the counting unit of the output component; the computing component need not include all of the units contained in the computing component of the operation unit according to claim 14, and is likewise applicable to the operation unit according to claim 14.
PCT/CN2023/099468 2023-06-09 2023-06-09 Operation method and operation unit WO2023160735A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/099468 WO2023160735A2 (en) 2023-06-09 2023-06-09 Operation method and operation unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/099468 WO2023160735A2 (en) 2023-06-09 2023-06-09 Operation method and operation unit

Publications (2)

Publication Number Publication Date
WO2023160735A2 (en) 2023-08-31
WO2023160735A3 (en) 2024-03-21

Family

ID=87764894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/099468 WO2023160735A2 (en) 2023-06-09 2023-06-09 Operation method and operation unit

Country Status (1)

Country Link
WO (1) WO2023160735A2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033759A (en) * 2019-12-09 2021-06-25 南京惟心光电系统有限公司 Pulse convolution neural network algorithm, integrated circuit, arithmetic device, and storage medium
JP7365999B2 (en) * 2019-12-24 2023-10-20 財團法人工業技術研究院 Neural network computing device and method
CN111445012B (en) * 2020-04-28 2023-04-18 南京大学 FPGA-based packet convolution hardware accelerator and method thereof
CN113902097A (en) * 2021-09-30 2022-01-07 南京大学 Run-length coding accelerator and method for sparse CNN neural network model

Also Published As

Publication number Publication date
WO2023160735A3 (en) 2024-03-21

Similar Documents

Publication Publication Date Title
Wang et al. A deep neural network accelerator based on tiled RRAM architecture
CN110543933B (en) Pulse type convolution neural network based on FLASH memory array
US11842167B2 (en) Switched capacitor vector-matrix multiplier
Pourmeidani et al. Probabilistic interpolation recoder for energy-error-product efficient DBNs with p-bit devices
WO2020173040A1 (en) Reversible logic circuit and operation method thereof
US11817173B2 (en) Timing-based computer architecture systems and methods
WO2023116923A1 (en) Storage and calculation integrated device and calculation method
CN112698811A (en) Neural network random number generator sharing circuit, sharing method and processor chip
Alam et al. Exact stochastic computing multiplication in memristive memory
CN113364462B (en) Analog storage and calculation integrated multi-bit precision implementation structure
Bankman et al. RRAM-based in-memory computing for embedded deep neural networks
WO2023160735A2 (en) Operation method and operation unit
US20190294957A1 (en) Arithmetic device and arithmetic method
Cherupally et al. Leveraging noise and aggressive quantization of in-memory computing for robust dnn hardware against adversarial input and weight attacks
Wang et al. A novel RRAM-based adaptive-threshold LIF neuron circuit for high recognition accuracy
CN112002365B (en) Parallel logic operation method based on multi-bit nonvolatile memory and full adder
US11475288B2 (en) Sorting networks using unary processing
CN213934855U (en) Neural network random number generator sharing circuit based on random computation
JP2022045335A (en) Memory device and operation method thereof
CN113988279A (en) Output current reading method and system of storage array supporting negative value excitation
CN115756388B (en) Multi-mode storage and calculation integrated circuit, chip and calculation device
Natsui et al. MTJ-based nonvolatile ternary logic gate for quantized convolutional neural networks
CN117636945B (en) 5-bit signed bit AND OR accumulation operation circuit and CIM circuit
CN113724764B (en) Multiplication device based on nonvolatile memory
US20220334800A1 (en) Exact stochastic computing multiplication in memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23759356

Country of ref document: EP

Kind code of ref document: A2