CN111435459A - Double-sided neural network processor - Google Patents

Double-sided neural network processor

Info

Publication number
CN111435459A
Authority
CN
China
Prior art keywords
neural
neural network
network processor
computation circuit
circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910029526.5A
Other languages
Chinese (zh)
Inventor
张国飙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Haicun Information Technology Co Ltd
Original Assignee
Hangzhou Haicun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Haicun Information Technology Co Ltd filed Critical Hangzhou Haicun Information Technology Co Ltd
Priority to CN201910029526.5A priority Critical patent/CN111435459A/en
Priority to US16/249,112 priority patent/US11055606B2/en
Publication of CN111435459A publication Critical patent/CN111435459A/en
Priority to US17/227,323 priority patent/US20210232892A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Semiconductor Memories (AREA)
  • Memory System (AREA)

Abstract

The double-sided neural network processor (100) comprises a plurality of storage units (100aa-100mn), each storage unit (100ij) comprising at least one memory array (170) and a neural computation circuit (180). The neural network processor (100) is formed on a semiconductor substrate (0) having a first surface (0a) and a second surface (0b). The first surface (0a) contains the memory arrays (170) and the second surface (0b) contains the neural computation circuits (180). The memory arrays (170) and the neural computation circuits (180) are electrically coupled by a plurality of inter-surface connections (160).

Description

Double-sided neural network processor
Technical Field
The present invention relates to the field of integrated circuits, and more particularly to neural network processors (neuro-processors) used for artificial intelligence (AI).
Background
An important application of processors is the neural network. Neural networks are a powerful artificial-intelligence tool. FIG. 1A is an example of a neural network. It contains an input layer 32, a hidden layer 34 and an output layer 36. The input layer 32 contains i neurons 33, whose input data x1, …, xi constitute an input vector 30x. The output layer 36 contains k neurons 37, whose output data y1, y2, …, yk constitute an output vector 30y. The hidden layer 34 lies between the input layer 32 and the output layer 36; it contains j neurons 35, each neuron 35 electrically coupled to a first neuron in the input layer 32 and to a second neuron in the output layer 36. The coupling strength between neurons is represented by the synaptic weights wij and wjk.
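For concreteness, the forward computation of FIG. 1A can be sketched as follows (a minimal illustration, not part of the patent disclosure; the tanh activation and the layer sizes i=8, j=16, k=4 are assumptions):

```python
import numpy as np

def forward(x, W1, W2, f=np.tanh):
    """x: input vector 30x (length i); W1: i-by-j synaptic weights wij;
    W2: j-by-k synaptic weights wjk; returns output vector 30y (length k)."""
    h = f(x @ W1)        # hidden layer 34: j neurons 35
    return f(h @ W2)     # output layer 36: k neurons 37

rng = np.random.default_rng(0)
y = forward(rng.random(8), rng.random((8, 16)), rng.random((16, 4)))
print(y.shape)           # (4,)
```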
The prior art proposes a neural network accelerator chip 60 (see Chen et al., "DaDianNao: A Machine-Learning Supercomputer", IEEE/ACM International Symposium on Microarchitecture, 2014, pp. 609-622). The neural network accelerator 60 comprises 16 cores 50 coupled to one another by a tree-like connection (FIG. 1B). Each core 50 comprises a neural processing unit (NPU) 30 and four eDRAM blocks 40 (FIG. 1C). The NPU 30 performs neural computations and contains 256+32 16-bit multipliers and 256+32 16-bit adders. The eDRAMs 40 store synaptic weights, with a storage capacity of 2 MB.
There is still room for improvement in the neural network accelerator 60. First, the eDRAM 40 is a volatile memory: synaptic weights must be loaded into the eDRAM 40 from external memory before use, which takes time. Second, only 32 MB of eDRAM in each neural network accelerator chip 60 is available to store synaptic weights, a capacity still far below what is actually needed. Third, the design emphasis of the neural network accelerator 60 is skewed toward memory: the eDRAM 40 occupies 80% of the area in each core, while the NPU 30 occupies less than 10%, so the computational density is very limited.
Disclosure of Invention
The main purpose of the invention is to promote the progress of artificial intelligence.
It is another object of the invention to increase the computational power of neural network processors.
It is another object of the present invention to provide a neural network processor that can be used with mobile devices.
To achieve these and other objects, the present invention provides a double-sided neural network processor whose basic function is neural computation; more importantly, the synaptic weights required for that computation are stored on the same chip. The neural network processor comprises thousands of storage-computation units (storage units for short), each comprising at least one neural storage circuit and one neural computation circuit. The neural storage circuit contains a memory array that stores synaptic weights; the neural computation circuit performs neural computations using those synaptic weights. The neural network processor is formed on a semiconductor substrate having a first surface and a second surface: the first surface contains a plurality of memory arrays, and the second surface contains a plurality of neural computation circuits electrically coupled to them via a plurality of inter-surface connections.
This integration of the memory array and the neural computation circuit on the two sides of the substrate is referred to as double-sided integration. Double-sided integration improves computational density: with conventional two-dimensional integration, the area of the neural network processor is the sum of the areas of the memory array and the neural computation circuit; with double-sided integration, the memory array moves from beside the neural computation circuit to the other side of the substrate, the neural network processor becomes smaller, and the computational density increases.
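The area argument can be illustrated with hypothetical numbers (the block areas below are assumptions, not figures from the patent):

```python
# Hypothetical block areas, in mm^2 (illustrative values only).
a_mem, a_compute = 8.0, 2.0

area_2d = a_mem + a_compute                 # conventional 2-D integration: blocks side by side
area_double_sided = max(a_mem, a_compute)   # double-sided: blocks stacked on opposite surfaces

print(area_2d, area_double_sided)           # 10.0 vs 8.0
print(area_2d / area_double_sided)          # 1.25x improvement in computational density
```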
The first surface may employ any form of memory as the carrier of synaptic weights, such as RAM (SRAM, DRAM, MRAM, FRAM, etc.) or ROM (mask-ROM, OTP, NOR flash, NAND flash, etc.); the second surface may contain any form of neural computation circuit. Since the memory array on the first surface is formed on a single-crystal semiconductor substrate, it is fast. Furthermore, the memory array and the neural computation circuit are close together (relative to a traditional von Neumann architecture), so the time required to read synaptic weights is short. In addition, the number of inter-surface connections is large, which allows ultra-wide bandwidth between the memory array and the neural computation circuit. During neural computation, the input data are sent to all storage units, which perform their neural computations simultaneously, guaranteeing massively parallel computation. Because the neural network processor contains thousands of storage units, high-speed, high-efficiency neural computation can be realized.
Accordingly, the invention proposes a neural network processor (100), characterized in that it comprises: a plurality of storage units (100aa-100mn), each storage unit (100ij) comprising at least one memory array (170) and a neural computation circuit (180), the memory array (170) storing at least one synaptic weight, the neural computation circuit (180) performing a neural computation using the synaptic weights; and a semiconductor substrate (0) having a first surface (0a) and a second surface (0b), said first surface (0a) containing said memory array (170) and said second surface (0b) containing said neural computation circuit (180), said first surface (0a) and said second surface (0b) being electrically coupled by a plurality of inter-surface connections (160).
Drawings
FIG. 1A is a schematic diagram of a neural network; FIG. 1B is a chip layout diagram of a neural network accelerator (prior art); FIG. 1C is the core architecture of the neural network accelerator (prior art).
FIGS. 2A-2B are general descriptions of the double-sided neural network processor 100: FIG. 2A is its circuit block diagram; FIG. 2B is a circuit block diagram of one of its storage units.
FIG. 3A is a perspective view of a first surface of the neural network processor; FIG. 3B is a perspective view of its second surface; FIG. 3C is a cross-sectional view thereof.
FIGS. 4A-4B are circuit layout diagrams of the first and second surfaces of the neural network processor 100.
FIGS. 5A-5C are circuit block diagrams of three types of storage units.
FIGS. 6A-6C are circuit layouts of three types of storage units on the first and second surfaces.
FIG. 7 is a circuit block diagram of a neural computation circuit.
FIGS. 8A-8B are circuit block diagrams of two types of computing circuits.
It is noted that the figures are diagrammatic and not drawn to scale. Dimensions and structures of parts in the figures may be exaggerated or reduced for clarity and convenience. In different embodiments, an alphabetic suffix following a number denotes a different instance of the same class of structure; the same numerical prefix refers to the same or similar structures. The symbol "/" denotes an "and/or" relationship.
In this specification, "memory" broadly refers to any semiconductor-based information storage device that can store information permanently or temporarily. A "memory array" is a collection of all memory cells that share at least one address line. "Electrically coupled" means any form of coupling by which an electrical signal may be transmitted from one element to another. In other publications, the "neural processing unit (NPU)" is also referred to as a "neural function unit (NFU)" and the like; these terms are synonymous. "Neural network processor" is also referred to as "neural processor", "neural network accelerator", "machine learning accelerator", etc.; they all have the same meaning.
Detailed Description
FIGS. 2A-2B are general illustrations of the double-sided neural network processor 100. FIG. 2A is its circuit block diagram. The neural network processor 100 not only performs neural computations; the synaptic weights required for those computations are also stored locally, in close proximity. The neural network processor 100 contains an m×n array of storage units 100aa-100mn. Taking the storage unit 100ij as an example, it has an input 110 and an output 120. In general, a neural network processor 100 may contain thousands of storage units 100aa-100mn, which supports massively parallel computation.
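A behavioral sketch of this organization (hypothetical and for illustration only; the values of m, n and the array sizes are assumptions): the input 110 is broadcast to every storage unit, each unit computes with its locally stored weights, and the outputs 120 are collected.

```python
import numpy as np

class StorageUnit:
    """Behavioral stand-in for one storage unit 100ij."""
    def __init__(self, weights):
        self.weights = weights                  # synaptic weights in memory array 170

    def compute(self, x):                       # neural computation circuit 180
        return np.tanh(self.weights @ x)        # tanh activation is an assumption

rng = np.random.default_rng(1)
units = [[StorageUnit(rng.random((4, 8))) for _ in range(3)]    # n = 3 columns
         for _ in range(2)]                                     # m = 2 rows
x = rng.random(8)                               # input 110, broadcast to every unit
outputs = [[u.compute(x) for u in row] for row in units]        # outputs 120
```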
FIG. 2B is a circuit block diagram of the storage unit 100ij. The storage unit 100ij comprises at least a neural storage circuit 170 and a neural computation circuit 180, electrically coupled to each other via a plurality of inter-surface connections 160 (see FIG. 3C). Each neural storage circuit 170 contains at least one memory array storing the synaptic weights used by the neural computation circuit 180 to perform neural computations. Since the memory array 170 is located on a different surface than the neural computation circuit 180, the memory array 170 is drawn with dashed lines.
FIG. 3A is a perspective view of the first surface 0a of the neural network processor chip 100; FIG. 3B is a perspective view of its second surface 0b; FIG. 3C is a cross-sectional view thereof. The neural network processor chip 100 contains a semiconductor substrate 0. The substrate 0 has a first surface 0a (+z direction) and a second surface 0b (-z direction). In this embodiment, the neural storage circuits (memory arrays) 170aa-170bb are formed on the first surface 0a of the substrate 0, the neural computation circuits 180aa-180bb are formed on the second surface 0b of the substrate 0, and the two are electrically coupled by a plurality of inter-surface connections (160, including 160a-160c). Examples of the inter-surface connections (160) include through-substrate vias (TSVs). In other embodiments, the placement is swapped: the neural computation circuits 180aa-180bb are formed on the first surface 0a, and the memory arrays 170aa-170bb are formed on the second surface 0b.
This integration of the memory arrays 170aa-170bb and the neural computation circuits 180aa-180bb on the front and back sides (0a, 0b) of the substrate 0 is referred to as double-sided integration. Double-sided integration improves computational density: with conventional two-dimensional integration, the area of the neural network processor is the sum of the areas of the memory array and the neural computation circuit; with double-sided integration, the memory array moves from beside the neural computation circuit to the other side of the substrate, the neural network processor becomes smaller, and the computational density increases.
The first surface 0a may employ any form of memory as the carrier of synaptic weights, such as RAM (SRAM, DRAM, MRAM, FRAM, etc.) or ROM (mask-ROM, OTP, NOR flash, NAND flash, etc.); the second surface 0b may contain any form of neural computation circuit. Since the memory array 170 on the first surface 0a is formed on a single-crystal semiconductor substrate, it is fast. Furthermore, the memory array 170 and the neural computation circuit 180 are close together (relative to a traditional von Neumann architecture), so the time required to read synaptic weights is short. In addition, the number of inter-surface connections 160 is large, which allows ultra-wide bandwidth between the memory array 170 and the neural computation circuit 180. During neural computation, the input data are sent to all storage units, which perform their neural computations simultaneously, guaranteeing massively parallel computation. Because the neural network processor contains thousands of storage units (FIG. 2A), high-speed, high-efficiency neural computation can be realized.
FIGS. 4A-4B are circuit layouts of the first and second surfaces 0a, 0b of the double-sided neural network processor 100. This embodiment corresponds to the embodiment of FIGS. 5A and 6A; those skilled in the art can easily generalize it to the embodiments of FIGS. 5B and 6B, and of FIGS. 5C and 6C. FIG. 4A shows the first surface 0a, which contains a plurality of memory arrays 170aa-170mn. FIG. 4B shows the second surface 0b, which contains a plurality of neural computation circuits 180aa-180mn. The neural network processor 100 of FIGS. 4A-4B employs a "full alignment" technique, i.e., the circuit layouts of the two surfaces 0a, 0b are designed such that each memory array (e.g., 170ij) has a neural computation circuit (e.g., 180ij) aligned with it (see FIGS. 6A-6C). Since a single neural computation circuit (e.g., 180ij) may have multiple memory arrays (e.g., 170ijA-170ijD, 170ijW-170ijZ) aligned with it (see FIGS. 6B-6C), the period of the neural computation circuits (e.g., 180ij) on the second surface 0b is an integer multiple of the period of the memory arrays (e.g., 170ij) on the first surface 0a.
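The period relationship can be captured in a small indexing sketch (an illustration under the simplifying assumption that the period multiple k is the same in both directions; FIG. 6C actually uses different multiples in x and y):

```python
def aligned_circuit(array_row, array_col, k):
    """Index of the neural computation circuit lying under memory array
    (array_row, array_col) when the circuit period is k times the array period."""
    return (array_row // k, array_col // k)

# k = 1: one memory array per circuit (FIGS. 5A/6A).
assert aligned_circuit(2, 3, 1) == (2, 3)
# k = 2: four memory arrays per circuit (FIGS. 5B/6B);
# arrays (0,0), (0,1), (1,0), (1,1) all land on circuit (0,0).
assert aligned_circuit(1, 1, 2) == (0, 0)
```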
FIGS. 5A-6C show three kinds of storage units 100ij. FIGS. 5A-5C are their circuit block diagrams; FIGS. 6A-6C are their circuit layout diagrams. In these embodiments, one neural computation circuit 180ij serves different numbers of memory arrays 170ij.
The neural computation circuit 180ij in FIG. 5A serves one memory array 170ij: it performs neural computations using the synaptic weights stored in the memory array 170ij. The neural computation circuit 180ij in FIG. 5B serves four memory arrays 170ijA-170ijD: it performs neural computations using the synaptic weights stored in the memory arrays 170ijA-170ijD. The neural computation circuit 180ij in FIG. 5C serves eight memory arrays 170ijA-170ijD and 170ijW-170ijZ: it performs neural computations using the synaptic weights stored in the memory arrays 170ijA-170ijD and 170ijW-170ijZ. As can be seen from FIGS. 6A-6C below, a neural computation circuit 180ij that serves more memory arrays 170ij generally occupies a larger chip area and has greater functionality. In FIGS. 5A-6C, since the memory array 170ij and the neural computation circuit 180ij are located on different surfaces (see FIGS. 3A-3C and FIGS. 4A-4B), the memory array 170ij is drawn with dashed lines.
FIGS. 6A-6C show the circuit layout of the second surface 0b and the projections (shown in dashed lines) of the memory arrays 170ij-170ijZ (located on the first surface 0a) onto the second surface 0b. The embodiment of FIG. 6A corresponds to the embodiment of FIG. 5A. In this embodiment, the neural computation circuit 180ij of the storage unit 100ij is located on the second surface 0b and is at least partially covered by the memory array 170ij.
In this embodiment, the period of the neural computation circuit 180ij equals the period of the memory array 170ij, and the area of the neural computation circuit cannot exceed the area of the projection of the memory array 170ij onto the second surface 0b, so its functionality is limited. This embodiment is better suited to simpler neural computations. FIGS. 6B-6C disclose two more complex neural computation circuits 180ij.
The embodiment of FIG. 6B corresponds to the embodiment of FIG. 5B. In this embodiment, the neural computation circuit 180ij of the storage unit 100ij is located on the second surface 0b and is at least partially covered by four memory arrays 170ijA-170ijD. Below the four memory arrays 170ijA-170ijD, the neural computation circuit 180ij can be laid out freely. The period of the neural computation circuit 180ij in FIG. 6B is twice the period of the memory array 170ij in FIG. 6A, and its area is four times as large, so more complex neural computations can be realized.
The embodiment of FIG. 6C corresponds to the embodiment of FIG. 5C. In this embodiment, the neural computation circuit 180ij of the storage unit 100ij is located on the second surface 0b. The eight memory arrays 170ijA-170ijD, 170ijW-170ijZ are divided into two groups 170ijSA, 170ijSB; each group (e.g., 170ijSA) contains four memory arrays (e.g., 170ijA-170ijD). Under the first group 170ijSA of four memory arrays 170ijA-170ijD, the first neural computation circuit component 180ijA can be laid out freely; similarly, under the second group 170ijSB of four memory arrays 170ijW-170ijZ, the second neural computation circuit component 180ijB can be laid out freely. The first neural computation circuit component 180ijA and the second neural computation circuit component 180ijB together constitute the neural computation circuit 180ij. The wiring channels 182, 184, 186 provide electrical coupling between the different neural computation circuit components 180ijA, 180ijB, or between different neural computation circuits. The neural computation circuit 180ij in FIG. 6C has four times the period (in the x direction) and eight times the area of the memory array 170ij in FIG. 6A, and can implement even more complex neural computations.
FIGS. 7-8B disclose details of a neural computation circuit 180 and its computing circuit 730. In the embodiment of FIG. 7, the neural computation circuit 180 contains a synaptic-weight (Ws) RAM 740A, an input-neuron (Nin) RAM 740B and a computing circuit 730. The Ws RAM 740A is a cache that temporarily stores the synaptic weights 742 from the memory array 170; the Nin RAM 740B is likewise a buffer, temporarily storing the input data 746 from the input 110. The computing circuit 730 performs the neural computation and produces the output data 748.
In the embodiment of FIG. 8A, the computing circuit 730 contains a multiplier 732, an adder 734, a register 736, and an activation function circuit 738. The multiplier 732 multiplies the synaptic weight wij by the input data xi; the adder 734 and the register 736 accumulate the products (wij×xi); the accumulated value is fed to the activation function circuit 738, whose result is the output data yj.
In the embodiment of FIG. 8B, the multiplier 732 of FIG. 8A is replaced by a multiplier-adder (MAC) 732'. Of course, the multiplier-adder 732' also contains a multiplier. The Ws RAM 740A outputs not only the synaptic weight wij (via port 742w) but also the bias bj (via port 742b). The multiplier-adder 732' performs a biased multiply operation (wij×xi+bj) on the input data xi, the synaptic weight wij and the bias bj.
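The datapaths of FIGS. 8A-8B can be paraphrased behaviorally as follows (a sketch only: floating point stands in for the 16-bit hardware arithmetic, and the sigmoid activation is an assumption):

```python
import math

def neuron_output(x, w, b=0.0, f=lambda s: 1.0 / (1.0 + math.exp(-s))):
    acc = 0.0                        # register 736 holds the running sum
    for xi, wij in zip(x, w):
        acc += wij * xi + b          # multiplier-adder 732': wij*xi + bj
    return f(acc)                    # activation function circuit 738

# b = 0 reduces to the FIG. 8A datapath (multiplier 732 plus adder 734).
y_j = neuron_output([0.5, -1.0, 0.25], [0.2, 0.4, -0.6], b=0.01)
print(y_j)
```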
An activation function is a function whose output is confined to a certain range (e.g., 0 to 1, or -1 to +1); examples include the sigmoid function, the signum function, threshold functions, piecewise-linear functions, step functions, the tanh function, etc. Activation functions are difficult to implement in circuitry. The computing circuit 730 may therefore also contain a non-volatile memory for long-term storage of a look-up table (LUT) of the activation function. The non-volatile memory is typically a read-only memory (ROM). In one embodiment of the invention, the ROM is a three-dimensional read-only memory (3D-ROM) array, which is stacked above, and coincident with, the neural computation circuit (180). The computing circuit 730 then becomes extremely simple: it only needs to implement addition and multiplication, not the activation function. A computing circuit 730 that implements the activation function with a 3D-ROM array occupies a small area, which guarantees computational density.
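How such a look-up table replaces the activation circuitry can be sketched as follows (illustrative only: the table size, input range and tanh function are assumptions, not the patent's parameters):

```python
import numpy as np

# The activation (tanh here, an assumption) is precomputed over a fixed input
# range and burned into the ROM once; at run time the circuit only reads it.
LUT_SIZE, LO, HI = 256, -4.0, 4.0
ACT_LUT = np.tanh(np.linspace(LO, HI, LUT_SIZE))

def activation_lut(s):
    """Clamp s to [LO, HI], then map it to a table index: a read, not a compute."""
    idx = int((min(max(s, LO), HI) - LO) / (HI - LO) * (LUT_SIZE - 1))
    return ACT_LUT[idx]

print(activation_lut(0.7))   # close to tanh(0.7), up to quantization error
```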
It will be understood that changes in form and detail may be made without departing from the spirit and scope of the invention, and they should not impede the practice of the invention. Accordingly, the invention is to be limited only by the spirit of the appended claims.

Claims (10)

1. A double-sided neural network processor (100), comprising:
a plurality of storage units (100aa-100mn), each storage unit (100ij) comprising at least one memory array (170) and a neural computation circuit (180), the memory array (170) storing at least one synaptic weight, the neural computation circuit (180) performing a neural computation using the synaptic weights;
a semiconductor substrate (0) having a first surface (0a) and a second surface (0b), said first surface (0a) containing said memory array (170) and said second surface (0b) containing said neural computation circuit (180), said first surface (0a) and said second surface (0b) being electrically coupled by a plurality of inter-surface connections (160).
2. The neural network processor (100) of claim 1, further characterized by: the projection of the memory array (170) on the second surface (0b) at least partially coincides with the neuro-computation circuit (180).
3. The neural network processor (100) of claim 1, further characterized by: each memory array (170ij) in the first surface (0a) has a neural computation circuit (180ij) aligned with it on the second surface (0 b).
4. The neural network processor (100) of claim 1, further characterized by: each neural computation circuit (180ij) in the second surface (0b) has at least one memory array (170ij) aligned with it on the first surface (0 a).
5. The neural network processor (100) of claim 1, further characterized by: the period of the neural computation circuit (180ij) on the second surface (0b) is an integer multiple of the period of the memory array (170ij) on the first surface (0a).
6. The neural network processor (100) of claim 1, further characterized by: the neural computation circuit (180) includes at least one multiplier (732).
7. The neural network processor (100) of claim 1, further characterized by: the neural computation circuit (180) includes at least one multiplier-adder (732').
8. The neural network processor (100) of claim 1, further characterized in that the neural computation circuit (180) includes a read-only memory (ROM) that stores a look-up table (LUT) of an activation function.
9. The neural network processor (100) of claim 8, further characterized by: the ROM is a three-dimensional read-only memory (3D-ROM) array, the 3D-ROM array being stacked above the neural computation circuit (180).
10. The neural network processor (100) of claim 1, further characterized by: the inter-surface connections (160) are through-silicon vias (TSVs).
CN201910029526.5A 2016-03-21 2019-01-13 Double-sided neural network processor Pending CN111435459A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910029526.5A CN111435459A (en) 2019-01-13 2019-01-13 Double-sided neural network processor
US16/249,112 US11055606B2 (en) 2016-03-21 2019-01-16 Vertically integrated neuro-processor
US17/227,323 US20210232892A1 (en) 2016-03-21 2021-04-11 Neuro-Processing Circuit Using Three-Dimensional Memory to Store Look-Up Table of Activation Function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910029526.5A CN111435459A (en) 2019-01-13 2019-01-13 Double-sided neural network processor

Publications (1)

Publication Number Publication Date
CN111435459A (en) 2020-07-21

Family

ID=71579830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910029526.5A Pending CN111435459A (en) 2016-03-21 2019-01-13 Double-sided neural network processor

Country Status (1)

Country Link
CN (1) CN111435459A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220704A (en) * 2016-03-21 2017-09-29 杭州海存信息技术有限公司 Integrated neural network processor containing three-dimensional memory array
CN107305594A (en) * 2016-04-22 2017-10-31 杭州海存信息技术有限公司 Processor containing three-dimensional memory array
CN108053848A (en) * 2018-01-02 2018-05-18 清华大学 Circuit structure and neural network chip


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination