CN109919321A - Artificial intelligence module whose processing units have a local accumulation function, and system on chip - Google Patents
Artificial intelligence module whose processing units have a local accumulation function, and system on chip
- Publication number
- CN109919321A (application number CN201910103617.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- processing unit
- module
- dimension
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
An artificial intelligence (AI) module whose processing units have a local accumulation function, and a system on chip that includes the AI module. In an embodiment, a chip circuit includes an AI module, and the AI module includes multiple processing units arranged in a two-dimensional array, each able to perform multiply-add operations. Each processing unit has an enable input that receives an enable signal and pauses or starts the operation of the processing unit according to that signal. Under the action of control signals, a processing unit can be configured to accumulate products. All processing units share the same clock signal. Embodiments of the present invention let each unit accumulate its successive operation results, which can effectively reduce the scale of the AI module.
Description
Technical field
The present invention relates to the technical field of integrated circuits, and in particular to an artificial intelligence (AI) module whose processing units have a local accumulation function and to a system on chip that includes the AI module.
Background art
A systolic array lets data flow through an array of arithmetic units, which reduces the number of memory accesses, makes the structure more regular and the wiring more uniform, and allows a higher clock frequency. The concept of the systolic array was proposed as early as 1982, and it has recently attracted renewed attention because artificial intelligence chips use this structure as their computational core.
As artificial intelligence research deepens and its applications spread, AI modules that better meet demand are needed.
In addition, an artificial intelligence module is conventionally accessed and controlled by a processor over a bus, and a bus has limited bandwidth, so such an architecture struggles to meet the high bandwidth demand of an AI module.
Summary of the invention
According to a first aspect, an embodiment of the present invention provides a chip circuit that includes an AI module. The AI module includes multiple processing units arranged in a two-dimensional array along a first dimension and a second dimension, and each processing unit can perform multiply-add operations. Each processing unit includes an enable input for receiving an enable signal, and pauses or starts its operation according to the enable signal. Under the action of control signals, a processing unit can accumulate products. All processing units in the two-dimensional array share the same clock signal. The first dimension and the second dimension are perpendicular to each other.
In one embodiment of the first aspect, the processing unit includes a coefficient memory that supplies the coefficient data for its operation. The processing unit includes a multiplier, an adder, a first register (REG1), a second register (REG2) and a multiplexer; a first data input and a first data output in the first dimension; and a second data input and a second data output in the second dimension. First data enters through the first data input, and the multiplier multiplies it by the coefficient data. The multiplexer selects one of the second data from the second data input and the output data of the first register; the adder adds the multiplexer output to the product, and the resulting sum is stored in the first register. Under clock control, the sum can be output through the second data output. The first data is also stored in the second register and, under clock control, can be output through the first data output.
In a second aspect, an embodiment of the present invention provides a system on chip that includes the chip circuit of the first aspect and an FPGA module coupled to the AI module so as to send data to, or receive data from, the AI module.
In an embodiment of the second aspect, the AI module includes a first processing unit, a second processing unit and a third processing unit. The first and second processing units are adjacent along the first dimension, and the first output of the first processing unit is coupled to the first input of the second processing unit. The first and third processing units are adjacent along the second dimension, and the second output of the first processing unit is coupled to the second input of the third processing unit.
In yet another embodiment, the AI module is embedded in the FPGA module and reuses the routing architecture of the FPGA module, so that all data sent to or received from the AI module passes through the reused FPGA routing architecture.
In embodiments, because each processing unit accumulates its successive operation results, the scale of the AI module can be effectively reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of a two-dimensional AI module according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a processing unit;
Fig. 3 is a schematic diagram of the memory MEM in the processing unit of Fig. 2;
Fig. 4 is a schematic diagram of a two-dimensional systolic array processing data;
Fig. 5 is a structural diagram of a system on chip that integrates an FPGA and an AI module;
Fig. 6 is a structural diagram of an FPGA circuit.
Detailed description of the embodiments
To make the technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described in further detail below with reference to the drawings and embodiments.
In the description of the present application, terms such as "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be understood as limiting the present application.
Fig. 1 is a schematic diagram of a two-dimensional AI module according to an embodiment of the present invention. In one example, the AI module is a systolic array, a structure of processing units in which data streams flow synchronously through adjacent units of a two-dimensional array. As shown in Fig. 1, the systolic array includes, for example, 4×4 processing units (PEs). The systolic array has two mutually perpendicular dimensions, a first dimension and a second dimension. Taking a first, a second and a third processing unit as an example: the first and second processing units are arranged in a first direction along the first dimension, and the first output of the first processing unit is coupled to the first input of the second processing unit; the first and third processing units are arranged in a second direction along the second dimension, and the second output of the first processing unit is coupled to the second input of the third processing unit.
Data a of one dimension can be fed along the first dimension, in the first direction, into the successive processing units that share the same second-dimension coordinate, under the same clock. In each processing unit the data is multiplied by the data W of the other dimension (the coefficient) stored in that unit. The products are passed along the second dimension, in the second direction, from one processing unit to the next and are added together. For ease of understanding, the horizontal dimension is taken below as the first dimension with left to right as the first direction, and the vertical dimension as the second dimension with top to bottom as the second direction.
Note that each data line in Fig. 1 can represent either a single-bit signal or an 8-bit (or 16-bit, 32-bit) signal.
Each processing unit has an input for an enable signal EN; it receives EN and starts or pauses its processing accordingly. All processing units in the two-dimensional array share the same clock signal.
Under the action of control signals, a processing unit can accumulate products. The control signals may include the enable signal EN, the select signal of the multiplexer, and so on. Because each unit can accumulate its successive operation results, the scale of the AI module can be effectively reduced.
In one example, the two-dimensional array implements matrix multiplication. In another example, the two-dimensional array implements convolution.
Fig. 2 is a schematic diagram of a processing unit. As shown in Fig. 2, the processing unit includes a multiplier MUL and an adder ADD. Data a enters through the first data input DI and is multiplied in MUL by the coefficient W stored in the coefficient memory MEM. The product is then added in the adder ADD to the data P from the second data input PI, and the resulting sum S is stored in register REG1. On the next clock, the sum S is output through the second data output PO, from where it can enter the PI input of the PE below. The first data input DI and the first data output DO lie along the first direction in the first dimension; the second data input PI and the second data output PO lie along the second direction in the second dimension.
In one example, the processing unit further includes a multiplexer MUX which, according to a control signal, selects either the data P from the second data input PI or the output of REG1 and feeds it to the adder ADD. Under the action of control signals, the processing unit can thus accumulate products. With this internal feedback path, multiply-accumulate can be performed inside a single processing unit, which makes various types of AI operations possible.
The data a is also stored in register REG2 and, under clock control, is output through the first data output DO to the PE on the right.
The clock CK controls the pace of processing in the processing unit.
The enable signal EN starts or pauses processing in the processing unit.
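The behaviour of the processing unit of Fig. 2 can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name `ProcessingElement`, the method `clock` and the flag `sel_reg1` are hypothetical, the registers are assumed to start at 0, and a low EN is assumed to hold the register contents unchanged.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Behavioural sketch of the PE in Fig. 2 (signal names follow the figure)."""
    w: int = 0     # coefficient W held in the coefficient memory MEM
    reg1: int = 0  # REG1: registered sum S, drives the output PO
    reg2: int = 0  # REG2: registered copy of the DI data, drives the output DO

    def clock(self, di: int, pi: int, en: bool = True, sel_reg1: bool = False) -> None:
        """Advance the PE by one edge of CK."""
        if not en:
            return  # assumption: EN low pauses the PE and both registers hold
        addend = self.reg1 if sel_reg1 else pi  # MUX: REG1 feedback or PI input
        self.reg1 = addend + di * self.w        # MUL, then ADD, stored in REG1
        self.reg2 = di                          # data is forwarded along the first dimension

    @property
    def po(self) -> int:
        return self.reg1

    @property
    def do(self) -> int:
        return self.reg2


pe = ProcessingElement(w=3)
pe.clock(di=2, pi=5)                 # PO = 5 + 2*3, DO = 2
first_po, first_do = pe.po, pe.do
pe.clock(di=4, pi=0, sel_reg1=True)  # PO = previous PO + 4*3 (local accumulation)
second_po = pe.po
pe.clock(di=9, pi=9, en=False)       # paused: nothing changes
```

Selecting PI gives the ordinary systolic behaviour, while selecting REG1 turns the same hardware into a local accumulator.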
Fig. 3 is a schematic diagram of the memory MEM in the processing unit of Fig. 2. As shown in Fig. 3, the memory consists of an 8-bit D flip-flop: coefficient data is loaded through the D inputs and then appears on the outputs Q as Q0-Q7. The clock CK sets the timing of the flip-flops, and the enable signal EN determines whether the flip-flops run or are paused.
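As a rough illustration of the coefficient memory of Fig. 3, the sketch below models an 8-bit D register with an enable. The class name `CoefficientRegister` is hypothetical, and two behaviours are assumptions rather than statements from the text: the register holds its value while EN is low, and inputs wider than 8 bits are truncated.

```python
class CoefficientRegister:
    """Sketch of an 8-bit D flip-flop bank with clock enable (assumed semantics)."""

    def __init__(self) -> None:
        self.q = 0  # current value on outputs Q7..Q0

    def clock(self, d: int, en: bool) -> int:
        """One edge of CK: load D when EN is high, otherwise hold."""
        if en:
            self.q = d & 0xFF  # assumption: only the low 8 bits are stored
        return self.q

    def bits(self) -> list:
        """Return [Q0, Q1, ..., Q7]."""
        return [(self.q >> i) & 1 for i in range(8)]


mem = CoefficientRegister()
loaded = mem.clock(0xA5, en=True)   # coefficient is stored
held = mem.clock(0x3C, en=False)    # EN low: the old coefficient is kept
```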
Fig. 4 is a schematic diagram of a two-dimensional systolic array processing data. As shown in Fig. 4, the left column of the 4×4 systolic array contains four processing units, which store the coefficients w11, w12, w13 and w14 respectively. For convenience, each processing unit is referred to below by the index of its coefficient. Assume first that the MUX of every processing unit selects only the PI input.
Data is input from the left. On the first clock, a11 enters unit 11, which produces the product p11=a11*w11. If the value p10 arriving from the processing unit above is not 0, p10 must also be added in.
On the second clock, a11*w11 moves down from unit 11 to unit 12; a21 enters unit 11 and a12 enters unit 12. Unit 11 then produces the product a21*w11 (plus, possibly, the value arriving on p10 at that moment), and unit 12 produces the product a12*w12 and outputs a12*w12+a11*w11.
On the third clock, a21*w11 moves down from unit 11 to unit 12, and a12*w12+a11*w11 moves down from unit 12 to unit 13; a31 enters unit 11, a22 enters unit 12 and a13 enters unit 13. Unit 11 then produces the product a31*w11 (plus, possibly, the value arriving on p10 at that moment); unit 12 produces the product a22*w12 and outputs a22*w12+a21*w11; unit 13 produces the product a13*w13 and outputs a13*w13+a12*w12+a11*w11.
On the fourth clock, a31*w11 moves down from unit 11 to unit 12, a22*w12+a21*w11 moves down from unit 12 to unit 13, and a13*w13+a12*w12+a11*w11 moves down from unit 13 to unit 14; a41 enters unit 11, a32 enters unit 12, a23 enters unit 13 and a14 enters unit 14. Unit 11 then produces the product a41*w11 (plus, possibly, the value arriving on p10 at that moment); unit 12 produces the product a32*w12 and outputs a32*w12+a31*w11; unit 13 produces the product a23*w13 and outputs a23*w13+a22*w12+a21*w11; unit 14 produces the product a14*w14 and outputs a14*w14+a13*w13+a12*w12+a11*w11.
Similarly, the output of unit 24 on the fifth clock is a14*w24+a13*w23+a12*w22+a11*w21; the output of unit 34 on the sixth clock is a14*w34+a13*w33+a12*w32+a11*w31; and the output of unit 44 on the seventh clock is a14*w44+a13*w43+a12*w42+a11*w41.
It can be seen that the outputs of units 14, 24, 34 and 44 on the fourth, fifth, sixth and seventh clocks respectively can be regarded as entries of the matrix product of the matrix A with elements aij and the matrix W with elements wij.
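The clock-by-clock walkthrough of Fig. 4 can be checked with a short simulation. This is a sketch under assumptions, not the patented circuit itself: the function names are hypothetical, every MUX is assumed to select PI, the inputs above the top row (p10 and its counterparts) are taken as 0, and `w[c][r]` is the coefficient of the unit in column `c`, row `r` (unit "cr" in the text, zero-based here). With that indexing, the value collected for input row `i` at the bottom of column `c` is the sum over `j` of `a[i][j] * w[c][j]`.

```python
def simulate_systolic(a, w, n_clocks):
    """Return trace[t][c]: PO of the bottom unit of column c after clock t+1.

    a[i][j] is fed into array row j, delayed by j clocks (the skew in the text).
    """
    cols, rows = len(w), len(w[0])
    reg1 = [[0] * rows for _ in range(cols)]  # REG1 of every unit (partial sums)
    reg2 = [[0] * rows for _ in range(cols)]  # REG2 of every unit (forwarded data)
    trace = []
    for t in range(n_clocks):
        new1 = [[0] * rows for _ in range(cols)]
        new2 = [[0] * rows for _ in range(cols)]
        for c in range(cols):
            for r in range(rows):
                if c == 0:
                    i = t - r  # input row that reaches array row r on this clock
                    di = a[i][r] if 0 <= i < len(a) else 0
                else:
                    di = reg2[c - 1][r]  # DO of the left neighbour
                pi = reg1[c][r - 1] if r > 0 else 0  # PO of the unit above
                new1[c][r] = pi + di * w[c][r]
                new2[c][r] = di
        reg1, reg2 = new1, new2  # all registers update on the same clock edge
        trace.append([reg1[c][rows - 1] for c in range(cols)])
    return trace


def systolic_product(a, w):
    """Collect y[i][c] = sum_j a[i][j] * w[c][j] from the bottom row of the array."""
    cols, rows = len(w), len(w[0])
    trace = simulate_systolic(a, w, len(a) + cols + rows - 2)
    return [[trace[i + c + rows - 1][c] for c in range(cols)]
            for i in range(len(a))]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
w = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1], [1, 1, 1, 1]]
y = systolic_product(a, w)
```

In this model the result for the first input row leaves the bottom of columns 1 to 4 after clocks 4, 5, 6 and 7, which matches the timing given in the text.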
If the input data or the coefficient data in the memories is rearranged, for example by replacing aij with a[N-i][M-j], and the matrix product is computed on the rearranged data, the result is a convolution.
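The effect of the index replacement can be seen on a small example. The sketch below is an illustration with hypothetical helper names and zero-based indices (so the flip is `a[N-1-i][M-1-j]`): a multiply-accumulate over the flipped data block equals one output sample of a two-dimensional convolution computed directly from its definition.

```python
def flip2d(m):
    """Reverse a 2-D block along both axes (zero-based form of a[N-i][M-j])."""
    return [row[::-1] for row in m[::-1]]


def mac2d(a, w):
    """Sum of element-wise products, i.e. what a chain of PEs accumulates."""
    return sum(x * y for ra, rw in zip(a, w) for x, y in zip(ra, rw))


def full_conv2d(x, h):
    """Full 2-D convolution straight from the definition."""
    n, m, p, q = len(x), len(x[0]), len(h), len(h[0])
    out = [[0] * (m + q - 1) for _ in range(n + p - 1)]
    for i in range(n):
        for j in range(m):
            for k in range(p):
                for l in range(q):
                    out[i + k][j + l] += x[i][j] * h[k][l]
    return out


a = [[1, 2], [3, 4]]
w = [[5, 6], [7, 8]]
plain = mac2d(a, w)             # sum of products without the flip
flipped = mac2d(flip2d(a), w)   # sum of products after the flip
conv_sample = full_conv2d(a, w)[len(a) - 1][len(a[0]) - 1]
```

Here `flipped` and `conv_sample` agree, while `plain` is the correlation-style sum that the array produces when the data is not rearranged.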
Given the MUX function shown in Fig. 2, in one example the MUX of each processing unit can be configured as follows: during the first N clock cycles the MUX selects only the output of REG1, and in cycle N+1 the MUX selects only the PI input. The processing unit thus accumulates the operation results of the first N cycles and then, in the following cycle, outputs the accumulated result into the AI module. In this way the scale of the AI module can be effectively reduced.
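The MUX schedule described above can be sketched for a single processing unit. The function `run_pe` and its parameters are hypothetical, REG1 is assumed to be cleared before the first cycle, and the coefficient W is assumed to stay fixed while the unit accumulates.

```python
def run_pe(data, w, pi_values, n_accumulate):
    """Return the PO value after each cycle for one PE.

    For the first n_accumulate cycles the MUX feeds REG1 back into the adder;
    afterwards it selects the PI input again.
    """
    reg1 = 0  # assumption: REG1 starts cleared
    po_trace = []
    for cycle, (di, pi) in enumerate(zip(data, pi_values)):
        sel_reg1 = cycle < n_accumulate
        addend = reg1 if sel_reg1 else pi  # MUX
        reg1 = addend + di * w             # MUL + ADD, stored in REG1
        po_trace.append(reg1)
    return po_trace


# Three accumulation cycles, then one cycle in which PI is selected again.
po = run_pe(data=[1, 2, 3, 4], w=2, pi_values=[0, 0, 0, 100], n_accumulate=3)
accumulated = po[2]  # sum of the first three products, visible on PO in cycle 4
```

One unit time-multiplexed over N cycles thus stands in for a chain of N units that would each contribute a single product, which is one way to read the claimed reduction in the scale of the AI module.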
Fig. 5 is a structural diagram of a system on chip that integrates an FPGA and an AI module. As shown in Fig. 5, at least one FPGA circuit and at least one AI module are integrated on the system on chip. Each of the AI modules can be the AI module shown in Fig. 1.
Each FPGA circuit can implement various functions such as logic, computation and control. An FPGA implements combinational logic with small lookup tables (for example, 16×1 RAM). Each lookup table is connected to the input of a D flip-flop, and the flip-flop in turn drives other logic circuits or drives I/O, forming a basic logic cell that can implement both combinational and sequential logic functions. These cells are interconnected, or connected to I/O modules, by metal wires. The logic of an FPGA is realized by loading programming data into internal static memory cells: the values stored in the memory cells determine the logic function of each logic cell and the way the modules are connected to each other or to I/O, and ultimately determine the function the FPGA implements.
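How a 16×1 RAM acts as programmable logic can be shown with a minimal model of one logic cell. The class `LogicCell` and its methods are hypothetical names; the sketch assumes a 4-input lookup table whose inputs form the RAM address, followed by a D flip-flop.

```python
class LogicCell:
    """Sketch of a basic FPGA logic cell: 16x1 lookup table plus D flip-flop."""

    def __init__(self, truth_table):
        if len(truth_table) != 16:
            raise ValueError("a 4-input lookup table needs 16 entries")
        self.ram = list(truth_table)  # programming data loaded into the static cells
        self.q = 0                    # flip-flop output

    def lut(self, a, b, c, d):
        """Combinational output: the four inputs address the RAM."""
        return self.ram[a | (b << 1) | (c << 2) | (d << 3)]

    def clock(self, a, b, c, d):
        """Sequential output: register the lookup-table result on a clock edge."""
        self.q = self.lut(a, b, c, d)
        return self.q


# The same cell becomes a different function purely through its stored data.
xor4 = LogicCell([bin(addr).count("1") % 2 for addr in range(16)])
and4 = LogicCell([0] * 15 + [1])
```

Reprogramming the cell means rewriting the 16 stored bits; the wiring of the cell itself does not change.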
The system on chip also provides an interface module corresponding to the two-dimensional convolution array; the FPGA module and the AI module are connected through this interface module. The interface module can be an XBAR module, composed for example of multiple multiplexers and select bits. The interface module can also be a FIFO (first in, first out) buffer, or a synchronizer, formed for example by two flip-flops (FFs) connected in series. The FPGA module can transfer data to the AI module and provide control for it.
The FPGA module and the AI module can be placed side by side, in which case the FPGA module can transfer data to the AI module and provide control. Alternatively, the AI module can be embedded in the FPGA module, in which case the AI module reuses the routing architecture of the FPGA module and sends and receives data through it.
Fig. 6 is a structural diagram of the FPGA circuit. As shown in Fig. 6, the FPGA circuit may include multiple programmable logic blocks (LOGIC), embedded memory blocks (EMB), multiply-accumulators (MAC) and the corresponding routing (XBAR). The FPGA circuit is also provided with related resources such as clock and configuration modules (trunk spine and branch seam). Where an EMB or MAC module is needed, it replaces several PLB modules, because its area is much larger than that of a PLB.
The routing resource XBAR is the junction through which the modules are interconnected, and XBAR units are distributed evenly across the FPGA module. All resources in the FPGA module (PLB, EMB, MAC and I/O) are routed to one another through the same interface, the XBAR routing unit. From the routing point of view the whole array is uniform: the regularly arranged XBAR units form a grid that connects all the modules in the FPGA.
The LOGIC module may include, for example, eight 6-input lookup tables and 18 registers. The EMB module may be, for example, one 36-kbit memory or two 18-kbit memories. The MAC module may be, for example, one 25×18 multiplier or two 18×18 multipliers. The proportions of LOGIC, MAC and EMB modules in the FPGA array are not restricted, and the size of the array is likewise determined at design time by the needs of the application.
The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A chip circuit comprising an artificial intelligence (AI) module, the AI module comprising: multiple processing units (PE) arranged in a two-dimensional array along a first dimension and a second dimension, each processing unit being able to perform multiply-add operations; wherein each processing unit comprises an enable input for receiving an enable signal and pauses or starts its operation according to the enable signal; under the action of control signals, the processing unit can accumulate products; all processing units in the two-dimensional array share the same clock signal for their operation; and the first dimension and the second dimension are perpendicular to each other.
2. The chip circuit according to claim 1, characterized in that the processing unit comprises a coefficient memory for supplying the coefficient data for the operation of the processing unit; the processing unit comprises a multiplier (MUL), an adder (ADD), a first register (REG1), a second register (REG2) and a multiplexer (MUX); a first data input (DI) and a first data output (DO) in the first dimension; and a second data input (PI) and a second data output (PO) in the second dimension; first data is input through the first data input, and the multiplier multiplies the first data by the coefficient data (W); the multiplexer selects one of the second data from the second data input and the output data of the first register, the adder adds the output data of the multiplexer to the product, and the resulting sum is stored in the first register (REG1); under clock control, the sum can be output through the second data output; the first data is also stored in the second register and, under clock control, can be output through the first data output.
3. A system on chip, comprising: the chip circuit according to any one of claims 1-2; and
an FPGA module coupled to the AI module so as to send data to, or receive data from, the AI module.
4. The system on chip according to claim 3, characterized in that the AI module comprises a first processing unit, a second processing unit and a third processing unit; wherein the first processing unit and the second processing unit are adjacent along the first dimension, and the first output of the first processing unit is coupled to the first input of the second processing unit; and the first processing unit and the third processing unit are adjacent along the second dimension, and the second output of the first processing unit is coupled to the second input of the third processing unit.
5. The system on chip according to claim 3, characterized in that the AI module is embedded in the FPGA module and reuses the routing architecture of the FPGA module, so that all data sent to or received from the AI module passes through the reused routing architecture of the FPGA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910103617.9A CN109919321A (en) | 2019-02-01 | 2019-02-01 | Unit has the artificial intelligence module and System on Chip/SoC of local accumulation function |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109919321A true CN109919321A (en) | 2019-06-21 |
Family
ID=66961362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910103617.9A Pending CN109919321A (en) | 2019-02-01 | 2019-02-01 | Unit has the artificial intelligence module and System on Chip/SoC of local accumulation function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109919321A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190621 |