CN109933371A - Artificial intelligence module whose units can access local memory, and system-on-chip - Google Patents
Artificial intelligence module whose units can access local memory, and system-on-chip
- Publication number
- CN109933371A (application CN201910104134.0A)
- Authority
- CN
- China
- Prior art keywords
- module
- data
- chip
- processing unit
- fpga
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013473 artificial intelligence Methods 0.000 title claims abstract description 10
- 238000012545 processing Methods 0.000 claims abstract description 42
- 238000004804 winding Methods 0.000 claims description 6
- 230000008878 coupling Effects 0.000 claims 1
- 238000010168 coupling process Methods 0.000 claims 1
- 238000005859 coupling reaction Methods 0.000 claims 1
- 230000003252 repetitive effect Effects 0.000 abstract description 2
- 238000010586 diagram Methods 0.000 description 14
- 230000006870 function Effects 0.000 description 5
- 238000000034 method Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 2
- 230000033764 rhythmic process Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000005611 electricity Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Landscapes
- Logic Circuits (AREA)
Abstract
A chip circuit and a system-on-chip with an artificial intelligence (AI) module whose processing units can access local memory. In an embodiment, the AI module includes multiple processing units (PE) arranged in a two-dimensional array, each capable of performing a multiply-accumulate operation. Each processing unit has an enable input for receiving an enable signal and for pausing or starting the operation of the processing unit according to that signal. The processing unit further includes a memory for storing the data of the multiply-accumulate operation. All processing units in the two-dimensional array share the same clock signal for their operations. Because it uses this memory, the AI module can efficiently carry out repetitive operations on large volumes of data.
Description
Technical field
The present invention relates to the field of integrated circuit technology, and in particular to a chip circuit and a system-on-chip with an artificial intelligence (AI) module whose processing units can access local memory.
Background art
In recent years, artificial intelligence has entered a new wave of development. Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning). It covers both the basic principles by which computers realize intelligence and the construction of computers whose intelligence resembles that of the human brain, enabling computers to support higher-level applications.
As research on artificial intelligence deepens and its applications become widespread, AI modules that better meet these demands need to be introduced.
In addition, an artificial intelligence module is conventionally accessed and controlled by a processor over a bus, and a bus has limited bandwidth. Such an architecture can hardly satisfy the high-bandwidth requirements of an artificial intelligence (AI) module.
Summary of the invention
According to a first aspect, a chip circuit including an AI module is provided. The AI module includes multiple processing units arranged in a two-dimensional array, each processing unit being capable of performing a multiply-accumulate operation. The processing unit includes an enable input for receiving an enable signal and for pausing or starting the operation of the processing unit according to the enable signal. The processing unit further includes a memory for storing the data of the multiply-accumulate operation. All processing units in the two-dimensional array share the same clock signal for their operations.
Preferably, the memory includes multiple one-bit D flip-flops connected in series.
Preferably, the memory includes multiple one-bit D flip-flops connected in parallel.
Preferably, the processing unit further includes a multiplier, an adder, a first register, and a second register; a first data input and a first data output in a first dimension; and a second data input and a second data output in a second dimension. First data are input through the first data input and multiplied by coefficient data in the multiplier. Second data are input through the second data input; the adder adds the second data to the product, and the resulting sum is stored in the first register. Under clock control, the sum can be output through the second data output. The first data are also stored in the second register and, under clock control, can be output through the first data output.
According to a second aspect, a system-on-chip is provided, comprising: the chip circuit of the first aspect; and an FPGA module coupled with the AI module to send data to, or receive data from, the AI module.
Preferably, the AI module is embedded in the FPGA module and reuses the routing fabric of the FPGA module, so that data sent from or received by the AI module all pass through the reused FPGA routing fabric.
Preferably, a first device is included for writing weight coefficients into the processing units of the AI module. Preferably, the first device is implemented by the FPGA module.
Preferably, a second device is included for reading weight coefficients from the processing units of the AI module. Preferably, the second device is implemented by the FPGA module.
Because its processing units use local memory, the AI module can efficiently carry out repetitive operations on large volumes of data.
Brief description of the drawings
Fig. 1 is a schematic diagram of an AI module according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a processing unit;
Fig. 3 is a schematic diagram of a bit-serial implementation of the memory MEM in the processing unit of Fig. 2;
Fig. 4 is a schematic diagram of a word-access implementation of the memory MEM in the processing unit of Fig. 2;
Fig. 5 is a structural diagram of a system-on-chip integrating an FPGA and an AI module;
Fig. 6 is a structural diagram of an FPGA circuit;
Fig. 7 is a schematic diagram of writing weights to and reading weights from the processing units of the AI module.
Detailed description of embodiments
To make the technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described in further detail below with reference to the drawings and embodiments.
In the description of the present application, terms indicating an orientation or positional relationship, such as "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer", are based on the orientations or positional relationships shown in the drawings. They are used only to simplify the description of the application; they do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and they should therefore not be understood as limiting the application.
Fig. 1 is a schematic diagram of an AI module according to an embodiment of the present invention. In one example, the AI module is a systolic array, that is, a processor structure in which data streams flow synchronously through adjacent units of a two-dimensional array. As shown in Fig. 1, the AI module includes, for example, a 4x4 array of processing units PE. The AI module can be divided along two dimensions, horizontal and vertical. For convenience, the horizontal dimension is taken as the first dimension and the vertical dimension as the second dimension; left to right is the first direction, and top to bottom is the second direction. Taking a first processing unit, a second processing unit, and a third processing unit as an example: the first processing unit and the second processing unit are arranged adjacently along the first dimension, and the first output of the first processing unit is coupled to the first input of the second processing unit; the first processing unit and the third processing unit are arranged along the second dimension, and the second output of the first processing unit is coupled to the second input of the third processing unit. One-dimensional data a are input sequentially, under the same clock, along the first direction into the processing units that share the same second-dimension value. In each processing unit, the data are multiplied by the data of another dimension, the coefficient W stored in that unit; the products propagate along the second dimension, in the second direction, through the processing units that share the same first-dimension value and are accumulated with one another.
All processing units in the two-dimensional array share the same clock signal for their operations.
Note that each data line in Fig. 1 can represent either a single-bit signal or an 8-bit (or 16-bit, 32-bit) signal.
The AI module can access the local memory of each processing unit through a dedicated path.
In one example, the two-dimensional array implements matrix multiplication. In another example, the two-dimensional array implements a convolution operation.
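Purely as an illustration of the data flow described above (it is not part of the patent disclosure), the following Python sketch models a weight-stationary systolic array computing a matrix product. The function name systolic_matmul, the skewed input feed, and the register naming are assumptions made for this example.

```python
# Behavioural sketch of a weight-stationary systolic array: every cell keeps its
# coefficient W[r][c] in local storage, data a flow left-to-right, partial sums
# flow top-to-bottom, and all cells advance on the same clock tick.

def systolic_matmul(A, W):
    """Compute A @ W with a k x n array of multiply-accumulate cells."""
    m, k, n = len(A), len(W), len(W[0])
    a_reg = [[0] * n for _ in range(k)]   # per-cell data register (REG2)
    s_reg = [[0] * n for _ in range(k)]   # per-cell sum register  (REG1)
    out = [[0] * n for _ in range(m)]

    for t in range(m + k + n):            # one loop iteration per clock tick
        # Results leave the bottom edge of the array, skewed by one tick per column.
        for c in range(n):
            i = t - k - c
            if 0 <= i < m:
                out[i][c] = s_reg[k - 1][c]
        # Update cells bottom-right first so that neighbours' previous-tick
        # register values are still visible when a cell reads them.
        for r in reversed(range(k)):
            for c in reversed(range(n)):
                if c > 0:
                    a_in = a_reg[r][c - 1]
                else:                      # left edge: skewed feed of matrix A
                    i = t - r
                    a_in = A[i][r] if 0 <= i < m else 0
                s_in = s_reg[r - 1][c] if r > 0 else 0
                s_reg[r][c] = s_in + a_in * W[r][c]   # MUL then ADD into REG1
                a_reg[r][c] = a_in                    # pass data onward via REG2
    return out

if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    W = [[7, 8, 9], [10, 11, 12]]
    print(systolic_matmul(A, W))  # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
```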
Fig. 2 is a schematic diagram of a processing unit. As shown in Fig. 2, the processing unit includes a multiplier MUL and an adder ADD. Data are input through the first data input port DI and multiplied in MUL by the coefficient W stored in the coefficient memory MEM. The product is then added in the adder ADD to the data P coming from the second data input port PI, and the resulting sum is stored in register REG1. Under clock control, the sum S is output through the second output PO. After being output through PO, the sum S can be fed into the input port PI of the PE below. The first data input DI and the first data output DO are distributed along the first direction, in the first dimension; the second data input PI and the second data output PO are distributed along the second direction, in the second dimension.
The data a are also stored in register REG2 and, under clock control, output through the first output DO to the processing unit PE on the right.
The clock CK controls the processing of the processing unit; the processing of the array is synchronized by this clock.
The enable signal EN starts or pauses the processing of the processing unit. Under the control of the enable signal EN, all processing units pause simultaneously or start processing simultaneously.
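As a minimal illustration (the class and method names are not taken from the patent), the following Python sketch models a single processing element with its multiplier, adder, registers REG1 and REG2, clock tick, and enable input:

```python
# Behavioural model of one processing element (illustrative sketch only).
class ProcessingElement:
    def __init__(self, weight):
        self.w = weight      # coefficient W held in the local memory MEM
        self.reg1 = 0        # REG1: accumulated sum S, driven onto PO
        self.reg2 = 0        # REG2: pass-through copy of data a, driven onto DO
        self.enabled = True  # state of the enable input EN

    def set_enable(self, en):
        self.enabled = en

    def tick(self, di, pi):
        """One clock edge: consume data DI and partial sum PI, return (DO, PO)."""
        if self.enabled:                  # when EN is low the registers hold their values
            self.reg1 = pi + di * self.w  # MUL then ADD
            self.reg2 = di                # data forwarded unchanged
        return self.reg2, self.reg1

# Example: accumulate 3 * 5 on top of an incoming partial sum of 2.
pe = ProcessingElement(weight=5)
do, po = pe.tick(di=3, pi=2)
print(do, po)   # 3 17
```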
Fig. 3 is a schematic diagram of one implementation of the memory MEM in the processing unit of Fig. 2. This memory is accessed bit-serially. As shown in Fig. 3, the memory consists of eight one-bit D flip-flops connected in series; the coefficient data enter the flip-flop chain as a bit stream through the D input and appear at the outputs Q0-Q7, which provide the coefficient data for the operation. The clock CK paces the flip-flops, and the enable signal EN determines whether the flip-flops run or pause.
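A minimal Python sketch of such a bit-serial register chain follows; the class name and the bit ordering are assumptions made for illustration, not details stated in the patent.

```python
# Sketch of the bit-serial coefficient memory: eight D flip-flops in series,
# clocked together and gated by an enable signal.
class SerialCoefficientMemory:
    def __init__(self, width=8):
        self.q = [0] * width            # outputs Q0..Q7

    def tick(self, d_in, en=True):
        """One clock edge: when EN is high, shift one bit in through D."""
        if en:
            self.q = [d_in] + self.q[:-1]
        return self.q

mem = SerialCoefficientMemory()
for bit in [1, 0, 1, 1, 0, 0, 1, 0]:    # coefficient streamed in as a bit stream
    mem.tick(bit)
print(mem.q)   # [0, 1, 0, 0, 1, 1, 0, 1] -- the first bit written now sits at Q7
```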
Fig. 4 is a schematic diagram of another implementation of the memory MEM in the processing unit of Fig. 2. This memory is accessed word by word. As shown in Fig. 4, the memory consists of eight one-bit D flip-flops D1-D8 connected in parallel; the coefficient data enter through the D inputs and appear at the outputs Q0-Q7. The clock CK paces the flip-flops, and the enable signal EN determines whether the flip-flops run or pause. Note that the CK clock of the memory MEM is distinct from the CK clock of the processing unit.
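For contrast with the bit-serial variant, a similarly minimal sketch of the word-mode memory (again with assumed names) loads all eight bits in parallel on one enabled clock edge:

```python
# Sketch of the word-access coefficient memory: eight D flip-flops loaded in
# parallel on a single enabled edge of the memory's own clock.
class ParallelCoefficientMemory:
    def __init__(self, width=8):
        self.q = [0] * width            # outputs Q0..Q7

    def tick(self, d_bits, en=True):
        """One memory-clock edge: when EN is high, latch the whole word."""
        if en:
            self.q = list(d_bits)
        return self.q

mem = ParallelCoefficientMemory()
mem.tick([1, 0, 1, 1, 0, 0, 1, 0])      # whole coefficient written in one edge
print(mem.q)                             # [1, 0, 1, 1, 0, 0, 1, 0]
```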
Fig. 5 is a structural diagram of a system-on-chip integrating an FPGA and an AI module. As shown in Fig. 5, at least one FPGA circuit and at least one AI module are integrated on the system-on-chip. The AI module can be the AI module shown in Fig. 1. Each FPGA circuit among the at least one FPGA circuit can implement various functions such as logic, computation, and control. The FPGA implements combinational logic with small look-up tables (for example, 16x1 RAMs); each look-up table is connected to the input of a D flip-flop, and the flip-flop in turn drives other logic circuits or drives I/O. This forms a basic logic unit module that can implement both combinational and sequential logic functions, and these modules are interconnected with one another, or connected to the I/O blocks, by metal interconnect lines. The logic of the FPGA is realized by loading programming data into internal static memory cells; the values stored in the memory cells determine the logic functions of the logic units and the connections among the modules and between the modules and the I/O, and ultimately determine the functions that the FPGA realizes.
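As a rough illustration of the basic logic element described above (not taken from the patent), the Python sketch below models a 4-input look-up table implemented as a 16x1 memory feeding a D flip-flop; the stored truth table plays the role of the loaded configuration data.

```python
# Illustrative model of an FPGA basic logic element: a 16x1 LUT plus a D flip-flop.
class LogicElement:
    def __init__(self, truth_table):
        assert len(truth_table) == 16        # configuration bits of a 4-input LUT
        self.lut = list(truth_table)
        self.ff = 0                          # registered (sequential) output

    def combinational(self, a, b, c, d):
        """Pure LUT output: the four inputs form the address into the 16x1 RAM."""
        addr = (a << 3) | (b << 2) | (c << 1) | d
        return self.lut[addr]

    def clock(self, a, b, c, d):
        """Registered output: the flip-flop captures the LUT output on a clock edge."""
        self.ff = self.combinational(a, b, c, d)
        return self.ff

# Configure the LUT as a 4-input AND gate: only address 0b1111 stores a 1.
and4 = LogicElement([0] * 15 + [1])
print(and4.combinational(1, 1, 1, 1), and4.combinational(1, 0, 1, 1))  # 1 0
```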
An interface corresponding to the two-dimensional convolution array is also provided on the system-on-chip, and the FPGA module and the AI module are connected through this interface module. The interface module can be an XBAR module, which consists, for example, of multiple multiplexers and selection bits. The interface module can also be a FIFO (first in, first out buffer). The interface module can also be a synchronizer, which consists, for example, of two flip-flops (FF) connected in series. The FPGA module can transfer data to the AI module and provide control.
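For the synchronizer option, a small behavioural sketch (illustrative only, with assumed names) of two flip-flops in series clocked by the destination domain:

```python
# Model of a two-flip-flop synchronizer: an asynchronous input is sampled through
# two D flip-flops that are both clocked by the destination clock domain.
class TwoFlopSynchronizer:
    def __init__(self):
        self.ff1 = 0   # first stage, samples the possibly changing input
        self.ff2 = 0   # second stage, the synchronized output

    def clock(self, async_in):
        """One destination-clock edge: shift the input through both stages."""
        self.ff2 = self.ff1
        self.ff1 = async_in
        return self.ff2

sync = TwoFlopSynchronizer()
print([sync.clock(s) for s in [0, 1, 1, 1, 0]])
# [0, 0, 1, 1, 1]: a new input value reaches the output on the second clock edge
# after it appears at the input.
```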
The FPGA module and the AI module can be placed side by side, in which case the FPGA module transfers data to the AI module and provides control. Alternatively, the AI module can be embedded within the FPGA module, in which case the AI module reuses the routing fabric of the FPGA module and sends and receives data through that reused routing fabric.
Fig. 6 is a structural diagram of an FPGA circuit. As shown in Fig. 6, the FPGA circuit may include modules such as multiple programmable logic blocks (LOGIC), embedded memory blocks (EMB), multiply-accumulate blocks (MAC), and the corresponding routing resources (XBAR). The FPGA circuit is also provided with related resources such as clock/configuration modules (trunk spine/branch). When an EMB or MAC module is needed, several PLB modules are replaced with one EMB/MAC module, because its area is much larger than that of a PLB.
The routing resource XBAR provides the junctions through which the modules are interconnected, and it is evenly distributed across the FPGA module. The routing among all of the resources in the FPGA module, PLB, EMB, MAC, and IO, is implemented through an identical interface, namely the routing XBAR unit. In terms of routing, the entire array is uniform: the regularly arranged XBAR units form a grid that connects all the modules in the FPGA.
The LOGIC module may include, for example, eight 6-input look-up tables and 18 registers. The EMB module can be, for example, a 36-kbit memory unit or two 18-kbit memory units. The MAC module can be, for example, a 25x18 multiplier or two 18x18 multipliers. In the FPGA array there is no restriction on the relative numbers of LOGIC, MAC, and EMB modules, and the size of the array is likewise determined at design time by the practical application, as needed.
Fig. 7 is a schematic diagram of writing weights to and reading weights from the processing units of the AI module. When weights need to be written, the first device on the left can be used to write the weight coefficients into the AI module. When weights need to be read from the AI module, the weight data in the AI module can be read out sequentially and written into the second device. The first device and the second device can be implemented with sub-modules of the FPGA module.
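A toy Python sketch of this weight load and read-back flow over a 4x4 PE array follows; treating each row as a shift chain fed by the first device and drained by the second device is an assumption made for illustration, not a detail taken from the patent.

```python
# Illustrative sketch of the Fig. 7 flow: the first device shifts weight
# coefficients into each row of the PE array from the left, and the second
# device later shifts the stored coefficients out on the right.
ROWS, COLS = 4, 4

def write_weights(array, weight_streams):
    """First device: shift one coefficient per row into the array each step."""
    for _ in range(COLS):
        for r in range(ROWS):
            incoming = weight_streams[r].pop(0)
            array[r] = [incoming] + array[r][:-1]

def read_weights(array):
    """Second device: shift the stored coefficients back out, row by row."""
    collected = [[] for _ in range(ROWS)]
    for _ in range(COLS):
        for r in range(ROWS):
            collected[r].append(array[r][-1])   # rightmost PE feeds the reader
            array[r] = [0] + array[r][:-1]
    return collected

pe_array = [[0] * COLS for _ in range(ROWS)]
weights = [[10 * r + c for c in range(COLS)] for r in range(ROWS)]
write_weights(pe_array, [row[::-1] for row in weights])  # feed last column first
print(pe_array)              # each PE (r, c) now holds weights[r][c]
print(read_weights(pe_array))
# read-out order is rightmost-first per row:
# [[3, 2, 1, 0], [13, 12, 11, 10], [23, 22, 21, 20], [33, 32, 31, 30]]
```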
The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A chip circuit of an artificial intelligence (AI) module, the AI module comprising multiple processing units (PE) arranged in a two-dimensional array, each processing unit being capable of performing a multiply-accumulate operation; wherein the processing unit comprises an enable input for receiving an enable signal and for pausing or starting the operation of the processing unit according to the enable signal; the processing unit further comprises a memory for storing the data of the multiply-accumulate operation; and wherein all processing units in the two-dimensional array share the same clock signal for their operations.
2. The chip circuit according to claim 1, wherein the memory comprises multiple one-bit D flip-flops connected in series.
3. The chip circuit according to claim 1, wherein the memory comprises multiple one-bit D flip-flops connected in parallel.
4. The chip circuit according to claim 1, wherein the processing unit further comprises a multiplier (MUL), an adder (ADD), a first register (REG1), and a second register (REG2); a first data input (DI) and a first data output (DO) in a first dimension; and a second data input (PI) and a second data output (PO) in a second dimension; first data are input through the first data input and multiplied by coefficient data (W) in the multiplier; second data are input through the second data input, the adder adds the second data to the product, and the resulting sum is stored in the first register (REG1); under clock control, the sum can be output through the second data output; the first data are also stored in the second register and, under clock control, can be output through the first output.
5. A system-on-chip, comprising: the chip circuit according to any one of claims 1-4; and an FPGA module coupled with the AI module to send data to, or receive data from, the AI module.
6. The system-on-chip according to claim 5, wherein the AI module is embedded in the FPGA module and reuses the routing fabric of the FPGA module, so that data sent from or received by the AI module all pass through the reused FPGA routing fabric.
7. The system-on-chip according to claim 5, comprising a first device for writing weight coefficients into the processing units of the AI module.
8. The system-on-chip according to claim 5, comprising a second device for reading weight coefficients from the processing units of the AI module.
9. The system-on-chip according to claim 5, wherein the first device is implemented by the FPGA module.
10. The system-on-chip according to claim 5, wherein the second device is implemented by the FPGA module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910104134.0A CN109933371A (en) | 2019-02-01 | 2019-02-01 | Artificial intelligence module whose units can access local memory, and system-on-chip
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910104134.0A CN109933371A (en) | 2019-02-01 | 2019-02-01 | Artificial intelligence module whose units can access local memory, and system-on-chip
Publications (1)
Publication Number | Publication Date |
---|---|
CN109933371A (en) | 2019-06-25
Family
ID=66985454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910104134.0A Pending CN109933371A (en) | 2019-02-01 | 2019-02-01 | Artificial intelligence module whose units can access local memory, and system-on-chip
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109933371A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5091864A (en) * | 1988-12-23 | 1992-02-25 | Hitachi, Ltd. | Systolic processor elements for a neural network |
EP3091685A1 (en) * | 2015-05-06 | 2016-11-09 | Dr. Johannes Heidenhain GmbH | Device and method for processing of serial data frames |
US20160342892A1 (en) * | 2015-05-21 | 2016-11-24 | Google Inc. | Prefetching weights for use in a neural network processor |
CN107578098A (en) * | 2017-09-01 | 2018-01-12 | 中国科学院计算技术研究所 | Neural network processor based on systolic arrays |
US20180046907A1 (en) * | 2015-05-21 | 2018-02-15 | Google Inc. | Neural Network Processor |
US20190026078A1 (en) * | 2017-07-24 | 2019-01-24 | Tesla, Inc. | Accelerated mathematical engine |
- 2019-02-01 CN CN201910104134.0A patent/CN109933371A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5091864A (en) * | 1988-12-23 | 1992-02-25 | Hitachi, Ltd. | Systolic processor elements for a neural network |
EP3091685A1 (en) * | 2015-05-06 | 2016-11-09 | Dr. Johannes Heidenhain GmbH | Device and method for processing of serial data frames |
US20160342892A1 (en) * | 2015-05-21 | 2016-11-24 | Google Inc. | Prefetching weights for use in a neural network processor |
US20180046907A1 (en) * | 2015-05-21 | 2018-02-15 | Google Inc. | Neural Network Processor |
US20190026078A1 (en) * | 2017-07-24 | 2019-01-24 | Tesla, Inc. | Accelerated mathematical engine |
CN107578098A (en) * | 2017-09-01 | 2018-01-12 | 中国科学院计算技术研究所 | Neural network processor based on systolic arrays |
Non-Patent Citations (2)
Title |
---|
LIU Jian et al.: "Research on the implementation of high-speed floating-point FFT based on FPGA", Microcomputer & Its Applications, no. 14, 25 July 2012 (2012-07-25) *
CHEN Xinbo et al.: "Modern Digital Circuit Design Based on FPGA", Xidian University Press, pages 135-139 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190625