CN109933371A - Artificial intelligence module whose units can access local memory, and system-on-chip - Google Patents

Artificial intelligence module whose units can access local memory, and system-on-chip

Info

Publication number
CN109933371A
CN109933371A (application CN201910104134.0A)
Authority
CN
China
Prior art keywords
module
data
chip
processing unit
fpga
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910104134.0A
Other languages
Chinese (zh)
Inventor
连荣椿
王海力
马明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jing Wei Qi Li (beijing) Technology Co Ltd
Original Assignee
Jing Wei Qi Li (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jing Wei Qi Li (Beijing) Technology Co Ltd
Priority to CN201910104134.0A
Publication of CN109933371A
Status: Pending

Landscapes

  • Logic Circuits (AREA)

Abstract

A chip circuit and system-on-chip comprising an artificial intelligence (AI) module whose units can access local memory. In an embodiment, the AI module includes multiple processing elements (PE) arranged in a two-dimensional array, each capable of performing a multiply-accumulate operation. Each processing element has an enable input for receiving an enable signal and for pausing or starting the processing element's operation according to that signal. The processing element further includes a memory for storing the data of the multiply-accumulate operation. All processing elements in the two-dimensional array share the same clock signal for their operations. Because it uses local memory, the AI module can efficiently perform repeated operations on large volumes of data.

Description

Artificial intelligence module whose units can access local memory, and system-on-chip
Technical field
The present invention relates to the field of integrated circuit technology, and in particular to a chip circuit and system-on-chip comprising an artificial intelligence (AI) module whose units can access local memory.
Background art
In recent years, artificial intelligence has seen a wave of rapid development. Artificial intelligence is the discipline of making computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning). It covers the principles by which computers realize intelligence and the construction of computers whose intelligence resembles that of the human brain, enabling computers to support higher-level applications.
As research on artificial intelligence deepens and its applications spread, it is necessary to provide AI modules that better meet these demands.
In addition, when a processor accesses and controls an artificial intelligence module over a bus, the bus has limited bandwidth; such an architecture has difficulty meeting the high bandwidth demand of an AI module.
Summary of the invention
According to a first aspect, a chip circuit including an AI module is provided. The AI module includes multiple processing elements arranged in a two-dimensional array, each capable of performing a multiply-accumulate operation. Each processing element includes an enable input for receiving an enable signal and for pausing or starting the processing element's operation according to that signal. The processing element further includes a memory for storing the data of the multiply-accumulate operation. All processing elements in the two-dimensional array share the same clock signal for their operations.
Preferably, the memory includes multiple one-bit D flip-flops connected in series.
Preferably, the memory includes multiple one-bit D flip-flops connected in parallel.
Preferably, the processing element further includes a multiplier, an adder, a first register, and a second register; a first data input and a first data output in the first dimension; and a second data input and a second data output in the second dimension. First data is input from the first data input port and multiplied by the coefficient data in the multiplier. Second data is input from the second data input; the adder adds the second data to the product, and the resulting sum is stored in the first register. Under clock control, the sum can be output through the second data output. The first data is also stored in the second register and, under clock control, can be output through the first data output.
According to a second aspect, a system-on-chip is provided, comprising: the chip circuit as described in the first aspect; and an FPGA module coupled to the AI module to send data to or receive data from the AI module.
Preferably, the AI module is embedded in the FPGA module and reuses the routing fabric of the FPGA module, so that all data sent from or received by the AI module passes through the reused FPGA routing fabric.
Preferably, the system-on-chip includes a first device for writing weight coefficients into the processing elements of the AI module. Preferably, the first device is implemented by the FPGA module.
Preferably, the system-on-chip includes a second device for reading weight coefficients from the processing elements of the AI module. Preferably, the second device is implemented by the FPGA module.
Because it uses local memory, the AI module can efficiently perform repeated operations on large volumes of data.
Brief description of the drawings
Fig. 1 is a schematic diagram of an AI module according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a processing element;
Fig. 3 is a schematic diagram of the memory MEM in the processing element of Fig. 2;
Fig. 4 is another schematic diagram of the memory MEM in the processing element of Fig. 2;
Fig. 5 is a structural schematic diagram of a system-on-chip integrating an FPGA and an AI module;
Fig. 6 is a structural schematic diagram of an FPGA circuit;
Fig. 7 is a schematic diagram of writing weights into and reading weights from each processing element of the AI module.
Detailed description of embodiments
To make the technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described in further detail below with reference to the drawings and embodiments.
In the description of the present application, terms such as "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. They are used only for convenience and brevity of description; they do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and they should therefore not be construed as limiting the present application.
Fig. 1 is a schematic diagram of an AI module according to an embodiment of the present invention. In one example, the AI module is a systolic array, that is, a synchronous data-flow processor structure in which data streams through adjacent cells of a two-dimensional array. As shown in Fig. 1, the AI module includes, for example, 4x4 processing elements (PE). The AI module can be viewed along two dimensions, horizontal and vertical. For convenience, the horizontal dimension is taken as the first dimension and the vertical dimension as the second dimension; left to right is the first direction, and top to bottom is the second direction. Taking a first processing element, a second processing element, and a third processing element as an example: the first and second processing elements are arranged adjacently along the first dimension, and the first output of the first processing element is coupled to the first input of the second processing element; the first and third processing elements are arranged along the second dimension, and the second output of the first processing element is coupled to the second input of the third processing element.
One-dimensional data a can be input sequentially, under the same clock and along the first direction, into the processing elements that share the same second-dimension position along the first dimension. In each processing element the data is multiplied by the other-dimension data (the coefficient) W stored in that element; the products are passed along the second dimension, in the second direction, through the processing elements that share the same first-dimension position, and are accumulated with one another.
All processing elements in the two-dimensional array share the same clock signal for their operations.
Note that each data line in Fig. 1 may represent either a single-bit signal or an 8-bit (or 16-bit, 32-bit) signal.
The AI module can access the local memory of a processing element through a dedicated path.
In one example, the two-dimensional array can implement matrix multiplication. In another example, the two-dimensional array can implement a convolution operation.
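For illustration only, the following Python sketch models the behavior described above for the matrix-multiplication case: each PE(i, j) keeps its coefficient W[i][j] in local memory, input data flows to the right, partial sums flow downward, and all PEs step on one shared clock. The function name, the skewed input schedule, and the weight-stationary mapping are assumptions made for this sketch and are not taken from the specification.

def systolic_matmul(X, W):
    """Cycle-stepped model of a weight-stationary systolic array computing Y = X @ W."""
    M, N, K = len(X), len(W), len(W[0])
    a_reg = [[0.0] * K for _ in range(N)]   # per-PE data register (passed to the right)
    s_reg = [[0.0] * K for _ in range(N)]   # per-PE partial-sum register (passed downward)
    Y = [[0.0] * K for _ in range(M)]
    for t in range(M + N + K):              # enough clock ticks to fill and drain the array
        a_prev = [row[:] for row in a_reg]  # snapshot: every PE updates on the same clock edge
        s_prev = [row[:] for row in s_reg]
        for i in range(N):                  # array row (first dimension)
            for j in range(K):              # array column (second dimension)
                if j == 0:                  # left edge: skewed input stream, row i lags by i ticks
                    m = t - i
                    a_in = X[m][i] if 0 <= m < M else 0.0
                else:                       # otherwise take the left neighbour's registered data
                    a_in = a_prev[i][j - 1]
                s_in = 0.0 if i == 0 else s_prev[i - 1][j]
                s_reg[i][j] = s_in + a_in * W[i][j]   # multiply-accumulate in this PE
                a_reg[i][j] = a_in                    # data moves one PE to the right next tick
        for j in range(K):                  # bottom edge: a finished column sum leaves the array
            m = t - (N - 1) - j
            if 0 <= m < M:
                Y[m][j] = s_reg[N - 1][j]
    return Y

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19.0, 22.0], [43.0, 50.0]]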
Fig. 2 is a schematic diagram of a processing element. As shown in Fig. 2, the processing element includes a multiplier MUL and an adder ADD. Data is input from the first data input port DI and multiplied in MUL by the coefficient W stored in the coefficient memory MEM; the product is then added in the adder ADD to the data P from the second data input port PI, and the resulting sum is stored in the register REG1. Under clock control, the sum S is output through the second output PO. After being output at PO, the sum S can be input through the input port PI of another processing element below. The first data input DI and the first data output DO are arranged along the first direction, in the first dimension; the second data input PI and the second data output PO are arranged along the second direction, in the second dimension.
Likewise, the data a can be stored in the register REG2 and, under clock control, output through the first output DO to the processing element PE on the right.
The clock CK controls the processing sequence of the processing element; the array's processing is synchronized by the clock.
The enable signal EN starts or pauses the processing of the processing element. Under the control of the enable signal EN, all processing elements pause at the same time, or start processing at the same time.
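As an illustration of the clocked behavior just described, the following is a minimal Python sketch of a single processing element. The class and method names are chosen for this sketch; only the signal names (DI, PI, W, REG1, REG2, DO, PO, EN, CK) mirror the figure labels.

class ProcessingElement:
    """Sketch of the Fig. 2 processing element: DI * W is added to PI, the sum is
    latched into REG1 (driving PO), DI itself is latched into REG2 (driving DO),
    and EN gates whether the registers update on a clock tick."""

    def __init__(self, weight):
        self.w = weight      # coefficient W held in the local memory MEM
        self.reg1 = 0        # REG1: accumulated sum, visible on PO
        self.reg2 = 0        # REG2: pass-through data, visible on DO

    def clock(self, di, pi, en=True):
        """One CK tick. When EN is low, both registers hold their previous values."""
        if en:
            self.reg1 = pi + di * self.w   # ADD(MUL(DI, W), PI) -> REG1
            self.reg2 = di                 # DI -> REG2
        return self.reg2, self.reg1        # (DO, PO) after the tick

pe = ProcessingElement(weight=3)
print(pe.clock(di=2, pi=10))               # (2, 16)
print(pe.clock(di=5, pi=0, en=False))      # (2, 16): paused, registers hold their values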
Fig. 3 is a schematic diagram of the memory MEM in the processing element of Fig. 2. This memory is accessed bit by bit. As shown in Fig. 3, the memory consists of eight one-bit D flip-flops; the coefficient data enters the flip-flops as a bit stream through the D input and is then presented on the outputs Q0-Q7, which provide the coefficient data for the operation. The clock CK controls the timing of the flip-flops. The enable signal EN determines whether the D flip-flops run or pause.
Fig. 4 is a schematic diagram of the memory MEM in the processing element of Fig. 2. This memory is accessed in word mode. As shown in Fig. 4, the memory consists of eight one-bit D flip-flops D1-D8; the coefficient data enters the flip-flops through the D inputs and is then presented on the outputs Q0-Q7. The clock CK controls the timing of the flip-flops. The enable signal EN determines whether the D flip-flops run or pause. Note that the CK clock of the memory MEM is separate from the CK clock of the processing element.
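For illustration, the two memory organizations can be sketched in Python as follows. The class names, the shift direction, and the MSB-first bit order of the serial example are assumptions for the sketch only; the specification states only that one variant is loaded bit by bit through chained D flip-flops (Fig. 3) and the other word by word through parallel D flip-flops (Fig. 4).

class SerialCoefficientMemory:
    """Fig. 3 style: eight D flip-flops in series form a shift register; one bit of
    the coefficient is shifted in per CK tick while EN is high, and Q0..Q7 expose
    all eight stored bits to the multiplier."""

    def __init__(self):
        self.q = [0] * 8                    # Q0..Q7

    def clock(self, d, en=True):
        if en:                              # EN low: the flip-flops hold their contents
            self.q = [d] + self.q[:-1]      # new bit enters at Q0, the rest shift along
        return self.q

    def value(self):
        return sum(bit << i for i, bit in enumerate(self.q))   # Q0 taken as the LSB


class WordCoefficientMemory:
    """Fig. 4 style: eight parallel D flip-flops D1..D8 capture the whole coefficient
    word in a single CK tick while EN is high."""

    def __init__(self):
        self.q = [0] * 8

    def clock(self, d_bits, en=True):
        if en:
            self.q = list(d_bits)
        return self.q


mem = SerialCoefficientMemory()
for bit in [1, 0, 1, 1, 0, 0, 1, 0]:        # 0b10110010 = 178, MSB shifted in first
    mem.clock(bit)
print(mem.value())                           # 178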
Fig. 5 is a structural schematic diagram of a system-on-chip integrating an FPGA and an AI module. As shown in Fig. 5, at least one FPGA circuit and at least one AI module are integrated on the system-on-chip. The AI module may be the AI module shown in Fig. 1.
Each of the at least one FPGA circuits can implement functions such as logic, computation, and control. An FPGA uses small look-up tables (for example, 16x1 RAM) to implement combinational logic. Each look-up table is connected to the input of a D flip-flop, and the flip-flop in turn drives other logic circuits or drives I/O; together they form a basic logic cell that can implement both combinational and sequential logic functions. These cells are interconnected with one another, or connected to I/O blocks, by metal routing lines. The logic of an FPGA is realized by loading programming data into internal static memory cells; the values stored in these memory cells determine the logic function of each logic cell and the connections between cells, or between cells and I/O, and ultimately determine the functions the FPGA implements.
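For illustration, the basic logic cell just described can be modeled in Python as a look-up table feeding a D flip-flop. The 4-input size follows from the 16x1 RAM mentioned above; the class name and interface are assumptions for this sketch.

class Lut4WithFF:
    """A 16-entry look-up table (the configuration bits play the role of the internal
    static memory cells) whose output can be used directly as combinational logic or
    registered in a D flip-flop for sequential logic."""

    def __init__(self, config_bits):
        assert len(config_bits) == 16       # one truth-table entry per input combination
        self.table = list(config_bits)
        self.ff = 0                         # the D flip-flop fed by the LUT output

    def comb(self, a, b, c, d):
        """Combinational LUT output for the inputs (a, b, c, d)."""
        return self.table[(d << 3) | (c << 2) | (b << 1) | a]

    def clock(self, a, b, c, d):
        """Registered output: latch the LUT output on a clock edge."""
        self.ff = self.comb(a, b, c, d)
        return self.ff


# Configured as a 4-input AND gate: only the entry for inputs 1,1,1,1 is 1.
and4 = Lut4WithFF([0] * 15 + [1])
print(and4.comb(1, 1, 1, 1), and4.comb(1, 0, 1, 1))   # 1 0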
The system-on-chip is further provided with an interface corresponding to the two-dimensional convolution array; the FPGA module and the AI module are connected through this interface module. The interface module may be an XBAR module, which is composed, for example, of multiple multiplexers and selection bits. The interface module may also be a FIFO (first in, first out) buffer. The interface module may also be a synchronizer, formed, for example, by two flip-flops (Flip-Flops, FFs) connected in series. The FPGA module can transfer data to the AI module and provide control.
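As an illustration of the synchronizer option, the following Python sketch models two D flip-flops in series clocked in the receiving domain, so that a signal crossing between the FPGA module and the AI module is sampled through two register stages. The class name is chosen for this sketch.

class TwoStageSynchronizer:
    """Two D flip-flops in series: the asynchronous input is captured by the first
    stage, and only the second stage's output is used by downstream logic."""

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def clock(self, async_in):
        """One tick of the destination clock: shift the input through both stages."""
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoStageSynchronizer()
print([sync.clock(v) for v in [1, 1, 1, 0]])   # [0, 1, 1, 1]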
The FPGA module and the AI module can be placed side by side, in which case the FPGA module transfers data to the AI module and provides control. Alternatively, the AI module can be embedded within the FPGA module, in which case the AI module reuses the routing fabric of the FPGA module and sends and receives data through the reused FPGA routing fabric.
Fig. 6 is a structural schematic diagram of an FPGA circuit. As shown in Fig. 6, the FPGA circuit may include blocks such as multiple programmable logic blocks (LOGIC/PLB), embedded memory blocks (EMB), multiply-accumulators (MAC), and the corresponding routing (XBAR). Of course, the FPGA circuit is also provided with related resources such as clock and configuration modules (trunk spine/branches). When an EMB or MAC block is needed, because its area is much larger than that of a PLB, the EMB/MAC block takes the place of several PLB blocks.
The routing resource XBAR is the junction through which the blocks are interconnected and is evenly distributed across the FPGA module. The routing among all resources in the FPGA module (PLB, EMB, MAC, and IO) is realized through an identical interface, namely the XBAR unit. From the point of view of the routing scheme, the entire array is uniform: the regularly arranged XBAR units form a mesh that connects all of the blocks.
A LOGIC block may include, for example, eight 6-input look-up tables and 18 registers. An EMB block may be, for example, a 36-kbit memory or two 18-kbit memories. A MAC block may be, for example, one 25x18 multiplier or two 18x18 multipliers. In the FPGA array there is no restriction on the relative numbers of LOGIC, MAC, and EMB blocks, and the size of the array is likewise determined at design time according to the practical application.
Fig. 7 is a schematic diagram of writing weights into and reading weights from each processing element of the AI module. When weights need to be written, the first device on the left can be used to write the weight coefficients into the AI module. When weights need to be read from the AI module, the weight data in the AI module can be read out sequentially and written into the second device. The first device and the second device can be implemented with sub-modules of the FPGA module.
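For illustration, and reusing the ProcessingElement sketch given earlier, the weight write and read paths of Fig. 7 can be modeled as below. The helper names and the row-by-row order are assumptions of this sketch; the specification states only that a first device writes the weight coefficients into the AI module and a second device reads them out sequentially.

def write_weights(pes, weights):
    """First device: load each processing element's local coefficient memory."""
    for i, row in enumerate(pes):
        for j, pe in enumerate(row):
            pe.w = weights[i][j]


def read_weights(pes):
    """Second device: read the stored coefficients back out, element by element."""
    return [[pe.w for pe in row] for row in pes]


pes = [[ProcessingElement(0) for _ in range(4)] for _ in range(4)]
W = [[r * 4 + c for c in range(4)] for r in range(4)]
write_weights(pes, W)
assert read_weights(pes) == W                # the read-back weights match what was written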
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A chip circuit of an artificial intelligence (AI) module, the AI module comprising: multiple processing elements (PE) arranged in a two-dimensional array, each processing element capable of performing a multiply-accumulate operation; wherein each processing element includes an enable input for receiving an enable signal and for pausing or starting the operation of the processing element according to the enable signal; the processing element further includes a memory for storing the data of the multiply-accumulate operation; and wherein all processing elements in the two-dimensional array share the same clock signal for their operations.
2. The chip circuit according to claim 1, wherein the memory comprises multiple one-bit D flip-flops connected in series.
3. The chip circuit according to claim 1, wherein the memory comprises multiple one-bit D flip-flops connected in parallel.
4. The chip circuit according to claim 1, wherein the processing element further comprises a multiplier (MUL), an adder (ADD), a first register (REG1), and a second register (REG2); a first data input (DI) and a first data output (DO) in the first dimension; and a second data input (PI) and a second data output (PO) in the second dimension; first data is input from the first data input port and multiplied in the multiplier by the coefficient data (W); second data is input from the second data input, the adder adds the second data to the product, and the resulting sum is stored in the first register (REG1); under clock control, the sum can be output through the second data output; the first data is also stored in the second register and, under clock control, can be output through the first output.
5. A system-on-chip, comprising: the chip circuit according to any one of claims 1-4; and an FPGA module coupled to the AI module to send data to or receive data from the AI module.
6. The system-on-chip according to claim 5, wherein the AI module is embedded in the FPGA module and reuses the routing fabric of the FPGA module, so that all data sent from or received by the AI module passes through the reused FPGA routing fabric.
7. The system-on-chip according to claim 5, comprising a first device for writing weight coefficients into the processing elements of the AI module.
8. The system-on-chip according to claim 5, comprising a second device for reading weight coefficients from the processing elements of the AI module.
9. The system-on-chip according to claim 5, wherein the first device is implemented by the FPGA module.
10. The system-on-chip according to claim 5, wherein the second device is implemented by the FPGA module.
CN201910104134.0A 2019-02-01 2019-02-01 Artificial intelligence module whose units can access local memory, and system-on-chip Pending CN109933371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910104134.0A 2019-02-01 2019-02-01 Artificial intelligence module whose units can access local memory, and system-on-chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910104134.0A 2019-02-01 2019-02-01 Artificial intelligence module whose units can access local memory, and system-on-chip

Publications (1)

Publication Number Publication Date
CN109933371A (en) 2019-06-25

Family

ID=66985454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910104134.0A Pending CN109933371A (en) Artificial intelligence module whose units can access local memory, and system-on-chip

Country Status (1)

Country Link
CN (1) CN109933371A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5091864A (en) * 1988-12-23 1992-02-25 Hitachi, Ltd. Systolic processor elements for a neural network
EP3091685A1 (en) * 2015-05-06 2016-11-09 Dr. Johannes Heidenhain GmbH Device and method for processing of serial data frames
US20160342892A1 (en) * 2015-05-21 2016-11-24 Google Inc. Prefetching weights for use in a neural network processor
US20180046907A1 (en) * 2015-05-21 2018-02-15 Google Inc. Neural Network Processor
US20190026078A1 (en) * 2017-07-24 2019-01-24 Tesla, Inc. Accelerated mathematical engine
CN107578098A (en) * 2017-09-01 2018-01-12 中国科学院计算技术研究所 Neural network processor based on systolic arrays

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Jian et al., "Research on the implementation of high-speed floating-point FFT based on FPGA", Microcomputer & Its Applications, no. 14, 25 July 2012 (2012-07-25) *
Chen Xinbo et al., "Modern Digital Circuit Design Based on FPGA", Xidian University Press, pages 135-139 *

Similar Documents

Publication Publication Date Title
US7418579B2 (en) Component with a dynamically reconfigurable architecture
EP0132926B1 (en) Parallel processor
CN107852379A Directional two-dimensional router and interconnection network for field programmable gate arrays, and other circuits and applications for the router and network
JPH05505268A (en) Neural network with daisy chain control
CN112686379B (en) Integrated circuit device, electronic apparatus, board and computing method
US9292640B1 (en) Method and system for dynamic selection of a memory read port
CN109902063A A system-on-chip integrated with a two-dimensional convolution array
US20050267729A1 (en) Extensible memory architecture and communication protocol for supporting multiple devices in low-bandwidth, asynchronous applications
CN110059797A A computing device and related product
CN109902835A Artificial intelligence module whose processing units are provided with a general-purpose algorithm unit, and system-on-chip
CN109857024A Unit performance test method of an artificial intelligence module, and system-on-chip
KR100840030B1 (en) Programmable logic circuit
CN104598404B Computing device extension method and apparatus, and expandable computing system
CN109902040A A system-on-chip integrating an FPGA and an artificial intelligence module
CN109886416A System-on-chip integrating an artificial intelligence module, and machine learning method
CN109933371A Artificial intelligence module whose units can access local memory, and system-on-chip
CN109902836A Fault tolerance method of an artificial intelligence module, and system-on-chip
CN109933369B (en) System chip of artificial intelligence module integrated with single instruction multiple data flow architecture
CN109919322A A method of testing an artificial intelligence module on a system-on-chip, and system chip
CN109766293A Circuit for connecting an FPGA and an artificial intelligence module on a chip, and system-on-chip
CN109902795A Artificial intelligence module whose processing units are provided with input/output multiplexers, and system-on-chip
CN109871950A Chip circuit of an artificial intelligence module whose units have a bypass function, and system-on-chip
US9158731B2 (en) Multiprocessor arrangement having shared memory, and a method of communication between processors in a multiprocessor arrangement
CN109919323A Artificial intelligence module whose edge cells have a local accumulation function, and system-on-chip
CN101236576B (en) Interconnecting model suitable for heterogeneous reconfigurable processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190625)