CN110197269A - Integrated circuit chip device and Related product - Google Patents
- Publication number
- CN110197269A (application number CN201810164331.7A / CN201810164331A)
- Authority
- CN
- China
- Prior art keywords
- data block
- circuit
- data
- basic processing
- vertical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Mathematical Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Optimization (AREA)
- Neurology (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Image Processing (AREA)
- Logic Circuits (AREA)
Abstract
The present disclosure provides an integrated circuit chip device and a related product. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits. The main processing circuit includes a first mapping circuit, and at least one of the plurality of basic processing circuits includes a second mapping circuit; the first mapping circuit and the second mapping circuit are configured to perform compression processing on data in a neural network operation. The plurality of basic processing circuits are arranged in an array, each basic processing circuit being connected to its adjacent basic processing circuits; the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column. The technical solution provided by the present disclosure has the advantages of a small amount of computation and low power consumption.
Description
Technical field
The present disclosure relates to the field of neural networks, and more particularly to an integrated circuit chip device and a related product.
Background technique
Artificial neural networks (ANNs) have been a research hotspot in the field of artificial intelligence since the 1980s. An ANN abstracts the neuronal network of the human brain from an information-processing perspective, builds a simple model, and forms different networks according to different connection schemes. In engineering and academia it is often referred to simply as a neural network or a neural-network-like model. A neural network is a computational model composed of a large number of interconnected nodes (or neurons). Existing neural network operations are implemented on a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit); such operations are computationally intensive and consume a large amount of power.
Summary of the invention
Embodiments of the present disclosure provide an integrated circuit chip device and a related product, which can increase the processing speed and efficiency of a computing device.
In a first aspect, an integrated circuit chip device is provided. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits. The main processing circuit includes a first mapping circuit, and at least one (i.e., some or all) of the plurality of basic processing circuits includes a second mapping circuit; the first mapping circuit and the second mapping circuit are configured to perform compression processing on data in a neural network operation.

The plurality of basic processing circuits are arranged in an array, each basic processing circuit being connected to its adjacent basic processing circuits; the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column.

The main processing circuit is configured to obtain an input data block, a convolution kernel data block and a convolution instruction; to divide, according to the convolution instruction, the input data block into a vertical data block and the convolution kernel data block into a horizontal data block; to determine, according to the operation control of the convolution instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the horizontal data block and/or the vertical data block; and to send, according to the convolution instruction, the processed first data block to at least one of the basic processing circuits connected to the main processing circuit.

The plurality of basic processing circuits are configured to determine, according to the operation control of the convolution instruction, whether to start the second mapping circuit to process a second data block; to execute the operations of the neural network in parallel according to the processed second data block to obtain operation results; and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The second data block is a data block that the basic processing circuit determines to have received from the main processing circuit, and is associated with the processed first data block.

The main processing circuit is configured to process the operation results to obtain the instruction result of the convolution instruction.
In a second aspect, a neural network operation device is provided. The neural network operation device includes one or more of the integrated circuit chip devices provided in the first aspect.

In a third aspect, a combined processing device is provided. The combined processing device includes the neural network operation device provided in the second aspect, a universal interconnection interface and a general-purpose processing device; the neural network operation device is connected to the general-purpose processing device through the universal interconnection interface.

In a fourth aspect, a chip is provided. The chip integrates the device of the first aspect, the device of the second aspect or the device of the third aspect.

In a fifth aspect, an electronic apparatus is provided. The electronic apparatus includes the chip of the fourth aspect.

In a sixth aspect, a neural network operation method is provided. The method is applied in an integrated circuit chip device, the integrated circuit chip device being the integrated circuit chip device described in the first aspect, which is configured to execute the operations of a neural network.
It can be seen that, in the embodiments of the present disclosure, a mapping circuit is provided so that data blocks are compressed before the operation is performed, which saves transmission resources and computing resources; the solution therefore has the advantages of low power consumption and a small amount of computation.
Detailed description of the invention
Fig. 1a is a structural schematic diagram of an integrated circuit chip device.
Fig. 1b is a structural schematic diagram of another integrated circuit chip device.
Fig. 1c is a structural schematic diagram of a basic processing circuit.
Fig. 1d is a structural schematic diagram of a main processing circuit.
Fig. 2a is a schematic diagram of a usage method of a basic processing circuit.
Fig. 2b is a schematic diagram of data transmission by a main processing circuit.
Fig. 2c is a schematic diagram of matrix-vector multiplication.
Fig. 2d is a structural schematic diagram of an integrated circuit chip device.
Fig. 2e is a structural schematic diagram of another integrated circuit chip device.
Fig. 2f is a schematic diagram of matrix-matrix multiplication.
Fig. 3a is a schematic diagram of convolution input data.
Fig. 3b is a schematic diagram of a convolution kernel.
Fig. 3c is a schematic diagram of an operation window of a three-dimensional data block of the input data.
Fig. 3d is a schematic diagram of another operation window of a three-dimensional data block of the input data.
Fig. 3e is a schematic diagram of yet another operation window of a three-dimensional data block of the input data.
Fig. 4 is a structural schematic diagram of a neural network chip provided by an embodiment of the present disclosure.
Fig. 5a and Fig. 5b are structural schematic diagrams of two mapping circuits provided by embodiments of the present application.
Specific embodiment
In order to enable those skilled in the art to better understand the solution of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the device provided in the first aspect, the main processing circuit is configured to obtain an input data block, a convolution kernel data block and a convolution instruction; to divide, according to the convolution instruction, the input data block into a vertical data block and the convolution kernel data block into a horizontal data block; to determine, according to the operation control of the convolution instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the horizontal data block and/or the vertical data block; and to send, according to the convolution instruction, the processed first data block to at least one of the basic processing circuits connected to the main processing circuit.

The plurality of basic processing circuits are configured to determine, according to the operation control of the convolution instruction, whether to start the second mapping circuit to process a second data block; to execute the operations of the neural network in parallel according to the processed second data block to obtain operation results; and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The second data block is a data block that the basic processing circuit determines to have received from the main processing circuit, and is associated with the processed first data block.

The main processing circuit is configured to process the operation results to obtain the instruction result of the convolution instruction.
In the device provided in the first aspect, when the first data block includes the horizontal data block and the vertical data block, the main processing circuit is specifically configured to start the first mapping circuit to process the horizontal data block and the vertical data block, obtaining a processed horizontal data block together with an identification data block associated with the horizontal data block, and a processed vertical data block together with an identification data block associated with the vertical data block; to split the processed horizontal data block and its associated identification data block into a plurality of basic data blocks and the identification data blocks respectively associated with the basic data blocks; to distribute the plurality of basic data blocks and their respectively associated identification data blocks to the basic processing circuits connected to it; and to broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it. The identification data block may specifically be represented by direct index or stride index; optionally, it may also be represented by List of Lists (LIL), Coordinate list (COO), Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), ELLPACK (ELL), Hybrid (HYB) or other formats, which is not limited in the present application.
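Of the optional sparse formats listed above, CSR can be sketched as follows. This is a minimal Python illustration and not part of the patent; the function name and the sample matrix values are assumptions chosen for the example.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) into CSR triplets:
    data (non-zero values), indices (their column indices), and indptr
    (where each row's values start/end inside data)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))  # mark the end of this row's values
    return data, indices, indptr

# A 2x3 block with two non-zero entries.
data, indices, indptr = to_csr([[1, 0, 0.5],
                                [0, 0, 0]])
```

Row i of the original block is then recoverable from the slice `data[indptr[i]:indptr[i+1]]` paired with `indices[indptr[i]:indptr[i+1]]`.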
Taking the identification data block represented by direct index as an example, the identification data block may specifically be a data block consisting of 0s and 1s, where 0 indicates that the absolute value of the corresponding datum in the data block (e.g., a weight or an input neuron) is less than or equal to a first threshold, and 1 indicates that the absolute value is greater than the first threshold. The first threshold is set freely by the user side or the device side, for example 0.05 or 0.
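The direct-index identification data block described above amounts to a thresholding step, which can be sketched in Python as follows (a hedged illustration; the function name and sample values are assumptions, not from the patent):

```python
def direct_index_mask(block, first_threshold=0.05):
    """Build the identification data block for a dense 2-D data block:
    1 where |value| > first_threshold, 0 where |value| <= first_threshold."""
    return [[1 if abs(v) > first_threshold else 0 for v in row]
            for row in block]

# Example: a 2x2 block of weights whose small entries are masked out.
mask = direct_index_mask([[0.8, 0.01],
                          [0.0, -0.3]])
```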
To save transmission volume and improve data transfer efficiency, during the process in which the main processing circuit sends data to the basic processing circuits, it may specifically distribute only the target data of the plurality of basic data blocks, together with the identification data blocks respectively associated with them, to the basic processing circuits connected to it; optionally, it may also broadcast only the target data of the processed vertical data block, together with the identification data block associated with the vertical data block, to the basic processing circuits connected to it. The target data refers to the data in a data block whose absolute values are greater than the first threshold, or to the non-zero data in the data block (here, specifically, the processed horizontal data block or the processed vertical data block).
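The transmission saving described above — sending only the target data plus the associated identification data block — can be sketched as follows (illustrative Python; function name, threshold and sample values are assumptions):

```python
def compress(block, first_threshold=0.05):
    """Split a dense 2-D block into (target_values, mask): target_values keeps
    only the data with |value| > first_threshold, in row-major order; mask is
    the associated identification data block marking where those values sit."""
    mask = [[1 if abs(v) > first_threshold else 0 for v in row]
            for row in block]
    target_values = [v for row in block for v in row
                     if abs(v) > first_threshold]
    return target_values, mask

# Instead of all 8 elements, only 2 values plus a bit-mask are transmitted.
values, mask = compress([[1, 0, 0, 0.02],
                         [0, 0.5, 0, 0]])
```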
Correspondingly, the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the vertical data block and the identification data block associated with the basic data block; to process the vertical data block and the basic data block according to the connection identification data block, obtaining a processed vertical data block and a processed basic data block; to execute a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and to send the operation result to the main processing circuit.

The main processing circuit is configured to process the operation result to obtain the instruction result.
For example, suppose the horizontal data block is a matrix of M1 rows and N1 columns and the basic data block is a matrix of M2 rows and N2 columns, where M1 > M2 and N1 > N2. Correspondingly, the identification data block associated with the horizontal data block is likewise a matrix of M1 rows and N1 columns, and the identification data block associated with a basic data block is likewise a matrix of M2 rows and N2 columns. Taking a basic data block that is a 2*2 matrix as an example, with the first threshold set to 0.05, the identification data block associated with the basic data block is the corresponding 2*2 matrix of 0s and 1s. The processing of data blocks by the first mapping circuit and the second mapping circuit is described in detail later.
In the device provided in the first aspect, when the first data block includes the horizontal data block, the main processing circuit is specifically configured to start the first mapping circuit to process the horizontal data block, obtaining a processed horizontal data block and an identification data block associated with the horizontal data block, or to start the first mapping circuit to process the horizontal data block according to a prestored identification data block associated with the horizontal data block, obtaining a processed horizontal data block; to split the processed horizontal data block and the identification data block associated with the horizontal data block into a plurality of basic data blocks and the identification data blocks respectively associated with the basic data blocks; to distribute the plurality of basic data blocks and their respectively associated identification data blocks to the basic processing circuits connected to it; and to broadcast the vertical data block to the basic processing circuits connected to it.

The basic processing circuit is specifically configured to start the second mapping circuit to process the vertical data block according to the identification data block associated with the basic data block, obtaining a processed vertical data block; to execute a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and to send the operation result to the main processing circuit.
In an optional embodiment, the main processing circuit is further specifically configured to split the vertical data block, or the processed vertical data block together with the identification data block associated with the vertical data block, into a plurality of partial vertical data blocks and the identification data blocks respectively associated with the plurality of partial vertical data blocks, and to broadcast the plurality of partial vertical data blocks and their respectively associated identification data blocks to the basic processing circuits in one or more broadcasts; the plurality of partial vertical data blocks together form the vertical data block or the processed vertical data block.
Correspondingly, the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the partial vertical data block and the identification data block associated with the basic data block; to process the partial vertical data block and the basic data block according to the connection identification data block, obtaining a processed partial vertical data block and a processed basic data block; and to execute a convolution operation on the processed partial vertical data block and the processed basic data block.
The connection identification data block is the data block obtained by performing an element-wise AND operation on the identification data block associated with the basic data block and the identification data block associated with the partial vertical data block. Optionally, the connection identification data block is used to indicate the positions at which the data in both data blocks (specifically, the basic data block and the vertical data block) have absolute values greater than the first threshold. This is described in detail later.
For example, suppose the identification data block associated with the horizontal data block is a 2*3 matrix and the identification data block associated with the partial vertical data block is a 2*2 matrix; the corresponding connection identification data block is then obtained by the element-wise AND operation.
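The element-wise AND that produces the connection identification data block can be sketched for two equal-shaped masks as follows (a hedged Python illustration; the patent's own example values are not reproduced here, so the masks below are made up):

```python
def connection_mask(mask_a, mask_b):
    """Element-wise AND of two identification data blocks of equal shape:
    1 only where both blocks hold data above the first threshold."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

conn = connection_mask([[1, 0, 1],
                        [0, 1, 1]],
                       [[1, 1, 0],
                        [0, 0, 1]])
```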
In the device provided in the first aspect, when the first data block includes the vertical data block, the main processing circuit is specifically configured to start the first mapping circuit to process the vertical data block, obtaining a processed vertical data block and an identification data block associated with the vertical data block, or to start the first mapping circuit to process the vertical data block according to a prestored identification data block associated with the vertical data block, obtaining a processed vertical data block; to split the horizontal data block into a plurality of basic data blocks; to distribute the plurality of basic data blocks to the basic processing circuits connected to it; and to broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it.

The basic processing circuit is specifically configured to start the second mapping circuit to process the basic data block according to the identification data block associated with the vertical data block, obtaining a processed basic data block; to execute an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and to send the operation result to the main processing circuit.
In an optional embodiment, the main processing circuit is further specifically configured to split the processed vertical data block and the identification data block associated with the vertical data block into a plurality of partial vertical data blocks and the identification data blocks associated with the plurality of partial vertical data blocks, and to broadcast the plurality of partial vertical data blocks and their respectively associated identification data blocks to the basic processing circuits in one or more broadcasts; the plurality of partial vertical data blocks together form the vertical data block or the processed vertical data block.

Correspondingly, the basic processing circuit is specifically configured to process the basic data block according to the identification data block associated with the partial vertical data block, obtaining a processed basic data block, and to execute an inner product operation on the processed basic data block and the partial vertical data block.
In the device provided in the first aspect, the main processing circuit is specifically configured to broadcast the vertical data block (specifically, the vertical data block or the processed vertical data block) to the basic processing circuits connected to it in a single broadcast.
In the device provided in the first aspect, the basic processing circuit is specifically configured to execute inner product processing on the basic data block (likewise, the basic data block or the processed basic data block) and the vertical data block to obtain an inner product result, to accumulate the inner product result to obtain an operation result, and to send the operation result to the main processing circuit.
In the device provided in the first aspect, the basic processing circuit is specifically configured to execute a product operation on the basic data block and the vertical data block to obtain a product result, to accumulate the product result to obtain an operation result, and to send the operation result to the main processing circuit.

The main processing circuit is configured to accumulate the operation results to obtain an accumulated result, and to arrange the accumulated result to obtain the instruction result.
In the device provided in the first aspect, the main processing circuit is specifically configured to divide the vertical data block into a plurality of partial vertical data blocks and to broadcast the plurality of partial vertical data blocks to the basic processing circuits in multiple broadcasts; the plurality of partial vertical data blocks together form the vertical data block.
In the device provided in the first aspect, the basic processing circuit is specifically configured to execute inner product processing on the partial vertical data block (specifically, the partial vertical data block or the processed partial vertical data block) and the basic data block to obtain an inner product result, to accumulate the inner product result to obtain a partial operation result, and to send the partial operation result to the main processing circuit. Taking a 3*3 kernel as the basic data block and a 3*3 matrix as the partial vertical data block as an example: the 3*3 matrix and the 3*3 kernel execute an element-wise multiplication row by row, so the corresponding inner product result consists of 3 inner product results, which are accumulated to obtain the partial operation result. The 3 inner product results are Out0 (the inner product of row 0 of the 3*3 matrix and row 0 of the 3*3 kernel), Out1 (the inner product of row 1 of the 3*3 matrix and row 1 of the 3*3 kernel) and Out2 (the inner product of row 2 of the 3*3 matrix and row 2 of the 3*3 kernel):

Out0 = r00*k0[0] + r01*k0[1] + r02*k0[2]
Out1 = r10*k1[0] + r11*k1[1] + r12*k1[2]
Out2 = r20*k2[0] + r21*k2[1] + r22*k2[2]

where in r00 the r denotes the partial vertical data block and 00 denotes the element at row 0, column 0; in k0[0], k denotes the basic data block and 0[0] denotes the element at row 0, column 0.

Partial operation result = Out0 + Out1 + Out2.
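The three row inner products and their accumulation above can be checked numerically with a short Python sketch (the 3x3 values below are made up for illustration, not taken from the patent):

```python
# Partial vertical data block r and basic data block (kernel) k, both 3x3.
r = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
k = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 1]]

# Out0..Out2: inner product of row i of r with row i of k.
outs = [sum(rv * kv for rv, kv in zip(r[i], k[i])) for i in range(3)]
partial_result = sum(outs)  # Out0 + Out1 + Out2
```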
In the device provided in the first aspect, the basic processing circuit is specifically configured to reuse the partial vertical data block n times, executing the inner product operation of the partial vertical data block with each of n basic data blocks to obtain n partial processing results; the n partial processing results are each accumulated to obtain n partial operation results, which are sent to the main processing circuit; n is an integer greater than or equal to 2.

Here, taking p 3*3 kernels as the basic data blocks and a 3*3 matrix as the partial vertical data block as an example, the 3*3 matrix is reused p times, executing the element-wise multiplication with each of the p 3*3 kernels; each operation, i.e., each corresponding inner product computation, yields 3 inner product results forming one group, and accumulating each of the p groups of 3 inner product results yields p partial operation results.
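Reusing one partial vertical data block against p kernels, as described above, can be sketched as follows (illustrative Python; the kernel values are made up and the function name is an assumption):

```python
def multiplex(part_block, kernels):
    """Reuse one 3x3 partial vertical data block against p 3x3 kernels:
    for each kernel, accumulate the three row inner products into one
    partial operation result; returns the p partial operation results."""
    results = []
    for kernel in kernels:
        outs = [sum(a * b for a, b in zip(pr, kr))
                for pr, kr in zip(part_block, kernel)]
        results.append(sum(outs))
    return results

part = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
results = multiplex(part, [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # kernel 0
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # kernel 1
])
```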
In the device provided in the first aspect, the main processing circuit includes a main register or a main on-chip cache circuit, and the basic processing circuit includes a basic register or a basic on-chip cache circuit.
In the device provided in the first aspect, the main processing circuit includes one or any combination of: a vector operation unit circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit and a data rearrangement circuit.
In the device provided in the first aspect, the input data block and the convolution kernel data block may be represented as tensors, specifically one or any combination of: a vector, a matrix, a three-dimensional data block, a four-dimensional data block and an n-dimensional data block.
Referring to Fig. 1a, Fig. 1a shows an integrated circuit chip device provided by the present disclosure. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits arranged in an m*n array, where m and n are integers greater than or equal to 1 and at least one of m and n is greater than or equal to 2. For the plurality of basic processing circuits distributed in the m*n array, each basic processing circuit is connected to its adjacent basic processing circuits, and the main processing circuit is connected to k of the basic processing circuits; the k basic processing circuits may be: the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row and the m basic processing circuits of the 1st column. In the integrated circuit chip device shown in Fig. 1a, the main processing circuit includes a first mapping circuit, which is configured to perform compression processing on data to obtain processed data and identification data; the identification data indicates whether the absolute value of each datum is greater than a first threshold. Further, the main processing circuit may send to the basic processing circuits only the processed data (specifically, the data whose absolute values are greater than the first threshold) and the identification data associated with them. The advantage is that the volume of data sent to the basic processing circuits for processing is reduced and the data processing rate is increased. The first threshold is set freely by the user side or the device side, for example 0.05 or 0.5, and is not limited here.
For example, suppose the input data of the main processing circuit is a matrix data block of 8 elements, of which only two values, 1 and 0.5, have absolute values greater than the first threshold. After processing by the first mapping circuit, a processed matrix data block and an identification data block associated with the matrix data block are obtained; the specific processing of the first mapping circuit is described in detail later. Correspondingly, when the main processing circuit distributes data to a basic processing circuit, it may send only these two data, 1 and 0.5, rather than all 8 elements of the processed matrix data block; at the same time it also sends the identification data block associated with the matrix data block to the basic processing circuit, so that the basic processing circuit, based on the received identification data block and the two received data (1 and 0.5), knows the positions of these two data in the original matrix data block. That is, the basic processing circuit can restore, from the received identification data block and the received data, the processed matrix data block held in the main processing circuit.
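The restoration step described above — rebuilding the processed block from the received values and the identification data block — can be sketched as follows (hedged Python; the 2x4 shape, the function name and the mask layout are illustrative assumptions):

```python
def restore(values, mask):
    """Rebuild the processed data block on the basic processing circuit side:
    walk the mask, consuming one received value per 1-bit and writing 0
    (a filtered-out entry) per 0-bit."""
    it = iter(values)
    return [[next(it) if bit else 0 for bit in row] for row in mask]

# Two received values (1 and 0.5) plus the mask recover the full 8-element block.
block = restore([1, 0.5], [[1, 0, 0, 0],
                           [0, 0, 1, 0]])
```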
At least one basic processing circuit among the plurality of circuits (i.e., some or all of the plurality of basic processing circuits) may include a second mapping circuit. Specifically, some of the plurality of basic processing circuits may include the second mapping circuit; for example, in an optional scheme, the k basic processing circuits may be configured with the second mapping circuit, so that each of the n basic processing circuits of the 1st row can be responsible for the compression processing step of the data of the m basic processing circuits of its column. This arrangement improves operation efficiency and reduces power consumption, because the n basic processing circuits of the 1st row are the first to receive the data sent by the main processing circuit; compressing the received data there reduces the amount of computation of the subsequent basic processing circuits and the volume of data transmitted to them. Similarly, configuring the m basic processing circuits of the first column with the second mapping circuit also has the advantages of a small amount of computation and low power consumption. In addition, with this structure the main processing circuit may adopt a dynamic data-sending strategy, for example broadcasting data to the m basic processing circuits of the 1st column and distributing data to the n basic processing circuits of the 1st row. The specific processing of the second mapping circuit is described in detail later.
The main processing circuit is configured to execute each successive operation in the neural network operation and to transmit data with the basic processing circuits coupled to it; the successive operations include but are not limited to operations such as accumulation, ALU operations, and activation operations.
The multiple basic processing circuits are configured to execute the operations in the neural network in parallel according to the transmitted data, and to pass the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The operations executed in parallel in the neural network include but are not limited to inner-product operations, matrix or vector multiplication, and the like.
The main processing circuit may include a data transmitting circuit and a data receiving circuit or interface; the data transmitting circuit can integrate a horizontal data distribution circuit and a vertical data distribution circuit, although in practical applications the two can also be provided separately. Horizontal data are data that need to be sent along the row direction (horizontally), for example data sent to the basic processing circuits in any one or more of the m rows in Fig. 1a. Vertical data are data that need to be sent selectively to part of the basic processing circuits along the column direction (vertically). Taking the convolution operation as an example: the convolution input data of the convolution operation need to be sent to all basic processing circuits, so they are all vertical data, while the convolution kernels need to be sent selectively to part of the basic processing circuits, so the convolution kernels are horizontal data. Which basic processing circuits the horizontal data are specifically sent to can be determined by the main processing circuit according to the load and other distribution criteria. As for the sending method for vertical or horizontal data, the data can be sent to each basic processing circuit in the form of a broadcast, either in a single broadcast or in several broadcasts (in practical applications, the specific embodiments of this disclosure do not limit the number of broadcasts). Optionally, the main processing circuit can also selectively send the horizontal/vertical data to part of the basic processing circuits.
The main processing circuit (as shown in Fig. 1d) may include a register and/or an on-chip cache circuit, and may further include circuits such as a control circuit, a vector operator circuit, an ALU (arithmetic logic unit) circuit, an accumulator circuit, and a DMA (Direct Memory Access) circuit; of course, in practical applications, the main processing circuit can also add other circuits such as a conversion circuit (e.g. a matrix transposition circuit), a data rearrangement circuit, or an activation circuit.
Each basic processing circuit may include a base register and/or a basic on-chip cache circuit; each basic processing circuit may further include one or any combination of an inner-product operator circuit, a vector operator circuit, an accumulator circuit, and the like. The inner-product operator circuit, vector operator circuit, and accumulator circuit can be integrated circuits, or can be separately provided circuits.
Optionally, the accumulation of the inner-product operation can be executed by the accumulator circuits of the n basic processing circuits of the m-th row, because the basic processing circuits of the m-th row can receive the product results of all the basic processing circuits of their respective columns. Having the n basic processing circuits of the m-th row execute the accumulation of the inner-product operation distributes the computing resources effectively and has the advantage of saving power. This technical solution is especially applicable when m is large.
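As a rough sketch of this arrangement (illustrative only, assuming each circuit in a column contributes one partial product that is forwarded down the column to the m-th-row circuit):

```python
def column_accumulate(products):
    """m-th-row circuit: sum the partial products received from all the
    basic processing circuits of its column."""
    acc = 0
    for p in products:   # one partial product per circuit in the column
        acc += p
    return acc

# inner product of (1, 2, 3) and (4, 5, 6), one product per circuit
print(column_accumulate([1 * 4, 2 * 5, 3 * 6]))  # 32
```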
The circuit that executes the compression of data can be assigned by the main processing circuit; specifically, the executing circuit can be assigned in an explicit or an implicit manner. In the explicit manner, the main processing circuit can configure a special instruction or directive: when a basic processing circuit receives that special instruction or directive, it determines that it should execute data compression; when it does not receive the special instruction or directive, it determines that it should not execute data compression. Alternatively, the assignment can be done implicitly; for example, when a basic processing circuit receives sparse data (i.e. data containing zeros, or containing more than a preset number of data smaller than a preset threshold) and determines that an inner-product operation needs to be executed, it compresses the sparse data. For the explicitly configured manner, the special instruction or directive can carry a descending sequence; each time the sequence passes through a basic processing circuit, its value is decreased by 1. A basic processing circuit reads the value of the descending sequence: if the value is greater than zero, it executes data compression; if the value is equal to or less than zero, it does not. This setting is configured according to the basic processing circuits distributed in the array. For example, for the m basic processing circuits of the i-th column, if the main processing circuit needs the first 5 basic processing circuits to execute data compression, it issues a special instruction carrying a descending sequence whose initial value is 5; the value is decremented by 1 at every basic processing circuit, so at the 5th basic processing circuit the value is 1, and at the 6th basic processing circuit the value is 0, at which point the 6th basic processing circuit does not execute data compression. In this way the main processing circuit can dynamically configure which circuits execute data compression and how many times it is executed.
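The descending-sequence mechanism can be modeled in a few lines. This is an illustrative sketch of the counter logic only, not of the actual instruction encoding:

```python
def propagate_descending_sequence(initial_value, num_circuits):
    """Explicit configuration: a counter travels down a column of basic
    processing circuits, decremented at each hop. A circuit compresses
    only while the value it reads is greater than zero."""
    decisions = []
    value = initial_value
    for _ in range(num_circuits):
        decisions.append(value > 0)   # compress if counter still positive
        value -= 1                    # decrement before forwarding
    return decisions

# initial value 5 in a column of 8: exactly the first 5 circuits compress
print(propagate_descending_sequence(5, 8))
# [True, True, True, True, True, False, False, False]
```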
One embodiment of the present disclosure provides an integrated circuit chip device including a main processing circuit (also called a master unit) and multiple basic processing circuits (also called base units); the structure of the embodiment is shown in Fig. 1b, where the part inside the dashed box is the internal structure of the neural network operation device. The grey-filled arrows indicate the data transmission paths between the main processing circuit and the basic processing circuit array; the hollow arrows indicate the data transmission paths between the individual basic processing circuits (adjacent basic processing circuits) within the array. The length and width of the basic processing circuit array can differ, i.e. the values of m and n can be different (and can of course also be the same); the present disclosure does not limit the specific values.
The circuit structure of a basic processing circuit is shown in Fig. 1c. In the figure, the dashed box indicates the boundary of the basic processing circuit; the block arrows crossing the dashed box indicate data input/output channels (arrows pointing into the dashed box are input channels, arrows pointing out of it are output channels). The rectangles inside the dashed box indicate storage unit circuits (registers and/or on-chip caches), including input data 1, input data 2, the multiplication or inner-product result, and accumulation data; the diamonds indicate operator circuits, including the multiplier or inner-product operator and the adder.
In the present embodiment, the neural network operation device includes one main processing circuit and 16 basic processing circuits (16 is only for illustration; other values can be used in practical applications).
In the present embodiment, each basic processing circuit has two data input interfaces and two data output interfaces. In the following description of this example, the horizontal input interface (the horizontal arrow pointing to the unit in Fig. 1b) is called input 0, and the vertical input interface (the vertical arrow pointing to the unit in Fig. 1b) is called input 1; each horizontal data output interface (the horizontal arrow pointing out of the unit in Fig. 1b) is called output 0, and each vertical data output interface (the vertical arrow pointing out of the unit in Fig. 1b) is called output 1.
The data input interfaces and data output interfaces of each basic processing circuit can be connected to different units, including the main processing circuit and other basic processing circuits.
In this example, input 0 of the four basic processing circuits 0, 4, 8 and 12 (numbered as in Fig. 1b) is connected to a data output interface of the main processing circuit;
in this example, input 1 of the four basic processing circuits 0, 1, 2 and 3 is connected to a data output interface of the main processing circuit;
in this example, output 1 of the four basic processing circuits 12, 13, 14 and 15 is connected to a data input interface of the main processing circuit;
in this example, the cases in which an output interface of one basic processing circuit is connected to an input interface of another basic processing circuit are shown in Fig. 1b and are not enumerated here.
Specifically, when the output interface S1 of a unit S is connected to the input interface P1 of a unit P, the unit P can receive, from its P1 interface, the data that the unit S sends to its S1 interface.
The present embodiment includes one main processing circuit; the main processing circuit is connected to an external device (i.e. it has both input and output interfaces). A part of the data output interfaces of the main processing circuit are connected to the data input interfaces of a part of the basic processing circuits; a part of the data input interfaces of the main processing circuit are connected to the data output interfaces of a part of the basic processing circuits.
Application method of the integrated circuit chip device
The data involved in the application method provided by the present disclosure can be data after compression. It should be noted that the data in this application can be input neurons or weights in a neural network, specifically matrix data or vector data, etc.; this application does not limit them. That is, the data or data blocks set forth below can be input neurons or weights in a neural network, and they can be embodied in the form of matrices, vectors, and the like.
The data compression involved in this application is specifically executed in the first mapping circuit and second mapping circuit described earlier. It should be understood that, since a neural network is an algorithm of high computation and high memory access, the more weights there are, the more both the computation and the memory access grow. In particular, for small weights (for example 0, or weights smaller than a set value), compression is needed to improve the computation rate and reduce overhead. In practical applications, data compression is most effective when applied in sparse neural networks, for example in reducing the workload of data computation, reducing data overhead, and raising the data computation rate.
Taking input data as an example, specific embodiments of the data compression are described below. The input data include but are not limited to at least one input neuron and/or at least one weight.
In the first embodiment:
After the first mapping circuit receives the first input data sent by the main processing circuit (specifically a data block to be computed, such as a horizontal data block or a vertical data block), the first mapping circuit can process the first input data to obtain the processed first input data and the identification mask data associated with the first input data; the mask data indicate whether the absolute value of the first input data is greater than a first threshold, for example 0.5 or 0.
Specifically, when the absolute value of the first input data is greater than the first threshold, the input data is retained; otherwise the first input data is deleted, i.e. set to 0. For example, for an input matrix data block and a first threshold of 0.05, the processing of the first mapping circuit yields a processed matrix data block and an identification data block (also called a mask matrix) associated with the matrix data block.
Further, to reduce the amount of transmitted data, when the main processing circuit distributes data to the basic processing circuits connected to it, it can transmit only the target data in the processed matrix data block (1, 0.06 and 0.5 in this example) and the identification data block associated with the matrix data block. In a specific implementation, the main processing circuit can distribute the target data in the processed matrix data block to the basic processing circuits according to a set rule, for example row by row or column by column; this application does not limit this. Correspondingly, after a basic processing circuit receives the target data and the identification data block associated with the target data, it restores the processed matrix data block according to the set rule (e.g. row order). That is, in this example the basic processing circuit can, from the received data (1, 0.06 and 0.5) and the identification data block, know the matrix data block corresponding to those data (i.e. the matrix data block processed by the first mapping circuit in the main processing circuit).
In embodiments of the present invention, the first input data can be a horizontal data block and/or a vertical data block.
Correspondingly, the second mapping circuit processes the second input data using the identification data associated with the first input data, to obtain the processed second input data; here the first input data differ from the second input data. For example, when the first input data are at least one weight, the second input data can be at least one input neuron; alternatively, when the first input data are at least one input neuron, the second input data can be at least one weight.
In embodiments of the present invention, the second input data differ from the first input data; the second input data can be any of the following: a horizontal data block, a basic data block, a vertical data block, or a partial vertical data block. For example, when the first input data are a horizontal data block, the second input data are a partial vertical data block. Suppose the second input data are a matrix data block; after processing with the mask matrix of the example above, the processed partial vertical data block is obtained. Since in practical applications the matrix data blocks involved in the input data have large dimensions, the examples in this application are for illustration only and do not constitute a limitation.
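This masking step can be sketched as follows, using an invented mask and second data block (the patent's own matrix values did not survive extraction):

```python
import numpy as np

def second_mapping(second_block, mask):
    """Second mapping circuit: filter the *second* operand with the mask
    derived from the first input data, zeroing unmarked positions."""
    return second_block * mask

mask = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 0]])             # mask from the first input data
second = np.array([[0.7, 2.0, -1.0, 0.4],
                   [3.0, 0.9, 0.2, -5.0]])  # second input data
print(second_mapping(second, mask))          # unmarked positions become 0
```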
In the second embodiment:
The first mapping circuit can be used to process both the first input data and the second input data, so as to obtain the processed first input data together with the first identification mask data associated with the first input data, and the processed second input data together with the second identification mask data associated with the second input data. The first mask data or the second mask data indicate whether the absolute value of the first or second input data is greater than a second threshold; the second threshold is a user-defined or device-defined setting, for example 0.05 or 0.
The processed first input data or second input data can be data that have been compressed, or can be the unprocessed input data. For example, when the first input data are a horizontal data block such as the matrix data block in the example above, the processed horizontal data block obtained after the first mapping circuit can be the original matrix data block or can be the compressed matrix data block. It should be understood that, since the purpose of this application is to reduce the amount of transmitted data and to raise the data-processing rate in the basic processing circuits, the processed input data (for example the processed basic data block or partial vertical data block) should preferably be compressed data. Preferably, the data that the main processing circuit sends to the basic processing circuits are specifically the target data in the processed input data, where the target data are specifically the data whose absolute values are greater than a preset threshold, or can be the non-zero data, etc.
Correspondingly, in the basic processing circuit, the second mapping circuit can obtain connection identification data from the first identification data associated with the first input data and the second identification data associated with the second input data; the connection identification data indicate the positions where the absolute values of both the first input data and the second input data are greater than a third threshold, where the third threshold is a user-defined or device-defined setting, for example 0.05 or 0. Further, the second mapping circuit can process the received first input data and second input data respectively according to the connection identification data, to obtain the processed first input data and the processed second input data.
For example, let the first input data be a matrix data block and the second input data likewise be a matrix data block. After processing by the first mapping circuit, the first identification data block associated with the first input data and the processed first input data block are obtained; correspondingly, the second identification data block associated with the second input data and the processed second input data block are obtained. Correspondingly, to improve the data transfer rate, the main processing circuit can send to the basic processing circuit only the target data 1, 0.06 and 0.5 in the processed first input data block, together with the first identification data block associated with the first input data block; at the same time, it sends the target data 1, 1.1, 0.6, 0.3 and 0.5 in the processed second input data block, together with the second identification data block associated with the second input data block, to the basic processing circuit.
Correspondingly, after receiving the above data, the basic processing circuit can use the second mapping circuit to perform an element-wise AND of the first identification data block and the second identification data block, obtaining the connection identification data block. The second mapping circuit then uses the connection identification data block to process the processed first input data block and the processed second input data block respectively, obtaining the resulting first input data block and second input data block. In the basic processing circuit, the first data block corresponding to the target data (i.e. the first data block processed by the first mapping circuit) can be determined from the first identification data block and the target data of the received first data block; correspondingly, the second data block corresponding to the target data (i.e. the second data block processed by the first mapping circuit) is determined from the second identification data block and the target data of the received second data block. Then, once the second mapping circuit knows the connection identification data block, it performs an element-wise AND between the connection identification data block and, respectively, the determined first data block and the determined second data block, to obtain the first data block and the second data block processed by the second mapping circuit.
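The element-wise AND and the subsequent filtering can be sketched as follows (illustrative masks and values; the patent's own matrices did not survive extraction):

```python
import numpy as np

def connection_mask(mask1, mask2):
    """Element-wise AND of two identification data blocks: a position
    survives only if both operands are significant there."""
    return mask1 & mask2

mask1 = np.array([[1, 0, 1, 1], [0, 1, 0, 0]])  # first identification block
mask2 = np.array([[1, 1, 0, 1], [0, 1, 1, 0]])  # second identification block
conn = connection_mask(mask1, mask2)

# both operands are then filtered with the same connection mask, so the
# later inner product only touches positions non-trivial in both
first = np.array([[1.0, 0.0, 0.06, 0.5], [0.0, 0.3, 0.0, 0.0]])
second = np.array([[1.0, 1.1, 0.0, 0.6], [0.0, 0.3, 0.5, 0.0]])
print(first * conn)
print(second * conn)
```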
In the third embodiment:
The first mapping circuit may not be provided in the main processing circuit; instead, the main processing circuit can send the third input data, together with the prestored third identification data associated with the third input data, to the basic processing circuits connected to it, and the second mapping circuit is provided in the basic processing circuits. The specific embodiment of the data compression involved in the second mapping circuit is described below.
It should be understood that the third input data include but are not limited to a basic data block, a partial vertical data block, a vertical data block, and the like. Similarly, in a neural network processor, the third input data can also be at least one weight and/or at least one input neuron; this application does not limit this.
In the second mapping circuit: the second mapping circuit can process the third input data according to the received third identification data associated with the third input data, so as to obtain the processed third input data, and related operations such as the inner-product operation can subsequently be executed on the processed third input data. For example, the third input data received by the second mapping circuit are a matrix data block, and the prestored third identification data block associated with the third input data (also a mask matrix data block) corresponds to it; further, the second mapping circuit processes the third input data block according to the third identification data block, obtaining the processed third input data block.
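A minimal sketch of this third embodiment, with an invented prestored mask and data block (the patent's values were lost in extraction): the main processing circuit forwards the raw block plus the prestored mask, and the second mapping circuit in the basic processing circuit does the filtering before the related operation.

```python
import numpy as np

prestored_mask = np.array([[1, 0, 1], [0, 1, 0]])              # hypothetical
third_block = np.array([[0.9, 0.01, -0.4], [0.02, 0.7, 0.03]])

processed = third_block * prestored_mask   # second mapping circuit
# subsequent related operation, e.g. an inner product with a weight block
weights = np.ones_like(third_block)
total = float((processed * weights).sum())
print(processed)
print(total)
```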
In addition, the input neurons and output neurons mentioned in the embodiments of the present invention do not mean the neurons in the input layer and output layer of the entire neural network; rather, for any two adjacent layers in the neural network, the neurons in the lower layer of the network feed-forward operation are the input neurons, and the neurons in the upper layer of the network feed-forward operation are the output neurons. Taking a convolutional neural network as an example, suppose a convolutional neural network has L layers, with K = 1, 2, 3 ... L-1; for the K-th layer and the (K+1)-th layer, the K-th layer is called the input layer and the neurons in it are the above input neurons, and the (K+1)-th layer is called the output layer and the neurons in it are the above output neurons. That is, except for the top layer, each layer can serve as an input layer, and the next layer is the corresponding output layer.
In the fourth embodiment:
No mapping circuit is provided in the main processing circuit; the first mapping circuit and the second mapping circuit are provided in the basic processing circuits. For the data processing of the first mapping circuit and the second mapping circuit, refer to the descriptions of the first to third embodiments above; details are not repeated here.
Optionally, there is also a fifth embodiment. In the fifth embodiment, no mapping circuit is provided in the basic processing circuits; the first mapping circuit and the second mapping circuit are provided in the main processing circuit. For the data processing of the first mapping circuit and the second mapping circuit, refer to the descriptions of the first to third embodiments above; details are not repeated here. That is, the compression of the data is completed in the main processing circuit, and the processed input data are sent to the basic processing circuits, so that the basic processing circuits execute the corresponding arithmetic operations with the processed input data (specifically the processed neurons and the processed weights).
Concrete structural diagrams of the mapping circuits involved in this application are described below. Figs. 5a and 5b show two possible mapping circuits. The mapping circuit shown in Fig. 5a includes a comparator and selectors; this application does not limit the numbers of comparators and selectors. Fig. 5a shows one comparator and two selectors. The comparator determines whether the input data meet a preset condition. The preset condition can be a user-defined or device-defined setting, for example, as above in this application, that the absolute value of the input data is greater than or equal to a preset threshold. If the preset condition is met, the comparator determines that the input data may be output, and the identification data associated with the input data is 1; otherwise it determines that the input data is not output, or the input data defaults to 0, and correspondingly the identification data associated with the input data is 0. That is, after the comparator, the identification data associated with the input data are known.
Further, after the comparator has judged the preset condition on the input data, the obtained identification data can be input into a selector, so that the selector uses the identification data to decide whether to output the corresponding input data, i.e. to obtain the processed input data.
Taking a matrix data block as the input data, as in Fig. 5a, each datum in the matrix data block can be subjected to the preset-condition judgement by the comparator, obtaining the identification data block (mask matrix) associated with the matrix data block. Further, the first selector screens the matrix data block with the identification data block: the data in the matrix data block whose absolute values are greater than or equal to the preset threshold (i.e. that meet the preset condition) are retained, and the remaining data are deleted, so as to output the processed matrix data block. Optionally, the second selector also processes other input data (such as a second matrix data block) with the identification data block, for example performing an element-wise AND operation, so that the data in the second matrix data block whose absolute values are greater than or equal to the preset threshold are retained, so as to output the processed second matrix data block.
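The comparator-plus-selector structure of Fig. 5a can be sketched as two small functions (the threshold and data here are illustrative):

```python
import numpy as np

def comparator(block, threshold):
    """Comparator of Fig. 5a: 1 where the preset condition holds
    (|x| >= threshold), else 0 -- the identification data block."""
    return (np.abs(block) >= threshold).astype(np.int8)

def selector(block, mask):
    """Selector of Fig. 5a: keep only the data flagged by the mask."""
    return block[mask.astype(bool)]

block = np.array([[1.0, 0.02, 0.06, 0.5],
                  [0.0, 0.04, 0.01, 0.3]])
mask = comparator(block, 0.05)   # identification data block (mask matrix)
print(mask)
print(selector(block, mask))     # retained data, row-major order
```

A second selector would apply the same `mask` to another data block, as in the element-wise AND described above.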
It should be understood that, corresponding to the first and second embodiments above, the specific structure of the first mapping circuit can include at least one comparator and at least one selector, such as the comparator and the first selector of Fig. 5a in the example above; the specific structure of the second mapping circuit can include one or more selectors, such as the second selector of Fig. 5a in the example above.
Fig. 5b shows the structural diagram of another mapping circuit. As in Fig. 5b, this mapping circuit includes selectors; the number of selectors is not limited and can be one or several. Specifically, the selector selects from the input data according to the identification data associated with the input data: the data in the input data whose absolute values are greater than or equal to a preset threshold are output, and the remaining data are deleted / not output, so as to obtain the processed input data.
Taking a matrix data block as the input data: the matrix data block and the identification data block associated with the matrix data block are input to the mapping circuit; the selector can select from the matrix data block according to the identification data block, outputting the data whose absolute values are greater than or equal to the preset threshold (0 in the example given) and not outputting the remaining data, thereby outputting the processed matrix data block.
It should be understood that the structure shown in Fig. 5b can be applied to the second mapping circuit in the third embodiment above, i.e. the specific structure of the second mapping circuit in the third embodiment above can include at least one selector. Similarly, the first mapping circuit and second mapping circuit designed into the main processing circuit and the basic processing circuits can be obtained by cross-combining or splitting the functional components shown in Figs. 5a and 5b; this application does not limit this.
Based on the foregoing embodiments, the operations that the main processing circuit and the basic processing circuits need to complete are specifically described below; they can be carried out with the following method:
The main processing circuit first enables the first mapping circuit to process the first input data, obtaining the processed first input data and the first identification data associated with the first input data; it then transfers the processed first input data and the associated first identification data to the basic processing circuits for computation. For example, the main processing circuit can process the data to be computed (such as a horizontal data block or a vertical data block) before transmitting them to the basic processing circuits; the advantage is that the bit width of the transmitted data can be reduced, reducing the total number of transmitted bits, and the basic processing circuits execute operations on smaller-bit-width data more efficiently and with lower power consumption.
The basic processing circuit enables the second mapping circuit to process the received second input data using the first identification data, obtaining the processed second input data, and then executes the related operation on the processed first input data and second input data. For example, when a basic processing circuit receives the second input data transmitted by the main processing circuit (such as a sparse vertical data block), it first compresses them and then operates on them, which improves operation efficiency and reduces power consumption.
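Putting the two halves together, the division of labour just described can be sketched end-to-end (all values illustrative):

```python
import numpy as np

def main_circuit_send(first_block, threshold):
    """Main processing circuit: first mapping, then send only the target
    data and the mask (fewer bits than the full block)."""
    mask = (np.abs(first_block) > threshold).astype(np.int8)
    target = first_block[mask.astype(bool)]
    return target, mask

def basic_circuit_compute(target, mask, second_block):
    """Basic processing circuit: the second mapping filters the second
    operand with the received mask, then the related operation (an inner
    product here) runs on the compressed data."""
    first = np.zeros(mask.shape)
    first[mask.astype(bool)] = target     # restore the processed first data
    second = second_block * mask          # second mapping circuit
    return float((first * second).sum())  # inner product

first = np.array([[1.0, 0.02], [0.5, 0.0]])
second = np.array([[2.0, 3.0], [4.0, 5.0]])
target, mask = main_circuit_send(first, 0.05)
result = basic_circuit_compute(target, mask, second)
print(result)  # 1*2 + 0.5*4 = 4.0
```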
Optionally, the main processing circuit can first transfer the first input data (such as a basic data block), the first identification data associated with the first input data, the second input data (such as a partial vertical data block), and the second identification data associated with the second input data to the basic processing circuits for computation.
Correspondingly, after a basic processing circuit receives the data, it may first enable the second mapping circuit to obtain a connection identifier data block from the first identifier data and the second identifier data, then process the first input data and the second input data using the connection identifier data; the operation on the processed first input data and second input data may further be completed in the basic processing circuit. The benefit is a reduced amount of data to operate on, improved operation efficiency, and lower power consumption.
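The connection-identifier step can be sketched as a bitwise AND of the two identifier vectors, after which only positions marked 1 participate in the operation. This is an illustrative model under the assumptions above; the helper names are invented for the example.

```python
def connection_identifier(id1, id2):
    """AND the first and second identifier data elementwise."""
    return [a & b for a, b in zip(id1, id2)]

def masked_inner_product(xs, ys, conn):
    """Multiply-accumulate only at positions whose connection identifier is 1."""
    return sum(x * y for x, y, c in zip(xs, ys, conn) if c)

conn = connection_identifier([1, 0, 1, 1], [1, 1, 0, 1])
# conn -> [1, 0, 0, 1]; only positions 0 and 3 contribute
result = masked_inner_product([2.0, 9.0, 9.0, 0.5], [3.0, 9.0, 9.0, 2.0], conn)
# result -> 2.0*3.0 + 0.5*2.0 = 7.0
```

Skipping positions whose connection identifier is 0 is what reduces the amount of data operated on.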
Optionally, the first identifier data associated with the first input data and the second identifier data associated with the second input data sent by the main processing circuit may be pre-stored in the main processing circuit, or may be obtained by the main processing circuit enabling the first mapping circuit to process the first/second input data; this application imposes no limitation in this regard.
A method of using the basic processing circuits (see Fig. 2a):
The main processing circuit receives the input data to be computed from outside the device;
Optionally, the main processing circuit performs computation on the data using its own computing circuits, such as the vector operation circuit, the inner product operator circuit, and the accumulator circuit;
The main processing circuit sends data through its data output interface to the basic processing circuit array (the set of all basic processing circuits is referred to as the basic processing circuit array), as shown in Fig. 2b;
One way of sending data here is to transmit data directly to a subset of the basic processing circuits, i.e. a broadcast mode issued in multiple rounds;
Another way of sending data here is to send different data to different basic processing circuits, i.e. a distribution mode;
The basic processing circuit array computes on the data;
A basic processing circuit performs the operation after receiving the input data;
Optionally, after receiving data, a basic processing circuit transmits the data out through its own data output interface (to other basic processing circuits that did not receive data directly from the main processing circuit);
Optionally, a basic processing circuit transfers the operation result (an intermediate result or a final result) out through its data output interface;
The main processing circuit receives the output data returned from the basic processing circuit array;
Optionally, the main processing circuit continues processing the data received from the basic processing circuit array (for example, accumulation or activation operations);
When the main processing circuit finishes processing, it transfers the processing result out of the device through the data output interface.
The circuit device is used to complete a tensor-times-tensor operation; a tensor is the same as the data block described earlier and may be any one of, or any combination of, a matrix, a vector, a three-dimensional data block, a four-dimensional data block, and a higher-dimensional data block. Figs. 2c and 2f below show concrete implementations of a matrix-times-vector operation and a matrix-times-matrix operation, respectively.
A matrix-times-vector operation is completed using the circuit device. (A matrix-times-vector operation can be carried out by taking the inner product of each row of the matrix with the vector and placing the results into a vector in the order of the corresponding rows.)
The following describes computing the multiplication of a matrix S of M rows and L columns with a vector P of length L, as shown in Fig. 2c below.
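The row-wise inner-product definition just given can be written down directly; this is a plain reference sketch of the mathematics, not of the hardware flow.

```python
def mat_vec(S, P):
    """Matrix-times-vector: one inner product per row of S,
    results placed into a vector in row order."""
    return [sum(s * p for s, p in zip(row, P)) for row in S]

S = [[1, 2, 3],
     [4, 5, 6]]          # M = 2 rows, L = 3 columns
P = [1, 0, 2]            # length L = 3
mat_vec(S, P)            # -> [7, 16]
```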
The method uses all or some of the basic processing circuits of the neural network computing device; suppose k basic processing circuits are used.
The main processing circuit sends data from some or all rows of matrix S to each of the k basic processing circuits.
In one optional scheme, the control circuit of the main processing circuit each time sends one number, or a portion of the numbers, of a certain row of matrix S to a given basic processing circuit. (For example, when sending one number at a time, for a given basic processing circuit: the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; when sending a portion of the numbers at a time: the 1st transmission sends the first two numbers of row 3 (i.e. the 1st and 2nd numbers), the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on.)
In one optional scheme, the control circuit of the main processing circuit each time sends one number, or a portion of the numbers, from each of several rows of matrix S to a given basic processing circuit. (For example, for a given basic processing circuit: the 1st transmission sends the 1st number of each of rows 3, 4 and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4 and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4 and 5, and so on.)
The control circuit of the main processing circuit sends the data in vector P to the 0th basic processing circuit step by step;
After the 0th basic processing circuit receives the data of vector P, it sends that data on to the next basic processing circuit connected to it, i.e. basic processing circuit 1.
Specifically, some basic processing circuits cannot obtain all the data needed for the computation directly from the main processing circuit. For example, basic processing circuit 1 in Fig. 2d has only one data input interface connected to the main processing circuit, so it can obtain the data of matrix S directly from the main processing circuit only, while the data of vector P must be output to it by basic processing circuit 0; likewise, after receiving the data, basic processing circuit 1 must continue to output the data of vector P to basic processing circuit 2.
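The forwarding chain just described can be modeled as each circuit keeping a local copy and passing the data to the next circuit's inbox. This is a toy software model of the data flow, with invented names, not a description of the actual interfaces.

```python
from collections import deque

def chain_broadcast(vector, k):
    """Circuit 0 gets vector P from the main processing circuit; each
    circuit stores a copy and forwards through its single output
    interface to the next circuit in the chain."""
    inbox = [deque() for _ in range(k)]
    inbox[0].extend(vector)              # main circuit feeds circuit 0 only
    stored = [[] for _ in range(k)]
    for i in range(k):
        while inbox[i]:
            x = inbox[i].popleft()
            stored[i].append(x)          # local copy used for computation
            if i + 1 < k:
                inbox[i + 1].append(x)   # forward to the next circuit
    return stored

# after the chain drains, every circuit holds the same vector P
chain_broadcast([1, 0, 2], 3)   # -> [[1, 0, 2], [1, 0, 2], [1, 0, 2]]
```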
Each basic processing circuit performs operations on the received data, including but not limited to: inner product operations, multiplication operations, addition operations, etc.
In one optional scheme, the basic processing circuit each time computes the multiplication of one or more pairs of data, then accumulates the result into its register and/or on-chip cache;
In one optional scheme, the basic processing circuit each time computes the inner product of one or more pairs of vectors, then accumulates the result into its register and/or on-chip cache;
After a basic processing circuit computes a result, it transfers the result out through its data output interface (to other basic processing circuits connected to it);
In one optional scheme, this result may be the final result or an intermediate result of the inner product operation;
After a basic processing circuit receives a computation result from another basic processing circuit, it transmits that data to the other basic processing circuit or to the main processing circuit connected to it;
The main processing circuit receives the inner product results of each basic processing circuit and processes them into the final result (the processing may be an accumulation operation, an activation operation, etc.).
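The accumulate-as-data-arrives behavior can be sketched with a toy circuit object whose accumulator stands in for the register/on-chip cache; the class and method names are illustrative only.

```python
class BasicCircuit:
    """Toy model: accumulate the products of the data pairs received so far."""
    def __init__(self):
        self.acc = 0.0          # stands for the register / on-chip cache

    def receive(self, s_chunk, p_chunk):
        # multiply the newly arrived pair(s) and add onto the accumulator
        self.acc += sum(s * p for s, p in zip(s_chunk, p_chunk))

c = BasicCircuit()
c.receive([1, 2], [3, 4])   # first two numbers of a row of S and of P
c.receive([5], [6])         # the next number arrives in a later transmission
# c.acc -> 1*3 + 2*4 + 5*6 = 41.0
```

Because the inner product is a sum of products, sending a row a few numbers at a time and accumulating partial products gives the same final result as computing it in one step.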
An embodiment that implements the matrix-times-vector method using the above computing device:
In one optional scheme, the multiple basic processing circuits used in the method are arranged in the manner shown in Fig. 2d or Fig. 2e below;
As shown in Fig. 2c, the main processing circuit may obtain the mask matrices corresponding to matrix S and matrix P respectively (i.e. the identifier data / identifier data blocks described above). Specifically, the mask matrices corresponding to matrix S and matrix P may be pre-stored in a high-speed memory in the main processing circuit, or may be obtained by the main processing circuit enabling the first mapping circuit to process matrix S and matrix P respectively. The control circuit of the main processing circuit divides the M rows of data of matrix S into K groups, and the i-th basic processing circuit is responsible for the operation of the i-th group (the set of rows in this group of data is denoted Ai); correspondingly, the control circuit of the main processing circuit may likewise divide the M rows of the first mask matrix corresponding to matrix S into K groups and send them, together with the matrix newly formed after matrix S is divided into K groups, to the corresponding basic processing circuits one-to-one, so that the operations on the related data are completed in those basic processing circuits.
The method of grouping the M rows of data here may be any grouping scheme that does not allocate a row more than once;
In one optional scheme, the following allocation is used: row j is assigned to the (j % K)-th basic processing circuit (% is the remainder operation);
In one optional scheme, for cases where the rows cannot be grouped evenly, a portion of the rows may first be allocated evenly, and the remaining rows allocated in any manner.
Each time, the control circuit of the main processing circuit sends the data in some or all rows of matrix S in turn to the corresponding basic processing circuits; correspondingly, the control circuit may also send the identifier data corresponding to those rows in the first mask matrix together to the corresponding basic processing circuits.
For example, if matrix S is a 50*50 matrix data block, the main processing circuit may divide matrix S into 10 sub-matrices, each of size 5*50; the main processing circuit may then send the 1st sub-matrix S0 (5 rows, 50 columns) together with the identifier data block associated with sub-matrix S0 (5 rows, 50 columns) to the 1st basic processing circuit, so that the computation on the related data is completed in the 1st basic processing circuit.
In one optional scheme, the control circuit of the main processing circuit each time sends to the i-th basic processing circuit one or more data of one row in the i-th group of data Mi for which it is responsible; the i-th group of data Mi may be data in matrix S, or data in the first mask matrix corresponding to matrix S;
In one optional scheme, the control circuit of the main processing circuit each time sends to the i-th basic processing circuit one or more data of each row in some or all of the rows of the i-th group of data Mi for which it is responsible;
The control circuit of the main processing circuit sends the data in vector P in turn to the 1st basic processing circuit; correspondingly, the control circuit of the main processing circuit may also send the data in the second mask matrix associated with vector P in turn to the 1st basic processing circuit;
In one optional scheme, the control circuit of the main processing circuit may each time send one or more data of vector P or of the second mask matrix associated with vector P;
After the i-th basic processing circuit receives the data of vector P or of the second mask matrix, it may also transmit them to the (i+1)-th basic processing circuit connected to it;
After each basic processing circuit receives one or more data from a certain row or several rows of matrix S, together with one or more data from vector P, it performs operations (including but not limited to multiplication or addition);
In a specific implementation, each basic processing circuit receives data in matrix S together with the first identifier data associated with that data in the first mask matrix, and data in vector P together with the second identifier data associated with that data in the second mask matrix. It may first obtain connection identifier data from the first identifier data and the second identifier data, then use the connection identifier data to decide whether to perform the relevant operation on the data in matrix S and the data in vector P. The connection identifier data is obtained by performing an AND operation on the first identifier data and the second identifier data, and may be 0 or 1: a 1 indicates that the data at some position in matrix S and the data at the same position in vector P both have an absolute value greater than the preset threshold; conversely, a 0 indicates that the data at that position in matrix S and/or the data at the same position in vector P has an absolute value less than or equal to the preset threshold.
That is, the second mapping circuit of each basic processing circuit is enabled to select, according to the first mask matrix of matrix S and the second mask matrix of vector P, the data in matrix S and vector P whose identifier at the same position is 1, and to perform the relevant operations on that data, such as multiplication and addition. In other words, the first mask matrix and the second mask matrix are used to select the data in matrix S and vector P whose absolute values at the same position are greater than the preset threshold, and the relevant operation, such as a multiplication operation, is performed on that data.
For example, suppose a basic processing circuit receives the data of two rows of matrix S as a sub-matrix S0 together with the first mask matrix associated with S0 (the concrete values of S0 and its mask matrix are not reproduced here), and receives several data of vector P as the vector P0 = [1 0.01 1.1 0.6]T together with the second mask vector [1 0 1 1]T associated with vector P0. The basic processing circuit may then enable the second mapping circuit to first perform an element-wise AND operation on the first mask matrix and [1 0 1 1]T to obtain the connection mask matrix, and then use the connection mask matrix to process the received S0 and P0, obtaining the processed matrix S0 and the processed vector P0 = [1 0 0 0.6]T, so that the basic processing circuit performs the relevant operations on the processed matrix S0 and the processed vector P0.
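The masking in this example can be reproduced numerically. Since S0 and its mask are not reproduced above, the mask column of S0 used below ([1, 1, 0, 1]) is an assumed value chosen so that the result matches the processed vector in the text; the helper name is also invented.

```python
def apply_connection_mask(values, mask):
    """Zero out positions whose connection identifier is 0."""
    return [v if m else 0 for v, m in zip(values, mask)]

mask_S0_col = [1, 1, 0, 1]                 # hypothetical mask column of S0
mask_P0     = [1, 0, 1, 1]                 # second mask vector from the text
conn = [a & b for a, b in zip(mask_S0_col, mask_P0)]   # -> [1, 0, 0, 1]
processed_P0 = apply_connection_mask([1, 0.01, 1.1, 0.6], conn)
# processed_P0 -> [1, 0, 0, 0.6], the processed vector P0 of the example
```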
In one optional scheme, if the amount of data received in a basic processing circuit (specifically, data blocks to be computed, such as the data of several rows/columns of matrix S or vector P and the corresponding identifier data in the mask matrices) exceeds a preset threshold, the basic processing circuit stops accepting new input data, such as subsequently transmitted data of several rows/columns of matrix S or vector P and the corresponding identifier data in the mask matrices, until the basic processing circuit again has enough buffer/storage space, at which point it resumes receiving the data newly sent by the main processing circuit.
In one optional scheme, the basic processing circuit each time computes the multiplication of one or more pairs of data, then accumulates the result into its register and/or on-chip cache;
In one optional scheme, the basic processing circuit each time computes the inner product of one or more pairs of vectors, then accumulates the result into its register and/or on-chip cache;
In one optional scheme, the data received by a basic processing circuit may also be an intermediate result, stored in its register and/or on-chip cache;
A basic processing circuit transfers its local computation result to the next basic processing circuit connected to it or to the main processing circuit;
In one optional scheme, corresponding to the structure of Fig. 2d, only the output interface of the last basic processing circuit of each column is connected to the main processing circuit. In this case, only the last basic processing circuit can transfer its local computation result directly to the main processing circuit; the computation result of each other basic processing circuit is passed to its next basic processing circuit, which passes it on to the next, until it reaches the last basic processing circuit. The last basic processing circuit accumulates its local computation result with the received results of the other basic processing circuits of this column to obtain an intermediate result, and sends the intermediate result to the main processing circuit; alternatively, the last basic processing circuit may transmit the results of the other circuits of this column and its local processing result directly to the main processing circuit.
In one optional scheme, corresponding to the structure of Fig. 2e, each basic processing circuit has an output interface connected to the main processing circuit; in this case, each basic processing circuit transfers its local computation result directly to the main processing circuit;
After a basic processing circuit receives a computation result passed over by another basic processing circuit, it transfers it to the next basic processing circuit or to the main processing circuit connected to it.
The main processing circuit receives the results of the M inner product operations, which form the result of the matrix-times-vector operation.
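The end-to-end flow (deal rows to K circuits, compute the inner products per circuit, gather M results in row order) can be emulated in a few lines. This sketch collapses the transmissions into loops and is only a functional model of the data flow, with invented names.

```python
def mat_vec_distributed(S, P, K):
    """Emulate the flow: row j of S goes to circuit j % K, each circuit
    computes its rows' inner products with P, and the main circuit
    gathers the M results back into row order."""
    M = len(S)
    result = [0] * M
    for k in range(K):                       # each basic processing circuit
        for j in range(k, M, K):             # the rows it is responsible for
            result[j] = sum(s * p for s, p in zip(S[j], P))
    return result

S = [[1, 0], [2, 3], [0, 4]]
P = [5, 6]
mat_vec_distributed(S, P, K=2)   # -> [5, 28, 24], same as the direct product
```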
A matrix-times-matrix operation is completed using the circuit device;
The following describes computing the multiplication of a matrix S of M rows and L columns with a matrix P of L rows and N columns (each row of matrix S has the same length as each column of matrix P, as shown in Fig. 2f).
The method is illustrated using the device embodiment shown in Fig. 1b;
The first mapping circuit of the main processing circuit obtains the mask matrices corresponding to matrix S and matrix P; for example, the first mapping circuit is enabled to process matrix S and matrix P respectively, obtaining the first mask matrix corresponding to matrix S and the second mask matrix corresponding to matrix P.
The control circuit of the main processing circuit sends the data in some or all rows of matrix S to those basic processing circuits directly connected to the main processing circuit through the horizontal data input interface (for example, the grey-filled vertical data paths at the top of Fig. 1b); meanwhile, the control circuit may also send the corresponding identifier data of some or all of those rows of the first mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data in matrix S together with the first two rows of identifier data corresponding to them in the first mask matrix to the basic processing circuits connected to the main processing circuit.
In one optional scheme, the control circuit of the main processing circuit each time sends one number, or a portion of the numbers, of a certain row of matrix S to a given basic processing circuit. (For example, for a given basic processing circuit: the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; or the 1st transmission sends the first two numbers of row 3, the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on.) Correspondingly, the control circuit at the same time also sends, one number or a portion of the identifier data at a time, the identifier data of the corresponding row in the first mask matrix to that basic processing circuit.
In one optional scheme, the control circuit of the main processing circuit each time sends one number, or a portion of the numbers, from each of several rows of matrix S, together with the identifier data of the corresponding rows in the first mask matrix, to a given basic processing circuit. (For example, for a given basic processing circuit: the 1st transmission sends the 1st number of each of rows 3, 4 and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4 and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4 and 5, and so on.)
The control circuit of the main processing circuit sends the data in some or all columns of matrix P to those basic processing circuits directly connected to the main processing circuit through the vertical data input interface (for example, the grey-filled horizontal data paths on the left side of the basic processing circuit array in Fig. 1b); meanwhile, the control circuit may also send the identifier data of the corresponding part or all of the columns of the second mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two columns of data in matrix P together with the first two columns of identifier data corresponding to them in the second mask matrix to the basic processing circuits connected to the main processing circuit.
In one optional scheme, the control circuit of the main processing circuit each time sends one number, or a portion of the numbers, of a certain column of matrix P to a given basic processing circuit. (For example, for a given basic processing circuit: the 1st transmission sends the 1st number of column 3, the 2nd transmission sends the 2nd number of column 3, the 3rd transmission sends the 3rd number of column 3, and so on; or the 1st transmission sends the first two numbers of column 3, the 2nd transmission sends the 3rd and 4th numbers of column 3, the 3rd transmission sends the 5th and 6th numbers of column 3, and so on.) Correspondingly, the control circuit at the same time also sends, one number or a portion of the identifier data at a time, the identifier data of the corresponding column in the second mask matrix to that basic processing circuit.
In one optional scheme, the control circuit of the main processing circuit each time sends one number, or a portion of the numbers, from each of several columns of matrix P, together with the identifier data of the corresponding columns in the second mask matrix, to a given basic processing circuit. (For example, for a given basic processing circuit: the 1st transmission sends the 1st number of each of columns 3, 4 and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4 and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4 and 5, and so on.)
After a basic processing circuit receives the data of matrix S and the identifier data of the first mask matrix associated with matrix S, it transfers the data through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives the data of matrix P, it transfers the data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Each basic processing circuit performs operations on the received data. Specifically, after each basic processing circuit receives the data of a certain row or several rows of matrix S together with the corresponding first identifier data associated with that data in the first mask matrix, and the data of a certain column or several columns of matrix P together with the corresponding second identifier data associated with that data in the second mask matrix, it may first obtain connection identifier data from the first identifier data and the second identifier data, then use the connection identifier data to decide whether to perform the relevant operation on the data in matrix S and the data in matrix P. The connection identifier data is obtained by performing an AND operation on the first identifier data and the second identifier data, and may be 0 or 1: a 1 indicates that the data at some position in matrix S and the data at the same position in matrix P both have an absolute value greater than the preset threshold; conversely, a 0 indicates that the data at that position in matrix S and/or the data at the same position in matrix P has an absolute value less than or equal to the preset threshold. For details, refer to the description in the preceding embodiments, which is not repeated here.
That is, the second mapping circuit of each basic processing circuit is enabled to select, according to the first mask matrix of matrix S and the second mask matrix of matrix P, the data whose identifier at the same position is 1, and to perform the relevant operations on that data, such as multiplication and addition.
In one optional scheme, if the amount of data received in a basic processing circuit (specifically, data blocks to be computed, such as the data of several rows/columns of matrix S or matrix P and the corresponding identifier data in the mask matrices) exceeds a preset threshold, the basic processing circuit stops accepting new input data, such as subsequently transmitted data of several rows/columns of matrix S or matrix P and the corresponding identifier data in the mask matrices, until the basic processing circuit again has enough buffer/storage space, at which point it resumes receiving the data newly sent by the main processing circuit.
In one optional scheme, the basic processing circuit each time computes the multiplication of one or more pairs of data, then accumulates the result into its register and/or on-chip cache;
In one optional scheme, the basic processing circuit each time computes the inner product of one or more pairs of vectors, then accumulates the result into its register and/or on-chip cache;
After a basic processing circuit computes a result, it can transfer the result out through its data output interface;
In one optional scheme, this result may be the final result or an intermediate result of the inner product operation;
Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result from that interface; if not, it outputs the result in the direction of the basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 1b, the bottom row of basic processing circuits outputs its results directly to the main processing circuit, while the other basic processing circuits transmit operation results downward through their vertical output interfaces).
After a basic processing circuit receives a computation result from another basic processing circuit, it transmits that data to the other basic processing circuit or to the main processing circuit connected to it; it outputs results in the direction that can output directly to the main processing circuit (for example, in Fig. 1b, the bottom row of basic processing circuits outputs its results directly to the main processing circuit, while the other basic processing circuits transmit operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results of each basic processing circuit to obtain the output result.
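The two-dimensional arrangement just described, where the circuit at array position (i, j) sees a row group of S arriving horizontally and a column group of P arriving vertically, can be sketched as a blocked matrix product. This is a functional model of the partitioning only, with assumed group sizes, not of the physical interfaces.

```python
def mat_mat_on_array(S, P, rows_per_group=1, cols_per_group=1):
    """Blocked matrix-times-matrix: the circuit at (i, j) handles row
    group Hi of S and column group Wj of P and produces the
    corresponding tile of the output."""
    M, L, N = len(S), len(S[0]), len(P[0])
    out = [[0] * N for _ in range(M)]
    for i in range(0, M, rows_per_group):          # row groups Hi
        for j in range(0, N, cols_per_group):      # column groups Wj
            for r in range(i, min(i + rows_per_group, M)):
                for c in range(j, min(j + cols_per_group, N)):
                    out[r][c] = sum(S[r][t] * P[t][c] for t in range(L))
    return out

S = [[1, 2], [3, 4]]
P = [[5, 6], [7, 8]]
mat_mat_on_array(S, P)   # -> [[19, 22], [43, 50]], the ordinary product S·P
```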
An embodiment of the "matrix-times-matrix" method:
The method uses a basic processing circuit array arranged in the manner shown in Fig. 1b;
The first mapping circuit of the main processing circuit obtains the mask matrices corresponding to matrix S and matrix P; for example, the first mapping circuit is enabled to process matrix S and matrix P respectively, obtaining the first mask matrix corresponding to matrix S and the second mask matrix corresponding to matrix P. Optionally, the processed matrix S and matrix P may also be obtained; suppose the processed matrix S has h rows and the processed matrix P has w columns.
The control circuit of the main processing circuit divides the h rows of data of matrix S into h groups, and the i-th basic processing circuit is responsible for the operation of the i-th group (the set of rows in this group of data is denoted Hi); meanwhile, the control circuit may also send the identifier data of some or all of the corresponding rows of the first mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data in matrix S together with the first two rows of identifier data corresponding to them in the first mask matrix to the basic processing circuits connected to the main processing circuit.
The method of grouping the h rows of data here may be any grouping scheme that does not allocate a row more than once;
In one optional scheme, the following allocation is used: the control circuit of the main processing circuit assigns row j to the (j % h)-th basic processing circuit;
In one optional scheme, for cases where the rows cannot be grouped evenly, a portion of the rows may first be allocated evenly, and the remaining rows allocated in any manner.
The control circuit of the main processing circuit divides the w columns of data of matrix P into w groups, and the i-th basic processing circuit is responsible for the operation of the i-th group (the set of columns in this group of data is denoted Wi); correspondingly, the control circuit at the same time also sends, one identifier or a portion of the identifier data at a time, the identifier data of the corresponding columns in the second mask matrix to the corresponding basic processing circuits.
The method of grouping the w columns of data here may be any grouping scheme that does not allocate a column more than once;
In one optional scheme, the following allocation is used: the control circuit of the main processing circuit assigns column j to the (j % w)-th basic processing circuit;
In one optional scheme, for cases where the columns cannot be grouped evenly, a portion of the columns may first be allocated evenly, and the remaining columns allocated in any manner.
The control circuit of the main processing circuit sends the data in some or all rows of matrix S to the first basic processing circuit of each row of the basic processing circuit array;
In one optional scheme, the control circuit of the main processing circuit each time sends to the first basic processing circuit of the i-th row of the basic processing circuit array one or more data of one row in the i-th group of data Hi for which it is responsible; at the same time, using the same method, the identifier data corresponding to the i-th group of data Hi in the mask matrix may also be sent to that first basic processing circuit;
In one optional scheme, the control circuit of the main processing circuit each time sends to the first basic processing circuit of the i-th row of the basic processing circuit array one or more data of each row in some or all of the rows of the i-th group of data Hi for which it is responsible; at the same time, using the same method, the identifier data corresponding to the i-th group of data Hi in the mask matrix may also be sent to that first basic processing circuit;
The control circuit of the main processing circuit sends the data in some or all columns of matrix P to the first basic processing circuit of each column of the basic processing circuit array; meanwhile, the control circuit may also send the identifier data of the corresponding part or all of the columns of the second mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two columns of data in matrix P together with the first two columns of identifier data corresponding to them in the second mask matrix to the basic processing circuits connected to the main processing circuit.
In one optional scheme, the control circuit of the main processing circuit each time sends to the first basic processing circuit of the i-th column of the basic processing circuit array one or more data of one column in the i-th group of data Wi for which it is responsible;
In one optional scheme, the control circuit of the main processing circuit each time sends to the first basic processing circuit of the i-th column of the basic processing circuit array one or more data of each column in some or all of the columns of the i-th group of data Wi for which it is responsible;
After a basic processing circuit receives data of matrix S, it transmits the data through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives data of matrix P, it transmits the data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
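The pass-through behavior described above, with rows of S moving rightward along horizontal interfaces and columns of P moving downward along vertical interfaces while each circuit accumulates products, is the classic systolic pattern. A minimal Python sketch under our own simplified timing model (all names here are illustrative, not from the patent):

```python
# Simplified simulation of the forwarding scheme: rows of S enter from the
# left and propagate right, columns of P enter from the top and propagate
# down. At skewed time steps, cell (i, j) sees the pair S[i][t-i-j] and
# P[t-i-j][j] and accumulates their product, so after the pipeline drains
# each cell holds one element of the matrix product S @ P.
def systolic_matmul(S, P):
    n, k = len(S), len(S[0])          # S is n x k
    m = len(P[0])                     # P is k x m
    acc = [[0.0] * m for _ in range(n)]   # one accumulator per cell
    for t in range(n + m + k - 1):        # enough steps to drain the array
        for i in range(n):
            for j in range(m):
                step = t - i - j          # skewed injection time for (i, j)
                if 0 <= step < k:
                    acc[i][j] += S[i][step] * P[step][j]
    return acc

S = [[1, 2], [3, 4]]
P = [[5, 6], [7, 8]]
print(systolic_matmul(S, P))  # [[19.0, 22.0], [43.0, 50.0]]
```

The skew (`t - i - j`) models the fact that data reaches interior circuits only after passing through their neighbors; the final accumulator contents match an ordinary matrix multiplication.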
Each basic processing circuit performs an operation on the received data. Specifically, after a basic processing circuit receives the data of one or several rows of matrix S together with the associated first identifier data from the first mask matrix, and the data of one or several columns of matrix P together with the associated second identifier data from the second mask matrix, it can first obtain connection identifier data from the first identifier data and the second identifier data; it then uses the connection identifier data to determine whether to perform the relevant operation on the data of matrix S and the data of matrix P. The connection identifier data is obtained by performing an AND operation on the first identifier data and the second identifier data, and each bit can be 0 or 1: a 1 indicates that the data at that position in matrix S and the data at the same position in matrix P both have absolute values greater than the preset threshold; conversely, a 0 indicates that the data at that position in matrix S and/or the data at the same position in matrix P has an absolute value less than or equal to the preset threshold. For details, refer to the description in the previous embodiments, which is not repeated here.
That is, the second mapping circuit of each basic processing circuit is started to select, according to the first mask matrix of matrix S and the second mask matrix of matrix P, the data whose identifier at the same position is 1, and to perform the relevant operations on them, such as multiplication and addition.
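The selection rule above can be sketched in a few lines of Python (the function and variable names are our own; the threshold-based masks are assumed to have been produced earlier by the mapping circuits):

```python
# Hedged sketch: the "connection identifier" is the bitwise AND of the two
# mask bits. Only positions where both masks are 1 (i.e. both operands
# exceed the sparsity threshold) contribute to the inner product.
def masked_inner_product(s_row, s_mask, p_col, p_mask):
    total = 0.0
    for s, ms, p, mp in zip(s_row, s_mask, p_col, p_mask):
        conn = ms & mp              # connection identifier bit
        if conn == 1:               # both values are "significant"
            total += s * p
    return total

# mask bit 1 marks |value| > threshold; 0 marks a value treated as zero
row = [0.9, 0.0, 2.0, 0.1]
rm  = [1,   0,   1,   0]
col = [1.0, 3.0, 0.5, 0.0]
cm  = [1,   1,   1,   0]
print(masked_inner_product(row, rm, col, cm))  # 0.9*1.0 + 2.0*0.5 = 1.9
```

Skipping positions whose connection identifier is 0 is what saves the multiplications for below-threshold (effectively zero) operands.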
In an optional scheme, when the amount of data received by a basic processing circuit (specifically, the data blocks to be computed, such as the data of a few rows/columns of matrix S or matrix P and the corresponding identifier data in the mask matrices) exceeds a preset threshold, the basic processing circuit stops accepting new input data, such as the data of further rows/columns of matrix S or matrix P and the corresponding identifier data subsequently sent by the main processing circuit, until enough buffer/storage space is available in the basic processing circuit, after which it again receives the data newly sent by the main processing circuit.
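This buffer-full backpressure can be illustrated with a small sketch (class and method names are ours, not the patent's):

```python
# Illustrative sketch: a basic processing circuit refuses new blocks once
# its buffer reaches a capacity threshold, and accepts again only after
# computation frees a slot, matching the optional scheme above.
class BasicCircuitBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def try_receive(self, block):
        if len(self.buffer) >= self.capacity:
            return False            # buffer full: refuse new data
        self.buffer.append(block)
        return True

    def consume(self):
        # process (here: just remove) the oldest block, freeing one slot
        return self.buffer.pop(0) if self.buffer else None

buf = BasicCircuitBuffer(capacity=2)
print(buf.try_receive("row0"))  # True
print(buf.try_receive("row1"))  # True
print(buf.try_receive("row2"))  # False (must wait)
buf.consume()
print(buf.try_receive("row2"))  # True
```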
In an optional scheme, each time a basic processing circuit computes the products of one or more pairs of data and then accumulates the results into a register and/or an on-chip cache;
In an optional scheme, each time a basic processing circuit computes the inner products of one or more pairs of vectors and then accumulates the results into a register and/or an on-chip cache;
After a basic processing circuit computes a result, it can transmit the result out through a data output interface;
In an optional scheme, the result can be the final result or an intermediate result of an inner product operation;
Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result toward the basic processing circuit that can output directly to the main processing circuit (for example, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a computation result from another basic processing circuit, it transmits the data to the other basic processing circuit or the main processing circuit connected to it;
It outputs the result toward the direction that can output directly to the main processing circuit (for example, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results from each basic processing circuit to obtain the output result.
" transverse direction " used in above description, the words such as " vertical " are intended merely to example shown in statement Fig. 1 b, actually make
Two different interfaces are represented with " transverse direction " " vertical " interface for only needing to distinguish each unit.
Completing a fully connected operation with the circuit device:
If the input data of the fully connected layer is a vector (i.e., the case where the input of the neural network is a single sample), the weight matrix of the fully connected layer is taken as matrix S and the input vector as vector P, and the operation is performed according to the matrix-multiplied-by-vector method of the device;
If the input data of the fully connected layer is a matrix (i.e., the case where the input of the neural network is multiple samples), the weight matrix of the fully connected layer is taken as matrix S and the input matrix as matrix P, or the weight matrix of the fully connected layer is taken as matrix P and the input matrix as matrix S, and the operation is performed according to the matrix-multiplied-by-matrix method of the device;
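The mapping just described reduces a fully connected layer to a plain matrix product. A minimal reference in Python (the helper name and layouts are our own assumptions: weights are out_features x in_features; a batched input is in_features x batch):

```python
# Fully connected layer as matrix-vector (single sample) or
# matrix-matrix (batch of samples) multiplication, as the text describes.
def fully_connected(weight, inputs):
    if isinstance(inputs[0], list):      # matrix input: multiple samples
        # weight @ inputs, column by column
        return [[sum(w * x for w, x in zip(wrow, col))
                 for col in zip(*inputs)] for wrow in weight]
    # vector input: single sample
    return [sum(w * x for w, x in zip(wrow, inputs)) for wrow in weight]

W = [[1, 0], [2, 3]]
print(fully_connected(W, [4, 5]))            # [4, 23]
print(fully_connected(W, [[4, 1], [5, 2]]))  # [[4, 1], [23, 8]]
```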
Completing a convolution operation with the circuit device:
The convolution operation is described below. In the figures, one square represents one data element. The input data is shown in Fig. 3a (N samples, each sample with C channels, and the feature map of each channel having height H and width W). The weights, i.e., the convolution kernels, are shown in Fig. 3b (M convolution kernels, each with C channels and with height KH and width KW). For the N samples of the input data, the rule of the convolution operation is the same, so the following explains the process of performing the convolution operation on one sample. On one sample, each of the M convolution kernels performs the same operation; the operation of each convolution kernel produces one planar feature map, and the M convolution kernels finally compute M planar feature maps (for one sample, the output of the convolution is M feature maps). One convolution kernel performs an inner product operation at each planar position of the sample and then slides along the H and W directions. For example, Fig. 3c shows the position where a convolution kernel performs the inner product operation at the lower-right corner of one sample of the input data; Fig. 3d shows the convolution position slid one cell to the left, and Fig. 3e shows the convolution position slid one cell upward.
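The sliding-window computation above can be written as a naive reference (one sample, stride 1, no padding; the function name is ours, and the N/C/H/W and M/C/KH/KW layouts follow the text):

```python
# Naive reference convolution for one sample: input is C x H x W, weights
# are M x C x KH x KW. Each of the M kernels computes an inner product at
# every (i, j) window position, sliding along H and W, producing one
# feature map per kernel -- M feature maps in total.
def conv2d(inp, kernels):
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, KH, KW = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for m in range(M):
        fmap = []
        for i in range(H - KH + 1):
            row = []
            for j in range(W - KW + 1):
                # inner product of kernel m with the current window
                s = sum(inp[c][i + u][j + v] * kernels[m][c][u][v]
                        for c in range(C)
                        for u in range(KH)
                        for v in range(KW))
                row.append(s)
            fmap.append(row)
        out.append(fmap)
    return out

inp = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # C=1, H=W=3
ker = [[[[1, 0], [0, 1]]]]                   # M=1, C=1, KH=KW=2
print(conv2d(inp, ker))  # [[[6, 8], [12, 14]]]
```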
The method is illustrated using the embodiment of the device shown in Fig. 1b:
The first mapping circuit of the main processing circuit can process the data of some or all of the weight convolution kernels to obtain the corresponding mask data and the processed weight data (i.e., the processed data of some or all of the weight convolution kernels).
The control circuit of the main processing circuit sends the data of some or all of the weight convolution kernels (the data can be the original weight data or the processed weight data) through the horizontal data input interfaces to the basic processing circuits directly connected to the main processing circuit (for example, the gray-filled vertical data paths at the top of Fig. 1b); meanwhile, the control circuit also sends the mask data associated with that data to the basic processing circuits connected to the main processing circuit;
In an optional scheme, each time the control circuit of the main processing circuit sends one number or a part of the numbers of the data of one convolution kernel in the weights to a given basic processing circuit (for example, for a given basic processing circuit: the 1st transfer sends the 1st number of the 3rd row, the 2nd transfer sends the 2nd number of the 3rd row, the 3rd transfer sends the 3rd number of the 3rd row, ...; or the 1st transfer sends the first two numbers of the 3rd row, the 2nd transfer sends the 3rd and 4th numbers of the 3rd row, the 3rd transfer sends the 5th and 6th numbers of the 3rd row, ...); at the same time, the control circuit also sends the mask data corresponding to that convolution kernel in the weights to that basic processing circuit, likewise one number or a part of the numbers at a time;
In an optional scheme, another situation is that the control circuit of the main processing circuit sends, each time, one number or a part of the numbers of the data of several convolution kernels in the weights to a given basic processing circuit (for example, for a given basic processing circuit: the 1st transfer sends the 1st number of each of rows 3, 4 and 5, the 2nd transfer sends the 2nd number of each of rows 3, 4 and 5, the 3rd transfer sends the 3rd number of each of rows 3, 4 and 5, ...; or the 1st transfer sends the first two numbers of each of rows 3, 4 and 5, the 2nd transfer sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transfer sends the 5th and 6th numbers of each of rows 3, 4 and 5, ...); correspondingly, the control circuit also sends the mask data associated with those convolution kernels in the weights to that basic processing circuit using the same method, one number or a part of the numbers at a time;
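The chunked transfers in both schemes above amount to slicing a row into fixed-size pieces. A tiny sketch (our own helper name; the row contents are made up for illustration):

```python
# Hypothetical illustration of the chunked transfer: the control circuit
# streams a kernel row to a basic processing circuit a fixed number of
# values per transfer instead of all at once.
def stream_in_chunks(row, chunk):
    for start in range(0, len(row), chunk):
        yield row[start:start + chunk]

row3 = [10, 11, 12, 13, 14, 15]
print(list(stream_in_chunks(row3, 1)))  # one number per transfer
print(list(stream_in_chunks(row3, 2)))  # two numbers per transfer
```

The same slicing applies to the associated mask data, so value and mask chunks stay aligned position by position.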
The control circuit of the main processing circuit divides the input data according to the convolution positions; the control circuit of the main processing circuit sends the data of some or all of the convolution positions in the input data through the vertical data input interfaces to the basic processing circuits directly connected to the main processing circuit (for example, the gray-filled horizontal data paths on the left side of the basic processing circuit array in Fig. 1b); correspondingly, the control circuit likewise divides the mask data associated with the input data according to the convolution positions, and at the same time sends the mask data corresponding to the data of some or all of the convolution positions in the input data to the basic processing circuits electrically connected to the main processing circuit;
In an optional scheme, each time the control circuit of the main processing circuit sends one number or a part of the numbers of the data of one convolution position in the input data, together with the mask data associated with that data, to a given basic processing circuit (for example, for a given basic processing circuit: the 1st transfer sends the 1st number of the 3rd column, the 2nd transfer sends the 2nd number of the 3rd column, the 3rd transfer sends the 3rd number of the 3rd column, ...; or the 1st transfer sends the first two numbers of the 3rd column, the 2nd transfer sends the 3rd and 4th numbers of the 3rd column, the 3rd transfer sends the 5th and 6th numbers of the 3rd column, ...);
In an optional scheme, another situation is that the control circuit of the main processing circuit sends, each time, one number or a part of the numbers of the data of several convolution positions in the input data, together with the mask data associated with that data, to a given basic processing circuit (for example, for a given basic processing circuit: the 1st transfer sends the 1st number of each of columns 3, 4 and 5, the 2nd transfer sends the 2nd number of each of columns 3, 4 and 5, the 3rd transfer sends the 3rd number of each of columns 3, 4 and 5, ...; or the 1st transfer sends the first two numbers of each of columns 3, 4 and 5, the 2nd transfer sends the 3rd and 4th numbers of each of columns 3, 4 and 5, the 3rd transfer sends the 5th and 6th numbers of each of columns 3, 4 and 5, ...);
After a basic processing circuit receives weight data (specifically, the data of a convolution kernel in the weights, or the mask data associated with that weight data), it transmits the data through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives input data (which can be the input data sent by the main processing circuit together with the mask data associated with that input data), it transmits the data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Specifically, the control circuit of the main processing circuit can send the input data together with the mask data associated with that input data to a basic processing circuit, and the basic processing circuit receives the input data and the mask data associated with it;
Each basic processing circuit performs an operation on the received data. Specifically, a basic processing circuit can start its second mapping circuit to obtain connection identifier data from the mask data associated with the input data and the mask data associated with the weight data (i.e., the mask data associated with a convolution kernel in the weights); it then uses the connection identifier data to select the data whose absolute values are greater than the preset threshold in both the input data and the weight data, and performs the multiplication on them;
In an optional scheme, when the amount of data received by a basic processing circuit (specifically, the data blocks to be computed, such as the data of a convolution kernel in the weights with its associated mask data, or input data with its associated mask data) exceeds a preset threshold, the basic processing circuit stops accepting new input data, such as the data of several convolution kernels in the weights and the associated mask data subsequently sent by the main processing circuit, until enough buffer/storage space is available in the basic processing circuit, after which it again receives the data newly sent by the main processing circuit.
In an optional scheme, each time a basic processing circuit computes the products of one or more pairs of data and then accumulates the results into a register and/or an on-chip cache;
In an optional scheme, each time a basic processing circuit computes the inner products of one or more pairs of vectors and then accumulates the results into a register and/or an on-chip cache;
After a basic processing circuit computes a result, it can transmit the result out through a data output interface;
In an optional scheme, the result can be the final result or an intermediate result of an inner product operation;
Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result toward the basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 1b, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a computation result from another basic processing circuit, it transmits the data to the other basic processing circuit or the main processing circuit connected to it;
It outputs the result toward the direction that can output directly to the main processing circuit (for example, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results from each basic processing circuit to obtain the output result.
In one embodiment, the present disclosure provides a neural network operation device, which includes functional units for executing all or part of the implementations provided in the method embodiments described above.
In one embodiment, the present disclosure provides a chip (as in Fig. 4) for executing all or part of the implementations provided in the method embodiments described above.
In one embodiment, the present disclosure provides an electronic device, which includes functional units for executing all or part of the implementations in the method embodiments described above.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, camera, server, webcam, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound instrument, and/or electrocardiograph.
The specific embodiments described above further elaborate the purpose, technical solutions, and beneficial effects of the present disclosure. It should be understood that the above are merely specific embodiments of the present disclosure and are not intended to limit the present disclosure; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.
Claims (12)
1. An integrated circuit chip device, characterized in that the integrated circuit chip device comprises a main processing circuit and a plurality of basic processing circuits; the main processing circuit includes a first mapping circuit, at least one circuit of the plurality of basic processing circuits includes a second mapping circuit, and the first mapping circuit and the second mapping circuit are each configured to perform compression processing of data in a neural network operation;
the plurality of basic processing circuits are distributed in an array; each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column;
the main processing circuit is configured to obtain an input data block, a convolution kernel data block, and a convolution instruction; to divide, according to the convolution instruction, the input data block into a vertical data block and the convolution kernel data block into a lateral data block; to determine, according to the operation control of the convolution instruction, to start the first mapping circuit to process a first data block to obtain a processed first data block, the first data block including the lateral data block and/or the vertical data block; and to send, according to the convolution instruction, the processed first data block to at least one of the basic processing circuits connected to the main processing circuit;
the plurality of basic processing circuits are configured to determine, according to the operation control of the convolution instruction, whether to start the second mapping circuit to process a second data block; to execute the operations of the neural network in parallel according to the processed second data block to obtain an operation result; and to transmit the operation result to the main processing circuit through the basic processing circuits connected to the main processing circuit; the second data block is a data block that the basic processing circuit determines to receive from the main processing circuit, and the second data block is associated with the processed first data block;
the main processing circuit is configured to process the operation result to obtain an instruction result of the convolution instruction.
2. The integrated circuit chip device according to claim 1, characterized in that, when the first data block includes a lateral data block and a vertical data block,
the main processing circuit is specifically configured to start the first mapping circuit to process the lateral data block and the vertical data block to obtain a processed lateral data block with an identifier data block associated with the lateral data block, and a processed vertical data block with an identifier data block associated with the vertical data block; to split the processed lateral data block and the identifier data block associated with the lateral data block into a plurality of basic data blocks and the identifier data blocks respectively associated with the basic data blocks; to distribute the plurality of basic data blocks and their respectively associated identifier data blocks to the basic processing circuits connected to it; and to broadcast the processed vertical data block and the identifier data block associated with the vertical data block to the basic processing circuits connected to it;
the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identifier data block from the identifier data block associated with the vertical data block and the identifier data associated with the basic data block; to process the vertical data block and the basic data block according to the connection identifier data block to obtain a processed vertical data block and a processed basic data block; to execute a convolution operation on the processed vertical data block and basic data block to obtain an operation result; and to send the operation result to the main processing circuit.
3. The integrated circuit chip device according to claim 1, characterized in that, when the first data block includes a lateral data block,
the main processing circuit is specifically configured to start the first mapping circuit to process the lateral data block to obtain a processed lateral data block and an identifier data block associated with the lateral data block, or to start the first mapping circuit to process the lateral data block according to a prestored identifier data block associated with the lateral data block to obtain the processed lateral data block; to split the processed lateral data block and the identifier data block associated with the lateral data block to obtain a plurality of basic data blocks and the identifier data blocks respectively associated with the basic data blocks; to distribute the plurality of basic data blocks and their respectively associated identifier data blocks to the basic processing circuits connected to it; and to broadcast the vertical data block to the basic processing circuits connected to it;
the basic processing circuit is specifically configured to start the second mapping circuit to process the vertical data block according to the identifier data block associated with the basic data block to obtain a processed vertical data block; to execute a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and to send the operation result to the main processing circuit.
4. The integrated circuit chip device according to claim 1, characterized in that, when the first data block includes a vertical data block,
the main processing circuit is specifically configured to start the first mapping circuit to process the vertical data block to obtain a processed vertical data block and an identifier data block associated with the vertical data block, or to start the first mapping circuit to process the vertical data block according to a prestored identifier data block associated with the vertical data block to obtain the processed vertical data block; to split the lateral data block to obtain a plurality of basic data blocks; to distribute the plurality of basic data blocks to the basic processing circuits connected to it; and to broadcast the processed vertical data block and the identifier data block associated with the vertical data block to the basic processing circuits connected to it;
the basic processing circuit is specifically configured to start the second mapping circuit to process the basic data block according to the identifier data block associated with the vertical data block to obtain a processed basic data block; to execute an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and to send the operation result to the main processing circuit.
5. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the basic processing circuit is specifically configured to execute a product operation on the basic data block and the vertical data block to obtain product results, to accumulate the product results to obtain an operation result, and to send the operation result to the main processing circuit;
the main processing circuit is configured to accumulate the operation results to obtain an accumulation result, and to arrange the accumulation result to obtain the instruction result.
6. The integrated circuit chip device according to claim 5, characterized in that
the main processing circuit is specifically configured to divide the processed vertical data block and the identifier data block associated with the vertical data block into a plurality of partial vertical data blocks and the identifier data blocks associated with the partial vertical data blocks, and to broadcast the plurality of partial vertical data blocks and their respectively associated identifier data blocks to the basic processing circuits in multiple broadcasts; the plurality of partial vertical data blocks combine to form the vertical data block;
the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identifier data block from the identifier data block associated with the basic data block and the identifier data block associated with the partial vertical data block; to process the basic data block and the partial vertical data block according to the connection identifier data block to obtain a processed basic data block and a processed partial broadcast data block; and to execute a convolution operation on the processed basic data block and the processed partial vertical data block;
alternatively, the basic processing circuit is specifically configured to start the second mapping circuit to process the basic data block according to the identifier data block associated with the partial vertical data block to obtain a processed basic data block, and to execute a convolution operation on the processed basic data block and the partial vertical data block.
7. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the main processing circuit is specifically configured to divide the vertical data block or the processed vertical data block into a plurality of partial vertical data blocks, and to broadcast the plurality of partial vertical data blocks to the basic processing circuits in multiple broadcasts; alternatively,
the main processing circuit is specifically configured to broadcast the vertical data block or the processed vertical data block to the basic processing circuits in a single broadcast.
8. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the basic processing circuit is specifically configured to execute one inner product processing on the partial vertical data block and the basic data block to obtain an inner product processing result, to accumulate the inner product processing results to obtain a partial operation result, and to send the partial operation result to the main processing circuit; alternatively,
the basic processing circuit is specifically configured to reuse the partial vertical data block n times, to execute the inner product operations of the partial vertical data block with n basic data blocks to obtain n partial processing results, to accumulate the n partial processing results respectively to obtain n partial operation results, and to send the n partial operation results to the main processing circuit, where n is an integer greater than or equal to 2.
9. The integrated circuit chip device according to claim 1, characterized in that
the main processing circuit includes a main register or a main on-chip buffer circuit;
the basic processing circuit includes a basic register or a basic on-chip buffer circuit.
10. The integrated circuit chip device according to claim 1, characterized in that
the input data block is: a matrix, a three-dimensional data block, a four-dimensional data block, an n-dimensional data block, or any combination thereof;
the convolution kernel data block is: a matrix, a three-dimensional data block, a four-dimensional data block, an n-dimensional data block, or any combination thereof.
11. A chip, characterized in that the chip integrates the device according to any one of claims 1-10.
12. An operation method of a neural network, characterized in that the method is applied in an integrated circuit chip device, the integrated circuit chip device comprising the integrated circuit chip device according to any one of claims 1-10, and the integrated circuit chip device is configured to execute a convolution operation of the neural network.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010617208.3A CN111767997B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related products |
CN201810164331.7A CN110197269B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related product |
PCT/CN2019/076088 WO2019165946A1 (en) | 2018-02-27 | 2019-02-25 | Integrated circuit chip device, board card and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810164331.7A CN110197269B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related product |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010617208.3A Division CN111767997B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related products |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110197269A true CN110197269A (en) | 2019-09-03 |
CN110197269B CN110197269B (en) | 2020-12-29 |
Family
ID=67750912
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010617208.3A Active CN111767997B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related products |
CN201810164331.7A Active CN110197269B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related product |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010617208.3A Active CN111767997B (en) | 2018-02-27 | 2018-02-27 | Integrated circuit chip device and related products |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN111767997B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126481A (en) * | 2016-06-29 | 2016-11-16 | Huawei Technologies Co., Ltd. | A computing engine and electronic device |
CN106447034A (en) * | 2016-10-27 | 2017-02-22 | Institute of Computing Technology, Chinese Academy of Sciences | Neural network processor based on data compression, design method and chip |
CN107316078A (en) * | 2016-04-27 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | Apparatus and method for performing artificial neural network self-learning computation |
CN107609641A (en) * | 2017-08-30 | 2018-01-19 | Tsinghua University | Sparse neural network framework and its implementation method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915322B (en) * | 2015-06-09 | 2018-05-01 | National University of Defense Technology | A hardware acceleration method for convolutional neural networks |
CN107329936A (en) * | 2016-04-29 | 2017-11-07 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing neural network operations and matrix/vector operations |
2018
- 2018-02-27 CN CN202010617208.3A patent/CN111767997B/en active Active
- 2018-02-27 CN CN201810164331.7A patent/CN110197269B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316078A (en) * | 2016-04-27 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | Apparatus and method for performing artificial neural network self-learning computation |
CN106126481A (en) * | 2016-06-29 | 2016-11-16 | Huawei Technologies Co., Ltd. | A computing engine and electronic device |
CN106447034A (en) * | 2016-10-27 | 2017-02-22 | Institute of Computing Technology, Chinese Academy of Sciences | Neural network processor based on data compression, design method and chip |
CN107609641A (en) * | 2017-08-30 | 2018-01-19 | Tsinghua University | Sparse neural network framework and its implementation method |
Non-Patent Citations (1)
Title |
---|
YUNJI CHEN ET AL.: "DaDianNao: A Machine-Learning Supercomputer", IEEE * |
Also Published As
Publication number | Publication date |
---|---|
CN110197269B (en) | 2020-12-29 |
CN111767997A (en) | 2020-10-13 |
CN111767997B (en) | 2023-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110197270A (en) | Integrated circuit chip device and Related product | |
CN109993301A (en) | Neural network training device and Related product | |
CN110163334A (en) | Integrated circuit chip device and Related product | |
CN108170640A (en) | Neural network computing device and operation method using same | |
CN110197274A (en) | Integrated circuit chip device and Related product | |
CN109993291A (en) | Integrated circuit chip device and Related product | |
CN111160542A (en) | Integrated circuit chip device and related product | |
CN110197263A (en) | Integrated circuit chip device and Related product | |
CN109993290A (en) | Integrated circuit chip device and Related product | |
CN110197268A (en) | Integrated circuit chip device and Related product | |
CN110197269A (en) | Integrated circuit chip device and Related product | |
CN110197265A (en) | Integrated circuit chip device and Related product | |
CN110197275A (en) | Integrated circuit chip device and Related product | |
CN110197271A (en) | Integrated circuit chip device and Related product | |
CN110197264A (en) | Neural network processor board and Related product | |
CN110197272A (en) | Integrated circuit chip device and Related product | |
WO2019129302A1 (en) | Integrated circuit chip device and related product | |
CN102710307B (en) | Pairing method and device among user terminals | |
CN110197267A (en) | Neural network processor board and Related product | |
CN110197273A (en) | Integrated circuit chip device and Related product | |
CN109993289A (en) | Integrated circuit chip device and Related product | |
CN110197266A (en) | Integrated circuit chip device and Related product | |
CN109978151A (en) | Neural network processor board and Related product | |
CN105553723B (en) | A network-traffic-aware virtual cluster placement method | |
CN105228249B (en) | A subcarrier allocation method, related apparatus and base station |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2020-11-27 | Address after: Room 611-194, R&D Center Building, China (Hefei) International Intelligent Voice Industrial Park, 3333 Xiyou Road, High-tech Zone, Hefei City, Anhui Province | Applicant after: Anhui Cambricon Information Technology Co., Ltd. | Address before: No. 888, West Second Road, Nanhui New Town, Pudong New Area, Shanghai 200120 | Applicant before: Shanghai Cambricon Information Technology Co., Ltd. |
GR01 | Patent grant | ||