CN110197269A - Integrated circuit chip device and Related product - Google Patents

Integrated circuit chip device and related product

Info

Publication number
CN110197269A
Authority
CN
China
Prior art keywords
data block
circuit
data
based process
vertical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810164331.7A
Other languages
Chinese (zh)
Other versions
CN110197269B (en)
Inventor
Inventor name not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN202010617208.3A (patent CN111767997B)
Priority to CN201810164331.7A (patent CN110197269B)
Priority to PCT/CN2019/076088 (WO2019165946A1)
Publication of CN110197269A
Application granted
Publication of CN110197269B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)
  • Logic Circuits (AREA)

Abstract

The present disclosure provides an integrated circuit chip device and a related product. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits. At least one circuit among the main processing circuit and the plurality of basic processing circuits includes a mapping circuit: the main processing circuit includes a first mapping circuit, and at least one of the plurality of basic processing circuits includes a second mapping circuit. The first mapping circuit and the second mapping circuit are both used to perform compression processing on data in neural network operations. The plurality of basic processing circuits are arranged in an array; each basic processing circuit is connected to the other adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column. The technical solution provided by the present disclosure has the advantages of a small amount of computation and low power consumption.

Description

Integrated circuit chip device and related product
Technical field
The present disclosure relates to the field of neural networks, and in particular to an integrated circuit chip device and a related product.
Background
Artificial neural networks (ANNs) have been a research hotspot in the field of artificial intelligence since the 1980s. An ANN abstracts the neural network of the human brain from an information-processing perspective, builds a simple model, and forms different networks according to different connection schemes. In engineering and academia it is often simply called a neural network or neural-like network. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Existing neural network operations are implemented on a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit); such operations involve a large amount of computation and high power consumption.
Summary of the invention
Embodiments of the present disclosure provide an integrated circuit chip device and a related product that can increase the processing speed of a computing device and improve efficiency.
In a first aspect, an integrated circuit chip device is provided. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits. The main processing circuit includes a first mapping circuit, and at least one of the plurality of basic processing circuits (that is, some or all of the basic processing circuits) includes a second mapping circuit. The first mapping circuit and the second mapping circuit are both used to perform compression processing on data in neural network operations;
The plurality of basic processing circuits are arranged in an array. Each basic processing circuit is connected to the other adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column;
The main processing circuit is configured to: obtain an input data block, a convolution kernel data block, and a convolution instruction; according to the convolution instruction, divide the input data block into vertical data blocks and the convolution kernel data block into lateral data blocks; determine, according to the operation control of the convolution instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the lateral data block and/or the vertical data block; and, according to the convolution instruction, send the processed first data block to at least one of the basic processing circuits connected to the main processing circuit;
The plurality of basic processing circuits are configured to: determine, according to the operation control of the convolution instruction, whether to start the second mapping circuit to process a second data block; perform the operations in the neural network in parallel on the processed second data block to obtain an operation result; and transmit the operation result to the main processing circuit through the basic processing circuits connected to the main processing circuit. The second data block is a data block, determined by the basic processing circuit, that is received from the main processing circuit, and it is associated with the processed first data block;
The main processing circuit is configured to process the operation result to obtain the instruction result of the convolution instruction.
In a second aspect, a neural network operation device is provided. The neural network operation device includes one or more of the integrated circuit chip devices provided in the first aspect.
In a third aspect, a combined processing device is provided. The combined processing device includes the neural network operation device provided in the second aspect, a universal interconnection interface, and a general-purpose processing device;
The neural network operation device is connected to the general-purpose processing device through the universal interconnection interface.
In a fourth aspect, a chip is provided, which integrates the device of the first aspect, the device of the second aspect, or the device of the third aspect.
In a fifth aspect, an electronic device is provided, which includes the chip of the fourth aspect.
In a sixth aspect, a neural network operation method is provided. The method is applied in an integrated circuit chip device, the integrated circuit chip device being the integrated circuit chip device of the first aspect, which is used to perform neural network operations.
It can be seen that, in the embodiments of the present disclosure, a mapping circuit is provided to compress data blocks before the operation is performed. This saves transmission resources and computing resources, so the solution has the advantages of low power consumption and a small amount of computation.
Brief description of the drawings
Fig. 1a is a schematic structural diagram of an integrated circuit chip device.
Fig. 1b is a schematic structural diagram of another integrated circuit chip device.
Fig. 1c is a schematic structural diagram of a basic processing circuit.
Fig. 1d is a schematic structural diagram of a main processing circuit.
Fig. 2a is a schematic diagram of a method of using a basic processing circuit.
Fig. 2b is a schematic diagram of data transmission by a main processing circuit.
Fig. 2c is a schematic diagram of a matrix multiplied by a vector.
Fig. 2d is a schematic structural diagram of an integrated circuit chip device.
Fig. 2e is a schematic structural diagram of another integrated circuit chip device.
Fig. 2f is a schematic diagram of a matrix multiplied by a matrix.
Fig. 3a is a schematic diagram of convolution input data.
Fig. 3b is a schematic diagram of a convolution kernel.
Fig. 3c is a schematic diagram of an operation window of a three-dimensional data block of input data.
Fig. 3d is a schematic diagram of another operation window of a three-dimensional data block of input data.
Fig. 3e is a schematic diagram of yet another operation window of a three-dimensional data block of input data.
Fig. 4 is a schematic structural diagram of a neural network chip provided by an embodiment of the present disclosure.
Fig. 5a and Fig. 5b are schematic structural diagrams of two mapping circuits provided by embodiments of the present application.
Detailed description
To help those skilled in the art better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
In the device provided in the first aspect, the main processing circuit is configured to: obtain an input data block, a convolution kernel data block, and a convolution instruction; according to the convolution instruction, divide the input data block into vertical data blocks and the convolution kernel data block into lateral data blocks; determine, according to the operation control of the convolution instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the lateral data block and/or the vertical data block; and, according to the convolution instruction, send the processed first data block to at least one of the basic processing circuits connected to the main processing circuit;
The plurality of basic processing circuits are configured to: determine, according to the operation control of the convolution instruction, whether to start the second mapping circuit to process a second data block; perform the operations in the neural network in parallel on the processed second data block to obtain an operation result; and transmit the operation result to the main processing circuit through the basic processing circuits connected to the main processing circuit. The second data block is a data block, determined by the basic processing circuit, that is received from the main processing circuit, and it is associated with the processed first data block;
The main processing circuit is configured to process the operation result to obtain the instruction result of the convolution instruction.
In the device provided in the first aspect, when the first data block includes both the lateral data block and the vertical data block, the main processing circuit is specifically configured to: start the first mapping circuit to process the lateral data block and the vertical data block, obtaining a processed lateral data block with its associated identification data block and a processed vertical data block with its associated identification data block; split the processed lateral data block and its associated identification data block into a plurality of basic data blocks and the identification data blocks respectively associated with them; distribute the basic data blocks and their respectively associated identification data blocks to the basic processing circuits connected to the main processing circuit; and broadcast the processed vertical data block and its associated identification data block to the basic processing circuits connected to the main processing circuit. The identification data block may be represented using direct indexing or step indexing. Optionally, it may also be represented as a list of lists (LIL), a coordinate list (COO), compressed sparse row (CSR), compressed sparse column (CSC), ELLPACK (ELL), hybrid (HYB), or a similar format; this application does not limit the choice.
Taking direct indexing as an example, the identification data block may be a data block consisting of 0s and 1s. A 0 indicates that the absolute value of the corresponding datum in the data block (for example, a weight or an input neuron) is less than or equal to a first threshold, and a 1 indicates that the absolute value is greater than the first threshold. The first threshold is set arbitrarily by the user side or the device side, for example 0.05 or 0.
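The direct-index rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the circuit's implementation: the function name `direct_index_mask`, the nested-list representation, and the sample values are all assumptions introduced here.

```python
def direct_index_mask(block, threshold):
    """Return a 0/1 mask with 1 where |value| > threshold and 0 otherwise."""
    return [[1 if abs(value) > threshold else 0 for value in row] for row in block]


# Hypothetical 2x2 block; the values are illustrative, not taken from the source.
example_block = [[0.5, 0.01], [-0.2, 0.0]]
example_mask = direct_index_mask(example_block, threshold=0.05)
```

Note that a value exactly equal to the threshold maps to 0, matching the "less than or equal to" wording above.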
To reduce the volume of transmitted data and improve transmission efficiency, when the main processing circuit sends data to the basic processing circuits, it may distribute only the target data in the plurality of basic data blocks, together with the identification data blocks respectively associated with those basic data blocks, to the basic processing circuits connected to it. Optionally, it may also broadcast only the target data in the processed vertical data block, together with the identification data block associated with the vertical data block, to the basic processing circuits connected to it. Here, target data means the data in a data block whose absolute value is greater than the first threshold, or the non-zero data in a data block (specifically, the processed lateral data block or the processed vertical data block).
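A minimal Python sketch of this "send only the target data plus the mask" idea follows. The name `pack_targets` and the row-major ordering of the transmitted values are assumptions made for illustration; the source does not specify the ordering used on the wire.

```python
def pack_targets(block, threshold):
    """Split a block into (target values, 0/1 mask).

    Targets are the values with |value| > threshold, listed in row-major
    order (an assumed ordering); the mask records where they came from.
    """
    mask = [[1 if abs(value) > threshold else 0 for value in row] for row in block]
    targets = [value for row in block for value in row if abs(value) > threshold]
    return targets, mask


# Hypothetical block: only two of the four values exceed the threshold.
targets, mask = pack_targets([[1.0, 0.0], [0.02, 0.5]], threshold=0.05)
```

Only `targets` and `mask` would need to be transmitted in this sketch, rather than every value in the block.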
Correspondingly, the basic processing circuit is specifically configured to: start the second mapping circuit to obtain a connection identifier data block from the identification data block associated with the vertical data block and the identification data block associated with the basic data block; process the vertical data block and the basic data block according to the connection identifier data block, obtaining a processed vertical data block and a processed basic data block; perform a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit;
The main processing circuit is configured to process the operation result to obtain the instruction result.
For example, suppose the lateral data block is a matrix of M1 rows and N1 columns and the basic data block is a matrix of M2 rows and N2 columns, where M1 > M2 and N1 > N2. The identification data block associated with the lateral data block is then likewise a matrix of M1 rows and N1 columns, and the identification data block associated with the basic data block is likewise a matrix of M2 rows and N2 columns. Take a 2*2 basic data block as an example and let it be [matrix missing from the source text]. With a first threshold of 0.05, the identification data block associated with the basic data block is [matrix missing from the source text]. How the first mapping circuit and the second mapping circuit process data blocks is described in detail later.
In the device provided in the first aspect, when the first data block includes the lateral data block, the main processing circuit is specifically configured to: start the first mapping circuit to process the lateral data block, obtaining a processed lateral data block and the identification data block associated with the lateral data block, or start the first mapping circuit to process the lateral data block according to a pre-stored identification data block associated with the lateral data block, obtaining a processed lateral data block; split the processed lateral data block and its associated identification data block into a plurality of basic data blocks and the identification data blocks respectively associated with them; distribute the basic data blocks and their respectively associated identification data blocks to the basic processing circuits connected to it; and broadcast the vertical data block to the basic processing circuits connected to it;
The basic processing circuit is specifically configured to: start the second mapping circuit to process the vertical data block according to the identification data block associated with the basic data block, obtaining a processed vertical data block; perform a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit.
In an optional embodiment, the main processing circuit is further specifically configured to split the vertical data block, or the processed vertical data block together with the identification data block associated with the vertical data block, into a plurality of partial vertical data blocks and the identification data blocks respectively associated with them, and to broadcast the partial vertical data blocks and their respectively associated identification data blocks to the basic processing circuits in one or more broadcasts. The plurality of partial vertical data blocks combine to form the vertical data block or the processed vertical data block.
Correspondingly, the basic processing circuit is specifically configured to: start the second mapping circuit to obtain a connection identifier data block from the identification data block associated with the partial vertical data block and the identification data block associated with the basic data block; process the partial vertical data block and the basic data block according to the connection identifier data block, obtaining a processed partial vertical data block and a processed basic data block; and perform a convolution operation on the processed partial vertical data block and the processed basic data block.
The connection identifier data block is obtained by performing an element-wise AND operation on the identification data block associated with the basic data block and the identification data block associated with the partial vertical data block. Optionally, the connection identifier data block indicates the positions at which the data in both data blocks (specifically, the basic data block and the vertical data block) have absolute values greater than the threshold. Details are given later.
For example, the identification data block associated with the lateral data block is a 2*3 matrix [matrix missing from the source text], and the identification data block associated with the partial vertical data block is a 2*2 matrix [matrix missing from the source text]; the corresponding connection identifier data block obtained is then [matrix missing from the source text].
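The element-wise AND that produces the connection identifier data block can be sketched as follows. This is an illustrative sketch only: the names `connection_mask` and `apply_mask` are hypothetical, both masks are assumed to have the same shape, and the sample masks are made up because the matrices of the example above are not recoverable from the source.

```python
def connection_mask(mask_a, mask_b):
    """Element-wise AND of two equally shaped 0/1 masks."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def apply_mask(block, mask):
    """Keep the values where the mask is 1 and zero out the rest."""
    return [[value if bit else 0 for value, bit in zip(row, mask_row)]
            for row, mask_row in zip(block, mask)]


# Hypothetical masks for a basic data block and a partial vertical data block.
basic_mask = [[1, 0], [1, 1]]
vertical_mask = [[1, 1], [0, 1]]
joint = connection_mask(basic_mask, vertical_mask)
kept = apply_mask([[0.5, 0.3], [0.2, 0.9]], joint)
```

Only positions set in both masks survive, so a multiply-accumulate over the two processed blocks can skip every position where either operand was pruned.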
In the device provided in the first aspect, when the first data block includes the vertical data block, the main processing circuit is specifically configured to: start the first mapping circuit to process the vertical data block, obtaining a processed vertical data block and the identification data block associated with the vertical data block, or start the first mapping circuit to process the vertical data block according to a pre-stored identification data block associated with the vertical data block, obtaining a processed vertical data block; split the lateral data block into a plurality of basic data blocks; distribute the plurality of basic data blocks to the basic processing circuits connected to it; and broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it;
The basic processing circuit is specifically configured to: start the second mapping circuit to process the basic data block according to the identification data block associated with the vertical data block, obtaining a processed basic data block; perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit.
In an optional embodiment, the main processing circuit is further specifically configured to split the processed vertical data block and the identification data block associated with the vertical data block into a plurality of partial vertical data blocks and the identification data blocks respectively associated with them, and to broadcast the partial vertical data blocks and their respectively associated identification data blocks to the basic processing circuits in one or more broadcasts. The plurality of partial vertical data blocks combine to form the vertical data block or the processed vertical data block.
Correspondingly, the basic processing circuit is specifically configured to process the basic data block according to the identification data block associated with the partial vertical data block, obtaining a processed basic data block, and to perform an inner product operation on the processed basic data block and the partial vertical data block.
In the device provided in the first aspect, the main processing circuit is specifically configured to send the vertical data block (either the vertical data block or the processed vertical data block) to the basic processing circuits connected to it in a single broadcast.
In the device provided in the first aspect, the basic processing circuit is specifically configured to perform inner product processing on the basic data block (likewise, either the basic data block or the processed basic data block) and the vertical data block to obtain inner product processing results, to accumulate the inner product processing results to obtain an operation result, and to send the operation result to the main processing circuit.
In the device provided in the first aspect, the basic processing circuit is specifically configured to perform a product operation on the basic data block and the vertical data block to obtain product results, to accumulate the product results to obtain an operation result, and to send the operation result to the main processing circuit;
The main processing circuit is configured to accumulate the operation results to obtain an accumulation result, and to arrange the accumulation result to obtain the instruction result.
In the device provided in the first aspect, the main processing circuit is specifically configured to divide the vertical data block into a plurality of partial vertical data blocks and to send the partial vertical data blocks to the basic processing circuits over multiple broadcasts. The plurality of partial vertical data blocks combine to form the vertical data block.
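The split-then-recombine property can be illustrated with a short Python sketch. Splitting along rows, and the names `split_rows` and `combine`, are assumptions made here; the source does not say along which dimension the vertical data block is divided.

```python
def split_rows(block, rows_per_part):
    """Split a block into consecutive row groups (assumed split dimension)."""
    return [block[i:i + rows_per_part] for i in range(0, len(block), rows_per_part)]


def combine(parts):
    """Concatenate the partial blocks back into one block, in order."""
    return [row for part in parts for row in part]


# Hypothetical 3x2 vertical data block, sent as two broadcasts.
vertical_block = [[1, 2], [3, 4], [5, 6]]
parts = split_rows(vertical_block, rows_per_part=2)
```

Each element of `parts` would correspond to one broadcast, and `combine(parts)` reproduces the original block.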
In the device provided in the first aspect, the basic processing circuit is specifically configured to perform inner product processing on the partial vertical data block (either the partial vertical data block or the processed partial vertical data block) and the basic data block to obtain inner product processing results, to accumulate the inner product processing results to obtain a partial operation result, and to send the partial operation result to the main processing circuit. Here, take a 3*3 kernel as the basic data block and a 3*3 matrix as the partial vertical data block. Multiplying the 3*3 matrix and the 3*3 kernel at corresponding positions yields three inner product processing results, which are accumulated to give the partial operation result. The three inner product processing results, Out0 (the inner product of row 0 of the 3*3 matrix and row 0 of the 3*3 kernel), Out1 (the inner product of row 1 of the 3*3 matrix and row 1 of the 3*3 kernel), and Out2 (the inner product of row 2 of the 3*3 matrix and row 2 of the 3*3 kernel), may be:
Out0 = r00*k0[0] + r01*k0[1] + r02*k0[2]
Out1 = r10*k1[0] + r11*k1[1] + r12*k1[2]
Out2 = r20*k2[0] + r21*k2[1] + r22*k2[2]
Here, r in r00 denotes the partial vertical data block, and 00 denotes the element in row 0, column 0.
In k0[0], k denotes the basic data block, and 0[0] denotes the element in row 0, column 0;
Partial operation result = Out0 + Out1 + Out2.
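The three row-wise inner products and their sum can be checked with a small Python sketch. The function names and the sample matrices are assumptions for illustration; `r` and `k` follow the naming of the formulas above.

```python
def row_inner_products(r, k):
    """Return [Out0, Out1, Out2]: the inner product of each row of r with the same row of k."""
    return [sum(a * b for a, b in zip(r_row, k_row)) for r_row, k_row in zip(r, k)]


def partial_result(r, k):
    """Accumulate the row-wise inner products into one partial operation result."""
    return sum(row_inner_products(r, k))


# Hypothetical 3x3 partial vertical data block and 3x3 kernel.
r = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
outs = row_inner_products(r, k)
total = partial_result(r, k)
```

With the identity-like kernel chosen here, each row contributes only its diagonal element, which makes the result easy to verify by hand.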
In the device provided in the first aspect, the basic processing circuit is specifically configured to reuse the partial vertical data block n times, performing inner product operations between the partial vertical data block and n basic data blocks to obtain n partial processing results, to accumulate the n partial processing results separately to obtain n partial operation results, and to send the n partial operation results to the main processing circuit, where n is an integer greater than or equal to 2.
Here, take p 3*3 kernels as the basic data blocks and a 3*3 matrix as the partial vertical data block. The 3*3 matrix is reused p times, each time being multiplied at corresponding positions with one of the 3*3 kernels. Each such operation yields three inner product results, which form one group of inner product results; the three inner product results in each of the p groups are accumulated to obtain p partial operation results.
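Reusing one partial vertical data block against p kernels can be sketched as below. The names `partial_result` and `reuse_partial_block` and the sample data are hypothetical; the sketch only shows that the same matrix is read p times and one partial operation result is produced per kernel.

```python
def partial_result(matrix, kernel):
    """Sum of position-wise products, i.e. the three row inner products added up."""
    return sum(a * b
               for m_row, k_row in zip(matrix, kernel)
               for a, b in zip(m_row, k_row))


def reuse_partial_block(matrix, kernels):
    """Apply the same matrix to every kernel, yielding one partial result per kernel."""
    return [partial_result(matrix, kernel) for kernel in kernels]


# Hypothetical data: one 3x3 partial vertical data block, p = 2 kernels.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
results = reuse_partial_block(matrix, kernels)
```

In this sketch the matrix is held once and applied to each kernel in turn, which mirrors the idea of receiving the block once and reusing it locally.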
In the device provided in the first aspect, the main processing circuit includes a main register or a main on-chip cache circuit;
The basic processing circuit includes a basic register or a basic on-chip cache circuit.
In the device provided in the first aspect, the main processing circuit includes one or any combination of a vector operator circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit, and a data rearrangement circuit.
In the device provided in the first aspect, the input data block and the convolution kernel data block may be represented as tensors, specifically one or any combination of a vector, a matrix, a three-dimensional data block, a four-dimensional data block, and an n-dimensional data block.
A refering to fig. 1, Fig. 1 a are a kind of integrated circuit chip device that present disclosure provides, the integrated circuit chip device packet Include: main process task circuit and multiple based process circuits, the multiple based process circuit are arranged in array (m*n array), wherein M, the value range of n is that at least one value is more than or equal to 2 in integer and m, n more than or equal to 1.For m*n array distribution Multiple based process circuits, each based process circuit and adjacent based process circuit connection, the main process task circuit connection K based process circuit of multiple based process circuits, the k based process circuit can be with are as follows: at n basis of the 1st row Manage m based process circuit of circuit, n based process circuit of m row and the 1st column.Integrated circuit as shown in Figure 1a Chip apparatus, main process task circuit include the first mapping circuit, and first mapping circuit is used to carry out compression processing to data, with Obtain treated data and mark data.Whether the absolute value that the mark data is used to indicate the data is greater than the first threshold Value.Further, the main process task circuit can only by treated data, (concretely absolute value be greater than the number of first threshold According to) and the mark data of the data correlation be sent to based process circuit.Advantage is: reduction is sent in based process circuit The data volume of data processing is carried out, data processing rate is promoted.The first threshold is user side or the customized setting of device side, Such as 0.05,0.5 etc., without limitation.
For example, the input data of main process task circuit is matrix data blockBy the first mapping It can get that treated that matrix data block is after processing of circuitThe associated mark data of matrix data block Block isSpecific processing about the first mapping circuit will be described in detail later.
Correspondingly, when the main processing circuit distributes data to the basic processing circuits, it may send only the two values 1 and 0.5 rather than all 8 values of the processed matrix data block. It must also send the identification data block associated with the matrix data block, so that the basic processing circuit can determine, from the received identification data block and the two received values (1 and 0.5), where those values are located in the original matrix data block. That is, the basic processing circuit can restore the matrix data block processed in the main processing circuit from the received identification data block and the received data.
At least one of the multiple basic processing circuits (that is, some or all of them) may include a second mapping circuit. Specifically, some of the basic processing circuits may include the second mapping circuit; for example, in an optional scheme, the k basic processing circuits may be configured with second mapping circuits, so that the n basic processing circuits can each be responsible for compressing the data of the m basic processing circuits of their own column. This arrangement can improve operation efficiency and reduce power consumption: because the n basic processing circuits of the 1st row are the first to receive data sent by the main processing circuit, compressing the received data there reduces the computation of subsequent basic processing circuits and the amount of data transmitted to them. Similarly, configuring the m basic processing circuits of the first column with second mapping circuits also offers low computation and low power consumption. In addition, with this structure the main processing circuit can use a dynamic data sending strategy; for example, the main processing circuit broadcasts data to the m basic processing circuits of the 1st column and sends distribution data to the n basic processing circuits of the 1st row. The specific processing of the second mapping circuit is described in detail later.
The main processing circuit is used to execute each consecutive operation in the neural network operation and to transmit data to and from the basic processing circuits connected to it. The consecutive operations include but are not limited to: accumulation operations, ALU operations, activation operations, and the like.
The multiple basic processing circuits are used to execute operations in the neural network in parallel according to the transmitted data, and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The operations executed in parallel include but are not limited to: inner product operations, matrix or vector multiplication operations, and the like.
The main processing circuit may include a data sending circuit and a data receiving circuit or interface. The data sending circuit may integrate a lateral data distribution circuit and a vertical data distribution circuit, although in practical applications the lateral data distribution circuit and the vertical data distribution circuit may also be provided separately. Lateral data is data that needs to be sent to each basic processing circuit along the row direction (laterally), for example by sending the lateral data to the basic processing circuits in any one or more of the m rows in Fig. 1a. Vertical data is data that needs to be selectively sent to some of the basic processing circuits along the column direction (vertically). Taking a convolution operation as an example, the convolution input data needs to be sent to all basic processing circuits, so it is vertical data, while the convolution kernel needs to be selectively sent to some of the basic data blocks, so the convolution kernel is lateral data. Which basic processing circuits the lateral data is sent to can be determined by the main processing circuit according to load and other allocation methods. Vertical data or lateral data can be sent to each basic processing circuit by broadcast. (In practical applications, the lateral/vertical data may be sent to each basic processing circuit in a single broadcast or in multiple broadcasts; the specific embodiments of the present disclosure do not limit the number of broadcasts.) Optionally, the main processing circuit can also selectively send the lateral/vertical data to some of the basic processing circuits.
The main processing circuit (as shown in Fig. 1d) may include a register and/or an on-chip cache circuit, and may also include circuits such as a control circuit, a vector operator circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, and a DMA (Direct Memory Access) circuit. In practical applications, other circuits may of course also be added to the main processing circuit, such as a conversion circuit (for example, a matrix transposition circuit), a data rearrangement circuit, or an activation circuit.
Each basic processing circuit may include a basic register and/or a basic on-chip cache circuit, and may also include one or any combination of an inner product operator circuit, a vector operator circuit, an accumulator circuit, and the like. The inner product operator circuit, vector operator circuit, and accumulator circuit may be integrated into one circuit, or may each be a separately provided circuit.
Optionally, the accumulator circuits of the n basic processing circuits of the m-th row can perform the accumulation step of the inner product operation, because each basic processing circuit of the m-th row can receive the product results of all basic processing circuits in its column. Having the n basic processing circuits of the m-th row perform the accumulation allocates computing resources effectively and saves power. This scheme is especially suitable when m is large.
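As a rough illustration of this accumulation step, the Python sketch below sums, for each column, the product results produced by the circuits in that column, which is the work assigned here to the circuits of the m-th row. The function name, the nested-list representation, and the sample values are assumptions made for illustration and do not come from the disclosure.

```python
def accumulate_in_last_row(products):
    """Model the accumulators of the last (m-th) row.

    products[i][j] is the product result of the circuit in row i, column j.
    Returns one accumulated value per column. Hypothetical software model only.
    """
    n = len(products[0])
    totals = [0] * n
    for row in products:
        for j, value in enumerate(row):
            # Each column's results flow down to that column's last-row circuit.
            totals[j] += value
    return totals


# Illustrative 3x2 array of product results (made-up values).
column_sums = accumulate_in_last_row([[1, 2], [3, 4], [5, 6]])
```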
The main processing circuit can assign which circuits perform the data compression, either explicitly or implicitly. In the explicit mode, the main processing circuit can configure a special indication or instruction: when a basic processing circuit receives the special indication or instruction, it determines to perform data compression; when it does not receive one, it determines not to perform data compression. The assignment can also be implicit: for example, when a basic processing circuit receives sparse data (that is, data containing zeros, or containing more than a preset number of values smaller than a preset threshold) and determines that an inner product operation is to be performed, it compresses the sparse data. In the explicit mode, the special instruction or indication can carry a decrementing sequence whose value is reduced by 1 each time it passes through a basic processing circuit. Each basic processing circuit reads the value: if it is greater than zero, the circuit performs data compression; if it is equal to or less than zero, the circuit does not. This setting is configured according to the array arrangement of the basic processing circuits. For example, for the m basic processing circuits of the i-th column, if the main processing circuit needs the first 5 basic processing circuits to perform data compression, it issues a special instruction containing a decrementing sequence with an initial value of 5. The value is reduced by 1 each time it passes through a basic processing circuit, so that at the 5th basic processing circuit the value is 1 and at the 6th basic processing circuit it is 0, and the 6th basic processing circuit does not perform data compression. In this way the main processing circuit can dynamically configure which circuits perform data compression and how many times it is performed.
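The decrementing sequence described above can be sketched in a few lines of Python. This is only a software model under stated assumptions: the function name and the list-of-booleans result are hypothetical, and the value is assumed to be read by a circuit before it is reduced for the next one, which matches the example of value 1 at the 5th circuit and 0 at the 6th.

```python
def compression_decisions(num_circuits, initial_value):
    """Return, for each circuit along a column, whether it compresses data.

    Each circuit reads the current value (compress if it is greater than zero),
    then the value is reduced by 1 before reaching the next circuit.
    Hypothetical model of the explicit configuration mode.
    """
    decisions = []
    value = initial_value
    for _ in range(num_circuits):
        decisions.append(value > 0)
        value -= 1
    return decisions


# Column of 8 circuits, initial value 5: the first 5 circuits compress.
decisions = compression_decisions(num_circuits=8, initial_value=5)
```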
One embodiment of the present disclosure provides an integrated circuit chip device including a main processing circuit (also called a master unit) and multiple basic processing circuits (also called base units). The structure of the embodiment is shown in Fig. 1b, where the dashed box encloses the internal structure of the neural network operation device, the gray-filled arrows indicate the data transmission paths between the main processing circuit and the basic processing circuit array, and the hollow arrows indicate the data transmission paths between basic processing circuits (adjacent basic processing circuits) within the array. The length and width of the basic processing circuit array may differ, that is, the values of m and n may be different or, of course, the same; the present disclosure does not limit these values.
The circuit structure of a basic processing circuit is shown in Fig. 1c. In the figure, the dashed box indicates the boundary of the basic processing circuit, and the thick arrows crossing the dashed box indicate data input/output channels (an arrow pointing into the dashed box is an input channel; an arrow pointing out of it is an output channel). The rectangular boxes inside the dashed box represent storage unit circuits (registers and/or on-chip caches), including input data 1, input data 2, the multiplication or inner product result, and the accumulated data. The diamond boxes represent operator circuits, including a multiplier or inner product operator and an adder.
In this embodiment, the neural network operation device includes one main processing circuit and 16 basic processing circuits (16 is merely an example; other numbers may be used in practical applications);
In this embodiment, each basic processing circuit has two data input interfaces and two data output interfaces. In the remainder of this example, the lateral input interface (the lateral arrow pointing to the unit in Fig. 1b) is called input 0 and the vertical input interface (the vertical arrow pointing to the unit in Fig. 1b) is called input 1; the lateral data output interface (the lateral arrow pointing out of the unit in Fig. 1b) is called output 0 and the vertical data output interface (the vertical arrow pointing out of the unit in Fig. 1b) is called output 1.
The data input interfaces and data output interfaces of each basic processing circuit can be connected to different units, including the main processing circuit and other basic processing circuits;
In this example, input 0 of the four basic processing circuits 0, 4, 8, and 12 (for the numbering see Fig. 1b) is connected to a data output interface of the main processing circuit;
In this example, input 1 of the four basic processing circuits 0, 1, 2, and 3 is connected to a data output interface of the main processing circuit;
In this example, output 1 of the four basic processing circuits 12, 13, 14, and 15 is connected to a data input interface of the main processing circuit;
In this example, the connections between the output interfaces of basic processing circuits and the input interfaces of other basic processing circuits are shown in Fig. 1b and are not listed one by one;
Specifically, connecting output interface S1 of unit S to input interface P1 of unit P means that unit P can receive, through its P1 interface, the data that unit S sends to its own S1 interface.
This embodiment includes one main processing circuit, which is connected to an external device (that is, it has both input interfaces and output interfaces). Some of the data output interfaces of the main processing circuit are connected to the data input interfaces of some of the basic processing circuits, and some of the data input interfaces of the main processing circuit are connected to the data output interfaces of some of the basic processing circuits.
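The connections listed for the 16-circuit example can be enumerated with a short Python sketch. It assumes the circuits of Fig. 1b are numbered in row-major order, which is inferred from the circuit numbers given above rather than stated in the text; the function and key names are hypothetical.

```python
def main_circuit_connections(rows, cols):
    """List which basic circuits connect directly to the main circuit.

    Assumes row-major numbering: the circuit in row r, column c has
    number r * cols + c (an inference from the example, not a stated rule).
    """
    def unit(r, c):
        return r * cols + c

    return {
        # Lateral inputs of the first column are fed by the main circuit.
        "input0_from_main": [unit(r, 0) for r in range(rows)],
        # Vertical inputs of the first row are fed by the main circuit.
        "input1_from_main": [unit(0, c) for c in range(cols)],
        # Vertical outputs of the last row return results to the main circuit.
        "output1_to_main": [unit(rows - 1, c) for c in range(cols)],
    }


links = main_circuit_connections(4, 4)
```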
Method of using the integrated circuit chip device
The data involved in the methods of use provided by the present disclosure may be compressed data. It should be noted that the data in the present application may be input neurons or weights of a neural network, specifically matrix data, vector data, or the like, which the present application does not limit. That is, the data or data blocks described below may be input neurons or weights of a neural network, embodied as matrices, vectors, or the like.
The data compression involved in the present application is performed in the first mapping circuit and the second mapping circuit described above. It should be understood that, because a neural network is an algorithm with heavy computation and heavy memory access, the more weights there are, the greater the computation and memory access become. In particular, when weights are small (for example 0, or below a set value), these small values need to be compressed in order to increase the computation rate and reduce overhead. In practical applications, data compression is most effective when applied to sparse neural networks, for example by reducing the computational workload, reducing data overhead, and increasing the data computation rate.
Taking input data as an example, specific embodiments of the data compression are described below. The input data includes but is not limited to at least one input neuron and/or at least one weight.
In the first embodiment:
After the first mapping circuit receives the first input data (specifically, a data block to be computed that is sent by the main processing circuit, such as a lateral data block or a vertical data block), it can process the first input data to obtain the processed first input data and the identification (mask) data associated with the first input data. The mask data indicates whether the absolute value of the first input data is greater than a first threshold, such as 0.5 or 0.
Specifically, when the absolute value of the first input data is greater than the first threshold, the input data is retained; otherwise the first input data is deleted or set to 0. For example, the input matrix data block is (matrix not reproduced in this text) and the first threshold is 0.05. After processing by the first mapping circuit, the processed matrix data block is (matrix not reproduced), and the identification data block (also called a mask matrix) associated with the matrix data block is (matrix not reproduced).
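A minimal Python sketch of this thresholding step follows. It is a software model only, not the circuit itself; the function name is hypothetical, and the sample block is invented for illustration because the matrix of the original example is not available in this text.

```python
def first_mapping(block, threshold):
    """Threshold a matrix block and build its mask.

    Values whose absolute value is greater than the threshold are kept,
    all others are set to 0. The mask holds 1 for kept positions, 0 otherwise.
    """
    processed = [[v if abs(v) > threshold else 0 for v in row] for row in block]
    mask = [[1 if abs(v) > threshold else 0 for v in row] for row in block]
    return processed, mask


# Illustrative 2x4 block (made-up values), first threshold 0.05.
sample = [[1, 0, 0.06, 0.01],
          [0.03, 0.5, 0, 0.02]]
processed, mask = first_mapping(sample, threshold=0.05)
```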
Further, to reduce the volume of transmitted data, when the main processing circuit distributes data to the basic processing circuits connected to it, it can send the target data in the processed matrix data block (in this example 1, 0.06, and 0.5) together with the identification data block associated with the matrix data block. In a specific implementation, the main processing circuit can distribute the target data in the processed matrix data block to the basic processing circuits according to a set rule, for example sending it in row order or in column order, which the present application does not limit. Correspondingly, after a basic processing circuit receives the target data and the associated identification data block, it restores the processed matrix data block according to the set rule (for example, row order). In this example, the basic processing circuit can determine from the received data (1, 0.06, and 0.5) and the identification data block (matrix not reproduced in this text) that the matrix data block corresponding to the data (that is, the matrix data block processed by the first mapping circuit in the main processing circuit) is (matrix not reproduced).
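The send-and-restore step can be sketched as follows, assuming row order as the set rule. Both function names are hypothetical, and the mask used here is an invented example that happens to carry the three target values 1, 0.06, and 0.5; it is not the mask from the original figure.

```python
def pack_targets(block, mask):
    """Collect, in row order, the values at positions where the mask is 1."""
    return [v for row, mrow in zip(block, mask)
            for v, m in zip(row, mrow) if m]


def restore_block(targets, mask):
    """Rebuild the processed block from row-ordered targets and the mask."""
    it = iter(targets)
    # Masked-out positions are filled with 0; kept positions consume a target.
    return [[next(it) if m else 0 for m in mrow] for mrow in mask]


# Illustrative mask (made-up) and the three target values from the text.
mask = [[1, 0, 1, 0],
        [0, 1, 0, 0]]
restored = restore_block([1, 0.06, 0.5], mask)
repacked = pack_targets(restored, mask)
```

Only the targets and the mask cross the link between the circuits in this model, which is the data reduction the paragraph describes.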
In the embodiments of the present invention, the first input data may be a lateral data block and/or a vertical data block.
Correspondingly, the second mapping circuit processes the second input data using the identification data associated with the first input data, to obtain the processed second input data, where the first input data is different from the second input data. For example, when the first input data is at least one weight, the second input data may be at least one input neuron; or, when the first input data is at least one input neuron, the second input data may be at least one weight.
In the embodiments of the present invention, the second input data is different from the first input data and may be any of the following: a lateral data block, a basic data block, a vertical data block, or a partial vertical data block.
For example, when the first input data is a lateral data block, the second input data is a partial vertical data block. Suppose the second input data is the matrix data block (matrix not reproduced in this text). After it is processed with the mask matrix from the example above (matrix not reproduced), the processed partial vertical data block is (matrix not reproduced). In practical applications the matrix data blocks involved in the input data have larger dimensions; the example here is only illustrative and is not limiting.
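As a hedged illustration of how the second mapping circuit might apply the first input's mask to the second input, the Python sketch below zeroes every position of the second block where the mask is 0. The function name and all sample values are assumptions, since the original matrices are not available in this text.

```python
def apply_mask(block, mask):
    """Keep values of `block` where `mask` is 1 and set the rest to 0."""
    return [[v if m else 0 for v, m in zip(row, mrow)]
            for row, mrow in zip(block, mask)]


# Made-up second input block and a made-up mask derived from the first input.
second_input = [[2, 3, 4, 5],
                [6, 7, 8, 9]]
first_mask = [[1, 0, 1, 0],
              [0, 1, 0, 0]]
processed_second = apply_mask(second_input, first_mask)
```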
In the second embodiment:
The first mapping circuit can be used to process the first input data and the second input data, to obtain the processed first input data and the first identification (mask) data associated with the first input data, and the processed second input data and the second identification (mask) data associated with the second input data. The first mask data or the second mask data indicates whether the absolute value of the first or second input data is greater than a second threshold, which is set by the user side or the device side, for example 0.05 or 0.
The processed first input data or second input data may be the input data after processing, or may be the input data before processing. For example, the first input data is a lateral data block, such as the matrix data block in the example above (matrix not reproduced in this text). After processing by the first mapping circuit, a processed lateral data block is obtained; here the processed lateral data block may be the original matrix data block (matrix not reproduced) or the compressed matrix data block (matrix not reproduced). It should be understood that, to reduce the volume of transmitted data and improve data processing efficiency in the basic processing circuits, the processed input data (such as a processed basic data block or partial vertical data block) is preferably the compressed data. Preferably, the data that the main processing circuit sends to the basic processing circuits is the target data in the processed input data; the target data may be the data whose absolute value is greater than a preset threshold, or may be the non-zero data, and so on.
Correspondingly, in the basic processing circuit, the second mapping circuit can obtain connection identification data from the first identification data associated with the first input data and the second identification data associated with the second input data. The connection identification data indicates the data in the first input data and the second input data whose absolute values are both greater than a third threshold, where the third threshold is set by the user side or the device side, for example 0.05 or 0. Further, the second mapping circuit can process the received first input data and second input data respectively according to the connection identification data, to obtain the processed first input data and the processed second input data.
For example, the first input data is the matrix data block (matrix not reproduced in this text), and the second input data block is likewise a matrix data block (matrix not reproduced). After processing by the first mapping circuit, the first identification data block associated with the first input data (matrix not reproduced) and the processed first input data block (matrix not reproduced) are obtained; correspondingly, the second identification data block associated with the second input data (matrix not reproduced) and the processed second input data block (matrix not reproduced) are obtained. To improve the data transmission rate, the main processing circuit can send to the basic processing circuit only the target data 1, 0.06, and 0.5 in the processed first input data block together with the first identification data block associated with the first input data block; likewise, it sends the target data 1, 1.1, 0.6, 0.3, and 0.5 in the processed second input data block together with the second identification data block associated with the second input data block.
Correspondingly, after receiving the above data, the basic processing circuit can use the second mapping circuit to perform an element-wise AND operation on the first identification data block and the second identification data block, obtaining the connection identification data block (matrix not reproduced in this text). The second mapping circuit then uses the connection identification data block to process the processed first input data block and the processed second input data block respectively, obtaining a processed first input data block (matrix not reproduced) and a processed second input data block (matrix not reproduced). In the basic processing circuit, the first data block corresponding to the target data (that is, the first data block as processed by the first mapping circuit) can be determined from the first identification data block and the received target data of the first data block; correspondingly, the second data block corresponding to the target data (that is, the second data block as processed by the first mapping circuit) can be determined from the second identification data block and the received target data of the second data block. Then, once the second mapping circuit has obtained the connection identification data block, it performs an element-wise AND operation between the connection identification data block and each of the determined first data block and the determined second data block, obtaining the first data block and second data block as processed by the second mapping circuit.
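The element-wise AND of the two masks and its use on both blocks can be sketched in Python as follows. The function names are hypothetical and the 2x3 blocks are invented; they reuse the target values quoted in the text but do not claim to reproduce the original matrices or their layout.

```python
def and_masks(mask_a, mask_b):
    """Element-wise AND of two 0/1 masks, giving the connection mask."""
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


def keep_where(block, mask):
    """Keep values of `block` where `mask` is 1 and set the rest to 0."""
    return [[v if m else 0 for v, m in zip(rv, rm)]
            for rv, rm in zip(block, mask)]


# Made-up blocks as they might look after the first mapping circuit.
first_block = [[1, 0, 0.06], [0, 0.5, 0]]
first_mask = [[1, 0, 1], [0, 1, 0]]
second_block = [[1, 1.1, 0], [0.6, 0.3, 0.5]]
second_mask = [[1, 1, 0], [1, 1, 1]]

joint_mask = and_masks(first_mask, second_mask)
first_out = keep_where(first_block, joint_mask)
second_out = keep_where(second_block, joint_mask)
```

Only positions that survive in both inputs remain, which is what makes a later inner product cheaper: any position dropped from one operand contributes nothing to the product anyway.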
In the third embodiment:
The first mapping circuit is not provided in the main processing circuit; instead, the main processing circuit can send the third input data, together with the prestored third identification data associated with the third input data, to the basic processing circuits connected to it. The second mapping circuit is provided in the basic processing circuits. A specific embodiment of the data compression performed by the second mapping circuit is described below.
It should be understood that the third input data includes but is not limited to a basic data block, a partial vertical data block, a vertical data block, and the like. Similarly, in a neural network processor, the third input data may also be at least one weight and/or at least one input neuron, which the present application does not limit.
The second mapping circuit can process the received third input data according to the third identification data associated with it, to obtain the processed third input data, so that related operations, such as an inner product operation, can subsequently be performed on the processed third input data.
For example, the third input data received by the second mapping circuit is the matrix data block (matrix not reproduced in this text). Correspondingly, the prestored third identification data block associated with the third input data (also called a mask matrix data block) is (matrix not reproduced). The second mapping circuit then processes the third input data block according to the third identification data block, and the processed third input data block is (matrix not reproduced).
In addition, the input neurons and output neurons mentioned in the embodiments of the present invention do not refer to the neurons in the input layer and output layer of the entire neural network. Rather, for any two adjacent layers in the neural network, the neurons in the lower layer of the feed-forward operation are the input neurons and the neurons in the upper layer of the feed-forward operation are the output neurons. Taking a convolutional neural network as an example, suppose the network has L layers and K = 1, 2, 3, ..., L-1. For layer K and layer K+1, layer K is called the input layer and its neurons are the input neurons, while layer K+1 is called the output layer and its neurons are the output neurons. That is, except for the top layer, every layer can serve as an input layer, with the next layer being the corresponding output layer.
In the fourth embodiment:
No mapping circuit is provided in the main processing circuit; the first mapping circuit and the second mapping circuit are provided in the basic processing circuits. For the data processing of the first mapping circuit and the second mapping circuit, refer to the first to third embodiments described above; details are not repeated here.
Optionally, there is also a fifth embodiment. In the fifth embodiment, no mapping circuit is provided in the basic processing circuits; both the first mapping circuit and the second mapping circuit are provided in the main processing circuit. For the data processing of the first mapping circuit and the second mapping circuit, refer to the first to third embodiments described above; details are not repeated here. That is, data compression is completed in the main processing circuit, and the processed input data is sent to the basic processing circuits, so that the basic processing circuits perform the corresponding operations using the processed input data (specifically, the processed neurons and the processed weights).
The specific structure of the mapping circuits involved in the present application is described below. Figs. 5a and 5b show two possible mapping circuits. The mapping circuit shown in Fig. 5a includes a comparator and a selector; the present application does not limit the number of comparators and selectors. Fig. 5a shows one comparator and two selectors, where the comparator is used to determine whether the input data meets a preset condition. The preset condition may be set by the user side or the device side; in the present application, for example, the condition is that the absolute value of the input data is greater than or equal to a preset threshold. If the preset condition is met, the comparator determines that the input data is allowed to be output, and the identification data associated with the input data is 1; otherwise it determines that the input data is not output, or that the input data defaults to 0, and the identification data associated with the input data is 0. That is, after the comparator, the identification data associated with the input data is known.
Further, after the comparator has checked the input data against the preset condition, the identification data obtained can be input to a selector, so that the selector uses the identification data to decide whether to output the corresponding input data, thereby obtaining the processed input data.
As shown in Fig. 5a, taking a matrix data block as the input data, the comparator checks each value in the matrix data block against the preset condition, thereby obtaining the identification data block (mask matrix) associated with the matrix data block. The first selector then uses the identification data block to filter the matrix data block: values whose absolute value is greater than or equal to the preset threshold (that is, values that meet the preset condition) are retained and the remaining values are deleted, and the processed matrix data block is output. Optionally, the second selector can also use the identification data block to process other input data (for example, a second matrix data block), such as by an element-wise AND operation, retaining the values of the second matrix data block whose absolute value is greater than or equal to the preset threshold and outputting the processed second matrix data block.
It should be understood that, for the first and second embodiments above, the first mapping circuit may specifically include at least one comparator and at least one selector, such as the comparator and the first selector of Fig. 5a in the example above; the second mapping circuit may specifically include one or more selectors, such as the second selector of Fig. 5a in the example above.
Fig. 5b shows a schematic structural diagram of another mapping circuit. As shown in Fig. 5b, the mapping circuit includes selectors; the number of selectors is not limited and may be one or more. Specifically, the selector selects from the input data according to the identification data associated with the input data: it outputs the values whose absolute value is greater than or equal to the preset threshold and deletes or does not output the remaining values, thereby obtaining the processed input data.
Taking a matrix data block as the input data, the matrix data block and the identification data block associated with it are input to the mapping circuit. The selector selects from the matrix data block according to the identification data block, outputting the values whose absolute value is greater than or equal to 0 and not outputting the remaining values, thereby outputting the processed matrix data block.
It should be understood that the structure shown in Fig. 5b can be applied to the second mapping circuit of the third embodiment above; that is, the second mapping circuit of the third embodiment may specifically include at least one selector. Similarly, the first mapping circuit and the second mapping circuit designed into the main processing circuit and the basic processing circuits can be formed by combining or splitting the functional components shown in Figs. 5a and 5b, which the present application does not limit.
Based on the foregoing embodiments, the operations to be completed in the main processing circuit and the basic processing circuits can be carried out using the following methods:
The main processing circuit first enables the first mapping circuit to process the first input data, obtaining the processed first input data and the first identification data associated with the first input data; it then transmits the processed first input data and the associated first identification data to the basic processing circuits for operation. For example, the main processing circuit can process the data to be computed (such as a lateral data block or vertical data block) before transmitting it to the basic processing circuits. The advantage is that this reduces the bit width of the transmitted data and the total number of bits transmitted, and the basic processing circuits operate on narrower data more efficiently and with lower power consumption.
The basic processing circuit enables the second mapping circuit to process the received second input data using the first identification data, obtaining the processed second input data, and then performs the related operations on the processed first input data and second input data. For example, when the basic processing circuit receives second input data transmitted by the main processing circuit (such as sparse data or a vertical data block), it first compresses the data and then operates on it, which improves operation efficiency and reduces power consumption.
Optionally, main process task circuit can be first associated by the first input data (such as basic data block), the first input data First identifier data, the second input data (the vertical data block in such as part) and the associated second identifier number of the second input data According to being first transferred to based process circuit computing.
Correspondingly, after based process circuit receives data, can first enable the second mapping circuit according to first identifier data and Second identifier data obtain connection identifier data block, then defeated to the first input data and second using the connection identifier data Enter data to be handled, can also further be completed in based process circuit for treated first input data and The arithmetic operation of second input data, benefit reduce data operation quantity, improve operation efficiency, reduce power consumption.
Optionally, the associated first identifier data of the first input data and the second input data that main process task circuit is sent Associated second identifier data are to be stored in advance in the main process task circuit, or enable first for the main process task circuit and reflect Transmit-receive radio road is obtained by the first/second input data, and the application is without limitation.
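The two-stage processing described above can be sketched as follows. This is a minimal illustration, not the circuit's actual implementation: the function names `first_mapping` and `second_mapping`, the threshold of 0, and the sample values are all assumptions introduced here.

```python
def first_mapping(block, threshold=0.0):
    """Stand-in for the first mapping circuit (hypothetical name).

    Returns the processed block and its identification (mask) block:
    the mask holds 1 where |x| > threshold and 0 elsewhere.
    """
    mask = [[1 if abs(x) > threshold else 0 for x in row] for row in block]
    processed = [[x if m else 0 for x, m in zip(row, mrow)]
                 for row, mrow in zip(block, mask)]
    return processed, mask


def second_mapping(block, mask):
    """Stand-in for the second mapping circuit (hypothetical name).

    Keeps only the entries of `block` whose mask bit is 1.
    """
    return [[x if m else 0 for x, m in zip(row, mrow)]
            for row, mrow in zip(block, mask)]


# Illustrative values only.
first_in = [[0.5, 0.0], [0.0, 2.0]]
second_in = [[1.5, 3.0], [7.0, 4.0]]
first_out, first_mask = first_mapping(first_in)
second_out = second_mapping(second_in, first_mask)
```

Positions that are zero in the first input are dropped from the second input as well, so no multiplication needs to be performed for them.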
The application method (such as Fig. 2 a) of based process circuit;
Main process task circuit receives input data to be calculated outside device;
Optionally, main process task circuit utilizes the various computing circuits of this unit, vector operation circuit, inner product operation device electricity Road, accumulator circuit etc. carry out calculation process to data;
Main process task circuit is by data output interface to based process gate array (the set of all based process circuits Referred to as based process gate array) send data (as shown in Figure 2 b);
The mode of transmission data herein can be to a part of based process circuit and directly transmit data, i.e. repeatedly broadcast Mode;
The mode for sending data herein can send different data, i.e. distributor to different based process circuits respectively Formula;
Based process gate array calculates data;
Based process circuit carries out operation after receiving input data;
Optionally, based process circuit transmits out the data from the data output interface of this unit after receiving data It goes;(it is transferred to other based process circuits for not receiving data from main process task circuit directly.)
Optionally, based process circuit transfers out operation result from data output interface;(results of intermediate calculations or Final calculation result)
Main process task circuit receives the output data returned from based process gate array;
Optionally, it is (such as tired to continue processing to the data received from based process gate array for main process task circuit Add or activate operation);
Main process task processing of circuit finishes, and processing result is transferred to outside device from data output interface.
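The difference between the broadcast mode and the distribution mode in the steps above can be sketched as follows. The helper names `broadcast` and `distribute` and the use of dictionaries keyed by circuit index are assumptions made for illustration.

```python
def broadcast(data, circuit_ids):
    """Broadcast mode: every listed circuit receives a copy of the same data."""
    return {cid: list(data) for cid in circuit_ids}


def distribute(chunks, circuit_ids):
    """Distribution mode: each listed circuit receives its own chunk."""
    return {cid: chunk for cid, chunk in zip(circuit_ids, chunks)}


same = broadcast([1, 2], [0, 1])
different = distribute([[1], [2]], [0, 1])
```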
The circuit device can be used to complete tensor-times-tensor operations. A tensor here is the same as the data block described above and may be any one or a combination of a matrix, a vector, a three-dimensional data block, a four-dimensional data block, and a higher-dimensional data block. Figs. 2c and 2f below show specific implementations of the matrix-times-vector and matrix-times-matrix operations, respectively.
Completing a matrix-times-vector operation using the circuit device (a matrix-times-vector operation may take the inner product of each row of the matrix with the vector and place the results into a vector in the order of the corresponding rows):
The following describes the multiplication of a matrix S with M rows and L columns by a vector P of length L, as shown in Fig. 2c.
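Written out, the operation described above is a set of M inner products. The result symbol y and the index names i and l are notation introduced here for illustration; S, P, M and L are as defined in the text.

```latex
y_i = \sum_{l=1}^{L} S_{i,l}\, P_l , \qquad i = 1, \dots, M
```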
This method uses all or part of the basic processing circuits of the neural network computing device; assume that K basic processing circuits are used;
The main processing circuit sends the data in some or all rows of matrix S to each of the K basic processing circuits;
In an optional scheme, the control circuit of the main processing circuit sends one number or a part of the numbers of a certain row of matrix S to a certain basic processing circuit each time (for example, when sending one number each time to a certain basic processing circuit, the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; or, when sending a part of the numbers each time, the 1st transmission sends the first two numbers of row 3 (the 1st and 2nd numbers), the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on);
In an optional scheme, the control circuit of the main processing circuit sends one number or a part of the numbers from each of several rows of matrix S to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4 and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4 and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4 and 5, and so on).
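The two transmission schedules above can be sketched as generators that yield what is sent in each transmission. The function names and the `chunk_size` parameter are hypothetical; the text only states that one number or a part of the numbers is sent each time.

```python
def row_chunks(row, chunk_size):
    """Yield successive pieces of one row, chunk_size numbers per transmission."""
    for start in range(0, len(row), chunk_size):
        yield row[start:start + chunk_size]


def multi_row_chunks(rows, chunk_size):
    """Yield, per transmission, the same slice taken from each of several rows."""
    length = len(rows[0])
    for start in range(0, length, chunk_size):
        yield [r[start:start + chunk_size] for r in rows]


single = list(row_chunks([1, 2, 3, 4, 5, 6], 2))
several = list(multi_row_chunks([[1, 2, 3, 4], [5, 6, 7, 8]], 2))
```

With `chunk_size=1` the first function reproduces the one-number-per-transmission case.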
The control circuit of the main processing circuit sends the data in vector P successively to the 0th basic processing circuit;
After the 0th basic processing circuit receives the data of vector P, it sends the data to the next basic processing circuit connected to it, i.e., basic processing circuit 1;
Specifically, some basic processing circuits cannot obtain all the data needed for computation directly from the main processing circuit. For example, basic processing circuit 1 in Fig. 2d has only one data input interface connected to the main processing circuit, so it can obtain only the data of matrix S directly from the main processing circuit; the data of vector P must be output to basic processing circuit 1 by basic processing circuit 0. Likewise, after basic processing circuit 1 receives the data of vector P, it continues to output that data to basic processing circuit 2.
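The forwarding of vector P from one basic processing circuit to the next can be sketched as a simple chain. The class `BasicCircuit` and the helper `build_chain` are hypothetical names, and the model ignores timing: each value is handed on immediately.

```python
class BasicCircuit:
    """Toy model of a basic processing circuit that forwards vector data."""

    def __init__(self, index):
        self.index = index
        self.next_circuit = None  # neighbour that receives forwarded data
        self.vector_data = []     # values of P received so far

    def receive_vector(self, value):
        # Keep the value locally, then pass it on to the connected neighbour.
        self.vector_data.append(value)
        if self.next_circuit is not None:
            self.next_circuit.receive_vector(value)


def build_chain(k):
    """Connect k circuits so that circuit i forwards to circuit i + 1."""
    circuits = [BasicCircuit(i) for i in range(k)]
    for a, b in zip(circuits, circuits[1:]):
        a.next_circuit = b
    return circuits


chain = build_chain(3)
for value in [1.0, 2.0]:
    chain[0].receive_vector(value)  # the main circuit feeds only circuit 0
```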
Each basic processing circuit performs operations on the data it receives; the operations include but are not limited to inner product operations, multiplication operations, and addition operations;
In an optional scheme, the basic processing circuit computes the product of one or more pairs of data each time and then accumulates the result into a register and/or on-chip cache;
In an optional scheme, the basic processing circuit computes the inner product of one or more pairs of vectors each time and then accumulates the result into a register and/or on-chip cache;
After the basic processing circuit computes a result, it transmits the result out through the data output interface (to other basic processing circuits connected to it);
In an optional scheme, the calculation result may be the final result or an intermediate result of the inner product operation;
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits that data to the other basic processing circuits or the main processing circuit connected to it;
The main processing circuit receives the inner product results of the basic processing circuits and processes them to obtain the final result (the processing may be an accumulation operation, an activation operation, or the like).
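The accumulation of partial products into a register, as described in the steps above, can be sketched like this. The function name and the representation of the register as a local float are assumptions for illustration.

```python
def accumulate_inner_product(s_chunks, p_chunks):
    """Accumulate an inner product from data that arrives in pieces.

    Each step multiplies one received piece of a row of S with the matching
    piece of P and adds the partial sum to a running register value.
    """
    register = 0.0
    for s_part, p_part in zip(s_chunks, p_chunks):
        register += sum(s * p for s, p in zip(s_part, p_part))
    return register


result = accumulate_inner_product([[1, 2], [3, 4]], [[1, 1], [1, 1]])
```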
An embodiment of the matrix-times-vector method implemented with the above computing device:
In an optional scheme, the multiple basic processing circuits used in the method are arranged as shown in Fig. 2d or Fig. 2e below;
As shown in Fig. 2c, the main processing circuit may obtain the mask matrices corresponding to matrix S and vector P respectively (i.e., the identification data/identification data block described above). Specifically, the mask matrices corresponding to matrix S and vector P may be stored in advance in a high-speed memory in the main processing circuit, or the main processing circuit may enable the first mapping circuit to obtain the corresponding mask matrices from matrix S and vector P respectively. The control circuit of the main processing circuit divides the M rows of data of matrix S into K groups, and the i-th basic processing circuit is responsible for the operation of the i-th group (the set of rows in this group of data is denoted Ai). Accordingly, the control circuit of the main processing circuit also divides the M rows of data of the first mask matrix corresponding to matrix S into K groups and sends them, together with the new matrices formed by dividing matrix S into K groups, to the corresponding basic processing circuits, so that the operation on the related data is completed in those basic processing circuits.
The M rows of data may be grouped in any way that does not assign a row more than once;
In an optional scheme, the following assignment is used: the j-th row is assigned to the (j % K)-th basic processing circuit (% is the remainder operation);
In an optional scheme, when the rows cannot be grouped evenly, a part of the rows may first be assigned evenly and the remaining rows assigned in any manner.
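The remainder-based assignment of rows to basic processing circuits can be sketched as follows. The function name `group_rows` is hypothetical, and rows and circuits are indexed from 0 here as an assumption.

```python
def group_rows(num_rows, num_circuits):
    """Assign row j to circuit j % num_circuits; return the rows per circuit."""
    groups = [[] for _ in range(num_circuits)]
    for j in range(num_rows):
        groups[j % num_circuits].append(j)
    return groups


groups = group_rows(7, 3)
```

Every row is assigned exactly once, and the group sizes differ by at most one when the rows do not divide evenly.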
The control circuit of the main processing circuit successively sends the data in some or all rows of matrix S to the corresponding basic processing circuits; accordingly, the control circuit also sends the identification data in the first mask matrix that corresponds to those rows of matrix S, together with the rows, to the corresponding basic processing circuits.
For example, if matrix S is a 50*50 matrix data block, the main processing circuit may divide matrix S into 10 small matrices, each of size 5*50. The main processing circuit may then send the 1st small matrix S0 (5 rows, 50 columns) together with the identification data block associated with S0 (5 rows, 50 columns) to the 1st basic processing circuit, so that the operation on the related data is completed in the 1st basic processing circuit.
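The 50*50 example above can be sketched as follows. The matrix contents, the nonzero test used to build the mask, and the helper name `split_rows` are assumptions; only the block sizes come from the text.

```python
def split_rows(block, num_parts):
    """Split a matrix into num_parts consecutive groups of rows."""
    rows_per_part = len(block) // num_parts
    return [block[i * rows_per_part:(i + 1) * rows_per_part]
            for i in range(num_parts)]


# Illustrative 50*50 matrix and a mask that marks its nonzero entries.
S = [[float(r * 50 + c) for c in range(50)] for r in range(50)]
mask = [[1 if x != 0 else 0 for x in row] for row in S]

# Each basic processing circuit receives a small matrix and its mask block.
pairs = list(zip(split_rows(S, 10), split_rows(mask, 10)))
```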
In an optional scheme, the control circuit of the main processing circuit sends to the i-th basic processing circuit, each time, one or more data of one row of the i-th group of data Mi for which that circuit is responsible; the i-th group of data Mi may be data in matrix S or data in the first mask matrix corresponding to matrix S;
In an optional scheme, the control circuit of the main processing circuit sends to the i-th basic processing circuit, each time, one or more data from each of some or all rows of the i-th group of data Mi for which that circuit is responsible;
The control circuit of the main processing circuit successively sends the data in vector P to the 1st basic processing circuit; accordingly, the control circuit of the main processing circuit also successively sends the data in the second mask matrix associated with vector P to the 1st basic processing circuit;
In an optional scheme, the control circuit of the main processing circuit may send one or more data of vector P, or of the second mask matrix associated with vector P, each time;
After the i-th basic processing circuit receives the data of vector P or of the second mask matrix, it may also send that data to the (i+1)-th basic processing circuit connected to it;
After each basic processing circuit receives one or more data from one or several rows of matrix S and one or more data from vector P, it performs operations (including but not limited to multiplication or addition);
In a specific implementation, each basic processing circuit receives data of matrix S together with the first identification data associated with that data in the first mask matrix, and data of vector P together with the second identification data associated with that data in the second mask matrix. It may first obtain connection identification data from the first identification data and the second identification data, and then use the connection identification data to decide whether to perform the relevant operation on the data of matrix S and the data of vector P. The connection identification data is obtained by an AND operation on the first identification data and the second identification data and may be 0 or 1: 1 indicates that the data at a given position in matrix S and the data at the same position in vector P both have absolute values greater than the preset threshold; conversely, 0 indicates that the data at that position in matrix S and/or the data at the same position in vector P has an absolute value less than or equal to the preset threshold.
That is, each basic processing circuit starts the second mapping circuit to select, according to the first mask matrix of matrix S and the second mask matrix of vector P, the data in matrix S and vector P whose identification data at the same position is 1, and performs the relevant operations on them, such as multiplication and addition. In other words, the first mask matrix and the second mask matrix are used to select the data at the same positions in matrix S and vector P whose absolute values are greater than the preset threshold, and the relevant operation, such as multiplication, is performed on that data.
For example, the basic processing circuit receives the data of two rows of matrix S as a matrix S0 together with the first mask matrix associated with S0, and receives several data of vector P as the vector P0 = [1 0.01 1.1 0.6]^T together with the second mask vector [1 0 1 1]^T associated with P0. The basic processing circuit may then enable the second mapping circuit to perform an element-wise AND operation on the first mask matrix and [1 0 1 1]^T to obtain the connection mask matrix, and use the connection mask matrix to process the received matrix S0 and vector P0, obtaining the processed matrix S0 and the processed vector P0 = [1 0 0 0.6]^T, so that the basic processing circuit performs the relevant operations on the processed matrix S0 and the processed vector P0.
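The connection-mask step in this example can be traced with a short sketch. The vector P0 and its mask are taken from the text; the entries of S0 and the threshold of 0.1 are hypothetical values chosen here so that the processed vector comes out as [1 0 0 0.6]^T, and keeping an entry of P0 whenever any row still uses it is also an assumption.

```python
THRESHOLD = 0.1  # hypothetical preset threshold


def mask_of(values):
    """Identification data: 1 where |v| > THRESHOLD, else 0."""
    return [1 if abs(v) > THRESHOLD else 0 for v in values]


S0 = [[1.0, 2.0, 0.0, 3.0],   # hypothetical values, not taken from the source
      [4.0, 0.0, 0.0, 5.0]]
P0 = [1, 0.01, 1.1, 0.6]      # vector P0 from the example above

p_mask = mask_of(P0)
# Connection mask: element-wise AND of each row's mask with the vector mask.
conn = [[a & b for a, b in zip(mask_of(row), p_mask)] for row in S0]

S0_processed = [[s if c else 0 for s, c in zip(row, crow)]
                for row, crow in zip(S0, conn)]
# An entry of P0 is kept only if at least one row still needs it.
used = [max(col) for col in zip(*conn)]
P0_processed = [p if u else 0 for p, u in zip(P0, used)]
```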
In an optional scheme, if the amount of data received by a basic processing circuit (specifically, data blocks to be computed, such as the data of several rows/columns of matrix S or vector P and the corresponding identification data in the mask matrices) exceeds a preset threshold, the basic processing circuit stops receiving new input data, such as the data of several rows/columns of matrix S or vector P and the corresponding identification data in the mask matrices that the main processing circuit would send next. Once the basic processing circuit has enough buffer/storage space, it again receives the data newly sent by the main processing circuit.
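The receive-until-full behaviour above can be sketched with a bounded buffer. The class name `InputBuffer`, the item-count capacity, and the first-in-first-out consumption order are assumptions; the text only states that new input is refused until enough space is available.

```python
class InputBuffer:
    """Toy input buffer that refuses new data while it is full."""

    def __init__(self, capacity):
        self.capacity = capacity  # stands in for the preset threshold
        self.items = []

    def try_receive(self, item):
        """Accept the item if there is room; otherwise tell the sender to wait."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def consume(self):
        """Process (remove) the oldest item, freeing space for new input."""
        return self.items.pop(0)


buf = InputBuffer(capacity=2)
accepted = [buf.try_receive(x) for x in ("row0", "row1", "row2")]
first = buf.consume()
accepted_after_consume = buf.try_receive("row2")
```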
In an optional scheme, the basic processing circuit computes the product of one or more pairs of data each time and then accumulates the result into a register and/or on-chip cache;
In an optional scheme, the basic processing circuit computes the inner product of one or more pairs of vectors each time and then accumulates the result into a register and/or on-chip cache;
In an optional scheme, the data received by a basic processing circuit may also be an intermediate result, which is stored in a register and/or on-chip cache;
The basic processing circuit transmits its local calculation result to the next basic processing circuit connected to it or to the main processing circuit;
In an optional scheme corresponding to the structure of Fig. 2d, only the output interface of the last basic processing circuit in each column is connected to the main processing circuit. In this case, only the last basic processing circuit can transmit its local calculation result directly to the main processing circuit; each of the other basic processing circuits passes its calculation result to its next basic processing circuit, which passes it on in turn, until all results have reached the last basic processing circuit. The last basic processing circuit accumulates its local calculation result with the received results of the other basic processing circuits in its column to obtain an intermediate result and sends the intermediate result to the main processing circuit. Alternatively, the last basic processing circuit may send the results of the other basic processing circuits in its column and its local result directly to the main processing circuit.
In an optional scheme corresponding to the structure of Fig. 2e, every basic processing circuit has an output interface connected to the main processing circuit; in this case, each basic processing circuit transmits its local calculation result directly to the main processing circuit;
After a basic processing circuit receives a calculation result passed on by another basic processing circuit, it transmits that result to the next basic processing circuit connected to it or to the main processing circuit.
The main processing circuit receives the results of the M inner product operations as the result of the matrix-times-vector operation.
Completing a matrix-times-matrix operation using the circuit device:
The following describes the multiplication of a matrix S with M rows and L columns by a matrix P with L rows and N columns (each row of matrix S has the same length as each column of matrix P, as shown in Fig. 2f).
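In formula form, each entry of the result is the inner product of one row of S with one column of P. The result symbol C and the index names i, j and l are notation introduced here for illustration; S, P, M, L and N are as defined in the text.

```latex
C_{i,j} = \sum_{l=1}^{L} S_{i,l}\, P_{l,j} , \qquad i = 1, \dots, M, \quad j = 1, \dots, N
```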
This method is described using the device embodiment shown in Fig. 1b;
The first mapping circuit of the main processing circuit obtains the identification mask matrices corresponding to matrix S and matrix P; for example, the first mapping circuit is started to process matrix S and matrix P respectively, obtaining the first mask matrix corresponding to matrix S and the second mask matrix corresponding to matrix P;
The control circuit of the main processing circuit sends the data in some or all rows of matrix S to those basic processing circuits that are directly connected to the main processing circuit through horizontal data input interfaces (for example, the gray-filled vertical data paths at the top of Fig. 1b); at the same time, the control circuit also sends the corresponding identification data in some or all rows of the first mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix S, together with the first two rows of identification data in the first mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.
In an optional scheme, the control circuit of the main processing circuit sends one number or a part of the numbers of a certain row of matrix S to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; or the 1st transmission sends the first two numbers of row 3, the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on);
Accordingly, the control circuit also sends the identification data of the row in the first mask matrix that corresponds to that row of matrix S, one identification datum or a part of the identification data each time, to the same basic processing circuit.
In an optional scheme, the control circuit of the main processing circuit sends the data of several rows of matrix S and the identification data of the corresponding rows in the first mask matrix, one number or a part of the numbers from each row each time, to a certain basic processing circuit (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4 and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4 and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4 and 5, and so on);
The control circuit of the main processing circuit sends the data in some or all columns of matrix P to those basic processing circuits that are directly connected to the main processing circuit through vertical data input interfaces (for example, the gray-filled horizontal data paths on the left side of the basic processing circuit array in Fig. 1b); at the same time, the control circuit also sends the corresponding identification data in some or all rows of the second mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix P, together with the first two rows of identification data in the second mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.
In an optional scheme, the control circuit of the main processing circuit sends one number or a part of the numbers of a certain column of matrix P to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of column 3, the 2nd transmission sends the 2nd number of column 3, the 3rd transmission sends the 3rd number of column 3, and so on; or the 1st transmission sends the first two numbers of column 3, the 2nd transmission sends the 3rd and 4th numbers of column 3, the 3rd transmission sends the 5th and 6th numbers of column 3, and so on). Accordingly, the control circuit also sends the identification data of the row in the second mask matrix that corresponds to that row of matrix P, one identification datum or a part of the identification data each time, to the same basic processing circuit.
In an optional scheme, the control circuit of the main processing circuit sends the data of several columns of matrix P and the identification data of the corresponding rows in the second mask matrix, one number or a part of the numbers from each each time, to a certain basic processing circuit (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of each of columns 3, 4 and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4 and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4 and 5, and so on);
After a basic processing circuit receives the data of matrix S and the identification data of the associated first mask matrix, it transmits that data (specifically, the data of matrix S and the identification data corresponding to that data in the first mask matrix) through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives the data of matrix P, it transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Each basic processing circuit performs operations on the data it receives. Specifically, each basic processing circuit receives the data of one or several rows of matrix S together with the first identification data associated with that data in the first mask matrix, and the data of one or several columns of matrix P together with the second identification data associated with that data in the second mask matrix. It may first obtain connection identification data from the first identification data and the second identification data, and then use the connection identification data to decide whether to perform the relevant operation on the data of matrix S and the data of matrix P. The connection identification data is obtained by an AND operation on the first identification data and the second identification data and may be 0 or 1: 1 indicates that the data at a given position in matrix S and the data at the same position in matrix P both have absolute values greater than the preset threshold; conversely, 0 indicates that the data at that position in matrix S and/or the data at the same position in matrix P has an absolute value less than or equal to the preset threshold. For details, refer to the foregoing embodiments, which are not repeated here.
That is, each basic processing circuit starts the second mapping circuit to select, according to the first mask matrix of matrix S and the second mask matrix of matrix P, the data whose identification data at the same position is 1, and performs the relevant operations on that data, such as multiplication and addition.
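A masked inner product of one row of S with one column of P, in the spirit of the selection step above, can be sketched as follows. The function name and the returned multiplication count are additions for illustration; the count only shows how many products are actually computed.

```python
def masked_inner_product(s_row, s_mask, p_col, p_mask):
    """Inner product that multiplies only where both mask bits are 1.

    Returns the accumulated sum and the number of multiplications performed.
    """
    total = 0.0
    multiplies = 0
    for s, ms, p, mp in zip(s_row, s_mask, p_col, p_mask):
        if ms & mp:  # connection identification data for this position
            total += s * p
            multiplies += 1
    return total, multiplies


value, count = masked_inner_product(
    [1.0, 0.0, 2.0], [1, 0, 1],
    [3.0, 4.0, 0.0], [1, 1, 0],
)
```

Here only one of the three positions has both mask bits set, so a single multiplication yields the same result as the full inner product.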
In an optional scheme, if the amount of data received by a basic processing circuit (specifically, data blocks to be computed, such as the data of several rows/columns of matrix S or matrix P and the corresponding identification data in the mask matrices) exceeds a preset threshold, the basic processing circuit stops receiving new input data, such as the data of several rows/columns of matrix S or matrix P and the corresponding identification data in the mask matrices that the main processing circuit would send next. Once the basic processing circuit has enough buffer/storage space, it again receives the data newly sent by the main processing circuit.
In an optional scheme, the basic processing circuit computes the product of one or more pairs of data each time and then accumulates the result into a register and/or on-chip cache;
In an optional scheme, the basic processing circuit computes the inner product of one or more pairs of vectors each time and then accumulates the result into a register and/or on-chip cache;
After the basic processing circuit computes a result, it may transmit the result out through the data output interface;
In an optional scheme, the calculation result may be the final result or an intermediate result of the inner product operation;
Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result in the direction of a basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 1b, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits transmit their operation results downward through the vertical output interface).
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits that data to the other basic processing circuits or the main processing circuit connected to it;
The result is output in the direction from which it can be output directly to the main processing circuit (for example, in Fig. 1b, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits transmit their operation results downward through the vertical output interface);
The main processing circuit receives the inner product results of the basic processing circuits and thereby obtains the output result.
An embodiment of the "matrix-times-matrix" method:
The method uses a basic processing circuit array arranged as shown in Fig. 1b;
The first mapping circuit of the main processing circuit obtains the identification mask matrices corresponding to matrix S and matrix P; for example, the first mapping circuit is started to process matrix S and matrix P respectively, obtaining the first mask matrix corresponding to matrix S and the second mask matrix corresponding to matrix P. Optionally, the processed matrix S and matrix P may also be obtained; assume that the processed matrix S has h rows, and the processed ...
The control circuit of the main processing circuit divides the h rows of data of matrix S into h groups, and the i-th basic processing circuit is responsible for the operation of the i-th group (the set of rows in this group of data is denoted Hi); at the same time, the control circuit also sends the corresponding identification data in some or all rows of the first mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix S, together with the first two rows of identification data in the first mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.
The h rows of data may be grouped in any way that does not assign a row more than once;
In an optional scheme, the following assignment is used: the control circuit of the main processing circuit assigns the j-th row to the (j % h)-th basic processing circuit;
In an optional scheme, when the rows cannot be grouped evenly, a part of the rows may first be assigned evenly and the remaining rows assigned in any manner.
The control circuit of the main processing circuit divides the W columns of data of matrix P into w groups, and the i-th basic processing circuit is responsible for the operation of the i-th group (the set of columns in this group of data is denoted Wi); accordingly, the control circuit also sends the identification data of the columns in the second mask matrix that correspond to those columns of matrix P, one identification datum or a part of the identification data each time, to a certain basic processing circuit.
The W columns of data may be grouped in any way that does not assign a column more than once;
In an optional scheme, the following assignment is used: the control circuit of the main processing circuit assigns the j-th column to the (j % w)-th basic processing circuit;
In an optional scheme, when the columns cannot be grouped evenly, a part of the columns may first be assigned evenly and the remaining columns assigned in any manner.
The control circuit of the main processing circuit sends the data in some or all rows of matrix S to the first basic processing circuit of each row in the basic processing circuit array;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of one row of the i-th group of data Hi for which it is responsible to the first basic processing circuit of the i-th row of the basic processing circuit array; at the same time, the identification data in the mask matrix corresponding to the i-th group of data Hi may be sent to that first basic processing circuit in the same manner;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of each of some or all rows of the i-th group of data Hi for which it is responsible to the first basic processing circuit of the i-th row of the basic processing circuit array; at the same time, the identification data in the mask matrix corresponding to the i-th group of data Hi may be sent to that first basic processing circuit in the same manner;
The control circuit of the main processing circuit sends the data in some or all columns of matrix P to the first basic processing circuit of each column in the basic processing circuit array; meanwhile, the control circuit may also send the identification data in the corresponding part or all rows of the second mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data in matrix P, together with the identification data of the corresponding first two rows in the second mask matrix, to the basic processing circuits connected to the main processing circuit.
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of one column of the i-th group of data Wi for which it is responsible to the first basic processing circuit of the i-th column of the basic processing circuit array;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of each of some or all columns of the i-th group of data Ni for which it is responsible to the first basic processing circuit of the i-th column of the basic processing circuit array;
After a basic processing circuit receives data of matrix S, it transmits the data through its lateral data output interface to the next basic processing circuit connected to it (for example, the white-filled lateral data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives data of matrix P, it transmits the data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Each basic processing circuit performs operations on the data it receives. Specifically, each basic processing circuit receives the data of one or several rows of matrix S together with the first identification data associated with that data in the first mask matrix, and the data of one or several columns of matrix P together with the second identification data associated with that data in the second mask matrix. It may first obtain connection identification data from the first identification data and the second identification data, and then use the connection identification data to decide whether to perform the relevant operation on the data in matrix S and the data in matrix P. The connection identification data is obtained by performing an AND operation on the first identification data and the second identification data, and takes the value 0 or 1: 1 indicates that the data at a given position in matrix S and the data at the same position in matrix P both have absolute values greater than the preset threshold; conversely, 0 indicates that the data at that position in matrix S and/or the data at the same position in matrix P has an absolute value less than or equal to the preset threshold. For details, refer to the foregoing embodiments; they are not repeated here.
That is, each basic processing circuit starts its second mapping circuit to select, according to the first mask matrix of matrix S and the second mask matrix of matrix P, the data whose identification data at the same position is 1, and performs the relevant operation on the selected data, such as multiplication or addition.
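The selection step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the identification data are held as plain lists of 0 and 1, and the example threshold of 0.1 is chosen arbitrarily.

```python
def connection_ids(first_ids, second_ids):
    """AND the two identification vectors position by position."""
    return [a & b for a, b in zip(first_ids, second_ids)]


def masked_inner_product(s_row, p_col, first_ids, second_ids):
    """Multiply-accumulate only where the connection identification is 1."""
    conn = connection_ids(first_ids, second_ids)
    return sum(s * p for s, p, c in zip(s_row, p_col, conn) if c == 1)


s_row = [0.0, 2.0, 0.05, -3.0]
first_ids = [0, 1, 0, 1]     # 1 where |value| > threshold (0.1 assumed)
p_col = [1.0, 0.0, 4.0, 2.0]
second_ids = [1, 0, 1, 1]

print(connection_ids(first_ids, second_ids))                      # [0, 0, 0, 1]
print(masked_inner_product(s_row, p_col, first_ids, second_ids))  # -6.0
```

Only the last position survives the AND, so a single multiplication is performed instead of four.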
In an optional scheme, if the amount of data received by a basic processing circuit (specifically, the data blocks to be computed, such as the data of several rows/columns of matrix S or matrix P and the corresponding identification data in the mask matrix) exceeds a preset threshold, the basic processing circuit no longer receives new input data, such as the data of several rows/columns of matrix S or matrix P subsequently sent by the main processing circuit and the corresponding identification data in the mask matrix, until the basic processing circuit has enough buffer/storage space, after which it again receives the data newly sent by the main processing circuit.
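The flow-control behaviour above might be modelled as in the following sketch. The class `InputBuffer` and its methods are hypothetical, and the sketch assumes that a block is refused whenever accepting it would push the buffered amount past the threshold, which is one possible reading of the text.

```python
from collections import deque


class InputBuffer:
    """Hypothetical model of a basic processing circuit's input buffer."""

    def __init__(self, threshold):
        self.threshold = threshold  # preset limit on buffered data items
        self.blocks = deque()
        self.used = 0

    def try_receive(self, block):
        """Accept a block only if it fits under the threshold."""
        if self.used + len(block) > self.threshold:
            return False  # sender must hold the block and retry later
        self.blocks.append(block)
        self.used += len(block)
        return True

    def consume(self):
        """Process the oldest block, freeing its space."""
        block = self.blocks.popleft()
        self.used -= len(block)
        return block


buf = InputBuffer(threshold=4)
print(buf.try_receive([1, 2, 3]))  # True
print(buf.try_receive([4, 5]))     # False, not enough space yet
buf.consume()
print(buf.try_receive([4, 5]))     # True, space has been freed
```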
In an optional scheme, the basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result in a register and/or an on-chip cache;
In an optional scheme, the basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result in a register and/or an on-chip cache;
After the basic processing circuit has computed a result, it may transmit the result out through its data output interface;
In an optional scheme, the result may be the final result or an intermediate result of the inner product operation;
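As a hedged illustration of accumulating partial products into a register, the sketch below feeds a row and a column to a hypothetical `Accumulator` two elements at a time. Each value returned by `step` is an intermediate result, and the value after the last chunk is the final inner product.

```python
class Accumulator:
    """Hypothetical stand-in for a register inside a basic processing circuit."""

    def __init__(self):
        self.register = 0.0

    def step(self, s_chunk, p_chunk):
        """Multiply the paired data and add the products to the register."""
        self.register += sum(a * b for a, b in zip(s_chunk, p_chunk))
        return self.register


row = [1.0, 2.0, 3.0, 4.0]
col = [5.0, 6.0, 7.0, 8.0]

acc = Accumulator()
partials = [acc.step(row[k:k + 2], col[k:k + 2]) for k in range(0, len(row), 2)]
print(partials)  # [17.0, 70.0]
```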
Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result in the direction of a basic processing circuit that can output directly to the main processing circuit (for example, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits the data to another basic processing circuit connected to it or to the main processing circuit;
The result is output in the direction from which it can be output directly to the main processing circuit (for example, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results of the basic processing circuits, and the output result is thereby obtained.
The words "lateral" and "vertical" used in the above description serve only to describe the example shown in Fig. 1b; in practice, the "lateral" and "vertical" interfaces of each unit need only be distinguished as two different interfaces.
Completing a fully connected operation using the circuit device:
If the input data of the fully connected layer is a vector (that is, the input of the neural network is a single sample), the weight matrix of the fully connected layer is taken as matrix S and the input vector as vector P, and the operation is performed according to the matrix-times-vector method of the device;
If the input data of the fully connected layer is a matrix (that is, the input of the neural network is multiple samples), the weight matrix of the fully connected layer is taken as matrix S and the input vectors as matrix P, or the weight matrix of the fully connected layer is taken as matrix P and the input vectors as matrix S, and the operation is performed according to the matrix-times-matrix method of the device;
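The two fully connected cases can be written out numerically as below. This is a plain reference computation, not a model of the circuit array; the function names and the choice of storing the samples as the columns of P are assumptions made for illustration.

```python
def matvec(S, p):
    """Single sample: weight matrix S times input vector p."""
    return [sum(a * b for a, b in zip(row, p)) for row in S]


def matmul(S, P):
    """Multiple samples: weight matrix S times input matrix P."""
    cols = list(zip(*P))  # each column of P is one sample (assumed layout)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in S]


S = [[1, 2], [3, 4], [5, 6]]   # 3 outputs, 2 inputs
print(matvec(S, [1, 1]))       # [3, 7, 11]

P = [[1, 0], [1, 2]]           # 2 inputs, 2 samples
print(matmul(S, P))            # [[3, 4], [7, 8], [11, 12]]
```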
Completing a convolution operation using the circuit device:
The convolution operation is described below. In the figures, one square represents one datum. The input data is shown in Fig. 3a (N samples, each sample having C channels, the feature map of each channel having height H and width W), and the weights, namely the convolution kernels, are shown in Fig. 3b (M convolution kernels, each having C channels, with height KH and width KW). The rule of the convolution operation is the same for all N samples of the input data, so the following explains the process of performing the convolution operation on one sample. On one sample, each of the M convolution kernels performs the same operation; each convolution kernel produces one planar feature map, so the M convolution kernels finally produce M planar feature maps (for one sample, the output of the convolution is M feature maps). For one convolution kernel, an inner product operation is performed at each planar position of the sample, and the kernel then slides along the H and W directions. For example, Fig. 3c shows a convolution kernel performing the inner product operation at the lower-right corner of one sample of the input data; Fig. 3d shows the convolution position slid one cell to the left, and Fig. 3e shows the convolution position slid one cell upward.
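The sliding inner product described above can be sketched for a single sample as follows. The sketch assumes a stride of 1 and no padding, neither of which is stated in the text, and it visits positions from the top-left corner rather than the lower-right one; the set of inner products is the same. The function name is hypothetical.

```python
def conv_one_sample(x, kernels):
    """x: C x H x W nested lists; kernels: M x C x KH x KW nested lists.

    Returns M feature maps of size (H-KH+1) x (W-KW+1),
    assuming stride 1 and no padding.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for k in kernels:                      # one feature map per kernel
        fmap = []
        for i in range(H - KH + 1):        # slide along H
            row = []
            for j in range(W - KW + 1):    # slide along W
                acc = 0
                for c in range(C):         # inner product over the window
                    for di in range(KH):
                        for dj in range(KW):
                            acc += x[c][i + di][j + dj] * k[c][di][dj]
                row.append(acc)
            fmap.append(row)
        out.append(fmap)
    return out


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # C=1, H=W=3
kernels = [[[[1, 0], [0, 1]]]]            # M=1, C=1, KH=KW=2
print(conv_one_sample(x, kernels))        # [[[6, 8], [12, 14]]]
```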
The method is explained using the device embodiment shown in Fig. 1b;
The first mapping circuit of the main processing circuit may process the data in some or all of the convolution kernels of the weights to obtain the corresponding mask data and the processed weight data (that is, the processed data of some or all of the convolution kernels of the weights).
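A minimal sketch of what such a mapping step might compute is given below. It follows the rule stated earlier in this description that an identification value of 1 marks data whose absolute value is greater than the preset threshold. Setting the remaining data to zero is an assumption made here for illustration, and the name `first_mapping` is hypothetical.

```python
def first_mapping(values, threshold):
    """Return (processed data, mask) for one flattened kernel row.

    mask[i] is 1 when |values[i]| > threshold, otherwise 0.
    Assumption: data at or below the threshold are replaced by 0.0.
    """
    mask = [1 if abs(v) > threshold else 0 for v in values]
    processed = [v if m else 0.0 for v, m in zip(values, mask)]
    return processed, mask


processed, mask = first_mapping([0.5, -0.01, 0.0, -2.0], threshold=0.1)
print(processed)  # [0.5, 0.0, 0.0, -2.0]
print(mask)       # [1, 0, 0, 1]
```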
The control circuit of the main processing circuit sends the data in some or all of the convolution kernels of the weights (which may be the original weight data or the processed weight data) to those basic processing circuits that are directly connected to the main processing circuit through lateral data input interfaces (for example, the grey-filled vertical data paths at the top of Fig. 1b); meanwhile, the control circuit also sends the mask data associated with that data to the basic processing circuits connected to the main processing circuit;
In an optional scheme, the control circuit of the main processing circuit sends one number or a portion of the numbers of the data of a certain convolution kernel in the weights to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; or the 1st transmission sends the first two numbers of row 3, the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on); meanwhile, the control circuit also sends the mask data corresponding to that convolution kernel to that basic processing circuit in the same manner, one number or a portion of the data each time;
In another optional scheme, the control circuit of the main processing circuit sends one number or a portion of the numbers of the data of each of several convolution kernels in the weights to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4, and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4, and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4, and 5, and so on); correspondingly, the control circuit also sends the mask data associated with those convolution kernels to that basic processing circuit in the same manner, one number or a portion of the data each time;
The control circuit of the main processing circuit divides the input data according to the convolution positions, and sends the data in some or all of the convolution positions of the input data to those basic processing circuits that are directly connected to the main processing circuit through vertical data input interfaces (for example, the grey-filled lateral data paths on the left side of the basic processing circuit array in Fig. 1b); correspondingly, the control circuit likewise divides the mask data associated with the input data according to the convolution positions, and also sends the mask data corresponding to the data in some or all of the convolution positions of the input data to the basic processing circuits electrically connected to the main processing circuit;
In an optional scheme, the control circuit of the main processing circuit sends one number or a portion of the numbers of the data of a certain convolution position in the input data, together with the mask data associated with that data, to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of column 3, the 2nd transmission sends the 2nd number of column 3, the 3rd transmission sends the 3rd number of column 3, and so on; or the 1st transmission sends the first two numbers of column 3, the 2nd transmission sends the 3rd and 4th numbers of column 3, the 3rd transmission sends the 5th and 6th numbers of column 3, and so on);
In another optional scheme, the control circuit of the main processing circuit sends one number or a portion of the numbers of the data of each of several convolution positions in the input data, together with the mask data associated with that data, to a certain basic processing circuit each time (for example, for a certain basic processing circuit, the 1st transmission sends the 1st number of each of columns 3, 4, and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4, and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4, and 5, and so on);
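The transmission schedules in the examples above (one or a few numbers at a time, for one or several rows or columns) can be sketched with a small generator. The name `stream_chunks` is hypothetical; the same function could be applied to the associated mask data so that data and mask arrive in the same pattern.

```python
def stream_chunks(vectors, chunk):
    """Yield, per transmission, the next `chunk` numbers of every vector.

    vectors: list of equal-length rows (or columns) assigned to one circuit.
    """
    length = len(vectors[0])
    for start in range(0, length, chunk):
        yield [v[start:start + chunk] for v in vectors]


rows = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
for n, sent in enumerate(stream_chunks(rows, chunk=2), start=1):
    print(n, sent)
# 1 [[1, 2], [7, 8]]
# 2 [[3, 4], [9, 10]]
# 3 [[5, 6], [11, 12]]
```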
After a basic processing circuit receives weight data (specifically, the data of a convolution kernel in the weights (weight data for short) or the mask data associated with that weight data), it transmits the data through its lateral data output interface to the next basic processing circuit connected to it (for example, the white-filled lateral data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives data of the input data (which may be the input data sent by the main processing circuit and the identification mask data associated with the input data), it transmits the data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Specifically, the control circuit of the main processing circuit may send the input data together with the mask data associated with the input data to the basic processing circuit, and the basic processing circuit receives the input data and the mask data associated with the input data;
Each basic processing circuit performs operations on the data it receives. Specifically, the basic processing circuit may enable its second mapping circuit to obtain connection identification data from the mask data associated with the input data and the mask data associated with the weight data (that is, the mask data associated with the convolution kernels in the weights), and then use the connection identification data to select, from the input data and the weight data, the data whose absolute values are greater than the preset threshold for multiplication;
In an optional scheme, if the amount of data received by a basic processing circuit (specifically, the data blocks to be computed, such as the data in the convolution kernels of the weights and the mask data associated with that data, or the input data and the mask data associated with the input data) exceeds a preset threshold, the basic processing circuit no longer receives new input data, such as the data of certain convolution kernels in the weights subsequently sent by the main processing circuit and the mask data associated with that data, until the basic processing circuit has enough buffer/storage space, after which it again receives the data newly sent by the main processing circuit.
In an optional scheme, the basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result in a register and/or an on-chip cache;
In an optional scheme, the basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result in a register and/or an on-chip cache;
After the basic processing circuit has computed a result, it may transmit the result out through its data output interface;
In an optional scheme, the result may be the final result or an intermediate result of the inner product operation;
Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result in the direction of a basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 1b, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits the data to another basic processing circuit connected to it or to the main processing circuit;
The result is output in the direction from which it can be output directly to the main processing circuit (for example, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results of the basic processing circuits, and the output result is thereby obtained.
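The result path described above can be sketched as follows, assuming, as in the Fig. 1b example, that only the bottom row of the array outputs directly to the main processing circuit and that every other circuit forwards results downward. The function name and the hop count it reports are hypothetical and serve only to illustrate the forwarding.

```python
def route_to_main(results):
    """results[r][c] is the value computed by the circuit at row r, column c.

    Returns tuples (row, col, value, hops) in the order in which the main
    processing circuit would receive them column by column: the bottom row
    arrives first, and each row above needs one more downward hop.
    """
    rows, cols = len(results), len(results[0])
    received = []
    for c in range(cols):
        for r in range(rows - 1, -1, -1):
            hops = (rows - 1 - r) + 1   # downward hops plus the final output
            received.append((r, c, results[r][c], hops))
    return received


print(route_to_main([[1, 2], [3, 4]]))
# [(1, 0, 3, 1), (0, 0, 1, 2), (1, 1, 4, 1), (0, 1, 2, 2)]
```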
In one embodiment, the present invention discloses a neural network operation device, which includes functional units for executing all or some of the implementations provided in the method embodiments described above.
In one embodiment, the present invention discloses a chip (as in Fig. 4) for executing all or some of the implementations provided in the method embodiments described above.
In one embodiment, the present invention discloses an electronic device, which includes functional units for executing all or some of the implementations in the method embodiments described above.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound scanner, and/or electrocardiograph.
The specific embodiments described above further explain in detail the purpose, technical solutions, and beneficial effects of the present disclosure. It should be understood that the foregoing are merely specific embodiments of the present disclosure and are not intended to limit the present disclosure; any modification, equivalent substitution, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (12)

1. An integrated circuit chip device, characterized in that the integrated circuit chip device comprises: a main processing circuit and a plurality of basic processing circuits; the main processing circuit comprises a first mapping circuit, at least one of the plurality of basic processing circuits comprises a second mapping circuit, and the first mapping circuit and the second mapping circuit are each configured to perform compression processing on data in a neural network operation;
the plurality of basic processing circuits are arranged in an array; each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column;
the main processing circuit is configured to obtain an input data block, a convolution kernel data block, and a convolution instruction; divide, according to the convolution instruction, the input data block into vertical data blocks and the convolution kernel data block into lateral data blocks; determine, according to the operation control of the convolution instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, the first data block comprising the lateral data block and/or the vertical data block; and send, according to the convolution instruction, the processed first data block to at least one of the basic processing circuits connected to the main processing circuit;
the plurality of basic processing circuits are configured to determine, according to the operation control of the convolution instruction, whether to start the second mapping circuit to process a second data block; perform the operation in the neural network in parallel according to the processed second data block to obtain an operation result; and transmit the operation result to the main processing circuit through the basic processing circuits connected to the main processing circuit; the second data block is a data block that the basic processing circuit determines to receive from the main processing circuit, and the second data block is associated with the processed first data block;
the main processing circuit is configured to process the operation result to obtain an instruction result of the convolution instruction.
2. The integrated circuit chip device according to claim 1, characterized in that, when the first data block comprises the lateral data block and the vertical data block,
the main processing circuit is specifically configured to start the first mapping circuit to process the lateral data block and the vertical data block to obtain a processed lateral data block and an identification data block associated with the lateral data block, and a processed vertical data block and an identification data block associated with the vertical data block; split the processed lateral data block and the identification data block associated with the lateral data block to obtain a plurality of basic data blocks and the identification data blocks respectively associated with the basic data blocks; distribute the plurality of basic data blocks and their respectively associated identification data blocks to the basic processing circuits connected to it; and broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it;
the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the vertical data block and the identification data block associated with the basic data block; process the vertical data block and the basic data block according to the connection identification data block to obtain a processed vertical data block and a processed basic data block; perform a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit.
3. The integrated circuit chip device according to claim 1, characterized in that, when the first data block comprises the lateral data block,
the main processing circuit is specifically configured to start the first mapping circuit to process the lateral data block to obtain a processed lateral data block and an identification data block associated with the lateral data block, or to start the first mapping circuit to process the lateral data block according to a pre-stored identification data block associated with the lateral data block to obtain a processed lateral data block; split the processed lateral data block and the identification data block associated with the lateral data block to obtain a plurality of basic data blocks and the identification data blocks respectively associated with the basic data blocks; distribute the plurality of basic data blocks and their respectively associated identification data blocks to the basic processing circuits connected to it; and broadcast the vertical data block to the basic processing circuits connected to it;
the basic processing circuit is specifically configured to start the second mapping circuit to process the vertical data block according to the identification data block associated with the basic data block, obtaining a processed vertical data block; perform a convolution operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit.
4. The integrated circuit chip device according to claim 1, characterized in that, when the first data block comprises the vertical data block,
the main processing circuit is specifically configured to start the first mapping circuit to process the vertical data block to obtain a processed vertical data block and an identification data block associated with the vertical data block, or to start the first mapping circuit to process the vertical data block according to a pre-stored identification data block associated with the vertical data block to obtain a processed vertical data block; split the lateral data block to obtain a plurality of basic data blocks; distribute the plurality of basic data blocks to the basic processing circuits connected to it; and broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it;
the basic processing circuit is specifically configured to start the second mapping circuit to process the basic data block according to the identification data block associated with the vertical data block, obtaining a processed basic data block; perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit.
5. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the basic processing circuit is specifically configured to perform a product operation on the basic data block and the vertical data block to obtain a product result, accumulate the product result to obtain an operation result, and send the operation result to the main processing circuit;
the main processing circuit is configured to accumulate the operation results to obtain an accumulation result, and arrange the accumulation result to obtain the instruction result.
6. The integrated circuit chip device according to claim 5, characterized in that
the main processing circuit is specifically configured to divide the processed vertical data block and the identification data block associated with the vertical data block into a plurality of partial vertical data blocks and the identification data blocks associated with the partial vertical data blocks, and broadcast the plurality of partial vertical data blocks and their respectively associated identification data blocks to the basic processing circuits in one or more broadcasts; the plurality of partial vertical data blocks combine to form the vertical data block;
the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the basic data block and the identification data block associated with the partial vertical data block; process the basic data block and the partial vertical data block according to the connection identification data block to obtain a processed basic data block and processed partial broadcast data; and perform a convolution operation on the processed basic data block and the processed partial vertical data block;
or, the basic processing circuit is specifically configured to start the second mapping circuit to process the basic data block according to the identification data block associated with the partial vertical data block, obtaining a processed basic data block; and perform a convolution operation on the processed basic data block and the partial vertical data block.
7. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the main processing circuit is specifically configured to divide the vertical data block or the processed vertical data block into a plurality of partial vertical data blocks, and broadcast the plurality of partial vertical data blocks to the basic processing circuits in multiple broadcasts; or,
the main processing circuit is specifically configured to broadcast the vertical data block or the processed vertical data block to the basic processing circuits in a single broadcast.
8. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the basic processing circuit is specifically configured to perform one inner product processing on the partial vertical data block and the basic data block to obtain an inner product processing result, accumulate the inner product processing result to obtain a partial operation result, and send the partial operation result to the main processing circuit; or,
the basic processing circuit is specifically configured to reuse the partial vertical data block n times to perform inner product operations of the partial vertical data block with n basic data blocks to obtain n partial processing results, accumulate the n partial processing results respectively to obtain n partial operation results, and send the n partial operation results to the main processing circuit, n being an integer greater than or equal to 2.
9. The integrated circuit chip device according to claim 1, characterized in that
the main processing circuit comprises: a main register or a main on-chip cache circuit;
the basic processing circuit comprises: a basic register or a basic on-chip cache circuit.
10. The integrated circuit chip device according to claim 1, characterized in that
the input data block is: one or any combination of a matrix, a three-dimensional data block, a four-dimensional data block, and an n-dimensional data block;
the convolution kernel data block is: one or any combination of a matrix, a three-dimensional data block, a four-dimensional data block, and an n-dimensional data block.
11. A chip, characterized in that the chip integrates the device according to any one of claims 1-10.
12. An operation method of a neural network, characterized in that the method is applied in an integrated circuit chip device, the integrated circuit chip device comprising the integrated circuit chip device according to any one of claims 1-10, and the integrated circuit chip device being configured to perform a convolution operation of a neural network.
CN201810164331.7A 2018-02-27 2018-02-27 Integrated circuit chip device and related product Active CN110197269B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010617208.3A CN111767997B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products
CN201810164331.7A CN110197269B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related product
PCT/CN2019/076088 WO2019165946A1 (en) 2018-02-27 2019-02-25 Integrated circuit chip device, board card and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810164331.7A CN110197269B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010617208.3A Division CN111767997B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products

Publications (2)

Publication Number Publication Date
CN110197269A true CN110197269A (en) 2019-09-03
CN110197269B CN110197269B (en) 2020-12-29

Family

ID=67750912

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010617208.3A Active CN111767997B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products
CN201810164331.7A Active CN110197269B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related product

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010617208.3A Active CN111767997B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products

Country Status (1)

Country Link
CN (2) CN111767997B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126481A (en) * 2016-06-29 2016-11-16 华为技术有限公司 A kind of computing engines and electronic equipment
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neural network processor based on data compression, design method and chip
CN107316078A (en) * 2016-04-27 2017-11-03 北京中科寒武纪科技有限公司 Apparatus and method for performing artificial neural network self study computing
CN107609641A (en) * 2017-08-30 2018-01-19 清华大学 Sparse neural network framework and its implementation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915322B (en) * 2015-06-09 2018-05-01 中国人民解放军国防科学技术大学 A kind of hardware-accelerated method of convolutional neural networks
CN107329936A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing neural network computing and matrix/vector computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUNJI CHEN ET AL.: "DaDianNao: A Machine-Learning Supercomputer", 《IEEE》 *

Also Published As

Publication number Publication date
CN110197269B (en) 2020-12-29
CN111767997A (en) 2020-10-13
CN111767997B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN110197270A (en) Integrated circuit chip device and Related product
CN109993301A (en) Neural network training device and Related product
CN110163334A (en) Integrated circuit chip device and Related product
CN108170640A (en) The method of its progress operation of neural network computing device and application
CN110197274A (en) Integrated circuit chip device and Related product
CN109993291A (en) Integrated circuit chip device and Related product
CN111160542A (en) Integrated circuit chip device and related product
CN110197263A (en) Integrated circuit chip device and Related product
CN109993290A (en) Integrated circuit chip device and Related product
CN110197268A (en) Integrated circuit chip device and Related product
CN110197269A (en) Integrated circuit chip device and Related product
CN110197265A (en) Integrated circuit chip device and Related product
CN110197275A (en) Integrated circuit chip device and Related product
CN110197271A (en) Integrated circuit chip device and Related product
CN110197264A (en) Neural network processor board and Related product
CN110197272A (en) Integrated circuit chip device and Related product
WO2019129302A1 (en) Integrated circuit chip device and related product
CN102710307B (en) Pairing method and device among user terminals
CN110197267A (en) Neural network processor board and Related product
CN110197273A (en) Integrated circuit chip device and Related product
CN109993289A (en) Integrated circuit chip device and Related product
CN110197266A (en) Integrated circuit chip device and Related product
CN109978151A (en) Neural network processor board and Related product
CN105553723B (en) A kind of Virtual Cluster laying method of network flow perception
CN105228249B (en) A kind of sub-carrier wave distribution method, relevant apparatus and base station

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201127

Address after: Room 611-194, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Anhui Cambricon Information Technology Co., Ltd.

Address before: 200120, No. two, No. 888, West Road, Nanhui new town, Shanghai, Pudong New Area

Applicant before: Shanghai Cambricon Information Technology Co.,Ltd.

GR01 Patent grant