CN110197268A - Integrated circuit chip device and Related product - Google Patents

Integrated circuit chip device and Related product

Info

Publication number
CN110197268A
Authority
CN
China
Prior art keywords
data block
circuit
data
treated
broadcast
Prior art date
Legal status
Granted
Application number
CN201810164317.7A
Other languages
Chinese (zh)
Other versions
CN110197268B (en)
Inventor
Inventor not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN202010617209.8A (CN111767998B)
Priority to CN201810164317.7A (CN110197268B)
Priority to PCT/CN2019/075979 (WO2019165940A1)
Publication of CN110197268A
Application granted
Publication of CN110197268B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Optimization (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)
  • Logic Circuits (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides an integrated circuit chip device and related products. The integrated circuit chip device includes a main processing circuit and multiple basic processing circuits; the main processing circuit includes a first mapping circuit, and at least one of the multiple basic processing circuits includes a second mapping circuit; the first mapping circuit and the second mapping circuit are used to perform compression processing on data in neural network operations. The technical solution provided by the present disclosure has the advantages of a small amount of computation and low power consumption.

Description

Integrated circuit chip device and Related product
Technical field
The present disclosure relates to the field of neural networks, and more particularly to an integrated circuit chip device and related products.
Background
Artificial neural networks (ANNs) have been a research hotspot in the field of artificial intelligence since the 1980s. An ANN abstracts the neural network of the human brain from the perspective of information processing, builds a simple model, and forms different networks with different connection schemes. In engineering and academia it is often referred to simply as a neural network or a neural-network-like model. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Existing neural network operations are performed on a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit); such operations require a large amount of computation and have high power consumption.
Summary of the invention
Embodiments of the present disclosure provide an integrated circuit chip device and related products, which can increase the processing speed and improve the efficiency of a computing device.
In a first aspect, an integrated circuit chip device is provided. The integrated circuit chip device includes a main processing circuit and multiple basic processing circuits; the main processing circuit includes a first mapping circuit, and at least one circuit of the multiple basic processing circuits (i.e., some or all of the basic processing circuits) includes a second mapping circuit; the first mapping circuit and the second mapping circuit are used to perform compression processing on data in neural network operations.
The main processing circuit is configured to obtain an input data block, a weight data block and a multiplication instruction; to divide, according to the multiplication instruction, the input data block into a distribution data block and the weight data block into a broadcast data block; to determine, according to the operation control of the multiplication instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the distribution data block and/or the broadcast data block; and to send, according to the multiplication instruction, the processed first data block to at least one of the basic processing circuits connected to the main processing circuit.
The multiple basic processing circuits are configured to determine, according to the operation control of the multiplication instruction, whether to start the second mapping circuit to process a second data block; to perform the operations in the neural network in parallel on the processed second data block to obtain operation results; and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The second data block is the data block that the basic processing circuit determines to receive from the main processing circuit, and is associated with the processed first data block.
The main processing circuit is configured to process the operation results to obtain the instruction result of the multiplication instruction.
In a second aspect, a neural network operation device is provided. The neural network operation device includes one or more of the integrated circuit chip devices provided in the first aspect.
In a third aspect, a combined processing device is provided. The combined processing device includes the neural network operation device provided in the second aspect, a universal interconnection interface and a general-purpose processing device.
The neural network operation device is connected to the general-purpose processing device through the universal interconnection interface.
In a fourth aspect, a chip is provided, which integrates the device of the first aspect, the device of the second aspect or the device of the third aspect.
In a fifth aspect, an electronic device is provided, which includes the chip of the fourth aspect.
In a sixth aspect, a neural network operation method is provided. The method is applied to an integrated circuit chip device; the integrated circuit chip device is the integrated circuit chip device described in the first aspect, and the integrated circuit chip device is configured to perform neural network operations.
It can be seen that, in the embodiments of the present disclosure, the mapping circuits compress the data blocks before the operations are performed, which saves transmission resources and computing resources; the solution therefore has the advantages of low power consumption and a small amount of computation.
Detailed description of the invention
Fig. 1a is a structural schematic diagram of an integrated circuit chip device.
Fig. 1b is a structural schematic diagram of another integrated circuit chip device.
Fig. 1c is a structural schematic diagram of a basic processing circuit.
Fig. 2 is a schematic flowchart of a matrix-multiplied-by-vector operation.
Fig. 2a is a schematic diagram of a matrix multiplied by a vector.
Fig. 2b is a schematic diagram of a matrix-multiplied-by-matrix process.
Fig. 2c is a schematic diagram of matrix Ai multiplied by vector B.
Fig. 2d is a schematic diagram of matrix A multiplied by matrix B.
Fig. 2e is a schematic diagram of matrix Ai multiplied by matrix B.
Fig. 3 is a structural schematic diagram of a neural network chip provided by an embodiment of the present disclosure;
Fig. 4a and Fig. 4b are structural schematic diagrams of two mapping circuits provided by embodiments of the present application.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the device provided in the first aspect, the main processing circuit includes the first mapping circuit, at least one circuit of the multiple basic processing circuits includes the second mapping circuit, and the first mapping circuit and the second mapping circuit are used to perform compression processing on data in neural network operations.
The main processing circuit is configured to obtain an input data block, a weight data block and a multiplication instruction; to divide, according to the multiplication instruction, the input data block into a distribution data block and the weight data block into a broadcast data block; to determine, according to the operation control of the multiplication instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the distribution data block and/or the broadcast data block; and to send, according to the multiplication instruction, the processed first data block to at least one of the basic processing circuits connected to the main processing circuit.
The multiple basic processing circuits are configured to determine, according to the operation control of the multiplication instruction, whether to start the second mapping circuit to process a second data block, perform the operations in the neural network in parallel on the processed second data block to obtain operation results, and transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit; the second data block is the data block that the basic processing circuit determines to receive from the main processing circuit, and is associated with the processed first data block. The main processing circuit is configured to process the operation results to obtain the instruction result of the multiplication instruction.
In the device provided in the first aspect, when the first data block includes the distribution data block and the broadcast data block, the main processing circuit is specifically configured to start the first mapping circuit to process the distribution data block and the broadcast data block, obtaining a processed distribution data block with an identification data block associated with the distribution data block, and a processed broadcast data block with an identification data block associated with the broadcast data block; to split the processed distribution data block and the identification data block associated with the distribution data block into multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks; to distribute the multiple basic data blocks and their respective associated identification data blocks to the basic processing circuits connected to it; and to broadcast the processed broadcast data block and the identification data block associated with the broadcast data block to the basic processing circuits connected to it.
The basic processing circuit is configured to start the second mapping circuit to obtain a connection identification data block according to the identification data block associated with the basic data block and the identification data block associated with the broadcast data block; to process the basic data block and the broadcast data block according to the connection identification data block; to perform a product operation on the processed basic data block and the processed broadcast data block to obtain an operation result; and to send the operation result to the main processing circuit.
The identification data block may be represented in a direct index or step index manner; optionally, it may also be represented as a list of lists (LIL), a coordinate list (COO), compressed sparse row (CSR), compressed sparse column (CSC), ELLPACK (ELL), hybrid (HYB) or the like, which is not limited in this application.
Taking the direct index representation of the identification data block as an example, the identification data block may specifically be a data block composed of 0s and 1s, where 0 indicates that the absolute value of the corresponding data (e.g., a weight or an input neuron) contained in the data block is less than or equal to a first threshold, and 1 indicates that the absolute value is greater than the first threshold. The first threshold is set arbitrarily by the user side or the device side, for example 0.05 or 0.
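As an illustration of the direct index representation described above, the following sketch (a NumPy-based illustration, not part of the disclosed circuits; the function name and the example block are assumed) builds an identification data block from a data block and the first threshold:

    import numpy as np

    def make_identification_block(data_block: np.ndarray, first_threshold: float = 0.05) -> np.ndarray:
        # Direct index: 1 where |x| > first threshold, 0 otherwise
        return (np.abs(data_block) > first_threshold).astype(np.int8)

    basic_block = np.array([[1.0, 0.02],
                            [0.0, 0.5]])           # hypothetical 2x2 basic data block
    mask = make_identification_block(basic_block)   # [[1, 0], [0, 1]]

The same 0/1 block could equally be stored in the COO, CSR, CSC, ELL or HYB forms listed above; direct index is the densest to store but the simplest to index.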
To reduce the amount of transmitted data and improve data transmission efficiency, in the process of the main processing circuit sending data to the basic processing circuits, it may specifically distribute the target data in the multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks to the basic processing circuits connected to it; optionally, it may also broadcast the target data in the processed broadcast data block and the identification data block associated with the broadcast data block to the basic processing circuits connected to it. The target data refers to the data in a data block (specifically, the processed distribution data block or the processed broadcast data block) whose absolute values are greater than the first threshold, or to the non-zero data in the data block.
For example, suppose the distribution data block is a matrix with M1 rows and N1 columns and a basic data block is a matrix with M2 rows and N2 columns, where M1 > M2 and N1 > N2. Correspondingly, the identification data block associated with the distribution data block is likewise a matrix with M1 rows and N1 columns, and the identification data block associated with a basic data block is likewise a matrix with M2 rows and N2 columns. Taking a 2*2 basic data block and a first threshold of 0.05 as an example, the identification data block associated with the basic data block is the 2*2 matrix of 0s and 1s in which an element is 1 only where the absolute value of the corresponding element of the basic data block exceeds 0.05. The processing of data blocks by the first mapping circuit and the second mapping circuit is specifically addressed later.
In the device provided in the first aspect, when the first data block includes the distribution data block, the main processing circuit is specifically configured to start the first mapping circuit to process the distribution data block, obtaining a processed distribution data block and an identification data block associated with the distribution data block, or to start the first mapping circuit to process the distribution data block according to a pre-stored identification data block associated with the distribution data block, obtaining the processed distribution data block; to split the processed distribution data block and the identification data block associated with the distribution data block into multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks; to distribute the multiple basic data blocks and their respective associated identification data blocks to the basic processing circuits connected to it; and to broadcast the broadcast data block to the basic processing circuits connected to it.
The basic processing circuit is configured to start the second mapping circuit to process the broadcast data block according to the identification data block associated with the basic data block, perform a product operation on the processed broadcast data block and the basic data block to obtain an operation result, and send the operation result to the main processing circuit.
In an optional embodiment, the main processing circuit is further specifically configured to split the broadcast data block (or the processed broadcast data block) and the identification data block associated with the broadcast data block to obtain multiple partial broadcast data blocks and the identification data blocks respectively associated with the multiple partial broadcast data blocks, and to broadcast the multiple partial broadcast data blocks and their respective associated identification data blocks to the basic processing circuits in one or more broadcasts; the multiple partial broadcast data blocks combine to form the broadcast data block or the processed broadcast data block.
Correspondingly, the basic processing circuit is specifically configured to start the second mapping circuit to obtain a connection identification data block according to the identification data block associated with the partial broadcast data block and the identification data block associated with the basic data block; to process the partial broadcast data block and the basic data block according to the connection identification data block, obtaining a processed partial broadcast data block and a processed basic data block; and to perform an inner product operation on the processed partial broadcast data block and the processed basic data block.
The connection identification data block is obtained by performing an element-wise AND operation on the identification data block associated with the basic data block and the identification data block associated with the partial broadcast data block. Optionally, the connection identification data block is used to indicate the positions at which the data in both data blocks (specifically, the basic data block and the broadcast data block) have absolute values greater than the threshold. This is described in detail later.
For example, if the identification data block associated with the distribution data block is a 2*3 matrix and the identification data block associated with a partial broadcast data block is a 2*2 matrix, the corresponding connection identification data block is obtained by performing the element-wise AND operation over the corresponding positions of the two identification data blocks.
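A minimal sketch of forming the connection identification data block, assuming the two identification data blocks have the same shape (the example masks and the function name are hypothetical):

    import numpy as np

    def connection_identification(mask_a: np.ndarray, mask_b: np.ndarray) -> np.ndarray:
        # Element-wise AND of the two identification data blocks
        return mask_a & mask_b

    mask_basic     = np.array([[1, 0], [1, 1]], dtype=np.int8)   # from the basic data block
    mask_broadcast = np.array([[1, 1], [0, 1]], dtype=np.int8)   # from the partial broadcast data block
    conn = connection_identification(mask_basic, mask_broadcast) # [[1, 0], [0, 1]]

Only the positions where the connection identification data block is 1 need to take part in the subsequent inner product.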
In an optional embodiment, the main processing circuit is further specifically configured to split the broadcast data block into multiple partial broadcast data blocks and to broadcast the multiple partial broadcast data blocks to the basic processing circuits in one or more broadcasts; the multiple partial broadcast data blocks combine to form the broadcast data block or the processed broadcast data block.
Correspondingly, the basic processing circuit is specifically configured to process the partial broadcast data block according to the identification data block associated with the basic data block, obtaining a processed partial broadcast data block, and to perform an inner product operation on the basic data block and the processed partial broadcast data block.
In the device provided in the first aspect, when the first data block includes the broadcast data block, the main processing circuit is specifically configured to start the first mapping circuit to process the broadcast data block, obtaining a processed broadcast data block and an identification data block associated with the broadcast data block, or to start the first mapping circuit to process the broadcast data block according to a pre-stored identification data block associated with the broadcast data block, obtaining the processed broadcast data block; to split the distribution data block into multiple basic data blocks; to distribute the multiple basic data blocks to the basic processing circuits connected to it; and to broadcast the processed broadcast data block and the identification data block associated with the broadcast data block to the basic processing circuits connected to it.
The basic processing circuit is configured to start the second mapping circuit to process the basic data block according to the identification data block associated with the broadcast data block, obtaining a processed basic data block, perform a product operation on the processed basic data block and the processed broadcast data block to obtain an operation result, and send the operation result to the main processing circuit.
In an optional embodiment, the main processing circuit is further specifically configured to split the processed broadcast data block and the identification data block associated with the broadcast data block to obtain multiple partial broadcast data blocks and the identification data blocks respectively associated with the multiple partial broadcast data blocks, and to broadcast the multiple partial broadcast data blocks and their respective associated identification data blocks to the basic processing circuits in one or more broadcasts; the multiple partial broadcast data blocks combine to form the broadcast data block or the processed broadcast data block.
Correspondingly, the basic processing circuit is specifically configured to process the basic data block according to the identification data block associated with the partial broadcast data block, obtaining a processed basic data block, and to perform an inner product operation on the processed basic data block and the partial broadcast data block.
In the device provided in the first aspect, the basic processing circuit is specifically configured to perform a product operation on the basic data block and the broadcast data block to obtain a product result, accumulate the product result to obtain an operation result, and send the operation result to the main processing circuit; the main processing circuit is configured to accumulate the operation results to obtain an accumulated result and arrange the accumulated result to obtain the instruction result.
In the device provided in the first aspect, the main processing circuit is specifically configured to broadcast the broadcast data block (specifically, the broadcast data block or the processed broadcast data block) to the basic processing circuits connected to it in a single broadcast.
In the device provided in the first aspect, the basic processing circuit is specifically configured to perform inner product processing on the basic data block (likewise, the basic data block or the processed basic data block) and the broadcast data block to obtain an inner product result, accumulate the inner product result to obtain an operation result, and send the operation result to the main processing circuit.
In the device provided in the first aspect, the main processing circuit is configured to, when the operation result is an inner product result, accumulate the operation results to obtain an accumulated result and arrange the accumulated result to obtain the instruction result.
In the device provided in the first aspect, the main processing circuit is specifically configured to divide the broadcast data block into multiple partial broadcast data blocks and broadcast the multiple partial broadcast data blocks to the basic processing circuits in multiple broadcasts; the multiple partial broadcast data blocks combine to form the broadcast data block.
In the device provided in the first aspect, the basic processing circuit is specifically configured to perform inner product processing on the partial broadcast data block (specifically, the partial broadcast data block or the processed partial broadcast data block) and the basic data block to obtain an inner product result, accumulate the inner product result to obtain a partial operation result, and send the partial operation result to the main processing circuit. The inner product processing may specifically be: if the partial broadcast data block holds the first 2 elements of matrix B, namely b10 and b11, and the basic data block holds the first 2 elements of the first row of input data matrix A, namely a10 and a11, then the inner product = a10*b10 + a11*b11.
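A sketch of the partial inner product described above (the element values are hypothetical):

    # First two elements of row 1 of A and the first two elements of B
    a10, a11 = 0.7, 1.2
    b10, b11 = 0.4, 0.9
    partial_inner_product = a10 * b10 + a11 * b11  # accumulated later with the other partial results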
In the device provided in the first aspect, the basic processing circuit is specifically configured to reuse the partial broadcast data block n times, performing inner product operations between the partial broadcast data block and n basic data blocks to obtain n partial processing results, accumulate the n partial processing results respectively to obtain n partial operation results, and send the n partial operation results to the main processing circuit, where n is an integer greater than or equal to 2.
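A sketch of reusing one partial broadcast data block against n basic data blocks, as described above (the shapes, values and function name are assumed for illustration):

    import numpy as np

    def multiplexed_inner_products(part_broadcast, basic_blocks):
        # Reuse the same partial broadcast data block for every basic data block;
        # each group of inner products is accumulated into one partial operation result.
        return [float(np.dot(part_broadcast.ravel(), basic.ravel())) for basic in basic_blocks]

    part_broadcast = np.array([0.4, 0.9])                         # broadcast once
    basic_blocks = [np.array([0.7, 1.2]), np.array([0.3, 0.0])]   # n = 2 basic data blocks
    partial_results = multiplexed_inner_products(part_broadcast, basic_blocks)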
In the device provided in the first aspect, the main processing circuit includes a main register or a main on-chip cache circuit;
the basic processing circuit includes a basic register or a basic on-chip cache circuit.
In the device provided in the first aspect, the main processing circuit includes one of, or any combination of, a vector arithmetic unit circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit, the first mapping circuit and a data rearrangement circuit.
In the device provided in the first aspect, the device further includes multiple branch processing circuits; the main processing circuit is connected to each of the multiple branch processing circuits, and each branch processing circuit is connected to at least one basic processing circuit.
In the device provided in the first aspect, the basic processing circuit is further specifically configured to forward the broadcast data block and the basic data block to other basic processing circuits, which first perform data processing and then perform the inner product operation to obtain an operation result, and to send the operation result to the main processing circuit;
the main processing circuit is configured to process the operation results to obtain the instruction result of the operation instruction for the data block to be computed.
In the device provided in the first aspect, a data block may be represented as a tensor, specifically one of, or any combination of, a vector, a matrix, a three-dimensional data block, a four-dimensional data block and an n-dimensional data block.
In the device provided in the first aspect, if the operation instruction is a multiplication instruction, the main processing circuit determines that the multiplier data block is the broadcast data block and the multiplicand data block is the distribution data block;
if the operation instruction is a convolution instruction, the main processing circuit determines that the input data block is the broadcast data block and the convolution kernel is the distribution data block.
In the method provided in the sixth aspect, the neural network operations include one of, or any combination of, a convolution operation, a matrix-multiply-matrix operation, a matrix-multiply-vector operation, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation and an activation operation.
Referring to Fig. 1a, Fig. 1a is a structural schematic diagram of an integrated circuit chip device. As shown in Fig. 1a, the chip device includes a main processing circuit, basic processing circuits and (optionally) branch processing circuits. Specifically:
The main processing circuit may include a register and/or an on-chip cache circuit, and may further include a control circuit, a vector arithmetic unit circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, a DMA (Direct Memory Access) circuit and the like; in practical applications, the main processing circuit may also include other circuits such as a conversion circuit (e.g., a matrix transposition circuit), a data rearrangement circuit or an activation circuit.
Optionally, the main processing circuit may include a first mapping circuit. The first mapping circuit may be used to process the data to be received or sent, obtaining processed data and identification (mask) data associated with the data; the mask data indicate whether the absolute value of the data is greater than a preset threshold. Optionally, the mask may be 0 or 1, where 0 indicates that the absolute value of the data is less than or equal to the preset threshold and 1 indicates that the absolute value of the data is greater than the preset threshold. The preset threshold is set arbitrarily by the user side or the terminal device side, for example 0.1 or 0.05. In practical applications, through the first mapping circuit, data that is 0 or not greater than the preset threshold (e.g., 0.1) can be removed, or set to 0. The advantage of this is that it reduces the amount of data transmitted from the main processing circuit to the basic processing circuits, reduces the amount of computation of the basic processing circuits, and improves data processing efficiency. The present invention does not limit the specific form of the first mapping circuit. The specific implementation of the first mapping circuit is described below.
For example, the input data of the main processing circuit is a matrix data block; after processing by the first mapping circuit, a processed matrix data block and the identification data block associated with the matrix data block (a 0/1 block of the same shape) are obtained. The specific processing performed by the first mapping circuit is described in detail later.
Correspondingly, when the main processing circuit distributes data to a basic processing circuit, it may send only the two data 1 and 0.5 (the elements whose absolute values exceed the threshold in this example) instead of the 8 elements of the processed matrix data block; at the same time it also sends the identification data block associated with the matrix data block to the basic processing circuit, so that the basic processing circuit, from the received identification data block and the two received data (1 and 0.5), knows the positions of these two data in the original matrix data block. That is, the basic processing circuit can restore the processed matrix data block of the main processing circuit from the received identification data block and the received data.
The main processing circuit further includes a data transmitting circuit and a data receiving circuit or interface; a data distribution circuit and a data broadcasting circuit may be integrated in the data transmitting circuit, and in practical applications the data distribution circuit and the data broadcasting circuit may also be provided separately; the data transmitting circuit and the data receiving circuit may also be integrated to form a data transceiving circuit. Broadcast data is data that needs to be sent to every basic processing circuit. Distribution data is data that needs to be selectively sent to some of the basic processing circuits; the specific selection may be determined by the main processing circuit according to its load and the computation scheme. In the broadcast sending mode, the broadcast data is sent to every basic processing circuit in broadcast form (in practical applications, the broadcast data may be sent to every basic processing circuit in a single broadcast or in multiple broadcasts, and the number of broadcasts is not limited in the specific implementations of this application). In the distribution sending mode, the distribution data is selectively sent to some of the basic processing circuits.
Specifically, when distributing data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits (the data may be the same or different); specifically, if data is sent by distribution, the data received by each receiving basic processing circuit may be different, and of course some basic processing circuits may also receive the same data.
Specifically, when broadcasting data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and every basic processing circuit that receives data receives the same data.
Optionally, the vector arithmetic unit circuit of the main processing circuit can perform vector operations, including but not limited to addition, subtraction, multiplication and division of two vectors, addition, subtraction, multiplication and division of a vector and a constant, or any operation on each element of a vector. Continuous operations may specifically be addition, subtraction, multiplication or division of a vector and a constant, activation operations, accumulation operations, and the like.
Each basic processing circuit may include a basic register and/or a basic on-chip cache circuit; each basic processing circuit may further include one of, or any combination of, an inner product arithmetic unit circuit, a vector arithmetic unit circuit, an accumulator circuit and the like. The inner product arithmetic unit circuit, the vector arithmetic unit circuit and the accumulator circuit may be integrated circuits, or may be circuits provided separately.
Optionally, the chip device may further include one or more branch processing circuits. When branch processing circuits are present, the main processing circuit is connected to the branch processing circuits and the branch processing circuits are connected to the basic processing circuits; the inner product arithmetic unit circuit of a basic processing circuit is used to perform inner product operations between data blocks, the control circuit of the main processing circuit controls the data receiving circuit or the data transmitting circuit to transmit and receive external data, and the control circuit controls the data transmitting circuit to distribute external data to the branch processing circuits; a branch processing circuit is used to transmit and receive data of the main processing circuit or of the basic processing circuits. The structure shown in Fig. 1a is suitable for computation on complex data: because the number of units that the main processing circuit can connect to is limited, branch processing circuits need to be added between the main processing circuit and the basic processing circuits to provide access to more basic processing circuits, thereby enabling computation on complex data blocks. The connection structure between the branch processing circuits and the basic processing circuits can be arbitrary and is not limited to the H-shaped structure of Fig. 1a. Optionally, the structure from the main processing circuit to the basic processing circuits is a broadcast or distribution structure, and the structure from the basic processing circuits to the main processing circuit is a gather structure. Broadcast, distribution and gather are defined as follows: in a distribution or broadcast structure, the number of basic processing circuits is greater than the number of main processing circuits, i.e., one main processing circuit corresponds to multiple basic processing circuits, so the structure from the main processing circuit to the multiple basic processing circuits is a broadcast or distribution structure; conversely, the structure from the multiple basic processing circuits to the main processing circuit can be a gather structure.
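A behavioural sketch of the distribute/broadcast/gather pattern described above, modelling the main processing circuit as the caller and the basic processing circuits as the per-block computations (a simplification of the hardware; all names and shapes are assumed):

    import numpy as np

    def matmul_distribute_gather(A: np.ndarray, B: np.ndarray, num_basic: int) -> np.ndarray:
        row_blocks = np.array_split(A, num_basic, axis=0)  # distribution data blocks
        partials = [block @ B for block in row_blocks]     # each basic circuit multiplies its block by the broadcast B
        return np.vstack(partials)                         # gather the partial results at the main processing circuit

    A = np.random.rand(8, 4)
    B = np.random.rand(4, 3)
    assert np.allclose(matmul_distribute_gather(A, B, num_basic=4), A @ B)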
A basic processing circuit receives the data distributed or broadcast by the main processing circuit and saves it in the on-chip cache of the basic processing circuit; it can perform operations to produce results, and can send data to the main processing circuit. Optionally, a basic processing circuit may also first process the received data and save the processed data in the on-chip cache, or use the processed data to perform operations and produce results; optionally, it may also send the processed data to other basic processing circuits or to the main processing circuit, which is not limited in this application.
Optionally, each basic processing circuit may include a second mapping circuit, or the second mapping circuit may be configured in only some of the basic processing circuits. The second mapping circuit may be used to process (i.e., compress) the received or sent data. The present invention does not limit the specific form of the second mapping circuit. The specific implementation of the second mapping circuit is described in detail below.
Optionally, the vector arithmetic unit circuit of a basic processing circuit can perform a vector operation on two vectors (either or both of which may be processed vectors); in practical applications, the inner product arithmetic unit circuit of the basic processing circuit can perform an inner product operation on two vectors, and the accumulator circuit can also accumulate the results of the inner product operation.
In one optional scheme, the two vectors can be stored in the on-chip cache and/or registers, and the basic processing circuit can extract the two vectors to perform the operation as required by the actual computation. The operation includes but is not limited to inner product operations, multiplication operations, addition operations or other operations.
In one optional scheme, the result of the inner product operation can be accumulated into the on-chip cache and/or registers; the advantage of this optional scheme is that it reduces the amount of data transmitted between the basic processing circuit and the main processing circuit, improves operation efficiency and reduces data transmission power consumption.
In one optional scheme, the result of the inner product operation is transmitted directly as a result without accumulation; the advantage of this technical scheme is that it reduces the amount of computation inside the basic processing circuit and improves the operation efficiency of the basic processing circuit.
In one optional scheme, each basic processing circuit can perform inner product operations on multiple groups of two vectors, and can also accumulate the results of the multiple groups of inner product operations respectively;
In one optional scheme, the multiple groups of two vectors can be stored in the on-chip cache and/or registers;
In one optional scheme, the results of the multiple groups of inner product operations can be accumulated into the on-chip cache and/or registers respectively;
In one optional scheme, the result of each group of inner product operations can be transmitted directly as a result without accumulation;
In one optional scheme, each basic processing circuit can perform inner product operations between the same vector and multiple vectors respectively (a "one-to-many" inner product, i.e., for the two vectors of each group, one vector is shared by all the groups), and accumulate the inner product results corresponding to each vector respectively. With this technical scheme, the same set of weights can be computed repeatedly against different input data, which increases data reuse, reduces the amount of data transmitted inside the basic processing circuit, improves computation efficiency and reduces power consumption.
Specifically, among the data used to compute the inner products, the data sources of the vector shared by the groups and of the other vector of each group (i.e., the vector that differs between groups) can be different:
In one optional scheme, when computing the inner products, the vector shared by the groups comes from a broadcast or distribution by the main processing circuit or a branch processing circuit;
In one optional scheme, when computing the inner products, the vector shared by the groups comes from the on-chip cache;
In one optional scheme, when computing the inner products, the vector shared by the groups comes from a register;
In one optional scheme, when computing the inner products, the non-shared vector of each group comes from a broadcast or distribution by the main processing circuit or a branch processing circuit;
In one optional scheme, when computing the inner products, the non-shared vector of each group comes from the on-chip cache;
In one optional scheme, when computing the inner products, the non-shared vector of each group comes from a register;
In one optional scheme, when performing the inner product operations of multiple groups, any number of copies of the vector shared by the groups can be retained in the on-chip cache and/or registers of the basic processing circuit;
In one optional scheme, one copy of the shared vector can be retained for each group of inner products;
In one optional scheme, only a single copy of the shared vector can be retained;
Specifically, the results of the multiple groups of inner product operations can be accumulated into the on-chip cache and/or registers respectively;
Specifically, the result of each group of inner product operations can be transmitted directly as a result without accumulation;
In one optional scheme, the vectors or matrices involved in a basic processing circuit may be vectors or matrices that have been processed by the second mapping circuit, as described in detail later.
Referring to the structure shown in Fig. 1a, it includes one main processing circuit (which can perform vector operations) and multiple basic processing circuits (which can perform inner product operations). The benefit of this combination is that the device can not only use the basic processing circuits to perform matrix and vector multiplication operations, but can also use the main processing circuit to perform any other vector operation, so that under a limited hardware circuit configuration the device can complete more operations faster, reducing the number of data transmissions with the outside of the device, improving computation efficiency and reducing power consumption. In addition, this chip can provide the first mapping circuit in the main processing circuit to perform the processing of data in the neural network, for example removing first input data whose absolute value is less than or equal to the preset threshold, while also obtaining the corresponding mask data associated with the first input data; the mask data indicate whether the absolute value of the first input data is greater than the preset threshold. For details, refer to the description of the preceding embodiments, which is not repeated here. The advantage of this design is that it reduces the amount of data transmitted to the basic processing circuits, reduces the amount of computation of the basic processing circuits, improves the data processing rate and reduces power consumption.
The second mapping circuit can be provided in a basic processing circuit to perform the processing of data in the neural network, for example processing the second input data according to the mask data associated with the first input data, or selecting, according to the mask data associated with the first input data and the mask data associated with the second input data, the first input data and second input data whose absolute values are greater than the preset threshold and performing the corresponding arithmetic operations on them. The specific processing of data by the first mapping circuit and the second mapping circuit is described in detail below.
Optionally, the first mapping circuit and the second mapping circuit are both used to compress data, and may be designed into any one or more of the following circuits: the main processing circuit, the branch processing circuits, the basic processing circuits, and the like. In this way the amount of data to be computed can be reduced when performing neural network computation, and the chip can dynamically decide, according to the amount of computation (i.e., the load) of each circuit (mainly the main processing circuit and the basic processing circuits), which circuit performs the data compression processing; this reduces the complexity of the data computation and reduces power consumption, and the dynamic allocation of the data processing does not affect the computation efficiency of the chip. The allocation methods include, but are not limited to, load balancing, load minimum allocation and the like.
Referring to the device shown in Fig. 1b, the device shown in Fig. 1b is a computing device without branch processing circuits. The device shown in Fig. 1b includes a main processing circuit and N basic processing circuits, where the main processing circuit (whose specific structure is shown in Fig. 1c) and the N basic processing circuits can be connected directly or indirectly. When they are connected indirectly, one optional scheme may include N/4 branch processing circuits as shown in Fig. 1a, each branch processing circuit being connected to 4 basic processing circuits; for the circuits included in the main processing circuit and the N basic processing circuits, reference may be made to the description of Fig. 1a above, which is not repeated here. It should be noted here that the basic processing circuits may also be arranged inside the branch processing circuits, and that the number of basic processing circuits connected to each branch processing circuit is not limited to 4 and can be configured by the manufacturer according to actual needs. The first mapping circuit and the second mapping circuit can be designed separately into the above main processing circuit and N basic processing circuits: specifically, the main processing circuit may include the first mapping circuit while the N basic processing circuits, or some of them, include the second mapping circuit; alternatively, the main processing circuit may include both the first mapping circuit and the second mapping circuit; alternatively, the N basic processing circuits, or some of them, may include both the first mapping circuit and the second mapping circuit.
The main processing circuit can dynamically allocate the executing entity of the data compression processing step according to the neural network computation instruction; specifically, the main processing circuit can determine, according to its own load, whether to perform the compression processing step on the received data. More specifically, the load value can be divided into multiple intervals, each interval corresponding to an executing entity of the data compression processing step. Taking 3 intervals as an example: the load value of interval 1 is low, and the data compression processing step can be performed by the N basic processing circuits; the load value of interval 2 lies between those of interval 1 and interval 3, and the data compression processing step can be performed by the main processing circuit alone; the load value of interval 3 is high, and the data compression processing step can be performed jointly by the main processing circuit and the N basic processing circuits. This can be done in an explicit manner: for example, the main processing circuit can be configured with a special instruction or directive, and when a basic processing circuit receives the special instruction or directive it determines to perform the data compression processing step, whereas when it does not receive the special instruction or directive it determines not to perform the data compression processing step. It can also be done in an implicit manner: for example, when a basic processing circuit receives sparse data (i.e., data containing 0s, or containing more than a preset number of values smaller than a preset threshold) and determines that an inner product operation needs to be performed, it compresses the sparse data.
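A sketch of the load-interval dispatch described above, assuming the load is expressed as a single value and using illustrative interval boundaries (the thresholds and the function name are assumptions, not part of the disclosure):

    def choose_compression_executor(load: float, low: float = 0.3, high: float = 0.7) -> str:
        # Interval 1 (low load): compression performed by the N basic processing circuits
        # Interval 2 (middle load): compression performed by the main processing circuit alone
        # Interval 3 (high load): compression performed jointly by the main and basic processing circuits
        if load < low:
            return "basic processing circuits"
        if load < high:
            return "main processing circuit"
        return "main and basic processing circuits jointly"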
The data compression processing involved in this application is specifically performed in the first mapping circuit and the second mapping circuit described above. It should be understood that, since a neural network is an algorithm with a high amount of computation and a high amount of memory access, the more weights there are, the more the amounts of computation and memory access increase. In particular, in the case of small weights (for example 0, or weights smaller than a set value), compression processing needs to be performed on these small-weight data in order to increase the computation rate and reduce overhead. In practical applications, data compression processing is most effective when applied in sparse neural networks, for example reducing the workload of data computation, reducing data overhead and increasing the data computation rate.
Taking the input data as an example, the specific embodiments involved in the data compression processing are described below. The input data includes, but is not limited to, at least one input neuron and/or at least one weight.
In the first embodiment:
After the first mapping circuit receives the first input data sent by the main processing circuit (specifically, a data block to be computed, such as the distribution data block or the broadcast data block), the first mapping circuit can process the first input data to obtain the processed first input data and the mask data associated with the first input data, where the mask data are used to indicate whether the absolute value of the first input data is greater than a first threshold, for example 0.5 or 0.
Specifically, when the absolute value of the first input data is greater than the first threshold, the input data is retained; otherwise, the first input data is deleted or set to 0. For example, the input is a matrix data block and the first threshold is 0.05; after processing by the first mapping circuit, the processed matrix data block is obtained together with the identification data block (also called the mask matrix) associated with the matrix data block, in which each element is 1 where the absolute value of the corresponding element of the matrix data block exceeds 0.05 and 0 otherwise.
Further, to reduce the amount of transmitted data, when the main processing circuit distributes data to the basic processing circuits connected to it, it may transmit only the target data in the processed matrix data block (in this example 1, 0.06 and 0.5) together with the identification data block associated with the matrix data block. In specific implementations, the main processing circuit may distribute the target data in the processed matrix data block to the basic processing circuits according to a set rule, for example in row order or in column order, which is not limited in this application. Correspondingly, after a basic processing circuit receives the target data and the identification data block associated with the target data, it restores the processed matrix data block according to the set rule (e.g., row order). In this example, the basic processing circuit, from the received data (1, 0.06 and 0.5) and the identification data block, can determine the matrix data block to which these data correspond (i.e., the matrix data block processed by the first mapping circuit in the main processing circuit).
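A sketch of this first embodiment, using a hypothetical 2x4 input matrix whose only above-threshold elements are 1, 0.06 and 0.5 (the matrix, function names and row-order convention are assumed for illustration):

    import numpy as np

    FIRST_THRESHOLD = 0.05

    def compress(block: np.ndarray):
        # First mapping circuit: keep only the target data and emit the identification data block
        mask = (np.abs(block) > FIRST_THRESHOLD).astype(np.int8)
        target_data = block[mask.astype(bool)]   # read out in row order
        return target_data, mask

    def restore(target_data: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Basic processing circuit: rebuild the processed block from the target data and the mask (row order)
        block = np.zeros(mask.shape, dtype=target_data.dtype)
        block[mask.astype(bool)] = target_data
        return block

    matrix_block = np.array([[1.0, 0.03, 0.06, 0.0],
                             [0.01, 0.5, 0.02, 0.04]])
    target, mask = compress(matrix_block)   # target = [1.0, 0.06, 0.5]
    restored = restore(target, mask)        # below-threshold positions come back as 0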
In the embodiments of the present invention, the first input data may be the distribution data block and/or the broadcast data block.
Correspondingly, the second mapping circuit processes the second input data using the identification data associated with the first input data, to obtain the processed second input data, where the first input data is different from the second input data. For example, when the first input data is at least one weight, the second input data may be at least one input neuron; or, when the first input data is at least one input neuron, the second input data may be at least one weight.
In the embodiments of the present invention, the second input data is different from the first input data; the second input data may be any one of the following: a distribution data block, a basic data block, a broadcast data block and a partial broadcast data block.
For example, when the first input data is the distribution data block, the second input data is a partial broadcast data block. Suppose the second input data is a matrix data block; processing it with the mask matrix of the above example yields the processed partial broadcast data block, in which the elements at the positions where the mask is 0 are removed or set to 0. Since, in practical applications, the dimensions of the matrix data blocks involved in the input data are large, the above is only for illustration and does not constitute a limitation.
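A sketch of this step, assuming the mask obtained for the first input data is applied to a second input data block of the same shape (the values and function name are illustrative):

    import numpy as np

    def apply_mask(second_input: np.ndarray, first_input_mask: np.ndarray) -> np.ndarray:
        # Second mapping circuit: zero the second input wherever the first input's mask is 0
        return second_input * first_input_mask

    first_input_mask = np.array([[1, 0], [0, 1]], dtype=np.int8)   # from the distribution data block
    part_broadcast = np.array([[0.8, 0.2],
                               [0.4, 0.9]])
    processed_broadcast = apply_mask(part_broadcast, first_input_mask)  # [[0.8, 0.0], [0.0, 0.9]]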
In the second embodiment:
The first mapping circuit can be used to process the first input data and the second input data, to obtain the processed first input data with the first identification (mask) data associated with the first input data, and the processed second input data with the second identification (mask) data associated with the second input data. The first mask data or the second mask data are used to indicate whether the absolute value of the first or second input data is greater than a second threshold; the second threshold is set arbitrarily by the user side or the device side, for example 0.05 or 0.
The processed first input data or second input data may be compressed input data, or may be the input data before processing. For example, the first input data is the distribution data block, such as the matrix data block in the example above; after processing by the first mapping circuit, a processed distribution data block is obtained, and the processed distribution data block here may be the original matrix data block or may be the compressed matrix data block. It should be understood that, to reduce the amount of transmitted data and improve the data processing efficiency in the basic processing circuits, the processed input data (e.g., the processed basic data block or partial broadcast data block) should preferably be the compressed data. Preferably, the data sent by the main processing circuit to the basic processing circuits is specifically the target data in the processed input data, where the target data may be the data whose absolute values are greater than the preset threshold, or may be the non-zero data.
Correspondingly, in the basic processing circuit, the second mapping circuit can obtain connection identification data according to the first identification data associated with the first input data and the second identification data associated with the second input data; the connection identification data are used to indicate the positions at which both the first input data and the second input data have absolute values greater than a third threshold, where the third threshold is set arbitrarily by the user side or the device side, for example 0.05 or 0. Further, the second mapping circuit can process the received first input data and second input data respectively according to the connection identification data, to obtain the processed first input data and the processed second input data.
For example, the first input data and the second input data are both matrix data blocks. After processing by the first mapping circuit, the first identification data block associated with the first input data and the processed first input data block are obtained, and correspondingly the second identification data block associated with the second input data and the processed second input data block are obtained. Correspondingly, to improve the data transmission rate, the main processing circuit may send to the basic processing circuit only the target data in the processed first input data block (in this example 1, 0.06 and 0.5) together with the first identification data block associated with the first input data block, and at the same time send the target data in the processed second input data block (in this example 1, 1.1, 0.6, 0.3 and 0.5) together with the second identification data block associated with the second input data block.
Correspondingly, based process circuit, can be by the second mapping circuit to above-mentioned first mark after receiving above-mentioned data Know data block and second identifier data block carries out obtaining connection identifier data block by element and operationAccordingly Ground, the second mapping circuit is using the connection identifier data block respectively to treated first input block and treated Second input block is respectively processed, to obtain, treated that the first input block isPlace The second input block after reason isIt wherein, can be according to first identifier data block in based process circuit And the target data in received first data block, determine that the first data block where the target data is corresponding (is passed through First mapping circuit treated the first data block);Correspondingly, according to second identifier data block and received second data block In target data, (i.e. by the first mapping circuit, treated for the second data block for determining where the target data is corresponding Second data block);Then, after the second mapping circuit knows connection identifier data block, distinguished using the connection identifier data block It carries out with the first determining data block and the second data block determined by element and operation, to obtain at via the second mapping circuit The first data block after reason and treated the second data block.
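A minimal sketch of the element-wise AND step described above is given below, assuming both identifier (mask) blocks are 0/1 arrays of the same shape; the function name connection_mask and the numeric values are made up here for illustration and are not the matrices of the original example.

```python
# Sketch of the second embodiment's connection-identifier step.
import numpy as np

def connection_mask(mask_a: np.ndarray, mask_b: np.ndarray) -> np.ndarray:
    """Element-wise AND of two identifier data blocks."""
    return mask_a & mask_b

mask_first  = np.array([[1, 0], [1, 1]])   # first identifier data block
mask_second = np.array([[1, 1], [0, 1]])   # second identifier data block
conn = connection_mask(mask_first, mask_second)   # positions kept by BOTH inputs

first_block  = np.array([[1.0, 0.0], [0.06, 0.5]])
second_block = np.array([[1.1, 0.6], [0.0, 0.3]])
first_proc  = first_block  * conn   # processed first input data block
second_proc = second_block * conn   # processed second input data block
```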
In the third embodiment:
The first mapping circuit may not be provided in the main processing circuit; instead, the main processing circuit may send the third input data, together with the prestored third mark data associated with the third input data, to the basic processing circuit connected to it. The second mapping circuit is provided in the basic processing circuit. A specific embodiment of the data compression performed by the second mapping circuit is described below.
It should be understood that the third input data includes, but is not limited to, a basic data block, a part broadcast data block, a broadcast data block, and so on. Similarly, in a neural network processor the third input data may also be at least one weight and/or at least one input neuron; this application does not limit it.
In the basic processing circuit, the second mapping circuit may process the third input data according to the received third mark data associated with the third input data, to obtain the processed third input data, so that subsequent operations, such as an inner product operation, can be executed on the processed third input data.
For example, the third input data received by the second mapping circuit is a matrix data block, and correspondingly the prestored third identification data block associated with the third input data (also a mask matrix data block) is provided. Further, the second mapping circuit processes the third input data block according to the third identification data block to obtain the processed third input data block.
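As a sketch only of the behaviour just described, the fragment below shows a basic processing circuit compressing an incoming block with a prestored identifier block; prestored_mask, second_mapping, and the values are illustrative assumptions.

```python
# Sketch of the third embodiment: compress the third input data with a prestored mask.
import numpy as np

prestored_mask = np.array([[1, 1, 0],
                           [0, 1, 1]])      # third identification data block

def second_mapping(third_input: np.ndarray) -> np.ndarray:
    # keep only the elements the prestored mask marks as valid
    return third_input * prestored_mask

third_input = np.array([[0.7, 0.2, 0.0],
                        [0.0, 0.9, 0.4]])
processed_third = second_mapping(third_input)
```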
In addition, the input neurons and output neurons mentioned in the embodiments of the present invention do not mean the neurons in the input layer and the output layer of the entire neural network. For any two adjacent layers in the network, the neurons in the lower layer of the network feed-forward operation are the input neurons, and the neurons in the upper layer of the network feed-forward operation are the output neurons. Taking a convolutional neural network as an example, suppose a convolutional neural network has L layers, with K = 1, 2, 3, ..., L-1; for the K-th layer and the (K+1)-th layer, the K-th layer is called the input layer and the neurons in it are the above input neurons, while the (K+1)-th layer is called the output layer and the neurons in it are the above output neurons. That is, except for the top layer, every layer can serve as an input layer, and the next layer is the corresponding output layer.
In the fourth embodiment:
No mapping circuit is provided in the main processing circuit; the first mapping circuit and the second mapping circuit are provided in the basic processing circuit. For the data processing of the first mapping circuit and the second mapping circuit, reference may be made to the first to third embodiments described above, which will not be repeated here.
Optionally, there is also a fifth embodiment. In the fifth embodiment, no mapping circuit is provided in the basic processing circuit; the first mapping circuit and the second mapping circuit are both arranged in the main processing circuit. For the data processing of the first mapping circuit and the second mapping circuit, reference may be made to the first to third embodiments described above, which will not be repeated here. That is, the compression of the data is completed in the main processing circuit, and the processed input data are sent to the basic processing circuits, so that the basic processing circuits execute the arithmetic operations using the processed input data (specifically, the processed neurons and the processed weights).
The concrete structure of the mapping circuit involved in this application is described below. Fig. 4a and Fig. 4b show two possible mapping circuits. The mapping circuit shown in Fig. 4a includes a comparator and selectors; this application does not limit the number of comparators and selectors. Fig. 4a shows one comparator and two selectors, where the comparator is used to determine whether the input data meet a preset condition. The preset condition may be user-defined or device-defined, for example, in this application, that the absolute value of the input data is greater than or equal to a preset threshold. If the preset condition is met, the comparator determines that the input data are allowed to be output, and the mark data associated with the input data is 1; otherwise it determines that the input data are not output, or the input data default to 0, and correspondingly the mark data associated with the input data is 0. That is, after the comparator, the mark data associated with the input data are known.
Further, after the comparator has judged the preset condition on the input data, the obtained mark data may be fed into a selector, so that the selector uses the mark data to decide whether to output the corresponding input data, i.e., to obtain the processed input data.
Taking a matrix data block as the input data, as shown in Fig. 4a the comparator can judge the preset condition for each datum in the matrix data block, thereby obtaining the identification data block (mask matrix) associated with the matrix data block. Further, the first selector screens the matrix data block using the identification data block: the data in the matrix data block whose absolute values are greater than or equal to the preset threshold (i.e., meet the preset condition) are retained and the remaining data are deleted, so as to output the processed matrix data block. Optionally, the second selector uses the identification data block to process other input data (such as a second matrix data block), for example by an element-wise AND operation, retaining the data in the second matrix data block whose absolute values are greater than or equal to the preset threshold, so as to output the processed second matrix data block.
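A rough functional model of the Fig. 4a structure is sketched below, assuming the preset condition is |x| >= threshold; the comparator emits the mark data and both selectors reuse it. The function names comparator and selector and the values are illustrative assumptions, not the circuit itself.

```python
# Functional sketch of Fig. 4a: one comparator, two selectors sharing one mask.
import numpy as np

def comparator(block: np.ndarray, threshold: float) -> np.ndarray:
    """Mark data: 1 where |x| >= threshold, else 0."""
    return (np.abs(block) >= threshold).astype(np.int8)

def selector(block: np.ndarray, mark: np.ndarray) -> np.ndarray:
    """Output only the marked elements; the rest are dropped (set to 0 here)."""
    return block * mark

threshold = 0.05
first_matrix  = np.array([[0.8, 0.01], [0.0, 0.3]])
second_matrix = np.array([[0.2, 0.9 ], [0.4, 0.0]])

mark = comparator(first_matrix, threshold)        # identification data block
out1 = selector(first_matrix, mark)               # first selector output
out2 = selector(second_matrix, mark)              # second selector reuses the same mark
```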
It should be understood that, corresponding to the first and second embodiments above, the specific structure of the first mapping circuit may include at least one comparator and at least one selector, such as the comparator and first selector of Fig. 4a in the example above; the specific structure of the second mapping circuit may include one or more selectors, such as the second selector of Fig. 4a in the example above.
Fig. 4b shows the structural schematic of another mapping circuit. As in Fig. 4b, this mapping circuit includes selectors; the number of selectors is not limited and may be one or more. Specifically, the selector is used to select from the input data according to the mark data associated with the input data: the data in the input data whose absolute values are greater than or equal to the preset threshold are output, and the remaining data are deleted or not output, to obtain the processed input data.
Taking a matrix data block as the input data, the matrix data block and its associated identification data block are input to the mapping circuit; the selector selects from the matrix data block according to the identification data block, outputs the data whose absolute values are greater than or equal to the threshold, and does not output the remaining data, thereby outputting the processed matrix data block.
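The following sketch models the Fig. 4b selector, here shown producing a compacted output: only the elements selected by the identification data block are emitted, the rest are simply not output. The function name selector_only and the values are assumptions for illustration.

```python
# Functional sketch of Fig. 4b: selector only, mask supplied with the input.
import numpy as np

def selector_only(block: np.ndarray, mark: np.ndarray) -> np.ndarray:
    """Return just the selected elements, in row-major order."""
    return block[mark.astype(bool)]

block = np.array([[0.6, 0.0, 0.2],
                  [0.1, 0.7, 0.0]])
mark  = np.array([[1, 0, 1],
                  [0, 1, 0]])       # identification data block
kept = selector_only(block, mark)   # array([0.6, 0.2, 0.7])
```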
It should be understood that the structure shown in Fig. 4b can be applied to the second mapping circuit of the third embodiment above, i.e., the specific structure of the second mapping circuit in the third embodiment may include at least one selector. Similarly, the first mapping circuit and the second mapping circuit designed in the main processing circuit and the basic processing circuits may be formed by cross-combining or splitting the functional components shown in Fig. 4a and Fig. 4b; this application does not limit this.
A method for realizing computation with the device that performs the forward operation of the neural network shown in Fig. 1a is provided below. The computation may specifically be a neural network calculation, such as the training of the network. In practical applications, the forward operation may, depending on the input data, execute operations such as matrix-times-matrix, convolution, activation, and transform operations, all of which can be realized with the device shown in Fig. 1a.
The first mapping circuit of the main processing circuit first compresses the data, and the control circuit then transfers them to the basic processing circuits for operation. For example, the first mapping circuit of the main processing circuit may compress the data before they are transmitted to the basic processing circuits; the advantage is that the amount of transmitted data and the total number of transmitted bits are reduced, and the basic processing circuit executes the data operations more efficiently and with lower power consumption.
The main processing circuit transfers the data to be calculated to all or some of the basic processing circuits. Taking matrix-times-vector computation as an example, the control circuit of the main processing circuit may split the matrix data so that each column is one basic datum; for example an m*n matrix can be split into n vectors of m rows, and the control circuit of the main processing circuit distributes the n split vectors of m rows to multiple basic processing circuits. For the vector, the control circuit of the main processing circuit may broadcast the whole vector to each basic processing circuit. If m is large, the control circuit may first split the m*n matrix into x*n vectors. Taking x = 2 as an example, the matrix is split into 2n vectors, each containing m/2 rows; that is, each of the n vectors of m rows is divided into 2 vectors. Taking the first of these vectors as an example, if it has 1000 rows, dividing it into 2 vectors may mean forming a first vector from the first 500 rows and a second vector from the last 500 rows, and the control circuit broadcasts the 2 vectors to multiple basic processing circuits in 2 broadcasts.
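A minimal sketch of the column-wise split just described is given below, with made-up sizes; only the data reshaping is modelled, not the control circuit itself, and the function name split_matrix is an assumption.

```python
# Sketch: split an m x n matrix into x*n column vectors of m/x rows each.
import numpy as np

def split_matrix(matrix: np.ndarray, x: int = 1):
    """x=1 keeps whole columns; x=2 halves each column, and so on."""
    m, n = matrix.shape
    assert m % x == 0, "m must be divisible by x in this sketch"
    pieces = []
    for col in range(n):
        column = matrix[:, col]
        pieces.extend(np.split(column, x))   # x pieces of m/x rows per column
    return pieces

matrix = np.arange(8.0).reshape(4, 2)        # m=4, n=2
whole_columns = split_matrix(matrix, x=1)    # 2 vectors of 4 rows
halves        = split_matrix(matrix, x=2)    # 4 vectors of 2 rows each
```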
The mode of data transmission may be broadcast, distribution, or any other possible transmission mode;
after the basic processing circuit receives the data, it first processes the data through the second mapping circuit and then executes the operation to obtain the operation result;
the basic processing circuit transmits the operation result back to the main processing circuit;
the operation result may be an intermediate calculation result or the final operation result.
The device shown in Fig. 1a can also complete a tensor-times-tensor operation; a tensor is the same as the data block described previously, and may be a matrix, a vector, a three-dimensional data block, a four-dimensional data block, or a higher-dimensional data block, or any combination thereof. Fig. 2 and Fig. 2b below show the concrete implementations of the matrix-times-vector and matrix-times-matrix operations respectively.
The matrix-times-vector operation is completed with the device shown in Fig. 1a. (Matrix-times-vector can be computed by taking the inner product of each row of the matrix with the vector and placing the results into a vector in the order of the corresponding rows.)
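A plain reference computation for the row-wise definition in the parenthesis above is sketched below; the function name matvec_by_rows and the values are illustrative assumptions.

```python
# Reference: matrix-times-vector as one inner product per row.
import numpy as np

def matvec_by_rows(S: np.ndarray, P: np.ndarray) -> np.ndarray:
    return np.array([np.dot(row, P) for row in S])   # one inner product per row

S = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # M=2 rows, L=3 columns
P = np.array([1.0, 0.0, 2.0])          # length L
result = matvec_by_rows(S, P)          # array([ 7., 16.])
```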
The operation of multiplying a matrix S of size M rows by L columns with a vector P of length L is described below, as shown in Fig. 2a (each row of the matrix S has the same length as the vector P, and their data correspond position by position); the neural network computing device has K basic processing circuits:
Referring to Fig. 2, Fig. 2 provides an implementation method of matrix-times-vector, which may specifically include:
Step S201: the control circuit of the main processing circuit distributes each row of data of the matrix S to one of the K basic processing circuits; the basic processing circuit stores the received distribution data in its on-chip cache and/or registers;
In an optional scheme, the data of the matrix S are processed data. Specifically, the main processing circuit enables the first mapping circuit to process the matrix S, obtaining the processed matrix S and the first identifier (mask) matrix associated with the matrix S. Alternatively, the first mapping circuit of the main processing circuit processes the matrix S according to the prestored first mask matrix associated with the matrix S, obtaining the processed matrix S. Further, the control circuit sends each row of data in the processed matrix S, together with the mark data of the corresponding row in the first mask matrix, to one or more of the K basic processing circuits. When the main processing circuit sends data to a basic processing circuit, it may specifically send the data in the processed matrix S whose absolute values are greater than the preset threshold, or the non-zero data, so as to reduce the amount of transmitted data. For example, the set of rows of the processed matrix S distributed to the i-th basic processing circuit is denoted Ai and has Mi rows in total; correspondingly, an identifier matrix Bi corresponding to Ai is also distributed, where Bi is a part of the first mask matrix and has at least Mi rows.
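A sketch of this distribution step is given below, assuming a simple round-robin assignment of the rows of the processed matrix S (and the matching rows of the first mask matrix) to K basic processing circuits; the real scheduling policy is not specified here, and the function name distribute_rows is an assumption.

```python
# Sketch: distribute rows of the processed matrix S and their mask rows to K circuits.
import numpy as np

def distribute_rows(S_processed: np.ndarray, mask: np.ndarray, K: int):
    """Return, per circuit i, the row set Ai and its associated mask rows Bi."""
    assignments = [([], []) for _ in range(K)]
    for r in range(S_processed.shape[0]):
        rows, masks = assignments[r % K]
        rows.append(S_processed[r])
        masks.append(mask[r])
    return [(np.array(rows), np.array(masks)) for rows, masks in assignments]

S_processed = np.random.rand(6, 4)                       # M=6 rows after the first mapping circuit
mask        = (S_processed > 0.05).astype(np.int8)       # first mask matrix
per_circuit = distribute_rows(S_processed, mask, K=3)    # (Ai, Bi) for i = 0..2
```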
In an optional scheme, if the number of rows M of the matrix S satisfies M ≤ K, the control circuit of the main processing circuit distributes one row of the matrix S to each of the K basic processing circuits; optionally, it also sends the mark data of the row in the first identifier matrix corresponding to that row;
in an optional scheme, if the number of rows M of the matrix S satisfies M > K, the control circuit of the main processing circuit distributes the data of one or more rows of the matrix S to each basic processing circuit; optionally, it also sends the mark data of the rows in the first identifier matrix corresponding to those one or more rows;
the set of rows of S distributed to the i-th basic processing circuit is denoted Ai and has Mi rows in total; Fig. 2c shows the calculation to be executed on the i-th basic processing circuit.
In an optional scheme, in each basic processing circuit, for example the i-th basic processing circuit, the received distribution data such as the matrix Ai may be stored in the registers and/or on-chip cache of the i-th basic processing circuit; the advantage is that the amount of subsequently transmitted distribution data is reduced, computational efficiency is improved, and power consumption is reduced.
Step S202: the control circuit of the main processing circuit transfers each part of the vector P to the K basic processing circuits in a broadcast manner;
In an optional scheme, the data (each part) of the vector P may be processed data. Specifically, the main processing circuit enables the first mapping circuit to process the vector P, obtaining the processed vector P and the second identifier (mask) matrix associated with the vector P. Alternatively, the first mapping circuit of the main processing circuit processes the vector P according to the prestored second mask matrix associated with the vector P, obtaining the processed vector P. Further, the control circuit sends the data (i.e., each part) of the processed vector P, together with the corresponding mark data in the second mask matrix, to one or more of the K basic processing circuits. When the main processing circuit sends data to a basic processing circuit, it may specifically send the data in the processed vector P whose absolute values are greater than the preset threshold, or the non-zero data, so as to reduce the amount of transmitted data.
In an optional scheme, the control circuit of the main processing circuit may broadcast each part of the vector P only once to the registers or on-chip cache of each basic processing circuit, and the i-th basic processing circuit fully multiplexes the data of the vector P obtained this time, completing the inner product operations corresponding to every row of the matrix Ai. The advantage is that the amount of vector P data repeatedly transmitted from the main processing circuit to the basic processing circuits is reduced, execution efficiency is improved, and transmission power consumption is reduced.
In an optional scheme, the control circuit of the main processing circuit may broadcast each part of the vector P to the registers or on-chip cache of each basic processing circuit multiple times, and the i-th basic processing circuit does not multiplex the data of the vector P obtained each time, completing the inner product operations corresponding to every row of the matrix Ai in several passes; the advantage is that the amount of vector P data transmitted in a single transfer inside the basic processing circuit is reduced, the capacity of the basic processing circuit cache and/or registers can be reduced, execution efficiency is improved, transmission power consumption is reduced, and cost is reduced.
In an optional scheme, the control circuit of the main processing circuit may broadcast each part of the vector P to the registers or on-chip cache of each basic processing circuit multiple times, and the i-th basic processing circuit partially multiplexes the data of the vector P obtained each time, completing the inner product operations corresponding to every row of the matrix Ai; the advantage is that the amount of data transmitted from the main processing circuit to the basic processing circuits is reduced, the amount of data transmitted inside the basic processing circuit is also reduced, execution efficiency is improved, and transmission power consumption is reduced.
Step S203: the inner product operator circuits of the K basic processing circuits compute the inner products of the data of the matrix S and the vector P; for example, the i-th basic processing circuit computes the inner products of the data of the matrix Ai and the data of the vector P;
In a specific embodiment, the basic processing circuit receives the data of the processed matrix S and the mark data of the corresponding rows in the first mask matrix, and also receives the data of the processed vector P. Correspondingly, the basic processing circuit enables the second mapping circuit to process the received data of the vector P according to the received mark data in the first mask matrix, obtaining the processed data of the vector P. Further, the basic processing circuit enables the inner product operator circuit to perform inner product operations on the data of the received processed matrix S and the data of the processed vector P, obtaining the results of the inner product operations. For example, the i-th basic processing circuit receives the matrix Ai, the identifier matrix Bi associated with Ai, and the vector P; it may then enable the second mapping circuit to process the vector P with Bi to obtain the processed vector P, and then enable the inner product operator circuit to perform the inner product operations on the matrix Ai and the processed vector P.
In a specific embodiment, the basic processing circuit receives the data of the processed vector P and the mark data of the corresponding parts in the second mask matrix, and also receives the data of the processed matrix S. Correspondingly, the basic processing circuit enables the second mapping circuit to process the received data of the matrix S according to the received mark data in the second mask matrix, obtaining the processed data of the matrix S. Further, the basic processing circuit enables the inner product operator circuit to perform inner product operations on the data of the received processed vector P and the data of the processed matrix S, obtaining the results of the inner product operations. For example, the i-th basic processing circuit receives the matrix Ai, the processed vector P, and the second identifier matrix associated with the vector P; it may then enable the second mapping circuit to process Ai with the second identifier matrix to obtain the processed matrix Ai, and then enable the inner product operator circuit to perform the inner product operations on the processed matrix Ai and the processed vector P.
In a specific embodiment, the basic processing circuit receives the data of the processed matrix S and the mark data of the corresponding rows in the first mask matrix, and also receives the data of the processed vector P and the mark data of the corresponding parts in the second mask matrix. Correspondingly, the basic processing circuit enables the second mapping circuit to obtain a relation identifier matrix from the received mark data in the first mask matrix and the mark data in the second mask matrix; it then uses the mark data in the relation identifier matrix to process the received data of the matrix S and the data of the vector P respectively, obtaining the processed data of the matrix S and the processed vector P. Further, the inner product operator circuit is enabled to perform inner product operations on the data of the processed matrix S and the data of the processed vector P, obtaining the results of the inner product operations. For example, the i-th basic processing circuit receives the matrix Ai, the identifier matrix Bi associated with Ai, the vector P, and the second identifier matrix associated with the vector P; it may then enable the second mapping circuit to obtain the relation identifier matrix from Bi and the second identifier matrix, and use the relation identifier matrix to process the matrix Ai and the vector P simultaneously or separately, obtaining the processed matrix Ai and the processed vector P. Then the inner product operator circuit is enabled to perform the inner product operations on the processed matrix Ai and the processed vector P.
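A sketch of one basic processing circuit in step S203 (the first of the three variants above) is given below: it receives the processed rows Ai, their mask rows Bi, and the broadcast vector P, filters P with each mask row, and forms one inner product per row. The function name basic_circuit_matvec and the values are assumptions for illustration.

```python
# Sketch: per-circuit matrix-times-vector step with mask-based filtering of P.
import numpy as np

def basic_circuit_matvec(Ai: np.ndarray, Bi: np.ndarray, P: np.ndarray) -> np.ndarray:
    results = []
    for row, mask_row in zip(Ai, Bi):
        P_proc = P * mask_row                 # second mapping circuit: keep masked positions
        results.append(np.dot(row, P_proc))   # inner product operator circuit
    return np.array(results)

Ai = np.array([[1.0, 0.0, 0.5],
               [0.2, 0.8, 0.0]])
Bi = np.array([[1, 0, 1],
               [1, 1, 0]])
P  = np.array([2.0, 3.0, 4.0])
partial = basic_circuit_matvec(Ai, Bi, P)     # array([4. , 2.8])
```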
Step S204: the accumulator circuits of the K basic processing circuits accumulate the results of the inner product operations to obtain accumulation results, and the accumulation results are transmitted back to the main processing circuit in fixed-point form.
In an optional scheme, each basic processing circuit may transmit the partial sums obtained from the inner product operations (a partial sum is a part of the accumulation result; for example, if the accumulation result is F1*G1 + F2*G2 + F3*G3 + F4*G4 + F5*G5, a partial sum may be the value of F1*G1 + F2*G2 + F3*G3) back to the main processing circuit for accumulation; the advantage is that the amount of computation inside the basic processing circuit is reduced and the operating efficiency of the basic processing circuit is improved.
In an optional scheme, the partial sums obtained from the inner product operations executed by each basic processing circuit may also be stored in the registers and/or on-chip cache of the basic processing circuit and transmitted back to the main processing circuit after the accumulation is finished; the advantage is that the amount of data transmitted between the basic processing circuits and the main processing circuit is reduced, operating efficiency is improved, and data transmission power consumption is reduced.
In an optional scheme, the partial sums obtained from the inner product operations executed by each basic processing circuit may, in some cases, be stored in the registers and/or on-chip cache of the basic processing circuit for accumulation and, in other cases, be transferred to the main processing circuit for accumulation, and transmitted back to the main processing circuit after the accumulation is finished; the advantage is that the amount of data transmitted between the basic processing circuits and the main processing circuit is reduced, operating efficiency is improved, data transmission power consumption is reduced, the amount of computation inside the basic processing circuit is reduced, and the operating efficiency of the basic processing circuit is improved.
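Two of the accumulation choices above are sketched side by side below: either each partial sum is sent straight back to the main processing circuit, or the basic circuit accumulates locally and returns a single value when it is done. Sizes and values are illustrative assumptions only.

```python
# Sketch of two accumulation strategies for the inner-product partial sums.
partials = [1.5, 2.0, 0.5, 3.0]            # partial sums on one basic processing circuit

# Option 1: forward every partial sum; the main processing circuit accumulates.
sent_to_main = list(partials)
final_on_main = sum(sent_to_main)

# Option 2: accumulate on the basic circuit; only the final value crosses the link.
acc = 0.0
for p in partials:
    acc += p                                # on-chip cache / register accumulation
final_from_basic = acc                      # single transfer back to the main circuit
```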
Referring to Fig. 2b, the matrix-times-matrix operation is completed with the device shown in Fig. 1a;
the operation of multiplying a matrix S of size M rows by L columns with a matrix P of size L rows by N columns is described below (each row of the matrix S has the same length as each column of the matrix P, as shown in Fig. 2d); the neural network computing device has K basic processing circuits:
Step S201b: the control circuit of the main processing circuit distributes each row of data of the matrix S to one of the K basic processing circuits; the basic processing circuit stores the received data in its on-chip cache and/or registers;
In an optional scheme, the data of the matrix S are processed data. Specifically, the main processing circuit enables the first mapping circuit to process the matrix S, obtaining the processed matrix S and the first identifier (mask) matrix associated with the matrix S. Alternatively, the first mapping circuit of the main processing circuit processes the matrix S according to the prestored first mask matrix associated with the matrix S, obtaining the processed matrix S. Further, the control circuit sends each row of data in the processed matrix S, together with the mark data of the corresponding row in the first mask matrix, to one or more of the K basic processing circuits. When the main processing circuit sends data to a basic processing circuit, it may specifically send the data in the processed matrix S whose absolute values are greater than the preset threshold, or the non-zero data, so as to reduce the amount of transmitted data.
In an optional scheme, if the number of rows M of S satisfies M ≤ K, the control circuit of the main processing circuit distributes one row of the matrix S to each of M basic processing circuits; optionally, it also sends the mark data of the row in the first identifier matrix corresponding to that row;
in an optional scheme, if the number of rows M of S satisfies M > K, the control circuit of the main processing circuit distributes the data of one or more rows of the matrix S to each basic processing circuit; optionally, it also sends the mark data of the rows in the first identifier matrix corresponding to those one or more rows;
the Mi rows of S distributed to the i-th basic processing circuit are collectively referred to as Ai; Fig. 2e shows the calculation to be executed on the i-th basic processing circuit.
In an optional scheme, in each basic processing circuit, for example the i-th basic processing circuit:
the received matrix Ai distributed by the main processing circuit is stored in the registers and/or on-chip cache of the i-th basic processing circuit; the advantage is that the amount of subsequently transmitted data is reduced, computational efficiency is improved, and power consumption is reduced.
Step S202b: the control circuit of the main processing circuit transfers each part of the matrix P to each basic processing circuit in a broadcast manner;
In an optional scheme, the data (each part) of the matrix P may be processed data. Specifically, the main processing circuit enables the first mapping circuit to process the matrix P, obtaining the processed matrix P and the second identifier (mask) matrix associated with the matrix P. Alternatively, the first mapping circuit of the main processing circuit processes the matrix P according to the prestored second mask matrix associated with the matrix P, obtaining the processed matrix P. Further, the control circuit sends the data (i.e., each part) of the processed matrix P, together with the corresponding mark data in the second mask matrix, to one or more of the K basic processing circuits. When the main processing circuit sends data to a basic processing circuit, it may specifically send the data in the processed matrix P whose absolute values are greater than the preset threshold, or the non-zero data, so as to reduce the amount of transmitted data.
In an optional scheme, each part of the matrix P may be broadcast only once to the registers or on-chip cache of each basic processing circuit, and the i-th basic processing circuit fully multiplexes the data of the matrix P obtained this time, completing the inner product operations corresponding to every row of the matrix Ai; multiplexing in this embodiment specifically means that the basic processing circuit reuses the data in computation, for example the multiplexing of the data of the matrix P means that the data of the matrix P are used multiple times.
In an optional scheme, the control circuit of the main processing circuit may broadcast each part of the matrix P to the registers or on-chip cache of each basic processing circuit multiple times, and the i-th basic processing circuit does not multiplex the data of the matrix P obtained each time, completing the inner product operations corresponding to every row of the matrix Ai in several passes;
in an optional scheme, the control circuit of the main processing circuit may broadcast each part of the matrix P to the registers or on-chip cache of each basic processing circuit multiple times, and the i-th basic processing circuit partially multiplexes the data of the matrix P obtained each time, completing the inner product operations corresponding to every row of the matrix Ai;
in an optional scheme, each basic processing circuit, for example the i-th basic processing circuit, computes the inner products of the data of the matrix Ai and the data of the matrix P;
Step S203b: the accumulator circuit of each basic processing circuit accumulates the results of the inner product operations and transmits them back to the main processing circuit.
Optionally, before step S203b, the inner product operator of the basic processing circuit needs to compute the inner products of the data of the matrix S and the matrix P; there are specifically the following several embodiments.
In a specific embodiment, the basic processing circuit receives the data of the processed matrix S and the mark data of the corresponding rows in the first mask matrix, and also receives the data of the processed matrix P. Correspondingly, the basic processing circuit enables the second mapping circuit to process the received data of the matrix P according to the received mark data in the first mask matrix, obtaining the processed data of the matrix P. Further, the basic processing circuit enables the inner product operator circuit to perform inner product operations on the data of the received processed matrix S and the data of the processed matrix P, obtaining the results of the inner product operations.
In a specific embodiment, the basic processing circuit receives the data of the processed matrix P and the mark data of the corresponding parts in the second mask matrix, and also receives the data of the processed matrix S. Correspondingly, the basic processing circuit enables the second mapping circuit to process the received data of the matrix S according to the received mark data in the second mask matrix, obtaining the processed data of the matrix S. Further, the basic processing circuit enables the inner product operator circuit to perform inner product operations on the data of the received processed matrix P and the data of the processed matrix S, obtaining the results of the inner product operations.
In a specific embodiment, the basic processing circuit receives the data of the processed matrix S and the mark data of the corresponding rows in the first mask matrix, and also receives the data of the processed matrix P and the mark data of the corresponding parts in the second mask matrix. Correspondingly, the basic processing circuit enables the second mapping circuit to obtain a relation identifier matrix from the received mark data in the first mask matrix and the mark data in the second mask matrix; it then uses the mark data in the relation identifier matrix to process the received data of the matrix S and the data of the matrix P respectively, obtaining the processed data of the matrix S and the processed matrix P. Further, the inner product operator circuit is enabled to perform inner product operations on the data of the processed matrix S and the data of the processed matrix P, obtaining the results of the inner product operations. For example, the i-th basic processing circuit receives the matrix Ai, the identifier matrix Bi associated with Ai, the matrix P, and the second identifier matrix associated with the matrix P; it may then enable the second mapping circuit to obtain the relation identifier matrix from Bi and the second identifier matrix, and use the relation identifier matrix to process the matrix Ai and the matrix P simultaneously or separately, obtaining the processed matrix Ai and the processed matrix P. Then the inner product operator circuit is enabled to perform the inner product operations on the processed matrix Ai and the processed matrix P.
In an optional scheme, each basic processing circuit may transmit the partial sums obtained from the inner product operations back to the main processing circuit for accumulation;
in an optional scheme, the partial sums obtained from the inner product operations executed by each basic processing circuit may also be stored in the registers and/or on-chip cache of the basic processing circuit and transmitted back to the main processing circuit after the accumulation is finished;
in an optional scheme, the partial sums obtained from the inner product operations executed by each basic processing circuit may, in some cases, be stored in the registers and/or on-chip cache of the basic processing circuit for accumulation and, in other cases, be transferred to the main processing circuit for accumulation, and transmitted back to the main processing circuit after the accumulation is finished.
The present invention also provides a chip, which includes a computing device; the computing device includes:
a main processing circuit, in which the data involved may be data after compression; in an alternative embodiment, the data after compression include at least one input neuron or at least one weight, where each neuron in the at least one neuron is greater than a first threshold, or each weight in the at least one weight is greater than a second threshold. The first threshold and the second threshold are user-defined settings; they may be the same or different.
In an optional scheme, the main processing circuit includes the first mapping circuit;
in an optional scheme, the main processing circuit includes an arithmetic unit for executing the data compression, such as a vector operation unit;
specifically, it includes a data input interface for receiving input data;
in an optional scheme, the received data may come from the outside of the neural network computing circuit device, or from some or all of the basic processing circuits of the neural network computing circuit device;
in an optional scheme, there may be multiple data input interfaces; specifically, the circuit may include a data output interface for outputting data;
in an optional scheme, the destination of the output data may be the outside of the neural network computing device, or some or all of the basic processing circuits of the neural network computing circuit device;
in an optional scheme, there may be multiple data output interfaces;
in an optional scheme, the main processing circuit includes an on-chip cache and/or registers;
in an optional scheme, the main processing circuit includes an arithmetic unit that can execute data operations;
in an optional scheme, the main processing circuit includes an arithmetic operation unit;
in an optional scheme, the main processing circuit includes a vector operation unit that can perform operations on a group of data simultaneously; specifically, the arithmetic operations and/or vector operations may be operations of any type, including but not limited to: addition, subtraction, multiplication, and division of two numbers; addition, subtraction, multiplication, and division of a number and a constant; exponential, power, and logarithmic operations on a number, as well as various nonlinear operations; comparison and logical operations on two numbers; addition, subtraction, multiplication, and division of two vectors; addition, subtraction, multiplication, and division of each element of a vector with a constant; exponential, power, logarithmic, and various nonlinear operations on each element of a vector; and comparison and logical operations on every two corresponding elements in a vector, and so on.
In an optional scheme, the main processing circuit includes a data rearrangement unit for transmitting data to the basic processing circuits in a certain order, or rearranging the data in place in a certain order;
in an optional scheme, the order of data arrangement includes transforming the dimension order of a multi-dimensional data block; the order of data arrangement may also include partitioning a data block into blocks to be sent to different basic processing circuits.
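The two rearrangements just mentioned are sketched below: a dimension-order transform on a multi-dimensional block, and a split of a block into tiles destined for different basic processing circuits. The tile size and the helper tile_rows are arbitrary assumptions.

```python
# Sketch: dimension-order transformation and blocking of a data block.
import numpy as np

block = np.arange(24).reshape(2, 3, 4)       # a small multi-dimensional data block
reordered = np.transpose(block, (2, 0, 1))   # dimension-order transformation

def tile_rows(matrix: np.ndarray, tile: int):
    """Cut a matrix into row tiles, one per destination basic processing circuit."""
    return [matrix[r:r + tile] for r in range(0, matrix.shape[0], tile)]

matrix = np.arange(12.0).reshape(6, 2)
tiles = tile_rows(matrix, tile=2)            # 3 tiles of 2 rows each
```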
The computing device further includes multiple basic processing circuits: each basic processing circuit is used to compute the inner product of two vectors; the method of computation is that the basic processing circuit receives two groups of numbers, multiplies the corresponding elements of the two groups, and accumulates the results of the multiplications; the result of the inner product is transferred out and, depending on the position of the basic processing circuit, may be transferred to other basic processing circuits or directly to the main processing circuit.
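A plain reference model of this per-circuit computation is sketched below; the function name inner_product and the values are illustrative assumptions.

```python
# Reference: multiply corresponding elements of two groups of numbers and accumulate.
def inner_product(a, b):
    acc = 0.0
    for x, y in zip(a, b):    # corresponding elements
        acc += x * y          # multiply and accumulate
    return acc

result = inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])   # 32.0
```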
The data involved in the basic processing circuit may be data after compression; in an alternative embodiment, the data after compression include at least one input neuron or at least one weight, where each neuron in the at least one neuron is greater than a first threshold, or each weight in the at least one weight is greater than a second threshold. The first threshold and the second threshold are user-defined settings; they may be the same or different.
In an optional scheme, the basic processing circuit includes the second mapping circuit;
in an optional scheme, the basic processing circuit includes a vector operation unit for executing the data compression;
specifically, it includes a storage unit composed of an on-chip cache and/or registers;
specifically, it includes one or more data input interfaces for receiving data;
in an optional scheme, it includes two data input interfaces, and one or more data can be obtained from each of the two data input interfaces at a time;
in an optional scheme, the basic processing circuit may store the input data received from the data input interfaces in its registers and/or on-chip cache;
the sources from which the above data input interfaces receive data may be other basic processing circuits and/or the main processing circuit, namely:
the main processing circuit of the neural network computing circuit device;
other basic processing circuits of the neural network computing circuit device (the neural network computing circuit device has multiple basic processing circuits);
specifically, it includes one or more data output interfaces for transmitting output data;
in an optional scheme, one or more data can be transferred out through a data output interface;
specifically, the data transferred out through a data output interface may be one or any combination of: data received from a data input interface, data stored in the on-chip cache and/or registers, a multiplier operation result, an accumulator operation result, or an inner product operator operation result.
In an optional scheme, it includes three data output interfaces, two of which correspond respectively to the two data input interfaces and output the data received from the corresponding data input interface to the next level, and the third data output interface is responsible for outputting the operation results;
specifically, the destinations of the data transmitted through the data output interfaces may be as follows; the data sources and data destinations here determine the connection relationships of the basic processing circuits in the device:
the main processing circuit of the neural network computing circuit device;
other basic processing circuits of the neural network computing circuit device (the neural network computing circuit device has multiple basic processing circuits);
specifically, it includes arithmetic circuitry, which may specifically be one or any combination of: one or more multiplier circuits, one or more accumulator circuits, and one or more circuits that execute the inner product operation of two groups of numbers.
In an optional scheme, it can execute the multiplication of two numbers, and the result can be stored in the on-chip cache and/or registers, or can be directly accumulated into the registers and/or on-chip cache;
in an optional scheme, it can execute the inner product operation of two groups of data, and the result can be stored in the on-chip cache and/or registers, or can be directly accumulated into the registers and/or on-chip cache;
in an optional scheme, it can execute the accumulation of data, accumulating the data into the on-chip cache and/or registers;
specifically, the data accumulated by the accumulator circuit may be one or any combination of: data received from a data input interface, data stored in the on-chip cache and/or registers, a multiplier operation result, an accumulator operation result, or an inner product operator operation result.
It should be noted that the "data input interface" and "data output interface" used in the above description of the basic processing circuit refer to the data input and output interfaces of each basic processing circuit, not to the data input and output interfaces of the whole device.
In one embodiment, the invention discloses a neural network computing device, which includes functional units for executing all or part of the embodiments provided in the method embodiments described above.
In one embodiment, the invention discloses a chip for executing all or part of the embodiments in the method embodiments described above.
In one embodiment, the invention discloses an electronic device, which includes functional units for executing all or part of the embodiments in the method embodiments described above.
The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, ship, and/or automobile; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound instrument, and/or electrocardiograph.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of this disclosure in detail. It should be understood that the above is merely a specific embodiment of this disclosure and does not limit this disclosure; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of this disclosure shall be included within the scope of protection of this disclosure.

Claims (12)

1. An integrated circuit chip device, characterized in that the integrated circuit chip device includes: a main processing circuit and multiple basic processing circuits; the main processing circuit includes a first mapping circuit, at least one circuit of the multiple basic processing circuits includes a second mapping circuit, and the first mapping circuit and the second mapping circuit are each used to execute compression of data in neural network operations;
the main processing circuit is used to obtain an input data block, a weight data block, and a multiplication instruction, divide the input data block into a distribution data block and divide the weight data block into a broadcast data block according to the multiplication instruction; determine, according to the operation control of the multiplication instruction, to start the first mapping circuit to process a first data block, obtaining a processed first data block, where the first data block includes the distribution data block and/or the broadcast data block; and send the processed first data block, according to the multiplication instruction, to at least one basic processing circuit among the basic processing circuits connected to the main processing circuit;
the multiple basic processing circuits are used to determine, according to the operation control of the multiplication instruction, whether to start the second mapping circuit to process a second data block, execute the operations in the neural network in a parallel manner according to the processed second data block to obtain operation results, and transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit; the second data block is the data block, determined by the basic processing circuit, that it receives from the main processing circuit, and the second data block is associated with the processed first data block;
the main processing circuit is used to process the operation results to obtain the instruction result of the multiplication instruction.
2. The integrated circuit chip device according to claim 1, characterized in that, when the first data block includes a distribution data block and a broadcast data block,
the main processing circuit is specifically used to start the first mapping circuit to process the distribution data block and the broadcast data block, obtaining the processed distribution data block and the identification data block associated with the distribution data block, and the processed broadcast data block and the identification data block associated with the broadcast data block; split the processed distribution data block and the identification data block associated with the distribution data block to obtain multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks; distribute the multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks to the basic processing circuits connected to it; and broadcast the broadcast data block and the identification data block associated with the broadcast data block to the basic processing circuits connected to it;
the basic processing circuit is used to start the second mapping circuit to obtain a connection identification data block according to the identification data block associated with the basic data block and the identification data block associated with the broadcast data block; process the basic data block and the broadcast data block according to the connection identification data block; execute a product operation on the processed basic data block and the processed broadcast data block to obtain an operation result; and send the operation result to the main processing circuit.
3. The integrated circuit chip device according to claim 1, characterized in that, when the first data block includes a distribution data block,
the main processing circuit is specifically used to start the first mapping circuit to process the distribution data block, obtaining the processed distribution data block and the identification data block associated with the distribution data block, or to start the first mapping circuit to process the distribution data block according to the prestored identification data block associated with the distribution data block, obtaining the processed distribution data block; split the processed distribution data block and the identification data block associated with the distribution data block to obtain multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks; distribute the multiple basic data blocks and the identification data blocks respectively associated with the multiple basic data blocks to the basic processing circuits connected to it; and broadcast the broadcast data block to the basic processing circuits connected to it;
the basic processing circuit is used to start the second mapping circuit to process the broadcast data block according to the identification data block associated with the basic data block, execute a product operation on the processed broadcast data block and the basic data block to obtain an operation result, and send the operation result to the main processing circuit.
4. The integrated circuit chip device according to claim 1, characterized in that, when the first data block includes a broadcast data block,
the main processing circuit is specifically used to start the first mapping circuit to process the broadcast data block, obtaining the processed broadcast data block and the identification data block associated with the broadcast data block, or to start the first mapping circuit to process the broadcast data block according to the prestored identification data block associated with the broadcast data block, obtaining the processed broadcast data block; split the distribution data block to obtain multiple basic data blocks; distribute the multiple basic data blocks to the basic processing circuits connected to it; and broadcast the processed broadcast data block and the identification data block associated with the broadcast data block to the basic processing circuits connected to it;
the basic processing circuit is used to start the second mapping circuit to process the basic data block according to the identification data block associated with the broadcast data block, obtaining the processed basic data block; execute a product operation on the processed basic data block and the processed broadcast data block to obtain an operation result; and send the operation result to the main processing circuit.
5. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the basic processing circuit is specifically used to execute a product operation on the basic data block and the broadcast data block to obtain a product result, accumulate the product result to obtain an operation result, and send the operation result to the main processing circuit;
the main processing circuit is used to obtain an accumulation result after accumulating the operation results, and arrange the accumulation result to obtain the instruction result.
6. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the main processing circuit is specifically used to broadcast the broadcast data block or the processed broadcast data block to the multiple basic processing circuits in one broadcast; or,
the main processing circuit is specifically used to divide the broadcast data block or the processed broadcast data block into multiple part broadcast data blocks, and broadcast the multiple part broadcast data blocks to the multiple basic processing circuits in multiple broadcasts.
7. The integrated circuit chip device according to any one of claims 2-4, characterized in that
the main processing circuit is specifically used to split the processed broadcast data block and the identification data block associated with the broadcast data block to obtain multiple part broadcast data blocks and the identification data blocks respectively associated with the multiple part broadcast data blocks; broadcast the multiple part broadcast data blocks and the identification data blocks respectively associated with the multiple part broadcast data blocks to the basic processing circuits in one or multiple broadcasts; the multiple part broadcast data blocks combine to form the processed broadcast data block;
the basic processing circuit is specifically used to start the second mapping circuit to obtain a connection identification data block according to the identification data block associated with the part broadcast data block and the identification data block associated with the basic data block; process the part broadcast data block and the basic data block according to the connection identification data block to obtain the processed broadcast data block and the processed basic data block; and execute a product operation on the processed broadcast data block and the processed basic data block;
or, the basic processing circuit is specifically used to start the second mapping circuit to process the basic data block according to the identification data block associated with the part broadcast data block, obtaining the processed basic data block, and execute a product operation on the processed basic data block and the part broadcast data block.
8. The integrated circuit chip device according to claim 7, characterized in that
the basic processing circuit is specifically used to execute one product operation on the part broadcast data block and the basic data block to obtain a product result, accumulate the product result to obtain a partial operation result, and send the partial operation result to the main processing circuit; or,
the basic processing circuit is specifically used to multiplex the part broadcast data block n times, execute the inner product operations of the part broadcast data block with n basic data blocks to obtain n partial processing results, accumulate the n partial processing results respectively to obtain n partial operation results, and send the n partial operation results to the main processing circuit, where n is an integer greater than or equal to 2.
9. The integrated circuit chip device according to claim 1, characterized in that the integrated circuit chip device further comprises a branch processing circuit arranged between the main processing circuit and at least one basic processing circuit;
the branch processing circuit is configured to forward data between the main processing circuit and the at least one basic processing circuit.
10. The integrated circuit chip device according to claim 1, characterized in that
the input data block is a vector or a matrix;
the weight data block is a vector or a matrix.
11. A chip, characterized in that the chip integrates the device according to any one of claims 1-10.
12. A neural network operation method, characterized in that the method is applied in an integrated circuit chip device, the integrated circuit chip device comprising the integrated circuit chip device according to any one of claims 1-10, the integrated circuit chip device being configured to perform a matrix-multiply-matrix operation, a matrix-multiply-vector operation or a vector-multiply-vector operation of a neural network.
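
The data flow recited in claims 5 to 8 and 12 can be pictured with short sketches. These sketches are illustrative only: NumPy arrays stand in for data blocks, ordinary Python functions stand in for the main processing circuit and the basic processing circuits, and all function names (for example basic_processing_circuit) are assumptions for illustration, not taken from the patent. First, the multiply-accumulate split of claim 5: each basic processing circuit multiplies its basic data block by the broadcast data block and accumulates the products into one operation result, and the main processing circuit arranges the returned results into the instruction result.

    import numpy as np

    def basic_processing_circuit(basic_block, broadcast_block):
        # Product operation on the basic data block and the broadcast data block,
        # then accumulation of the product results into a single operation result.
        products = basic_block * broadcast_block
        return float(products.sum())

    def main_processing_circuit(basic_blocks, broadcast_block):
        # Collect one operation result per basic processing circuit and arrange
        # them into the instruction result (here simply one output per block).
        operation_results = [basic_processing_circuit(b, broadcast_block)
                             for b in basic_blocks]
        return np.array(operation_results)

    # Example: matrix-multiply-vector with each row of A used as one basic data block.
    A = np.arange(12, dtype=float).reshape(4, 3)
    x = np.array([1.0, 2.0, 3.0])
    assert np.allclose(main_processing_circuit(list(A), x), A @ x)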
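
Claim 6 allows the broadcast data block to be sent either whole, in a single broadcast, or as several partial broadcast data blocks over several broadcast rounds. A minimal sketch of the two options under the same toy assumptions; num_parts and the use of np.array_split are illustrative choices, not prescribed by the claim:

    import numpy as np

    def broadcast_once(broadcast_block, basic_blocks):
        # Option 1: the whole broadcast data block is broadcast once.
        return [float(b @ broadcast_block) for b in basic_blocks]

    def broadcast_in_parts(broadcast_block, basic_blocks, num_parts):
        # Option 2: the broadcast data block is divided into partial broadcast data
        # blocks and broadcast over several rounds; each basic processing circuit
        # accumulates across rounds, so both options yield the same results.
        bcast_parts = np.array_split(broadcast_block, num_parts)
        basic_parts = [np.array_split(b, num_parts) for b in basic_blocks]
        totals = [0.0] * len(basic_blocks)
        for r, part in enumerate(bcast_parts):        # one broadcast round per part
            for i, parts in enumerate(basic_parts):
                totals[i] += float(parts[r] @ part)
        return totals

    blocks = [np.random.rand(8) for _ in range(3)]
    vec = np.random.rand(8)
    assert np.allclose(broadcast_once(vec, blocks), broadcast_in_parts(vec, blocks, 4))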
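
Claim 7 ties each data block to an associated identification data block and lets the second mapping circuit derive a connection identification data block before the product operation. The sketch below shows one plausible reading, assuming the identification data block is a boolean mask marking valid (for example, non-zero) positions; the compression-by-mask interpretation and the function names are assumptions for illustration only:

    import numpy as np

    def second_mapping_circuit(part_bcast, bcast_mask, basic_block, basic_mask):
        # Connection identification data block: positions marked valid in BOTH masks.
        connection_mask = bcast_mask & basic_mask
        # Processing keeps only the connected positions, so the later product
        # operation touches fewer elements.
        return part_bcast[connection_mask], basic_block[connection_mask]

    def basic_processing_circuit(part_bcast, bcast_mask, basic_block, basic_mask):
        processed_bcast, processed_basic = second_mapping_circuit(
            part_bcast, bcast_mask, basic_block, basic_mask)
        return float(processed_basic @ processed_bcast)   # product operation

    def basic_processing_circuit_alt(compressed_bcast, bcast_mask, basic_block):
        # Alternative branch of claim 7, assuming the partial broadcast data block
        # already arrives compressed to the positions its own mask marks valid:
        # only the basic data block is processed before the product operation.
        processed_basic = basic_block[bcast_mask]
        return float(processed_basic @ compressed_bcast)

    # Skipping the positions that are zero on either side leaves the product unchanged.
    bcast = np.array([0.0, 2.0, 0.0, 4.0]); bcast_mask = bcast != 0
    basic = np.array([1.0, 0.0, 3.0, 5.0]); basic_mask = basic != 0
    assert basic_processing_circuit(bcast, bcast_mask, basic, basic_mask) == basic @ bcast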
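
In the second option of claim 8, a received partial broadcast data block is reused n times (n >= 2) against n basic data blocks held by the same basic processing circuit, producing n partial operation results that are all sent back to the main processing circuit. A sketch under the same toy assumptions:

    import numpy as np

    def reuse_partial_broadcast(part_bcast, basic_blocks):
        # The same partial broadcast data block is reused once per basic data block.
        partial_results = []
        for basic_block in basic_blocks:
            products = basic_block * part_bcast              # inner product: multiply ...
            partial_results.append(float(products.sum()))    # ... then accumulate
        return partial_results                               # n partial operation results

    part = np.random.rand(5)
    blocks = [np.random.rand(5) for _ in range(3)]           # n = 3
    assert np.allclose(reuse_partial_broadcast(part, blocks), [b @ part for b in blocks])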
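
Claims 10 and 12 restrict the input and weight data blocks to vectors or matrices and name the supported operations. The last sketch shows, again under the assumptions above, how a matrix-multiply-matrix operation decomposes into the per-circuit products of the earlier sketches; the row-wise split and the num_circuits value are illustrative choices:

    import numpy as np

    def matrix_multiply_matrix(weight, input_mat, num_circuits=4):
        # The weight matrix is split row-wise into basic data blocks (one group per
        # basic processing circuit); each column of the input matrix is broadcast
        # in turn as a broadcast data block.
        row_groups = np.array_split(weight, num_circuits, axis=0)
        out_cols = []
        for col in input_mat.T:                       # one broadcast per input column
            out_cols.append(np.concatenate([grp @ col for grp in row_groups]))
        return np.stack(out_cols, axis=1)             # arranged by the main circuit

    # Matrix-multiply-vector and vector-multiply-vector are the special cases of an
    # input with a single column, or of both operands being single vectors.
    W = np.random.rand(6, 5)
    X = np.random.rand(5, 3)
    assert np.allclose(matrix_multiply_matrix(W, X), W @ X)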
CN201810164317.7A 2018-02-27 2018-02-27 Integrated circuit chip device and related product Active CN110197268B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010617209.8A CN111767998B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products
CN201810164317.7A CN110197268B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related product
PCT/CN2019/075979 WO2019165940A1 (en) 2018-02-27 2019-02-23 Integrated circuit chip apparatus, board card and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810164317.7A CN110197268B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010617209.8A Division CN111767998B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products

Publications (2)

Publication Number Publication Date
CN110197268A true CN110197268A (en) 2019-09-03
CN110197268B CN110197268B (en) 2020-08-04

Family

ID=67751070

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010617209.8A Active CN111767998B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products
CN201810164317.7A Active CN110197268B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related product

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010617209.8A Active CN111767998B (en) 2018-02-27 2018-02-27 Integrated circuit chip device and related products

Country Status (1)

Country Link
CN (2) CN111767998B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844330B (en) * 2016-03-22 2019-06-28 华为技术有限公司 The data processing method and neural network processor of neural network processor
CN107329936A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing neural network computing and matrix/vector computing
CN111767998B (en) * 2018-02-27 2024-05-14 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN106650923A (en) * 2015-10-08 2017-05-10 上海兆芯集成电路有限公司 Neural network elements with neural memory and neural processing unit array and sequencer
CN106991478A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 Apparatus and method for performing artificial neural network reverse train
CN107316078A (en) * 2016-04-27 2017-11-03 北京中科寒武纪科技有限公司 Apparatus and method for performing artificial neural network self study computing
CN106126481A (en) * 2016-06-29 2016-11-16 华为技术有限公司 A kind of computing engines and electronic equipment
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neutral network processor based on data compression, design method and chip
CN107301456A (en) * 2017-05-26 2017-10-27 中国人民解放军国防科学技术大学 Deep neural network multinuclear based on vector processor speeds up to method
CN107609641A (en) * 2017-08-30 2018-01-19 清华大学 Sparse neural network framework and its implementation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUNJI CHEN ET AL.: "DaDianNao: A Machine-Learning Supercomputer", IEEE *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767998A (en) * 2018-02-27 2020-10-13 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN117974417A (en) * 2024-03-28 2024-05-03 腾讯科技(深圳)有限公司 AI chip, electronic device, and image processing method

Also Published As

Publication number Publication date
CN111767998B (en) 2024-05-14
CN110197268B (en) 2020-08-04
CN111767998A (en) 2020-10-13

Similar Documents

Publication Publication Date Title
CN110197270A (en) Integrated circuit chip device and Related product
CN109993301A (en) Neural metwork training device and Related product
CN109993291A (en) Integrated circuit chip device and Related product
CN110197268A (en) Integrated circuit chip device and Related product
CN109961134A (en) Integrated circuit chip device and Related product
US11704544B2 (en) Integrated circuit chip device and related product
CN109993292A (en) Integrated circuit chip device and Related product
CN110197274A (en) Integrated circuit chip device and Related product
CN110197272A (en) Integrated circuit chip device and Related product
CN110197271A (en) Integrated circuit chip device and Related product
CN109961135A (en) Integrated circuit chip device and Related product
CN109993290A (en) Integrated circuit chip device and Related product
TWI787430B (en) Integrated circuit chip apparatus, chip, electronic device, and computing method of neural network
CN110197266A (en) Integrated circuit chip device and Related product
CN110197267A (en) Neural network processor board and Related product
CN110197275B (en) Integrated circuit chip device and related product
CN110197265A (en) Integrated circuit chip device and Related product
CN109978150A (en) Neural network processor board and Related product
CN111767997B (en) Integrated circuit chip device and related products
CN109977071A (en) Neural network processor board and Related product
CN109993284A (en) Integrated circuit chip device and Related product
US11734548B2 (en) Integrated circuit chip device and related product
CN110197273A (en) Integrated circuit chip device and Related product
CN109978155A (en) Integrated circuit chip device and Related product
CN109978130A (en) Integrated circuit chip device and Related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant