CN109960673A - Integrated circuit chip device and Related product - Google Patents
- Publication number
- CN109960673A (application CN201711343642.1A)
- Authority
- CN
- China
- Prior art keywords
- circuit
- data
- basic processing
- data block
- main processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/382—Information transfer, e.g. on bus using universal interface adapter
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present disclosure provides an integrated circuit chip device and a related product. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits. At least one circuit among the main processing circuit and the plurality of basic processing circuits includes a data type conversion circuit for performing conversion between floating-point data and fixed-point data. The plurality of basic processing circuits are arranged in an array; each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column. The technical solution provided by the present disclosure has the advantages of a small amount of computation and low power consumption.
Description
Technical field
The present disclosure relates to the field of neural networks, and in particular to an integrated circuit chip device and a related product.
Background technique
Artificial neural networks (ANNs) have been a research hotspot in the field of artificial intelligence since the 1980s. An ANN abstracts the neural network of the human brain from an information-processing perspective, establishes a simple model, and forms different networks according to different connection schemes. In engineering and academia it is often referred to simply as a neural network or neural-network-like model. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Existing neural network operations are implemented on a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit); such operations are computationally intensive and power-hungry.
Summary of the invention
Embodiments of the present disclosure provide an integrated circuit chip device and a related product, which can increase the processing speed and improve the efficiency of a computing device.
In a first aspect, an integrated circuit chip device is provided, including a main processing circuit and a plurality of basic processing circuits.

The plurality of basic processing circuits are arranged in an array; each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to k basic processing circuits among the plurality of basic processing circuits, the k basic processing circuits being: the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column.

The plurality of basic processing circuits include a data type conversion circuit for performing conversion between floating-point data and fixed-point data.

The main processing circuit is configured to perform the respective sequential operations in a neural network operation and to transmit data to the k basic processing circuits.

The k basic processing circuits are configured to forward data between the main processing circuit and the remaining basic processing circuits.

The plurality of basic processing circuits are configured to determine, according to the type of the transmitted data and the operation, whether to start the data type conversion circuit to convert the data type of the transmitted data, to perform the operations in the neural network in parallel according to the transmitted data, and to transmit the operation results to the main processing circuit through the k basic processing circuits.
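The array topology of the first aspect can be sketched numerically. The following is an illustrative model (not from the patent text) of the m*n array, identifying the k circuits wired to the main processing circuit: row 1, row m, and column 1. All names are hypothetical, and the set-based deduplication of the shared corner circuits is an assumption.

```python
def k_connected(m, n):
    """Return the (row, col) positions connected to the main processing circuit."""
    first_row = [(1, c) for c in range(1, n + 1)]  # n circuits of row 1
    last_row = [(m, c) for c in range(1, n + 1)]   # n circuits of row m
    first_col = [(r, 1) for r in range(1, m + 1)]  # m circuits of column 1
    # Circuits (1,1) and (m,1) lie on both a border row and column 1;
    # a set counts each physical circuit once.
    return sorted(set(first_row + last_row + first_col))

positions = k_connected(m=4, n=4)
print(len(positions))  # 4 + 4 + 4 border circuits minus the 2 shared corners = 10
```

Every interior circuit exchanges data only with its array neighbours; only the border circuits returned above talk to the main processing circuit directly.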
In a second aspect, a neural network operation device is provided, which includes one or more of the integrated circuit chip devices provided in the first aspect.

In a third aspect, a combined processing device is provided, which includes the neural network operation device provided in the second aspect, a universal interconnection interface, and a general-purpose processing device; the neural network operation device is connected to the general-purpose processing device through the universal interconnection interface.

In a fourth aspect, a chip is provided, which integrates the device of the first aspect, the device of the second aspect, or the device of the third aspect.

In a fifth aspect, an electronic apparatus is provided, which includes the chip of the fourth aspect.

In a sixth aspect, a neural network operation method is provided, which is applied in an integrated circuit chip device; the integrated circuit chip device includes the integrated circuit chip device described in the first aspect, and the integrated circuit chip device is configured to perform neural network operations.
It can be seen that the embodiments of the present disclosure provide a data type conversion circuit to convert the type of a data block before the operation, which saves transmission resources and computing resources; the solution therefore has the advantages of low power consumption and a small amount of computation.
Brief description of the drawings
Fig. 1a is a schematic structural diagram of an integrated circuit chip device.

Fig. 1b is a schematic structural diagram of another integrated circuit chip device.

Fig. 1c is a schematic structural diagram of a basic processing circuit.

Fig. 1d is a schematic structural diagram of a main processing circuit.

Fig. 1e is a schematic structural diagram of a fixed-point data type.

Fig. 2a is a schematic diagram of an application method of the basic processing circuits.

Fig. 2b is a schematic diagram of data transmission by the main processing circuit.

Fig. 2c is a schematic diagram of matrix-multiply-vector.

Fig. 2d is a schematic structural diagram of an integrated circuit chip device.

Fig. 2e is a schematic structural diagram of another integrated circuit chip device.

Fig. 2f is a schematic diagram of matrix-multiply-matrix.

Fig. 3a is a schematic diagram of convolution input data.

Fig. 3b is a schematic diagram of a convolution kernel.

Fig. 3c is a schematic diagram of an operation window of a three-dimensional data block of the input data.

Fig. 3d is a schematic diagram of another operation window of a three-dimensional data block of the input data.

Fig. 3e is a schematic diagram of yet another operation window of a three-dimensional data block of the input data.

Fig. 4a is a schematic diagram of a neural network forward operation.

Fig. 4b is a schematic diagram of a neural network backward operation.

Fig. 4c is a schematic structural diagram of a combined processing device also disclosed by the present disclosure.

Fig. 4d is another schematic structural diagram of a combined processing device also disclosed by the present disclosure.

Fig. 5a is a schematic structural diagram of a neural network processor board provided by an embodiment of the present disclosure;

Fig. 5b is a schematic structural diagram of a neural network chip package structure provided by an embodiment of the present disclosure;

Fig. 5c is a schematic structural diagram of a neural network chip provided by an embodiment of the present disclosure;

Fig. 6 is a schematic diagram of a neural network chip package structure provided by an embodiment of the present disclosure;

Fig. 6a is a schematic diagram of another neural network chip package structure provided by an embodiment of the present disclosure.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present disclosure without creative effort fall within the scope of protection of the present disclosure.
In the device provided in the first aspect, the main processing circuit is configured to obtain a data block to be computed and an operation instruction, and to divide the data block to be computed into a distribution data block and a broadcast data block according to the operation instruction; to split the distribution data block into a plurality of basic data blocks, distribute the basic data blocks to the k basic processing circuits, and broadcast the broadcast data block to the k basic processing circuits.

The plurality of basic processing circuits are configured to convert, according to the received basic data block, broadcast data block, and operation instruction, the basic data block and the broadcast data block into fixed-point data; to perform an inner-product operation on the basic data block and the broadcast data block in the fixed-point data type to obtain an operation result in the fixed-point data type; and to convert the fixed-point operation result into a floating-point operation result and transmit it to the main processing circuit through the k basic processing circuits.

The main processing circuit is configured to process the operation results to obtain the instruction result of the data block to be computed and the operation instruction.
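The data flow above can be illustrated numerically. This is a minimal sketch, assuming plain Python lists stand in for data blocks and a hypothetical Q8 integer scale stands in for the fixed-point conversion; none of these names come from the patent.

```python
SCALE = 1 << 8  # hypothetical Q8 fixed-point scale

def to_fixed(xs):
    """Convert a floating-point block to fixed-point integers."""
    return [int(round(x * SCALE)) for x in xs]

def to_float(acc):
    """Convert an accumulated fixed-point inner product back to float."""
    return acc / (SCALE * SCALE)  # an inner product multiplies two scaled values

def split(distribution_block, size):
    """Split the distribution data block into basic data blocks."""
    return [distribution_block[i:i + size]
            for i in range(0, len(distribution_block), size)]

broadcast_block = [0.5, -1.25, 2.0]
basic_blocks = split([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)

fb = to_fixed(broadcast_block)
results = []
for block in basic_blocks:                    # one basic processing circuit each
    fx = to_fixed(block)
    acc = sum(a * b for a, b in zip(fx, fb))  # fixed-point inner product
    results.append(to_float(acc))             # convert back before sending
print(results)  # [4.0, 7.75]
```

The values here are exactly representable in Q8, so the round trip is lossless; in general the fixed-point step trades a small precision loss for narrower data.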
Optionally, the main processing circuit is specifically configured to broadcast the broadcast data block to the k basic processing circuits in a single broadcast.
In the device provided in the first aspect, the main processing circuit is configured to, when the operation results are inner-product results, accumulate the operation results to obtain an accumulation result, and to arrange the accumulation result to obtain the instruction result of the data block to be computed and the operation instruction.
In the device provided in the first aspect, the main processing circuit is specifically configured to divide the broadcast data block into a plurality of partial broadcast data blocks and to broadcast the partial broadcast data blocks to the k basic processing circuits over multiple broadcasts.
In the device provided in the first aspect, the plurality of basic processing circuits are specifically configured to convert the partial broadcast data block and the basic data block into fixed-point data, to perform inner-product processing in the fixed-point data type to obtain inner-product results in the fixed-point data type, to accumulate these inner-product results to obtain a partial operation result in the fixed-point data type, and to convert the partial operation result into a floating-point operation result and send it to the main processing circuit through the k basic processing circuits.
In the device provided in the first aspect, the plurality of basic processing circuits are specifically configured to reuse the partial broadcast data block n times, performing, in the fixed-point data type, inner-product operations between the partial broadcast data block and n basic data blocks to obtain n partial processing results in the fixed-point data type; the n partial processing results are accumulated respectively to obtain n partial operation results in the fixed-point data type; the data type conversion circuit is started to convert the n partial operation results from the fixed-point data type into n floating-point partial operation results, which are sent to the main processing circuit through the k basic processing circuits, where n is an integer greater than or equal to 2.
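The reuse step above can be sketched as follows. This is an illustrative toy (fixed-point conversion omitted for brevity, names hypothetical): one partial broadcast data block stays in the basic circuit's buffer and is multiplied against n basic data blocks, so a single broadcast yields n partial results.

```python
def reuse_broadcast(partial_broadcast, basic_blocks):
    """Inner product of one buffered broadcast block with each of n basic blocks."""
    results = []
    for block in basic_blocks:  # n >= 2 basic data blocks, one pass each
        acc = sum(a * b for a, b in zip(block, partial_broadcast))
        results.append(acc)     # one partial processing result per basic block
    return results

print(reuse_broadcast([1, 2], [[3, 4], [5, 6], [7, 8]]))  # [11, 17, 23]
```

Reusing the buffered block n times amortizes one broadcast over n inner products, which is the transmission saving the paragraph describes.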
In the device provided in the first aspect, the main processing circuit includes a main register or a main on-chip buffer circuit; each of the plurality of basic processing circuits includes a basic register or a basic on-chip buffer circuit.
In the device provided in the first aspect, the main processing circuit includes one or any combination of: a vector arithmetic unit circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit, a data type conversion circuit, and a data rearrangement circuit.
In the device provided in the first aspect, the data are one or any combination of: a vector, a matrix, a three-dimensional data block, a four-dimensional data block, and an n-dimensional data block.
In the device provided in the first aspect, if the operation instruction is a multiplication instruction, the main processing circuit determines that the multiplier data block is the broadcast data block and the multiplicand data block is the distribution data block; if the operation instruction is a convolution instruction, the main processing circuit determines that the input data block is the broadcast data block and the convolution kernel is the distribution data block.
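The instruction-dependent role assignment can be sketched as a small dispatch table. The opcode strings and dictionary layout are illustrative assumptions, not the patent's instruction encoding.

```python
def classify_operands(opcode, operands):
    """Assign operands to broadcast/distribute roles per the operation instruction."""
    if opcode == "MUL":
        # multiplier is broadcast, multiplicand is distributed
        return {"broadcast": operands["multiplier"],
                "distribute": operands["multiplicand"]}
    if opcode == "CONV":
        # convolution input is broadcast, kernel is distributed
        return {"broadcast": operands["input"],
                "distribute": operands["kernel"]}
    raise ValueError(f"unsupported opcode: {opcode}")

roles = classify_operands("CONV", {"input": "X", "kernel": "W"})
print(roles)  # {'broadcast': 'X', 'distribute': 'W'}
```

The split reflects which operand every circuit needs in full (broadcast) versus which can be partitioned across circuits (distribute).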
In the method provided in the sixth aspect, the operations of the neural network include one or any combination of: a convolution operation, a matrix-multiply-matrix operation, a matrix-multiply-vector operation, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
Referring to Fig. 1a, Fig. 1a shows an integrated circuit chip device provided by the present disclosure. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits arranged in an m*n array, where m and n are integers greater than or equal to 1 and at least one of m and n is greater than or equal to 2. For the plurality of basic processing circuits distributed in the m*n array, each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to k basic processing circuits of the plurality, which may be: the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column. In the integrated circuit chip device shown in Fig. 1a, the main processing circuit and/or the plurality of basic processing circuits may include a data type conversion circuit; specifically, a part of the basic processing circuits may include data type conversion circuits. For example, in an optional technical solution, the k basic processing circuits may be configured with data type conversion circuits, so that each of the n basic processing circuits of the 1st row may be responsible for the data type conversion step for the data of the m basic processing circuits of its column. This arrangement can improve operation efficiency and reduce power consumption, because the n basic processing circuits of the 1st row are the first to receive the data sent by the main processing circuit; converting the received data into fixed-point data reduces the amount of computation of the subsequent basic processing circuits and the amount of data transmitted to them. Similarly, configuring the m basic processing circuits of the 1st column with data type conversion circuits also has the advantages of a small amount of computation and low power consumption. In addition, according to this structure, the main processing circuit may adopt a dynamic data sending strategy; for example, the main processing circuit broadcasts data to the m basic processing circuits of the 1st column and sends distribution data to the n basic processing circuits of the 1st row. The advantage is that different data are transmitted into the basic processing circuits through different data input ports, so that a basic processing circuit need not distinguish which kind of data it has received; it only needs to determine from which receiving port the data came to know which kind of data it is.
The main processing circuit is configured to perform the respective sequential operations in the neural network operation and to transmit data to the basic processing circuits connected to it; the above sequential operations include, but are not limited to, accumulation operations, ALU operations, and activation operations.
The plurality of basic processing circuits are configured to perform the operations in the neural network in parallel according to the transmitted data, and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The operations performed in parallel in the neural network include, but are not limited to, inner-product operations and matrix or vector multiplication operations.
The main processing circuit may include a data sending circuit, a data receiving circuit, or an interface. The data sending circuit may integrate a data distribution circuit and a data broadcast circuit; in practical applications, of course, the data distribution circuit and the data broadcast circuit may also be set separately. Broadcast data are data that need to be sent to every basic processing circuit; distribution data are data that need to be sent selectively to a part of the basic processing circuits. Specifically, taking a convolution operation as an example, the convolution input data need to be sent to all basic processing circuits, so they are all broadcast data, while the convolution kernel needs to be sent selectively to a part of the basic data blocks, so the convolution kernel is distribution data. The way of selecting the basic processing circuits to which distribution data are sent may be determined specifically by the main processing circuit according to the load and other allocation methods. For the broadcast sending mode, broadcast data are sent to every basic processing circuit in broadcast form. (In practical applications, the broadcast data may be sent to every basic processing circuit by a single broadcast or by multiple broadcasts; the specific embodiments of the present disclosure do not limit the number of broadcasts.) For the distribution sending mode, distribution data are sent selectively to a part of the basic processing circuits.
The main processing circuit (as shown in Fig. 1d) may include a register and/or an on-chip buffer circuit, and may further include circuits such as a control circuit, a vector arithmetic unit circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, and a DMA (Direct Memory Access) circuit; in practical applications, of course, other circuits such as a conversion circuit (e.g., a matrix transposition circuit), a data rearrangement circuit, or an activation circuit may also be added to the main processing circuit.
Each basic processing circuit may include a basic register and/or a basic on-chip buffer circuit, and may further include one or any combination of an inner-product arithmetic unit circuit, a vector arithmetic unit circuit, an accumulator circuit, and the like. The inner-product arithmetic unit circuit, the vector arithmetic unit circuit, and the accumulator circuit may be integrated circuits, or they may be circuits that are set separately.
Optionally, the accumulator circuits of the n basic processing circuits of the m-th row may perform the accumulation step of the inner-product operation, because the basic processing circuits of the m-th row can receive the product results of all the basic processing circuits in their respective columns. Performing the accumulation step of the inner-product operation with the n basic processing circuits of the m-th row can effectively allocate computing resources and has the advantage of saving power consumption. This technical solution is especially applicable when m is large.
The main processing circuit may allocate which circuits perform the data type conversion; specifically, the allocation may be made in an explicit or implicit manner. In the explicit manner, the main processing circuit may configure a special instruction or indication: when a basic processing circuit receives the special instruction or indication, it performs the data type conversion, and when it does not receive the special instruction or indication, it does not perform the data type conversion. In the implicit manner, for example, when a basic processing circuit receives data of the floating-point type and determines that an inner-product operation needs to be performed, it converts the data into fixed-point data. For the explicit configuration manner, the special instruction or indication may be configured with a descending sequence whose value is decremented by 1 each time it passes through a basic processing circuit; a basic processing circuit reads the value of the descending sequence and performs the data type conversion if the value is greater than zero, and does not perform it if the value is less than or equal to zero. This arrangement is configured according to the basic processing circuits allocated in the array. For example, for the m basic processing circuits of the i-th column, if the main processing circuit needs the first 5 basic processing circuits to perform the data type conversion, it issues a special instruction containing a descending sequence whose initial value may be 5; the value of the descending sequence is decremented by 1 at each basic processing circuit, so at the 5th basic processing circuit the value of the descending sequence is 1, and at the 6th basic processing circuit the value is 0, at which point the 6th basic processing circuit does not perform the data type conversion. In this way, the main processing circuit can dynamically configure the executors and the number of executions of the data type conversion.
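The descending-sequence mechanism can be sketched as a counter carried by the special instruction. This is an illustrative model under the example above (initial value 5, a column of 8 circuits); the function name and interface are hypothetical.

```python
def conversion_flags(initial, num_circuits):
    """Which circuits in a column perform conversion for a given initial value."""
    flags, value = [], initial
    for _ in range(num_circuits):
        flags.append(value > 0)  # a circuit converts if the value it reads is > 0
        value -= 1               # decremented as the instruction passes through
    return flags

print(conversion_flags(5, 8))  # first 5 circuits convert, the remaining 3 do not
```

Choosing the initial value thus lets the main processing circuit pick, per column, how many downstream circuits perform conversion without addressing them individually.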
An embodiment of the present disclosure provides an integrated circuit chip device including a main processing circuit (which may also be called a main unit) and a plurality of basic processing circuits (which may also be called basic units). The structure of the embodiment is shown in Fig. 1b, in which the dashed box encloses the internal structure of the neural network operation device; the grey-filled arrows indicate the data transmission paths between the main processing circuit and the basic processing circuit array, and the hollow arrows indicate the data transmission paths between the basic processing circuits (adjacent basic processing circuits) within the array. The length and width of the basic processing circuit array may differ, i.e., the values of m and n may differ or, of course, may be the same; the present disclosure does not limit the specific values.
The circuit structure of a basic processing circuit is shown in Fig. 1c. In the figure, the dashed box indicates the boundary of the basic processing circuit; the block arrows crossing the dashed box indicate data input and output channels (an arrow pointing into the dashed box is an input channel, and an arrow pointing out of the dashed box is an output channel); the rectangular boxes inside the dashed box indicate storage unit circuits (registers and/or on-chip caches), including input data 1, input data 2, multiplication or inner-product results, and accumulation data; the diamond indicates arithmetic unit circuits, including a multiplication or inner-product arithmetic unit and an adder.
In the present embodiment, the neural network operation device includes a main processing circuit and 16 basic processing circuits (the 16 basic processing circuits are only an example; other numbers may be used in practical applications).
In the present embodiment, each basic processing circuit has two data input interfaces and two data output interfaces. In the subsequent description of this example, the horizontal input interface (the horizontal arrow pointing to the unit in Fig. 1b) is called input 0, and the vertical input interface (the vertical arrow pointing to the unit in Fig. 1b) is called input 1; each horizontal data output interface (the horizontal arrow pointing out of the unit in Fig. 1b) is called output 0, and each vertical data output interface (the vertical arrow pointing out of the unit in Fig. 1b) is called output 1.
The data input interfaces and data output interfaces of each basic processing circuit may be connected to different units, including the main processing circuit and other basic processing circuits.
In this example, input 0 of the four basic processing circuits 0, 4, 8, and 12 (the numbering is shown in Fig. 1b) is connected to the data output interface of the main processing circuit.

In this example, input 1 of the four basic processing circuits 0, 1, 2, and 3 is connected to the data output interface of the main processing circuit.

In this example, output 1 of the four basic processing circuits 12, 13, 14, and 15 is connected to the data input interface of the main processing circuit.

In this example, the connections between the output interfaces of the basic processing circuits and the input interfaces of other basic processing circuits are shown in Fig. 1b and will not be enumerated one by one.
Specifically, the output interface S1 of unit S being connected to the input interface P1 of unit P indicates that unit P can receive, through its P1 interface, the data that unit S sends to its S1 interface.
The present embodiment includes a main processing circuit, which is connected to an external device (i.e., it has both input interfaces and output interfaces); a part of the data output interfaces of the main processing circuit are connected to the data input interfaces of a part of the basic processing circuits, and a part of the data input interfaces of the main processing circuit are connected to the data output interfaces of a part of the basic processing circuits.
Application method of the integrated circuit chip device
The data involved in the application method provided by the present disclosure may be data of any data type; for example, they may be data represented by floating-point numbers of any bit width, or data represented by fixed-point numbers of any bit width.
A schematic structural diagram of the fixed-point data type is shown in Fig. 1e, which is one representation of fixed-point data. For a computing system, the storage bit number of one floating-point datum is 32 bits, while for fixed-point data, especially data represented using the fixed-point type shown in Fig. 1e, the storage bit number of one fixed-point datum can be reduced to 16 bits or fewer. This conversion can therefore greatly reduce the transmission overhead between arithmetic units; in addition, for the arithmetic units, the storage space for data of fewer bits is also smaller, i.e., the storage overhead is smaller, and the amount of computation is also reduced, i.e., the computation overhead is reduced, so both the computation and storage overheads can be reduced. However, the data type conversion itself also incurs a partial overhead, hereinafter called the conversion overhead. For data with a large amount of computation and a large data storage volume, the conversion overhead is almost negligible compared with the subsequent computation, storage, and transmission overheads, so for such data the present disclosure adopts the technical solution of converting the data into fixed-point data. Conversely, for data with a small amount of computation and a small data storage volume, the computation, storage, and transmission overheads are themselves relatively small; in this case, since the precision of fixed-point data is slightly lower than that of floating-point data, and on the premise that the amount of computation is small, the precision of the computation must be guaranteed, so here the fixed-point data are converted into floating-point data, i.e., the precision of the computation is improved at the cost of a small increase in overhead.
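The 32-bit-to-16-bit saving discussed above can be illustrated with a rough sketch, assuming a hypothetical Q8.8 fixed-point layout (8 integer bits, 8 fraction bits); real hardware formats, including the one in Fig. 1e, differ, and this only shows the bit-width trade-off.

```python
import struct

FRAC_BITS = 8  # hypothetical Q8.8: 8 fraction bits

def float_to_fixed16(x):
    """Quantize a float to a 16-bit fixed-point integer (with range check)."""
    v = int(round(x * (1 << FRAC_BITS)))
    if not -(1 << 15) <= v < (1 << 15):
        raise OverflowError("value out of Q8.8 range")
    return v

def fixed16_to_float(v):
    """Recover the (rounded) float from the fixed-point integer."""
    return v / (1 << FRAC_BITS)

x = 3.14159
fx = float_to_fixed16(x)
print(len(struct.pack("f", x)) * 8)                 # 32 bits to store the float
print(fx.bit_length() <= 16, fixed16_to_float(fx))  # True 3.140625
```

The round trip loses precision below 2^-8, which is exactly the trade-off the text describes: acceptable for large computations, but a reason to stay in floating point when the workload is small.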
The operations to be completed in the basic processing circuits may be performed using the following methods:

The main processing circuit first converts the type of the data and then transmits them to the basic processing circuits (for example, the main processing circuit may convert floating-point numbers into fixed-point numbers of lower bit width and transmit them to the basic processing circuits; the advantage is that the bit width of the transmitted data and the total number of transmitted bits are reduced, and the basic processing circuits execute low-bit-width fixed-point operations with higher efficiency and lower power consumption).

The basic processing circuits may first perform data type conversion after receiving the data and then compute (for example, a basic processing circuit receives floating-point numbers transmitted from the main processing circuit, converts them into fixed-point numbers, and then performs the operations, which improves operation efficiency and reduces power consumption).

The basic processing circuits may first perform data type conversion on the computed results and then transmit them to the main processing circuit (for example, a floating-point operation result computed by a basic processing circuit may first be converted into a fixed-point number of low bit width and then transmitted to the main processing circuit; the benefit is that the data bit width of the transmission process is reduced, with higher efficiency and saved power consumption).
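The three placements of the conversion step listed above can be sketched side by side. All names and the Q8 scale are illustrative assumptions: (a) the main circuit converts before sending, (b) the basic circuit converts after receiving, (c) the basic circuit converts its result before sending it back.

```python
SCALE = 256  # hypothetical Q8 scale

def fix(x):
    return int(round(x * SCALE))    # float -> fixed

def unfix(v):
    return v / SCALE                # fixed -> float

def mode_a(x, y):
    # (a) operands quantized in the main circuit before transmission
    return unfix(fix(x)) * unfix(fix(y))

def mode_b(x, y):
    # (b) basic circuit converts received floats, computes in fixed point
    return unfix(fix(x) * fix(y) // SCALE)

def mode_c(x, y):
    # (c) compute in float, quantize only the result for the return link
    return unfix(fix(x * y))

print(mode_a(0.5, 2.0), mode_b(0.5, 2.0), mode_c(0.5, 2.0))  # 1.0 1.0 1.0
```

On exactly representable inputs the three modes agree; in general they differ in where rounding happens and, as the text notes, in how many bits cross each link.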
Application method of the basic processing circuits (see Fig. 2a):

The main processing circuit receives the input data to be computed from outside the device;

Optionally, the main processing circuit processes the data using the various arithmetic circuits of this unit, such as the vector arithmetic circuit, the inner-product arithmetic unit circuit, and the accumulator circuit;

The main processing circuit sends data through its data output interface to the basic processing circuit array (the set of all basic processing circuits is called the basic processing circuit array) (as shown in Fig. 2b);

The way of sending data here may be to broadcast data to a part of the basic processing circuits, i.e., the multiple-broadcast mode;

The way of sending data here may be to send different data to different basic processing circuits respectively, i.e., the distribution mode;

The basic processing circuit array computes on the data;

A basic processing circuit performs operations after receiving the input data;

Optionally, a basic processing circuit transmits the data out from the data output interface of this unit after receiving them (to be transferred to other basic processing circuits that do not receive data directly from the main processing circuit);

Optionally, a basic processing circuit transmits an operation result (an intermediate or final computation result) out from its data output interface;

The main processing circuit receives the output data returned from the basic processing circuit array;

Optionally, the main processing circuit continues to process the data received from the basic processing circuit array (e.g., accumulation or activation operations);

When the main processing circuit has finished processing, it transfers the processing result from its data output interface to the outside of the device.
Completing a matrix-times-vector operation using the circuit device:
(Matrix-times-vector can be computed by taking the inner product of each row of the matrix with the vector, and placing these results into a vector in the order of the corresponding rows.)
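The parenthesized definition above (each row of the matrix forms an inner product with the vector, and the results are stacked in row order) can be sketched directly; this is a plain-Python illustration of the arithmetic, not the circuit-level dataflow:

```python
def mat_vec(S, P):
    """Multiply an M x L matrix S (list of rows) by a length-L vector P
    by taking the inner product of each row with P, in row order."""
    return [sum(s * p for s, p in zip(row, P)) for row in S]

S = [[1, 2], [3, 4], [5, 6]]   # M = 3 rows, L = 2 columns
P = [10, 1]
print(mat_vec(S, P))           # [12, 34, 56]
```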
The following describes the multiplication of a matrix S of size M rows by L columns with a vector P of length L, as shown in Fig. 2c.
This method uses all or part of the basic processing circuits of the neural network computing device; suppose K basic processing circuits are used.
The main processing circuit sends data from some or all rows of matrix S to each of the K basic processing circuits;
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, of a certain row of matrix S to a given basic processing circuit each time. (For example, when sending one number at a time: for a given basic processing circuit, the 1st transmission sends the 1st number of the 3rd row, the 2nd transmission sends the 2nd number of the 3rd row, the 3rd transmission sends the 3rd number of the 3rd row, and so on; when sending a portion of the numbers at a time: the 1st transmission sends the first two numbers of the 3rd row (i.e., the 1st and 2nd numbers), the 2nd transmission sends the 3rd and 4th numbers of the 3rd row, the 3rd transmission sends the 5th and 6th numbers of the 3rd row, and so on.)
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, from each of several rows of matrix S to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4, and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4, and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4, and 5, and so on.)
The control circuit of the main processing circuit sends the data of vector P step by step to the 0th basic processing circuit;
After the 0th basic processing circuit receives data of vector P, it sends that data to the next basic processing circuit connected to it, i.e., basic processing circuit 1;
Specifically, some basic processing circuits cannot obtain all the data needed for the calculation directly from the main processing circuit. For example, basic processing circuit 1 in Fig. 2d has only one data input interface connected to the main processing circuit, so it can obtain only the data of matrix S directly from the main processing circuit; the data of vector P must be forwarded to it by basic processing circuit 0. Likewise, after receiving the data, basic processing circuit 1 must continue to forward the data of vector P to basic processing circuit 2.
Each basic processing circuit performs operations on the received data, including but not limited to inner product operations, multiplication operations, addition operations, etc.;
In an optional scheme, a basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result into its register and/or on-chip cache;
In an optional scheme, a basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result into its register and/or on-chip cache;
After a basic processing circuit computes a result, it transmits the result out through its data output interface (to other basic processing circuits connected to it);
In an optional scheme, the calculation result may be the final result or an intermediate result of the inner product operation;
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits that data to another basic processing circuit connected to it or to the main processing circuit;
The main processing circuit receives the inner product results from each basic processing circuit and processes them into the final result (the processing may be an accumulation operation, an activation operation, etc.).
Embodiment of the matrix-times-vector method realized with the above computing device:
In an optional scheme, the multiple basic processing circuits used in the method are arranged as shown in Fig. 2d or Fig. 2e;
As shown in Fig. 2c, the data type conversion circuit of the main processing circuit converts matrix S and vector P into fixed-point data; the control circuit of the main processing circuit divides the M rows of matrix S into K groups, with the i-th basic processing circuit responsible for the operations on the i-th group (the set of rows in this group of data is denoted Mi);
The M rows of data may be grouped here by any allocation scheme that assigns each row exactly once;
In an optional scheme, the following allocation is used: row j is assigned to the (j % K)-th basic processing circuit (% denotes the remainder operation);
In an optional scheme, for cases where the rows cannot be grouped evenly, a portion of the rows may first be allocated evenly, and the remaining rows distributed in any manner.
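The j % K allocation above can be sketched as follows; `assign_rows` is a hypothetical helper name, and the uneven-grouping variant (even allocation of a prefix, arbitrary allocation of the remainder) is not shown:

```python
def assign_rows(M, K):
    """Assign each of M row indices to one of K basic processing
    circuits round-robin: row j goes to circuit j % K."""
    groups = [[] for _ in range(K)]
    for j in range(M):
        groups[j % K].append(j)
    return groups

print(assign_rows(7, 3))   # [[0, 3, 6], [1, 4], [2, 5]]
```

Note that each row is assigned exactly once, which is the only constraint the text places on the grouping.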
The control circuit of the main processing circuit successively sends the data of some or all rows of matrix S to the corresponding basic processing circuits each time;
In an optional scheme, the control circuit of the main processing circuit sends to the i-th basic processing circuit, each time, one or more data of one row of the i-th group of data Mi for which it is responsible;
In an optional scheme, the control circuit of the main processing circuit sends to the i-th basic processing circuit, each time, one or more data of each of some or all rows of the i-th group of data Mi for which it is responsible;
The control circuit of the main processing circuit successively sends the data of vector P to the 1st basic processing circuit;
In an optional scheme, the control circuit of the main processing circuit can send one or more data of vector P each time;
After the i-th basic processing circuit receives data of vector P, it sends that data to the (i+1)-th basic processing circuit connected to it;
After each basic processing circuit receives one or more data from one or several rows of matrix S and one or more data from vector P, it performs operations (including but not limited to multiplication or addition);
In an optional scheme, a basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result into its register and/or on-chip cache;
In an optional scheme, a basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result into its register and/or on-chip cache;
In an optional scheme, the data received by a basic processing circuit may also be an intermediate result, stored in its register and/or on-chip cache;
Each basic processing circuit transmits its local calculation result to the next basic processing circuit connected to it or to the main processing circuit;
In an optional scheme, corresponding to the structure of Fig. 2d, only the output interface of the last basic processing circuit of each column is connected to the main processing circuit. In this case, only the last basic processing circuit can transmit its local calculation result directly to the main processing circuit; the calculation results of the other basic processing circuits must be passed to the next basic processing circuit in the column, and each next circuit passes them onward until all results reach the last basic processing circuit. The last basic processing circuit performs an accumulation over its local calculation result and the received results of the other basic processing circuits of this column to obtain an intermediate result, and sends the intermediate result to the main processing circuit; alternatively, the last basic processing circuit may transmit the results of the other circuits of this column, together with its local processing result, directly to the main processing circuit.
In an optional scheme, corresponding to the structure of Fig. 2e, every basic processing circuit has an output interface connected to the main processing circuit; in this case, each basic processing circuit transmits its local calculation result directly to the main processing circuit;
After a basic processing circuit receives a calculation result passed over by another basic processing circuit, it transmits it to the next basic processing circuit connected to it or to the main processing circuit.
The main processing circuit receives the M inner product results, which constitute the result of the matrix-times-vector operation.
Completing a matrix-times-matrix operation using the circuit device:
The following describes the multiplication of a matrix S of size M rows by L columns with a matrix P of size L rows by N columns (each row of matrix S has the same length as each column of matrix P, as shown in Fig. 2f).
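The operation being distributed in what follows is ordinary matrix-matrix multiplication: entry (i, j) of the result is the inner product of row i of S with column j of P. A plain-Python reference sketch (not the array dataflow):

```python
def mat_mat(S, P):
    """Multiply an M x L matrix S by an L x N matrix P; entry (i, j) of
    the result is the inner product of row i of S with column j of P."""
    M, L, N = len(S), len(P), len(P[0])
    return [[sum(S[i][k] * P[k][j] for k in range(L)) for j in range(N)]
            for i in range(M)]

S = [[1, 2], [3, 4]]   # 2 x 2
P = [[5, 6], [7, 8]]   # 2 x 2
print(mat_mat(S, P))   # [[19, 22], [43, 50]]
```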
The method is illustrated using the embodiment of the device shown in Fig. 1b;
The data type conversion circuit of the main processing circuit converts matrix S and matrix P into fixed-point data;
The control circuit of the main processing circuit sends data of some or all rows of matrix S to those basic processing circuits that are directly connected to the main processing circuit through the lateral data input interface (for example, the grey-filled vertical data paths at the top of Fig. 1b);
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, of a certain row of matrix S to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of the 3rd row, the 2nd transmission sends the 2nd number of the 3rd row, the 3rd transmission sends the 3rd number of the 3rd row, and so on; or the 1st transmission sends the first two numbers of the 3rd row, the 2nd transmission sends the 3rd and 4th numbers of the 3rd row, the 3rd transmission sends the 5th and 6th numbers of the 3rd row, and so on.)
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, from each of several rows of matrix S to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4, and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4, and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4, and 5, and so on.)
The control circuit of the main processing circuit sends data of some or all columns of matrix P to those basic processing circuits that are directly connected to the main processing circuit through the vertical data input interface (for example, the grey-filled lateral data paths on the left side of the basic processing circuit array in Fig. 1b);
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, of a certain column of matrix P to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of the 3rd column, the 2nd transmission sends the 2nd number of the 3rd column, the 3rd transmission sends the 3rd number of the 3rd column, and so on; or the 1st transmission sends the first two numbers of the 3rd column, the 2nd transmission sends the 3rd and 4th numbers of the 3rd column, the 3rd transmission sends the 5th and 6th numbers of the 3rd column, and so on.)
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, from each of several columns of matrix P to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of columns 3, 4, and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4, and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4, and 5, and so on.)
After a basic processing circuit receives data of matrix S, it transmits that data through its lateral data output interface to the next basic processing circuit connected to it (for example, the white-filled lateral data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives data of matrix P, it transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Each basic processing circuit performs operations on the received data;
In an optional scheme, a basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result into its register and/or on-chip cache;
In an optional scheme, a basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result into its register and/or on-chip cache;
After a basic processing circuit computes a result, it can transmit the result out through its data output interface;
In an optional scheme, the calculation result may be the final result or an intermediate result of the inner product operation;
Specifically, if a basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result in the direction of the basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 1b, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits that data to another basic processing circuit connected to it or to the main processing circuit;
It outputs the result in the direction that can reach the main processing circuit directly (for example, in Fig. 1b, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results from each basic processing circuit, from which the output result is obtained.
Embodiment of the "matrix-times-matrix" method:
The method uses a basic processing circuit array arranged as shown in Fig. 1b; suppose it has h rows and w columns;
The data type conversion circuit of the main processing circuit converts matrix S and matrix P into fixed-point data;
The control circuit of the main processing circuit divides the rows of matrix S into h groups, with the i-th basic processing circuit responsible for the operations on the i-th group (the set of rows in this group of data is denoted Hi);
The rows may be grouped here by any allocation scheme that assigns each row exactly once;
In an optional scheme, the following allocation is used: the control circuit of the main processing circuit assigns row j to the (j % h)-th basic processing circuit;
In an optional scheme, for cases where the rows cannot be grouped evenly, a portion of the rows may first be allocated evenly, and the remaining rows distributed in any manner.
The control circuit of the main processing circuit divides the columns of matrix P into w groups, with the i-th basic processing circuit responsible for the operations on the i-th group (the set of columns in this group of data is denoted Wi);
The columns may be grouped here by any allocation scheme that assigns each column exactly once;
In an optional scheme, the following allocation is used: the control circuit of the main processing circuit assigns column j to the (j % w)-th basic processing circuit;
In an optional scheme, for cases where the columns cannot be grouped evenly, a portion of the columns may first be allocated evenly, and the remaining columns distributed in any manner.
The control circuit of the main processing circuit sends data of some or all rows of matrix S to the first basic processing circuit of each row of the basic processing circuit array;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of one row of the i-th group of data Hi for which it is responsible to the first basic processing circuit of the i-th row of the array;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of each of some or all rows of the i-th group of data Hi for which it is responsible to the first basic processing circuit of the i-th row of the array;
The control circuit of the main processing circuit sends data of some or all columns of matrix P to the first basic processing circuit of each column of the basic processing circuit array;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of one column of the i-th group of data Wi for which it is responsible to the first basic processing circuit of the i-th column of the array;
In an optional scheme, the control circuit of the main processing circuit sends, each time, one or more data of each of some or all columns of the i-th group of data Wi for which it is responsible to the first basic processing circuit of the i-th column of the array;
After a basic processing circuit receives data of matrix S, it transmits that data through its lateral data output interface to the next basic processing circuit connected to it (for example, the white-filled lateral data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives data of matrix P, it transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Each basic processing circuit performs operations on the received data;
In an optional scheme, a basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result into its register and/or on-chip cache;
In an optional scheme, a basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result into its register and/or on-chip cache;
After a basic processing circuit computes a result, it can transmit the result out through its data output interface;
In an optional scheme, the calculation result may be the final result or an intermediate result of the inner product operation;
Specifically, if a basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result in the direction of the basic processing circuit that can output directly to the main processing circuit (for example, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits that data to another basic processing circuit connected to it or to the main processing circuit;
It outputs the result in the direction that can reach the main processing circuit directly (for example, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results from each basic processing circuit, from which the output result is obtained.
" transverse direction " used in above description, the words such as " vertical " are intended merely to example shown in statement Fig. 1 b, actually make
Two different interfaces are represented with " transverse direction " " vertical " interface for only needing to distinguish each unit.
Completing a fully-connected operation using the circuit device:
If the input data of the fully-connected layer is a vector (i.e., the case where the input of the neural network is a single sample), the weight matrix of the fully-connected layer is taken as matrix S and the input vector as vector P, and the operation is executed according to the matrix-times-vector method of the device;
If the input data of the fully-connected layer is a matrix (i.e., the case where the input of the neural network is multiple samples), the weight matrix of the fully-connected layer is taken as matrix S and the input matrix as matrix P, or the weight matrix of the fully-connected layer is taken as matrix P and the input matrix as matrix S, and the operation is executed according to the matrix-times-matrix method of the device;
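The reduction above can be sketched as follows. This is a minimal illustration assuming the weight matrix has one row per output neuron and, for the multi-sample case, one input column per sample; the function name `fully_connected` and that layout are assumptions for illustration, not the patent's wording:

```python
def fully_connected(W, X):
    """Fully-connected layer via matrix multiplication. W is an
    (out_features x in_features) weight matrix; X is either a single
    sample (a flat list) or a batch given as an in_features x n_samples
    matrix, one column per sample."""
    if X and not isinstance(X[0], list):   # single sample: matrix-times-vector
        return [sum(w * x for w, x in zip(row, X)) for row in W]
    # multiple samples: matrix-times-matrix, one output column per sample
    return [[sum(row[k] * X[k][j] for k in range(len(X)))
             for j in range(len(X[0]))] for row in W]

W = [[1, 0], [0, 2]]
print(fully_connected(W, [3, 4]))       # [3, 8]
print(fully_connected(W, [[3], [4]]))   # [[3], [8]]
```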
Completing a convolution operation using the circuit device:
The convolution operation is described below. In the figures, one square represents one data element. The input data is shown in Fig. 3a (N samples, each sample having C channels, with the feature map of each channel having height H and width W); the weights, i.e., the convolution kernels, are shown in Fig. 3b (M convolution kernels, each having C channels, with height KH and width KW). The rule of the convolution operation is the same for all N samples of the input data, so the following explains the process of performing a convolution operation on one sample. On one sample, each of the M convolution kernels performs the same operation; each kernel operation yields one planar feature map, so the M convolution kernels finally compute M planar feature maps (for one sample, the output of the convolution is M feature maps). One convolution kernel performs an inner product operation at each planar position of the sample, and then slides along the H and W directions. For example, Fig. 3c shows the corresponding position at which a convolution kernel performs the inner product operation at the lower-right corner of one sample of the input data; Fig. 3d shows the convolution position slid one cell to the left, and Fig. 3e shows the convolution position slid one cell upward.
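The sliding inner product described above can be sketched naively for a single sample. Stride 1 and no padding are assumed here, since the patent does not specify them; each kernel's inner product sums over all C channels and all KH x KW kernel positions:

```python
def conv2d(sample, kernels):
    """Naive convolution of one sample (C x H x W nested lists) with M
    kernels (each C x KH x KW): each kernel slides over all valid H/W
    positions, taking an inner product across channels and kernel cells,
    yielding M planar feature maps."""
    C, H, W = len(sample), len(sample[0]), len(sample[0][0])
    KH, KW = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for k in kernels:
        fmap = [[sum(sample[c][i + di][j + dj] * k[c][di][dj]
                     for c in range(C)
                     for di in range(KH)
                     for dj in range(KW))
                 for j in range(W - KW + 1)]
                for i in range(H - KH + 1)]
        out.append(fmap)
    return out

# one channel, 3x3 input, one 2x2 kernel -> one 2x2 feature map
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
k = [[[[1, 0], [0, 1]]]]
print(conv2d(x, k))   # [[[6, 8], [12, 14]]]
```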
This method is illustrated using the embodiment of the device shown in Fig. 1b;
The data type conversion circuit of the main processing circuit can convert the data of some or all of the weight convolution kernels into fixed-point data; the control circuit of the main processing circuit sends the data of some or all of the weight convolution kernels to those basic processing circuits that are directly connected to the main processing circuit through the lateral data input interface (for example, the grey-filled vertical data paths at the top of Fig. 1b);
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, of the data of a certain convolution kernel in the weights to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of the 3rd row, the 2nd transmission sends the 2nd number of the 3rd row, the 3rd transmission sends the 3rd number of the 3rd row, and so on; or the 1st transmission sends the first two numbers of the 3rd row, the 2nd transmission sends the 3rd and 4th numbers of the 3rd row, the 3rd transmission sends the 5th and 6th numbers of the 3rd row, and so on.)
In another case, in an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, from the data of each of several convolution kernels in the weights to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4, and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4, and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4, and 5, and so on.)
The control circuit of the main processing circuit partitions the input data according to the convolution positions; the control circuit of the main processing circuit sends the data at some or all of the convolution positions in the input data to those basic processing circuits that are directly connected to the main processing circuit through the vertical data input interface (for example, the grey-filled lateral data paths on the left side of the basic processing circuit array in Fig. 1b);
In an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, of the data at a certain convolution position in the input data to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of the 3rd column, the 2nd transmission sends the 2nd number of the 3rd column, the 3rd transmission sends the 3rd number of the 3rd column, and so on; or the 1st transmission sends the first two numbers of the 3rd column, the 2nd transmission sends the 3rd and 4th numbers of the 3rd column, the 3rd transmission sends the 5th and 6th numbers of the 3rd column, and so on.)
In another case, in an optional scheme, the control circuit of the main processing circuit sends one number, or a portion of the numbers, from the data at each of several convolution positions in the input data to a given basic processing circuit each time. (For example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of columns 3, 4, and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4, and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4, and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4, and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4, and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4, and 5, and so on.)
After a basic processing circuit receives the weight data, it transmits that data through its lateral data output interface to the next basic processing circuit connected to it (for example, the white-filled lateral data paths in the middle of the basic processing circuit array in Fig. 1b); after a basic processing circuit receives the input data, it transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in Fig. 1b);
Each basic processing circuit performs operations on the received data;
In an optional scheme, a basic processing circuit computes the multiplication of one or more pairs of data each time, and then accumulates the result into its register and/or on-chip cache;
In an optional scheme, a basic processing circuit computes the inner product of one or more pairs of vectors each time, and then accumulates the result into its register and/or on-chip cache;
After a basic processing circuit computes a result, it can transmit the result out through its data output interface;
In an optional scheme, the calculation result may be the final result or an intermediate result of the inner product operation;
Specifically, if a basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result in the direction of the basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 1b, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a calculation result from another basic processing circuit, it transmits that data to another basic processing circuit connected to it or to the main processing circuit;
It outputs the result in the direction that can reach the main processing circuit directly (for example, the basic processing circuits of the bottom row output their results directly to the main processing circuit, while the other basic processing circuits pass their operation results downward through their vertical output interfaces);
The main processing circuit receives the inner product results from each basic processing circuit, from which the output result is obtained.
The method for completing to add bias operation using the circuit device;
The function that two vectors or two matrixes are added may be implemented using the vector operation device circuit of main process task circuit;
The every a line for a vector being added to a matrix may be implemented using the vector operation device circuit of main process task circuit
On or each column on function.
In an optional scheme, the matrix may come from the result of a matrix-multiply-matrix operation executed by the device;
In an optional scheme, the vector may come from the result of a matrix-multiply-vector operation executed by the device;
In an optional scheme, the matrix may come from data received from outside by the main processing circuit of the device;
In an optional scheme, the vector may come from data received from outside by the main processing circuit of the device;
The data sources include, but are not limited to, the above.
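The bias-addition behavior described above can be sketched with NumPy broadcasting; the function name `add_bias` and the axis convention are illustrative assumptions, not terms from the disclosure:

```python
import numpy as np

def add_bias(matrix, bias, axis=1):
    """Add a bias vector onto every row (axis=1) or every column (axis=0)
    of a matrix, as the vector arithmetic unit circuit would."""
    if axis == 1:                    # bias added onto each row
        return matrix + bias
    return matrix + bias[:, None]    # bias added onto each column

m = np.arange(6).reshape(2, 3)       # e.g. the result of a matrix multiply
b = np.array([10, 20, 30])
print(add_bias(m, b))                # [[10 21 32] [13 24 35]]
```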
The method of completing the activation function operation using the circuit device:
The activation circuit of the main processing circuit takes a vector as input and computes the activation vector of that vector;
In an optional scheme, the activation circuit of the main processing circuit passes each value of the input vector through an activation function (the input of the activation function is one numerical value, and its output is also one numerical value), computing one value that is written to the corresponding position of the output vector;
In an optional scheme, the activation function may be y = max(m, x), where x is the input value, y is the output value, and m is a constant;
In an optional scheme, the activation function may be y = tanh(x), where x is the input value and y is the output value;
In an optional scheme, the activation function may be y = sigmoid(x), where x is the input value and y is the output value;
In an optional scheme, the activation function may be a piecewise linear function;
In an optional scheme, the activation function may be any function that takes one number as input and outputs one number.
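The elementwise scheme above can be sketched as follows; `apply_activation` and the constant m = 0 are assumptions for the sketch (the disclosure allows any scalar-in, scalar-out function):

```python
import numpy as np

def apply_activation(vec, fn):
    """The activation circuit applies a scalar activation function to each
    element of the input vector and writes the result to the corresponding
    position of the output vector."""
    return np.array([fn(x) for x in vec])

relu_like = lambda x, m=0.0: max(m, x)          # y = max(m, x)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))    # y = sigmoid(x)

v = np.array([-1.0, 0.0, 2.0])
print(apply_activation(v, relu_like))           # [0. 0. 2.]
print(apply_activation(v, np.tanh))             # y = tanh(x), elementwise
```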
In an optional scheme, the sources of the input vector include (but are not limited to):
external data sources of the device;
in an optional scheme, the input data comes from the result of a matrix-multiply-vector operation executed by the device;
in an optional scheme, the input data comes from the result of a matrix-multiply-matrix operation executed by the device;
results computed by the main processing circuit of the device;
in an optional scheme, the input data comes from the result computed after the main processing circuit of the device applies the bias.
The method of implementing BLAS (Basic Linear Algebra Subprograms) using the device:
GEMM computation refers to the matrix-matrix multiplication operation in the BLAS library. The usual representation of this operation is: C = alpha*op(S)*op(P) + beta*C, where S and P are the two input matrices, C is the output matrix, alpha and beta are scalars, and op represents some operation on matrix S or P; in addition, some auxiliary integers serve as parameters describing the widths and heights of matrices S and P;
The steps of implementing the GEMM computation using the device are:
Before performing the op operations, the main processing circuit may convert the data type of the input matrices S and P;
The conversion circuit of the main processing circuit performs the respective op operation on the input matrices S and P;
In an optional scheme, op may be a matrix transposition; the matrix transposition can be implemented using the vector operation function or the data rearrangement function of the main processing circuit (it was noted earlier that the main processing circuit has a data rearrangement circuit). In practical applications, the op may also be implemented directly by the conversion circuit; for example, for a matrix transposition, the op operation is implemented directly by the matrix transposition circuit;
In an optional scheme, the op of a given matrix may be empty, in which case the op operation is not performed;
The matrix multiplication between op(S) and op(P) is completed using the matrix-multiply-matrix computation method;
Each value in the result of op(S)*op(P) is multiplied by alpha using the arithmetic logic circuit of the main processing circuit;
In an optional scheme, when alpha is 1, the multiplication by alpha is not performed;
The operation beta*C is implemented using the arithmetic logic circuit of the main processing circuit;
In an optional scheme, when beta is 1, the multiplication by beta is not performed;
Using the arithmetic logic circuit of the main processing circuit, the step of adding corresponding positions of alpha*op(S)*op(P) and beta*C is implemented;
In an optional scheme, when beta is 0, the addition is not performed;
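The GEMM steps above, including skipping the alpha, beta, and op stages when they are trivial, can be sketched as follows; the `op_s`/`op_p` callables and the function signature are assumptions for the sketch:

```python
import numpy as np

def gemm(S, P, C, alpha=1.0, beta=1.0, op_s=None, op_p=None):
    """C = alpha*op(S)*op(P) + beta*C, following the step order in the text."""
    s = op_s(S) if op_s is not None else S   # op may be empty: skip it
    p = op_p(P) if op_p is not None else P
    prod = s @ p                             # matrix-multiply-matrix step
    if alpha != 1.0:                         # skip the multiply when alpha = 1
        prod = alpha * prod
    if beta == 0.0:                          # skip the addition when beta = 0
        return prod
    return prod + (C if beta == 1.0 else beta * C)

S = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.eye(2)
C = np.ones((2, 2))
print(gemm(S, P, C, alpha=2.0, beta=1.0, op_s=np.transpose))  # 2*S.T + C
```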
GEMV computation refers to the matrix-vector multiplication operation in the BLAS library. The usual representation of this operation is: C = alpha*op(S)*P + beta*C, where S is the input matrix, P is the input vector, C is the output vector, alpha and beta are scalars, and op represents some operation on matrix S;
The steps of implementing the GEMV computation using the device are:
Before performing the op operation, the main processing circuit may convert the data type of the input matrix S and vector P;
The conversion circuit of the main processing circuit performs the corresponding op operation on the input matrix S;
In an optional scheme, op may be a matrix transposition; the matrix transposition circuit of the main processing circuit implements the matrix transposition operation;
In an optional scheme, the op of a given matrix may be empty, in which case the op operation is not performed;
The matrix-vector multiplication between matrix op(S) and vector P is completed using the matrix-multiply-vector computation method;
Each value in the result of op(S)*P is multiplied by alpha using the arithmetic logic circuit of the main processing circuit;
In an optional scheme, when alpha is 1, the multiplication by alpha is not performed;
The operation beta*C is implemented using the arithmetic logic circuit of the main processing circuit;
In an optional scheme, when beta is 1, the multiplication by beta is not performed;
Using the arithmetic logic circuit of the main processing circuit, the step of adding corresponding positions of the vectors alpha*op(S)*P and beta*C is implemented;
In an optional scheme, when beta is 0, the addition is not performed;
Implementing data type conversion:
The conversion of data types is implemented using the data type conversion computing circuit of the main processing circuit;
In an optional scheme, the forms of data type conversion include, but are not limited to: converting floating-point numbers to fixed-point numbers, converting fixed-point numbers to floating-point numbers, and the like;
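A minimal software model of the float-to-fixed and fixed-to-float conversions follows; the scaling-by-2^frac_bits scheme, the 16-bit word, and saturation are assumptions for the sketch, since the disclosure does not fix a specific fixed-point format:

```python
def float_to_fixed(x, frac_bits=8, word_bits=16):
    """Quantize a float to a signed fixed-point integer with frac_bits
    fractional bits, saturating to the representable range."""
    scaled = int(round(x * (1 << frac_bits)))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def fixed_to_float(q, frac_bits=8):
    """Convert the fixed-point integer back to a float."""
    return q / (1 << frac_bits)

q = float_to_fixed(3.14159)
print(q, fixed_to_float(q))   # 804 3.140625
```

Working in the cheaper fixed-point domain for inner products and converting back to floating point for the final result is what gives the scheme its claimed reduction in computation and power.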
The method of updating weights:
The weight update function in the neural network training process is implemented using the vector arithmetic unit circuit of the main processing circuit; specifically, weight update refers to the method of updating the weights using the weight gradients.
In an optional scheme, the vector arithmetic unit circuit of the main processing circuit performs addition or subtraction on the two vectors, the weights and the weight gradients, to obtain an operation result, which is the updated weights.
In an optional scheme, the vector arithmetic unit circuit of the main processing circuit multiplies or divides the weights and the weight gradients by a number to obtain intermediate weights and intermediate weight gradient values; the vector arithmetic unit circuit then performs addition or subtraction on the intermediate weights and intermediate weight gradient values to obtain an operation result, which is the updated weights.
In an optional scheme, a group of momentum values may first be computed using the weight gradients, and the momentum and the weights are then added or subtracted to obtain the updated weights.
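The momentum variant above can be sketched as a classical momentum update; the learning rate, momentum coefficient, and function name are illustrative assumptions, not values fixed by the disclosure:

```python
def update_weights(w, grad, velocity, lr=0.01, momentum=0.9):
    """One weight-update step: first compute momentum values from the
    gradients, then add them to the weights, as the vector arithmetic
    unit circuit would do elementwise."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_velocity)]
    return new_w, new_velocity

w, vel = [1.0, 2.0], [0.0, 0.0]
w, vel = update_weights(w, [0.5, -0.5], vel)
print(w)   # weights nudged against the gradient direction
```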
The method of implementing the backward operation of a fully connected layer:
The backward operation of a fully connected layer can be divided into two parts. In Fig. 4a, the solid arrows indicate the forward computation process of the fully connected layer, and the dotted parts indicate the backward computation process of the fully connected layer.
As can be seen from Fig. 4a, the device can complete the backward operation of the fully connected layer using the method of completing matrix multiplication with the device described above;
Implementing the backward operation of a convolutional layer:
The backward operation of a convolutional layer can be divided into two parts. In Fig. 4a, the solid arrows indicate the forward computation process of the convolutional layer, and Fig. 4b shows the backward computation process of the convolutional layer.
The backward operation of the convolutional layer shown in Fig. 4a and Fig. 4b can be completed using the device shown in Fig. 1a or the device shown in Fig. 1b. Executing the forward or backward operation is in practice executing a plurality of neural network operations, including but not limited to one or any combination of: matrix-multiply-matrix, matrix-multiply-vector, convolution, activation, and other operations; the manner of these operations is as described in the present disclosure and is not repeated here.
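As a sketch of why the backward pass reduces to the matrix multiplications the device already supports: for a fully connected layer Y = X·W, both gradient parts are themselves matrix products (the shapes and names below are illustrative assumptions):

```python
import numpy as np

def fc_backward(X, W, dY):
    """Backward operation of a fully connected layer Y = X @ W:
    both parts are matrix-multiply-matrix operations."""
    dW = X.T @ dY     # gradient w.r.t. the weights
    dX = dY @ W.T     # gradient propagated to the previous layer
    return dX, dW

X = np.array([[1.0, 2.0]])       # one input sample
W = np.array([[3.0], [4.0]])     # 2 inputs -> 1 output
dY = np.array([[1.0]])           # upstream gradient
dX, dW = fc_backward(X, W, dY)
print(dX, dW.ravel())            # [[3. 4.]] [1. 2.]
```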
The present disclosure also discloses a neural network computing device, which includes one or more chips as shown in Fig. 1a or Fig. 1b, configured to obtain data to be computed and control information from other processing devices, execute a specified neural network operation, and pass the execution result to peripheral equipment through an I/O interface. Peripheral equipment includes, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one chip as shown in Fig. 1a or Fig. 1b is included, the chips can be linked through a specific structure and transmit data between one another, for example, interconnected through a PCIe bus and transmitting data, so as to support larger-scale neural network operations. In that case, the chips may share a common control system or each have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, their interconnection may be any interconnection topology.
The neural network computing device has high compatibility and can be connected to various types of servers through a PCIe interface.
The present disclosure also discloses a combined processing device, which includes the above neural network computing device, a general interconnection interface, and other processing devices (i.e., general-purpose processing devices). The neural network computing device interacts with the other processing devices to jointly complete the operation specified by the user. Fig. 4c is a schematic diagram of the combined processing device.
The other processing devices include one or more processor types among general-purpose/special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the neural network computing device and external data and control, carrying data and performing basic control of the neural network computing device such as starting and stopping; the other processing devices may also cooperate with the neural network computing device to complete computing tasks together.
The general interconnection interface is used to transmit data and control instructions between the neural network computing device and the other processing devices. The neural network computing device obtains the required input data from the other processing devices and writes it to on-chip storage of the neural network computing device; it may obtain control instructions from the other processing devices and write them to an on-chip control cache of the neural network computing device; it may also read data from a storage module of the neural network computing device and transmit it to the other processing devices.
As shown in Fig. 4d, optionally, the structure further includes a storage device for storing data required by this computing unit/computing device or by other computing units, and is particularly suitable for data whose required computation data cannot be fully saved in the internal storage of this neural network computing device or the other processing devices.
The combined processing device can serve as an SoC (system on chip) for equipment such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the equipment, such as cameras, displays, mice, keyboards, network cards, and WiFi interfaces.
An embodiment of the present disclosure provides a neural network processor board, which can be used in numerous general-purpose or special-purpose computing system environments or configurations, such as: personal computers, server computers, handheld or portable devices, tablet devices, smart homes, household appliances, multiprocessor systems, microprocessor-based systems, robots, programmable consumer electronic devices, network personal computers (PCs), minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
Referring to Fig. 5a, Fig. 5a is a schematic structural diagram of a neural network processor board provided by an embodiment of the present disclosure. As shown in Fig. 5a, the above neural network processor board 10 includes a neural network chip package structure 11, a first electrical and non-electrical connection device 12, and a first substrate 13.
The present disclosure does not limit the specific structure of the neural network chip package structure 11. Optionally, as shown in Fig. 5b, the above neural network chip package structure 11 includes: a neural network chip 111, a second electrical and non-electrical connection device 112, and a second substrate 113.
The specific form of the neural network chip 111 involved in the present disclosure is not limited; the above neural network chip 111 includes, but is not limited to, a neural network chip integrating a neural network processor, and the chip may be made of silicon material, germanium material, quantum material, molecular material, or the like. According to the actual situation (for example, a harsher environment) and different application requirements, the above neural network chip may be packaged so that most of the neural network chip is enclosed, with the pins on the neural network chip connected to the outside of the package structure through conductors such as gold wires for circuit connection with the outer layers.
The present disclosure does not limit the specific structure of the neural network chip 111; optionally, refer to the device shown in Fig. 1a or Fig. 1b.
The present disclosure does not limit the types of the first substrate 13 and the second substrate 113; they may be printed circuit boards (PCBs) or printed wiring boards (PWBs), or other circuit boards. The material used to make the PCB is also not limited.
The second substrate 113 involved in the present disclosure is used to carry the above neural network chip 111; the neural network chip package structure 11, obtained by connecting the above neural network chip 111 and the second substrate 113 through the second electrical and non-electrical connection device 112, is used to protect the neural network chip 111 and to facilitate further packaging of the neural network chip package structure 11 with the first substrate 13.
The specific packaging manner of the above second electrical and non-electrical connection device 112, and the structure corresponding to that packaging manner, are not limited; an appropriate packaging manner may be selected and simply improved according to the actual situation and different application requirements, such as: Flip Chip Ball Grid Array Package (FCBGAP), Low-profile Quad Flat Package (LQFP), Quad Flat Package with Heat sink (HQFP), Quad Flat Non-lead Package (QFN), or Fine-pitch Ball Grid Array (FBGA) packaging, and the like.
Flip Chip is suitable for cases with demanding requirements on the area after packaging, or with sensitivity to conductor inductance and signal transmission time. In addition, the Wire Bonding packaging manner can be used, reducing cost and increasing the flexibility of the package structure.
Ball Grid Array can provide more pins, the average conductor length of the pins is short, and it supports high-speed signal transmission; the packaging may also be replaced by Pin Grid Array (PGA), Zero Insertion Force (ZIF), Single Edge Contact Connection (SECC), Land Grid Array (LGA), and the like.
Optionally, the neural network chip 111 and the second substrate 113 are packaged using the Flip Chip Ball Grid Array packaging manner; for a schematic diagram of the specific neural network chip package structure, refer to Fig. 6. As shown in Fig. 6, the above neural network chip package structure includes: a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, and pins 26.
The pads 22 are connected to the neural network chip 21, and the solder balls 23 are formed by soldering between the pads 22 and the connection points 25 on the second substrate 24, connecting the neural network chip 21 and the second substrate 24, that is, realizing the packaging of the neural network chip 21.
The pins 26 are used to connect with an external circuit of the package structure (for example, the first substrate 13 on the neural network processor board 10), enabling transmission of external data and internal data and facilitating the processing of data by the neural network chip 21 or the neural network processor corresponding to the neural network chip 21. The types and number of the pins are also not limited in the present disclosure; different pin forms may be selected according to different packaging technologies and arranged following certain rules.
Optionally, the above neural network chip package structure further includes an insulating filler placed in the gaps between the pads 22, the solder balls 23, and the connection points 25, for preventing interference between solder balls.
The material of the insulating filler may be silicon nitride, silicon oxide, or silicon oxynitride; the interference includes electromagnetic interference, inductive interference, and the like.
Optionally, the above neural network chip package structure further includes a heat dissipation device for dissipating the heat generated when the neural network chip 21 runs. The heat dissipation device may be a piece of metal with good thermal conductivity, a heat sink, or a radiator, for example, a fan.
For example, as shown in Fig. 6a, the neural network chip package structure 11 includes: a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, an insulating filler 27, thermal grease 28, and a metal housing heat sink 29. The thermal grease 28 and the metal housing heat sink 29 are used to dissipate the heat generated when the neural network chip 21 runs.
Optionally, the above neural network chip package structure 11 further includes a reinforcing structure connected to the pads 22 and embedded in the solder balls 23, to enhance the bonding strength between the solder balls 23 and the pads 22.
The reinforcing structure may be a metal wire structure or a columnar structure, without limitation here.
The specific form of the first electrical and non-electrical connection device 12 is also not limited in the present disclosure; referring to the description of the second electrical and non-electrical connection device 112, the neural network chip package structure 11 may be packaged by soldering, or the second substrate 113 and the first substrate 13 may be connected by connecting wires or in a pluggable manner, facilitating subsequent replacement of the first substrate 13 or the neural network chip package structure 11.
Optionally, the first substrate 13 includes an interface for a memory unit for expanding storage capacity, such as: Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR), and the like; expanding the memory improves the processing capability of the neural network processor.
The first substrate 13 may also include a Peripheral Component Interconnect-Express (PCI-E or PCIe) interface, a Small Form-factor Pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, and the like, for data transmission between the package structure and external circuits, which can improve computing speed and ease of operation.
The neural network processor is packaged as the neural network chip 111, the neural network chip 111 is packaged as the neural network chip package structure 11, and the neural network chip package structure 11 is packaged as the neural network processor board 10, which performs data interaction with an external circuit (for example, a computer motherboard) through an interface (slot or ferrule) on the board; that is, the function of the neural network processor is realized directly by using the neural network processor board 10, and the neural network chip 111 is protected. Other modules may also be added to the neural network processor board 10, improving the application range and computing efficiency of the neural network processor.
In one embodiment, the present disclosure discloses an electronic device, which includes the above neural network processor board 10 or the above neural network chip package structure 11.
Electronic devices include data processing devices, robots, computers, printers, scanners, tablet computers, intelligent terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical devices.
The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound instruments, and/or electrocardiographs.
The specific embodiments described above further describe in detail the purpose, technical solutions, and beneficial effects of the present disclosure. It should be understood that the above are merely specific embodiments of the present disclosure and do not limit the present disclosure; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.
Claims (17)
1. An integrated circuit chip device, characterized in that the integrated circuit chip device includes: a main processing circuit and a plurality of basic processing circuits;
the plurality of basic processing circuits are distributed in an array; each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to k basic processing circuits among the plurality of basic processing circuits, the k basic processing circuits being: the n basic processing circuits of the 1st row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the 1st column;
the plurality of basic processing circuits include a data type conversion circuit for executing conversion between floating-point type data and fixed-point type data;
the main processing circuit is configured to execute each successive operation in the neural network operation and to transmit data with the k basic processing circuits;
the k basic processing circuits are configured to forward data between the main processing circuit and the plurality of basic processing circuits;
the plurality of basic processing circuits are configured to determine, according to the type of the transmitted data and the operation, whether to start the data type conversion circuit to execute conversion of the data type of the transmitted data, to execute the operation in the neural network in a parallel manner according to the transmitted data, and to transmit the operation result to the main processing circuit through the k basic processing circuits.
2. The integrated circuit chip device according to claim 1, characterized in that
the main processing circuit is configured to obtain a data block to be computed and an operation instruction, and to divide, according to the operation instruction, the data block to be computed into a distribution data block and a broadcast data block; to split the distribution data block to obtain a plurality of basic data blocks, distribute the plurality of basic data blocks to the k basic processing circuits, and broadcast the broadcast data block to the k basic processing circuits;
the plurality of basic processing circuits are configured to, according to the received basic data block, broadcast data block, and operation instruction, convert the basic data block and the broadcast data block into a basic data block and a broadcast data block of fixed-point data type, execute inner-product operations on the basic data block and the broadcast data block in fixed-point data type to obtain operation results in fixed-point data type, convert the fixed-point data type operation results into floating-point type operation results, and transmit them to the main processing circuit through the k basic processing circuits;
the main processing circuit is configured to process the operation results to obtain the instruction result of the data block to be computed and the operation instruction.
3. The integrated circuit chip device according to claim 2, characterized in that
the main processing circuit is specifically configured to broadcast the broadcast data block to the k basic processing circuits in a single broadcast.
4. The integrated circuit chip device according to claim 2, characterized in that
the main processing circuit is configured to, when the operation result is the result of inner-product processing, accumulate the operation results to obtain an accumulation result, and arrange the accumulation result to obtain the instruction result of the data block to be computed and the operation instruction.
5. The integrated circuit chip device according to claim 2, characterized in that
the main processing circuit is specifically configured to divide the broadcast data block into a plurality of partial broadcast data blocks and broadcast the plurality of partial broadcast data blocks to the k basic processing circuits over multiple broadcasts.
6. The integrated circuit chip device according to claim 5, characterized in that
the plurality of basic processing circuits are specifically configured to convert the partial broadcast data block and the basic data block into fixed-point data type, execute inner-product processing in fixed-point data type to obtain inner-product processing results of fixed-point data type, accumulate the inner-product processing results of fixed-point data type to obtain partial operation results of fixed-point data type, convert the partial operation results of fixed-point type into floating-point type operation results, and send them to the main processing circuit through the k basic processing circuits.
7. The integrated circuit chip device according to claim 6, characterized in that
the plurality of basic processing circuits are specifically configured to reuse the partial broadcast data block n times, executing, in fixed-point data type, the inner-product operations of that partial broadcast data block with the n basic data blocks to obtain n partial processing results of fixed-point data type; to accumulate the n partial processing results of fixed-point data type respectively to obtain n partial operation results of fixed-point data type; and to start the data type conversion circuit to convert the n partial operation results of fixed-point data type into n partial operation results of floating-point type and send them to the main processing circuit through the k basic processing circuits, where n is an integer greater than or equal to 2.
8. The integrated circuit chip device according to any one of claims 1-7, characterized in that
the main processing circuit includes: a main register or a main on-chip cache circuit;
the plurality of basic processing circuits include: basic registers or basic on-chip cache circuits.
9. The integrated circuit chip device according to claim 8, characterized in that
the main processing circuit includes: one of, or any combination of, a vector arithmetic unit circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit, a data type computing circuit, and a data rearrangement circuit.
10. The integrated circuit chip device according to claim 1, characterized in that
the data is: one of, or any combination of, a vector, a matrix, a three-dimensional data block, a four-dimensional data block, and an n-dimensional data block.
11. The integrated circuit chip device according to claim 2, characterized in that
if the operation instruction is a multiplication instruction, the main processing circuit determines that the multiplier data block is the broadcast data block and the multiplicand data block is the distribution data block;
if the operation instruction is a convolution instruction, the main processing circuit determines that the input data block is the broadcast data block and the convolution kernel is the distribution data block.
12. A neural network computing device, characterized in that the neural network computing device includes one or more integrated circuit chip devices according to any one of claims 1-11.
13. A combined processing device, characterized in that the combined processing device includes: the neural network computing device according to claim 12, a general interconnection interface, and a general-purpose processing device;
the neural network computing device is connected to the general-purpose processing device through the general interconnection interface.
14. A chip, characterized in that the chip integrates the device according to any one of claims 1-13.
15. A smart device, characterized in that the smart device includes the chip according to claim 14.
16. A neural network operation method, characterized in that the method is applied in an integrated circuit chip device, the integrated circuit chip device including the integrated circuit chip device according to any one of claims 1-11, and the integrated circuit chip device is configured to execute the operation of a neural network.
17. The method according to claim 16, characterized in that the operation of the neural network includes: one of, or any combination of, a convolution operation, a matrix-multiply-matrix operation, a matrix-multiply-vector operation, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
Priority Applications (15)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711343642.1A CN109960673B (en) | 2017-12-14 | 2017-12-14 | Integrated circuit chip device and related product |
TW107144037A TWI767097B (en) | 2017-12-14 | 2018-12-07 | Integrated circuit chip apparatus and related product |
PCT/CN2019/073453 WO2019114842A1 (en) | 2017-12-14 | 2019-01-28 | Integrated circuit chip apparatus |
US16/721,875 US11562216B2 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US16/721,879 US11507809B2 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US16/721,882 US11586891B2 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US16/721,883 US20200192632A1 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US16/721,888 US11704545B2 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US16/721,892 US11507810B2 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US16/721,885 US11308389B2 (en) | 2017-12-14 | 2019-12-19 | Integrated circuit chip apparatus |
US17/010,761 US11562219B2 (en) | 2017-12-14 | 2020-09-02 | Integrated circuit chip apparatus |
US17/688,853 US11900242B2 (en) | 2017-12-14 | 2022-03-07 | Integrated circuit chip apparatus |
US17/688,844 US11900241B2 (en) | 2017-12-14 | 2022-03-07 | Integrated circuit chip apparatus |
US18/085,273 US20230120704A1 (en) | 2017-12-14 | 2022-12-20 | Integrated circuit chip apparatus |
US18/085,332 US20230121164A1 (en) | 2017-12-14 | 2022-12-20 | Integrated circuit chip apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711343642.1A CN109960673B (en) | 2017-12-14 | 2017-12-14 | Integrated circuit chip device and related product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109960673A true CN109960673A (en) | 2019-07-02 |
CN109960673B CN109960673B (en) | 2020-02-18 |
Family
ID=67018616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711343642.1A Active CN109960673B (en) | 2017-12-14 | 2017-12-14 | Integrated circuit chip device and related product |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109960673B (en) |
TW (1) | TWI767097B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2149162A (en) * | 1983-10-31 | 1985-06-05 | British Telecomm | Fixed point to floating point conversion |
EP0570950A2 (en) * | 1992-05-22 | 1993-11-24 | International Business Machines Corporation | Advanced parallel array processor (APAP) |
EP0512007A4 (en) * | 1989-12-29 | 1995-11-29 | Supercomputer Systems Ltd | Cluster architecture for a highly parallel scalar/vector multiprocessor system |
US5752068A (en) * | 1994-08-23 | 1998-05-12 | Massachusetts Institute Of Technology | Mesh parallel computer architecture apparatus and associated methods |
CN201311633Y (en) * | 2008-11-07 | 2009-09-16 | 中国北车股份有限公司大连电力牵引研发中心 | Function module realized by floating point divider based on FPGA |
CN101794210A (en) * | 2010-04-07 | 2010-08-04 | 上海交通大学 | General matrix floating point multiplier based on FPGA (Field Programmable Gate Array) |
CN102495719A (en) * | 2011-12-15 | 2012-06-13 | 中国科学院自动化研究所 | Vector floating point operation device and method |
CN102665049A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院半导体研究所 | Programmable visual chip-based visual image processing system |
CN104572011A (en) * | 2014-12-22 | 2015-04-29 | 上海交通大学 | FPGA (Field Programmable Gate Array)-based general matrix fixed-point multiplier and calculation method thereof |
CN105892989A (en) * | 2016-03-28 | 2016-08-24 | 中国科学院计算技术研究所 | Neural network accelerator and operational method thereof |
CN105956660A (en) * | 2016-05-16 | 2016-09-21 | 浪潮集团有限公司 | Neural network chip realization method used for real-time image identification |
CN106502626A (en) * | 2016-11-03 | 2017-03-15 | 北京百度网讯科技有限公司 | Data processing method and device |
CN106844294A (en) * | 2016-12-29 | 2017-06-13 | 华为机器有限公司 | Convolution algorithm chip and communication equipment |
CN106940815A (en) * | 2017-02-13 | 2017-07-11 | 西安交通大学 | A kind of programmable convolutional neural networks Crypto Coprocessor IP Core |
CN107239829A (en) * | 2016-08-12 | 2017-10-10 | 北京深鉴科技有限公司 | A kind of method of optimized artificial neural network |
CN107292334A (en) * | 2017-06-08 | 2017-10-24 | 北京深瞐科技有限公司 | Image-recognizing method and device |
CN107330515A (en) * | 2016-04-29 | 2017-11-07 | 北京中科寒武纪科技有限公司 | A kind of apparatus and method for performing artificial neural network forward operation |
CN107451658A (en) * | 2017-07-24 | 2017-12-08 | 杭州菲数科技有限公司 | Floating-point operation fixed point method and system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9292790B2 (en) * | 2012-11-20 | 2016-03-22 | Qualcomm Incorporated | Piecewise linear neuron modeling
ES2738319T3 (en) * | 2014-09-12 | 2020-01-21 | Microsoft Technology Licensing Llc | Computer system to train neural networks |
CN106650923B (en) * | 2015-10-08 | 2019-04-09 | 上海兆芯集成电路有限公司 | Neural network unit with neural memory and neural processing unit and sequencer |
US9870341B2 (en) * | 2016-03-18 | 2018-01-16 | Qualcomm Incorporated | Memory reduction method for fixed point matrix multiply |
CN111104164A (en) * | 2016-04-26 | 2020-05-05 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing matrix multiplication operation |
2017
- 2017-12-14 CN CN201711343642.1A patent/CN109960673B/en active Active
2018
- 2018-12-07 TW TW107144037A patent/TWI767097B/en active
Non-Patent Citations (1)
Title |
---|
FANG Rui et al.: "Design of an FPGA Parallel Acceleration Scheme for Convolutional Neural Networks", Computer Engineering and Applications *
Also Published As
Publication number | Publication date |
---|---|
TWI767097B (en) | 2022-06-11 |
TW201928796A (en) | 2019-07-16 |
CN109960673B (en) | 2020-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109961138A (en) | Neural network training method and Related product | |
CN109961136A (en) | Integrated circuit chip device and Related product | |
CN109978131A (en) | Integrated circuit chip device and Related product | |
CN111105033B (en) | Neural network processor board card and related products | |
WO2019114842A1 (en) | Integrated circuit chip apparatus | |
CN109961135A (en) | Integrated circuit chip device and Related product | |
CN109961131A (en) | Neural network forward operation method and Related product | |
CN109977446A (en) | Integrated circuit chip device and Related product | |
CN109978151A (en) | Neural network processor board and Related product | |
CN109978152A (en) | Integrated circuit chip device and Related product | |
CN109977071A (en) | Neural network processor board and Related product | |
CN109978150A (en) | Neural network processor board and Related product | |
CN110197264A (en) | Neural network processor board and Related product | |
CN109960673A (en) | Integrated circuit chip device and Related product | |
CN109978157A (en) | Integrated circuit chip device and Related product | |
CN109961133A (en) | Integrated circuit chip device and Related product | |
CN109978156A (en) | Integrated circuit chip device and Related product | |
CN109978148B (en) | Integrated circuit chip device and related product | |
CN109978147A (en) | Integrated circuit chip device and Related product | |
CN110197267A (en) | Neural network processor board and Related product | |
CN109978154A (en) | Integrated circuit chip device and Related product | |
CN110490315A (en) | The reversed operation Sparse methods and Related product of neural network | |
WO2019165946A1 (en) | Integrated circuit chip device, board card and related product | |
CN109978130A (en) | Integrated circuit chip device and Related product | |
CN109978153A (en) | Integrated circuit chip device and Related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences
Applicant after: Zhongke Cambrian Technology Co., Ltd
Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences
Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.
GR01 | Patent grant | ||