CN110399976A - Computing device and calculation method - Google Patents
- Publication number: CN110399976A
- Application number: CN201810376471.0A
- Authority: CN (China)
- Prior art keywords: data, arithmetic element, convolution operation, register group, register
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
This application provides a computing device and a calculation method that can improve data utilization. The computing device includes a plurality of arithmetic units configured to perform multiple convolution operations on a plurality of data, the plurality of arithmetic units including a first arithmetic unit and a second arithmetic unit. The first arithmetic unit is configured to: in an i-th convolution operation of the multiple convolution operations, multiply first weight data by first data of the plurality of data, where i is an integer greater than 0; receive second data sent by the second arithmetic unit, where the second data is the data on which the second arithmetic unit performs the convolution operation in the i-th convolution operation; and, in an (i+1)-th convolution operation of the multiple convolution operations, multiply the first weight data by the second data.
Description
Technical field
This application relates to the field of data processing, and in particular to a computing device and a calculation method.
Background
Convolution is the most common computation in convolutional neural networks and is used mainly in convolutional layers. In a convolution operation, each weight in the convolution kernel is multiplied by its corresponding input data, and the products are summed to give the output of one convolution operation. The kernel is then slid according to the stride of the convolutional layer, and the operation is repeated. Convolutional neural networks usually process very large amounts of data, and high data throughput requires high access bandwidth. For example, a computing device for convolution generally includes a plurality of arithmetic units that process data in parallel and a register file that stores the data to be convolved. In each convolution operation, every arithmetic unit must read the data it needs from the register file. Because the amount of data processed is usually very large, the arithmetic units read a large amount of data from the register file over the course of the convolution.
Summary of the invention
This application provides a computing device and a calculation method that can improve data utilization.
According to a first aspect, a computing device is provided, including a plurality of arithmetic units configured to perform multiple convolution operations on a plurality of data, the plurality of arithmetic units including a first arithmetic unit and a second arithmetic unit. The first arithmetic unit is configured to: in an i-th convolution operation of the multiple convolution operations, multiply first weight data by first data of the plurality of data, where i is an integer greater than 0; receive second data sent by the second arithmetic unit, where the second data is the data on which the second arithmetic unit performs the convolution operation in the i-th convolution operation; and, in an (i+1)-th convolution operation of the multiple convolution operations, multiply the first weight data by the second data.
In this embodiment of the application, the first arithmetic unit can obtain from the second arithmetic unit the data that is repeated across two convolution operations, without reading the repeated data to be convolved from the register file again. This improves data utilization, reduces the amount of data exchanged between the arithmetic units and the register file, and lowers the access-bandwidth overhead between them.
In a possible implementation, the computing device further includes a plurality of register groups configured to store the plurality of data. The plurality of arithmetic units are further configured to obtain, from the plurality of register groups, the plurality of data needed for the multiple convolution operations, where each arithmetic unit can obtain the data stored in any one of the register groups.
In this embodiment of the application, any arithmetic unit can access any register group, so the register groups need to read in and store the data to be convolved only once, rather than repeatedly reading from memory and storing the large amount of data that is repeated across multiple convolution operations. This saves storage space in the register groups, relieves pressure on the access bandwidth between the register groups and the memory, and increases the parallelism of data access.
In a possible implementation, the first arithmetic unit is configured to obtain, in the i-th convolution operation, the first data from a first register group of the plurality of register groups; the second arithmetic unit is configured to obtain, in the i-th convolution operation, the second data from a second register group of the plurality of register groups; and the first register group and the second register group are different register groups.
In a possible implementation, during the i-th convolution operation, the data that the plurality of arithmetic units read from the plurality of register groups is located in different register groups.
In this embodiment of the application, a flexible data arrangement is used in the register groups, so that as much data as possible can be read in one clock cycle, read conflicts are avoided, and the convolution proceeds without stalls.
In a possible implementation, the convolution kernel corresponding to the convolution operation has size K*K, the plurality of register groups includes K*K register groups, and the plurality of data includes data in M rows and N columns. The distribution of the data across the K*K register groups satisfies the following condition: the data in row x, column s is stored in the b-th register group, and the data in row x+a, column s is stored in the ([b+(a*K)] mod (K*K))-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M >= x >= 1, N >= s >= 1, K*K >= b >= 1, and M, N, x, s, b and a are positive integers.
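The placement rule above can be sketched in a few lines of Python. This is only an illustration, not the patent's implementation: the function name is hypothetical, and because register groups are numbered from 1 while the remainder can be 0, the sketch assumes that a remainder of 0 denotes the (K*K)-th group.

```python
def group_after_row_shift(b: int, a: int, K: int) -> int:
    """Register group holding (row x+a, column s), given that (row x, column s) is in group b."""
    if not 1 <= b <= K * K:
        raise ValueError("b must lie in 1..K*K")
    g = (b + a * K) % (K * K)
    # Assumption: groups are numbered 1..K*K, so a remainder of 0 is read as group K*K.
    return g if g != 0 else K * K


K = 3
# Groups used by K vertically adjacent data in one column, starting from group 1.
column_groups = [1] + [group_after_row_shift(1, a, K) for a in range(1, K)]
print(column_groups)
```

Under this reading, the K data that one kernel window covers in a single column fall into K different register groups, which is consistent with the stated goal of reading them in the same clock cycle without conflicts.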
According to a second aspect, a calculation method is provided. A computing device to which the method is applied includes a plurality of arithmetic units configured to perform multiple convolution operations on a plurality of data, the plurality of arithmetic units including a first arithmetic unit and a second arithmetic unit. The method includes: in an i-th convolution operation of the multiple convolution operations, multiplying, by the first arithmetic unit, first weight data by first data of the plurality of data, where i is an integer greater than 0; receiving, by the first arithmetic unit, second data sent by the second arithmetic unit, where the second data is the data on which the second arithmetic unit performs the convolution operation in the i-th convolution operation; and, in an (i+1)-th convolution operation of the multiple convolution operations, multiplying, by the first arithmetic unit, the first weight data by the second data.
In this embodiment of the application, the first arithmetic unit can obtain from the second arithmetic unit the data that is repeated across two convolution operations, without reading the repeated data to be convolved from the register file again. This improves data utilization, reduces the amount of data exchanged between the arithmetic units and the register file, and lowers the access-bandwidth overhead between them.
In a possible implementation, the computing device further includes a plurality of register groups configured to store the plurality of data, where each arithmetic unit of the plurality of arithmetic units can obtain the data stored in any one of the register groups.
In this embodiment of the application, any arithmetic unit can access any register group, so the register groups need to read in and store the data to be convolved only once, rather than repeatedly reading from memory and storing the large amount of data that is repeated across multiple convolution operations. This saves storage space in the register groups, relieves pressure on the access bandwidth between the register groups and the memory, and increases the parallelism of data access.
In a possible implementation, the method further includes: in the i-th convolution operation, obtaining, by the first arithmetic unit, the first data from a first register group of the plurality of register groups; and, in the i-th convolution operation, obtaining, by the second arithmetic unit, the second data from a second register group of the plurality of register groups, where the first register group and the second register group are different register groups.
In a possible implementation, during the i-th convolution operation, the data that the plurality of arithmetic units read from the plurality of register groups is located in different register groups.
In this embodiment of the application, a flexible data arrangement is used in the register groups, so that as much data as possible can be read in one clock cycle, read conflicts are avoided, and the convolution proceeds without stalls.
In a possible implementation, the convolution kernel corresponding to the convolution operation has size K*K, the plurality of register groups includes K*K register groups, and the plurality of data includes data in M rows and N columns. The distribution of the data across the K*K register groups satisfies the following condition: the data in row x, column s is stored in the r-th register group, and the data in row x+a, column s is stored in the ([r+(a*K)] mod (K*K))-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M >= x >= 1, N >= s >= 1, K*K >= r >= 1, and M, N, x, s, r and a are positive integers.
According to a third aspect, a computing device is provided, including a plurality of arithmetic units and a plurality of register groups, where each arithmetic unit is connected to each of the plurality of register groups. The plurality of register groups are configured to store a plurality of data on which multiple convolution operations are to be performed. The plurality of arithmetic units are configured to obtain the plurality of data from the plurality of register groups and to perform the multiple convolution operations on the plurality of data, where each arithmetic unit is configured to obtain the data stored in any one of the register groups.
In this embodiment of the application, any arithmetic unit can access any register group, so the register groups need to read in and store the data to be convolved only once, rather than repeatedly reading from memory and storing the large amount of data that is repeated across multiple convolution operations. This saves storage space in the register groups, relieves pressure on the access bandwidth between the register groups and the memory, and increases the parallelism of data access.
In a possible implementation, during an i-th convolution operation of the multiple convolution operations, the data that the plurality of arithmetic units read from the plurality of register groups is located in different register groups, where i is an integer greater than 0.
In this embodiment of the application, a flexible data arrangement is used in the register groups, so that as much data as possible can be read in one clock cycle, read conflicts are avoided, and the convolution proceeds without stalls.
In a possible implementation, the convolution kernel corresponding to the convolution operation has size K*K, the plurality of register groups includes K*K register groups, and the plurality of data includes data in M rows and N columns. The distribution of the data across the K*K register groups satisfies the following condition: the data in row x, column s is stored in the b-th register group, and the data in row x+a, column s is stored in the ([b+(a*K)] mod (K*K))-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M >= x >= 1, N >= s >= 1, K*K >= b >= 1, and M, N, x, s, b and a are positive integers.
In a possible implementation, the plurality of arithmetic units includes a first arithmetic unit and a second arithmetic unit. The first arithmetic unit is configured to: in an i-th convolution operation of the multiple convolution operations, multiply first weight data by first data of the plurality of data, where i is an integer greater than 0; receive second data sent by the second arithmetic unit, where the second data is the data on which the second arithmetic unit performs the convolution operation in the i-th convolution operation; and, in an (i+1)-th convolution operation of the multiple convolution operations, multiply the first weight data by the second data.
In this embodiment of the application, the first arithmetic unit can obtain from the second arithmetic unit the data that is repeated across two convolution operations, without reading the repeated data to be convolved from the register file again. This improves data utilization, reduces the amount of data exchanged between the arithmetic units and the register file, and lowers the access-bandwidth overhead between them.
According to a fourth aspect, a chip is provided, on which is arranged the computing device according to the first aspect or any possible implementation of the first aspect, or the computing device according to the third aspect or any possible implementation of the third aspect.
According to a fifth aspect, a computer storage medium is provided that stores program code to be executed by a computing device, the program code including instructions for performing the method of the second aspect.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a computing device provided by an embodiment of this application.
Fig. 2 is a schematic diagram of a convolution kernel performing convolution operations on the data to be convolved in different cycles.
Fig. 3 is a schematic structural diagram of a computing device according to an embodiment of this application.
Fig. 4 is a schematic structural diagram of a computing device according to another embodiment of this application.
Fig. 5 is a schematic diagram of the connections between register groups and arithmetic units in an embodiment of this application.
Fig. 6 is a schematic diagram of the connections between register groups and arithmetic units in another embodiment of this application.
Fig. 7 is a schematic diagram of the arrangement of the data to be convolved in the plurality of register groups in an embodiment of this application.
Fig. 8 is a schematic diagram of the convolution process in an embodiment of this application.
Fig. 9 is a schematic flowchart of the calculation method in an embodiment of this application.
Detailed description of embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
For ease of understanding, the concept of the convolution operation and the related art are first described in detail.
Convolution operations are commonly used in convolutional neural networks, which can be used to process image data. The data on which convolution is performed may be called the input matrix or input data, and may be image data. A convolution operation corresponds to a convolution kernel, which is a K*K matrix, where K is an integer greater than or equal to 1. Each element of the kernel is a weight. During convolution, the kernel slides over the input matrix according to the stride. The sliding window divides the input matrix into many submatrices of the same size as the kernel; each submatrix is multiplied element by element with the kernel, and the products are accumulated to give the result of the convolution operation. Note that the width and length of the kernel may or may not be equal. The embodiments of this application are described for a kernel of equal width and length, that is, of size K*K. A person skilled in the art will understand that the computing device and calculation method in the embodiments also apply when the width and length of the kernel are unequal.
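The sliding-window procedure described above can be written out as a short reference routine. The following is a minimal software sketch, assuming a square K*K kernel, no padding, and the same stride in both directions; the function name and the sample values are illustrative and do not come from the application.

```python
def conv2d(inp, kernel, stride=1):
    """Slide a K*K kernel over inp; each output is the sum of element-wise products."""
    K = len(kernel)
    out_rows = (len(inp) - K) // stride + 1
    out_cols = (len(inp[0]) - K) // stride + 1
    out = []
    for r in range(out_rows):
        out_row = []
        for c in range(out_cols):
            acc = 0
            for i in range(K):
                for j in range(K):
                    # One multiplication per weight, as one arithmetic unit would perform.
                    acc += kernel[i][j] * inp[r * stride + i][c * stride + j]
            out_row.append(acc)
        out.append(out_row)
    return out


sample_input = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sample_kernel = [[1, 0], [0, 1]]
print(conv2d(sample_input, sample_kernel))
```

Each pass through the two inner loops corresponds to one convolution operation in the sense used in this description, and adjacent passes share most of their input data.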
Fig. 1 is a schematic structural diagram of a computing device 100 for convolution operations provided by an embodiment of this application. As shown in Fig. 1, the computing device 100 generally includes a convolution processing unit 110, a register file 120, a control unit 130 and a memory 140. The convolution processing unit 110 may include a plurality of arithmetic units, each of which multiplies data to be convolved by weight data. The convolution processing unit 110 may also include an adder tree unit, which accumulates the outputs of the arithmetic units according to the rules of convolution to give the result of the convolution operation. The memory 140 stores data. The register file 120 reads the data to be convolved from the memory 140 over a bus and stores it. The control unit 130 controls the convolution processing unit 110 and the register file 120 to carry out these operations. In the related art, the arithmetic units run independently of one another; that is, there are no connections between them. Each arithmetic unit therefore has to read its data to be convolved from the register file 120 separately. Across multiple convolution operations, the data to be convolved contains a large amount of repeated data, yet this repeated data still has to be read from the register file 120 again, so data utilization is low.
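The accumulation performed by the adder tree unit can be pictured as a pairwise reduction of the multiplier outputs. The sketch below is an assumption about the tree's shape, since the text only says that the unit accumulates the outputs of the arithmetic units; the function name is hypothetical.

```python
def adder_tree(products):
    """Sum the multiplier outputs level by level, adding neighbours in pairs."""
    level = list(products)
    if not level:
        return 0
    while len(level) > 1:
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired value is passed up to the next level unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]


print(adder_tree([1, 2, 3, 4, 5]))
```

In hardware, each level of such a tree can operate in parallel, so the number of levels grows with the logarithm of the number of arithmetic units rather than with the number itself.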
For ease of understanding, the convolution process is described below with reference to Fig. 2.
Fig. 2 is a schematic diagram of a convolution kernel performing convolution operations on the data to be convolved in different cycles. As shown in Fig. 2, the kernel is a 2*2 weight matrix, and the data to be convolved, also called the input data or input matrix, has multiple rows and columns. Fig. 2 shows the data processed in two adjacent convolution operations. The kernel slides over the input matrix according to the stride, and the data covered by the hatched region in Fig. 2 is the data repeated in the two operations, which shows that data in convolution can be reused. In the related art, however, a computing device for convolution generally processes data with multiple arithmetic units in parallel. The arithmetic units are independent of one another and cannot transfer data between themselves, so in every convolution operation each unit must read the data it needs from the register file, and data repeated in two convolution operations is read from the register file twice. For example, suppose that a first arithmetic unit and a second arithmetic unit among the plurality of arithmetic units perform a first convolution operation and a second convolution operation respectively, and that the two operations are adjacent. In the cycle of the first convolution operation, the first arithmetic unit reads the data to be convolved 10, 27, 35 and 82 from the register file. In the cycle of the second convolution operation, the second arithmetic unit reads the data to be convolved 27, 69, 82 and 75 from the register file. Data 27 and 82 are thus read from the register file twice. When the amount of data processed is large, a great deal of data is read repeatedly, which increases the pressure on access bandwidth.
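The cost of the repeated reads in this example can be counted directly. The snippet below uses the data values quoted above and treats each value as identifying one position of the input matrix, which is an assumption made only for this illustration.

```python
# Data read in the two adjacent 2*2 convolution operations of the Fig. 2 example.
first_window = [10, 27, 35, 82]
second_window = [27, 69, 82, 75]

# Data covered by both windows.
repeated = [d for d in second_window if d in first_window]

reads_without_reuse = len(first_window) + len(second_window)
reads_with_reuse = reads_without_reuse - len(repeated)

print(repeated, reads_without_reuse, reads_with_reuse)
```

With a 2*2 kernel and a stride of 1, half of each window is shared with the previous one, so avoiding repeated reads removes two of the four register-file reads in every operation after the first in a row.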
An embodiment of this application provides a computing device that reduces the data read repeatedly from the register file and thereby reduces the demand on access bandwidth. In each cycle, weight data or data to be convolved can be loaded into every arithmetic unit simultaneously.
Fig. 3 is a schematic structural diagram of a computing device 300 provided by an embodiment of the present invention. As shown in Fig. 3, the computing device 300 includes:
A plurality of arithmetic units 310, configured to perform multiple convolution operations on a plurality of data, the plurality of arithmetic units 310 including a first arithmetic unit 311 and a second arithmetic unit 312. The first arithmetic unit 311 is configured to: in an i-th convolution operation of the multiple convolution operations, multiply first weight data by first data of the plurality of data, where i is an integer greater than 0; receive second data sent by the second arithmetic unit 312, where the second data is the data on which the second arithmetic unit 312 performs the convolution operation in the i-th convolution operation; and, in an (i+1)-th convolution operation of the multiple convolution operations, multiply the first weight data by the second data.
Optionally, receiving the second data sent by the second arithmetic unit 312 may mean that the first arithmetic unit 311 obtains the second data directly over a connection to the second arithmetic unit 312. It may also mean that the first arithmetic unit 311 obtains the second data over an indirect connection to the second arithmetic unit 312. For example, the first arithmetic unit 311 is connected to a third arithmetic unit, which is in turn connected to the second arithmetic unit 312; the second arithmetic unit 312 sends the second data to the third arithmetic unit, which then forwards it to the first arithmetic unit 311.
The plurality of arithmetic units 310 correspond one-to-one to the weight data in the convolution kernel and, in one convolution operation, each performs the multiplication for its own weight data. For example, the first weight data is the weight data corresponding to the first arithmetic unit 311, and second weight data is the weight data corresponding to the second arithmetic unit 312.
In this embodiment of the application, the first arithmetic unit can obtain from the second arithmetic unit the data that is repeated across two convolution operations, without reading the repeated data to be convolved from the register file again. This improves data utilization, reduces the amount of data exchanged between the arithmetic units and the register file, and lowers the access-bandwidth overhead between them.
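The hand-off between the two arithmetic units can be modelled in software as follows. This is a behavioural sketch only: the class and method names are hypothetical, the weights 2 and 3 are made-up values, and the data values 10 and 27 are borrowed from the Fig. 2 example.

```python
class ArithmeticUnit:
    """One multiplier with a fixed weight and a slot for the data to be convolved."""

    def __init__(self, weight):
        self.weight = weight  # weight data stays with this unit across operations
        self.data = None      # data to be convolved in the current operation

    def load(self, value):
        """Take data from a register group."""
        self.data = value

    def receive_from(self, other):
        """Take the data another unit used, instead of reading a register group."""
        self.data = other.data

    def multiply(self):
        return self.weight * self.data


first = ArithmeticUnit(weight=2)
second = ArithmeticUnit(weight=3)

# i-th convolution operation: both units read from register groups.
first.load(10)
second.load(27)
product_i = first.multiply()

# (i+1)-th convolution operation: the first unit reuses the second unit's data.
first.receive_from(second)
product_next = first.multiply()

print(product_i, product_next)
```

The first unit performs no register read for the second multiplication, which is the saving the paragraph above describes.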
Optionally, the computing device 300 may further include a plurality of register groups 320, configured to store the plurality of data for the multiple convolution operations. The plurality of arithmetic units 310 are further configured to obtain from the register groups 320 the data processed in the multiple convolution operations. For example, in the current convolution operation, some of the arithmetic units 310 may obtain from the register groups 320 the data that was not used in the previous convolution operation, while the remaining arithmetic units 310 obtain from other arithmetic units 310 the data that is repeated from the previous convolution operation. Optionally, in some convolution operations there is no data repeated from the previous convolution operation, and all arithmetic units 310 must obtain their data to be convolved from the register groups 320. This is the case, for example, in the first convolution operation, when the kernel moves in the row direction and reaches a new row, or when the kernel moves in the column direction and reaches a new column.
The plurality of data includes the first data and the second data. In one example, suppose the first data is stored in a first register group of the plurality of register groups and the second data is stored in a second register group of the plurality of register groups, the two being different register groups. If the data processed in the (i-1)-th convolution operation includes neither the first data nor the second data, then in the i-th convolution operation the first arithmetic unit obtains the first data from the first register group and the second arithmetic unit obtains the second data from the second register group.
Optionally, the plurality of register groups 320 may also be called a register file. Each register group includes a plurality of registers, each of which stores one datum. Each register group may have one read port; that is, each of the arithmetic units can read one datum from one register group in each clock cycle.
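Because each register group has a single read port, two reads aimed at the same register group in one clock cycle would collide. A small check for this condition might look as follows; the function name is hypothetical, and the sketch assumes that one read port serves at most one read per cycle.

```python
from collections import Counter


def conflicting_groups(requested_groups):
    """Return the register groups that receive more than one read request in a cycle."""
    counts = Counter(requested_groups)
    return sorted(group for group, n in counts.items() if n > 1)


# Four arithmetic units reading from four different groups: no conflict.
print(conflicting_groups([1, 2, 3, 4]))
# Two units target group 2 in the same cycle: one of them would have to wait.
print(conflicting_groups([1, 2, 2, 4]))
```

A data arrangement that always yields an empty list for the reads of one convolution operation lets all of them complete in a single clock cycle.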
In some examples, the plurality of arithmetic units 310 perform the multiplications of each convolution operation. For example, suppose the convolution kernel corresponding to the convolution operation contains K*K weight data, where K is an integer greater than 1. The plurality of arithmetic units 310 may then include K*K arithmetic units, which perform the K*K multiplications corresponding one-to-one to the K*K weight data.
The K*K arithmetic units can be divided into K groups of K arithmetic units each. In one example, the K groups correspond to the K columns of the K*K weight data, and the K arithmetic units in each group correspond in order to the K weight data in that column. Alternatively, the K groups may correspond to the K rows of the K*K weight data, with the K arithmetic units in each group corresponding in order to the K weight data in that row. The embodiments of this application are described for a kernel that moves along the row direction of the input matrix. A person skilled in the art will understand that the scheme for a kernel moving along the column direction of the input matrix is similar, and it is not described again here.
The K groups of arithmetic units may be configured to perform, in the i-th convolution operation, the K*K multiplications corresponding to the K*K weight data. After the i-th convolution operation has been performed, the (K-t)-th to K-th groups of the K groups obtain from the register groups 320 the data needed for the (i+1)-th convolution operation, where t is the stride of the convolution operation and is an integer greater than or equal to 1. The y-th group of the K groups reads the second data from the (y+t)-th group, where y is a positive integer and 1 ≤ y ≤ K-t. Specifically, the d-th arithmetic unit in the y-th group reads the second data from the d-th arithmetic unit in the (y+t)-th group. The second data is the data to be convolved that is repeated in the i-th and (i+1)-th convolution operations.
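The group-to-group hand-off can be sketched as follows for a kernel sliding along the row direction. The function name and the sample matrix are hypothetical, indices are 0-based, and the sketch assumes that only the groups with no neighbour t positions away read from the register groups, which is one reading of the group ranges given above.

```python
def slide_window(groups, matrix, row, col, t):
    """Advance a K*K window by stride t along the row direction.

    groups[y][d] is the datum currently held by the d-th unit of group y,
    i.e. matrix[row + d][col + y]. Returns the new groups and the number of
    data that had to be read from register groups.
    """
    K = len(groups)
    new_groups, register_reads = [], 0
    for y in range(K):
        if y + t < K:
            # Unit d of group y takes its datum from unit d of group y + t.
            new_groups.append(list(groups[y + t]))
        else:
            # No neighbouring group holds this column yet: read it from register groups.
            new_groups.append([matrix[row + d][col + t + y] for d in range(K)])
            register_reads += K
    return new_groups, register_reads


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
groups = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]  # columns 0..2 of a 3*3 window
new_groups, reads = slide_window(groups, matrix, row=0, col=0, t=1)
print(new_groups, reads)
```

In this sketch a 3*3 window with a stride of 1 needs 3 register reads per step instead of 9, because two of the three columns are passed between arithmetic units.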
Optionally, the first arithmetic unit in Fig. 3 may be any arithmetic unit in the y-th group, and the second arithmetic unit in Fig. 3 may be any arithmetic unit in the (K-t)-th to K-th groups.
In some examples, adjacent arithmetic units in each row of the K*K arithmetic units are connected to each other. This connection allows data to slide between arithmetic units. Some of the arithmetic units can thus obtain from the register groups the data that is not repeated in two convolution operations, while the remaining arithmetic units obtain from adjacent arithmetic units the data repeated in two consecutive convolution operations. This avoids reading repeated data from the register groups and reduces the amount of data exchanged between the arithmetic units and the register file.
It should be noted that the connections among the above K*K arithmetic elements are not limited to connections within each row; various other connection types may be used. For example, adjacent arithmetic elements in each column may be connected to each other. Alternatively, all arithmetic elements in each row may be connected to one another. Other connection types may also be used, provided that the connection enables the first arithmetic element 311 to obtain the second data from the second arithmetic element 312.
As an example, Fig. 4 shows a detailed structural diagram of the computing device 400 in an embodiment of the application. As shown in Fig. 4, the computing device 400 includes a convolution processing unit 31 and multiple register groups 320. Assume that the convolution kernel includes K*K weight data. The convolution processing unit 31 includes K*K arithmetic elements 310 and an adder tree unit 316. The K*K arithmetic elements 310 correspond one-to-one to the K*K weight data, and each performs the multiplication operation for its corresponding weight data. The K*K arithmetic elements 310 are arranged in K rows and K columns, and adjacent arithmetic elements in each row are connected. Specifically, each arithmetic element 310 may include a first register 3101, a second register 3102 and a multiplier 3103. The multiplier 3103 is connected to the first register 3101 and the second register 3102. The first register 3101 stores the weight data corresponding to the arithmetic element 310, and the second register 3102 may store the data to be convolved that correspond to the arithmetic element 310 in one convolution operation. The multiplier 3103 multiplies the data stored in the first register 3101 by the data stored in the second register 3102. The adder tree unit 316 receives the multiplication results from the K*K arithmetic elements 310 and accumulates them to obtain the result of one convolution operation.
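As a minimal software sketch of the structure described for Fig. 4, the following models one arithmetic element as two registers plus a multiplier, and the adder tree as a pairwise reduction. The class and function names are hypothetical, and the pairwise reduction order is an assumption; the application only states that the results are accumulated.

```python
from dataclasses import dataclass


@dataclass
class ArithmeticElement:
    weight: float = 0.0  # models the first register 3101 (weight data)
    data: float = 0.0    # models the second register 3102 (data to be convolved)

    def multiply(self):
        # Models the multiplier 3103.
        return self.weight * self.data


def adder_tree(values):
    """Accumulate values level by level, two at a time, like a tree of adders."""
    values = list(values)
    if not values:
        return 0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element passes through to the next level
        values = paired
    return values[0]


def convolve_once(elements):
    """One convolution operation: K*K multiplications followed by accumulation."""
    return adder_tree(e.multiply() for e in elements)
```

For a 3*3 kernel, `elements` would hold nine `ArithmeticElement` instances, one per weight.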
Optionally, the above computing device 400 may further include a memory 330 and a control unit 340. The memory 330 may be connected to the multiple register groups 320 through a bus. The memory 330 stores the data to be convolved and inputs those data to the multiple register groups through the bus. The control unit 340 may be used to control the convolution processing unit 31 and the multiple register groups 320 to carry out the above operations.
The embodiments of the application do not limit the type of the memory. For example, the memory may be a double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM).
Optionally, the embodiments of the application do not limit how the data to be convolved are stored in the multiple register groups 320. For example, the multiple register groups 320 may be divided into regions. Each region corresponds to one arithmetic element 310, and that arithmetic element 310 reads the data to be convolved only from its own region, without reading data from other regions.
As an example, Fig. 5 shows one way of connecting the multiple arithmetic elements 310 and the multiple register groups 320 in an embodiment of the application. The multiple arithmetic elements 310 correspond one-to-one to the multiple register groups 320, and each arithmetic element is connected to one register group. Each register group may store the data to be convolved for one arithmetic element. In each clock cycle, each arithmetic element reads one datum from its corresponding register group.
In the partitioned storage mode, however, each arithmetic element can only read data from its own region. Therefore, if the data to be convolved of several arithmetic elements overlap, several regions in the multiple register groups 320 must each read the same data from the memory 330 and store it. This storage mode consumes storage space in the multiple register groups 320 and also increases the bandwidth required for the multiple register groups 320 to read data from the memory. For example, with continued reference to Fig. 1, suppose that in one convolution operation the first arithmetic element 311 reads data 10 and the second arithmetic element 312 reads data 27. In the convolution operation performed after the convolution kernel moves one column to the right, the first arithmetic element 311 needs to read data 27 and the second arithmetic element 312 needs to read data 69. Data 27 is then read and stored twice, once in the region of the first arithmetic element 311 and once in the region of the second arithmetic element 312.
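The storage cost of this duplication can be estimated with a short sketch. It assumes a stride of 1, no padding, an M*N input and a K*K kernel, and that each region holds exactly the data its arithmetic element consumes over all convolution positions; these assumptions and the function name `stored_values` are illustrative and are not taken from the application.

```python
def stored_values(M, N, K):
    """Return (partitioned, shared) counts of values held in the register groups.

    partitioned -- every one of the K*K regions stores one datum per output
                   position, so overlapping data are stored repeatedly
    shared      -- each input datum is stored exactly once
    """
    out_rows, out_cols = M - K + 1, N - K + 1  # stride 1, no padding (assumption)
    partitioned = K * K * out_rows * out_cols
    shared = M * N
    return partitioned, shared
```

Under these assumptions, a 16*16 input with a 3*3 kernel requires 1764 stored values in the partitioned mode, against 256 when every datum is stored once.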
Alternatively, the multiple register groups 320 may be left unpartitioned, so that each arithmetic element can access the data stored in any register group. In the unpartitioned mode, each arithmetic element 310 of the multiple arithmetic elements 310 is used to obtain the data stored in any one of the multiple register groups 320.
As an example, Fig. 6 shows a way of connecting the multiple arithmetic elements 310 and the multiple register groups 320 in another embodiment of the application. It should be noted that the connection or structure shown in Fig. 6 may be applied in the computing devices provided by the application, such as the computing device of Fig. 1, Fig. 3 or Fig. 4. It may also be applied in any other computing device, which is not limited by the embodiments of the application.
As shown in Fig. 6, a fully connected switching network may be arranged between the multiple arithmetic elements 310 and the multiple register groups 320. The fully connected switching network is a network that fully connects the multiple arithmetic elements 310 with the multiple register groups 320. Through this network, each of the multiple arithmetic elements 310 can be connected to any register group of the multiple register groups 320. In other words, each arithmetic element of the multiple arithmetic elements can obtain the data stored in any one of the multiple register groups.
For example, in the i-th convolution operation the first arithmetic element 311 may obtain the first data from a first register group of the multiple register groups 320, and in the j-th convolution operation the first arithmetic element 311 may obtain third data from a second register group of the multiple register groups 320, where i and j are positive integers and i ≠ j.
In the embodiments of the application, the multiple arithmetic elements 310 and the multiple register groups 320 are connected by a fully connected switching network, so that any arithmetic element 310 can access any register group. The multiple register groups 320 therefore need to read in and store the data to be convolved only once, instead of repeatedly reading from the memory 330 and storing the large amount of data that is repeated across multiple convolution operations. This saves storage space in the multiple register groups 320, relieves the pressure on the access bandwidth between the multiple register groups 320 and the memory 330, and improves the parallelism of data access.
In the unpartitioned storage mode, however, each register group 320 has only one read port, so only one datum can be read from a given register group 320 in each clock cycle. If several data to be convolved that are needed in the same convolution operation are stored in the same register group 320, a data read conflict occurs, and the multiple arithmetic elements 310 need several clock cycles to read all the required data. This storage mode can therefore reduce the processing speed of the arithmetic elements 310.
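The cost of such a read conflict can be made concrete with a small sketch. It assumes that each register group serves exactly one read per clock cycle and that reads to different register groups proceed in parallel; the function name `read_cycles` is hypothetical.

```python
from collections import Counter


def read_cycles(requested_groups):
    """Clock cycles needed to serve one convolution operation.

    requested_groups -- for each arithmetic element, the index of the register
                        group that holds the datum it needs
    """
    if not requested_groups:
        return 0
    # With one read port per register group, the busiest group sets the latency.
    return max(Counter(requested_groups).values())
```

If the nine data of a 3*3 convolution operation sit in nine different register groups, one cycle suffices; if three of them share a register group, three cycles are needed.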
To solve the above problem, the embodiments of the application further propose a way of storing the data to be convolved in the multiple register groups 320, which is described below.
In some embodiments, the multiple data to be convolved in the multiple convolution operations may be arranged so that, during the i-th convolution operation, the data read by the multiple arithmetic elements 310 from the multiple register groups 320 are located in different register groups of the multiple register groups 320.
The i-th convolution operation may be any one of the multiple convolution operations. The data read by the multiple arithmetic elements 310 from the multiple register groups 320 refer to the data that need to be read from the multiple register groups 320 in the i-th convolution operation. These data may be all of the data processed in the i-th convolution operation, or only part of them. For example, if the i-th convolution operation shares some repeated data with the (i-1)-th convolution operation, the repeated data can be obtained from within the multiple arithmetic elements 310 and only the non-repeated data need to be obtained from the multiple register groups 320. In that case, the data that need to be read from the multiple register groups 320 in the i-th convolution operation are the non-repeated data. If the i-th convolution operation shares no repeated data with the (i-1)-th convolution operation, the data that need to be read from the multiple register groups 320 in the i-th convolution operation are all of the data to be processed in that operation.
In the embodiments of the application, a flexible data arrangement is used in the multiple register groups, so that as much data as possible can be read in one clock cycle, data read conflicts are avoided, and a non-blocking convolution process is achieved.
There are many specific ways of arranging the above multiple data to be convolved. In one example, the convolution kernel corresponding to the convolution operation has a size of K*K, the multiple register groups include K*K register groups, the multiple data include M rows and N columns of data, and the distribution of the multiple data in the K*K register groups satisfies the following condition:
the data in row x, column s of the multiple data are stored in the b-th register group, and the data in row x+a, column s of the multiple data are stored in the [b+(a*K)] mod (K*K)-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M ≥ x ≥ 1, N ≥ s ≥ 1, K*K ≥ b ≥ 1, and M, N, x, s, b and a are positive integers.
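The placement condition can be written as a one-line function. The sketch below uses 1-based register group numbers as in the text and assumes that a remainder of 0 denotes the K*K-th register group; the text does not state how a zero remainder is numbered, so this wrap-around, like the function name `register_group`, is an assumption.

```python
def register_group(b, a, K):
    """Register group holding the datum a rows below a datum stored in group b.

    b -- 1-based register group of the datum in row x, column s
    a -- row offset (positive integer)
    K -- kernel size; there are K*K register groups
    """
    g = (b + a * K) % (K * K)
    # Assumption: remainder 0 is interpreted as the K*K-th register group.
    return g if g != 0 else K * K
```

For K = 3 and b = 1, the data one and two rows below are placed in register groups 4 and 7 respectively.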
Optionally, while satisfying the above condition, each row of the M rows of data may be arranged cyclically across the multiple register groups. For example, the first row of data may be placed in order from the first register group to the (K*K)-th register group, and then continue cyclically from the first register group.
Optionally, the multiple data may be arranged so that any K consecutive data in each row of the M rows of data are stored in different register groups, and/or any K consecutive data in each column of the N columns of data are stored in different register groups.
The above arrangement is only an example. In some embodiments, the arrangement of the multiple data may be modified or transformed with reference to the above formula. For example, the multiple register groups need not be limited to K*K register groups, provided that the data to be convolved in one convolution operation are stored in different register groups. Alternatively, the rows and columns of the multiple data may be swapped, and the data may still be stored according to the above arrangement.
Fig. 7 is a schematic diagram of a specific arrangement of multiple data to be convolved in multiple register groups in an embodiment of the application. For ease of understanding, the arrangement is described below with reference to Fig. 7. As shown in Fig. 7, the convolution kernel is a weight matrix of size 3*3. The multiple arithmetic elements may include 3*3 arithmetic elements, and the multiple register groups may include 3*3 register groups. The multiple arithmetic elements and the multiple register groups are connected by a fully connected switching network. The multiple register groups may form a register array, in which each row represents one register group and each register group includes several registers. The column numbers of the register array may be denoted r1, r2, ..., rn, and the row numbers may be denoted b1, b2, ..., b9. Fig. 7 also shows the multiple data to be convolved that are required for the multiple convolution operations. These data include multiple rows and columns of data and, for ease of presentation, are denoted 1, 2, 3, ..., 256.
The multiple data may be arranged in the multiple register groups in the manner described above. For example, the first row of data 1, 2, ..., 16 may be stored cyclically in the first register group b1 to the ninth register group b9. If the first datum of the first row is stored in the b-th register group, the row that is a rows below it is loaded starting from the [b+(a*K)] mod (K*K)-th register group. For example, the first datum 1 of the first row is located in the first register group, that is, b=1. According to the above calculation, the first datum 17 of the second row is stored in register group [1+(1*3)] mod 9 = 4, and the first datum 33 of the third row is stored in register group [1+(2*3)] mod 9 = 7. As an optional approach, the second row of data may be stored cyclically in the multiple register groups starting from the 4th register group, and the third row of data may be stored cyclically starting from the 7th register group. With this arrangement, the multiple arithmetic elements can read all the data required for one convolution operation within one clock cycle. In addition, as shown in Fig. 7, assume that the 3*3 weight data are q1 to q9. The weight data may also be stored in different register groups of the multiple register groups. With this arrangement, before the multiple convolution operations start, the multiple arithmetic elements can read all the weight data within one clock cycle and then read the data to be convolved for the convolution operations.
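The claim that one clock cycle suffices can be checked with a short sketch of the Fig. 7 arrangement. It assumes that the data 1 to 256 form a 16*16 matrix, that each row is stored cyclically, and that each row starts K register groups after the previous one; the function names are hypothetical.

```python
def build_layout(M, N, K, first_group=1):
    """Map each (row x, column s), both 1-based, to a 1-based register group."""
    groups = K * K
    layout = {}
    for x in range(1, M + 1):
        for s in range(1, N + 1):
            # Each row is stored cyclically and starts K groups after the previous row.
            layout[(x, s)] = (first_group - 1 + (x - 1) * K + (s - 1)) % groups + 1
    return layout


def conflict_free(layout, M, N, K):
    """True if every K*K window touches K*K different register groups."""
    for x in range(1, M - K + 2):
        for s in range(1, N - K + 2):
            used = {layout[(x + p, s + q)] for p in range(K) for q in range(K)}
            if len(used) != K * K:
                return False
    return True
```

For K = 3 the window offsets contribute 0, 1, 2 (columns) plus 0, 3, 6 (rows), which cover all nine remainders modulo 9, so every 3*3 window falls in nine distinct register groups.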
For ease of understanding, the process by which the computing device in the embodiments of the application performs convolution operations is described in detail below with reference to Fig. 8. Fig. 8 is a specific example of the convolution operation in another embodiment of the application. Assume that the convolution kernel in Fig. 8 is a 3*3 convolution kernel, and that the computing device shown in Fig. 3 or Fig. 4 and the data arrangement in Fig. 7 are used. The convolution operation proceeds as follows:
S801, the weight data corresponding to the convolution kernel are loaded from the memory 330 into the multiple register groups 320, with the 3*3 weight data stored in different register groups 320.
S802, the multiple data corresponding to the multiple convolution operations are loaded into the multiple register groups 320, arranged according to the rule shown in Fig. 7.
S803, the weight data corresponding to the 3*3 arithmetic elements 310 are read from the multiple register groups 320 and stored in the first register 3101 of each arithmetic element 310. Because the weight data are arranged in order, they can be read without conflict when loaded into the arithmetic elements 310.
S804, in the first convolution operation, the first 3 data of the first 3 rows of the data to be convolved are read into the multiple arithmetic elements 310, 3*3 data in total. Owing to the above arrangement rule for the data to be convolved, these 3*3 data are read from different register groups 320 among the 3*3 register groups 320, so the data required for the first convolution operation can be read in one clock cycle. The 3*3 arithmetic elements 310 each perform the multiplication operation corresponding to one of the 3*3 weight data and output the results.
S805, each arithmetic element 310 sends its output to the adder tree unit 316 (accumulation unit) to obtain the result of the first convolution operation, and the result may be written back to the register file. The register file may include the above multiple register groups 320 and may also include other register groups. When the result is written back to the register file, its storage location does not conflict with the locations of the previously stored data to be convolved.
S806, before the second convolution operation is performed, the third column of arithmetic elements 310 reads the 4th datum of each of the first 3 rows of the multiple data from the multiple register groups 320.
S807, the third column of arithmetic elements 310 transfers the data it processed in the first convolution operation to the second column of arithmetic elements 310, and the second column of arithmetic elements 310 transfers the data it processed in the first convolution operation to the first column of arithmetic elements 310. Each arithmetic element 310 then performs the multiplication of the second convolution operation.
S808, each arithmetic element 310 sends its output to the adder tree unit 316 (accumulation unit) to obtain the result of the second convolution operation, and the result may be written back to the multiple register groups 320. Subsequent convolution operations repeat the above steps.
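The steps above can be sketched in software for one row of output positions. The sketch assumes a stride of 1 and an input with at least K columns, models the second registers of the arithmetic elements as a K*K list, and counts how many data are fetched from the register groups; the function name `conv_row_by_sliding` and the counter are hypothetical.

```python
def conv_row_by_sliding(data, weights):
    """Convolve the first K rows of `data` with a K*K kernel, sliding right by 1.

    Returns (outputs, loads), where loads is the number of data fetched from
    the register groups rather than passed between arithmetic elements.
    """
    K = len(weights)
    N = len(data[0])
    # S804: every arithmetic element (p, q) loads its datum for the first operation.
    regs = [[data[p][q] for q in range(K)] for p in range(K)]
    loads = K * K
    outputs = []
    col = K  # next input column to fetch
    while True:
        # Multiplications in the arithmetic elements, then accumulation (S805/S808).
        outputs.append(sum(weights[p][q] * regs[p][q]
                           for p in range(K) for q in range(K)))
        if col >= N:
            break
        for p in range(K):
            # S807: data slide one column to the left between neighbours;
            # S806: the last column fetches one new datum from the register groups.
            regs[p] = regs[p][1:] + [data[p][col]]
        loads += K
        col += 1
    return outputs, loads
```

Each step after the first fetches only K new data instead of K*K, which is the saving the sliding connection is meant to provide.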
The method embodiments of the application are described below. The method embodiments correspond to the device embodiments, so for parts not described in detail, reference may be made to the foregoing device embodiments.
Fig. 9 is a schematic flow chart of the calculation method of an embodiment of the application. The computing device to which the calculation method is applied includes multiple arithmetic elements, which are used to perform multiple convolution operations on multiple data, and the multiple arithmetic elements include a first arithmetic element and a second arithmetic element.
The method includes:
S901, in the i-th convolution operation of the multiple convolution operations, the first arithmetic element multiplies first weight data by first data of the multiple data, where i is an integer greater than 0;
S902, the first arithmetic element receives second data sent by the second arithmetic element, where the second data are the data on which the second arithmetic element performs the convolution operation in the i-th convolution operation;
S903, in the (i+1)-th convolution operation of the multiple convolution operations, the first arithmetic element multiplies the first weight data by the second data.
Optionally, the computing device in Fig. 9 further includes multiple register groups for storing the multiple data, and the calculation method in Fig. 9 further includes: controlling the multiple arithmetic elements to obtain, from the multiple register groups of the computing device, the multiple data required for the multiple convolution operations.
Optionally, in the method of Fig. 9, each arithmetic element of the multiple arithmetic elements is used to obtain the data stored in any one of the multiple register groups.
Optionally, in the method of Fig. 9, during the i-th convolution operation, the data read by the multiple arithmetic elements from the multiple register groups are located in different register groups of the multiple register groups.
Optionally, in the method of Fig. 9, the convolution kernel corresponding to the convolution operation has a size of K*K, the multiple register groups include K*K register groups, the multiple data include M rows and N columns of data, and the distribution of the multiple data in the K*K register groups satisfies the following condition: the data in row x, column s of the multiple data are stored in the r-th register group, and the data in row x+a, column s of the multiple data are stored in the [r+(a*K)] mod (K*K)-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M ≥ x ≥ 1, N ≥ s ≥ 1, K*K ≥ r ≥ 1, and M, N, x, s, r and a are positive integers.
The embodiments of the application provide a computing device that supports sliding of the data to be convolved between arithmetic elements, so that arithmetic elements can read data to be convolved from neighbouring arithmetic elements. This avoids reading large amounts of repeated data from the multiple register groups, reduces the amount of data exchanged between the arithmetic elements and the multiple register groups, reduces data access conflicts, and improves the computing performance of the architecture. Further, the multiple arithmetic elements and the multiple register groups may be connected by a fully connected switching network, so that any arithmetic element can access any register group. The multiple register groups therefore only need to read in and store the data to be convolved once, instead of repeatedly reading from the memory and storing the large amount of data that is repeated across multiple convolution operations. This saves storage space in the multiple register groups, relieves the pressure on the access bandwidth between the multiple register groups and the memory, and improves the parallelism of data access. In the embodiments of the application, a flexible data arrangement may be used in the multiple register groups, so that as much data as possible can be read in one clock cycle, data read conflicts are avoided, and a non-blocking convolution process is achieved.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the application.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the application may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
The above are only specific embodiments of the application, but the protection scope of the application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the application shall fall within the protection scope of the application. Therefore, the protection scope of the application shall be subject to the protection scope of the claims.
Claims (14)
1. A computing device, characterized by comprising:
multiple arithmetic elements for performing multiple convolution operations on multiple data, the multiple arithmetic elements including a first arithmetic element and a second arithmetic element;
wherein the first arithmetic element is used to:
in the i-th convolution operation of the multiple convolution operations, multiply first weight data by first data of the multiple data, where i is an integer greater than 0;
receive second data sent by the second arithmetic element, where the second data are the data on which the second arithmetic element performs the convolution operation in the i-th convolution operation; and
in the (i+1)-th convolution operation of the multiple convolution operations, multiply the first weight data by the second data.
2. The device according to claim 1, characterized by further comprising:
multiple register groups for storing the multiple data;
wherein the multiple arithmetic elements are further used to obtain, from the multiple register groups, the multiple data required for the multiple convolution operations, and each arithmetic element of the multiple arithmetic elements is able to obtain the data stored in any one of the multiple register groups.
3. The device according to claim 2, characterized in that:
the first arithmetic element is specifically used to obtain, in the i-th convolution operation, the first data from a first register group of the multiple register groups; and
the second arithmetic element is specifically used to obtain, in the i-th convolution operation, the second data from a second register group of the multiple register groups, where the first register group and the second register group are different register groups.
4. The device according to claim 2 or 3, characterized in that:
during the i-th convolution operation, the data read by the multiple arithmetic elements from the multiple register groups are located in different register groups of the multiple register groups.
5. The device according to claim 4, characterized in that the convolution kernel corresponding to the convolution operation has a size of K*K, the multiple register groups include K*K register groups, the multiple data include M rows and N columns of data, and the distribution of the multiple data in the K*K register groups satisfies the following condition:
the data in row x, column s of the multiple data are stored in the b-th register group, and the data in row x+a, column s of the multiple data are stored in the [b+(a*K)] mod (K*K)-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M ≥ x ≥ 1, N ≥ s ≥ 1, K*K ≥ b ≥ 1, and M, N, x, s, b and a are positive integers.
6. A calculation method, characterized in that a computing device to which the calculation method is applied includes multiple arithmetic elements, the multiple arithmetic elements are used to perform multiple convolution operations on multiple data, and the multiple arithmetic elements include a first arithmetic element and a second arithmetic element;
the method comprising:
in the i-th convolution operation of the multiple convolution operations, multiplying, by the first arithmetic element, first weight data by first data of the multiple data, where i is an integer greater than 0;
receiving, by the first arithmetic element, second data sent by the second arithmetic element, where the second data are the data on which the second arithmetic element performs the convolution operation in the i-th convolution operation; and
in the (i+1)-th convolution operation of the multiple convolution operations, multiplying, by the first arithmetic element, the first weight data by the second data.
7. The calculation method according to claim 6, characterized in that the computing device further includes multiple register groups for storing the multiple data, where each arithmetic element of the multiple arithmetic elements is able to obtain the data stored in any one of the multiple register groups.
8. The calculation method according to claim 7, characterized in that the method further comprises:
in the i-th convolution operation, obtaining, by the first arithmetic element, the first data from a first register group of the multiple register groups; and
in the i-th convolution operation, obtaining, by the second arithmetic element, the second data from a second register group of the multiple register groups, where the first register group and the second register group are different register groups.
9. The calculation method according to claim 7 or 8, characterized in that, during the i-th convolution operation, the data read by the multiple arithmetic elements from the multiple register groups are located in different register groups of the multiple register groups.
10. The calculation method according to claim 9, characterized in that the convolution kernel corresponding to the convolution operation has a size of K*K, the multiple register groups include K*K register groups, the multiple data include M rows and N columns of data, and the distribution of the multiple data in the K*K register groups satisfies the following condition:
the data in row x, column s of the multiple data are stored in the r-th register group, and the data in row x+a, column s of the multiple data are stored in the [r+(a*K)] mod (K*K)-th register group, where K is an integer not less than 1, mod denotes the remainder operation, M ≥ x ≥ 1, N ≥ s ≥ 1, K*K ≥ r ≥ 1, and M, N, x, s, r and a are positive integers.
11. A computing device, characterized by including multiple arithmetic elements and multiple register groups, wherein
each arithmetic element is connected to each of the multiple register groups;
The multiple register groups are used to store multiple data on which multiple convolution operations are to be performed;
The multiple arithmetic elements are used to obtain the multiple data from the multiple register groups and to perform
the multiple convolution operations on the multiple data, wherein each arithmetic element of the multiple arithmetic
elements is used to obtain the data stored in any one register group of the multiple register groups.
12. The device as claimed in claim 11, characterized in that, during the i-th convolution operation of the multiple
convolution operations, the data that the multiple arithmetic elements read from the multiple register groups are located
in different register groups of the multiple register groups, wherein i is an integer greater than 0.
13. The device as claimed in claim 11 or 12, characterized in that the convolution kernel size corresponding to the
convolution operation is K*K, the multiple register groups include K*K register groups, the multiple data include M rows
and N columns of data, and the distribution of the multiple data in the K*K register groups meets the following conditions:
The data in row x, column s of the multiple data is stored in the b-th register group, and the data in row x+a,
column s of the multiple data is stored in the ([b + (a*K)] mod (K*K))-th register group, wherein K is an integer not
less than 1, mod denotes the remainder operation, M >= x >= 1, N >= s >= 1, K*K >= b >= 1, and M, N, x, s, b, a are positive integers.
14. The device as claimed in any one of claims 11 to 13, characterized in that the multiple arithmetic elements include a
first arithmetic element and a second arithmetic element,
The first arithmetic element is used to:
In the i-th convolution operation of the multiple convolution operations, perform a multiplication operation on first
weight data and first data of the multiple data, wherein i is an integer greater than 0;
Receive second data sent by the second arithmetic element, wherein the second data is the data on which the second
arithmetic element performed the convolution operation in the i-th convolution operation;
In the (i+1)-th convolution operation of the multiple convolution operations, perform a multiplication operation on the
first weight data and the second data.
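The placement rule recited in claims 10 and 13 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the function name `register_group_for_row_offset` is hypothetical, and because the claims number register groups from 1 to K*K while the remainder operation can yield 0, the sketch assumes that a remainder of 0 denotes group K*K.

```python
def register_group_for_row_offset(r: int, a: int, k: int) -> int:
    """Return the 1-based register group of the element a rows below an
    element of the same column that is stored in register group r."""
    if k < 1:
        raise ValueError("K must be an integer not less than 1")
    groups = k * k
    if not 1 <= r <= groups:
        raise ValueError("r must satisfy 1 <= r <= K*K")
    if a < 1:
        raise ValueError("a must be a positive integer")
    # The claimed expression: [r + (a*K)] mod (K*K).
    index = (r + a * k) % groups
    # Assumption (not stated in the claims): a remainder of 0 is read as
    # the last group, K*K, so the result stays in the range 1..K*K.
    return index if index != 0 else groups


# Hypothetical example: K = 3, an element of some column sits in group 1.
K = 3
column_groups = [1] + [register_group_for_row_offset(1, a, K) for a in (1, 2)]
print(column_groups)  # prints [1, 4, 7]
```

In this example the three vertically adjacent elements of one column land in three different register groups, which is consistent with the requirement in claims 9 and 12 that data read during one convolution operation sit in different register groups.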
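The data hand-off between arithmetic elements recited in claims 6 and 14 can be sketched in Python as follows. This is a minimal software model under assumptions, not the hardware described in the patent: the class `ArithmeticElement`, its methods, and all numeric values are hypothetical, and the link between elements is modeled as a plain method call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArithmeticElement:
    """Hypothetical model of one arithmetic element that keeps its weight."""
    weight: float                    # weight data reused across operations
    current: Optional[float] = None  # data used in the current operation

    def load(self, value: float) -> None:
        """Accept a data value, e.g. one read from a register group."""
        self.current = value

    def multiply(self) -> float:
        """Multiply the held weight by the currently loaded data."""
        if self.current is None:
            raise RuntimeError("no data loaded")
        return self.weight * self.current

    def send_to(self, other: "ArithmeticElement") -> None:
        """Forward the data this element just used to another element."""
        if self.current is None:
            raise RuntimeError("no data to send")
        other.load(self.current)


first = ArithmeticElement(weight=2.0)   # holds the first weight data
second = ArithmeticElement(weight=3.0)

# i-th convolution operation: each element multiplies its own data.
first.load(5.0)    # first data (illustrative value)
second.load(7.0)   # second data (illustrative value)
product_i = first.multiply()

# The second element sends the data it used in the i-th operation.
second.send_to(first)

# (i+1)-th convolution operation: same weight, received second data.
product_next = first.multiply()
print(product_i, product_next)  # prints 10.0 14.0
```

The point of the sketch is that the first element reuses its weight and takes its next operand from a neighbouring element, so in this model it needs no new register group read for the (i+1)-th operation.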
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810376471.0A CN110399976B (en) | 2018-04-25 | 2018-04-25 | Computing device and computing method |
PCT/CN2019/084011 WO2019206162A1 (en) | 2018-04-25 | 2019-04-24 | Computing device and computing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810376471.0A CN110399976B (en) | 2018-04-25 | 2018-04-25 | Computing device and computing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110399976A (en) | 2019-11-01 |
CN110399976B (en) | 2022-04-05 |
Family ID: 68293769
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810376471.0A Active CN110399976B (en) | 2018-04-25 | 2018-04-25 | Computing device and computing method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110399976B (en) |
WO (1) | WO2019206162A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113536220A (en) * | 2020-04-21 | 2021-10-22 | 中科寒武纪科技股份有限公司 | Operation method, processor and related product |
CN113807506A (en) * | 2020-06-11 | 2021-12-17 | 杭州知存智能科技有限公司 | Data loading circuit and method |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1261966A (en) * | 1997-06-30 | 2000-08-02 | 博普斯公司 | Manifold array processor |
CN101237240A (en) * | 2008-02-26 | 2008-08-06 | 北京海尔集成电路设计有限公司 | Method and device for implementing convolutional interleaving/de-interleaving based on external memory |
US20080288727A1 (en) * | 2007-05-14 | 2008-11-20 | International Business Machines Corporation | Computing System with Optimized Support for Transactional Memory |
CN101841730A (en) * | 2010-05-28 | 2010-09-22 | 浙江大学 | Real-time stereoscopic vision implementation method based on FPGA |
CN103092767A (en) * | 2013-01-25 | 2013-05-08 | 浪潮电子信息产业股份有限公司 | Management method for cloud computing interior physical machine information memory pool |
CN103970505A (en) * | 2013-01-24 | 2014-08-06 | 想象力科技有限公司 | Register file having a plurality of sub-register files |
CN104035750A (en) * | 2014-06-11 | 2014-09-10 | 西安电子科技大学 | Field programmable gate array (FPGA)-based real-time template convolution implementing method |
CN104932994A (en) * | 2015-06-17 | 2015-09-23 | 青岛海信信芯科技有限公司 | Data processing method and device |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | Deep convolutional neural network implementation method based on FPGA |
US20170102945A1 (en) * | 2015-10-08 | 2017-04-13 | Via Alliance Semiconductor Co., Ltd. | Direct execution by an execution unit of a micro-operation loaded into an architectural register file by an architectural instruction of a processor |
CN106844294A (en) * | 2016-12-29 | 2017-06-13 | 华为机器有限公司 | Convolution algorithm chip and communication equipment |
CN106951395A (en) * | 2017-02-13 | 2017-07-14 | 上海客鹭信息技术有限公司 | Parallel convolution operation method and device for compressed convolutional neural networks |
CN107832843A (en) * | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | Information processing method and related product |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679620B (en) * | 2017-04-19 | 2020-05-26 | 赛灵思公司 | Artificial neural network processing device |
CN107844826B (en) * | 2017-10-30 | 2020-07-31 | 中国科学院计算技术研究所 | Neural network processing unit and processing system comprising same |
2018-04-25: CN application CN201810376471.0A, patent CN110399976B (en), status Active
2019-04-24: WO application PCT/CN2019/084011, publication WO2019206162A1 (en), status Application Filing
Non-Patent Citations (5)
Title |
---|
MIKA LAIHO ET AL: "A mixed-mode polynomial-type CNN for analysing brain electrical activity in epilepsy", 《INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATION》 * |
宋鹏 主编: "《信息论与编码》", 31 January 2018, 《西安电子科技大学出版社》 * |
文良勇: "多线程环境下寄存器文件的设计与优化", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
方睿 等: "卷积神经网络的FPGA并行加速方案设计", 《计算机工程与应用》 * |
郑金彬: "基于汇编指令译码映射关系的软件代码安全分析", 《龙岩学院学报》 * |
Also Published As
Publication number | Publication date |
---|---|
WO2019206162A1 (en) | 2019-10-31 |
CN110399976B (en) | 2022-04-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
KR102492477B1 (en) | Matrix multiplier | |
TWI803663B (en) | A computing device and computing method | |
CN109034373B (en) | Parallel processor and processing method of convolutional neural network | |
JP7245338B2 (en) | neural network processor | |
EP3557484A1 (en) | Neural network convolution operation device and method | |
US10768894B2 (en) | Processor, information processing apparatus and operation method for processor | |
CN108182471A (en) | Convolutional neural network inference accelerator and method | |
CN108510064A (en) | Processing system and method for artificial neural network including multiple core processing modules | |
KR102162749B1 (en) | Neural network processor | |
CN110188869B (en) | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm | |
CN110766128A (en) | Convolution calculation unit, calculation method and neural network calculation platform | |
CN110580519B (en) | Convolution operation device and method thereof | |
EP3844610B1 (en) | Method and system for performing parallel computation | |
CN110399976A (en) | Computing device and calculation method | |
KR20220015813A (en) | Method and apparatus for performing deep learning operations. | |
CN113344172A (en) | Mapping convolutions to channel convolution engines | |
CN110377874B (en) | Convolution operation method and system | |
CN116306840A (en) | Neural network operation method, device, chip, electronic equipment and storage medium | |
CN110399977A (en) | Pooling arithmetic unit | |
CN110414672B (en) | Convolution operation method, device and system | |
Meng et al. | Ppoaccel: A high-throughput acceleration framework for proximal policy optimization | |
JP6906622B2 (en) | Arithmetic circuit and arithmetic method | |
JP6432348B2 (en) | Arithmetic apparatus and arithmetic method | |
US20180349061A1 (en) | Operation processing apparatus, information processing apparatus, and method of controlling operation processing apparatus | |
CN102231624A (en) | Vector processor-oriented floating point complex number block finite impulse response (FIR) vectorization realization method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |