CN108875917A - Control method and device for a convolutional neural network processor - Google Patents
Control method and device for a convolutional neural network processor
- Publication number: CN108875917A (application CN201810685538.9A)
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/06 — Computing arrangements based on biological models; Neural networks; Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
Abstract
The present invention provides a control method, comprising: 1) determining the size n*n of the convolution operation to be performed; 2) according to the size n*n of the convolution operation to be performed, loading the values of the corresponding convolution kernel into m² selected 3*3 convolution compute units and filling every remaining position with 0, where 3m >= n; 3) determining the number of cycles required for the convolution process according to the size of the convolution operation to be performed and the size of the input feature map to be convolved; 4) in each cycle of the convolution process, loading the corresponding values of the input feature map into the m² 3*3 convolution compute units, the distribution of the input feature map values across the m² 3*3 units matching the distribution of the convolution kernel values across those units; and controlling the m² 3*3 convolution compute units, loaded with the values of the convolution kernel and the input feature map, to perform the convolution computations corresponding to the number of cycles.
Description
Technical field
The present invention relates to convolutional neural network processors, and in particular to improvements in hardware acceleration for convolutional neural network processors.
Background art
Artificial intelligence technology has developed rapidly in recent years and has attracted wide attention worldwide; both industry and academia have carried out research on it, and it has penetrated fields such as visual perception, speech recognition, assisted driving, smart homes and traffic scheduling. Deep learning is the booster of this development. Deep learning uses the topology of deep neural networks for training, optimization, inference and the like, with networks such as convolutional neural networks, deep belief networks and recurrent neural networks trained through repeated iteration. Taking image recognition as an example, a deep learning algorithm can automatically derive the hidden features of an image through a deep neural network and produce results better than traditional pattern-recognition-based analysis methods.
However, existing deep learning techniques depend on an enormous amount of computation. In the training stage, the weights of the neural network must be obtained by repeated iterative computation over massive data; in the inference stage, the neural network must complete its processing of the input data within an extremely short response time (usually milliseconds). This requires the deployed neural network computing circuits (including CPUs, GPUs, FPGAs, ASICs and the like) to reach tens of billions or even trillions of operations per second. Hardware acceleration of deep learning, for example of convolutional neural network processors, is therefore very necessary.
It is generally accepted that hardware acceleration can be achieved in roughly two ways: one is to use larger-scale hardware to perform computations in parallel; the other is to improve processing speed or efficiency through specially designed hardware circuits.
For the second approach, some prior art maps the neural network directly onto hardware circuits, with a different computing unit for each network layer, so that the computation of the layers proceeds in a pipelined fashion. Each computing unit except the first takes the output of the previous unit as its input, and each unit only executes the computation of its own network layer; in different time slots of the pipeline, a given unit computes different inputs of that layer. Such prior art is generally aimed at scenarios where a stream of different inputs must be processed continuously, for example a video file containing many frames, and it is usually aimed at neural networks with few layers. This is because the number of layers in a deep neural network is large: mapping the network directly onto hardware incurs a very large circuit area, and power consumption grows with circuit area. In addition, since the running times of the layers differ considerably, the time allotted to each pipeline stage must be forced to be equal, namely equal to the running time of the slowest stage. For a deep neural network with many layers, designing such a pipeline requires considering very many factors in order to reduce the waiting time of the faster stages during pipelined computation.
Other prior art, exploiting the regularity of neural network computation, proposes "time-division multiplexing" of the computing units in a neural network processor to improve their reusability. Unlike the pipelined approach above, the same computing unit is used to compute each network layer in turn: the input layer, the first hidden layer, the second hidden layer, ... and the output layer are computed one by one, and the process is repeated in the next iteration. Such prior art suits neural networks with few layers as well as deep neural networks, and is particularly suitable for application scenarios with limited hardware resources. In such scenarios, after the neural network processor has computed layer A for one input, it may not need to compute layer A again for a long time; if each layer used its own dedicated hardware as its computing unit, the hardware would be constrained and poorly reused. Most prior art is based on this consideration and correspondingly improves the hardware of the neural network processor through various "time-division multiplexing" schemes for the computing units.
However, no matter which of the above prior arts is used to design a convolutional neural network processor, the hardware utilization still leaves room for improvement.
Summary of the invention
Therefore, an object of the present invention is to overcome the above defects of the prior art and to provide a control method for a convolutional neural network processor, the processor having 3*3 convolution compute units, the control method comprising:
1) determining the kernel size n*n of the convolution operation to be performed;
2) according to the kernel size n*n of the convolution operation to be performed, loading the values of the corresponding convolution kernel into m² selected 3*3 convolution compute units, and filling every remaining position with 0, where 3m >= n;
3) determining the number of cycles required for the convolution process according to the size of the convolution operation to be performed and the size of the input feature map to be convolved; and
4) according to that number of cycles, in each cycle of the convolution process, loading the corresponding values of the input feature map into the m² 3*3 convolution compute units, the distribution of the input feature map values across the m² units matching the distribution of the convolution kernel values across those units;
controlling the m² 3*3 convolution compute units, loaded with the values of the convolution kernel and the input feature map, to perform the convolution computations corresponding to the number of cycles;
5) accumulating the corresponding elements of the convolution results of the m² 3*3 convolution compute units, so as to obtain the final output feature map of the convolution operation.
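The kernel-loading rule of step 2) can be sketched as follows. This is an illustrative Python/NumPy sketch, not part of the patent; the function name is my own, and it models the zero fill by padding the n*n kernel to 3m*3m and splitting it into m² tiles of 3*3, one per compute unit.

```python
import numpy as np

def load_kernel_tiles(kernel, m):
    """Zero-pad an n*n kernel to 3m*3m and split it into m*m tiles of 3*3,
    mirroring step 2): each 3*3 compute unit receives one tile, with every
    unused position filled with 0."""
    n = kernel.shape[0]
    assert 3 * m >= n, "step 2) requires 3m >= n"
    padded = np.zeros((3 * m, 3 * m), dtype=kernel.dtype)
    padded[:n, :n] = kernel
    return [padded[3*i:3*i+3, 3*j:3*j+3] for i in range(m) for j in range(m)]

k5 = np.arange(25).reshape(5, 5)      # an example 5*5 kernel
tiles = load_kernel_tiles(k5, m=2)    # four 3*3 tiles, as in the 5*5 embodiment
assert len(tiles) == 4 and all(t.shape == (3, 3) for t in tiles)
```

For n = 5, m = 2 this reproduces the distribution described later for Fig. 3a: the upper-left tile carries 9 kernel values, the upper-right and lower-left carry 6 each, and the lower-right carries 4, with the rest zero.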
Preferably, according to the method, step 2) comprises:
if the size of the convolution operation to be performed is smaller than or equal to 3*3, loading the values of the corresponding convolution kernel into a single 3*3 convolution compute unit and filling every remaining position with 0;
if the size of the convolution operation to be performed is larger than 3*3, loading the values of the corresponding convolution kernel into a corresponding number of 3*3 convolution compute units and filling every remaining position with 0.
Preferably, according to the method, step 4) comprises:
in each cycle of the convolution process, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, loading in one pass the elements of the input feature map that match the size of the convolution operation to be performed into the corresponding positions of the compute units and filling the values of the remaining positions with 0; otherwise, shifting the elements shared with the previous cycle one position to the left as a whole, and loading the elements of the input feature map that differ from the previous cycle and need updating into the positions freed by the shift.
Preferably, according to the method, step 4) comprises:
in each cycle of the convolution process, controlling each 3*3 convolution compute unit to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the products, so as to obtain the element at the corresponding position of the output feature map.
Preferably, according to the method, step 2) comprises:
if the size of the convolution operation to be performed is 5*5, loading the values of the 5*5 convolution kernel into four 3*3 convolution compute units and filling every remaining position with 0;
and step 4) comprises:
in each of the cycles of the convolution process, loading the corresponding values of the input feature map into the four 3*3 convolution compute units, the distribution of the input feature map values across the four units matching the distribution of the 5*5 kernel values across the four units;
wherein, in each cycle of the convolution process, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, the 25 elements of a 5*5 block of the input feature map are loaded in one pass into the corresponding positions of the four 3*3 compute units and the values of the remaining positions are filled with 0; otherwise, the elements shared with the previous cycle are shifted one position to the left as a whole, and the elements of the input feature map that differ from the previous cycle and need updating are loaded into the positions freed by the shift.
Preferably, according to the method, step 4) comprises:
in each cycle of the convolution process, controlling each of the four 3*3 convolution compute units to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the products,
and step 5) comprises: accumulating the results computed by all four 3*3 convolution compute units, so as to obtain the element at the corresponding position of the output feature map.
Preferably, according to the method, step 2) comprises:
if the size of the convolution operation to be performed is 7*7, loading the values of the 7*7 convolution kernel into nine 3*3 convolution compute units and filling every remaining position with 0;
and step 4) comprises:
in each of the cycles of the convolution process, loading the corresponding values of the input feature map into the nine 3*3 convolution compute units, the distribution of the input feature map values across the nine units matching the distribution of the 7*7 kernel values across the nine units;
wherein, in each cycle of the convolution process, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, the 49 elements of a 7*7 block of the input feature map are loaded in one pass into the corresponding positions of the nine 3*3 compute units and the values of the remaining positions are filled with 0; otherwise, the elements shared with the previous cycle are shifted one position to the left as a whole, and the elements of the input feature map that differ from the previous cycle and need updating are loaded into the positions freed by the shift.
Preferably, according to the method, step 4) comprises:
in each cycle of the convolution process, controlling each of the nine 3*3 convolution compute units to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the products,
and step 5) comprises: accumulating the results computed by all nine 3*3 convolution compute units, so as to obtain the element at the corresponding position of the output feature map.
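The full 7*7 embodiment (nine 3*3 units, steps 2, 4 and 5 combined) can be checked against direct convolution. This is an illustrative Python/NumPy sketch under the assumption of stride 1 and no padding; the function name is my own.

```python
import numpy as np

def tiled_conv(x, kernel, m):
    """Convolution via the m*m-unit decomposition: for each output position
    (one per cycle) every 3*3 unit multiply-accumulates its tile of the
    zero-padded window and kernel, and the per-unit results are summed."""
    n = kernel.shape[0]
    side = 3 * m
    pk = np.zeros((side, side)); pk[:n, :n] = kernel     # step 2) loading
    h = x.shape[0] - n + 1; w = x.shape[1] - n + 1
    out = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            pw = np.zeros((side, side)); pw[:n, :n] = x[r:r+n, c:c+n]
            for i in range(m):                           # steps 4) and 5)
                for j in range(m):
                    out[r, c] += np.sum(pw[3*i:3*i+3, 3*j:3*j+3] *
                                        pk[3*i:3*i+3, 3*j:3*j+3])
    return out

x = np.random.randn(9, 9); k = np.random.randn(7, 7)
ref = np.array([[np.sum(x[r:r+7, c:c+7] * k) for c in range(3)] for r in range(3)])
assert np.allclose(tiled_conv(x, k, m=3), ref)  # nine 3*3 units reproduce 7*7 conv
```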
Also provided is a control unit for implementing any of the control methods described above.
Also provided is a convolutional neural network processor, comprising 3*3 convolution compute units and a control unit, the control unit being configured to implement any of the methods described above.
Compared with the prior art, the advantages of the present invention are as follows:
The reusability of the computing units that perform convolution is improved, reducing the number of hardware computing units that must be placed in a convolutional neural network processor. The processor no longer needs to provide large numbers of hardware computing units of different sizes for convolutional layers that use kernels of different sizes: when executing the computation of one convolutional layer, computing units whose size does not match that layer's kernel can still be used for the computation, thereby improving the utilization of the hardware computing units in the convolutional neural network processor.
Brief description of the drawings
Embodiments of the present invention are further illustrated with reference to the drawings, in which:
Fig. 1 is a schematic diagram of the prior art in which M kinds of convolution kernels, each with N channels, are used to convolve an input layer to obtain output layers;
Fig. 2 is a schematic diagram of a 3*3 convolution operation realized with one 3*3 computing unit, both in the prior art and in the present invention;
Fig. 3a is a schematic diagram of the input feature map being loaded by the computing units when a 5*5 convolution operation is carried out with four 3*3 computing units according to one embodiment of the present invention;
Fig. 3b is a schematic diagram of a 5*5 convolution operation realized with four 3*3 computing units according to one embodiment of the present invention;
Fig. 4 is a schematic diagram of the input feature map being loaded by the computing unit when a 3*3 convolution operation is carried out with a 3*3 computing unit according to one embodiment of the present invention;
Fig. 5 is a schematic diagram of a 7*7 convolution operation realized with nine 3*3 computing units according to one embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and specific embodiments.
While studying the prior art, the inventors found that the various existing classical neural networks, such as AlexNet, GoogleNet, VGG and ResNet, contain different numbers of convolutional layers, and different convolutional layers use convolution kernels of different sizes. Taking AlexNet as an example, the first layer of the network is a convolutional layer with an 11*11 kernel, the second layer is a convolutional layer with a 5*5 kernel, and another layer of the network is a convolutional layer with a 3*3 kernel, and so on.
However, in the various existing neural network processors, different computing units are provided for kernels of different sizes. As a result, when the computation of a given convolutional layer is being executed, the other computing units, whose sizes do not match that layer's kernel, sit idle.
For example, as shown in Fig. 1, a neural network processor may provide M different convolution kernels, denoted kernel 0 to kernel M-1, each with N channels used to convolve the N channels of an input layer; convolving an input layer with one kernel yields one output layer. For one input layer, using all M kernels yields the output layers corresponding to kernel 0 through kernel M-1. If some input layer needs to perform the convolution operation using kernel 1, then at that moment the computing units other than the one corresponding to kernel 1 are idle.
In view of this, the present invention proposes a multiplexing scheme for the computing units: by means of control, the data actually loaded into a computing unit is adjusted (for one and the same computing unit, both the values of the convolution kernel and the values of the input feature map must be loaded), so that convolution operations of various sizes are realized with 3*3-scale computing units, thereby reducing the scale of the hardware computing units that a convolution operation must use.
The neural network processor architecture to which the present invention relates may comprise the following five parts: an input data storage unit, a control unit, an output data storage unit, a weight storage unit and a computing unit.
The input data storage unit stores the data participating in the computation; the output data storage unit stores the computed neuron responses; the weight storage unit stores the trained neural network weights.
The control unit is connected to the output data storage unit, the weight storage unit and the computing unit respectively; according to the control signals obtained by parsing, the control unit directs the computing unit to carry out the neural network computation.
The computing unit executes the corresponding neural network computation according to the control signals generated by the control unit. The computing unit completes most of the operations in a neural network algorithm, namely vector multiply-add operations and the like.
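The core multiply-add operation of one 3*3 computing unit can be sketched in a few lines. This is an illustrative Python sketch (function name my own); the test value reuses the worked example from Fig. 2 below, where the only nonzero products are 3 × 2 and 2 × (-8).

```python
def mac3x3(a, b):
    """One 3*3 compute unit as described: nine element-wise multiplies
    followed by an accumulate, producing one partial/output value."""
    acc = 0
    for i in range(3):
        for j in range(3):
            acc += a[i][j] * b[i][j]
    return acc

# Sparse window and kernel whose nonzero entries match the Fig. 2 example:
window = [[3, 0, 0], [0, 0, 0], [0, 2, 0]]
kernel = [[2, 0, 0], [0, 0, 0], [0, -8, 0]]
assert mac3x3(window, kernel) == -10   # (3 x 2) + (2 x (-8)) = -10
```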
The multiplexing of the computing units according to the present invention can be controlled and realized by the above control unit, and will be introduced through several embodiments below. First, we describe how the traditional prior art realizes a 3*3 convolution operation with a 3*3 computing unit. Referring to the example given in Fig. 2, in the prior art a 3*3-scale computing unit realizes the convolution operation as follows:
In cycle 1, each element of rows 1-3, columns 1-3 of the input feature map (referred to here as the sliding window over the input feature map) is multiplied by the element at the corresponding position of the convolution kernel, and the products are accumulated as the element at row 1, column 1 of the output feature map, i.e. (3 × 2) + (2 × (-8)) = -10.
In cycle 2, each element of rows 1-3, columns 2-4 of the input feature map (the values in the sliding window for the current cycle) is multiplied by the element at the corresponding position of the convolution kernel, and the products are accumulated as the element at row 1, column 2 of the output feature map (not shown in Fig. 2).
And so on: by moving the 3*3 sliding window to the right or downward a total of 24 times, an output feature map of size 5*5 is obtained.
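The prior-art flow just described is ordinary valid (no padding, stride 1) sliding-window convolution. A minimal Python/NumPy sketch, with illustrative names of my own choosing:

```python
import numpy as np

def conv2d_valid(x, k):
    """Valid sliding-window convolution as in the Fig. 2 flow:
    one output element per cycle, window moved right/down step by step."""
    n = k.shape[0]
    h = x.shape[0] - n + 1
    w = x.shape[1] - n + 1
    out = np.empty((h, w), dtype=x.dtype)
    for r in range(h):
        for c in range(w):
            out[r, c] = np.sum(x[r:r+n, c:c+n] * k)  # multiply-accumulate
    return out

x = np.random.randint(-5, 5, size=(7, 7))
k = np.random.randint(-5, 5, size=(3, 3))
y = conv2d_valid(x, k)
assert y.shape == (5, 5)   # 7*7 input, 3*3 kernel: 24 window moves, 5*5 output
```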
The present invention does not exclude using the above approach to realize a 3*3 convolution with a 3*3 computing unit. Furthermore, in the present invention, control can also make 3*3-scale computing units realize convolution operations for kernel sizes other than 3*3, such as 5*5, 7*7 and 9*9.
As described above, in the traditional prior art a computing unit can only be used to execute convolution operations equal to its own size. The prior art gives no teaching on how to realize, say, 5*5, 7*7 or 9*9 convolutions with 3*3 computing units. On the one hand, the computing unit does not know how to load the convolution kernel and the input feature map. On the other hand, in the prior art the size of the output feature map depends on the number of moves of the sliding window; for example, when a 3*3 convolution is performed on a 7*7 input feature map, the lateral and vertical movement ranges of the sliding window are 5 units, and the computation over multiple cycles yields a 5*5 output feature map. This makes it very difficult to realize convolutions of other sizes with a 3*3 computing unit. It can be appreciated that, if the prior art is followed, convolving a 7*7 input feature map with a 3*3 computing unit can only produce an output feature map of size 5*5 (for example as shown in Fig. 2); the computing unit and the processor do not know how to move the sliding window so that, for example, a 5*5 convolution can be obtained with 3*3 computing units.
In view of this, the present invention proposes a corresponding control method that realizes a 5*5 convolution operation with four 3*3 computing units by scheduling the input feature map and convolution kernel loaded into the computing units and controlling the execution of the multiply and add operations.
According to one embodiment of the present invention, referring to Fig. 3a, four 3*3 computing units, shown separated by dashed lines, can jointly be loaded with the values of the 5*5 convolution kernel and of the input feature map. For example, in Fig. 3a, the computing unit at the upper left is loaded with 9 values, the one at the upper right with 6 values, the one at the lower left with 6 values, and the one at the lower right with 4 values.
Fig. 3b shows the specific control method corresponding to Fig. 3a, as follows:
The size of the input feature map is 7*7 and the size of the convolution operation to be performed is 5*5, so it can be determined that the convolution computation needs 3 × 3 = 9 cycles in total.
Since 5 > 3, the convolution of size 5*5 must be completed jointly by more than one 3*3 computing unit. Here one can choose k 3*3 computing units just sufficient to be loaded with data of size 5*5. The choice of k here is k = m², where m can be chosen as the smallest positive integer such that 3m >= n. Of course, more 3*3 computing units than this could also be selected to execute the 5*5 convolution. For the example shown in Fig. 3b, k = 4 computing units are selected.
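The unit count and cycle count derived above can be computed directly. A small Python sketch (function name my own), assuming valid convolution with stride 1 and a square input, so the number of cycles equals the number of output elements:

```python
import math

def plan(n, input_size):
    """Unit count and cycle count for an n*n convolution on an
    input_size*input_size feature map, per the rule k = m**2 with m
    the smallest positive integer such that 3*m >= n."""
    m = math.ceil(n / 3)
    out = input_size - n + 1      # side of the valid-convolution output
    cycles = out * out            # one output element per cycle
    return m * m, cycles

assert plan(5, 7) == (4, 9)    # four 3*3 units, 3 x 3 = 9 cycles, as in Fig. 3b
assert plan(7, 9) == (9, 9)    # nine 3*3 units for a 7*7 kernel
assert plan(3, 7) == (1, 25)   # a single unit suffices for 3*3
```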
When the computing units carry out the convolution computation, control loads, in each cycle, the values of the corresponding convolution kernel and of the corresponding input feature map into the four 3*3 computing units.
In cycle 1, the elements of rows 1-3, columns 1-3 of the input feature map are loaded into the upper-left 3*3 computing unit of Fig. 3b, the elements of rows 1-3, columns 4-5 into the upper-right unit, the elements of rows 4-5, columns 1-3 into the lower-left unit, and the elements of rows 4-5, columns 4-5 into the lower-right unit, with the remaining row-6 and column-6 positions filled with "0". The 5*5 convolution kernel is loaded into the four 3*3 computing units in a similar way: the elements of rows 1-3, columns 1-3 of the kernel go into the upper-left unit, the elements of rows 1-3, columns 4-5 into the upper-right unit, the elements of rows 4-5, columns 1-3 into the lower-left unit, and the elements of rows 4-5, columns 4-5 into the lower-right unit, with the remaining row-6 and column-6 positions filled with "0". The values of the input feature map and of the convolution kernel are thus loaded into the four 3*3 computing units. Each of the four units is controlled to multiply the elements at corresponding positions of the input feature map and the kernel loaded into it and to accumulate the products, and the corresponding-position results of the four units are then accumulated, so as to obtain the element at row 1, column 1 of the output feature map. Since every element in the computing units other than the values of the original kernel is 0, the computed result is exactly the same as that obtained by directly performing the 5*5 convolution.
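The equivalence claimed in the last sentence, namely that the zero fill contributes nothing, can be checked for one sliding window. An illustrative Python/NumPy sketch of steps 4) and 5) for a single output element (function name my own):

```python
import numpy as np

def tiled_window_product(window, kernel, m):
    """Per-unit multiply-accumulate for one sliding window: zero-pad both
    the n*n window and kernel to 3m*3m, give each 3*3 unit its tile pair,
    and sum the per-unit partial results."""
    n = kernel.shape[0]
    pw = np.zeros((3*m, 3*m)); pw[:n, :n] = window
    pk = np.zeros((3*m, 3*m)); pk[:n, :n] = kernel
    total = 0.0
    for i in range(m):
        for j in range(m):
            tw = pw[3*i:3*i+3, 3*j:3*j+3]
            tk = pk[3*i:3*i+3, 3*j:3*j+3]
            total += np.sum(tw * tk)   # what one 3*3 unit computes
    return total

w = np.random.randn(5, 5); k = np.random.randn(5, 5)
# Identical to the direct 5*5 dot product: the zero fill adds only 0 terms.
assert np.isclose(tiled_window_product(w, k, m=2), np.sum(w * k))
```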
In second round, by inputted in Fig. 3 b 1-5 row in input feature vector figure, 2-6 column whole elements (i.e. " 0,0,
2,0,-3;0,3,-2,5,0;0,0,0,2,0;0,0,0,3,0;0,0,0,0,0 ") it is loaded into the computing unit using as 1-
The new element of 5 rows, 1-5 column.The element that control computing unit is directed to loaded by it executes multiplication and accumulating operation, to obtain
The element that the 1st row the 2nd arranges in characteristic pattern must be exported.
According to a preferred embodiment of the present invention, the manner in which the data of the input feature map are loaded into the above four 3*3 computing units in the second cycle can also be improved, so as to improve loading efficiency. That is, in Fig. 3b, all elements of rows 1-3, columns 2-3 in the top-left 3*3 computing unit (i.e., "0,0; 0,3; 0,0") are shifted left by one unit as a whole to serve as the new elements of rows 1-3, columns 1-2, and the elements of rows 1-3, column 3 of the input feature map (i.e., "2; -2; 0") are loaded into that computing unit as the new elements of rows 1-3, column 3. Similarly, in the top-right 3*3 computing unit in Fig. 3b, all elements of rows 1-3, column 2 (i.e., "0; 5; 2") are shifted left by one unit as a whole to serve as the new elements of rows 1-3, column 1, and the elements of rows 1-3, column 4 of the input feature map (i.e., "-3; 0; 0") are loaded into that computing unit as the new elements of rows 1-3, column 2. In the bottom-left 3*3 computing unit in Fig. 3b, all elements of rows 1-2, columns 2-3 (i.e., "0,0; 0,0") are shifted left by one unit as a whole to serve as the new elements of rows 1-2, columns 1-2, and the elements of rows 1-2, column 3 of the input feature map (i.e., "0; 0") are loaded into that computing unit as the new elements of rows 1-2, column 3. In the bottom-right 3*3 computing unit in Fig. 3b, all elements of rows 1-2, column 2 (i.e., "3; 0") are shifted left by one unit as a whole to serve as the new elements of rows 1-2, column 1, and the elements of rows 1-2, column 4 of the input feature map (i.e., "0; 0") are loaded into that computing unit as the new elements of rows 1-2, column 2. The values of the input feature map loaded in this 5*5 arrangement of computing units are thereby updated, achieving an effect similar to that of the sliding window in traditional schemes. The computing units are likewise controlled to perform multiplication and accumulation operations on the elements loaded into them, so as to obtain the element in row 1, column 2 of the output feature map.
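The shift-and-refill update above can be sketched on the logical 5*5 input buffer as a whole. This is a deliberate simplification of ours (the per-unit moves are merged into one buffer, and `shift_in_column` is our name), intended only to show why a single new column suffices per step.

```python
import numpy as np

def shift_in_column(buffer5x5, new_col):
    # Move every retained element left by one unit, as in the scheme above,
    # then fill the vacated rightmost column with the newly loaded values.
    buffer5x5[:, :-1] = buffer5x5[:, 1:]
    buffer5x5[:, -1] = new_col
    return buffer5x5

feat = np.arange(49).reshape(7, 7).astype(float)

buf = feat[0:5, 0:5].copy()              # cycle 1: all 25 values loaded at once
buf = shift_in_column(buf, feat[0:5, 5])  # cycle 2: only the new column loads

# The buffer now holds exactly the window of rows 1-5, columns 2-6.
assert np.array_equal(buf, feat[0:5, 1:6])
```

The design point is that reused values move within the units rather than being re-fetched, which is what reduces loading traffic in the second and third cycles.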
The third cycle is completed by analogy.
In the fourth cycle, the elements of rows 2-6, columns 1-5 of the input feature map are loaded into the four 3*3 computing units in a manner similar to that described above, and the computing units are controlled to perform multiplication and accumulation operations on the elements loaded into them, so as to obtain the element in row 2, column 1 of the output feature map. In the subsequent fifth and sixth cycles, the elements of the corresponding parts of the input feature map are loaded into the computing units in a manner similar to the aforementioned second and third cycles. This continues by analogy until all nine cycles are completed and the 3*3 output feature map is obtained.
It can be seen that, with the above control method, the 25 values of the 5*5 region of the input feature map are loaded into the computing units at once in the first cycle. Similarly, the fourth and seventh cycles each load 25 values of the input feature map at once. Correspondingly, in the second and third cycles, only 3 new values of the input feature map need to be loaded each time, the values reused from the previous cycle are shifted left, and the convolution kernel values loaded in the computing units are left unmodified. Similarly, the fifth and sixth cycles and the eighth and ninth cycles load the elements of the input feature map in the same manner as the second and third cycles.
It can thus be ensured that, in each cycle, the position of each element of the input feature map in the computing units corresponds one-to-one with the position of the corresponding element of the convolution kernel with which it is multiplied. Moreover, to units other than the one implementing the control method of the present invention, such as the computing units themselves or the processor, it is not apparent that the four 3*3 convolution units are in fact carrying out a 5*5 convolution operation. In addition, with the above control method, the values of the input feature map loaded by the computing units in each cycle do not depend directly on a sliding window. On the one hand, this is reflected in the fact that the arrangement of the input feature map values loaded into each computing unit does not depend on the actual arrangement of values within a sliding window of size 5*5; on the other hand, it is also reflected in the fact that the number of calculation cycles does not depend on the number of moves of a 3*3 sliding window equal in size to the computing units, and the quantity and size of the output results can be controlled by the control method of the present invention. It is thereby possible to carry out a 5*5 convolution operation on a 7*7 input feature map using four 3*3 computing units and thereby obtain a 3*3 output result.
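The complete scheme described above (a 5*5 convolution over a 7*7 input feature map, four 3*3 units, nine cycles, a 3*3 output) can be sketched end to end. This is our own illustrative model, not the patent's hardware: the function and variable names are ours, and numpy stands in for the computing units.

```python
import numpy as np

def tiles_3x3(block):
    # Zero-pad a 5*5 block into four 3*3 tiles, matching the loading scheme:
    # kernel rows/columns 1-3 and 4-5 go to the four units, rest filled with 0.
    out = []
    for r0 in (0, 3):
        for c0 in (0, 3):
            t = np.zeros((3, 3))
            p = block[r0:r0 + 3, c0:c0 + 3]
            t[:p.shape[0], :p.shape[1]] = p
            out.append(t)
    return out

rng = np.random.default_rng(1)
feat = rng.integers(-3, 4, (7, 7)).astype(float)
kernel = rng.integers(-3, 4, (5, 5)).astype(float)
k_tiles = tiles_3x3(kernel)  # kernel values stay loaded across all nine cycles

# One cycle per output element: nine cycles yield the 3*3 output feature map.
out = np.zeros((3, 3))
for r, c in np.ndindex(3, 3):
    window = feat[r:r + 5, c:c + 5]  # input values loaded for this cycle
    out[r, c] = sum(np.sum(kt * wt)
                    for kt, wt in zip(k_tiles, tiles_3x3(window)))

# Cross-check against a directly computed 5*5 convolution (stride 1, valid).
direct = np.array([[np.sum(feat[r:r + 5, c:c + 5] * kernel)
                    for c in range(3)] for r in range(3)])
assert np.allclose(out, direct)
```

Note that the kernel tiles are computed once, consistent with the text's observation that the kernel values in the units are never modified between cycles.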
Of course, in the present invention, when a 3*3 convolution operation is performed with a 3*3 computing unit, the part of the input feature map that is identical to the previous cycle can likewise be shifted left by one unit, and the corresponding 3 new elements of the input feature map can be filled into the positions vacated by the shift, for example as shown in Fig. 4.
It will be appreciated that the present invention can also implement convolution operations smaller than 3*3 using a 3*3 computing unit, i.e., the values of a convolution kernel of the corresponding size and the values of the input feature map are loaded into the same computing unit, and the remaining positions are filled with "0". In a specific implementation, the required number of cycles can be determined according to the size of the input feature map and the size of the convolution operation to be performed, and control can be exercised in a manner similar to the above embodiments.
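The sub-3*3 case above admits the same zero-padding check. A minimal sketch, with example values of our own choosing: a 2*2 kernel and a 2*2 input patch are loaded into a single 3*3 unit and the remainder is filled with "0".

```python
import numpy as np

kernel2 = np.array([[1., -1.],
                    [2., 0.]])
patch2 = np.array([[3., 1.],
                   [0., -2.]])

# Load both into a 3*3 unit, filling the remaining positions with "0".
unit_k = np.zeros((3, 3)); unit_k[:2, :2] = kernel2
unit_x = np.zeros((3, 3)); unit_x[:2, :2] = patch2

# The padded positions contribute 0, so the 3*3 unit's multiply-accumulate
# equals the true 2*2 result.
assert np.isclose(np.sum(unit_k * unit_x), np.sum(kernel2 * patch2))
```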
According to one embodiment of the present invention, a control method is provided that performs a 7*7 convolution operation with multiple 3*3 computing units. With reference to Fig. 5, the specific control method is as follows:

Judging that 7 > 3, the data of size 7*7 are selected to be loaded into k 3*3 computing units, where k = m² and m is chosen as the smallest positive integer such that 3m ≥ n; here k = 9 computing units are selected.
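The selection rule just stated can be written out directly. The function name `units_needed` is ours; the rule itself (k = m², with m the smallest positive integer satisfying 3m ≥ n) is as given above.

```python
import math

def units_needed(n):
    # m is the smallest positive integer with 3*m >= n, i.e. ceil(n / 3);
    # the number of 3*3 computing units selected is k = m**2.
    m = math.ceil(n / 3)
    return m * m

assert units_needed(5) == 4   # 5*5 kernel -> four 3*3 units, as in Fig. 3b
assert units_needed(7) == 9   # 7*7 kernel -> nine 3*3 units, as in Fig. 5
assert units_needed(9) == 9   # 9*9 kernel also fits in nine 3*3 units
```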
Under control, the values of the convolution kernel used are divided into nine parts and loaded respectively into the nine 3*3 computing units, with the remaining positions filled with "0". Furthermore, in each cycle, the corresponding data of the input feature map are, under control, divided into nine parts and loaded respectively into the nine 3*3 computing units, with the remaining positions filled with "0". Here, within the nine 3*3 computing units, the distribution of the convolution kernel values and the distribution of the input feature map values are kept consistent.

Each computing unit is also controlled to perform multiplication and accumulation operations on the elements loaded into it, and the corresponding calculation results of all nine computing units are accumulated, so as to obtain the values of the corresponding output feature map.
In this embodiment, the input feature map can also be loaded in a manner similar to that of Fig. 3b.

Similarly, a convolution operation of, for example, 9*9 can also be realized with nine 3*3 computing units.
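The 5*5 and 7*7 embodiments generalize naturally, and a hedged sketch of the general case may make the pattern clearer. This generalization is ours (the patent's claims cover it via the 3m ≥ n rule, but the function below, `tiled_mac`, is our own construction): an n*n multiply-accumulate is split across an m*m grid of zero-padded 3*3 units.

```python
import numpy as np

def tiled_mac(kernel, patch):
    # Multiply-accumulate an n*n kernel against an n*n input patch using
    # m*m zero-padded 3*3 "units", where m is the smallest integer with
    # 3*m >= n; each loop iteration models one 3*3 computing unit.
    n = kernel.shape[0]
    m = -(-n // 3)                       # ceil(n / 3)
    total = 0.0
    for r0 in range(0, 3 * m, 3):
        for c0 in range(0, 3 * m, 3):
            kt = np.zeros((3, 3)); xt = np.zeros((3, 3))
            kp = kernel[r0:r0 + 3, c0:c0 + 3]
            xp = patch[r0:r0 + 3, c0:c0 + 3]
            kt[:kp.shape[0], :kp.shape[1]] = kp
            xt[:xp.shape[0], :xp.shape[1]] = xp
            total += np.sum(kt * xt)     # one unit's multiply-accumulate
    return total

rng = np.random.default_rng(2)
for n in (5, 7, 9):  # the kernel sizes discussed in the embodiments
    k = rng.standard_normal((n, n))
    x = rng.standard_normal((n, n))
    assert np.isclose(tiled_mac(k, x), np.sum(k * x))
```

As required by the text, the input feature map tiles and the kernel tiles follow the same distribution across the units, which is what makes the per-unit products line up position by position.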
In the present invention, a corresponding control unit can be provided for the above control method. Such a control unit can be adapted to an existing convolutional neural networks processor, multiplexing its convolutional calculation units by implementing the above control method; alternatively, a matched convolutional neural networks processor can be designed on the basis of the hardware resources required by such a control unit, for example using the minimum number of hardware resources that satisfies the above multiplexing scheme.
The scheme provided by the present invention relates to improving the reusability of the computing units used to perform convolution, so as to reduce the number of hardware computing units that must be provided in a convolutional neural networks processor, and to spare the processor from having to provide a large number of hardware computing units of different sizes for different convolutional layers that use convolution kernels of different sizes. When performing the calculation for a convolutional layer, computing units of the same size can be used to realize the convolutional calculations of different convolutional layers, thereby improving the utilization rate of the hardware computing units in the convolutional neural networks processor.
It will be appreciated that the present invention does not exclude performing calculation processing in parallel with larger-scale hardware as described in the background art, nor improving the reusability of computing units by way of "time-division multiplexing".
Furthermore, it should be explained that not every step introduced in the above embodiments is necessary; those skilled in the art can make appropriate omissions, substitutions, modifications, and the like according to actual needs.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail above with reference to the embodiments, those skilled in the art should understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from the spirit and scope of the technical solution of the present invention should all be covered within the scope of the claims of the present invention.
Claims (10)
1. A control method for a convolutional neural networks processor, the convolutional neural networks processor having 3*3 convolutional calculation units, the control method comprising:
1) determining the convolution kernel size n*n of the convolution operation that needs to be performed;
2) according to the convolution kernel size n*n of the convolution operation to be performed, loading the values of a convolution kernel of that size into m² 3*3 convolutional calculation units and filling each remaining value with 0, where 3m ≥ n;
3) determining the number of cycles required for the convolutional calculation process according to the size of the convolution operation to be performed and the size of the input feature map on which the convolution needs to be performed; and
4) according to said number of cycles, loading in each cycle of the convolutional calculation process the values of the corresponding input feature map into the m² 3*3 convolutional calculation units, the distribution of the values of the input feature map in the m² 3*3 convolutional calculation units being consistent with the distribution of the values of the convolution kernel in the m² 3*3 convolutional calculation units;
controlling the m² 3*3 convolutional calculation units loaded with the values of the convolution kernel and the input feature map to perform the convolutional calculations corresponding to said number of cycles;
5) accumulating the corresponding elements in the convolutional calculation results of the m² 3*3 convolutional calculation units, so as to finally obtain the output feature map of the convolution operation.
2. The method according to claim 1, wherein step 2) comprises:
if the size of the convolution operation to be performed is less than or equal to 3*3, loading the values of the convolution kernel of that size into a single 3*3 convolutional calculation unit and filling each remaining value with 0;
if the size of the convolution operation to be performed is greater than 3*3, loading the values of the convolution kernel of that size into a corresponding number of 3*3 convolutional calculation units and filling each remaining value with 0.
3. The method according to claim 1, wherein step 4) comprises:
in each cycle of the convolutional calculation process, if the values of the input feature map to be loaded include the elements of the leftmost first column of the input feature map, loading at once the elements of the input feature map that match the size of the convolution operation to be performed into the corresponding positions of the convolutional calculation units and filling the values of each remaining position with 0; otherwise, shifting the elements that are identical to those of the previous cycle left by one unit as a whole, and loading the elements of the input feature map that differ from the previous cycle and need to be updated into the positions vacated by the shift.
4. The method according to claim 1, wherein step 4) comprises:
in each cycle of the convolutional calculation process, controlling the 3*3 convolutional calculation units to perform multiplication on the elements at corresponding positions of the input feature map and the convolution kernel loaded into them and to accumulate the results of the multiplications, so as to obtain the element at the corresponding position of the output feature map.
5. The method according to any one of claims 1-4, wherein step 2) comprises:
if the size of the convolution operation to be performed is 5*5, loading the values of the 5*5 convolution kernel into four 3*3 convolutional calculation units and filling each remaining value with 0;
and step 4) comprises:
in each of all the cycles in which the convolutional calculation is performed, loading the values of the corresponding input feature map into the four 3*3 convolutional calculation units, the distribution of the values of the input feature map in the four 3*3 convolutional calculation units being consistent with the distribution of the values of the 5*5 convolution kernel in the four 3*3 convolutional calculation units;
wherein, in each cycle of the convolutional calculation process, if the values of the input feature map to be loaded include the elements of the leftmost first column of the input feature map, the 25 elements of the 5*5 region of the input feature map are loaded at once into the corresponding positions of the four 3*3 convolutional calculation units and the values of each remaining position are filled with 0; otherwise, the elements that are identical to those of the previous cycle are shifted left by one unit as a whole, and the elements of the input feature map that differ from the previous cycle and need to be updated are loaded into the positions vacated by the shift.
6. The method according to claim 5, wherein step 4) comprises:
in each cycle of the convolutional calculation process, controlling the four 3*3 convolutional calculation units each to perform multiplication on the elements at corresponding positions of the input feature map and the convolution kernel loaded into them and to accumulate the results of the multiplications,
and step 5) comprises: accumulating the calculation results of all four 3*3 convolutional calculation units, so as to obtain the element at the corresponding position of the output feature map.
7. The method according to any one of claims 1-4, wherein step 2) comprises:
if the size of the convolution operation to be performed is 7*7, loading the values of the 7*7 convolution kernel into nine 3*3 convolutional calculation units and filling each remaining value with 0;
and step 4) comprises:
in each of all the cycles in which the convolutional calculation is performed, loading the values of the corresponding input feature map into the nine 3*3 convolutional calculation units, the distribution of the values of the input feature map in the nine 3*3 convolutional calculation units being consistent with the distribution of the values of the 7*7 convolution kernel in the nine 3*3 convolutional calculation units;
wherein, in each cycle of the convolutional calculation process, if the values of the input feature map to be loaded include the elements of the leftmost first column of the input feature map, the 49 elements of the 7*7 region of the input feature map are loaded at once into the corresponding positions of the nine 3*3 convolutional calculation units and the values of each remaining position are filled with 0; otherwise, the elements that are identical to those of the previous cycle are shifted left by one unit as a whole, and the elements of the input feature map that differ from the previous cycle and need to be updated are loaded into the positions vacated by the shift.
8. The method according to claim 7, wherein step 4) comprises:
in each cycle of the convolutional calculation process, controlling the nine 3*3 convolutional calculation units each to perform multiplication on the elements at corresponding positions of the input feature map and the convolution kernel loaded into them and to accumulate the results of the multiplications,
and step 5) comprises: accumulating the calculation results of all nine 3*3 convolutional calculation units, so as to obtain the element at the corresponding position of the output feature map.
9. A control unit for realizing the control method according to any one of claims 1-8.
10. A convolutional neural networks processor, comprising: 3*3 convolutional calculation units and a control unit, the control unit being for realizing the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810685538.9A CN108875917A (en) | 2018-06-28 | 2018-06-28 | A kind of control method and device for convolutional neural networks processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108875917A true CN108875917A (en) | 2018-11-23 |
Family
ID=64295557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810685538.9A Pending CN108875917A (en) | 2018-06-28 | 2018-06-28 | A kind of control method and device for convolutional neural networks processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875917A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139346A (en) * | 2015-07-09 | 2015-12-09 | Tcl集团股份有限公司 | Digital image processing method and digital image processing device |
CN107657581A (en) * | 2017-09-28 | 2018-02-02 | 中国人民解放军国防科技大学 | Convolutional neural network CNN hardware accelerator and acceleration method |
CN107818367A (en) * | 2017-10-30 | 2018-03-20 | 中国科学院计算技术研究所 | Processing system and processing method for neutral net |
CN107844826A (en) * | 2017-10-30 | 2018-03-27 | 中国科学院计算技术研究所 | Neural-network processing unit and the processing system comprising the processing unit |
CN108205700A (en) * | 2016-12-20 | 2018-06-26 | 上海寒武纪信息科技有限公司 | Neural network computing device and method |
Non-Patent Citations (2)
Title |
---|
LI DU et al.: "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", IEEE Transactions on Circuits and Systems I: Regular Papers * |
ZHANG Qiangqiang: "Accelerating convolutional neural networks with large image inputs based on block convolution" (in Chinese), China Sciencepaper Online * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109948775A (en) * | 2019-02-21 | 2019-06-28 | 山东师范大学 | The configurable neural convolutional network chip system of one kind and its configuration method |
CN110377874A (en) * | 2019-07-23 | 2019-10-25 | 江苏鼎速网络科技有限公司 | Convolution algorithm method and system |
CN110414672A (en) * | 2019-07-23 | 2019-11-05 | 江苏鼎速网络科技有限公司 | Convolution algorithm method, apparatus and system |
CN110377874B (en) * | 2019-07-23 | 2023-05-02 | 江苏鼎速网络科技有限公司 | Convolution operation method and system |
CN110443357A (en) * | 2019-08-07 | 2019-11-12 | 上海燧原智能科技有限公司 | Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium |
CN113673690A (en) * | 2021-07-20 | 2021-11-19 | 天津津航计算技术研究所 | Underwater noise classification convolution neural network accelerator |
CN113673690B (en) * | 2021-07-20 | 2024-05-28 | 天津津航计算技术研究所 | Underwater noise classification convolutional neural network accelerator |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108875917A (en) | A kind of control method and device for convolutional neural networks processor | |
CN106447034B (en) | A kind of neural network processor based on data compression, design method, chip | |
CN106951395B (en) | Parallel convolution operations method and device towards compression convolutional neural networks | |
CN107578095B (en) | Neural computing device and processor comprising the computing device | |
CN107153873B (en) | A kind of two-value convolutional neural networks processor and its application method | |
CN108985449A (en) | A kind of control method and device of pair of convolutional neural networks processor | |
Goodman et al. | Brian: a simulator for spiking neural networks in python | |
CN110059798A (en) | Develop the sparsity in neural network | |
CN109621422A (en) | Electronics chess and card decision model training method and device, strategy-generating method and device | |
CN110298443A (en) | Neural network computing device and method | |
CN112084038B (en) | Memory allocation method and device of neural network | |
CN107977414A (en) | Image Style Transfer method and its system based on deep learning | |
CN106875013A (en) | The system and method for optimizing Recognition with Recurrent Neural Network for multinuclear | |
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN108009627A (en) | Neutral net instruction set architecture | |
CN111176758B (en) | Configuration parameter recommendation method and device, terminal and storage medium | |
CN110121721A (en) | The framework accelerated for sparse neural network | |
CN111291878A (en) | Processor for artificial neural network computation | |
CN106529670A (en) | Neural network processor based on weight compression, design method, and chip | |
CN107451653A (en) | Computational methods, device and the readable storage medium storing program for executing of deep neural network | |
CN109901878A (en) | One type brain computing chip and calculating equipment | |
CN107301453A (en) | The artificial neural network forward operation apparatus and method for supporting discrete data to represent | |
CN109325591A (en) | Neural network processor towards Winograd convolution | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN107301454A (en) | The artificial neural network reverse train apparatus and method for supporting discrete data to represent |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181123 |
RJ01 | Rejection of invention patent application after publication |