CN108875925A - Control method and device for a convolutional neural network processor - Google Patents
- Publication number: CN108875925A
- Application number: CN201810685989.2A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The present invention provides a control method, including: 1) determining the size n*n of the convolution operation to be performed; 2) according to the size n*n of the convolution operation to be performed, loading the values of the convolution kernel corresponding to that size into m² convolution computation units of size 7*7, and filling each remaining value with 0, where 7m >= n; 3) determining the number of cycles required for the convolution computation according to the size of the convolution operation to be performed and the size of the input feature map on which the convolution is to be performed; 4) in each cycle of the convolution computation, loading the corresponding values of the input feature map into the m² 7*7 convolution computation units, the distribution of the input feature map values in the m² 7*7 convolution computation units being consistent with the distribution of the convolution kernel values in those units; and controlling the m² 7*7 convolution computation units loaded with the values of the convolution kernel and the input feature map to perform the convolution computations corresponding to the number of cycles.
Description
Technical field
The present invention relates to convolutional neural network processors, and in particular to improvements in hardware acceleration for convolutional neural network processors.
Background art
Artificial intelligence technology has developed rapidly in recent years and attracted wide attention worldwide. Both industry and academia have carried out research on artificial intelligence, and the technology has penetrated fields such as visual perception, speech recognition, assisted driving, smart homes, and traffic scheduling. Deep learning has been the booster of this development: it uses deep neural network topologies, such as convolutional neural networks, deep belief networks, and recurrent neural networks, for training, optimization, and inference through repeated iteration. Taking image recognition as an example, a deep learning algorithm can automatically derive hidden image features through a deep neural network and produce results superior to traditional pattern-recognition analysis methods.
However, existing deep learning techniques depend on an enormous amount of computation. In the training stage, the weights of the neural network must be computed by repeated iteration over massive data sets; in the inference stage, the neural network must complete the processing of input data within an extremely short response time (usually milliseconds). This requires the deployed neural network computing circuits (CPUs, GPUs, FPGAs, ASICs, etc.) to reach tens of billions or even trillions of operations per second. Hardware acceleration for deep learning, for example hardware acceleration of convolutional neural network processors, is therefore highly necessary.
It is generally accepted that hardware acceleration can be achieved in roughly two ways: one is to perform the computation in parallel on larger-scale hardware; the other is to improve processing speed or efficiency by designing dedicated hardware circuits.
Regarding the second way, some prior art maps the neural network directly to a hardware circuit, adopting a different computing unit for each network layer so that the computations of the layers proceed in a pipelined fashion. For example, each computing unit other than the first takes the output of the previous computing unit as its input, and each computing unit only performs the computation of its corresponding network layer; in different time slots of the pipeline, a computing unit processes different inputs to that layer. Such prior art is generally aimed at scenarios where a continuous stream of different inputs must be processed, such as a video file comprising multiple frames, and usually targets neural networks with relatively few layers. This is because the number of layers in a deep neural network is large: mapping the network directly to a hardware circuit costs a great deal of circuit area, and power consumption grows with circuit area. Moreover, since the operation times of the network layers differ considerably, realizing the pipeline requires the running time allotted to each pipeline stage to be forced equal, namely equal to the operation time of the slowest stage. For a deep neural network with many layers, designing such a pipeline requires considering very many factors in order to reduce the waiting time of the faster pipeline stages during pipelined computation.
Other prior art, referring to the regularity of neural network computation, proposes "time-division multiplexing" of the computing units in a neural network processor to improve their reusability. Unlike the pipelined approach above, the same computing unit is used to compute each network layer in turn, for example the input layer, the first hidden layer, the second hidden layer, and so on to the output layer one by one, repeating the process in the next iteration. Such prior art suits not only neural networks with few layers but also deep neural networks, and it is particularly suitable for application scenarios with limited hardware resources. In such scenarios, after the neural network processor has performed the computation of a network layer A for one input, it may not need to compute layer A again for a long time; if each network layer used separate dedicated hardware as its computing unit, the hardware would be constrained and its reuse rate would be low. Based on this consideration, most prior art adopts a "time-division multiplexing" scheme for the computing units and improves the hardware of the neural network processor accordingly.
However, no matter which of the above prior arts is used to design a convolutional neural network processor, the hardware utilization rate still leaves room for improvement.
Summary of the invention
Therefore, an object of the present invention is to overcome the defects of the above prior art and provide a control method for a convolutional neural network processor having 7*7 convolution computation units, the control method including:

1) determining the convolution kernel size n*n of the convolution operation to be performed;

2) according to the convolution kernel size n*n of the convolution operation to be performed, loading the values of the convolution kernel corresponding to that size into m² convolution computation units of size 7*7, and filling each remaining value with 0, where 7m >= n;

3) determining the number of cycles required for the convolution computation according to the size of the convolution operation to be performed and the size of the input feature map on which the convolution is to be performed; and

4) according to the number of cycles, in each cycle of the convolution computation, loading the corresponding values of the input feature map into the m² 7*7 convolution computation units, the distribution of the input feature map values in the m² 7*7 convolution computation units being consistent with the distribution of the convolution kernel values in those units; and controlling the m² 7*7 convolution computation units loaded with the values of the convolution kernel and the input feature map to perform the convolution computations corresponding to the number of cycles;

5) accumulating the corresponding elements of the convolution computation results of the m² 7*7 convolution computation units, to obtain the final output feature map of the convolution operation.
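As a rough illustration of step 2), the zero-padded loading of an n*n kernel into m² units of 7*7 can be sketched in NumPy as follows. The function name and the row-major tiling order are assumptions for illustration; the patent's actual loading order is defined by its control unit.

```python
import numpy as np

def load_kernel_into_units(kernel):
    """Zero-pad an n*n kernel into m*m compute units of size 7*7, 7m >= n."""
    n = kernel.shape[0]
    m = -(-n // 7)  # ceil(n / 7)
    units = np.zeros((m, m, 7, 7))
    for i in range(m):
        for j in range(m):
            # Each unit receives at most a 7*7 tile; the rest stays 0.
            tile = kernel[7*i:7*i+7, 7*j:7*j+7]
            units[i, j, :tile.shape[0], :tile.shape[1]] = tile
    return units
```

For n = 5 this yields a single 7*7 unit whose unused two rows and columns are 0; for n = 11 it yields four units (m = 2), matching the 11*11 embodiment described later.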
Preferably, according to the method, step 2) includes:

if the size of the convolution operation to be performed is smaller than 7*7, loading the values of the convolution kernel corresponding to that size into a single 7*7 convolution computation unit and filling each remaining value with 0;

if the size of the convolution operation to be performed is larger than 7*7, loading the values of the convolution kernel corresponding to that size into a corresponding number of 7*7 convolution computation units and filling each remaining value with 0.
Preferably, according to the method, step 4) includes:

in each cycle of the convolution computation, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, loading in one pass the elements of the input feature map matching the size of the convolution operation to be performed into the corresponding positions of the convolution computation unit and filling the values of the remaining positions with 0; otherwise, shifting the elements that are identical to those of the previous cycle left by one unit as a whole, and loading the elements of the input feature map that differ from the previous cycle and need updating into the positions vacated by the shift.
Preferably, according to the method, step 4) includes:

in each cycle of the convolution computation, controlling each of the m² 7*7 convolution computation units to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the products, so as to obtain the element at the corresponding position of the output feature map.
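The per-unit operation in this step is a plain multiply-accumulate over the 49 loaded values; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def unit_mac(loaded_fmap, loaded_kernel):
    """One 7*7 unit: multiply corresponding positions, accumulate the products.

    Positions filled with 0 contribute nothing to the accumulated sum,
    which is what makes the zero-padding scheme work.
    """
    return float(np.sum(loaded_fmap * loaded_kernel))
```

For a 5*5 kernel zero-padded into the 7*7 grid, the 24 padded positions add 0 to the sum, so the unit's output equals a true 5*5 convolution result.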
Preferably, according to the method, step 2) includes:

if the size of the convolution operation to be performed is 5*5, loading the values of the 5*5 convolution kernel into a single 7*7 convolution computation unit and filling each remaining value with 0;

and step 4) includes:

in each of the cycles of the convolution computation, loading the corresponding values of the input feature map into the 7*7 convolution computation unit, the distribution of the input feature map values in the 7*7 convolution computation unit being consistent with the distribution of the values of the 5*5 convolution kernel in that unit;

wherein, in each cycle of the convolution computation, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, the 25 elements of the 5*5 region of the input feature map are loaded in one pass into the corresponding positions of the convolution computation unit and the values of the remaining positions are filled with 0; otherwise, the elements identical to those of the previous cycle are shifted left by one unit as a whole, and the 5 elements of the input feature map that differ from the previous cycle and need updating are loaded into the positions vacated by the shift.
Preferably, according to the method, step 2) includes:

if the size of the convolution operation to be performed is 3*3, loading the values of 3*3 convolution kernels for up to 4 channels into the same 7*7 convolution computation unit and filling each remaining value with 0;

and step 4) includes:

in each of the cycles of the convolution computation, loading the corresponding values of the input feature map into the 7*7 convolution computation unit, the values of the input feature map being loaded in the form of one or more copies equal in number to the 3*3 convolution kernels, and the distribution of the input feature map in the 7*7 convolution computation unit corresponding to the distribution of the values of the 3*3 convolution kernels in that unit;

wherein, in each cycle of the convolution computation, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, the 9 elements of the 3*3 region of the input feature map are loaded in one pass into the corresponding positions of the convolution computation unit and the values of the remaining positions are filled with 0; otherwise, the elements identical to those of the previous cycle are shifted left by one unit as a whole, and the 3 elements of the input feature map that differ from the previous cycle and need updating are loaded into the corresponding positions vacated by the shift.
Preferably, according to the method, step 4) further includes:

if values of 3*3 convolution kernels for 2 or 4 channels are loaded into the same 7*7 convolution computation unit, and the values of the input feature map to be loaded do not include elements of the leftmost column of the input feature map, then for the 2 copies of the input feature map located in the same columns but different rows of the 7*7 convolution computation unit, the elements identical to those of the previous cycle are shifted left by one unit as a whole, and the 3 elements of the input feature map that differ from the previous cycle and need updating are loaded into the corresponding positions vacated by the shift.
Preferably, according to the method, step 2) includes:

if the size of the convolution operation to be performed is 11*11, controlling four 7*7 convolution computation units to jointly load the values of the 11*11 convolution kernel and filling each remaining value with 0;

and step 4) includes:

in each of the cycles of the convolution computation, loading the corresponding values of the input feature map into the four 7*7 convolution computation units, the distribution of the input feature map values in the four 7*7 convolution computation units being consistent with the distribution of the values of the 11*11 convolution kernel in those units;

wherein, in each cycle of the convolution computation, if the values of the input feature map to be loaded include elements of the leftmost column of the input feature map, the 121 elements of the 11*11 region of the input feature map are loaded in one pass into the corresponding positions of the convolution computation units and the values of the remaining positions are filled with 0; otherwise, the elements identical to those of the previous cycle are shifted left by one unit as a whole, and the 11 elements of the input feature map that differ from the previous cycle and need updating are loaded into the corresponding positions vacated by the shift.
Preferably, according to the method, step 4) includes:

in each cycle of the convolution computation, controlling each of the four 7*7 convolution computation units to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the products;

and step 5) includes: accumulating the computation results of all four 7*7 convolution computation units, to obtain the element at the corresponding position of the output feature map.
The present invention also provides a control unit for realizing the control method of any one of the above, and a convolutional neural network processor including: 7*7 convolution computation units and a control unit, the control unit being configured to realize the method of any one of the above.
Compared with the prior art, the advantages of the present invention are as follows:

The reusability of the computing units that perform convolution is improved, reducing the number of hardware computing units that must be placed in a convolutional neural network processor. The processor does not need to provide large numbers of hardware computing units of different sizes for convolutional layers that use convolution kernels of different sizes. When the computation of one convolutional layer is performed, computing units whose size does not match the convolution kernel of that layer can still be used for the computation, thereby improving the utilization rate of the hardware computing units in the convolutional neural network processor.
Brief description of the drawings
Embodiments of the present invention are further described below with reference to the drawings, wherein:

Fig. 1 is a schematic diagram of the prior art using M kinds of convolution kernels, each with N channels, to perform convolution computation on an input layer to obtain output layers;

Fig. 2 is a schematic diagram of the prior art realizing a 7*7 convolution operation using one 7*7 computing unit;

Fig. 3 is a schematic diagram of realizing a 5*5 convolution operation using a 7*7 computing unit according to one embodiment of the present invention;

Fig. 4 is a schematic diagram of realizing 3*3 convolution operations for 4 channels at once using one 7*7 computing unit according to one embodiment of the present invention;

Fig. 5 is a schematic diagram of realizing an 11*11 convolution operation using four 7*7 computing units according to one embodiment of the present invention.
Specific embodiment
The present invention is described in detail below with reference to the drawings and specific embodiments.
While studying the prior art, the inventors found that the various classical neural networks, such as AlexNet, GoogleNet, VGG, and ResNet, contain different numbers of convolutional layers, and different convolutional layers use convolution kernels of different sizes. Taking AlexNet as an example, the first layer of the network is a convolutional layer with an 11*11 kernel, the second layer is a convolutional layer with a 5*5 kernel, the third layer is a convolutional layer with a 3*3 kernel, and so on.
However, existing neural network processors provide different computing units for convolution kernels of different sizes. As a result, when the computation of a certain convolutional layer is performed, the other computing units that do not match the kernel size of that layer sit idle.

For example, as shown in Fig. 1, a neural network processor may provide M different convolution kernels, denoted kernel 0 to kernel M-1, each kernel having N channels and respectively used to perform convolution computation on the N channels of an input layer; each kernel convolved with an input layer yields one output layer. For one input layer, M output layers can thus be computed using all M kinds of kernels. If some input layer needs a convolution operation using kernel 1, the computing units other than the one corresponding to kernel 1 are idle at that time.
In view of this, the present invention proposes a multiplexing scheme for the computing units: by control, the data actually loaded into a computing unit is adjusted (for the same computing unit, both the values of the convolution kernel and the values of the input feature map need to be loaded), so that convolution operations of various sizes are realized with computing units of 7*7 scale, reducing the scale of the hardware computing units that convolution operations require.
The neural network processor system architecture of the present invention may include the following five parts: an input data storage unit, a control unit, an output data storage unit, a weight storage unit, and a computing unit.

The input data storage unit stores the data participating in the computation; the output data storage unit stores the computed neuron responses; the weight storage unit stores the trained neural network weights. The control unit is connected to the output data storage unit, the weight storage unit, and the computing unit, and controls the computing unit to perform neural network computation according to control signals obtained by parsing. The computing unit performs the corresponding neural network computation according to the control signals generated by the control unit, completing most of the operations in the neural network algorithm, i.e., vector multiply-add operations and the like.
The multiplexing of the computing units according to the present invention can be controlled and realized by the above control unit, as described in the several embodiments below.
First, consider how the traditional prior art realizes a 7*7 convolution operation using a 7*7 computing unit. Referring to the example given in Fig. 2, in the prior art the 7*7-scale computing unit realizes the convolution operation as follows:

In the first cycle, each element of rows 1-7, columns 1-7 of the input feature map (referred to herein as the sliding window over the input feature map) is multiplied by the element at the corresponding position of the kernel, and the products are accumulated to give the element at row 1, column 1 of the output feature map, i.e. 2 × (-4) + (3 × 2) + (-2 × (-4)) + (2 × (-8)) + (-7 × 3) = -31.

In the second cycle, each element of rows 1-7, columns 2-8 of the input feature map (the values in the sliding window for the current cycle) is multiplied by the element at the corresponding position of the kernel, and the products are accumulated to give the element at row 1, column 2 of the output feature map (not shown in Fig. 2).

And so on: by moving the 7*7 sliding window right or down a total of 15 times, an output feature map of size 4*4 is obtained.
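The prior-art behaviour described above amounts to a "valid" convolution with stride 1, which fixes the output size at (input - kernel + 1) per dimension; a minimal sketch:

```python
import numpy as np

def conv_valid(fmap, kernel):
    """Slide the kernel-sized window over the feature map (stride 1) and
    multiply-accumulate at every window position."""
    n = kernel.shape[0]
    out = fmap.shape[0] - n + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            result[i, j] = np.sum(fmap[i:i+n, j:j+n] * kernel)
    return result
```

For a 10*10 map and a 7*7 kernel this gives a 4*4 output: 16 window positions, i.e. the initial position plus the 15 moves mentioned above.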
The present invention does not exclude using the above approach to realize a 7*7 convolution with a 7*7 computing unit. Further, in the present invention, a 7*7-scale computing unit can also, by control, realize kernel operations of sizes other than 7*7, such as 5*5, 3*3, and 11*11 convolutions.

As described above, in the traditional prior art the size of the output feature map depends on the number of sliding-window moves and the size of the kernel. For example, when a 7*7 convolution is performed on a 10*10 input feature map, the lateral and longitudinal movement ranges of the sliding window each cover 4 positions, and the computation over multiple cycles yields a 4*4 output feature map. This makes it very difficult to realize convolutions of other sizes with a 7*7 computing unit. It will be appreciated that, continuing with the prior art, convolution over a 10*10 input feature map with a 7*7 computing unit can only yield a 4*4 output feature map (e.g., as shown in Fig. 2); the computing unit and the processor have no apparent way of moving the sliding window so that, for example, a 5*5 convolution can be obtained with the 7*7 computing unit.

In view of this, the present invention proposes a corresponding control method: by scheduling the input feature map and the convolution kernel loaded into the computing unit, and controlling the execution of the multiplication and addition operations, a 5*5 convolution operation is performed with a 7*7 computing unit.
According to one embodiment of the present invention, with reference to Fig. 3, the specific control method is as follows:

When the computing unit performs convolution computation, for each sliding window the values of the corresponding convolution kernel and of the corresponding input feature map are loaded into the 7*7 computing unit under control.

As shown in Fig. 3, the size of the input feature map is 10*10 and the size of the convolution operation to be performed is 5*5, so it can be determined that the convolution computation requires 6 × 6 = 36 cycles in total.

In the first cycle, the elements of rows 1-5, columns 1-5 of the input feature map are loaded into the 7*7 computing unit as the elements of its rows 1-5, columns 1-5, and the remaining elements of rows 6-7 and columns 6-7 are filled with "0"; the 5*5 convolution kernel is loaded into the 7*7 computing unit as the elements of its rows 1-5, columns 1-5, and the remaining elements of rows 6-7 and columns 6-7 are likewise filled with "0". The values of the input feature map and of the kernel are thus loaded into the 7*7 computing unit. The 7*7 computing unit is controlled to multiply and accumulate the elements at corresponding positions of the input feature map and the kernel, to obtain the element at row 1, column 1 of the output feature map, i.e. (2 × (-4)) + (3 × 2) + (-2 × (-4)) + (2 × (-8)) = -10. Since every element of the computing unit outside the values of the original 5*5 kernel is 0, the computed result is exactly the same as the result of performing the convolution with an actual 5*5 computing unit.
In the second cycle, all elements of rows 1-5, columns 2-6 of the input feature map (i.e. "0, 0, 2, 0, -3; 0, 3, -2, 5, 0; 0, 0, 0, 2, 0; 0, 0, 0, 3, 0; 0, 0, 0, 0, 0") are loaded into the computing unit as the new elements of rows 1-5, columns 1-5. The computing unit is controlled to multiply and accumulate the elements loaded into it, to obtain the element at row 1, column 2 of the output feature map.
According to a preferred embodiment of the present invention, the manner of loading the input feature map data into the 7*7 computing unit in the second cycle can also be improved to raise loading efficiency. That is, with reference to Fig. 3, all elements of rows 1-5, columns 2-5 of the 7*7 computing unit (i.e. "0, 0, 2, 0; 0, 3, -2, 5; 0, 0, 0, 2; 0, 0, 0, 3; 0, 0, 0, 0") are shifted left as a whole by 1 unit to become the new elements of rows 1-5, columns 1-4, and the elements of rows 1-5, column 6 of the input feature map (i.e. "-3; 0; 0; 0; 0") are loaded into the computing unit as the new elements of rows 1-5, column 5. The values of the input feature map loaded in the 7*7 computing unit are thereby updated, achieving an effect similar to the sliding window of the traditional scheme. The computing unit is likewise controlled to multiply and accumulate the elements loaded into it, to obtain the element at row 1, column 2 of the output feature map.
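The shift-left update of this preferred embodiment can be sketched as follows; the function name and the convention of touching only the active 5*5 block of the 7*7 grid are assumptions for illustration:

```python
import numpy as np

def shift_and_load(loaded, new_col, width=5):
    """Shift the active width*width block of the 7*7 grid left by one
    column and load the new rightmost column fetched from the feature map."""
    loaded = loaded.copy()
    loaded[:width, :width-1] = loaded[:width, 1:width]  # shift left by 1
    loaded[:width, width-1] = new_col                    # load 5 new values
    return loaded
```

Per cycle only 5 new values are fetched instead of 25, which is the loading-efficiency gain the embodiment describes.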
And so on, third is completed to the period 6.
In the seventh cycle, the elements of rows 2-6, columns 1-5 of the input feature map are loaded into the 7*7 computing unit as the elements of rows 1-5, columns 1-5, and the computing unit is controlled to perform multiplication and accumulation on the elements loaded into it, thereby obtaining the element at row 2, column 1 of the output feature map. In the subsequent eighth through twelfth cycles, the elements of the corresponding input feature map are loaded into the computing unit in a manner similar to the aforementioned second through sixth cycles. This continues until all 36 cycles are completed and the 6*6 output feature map is obtained.
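The walkthrough above can be checked against an ordinary convolution with a small software model (a sketch under the assumption of a zero-padded 7*7 multiply-accumulate array; the function name and dense nested-loop evaluation are illustrative, not the hardware implementation):

```python
def conv_via_7x7_unit(feature, kernel, unit=7):
    """Model the scheme: a 7*7 unit holds a zero-padded n*n kernel; each
    cycle loads one n*n input patch and performs a full 7*7
    multiply-accumulate. Cells outside the n*n region stay 0 and
    contribute nothing to the accumulation."""
    n = len(kernel)                  # kernel size, e.g. 5
    N = len(feature)                 # input size, e.g. 10
    out = N - n + 1                  # output size, e.g. 6
    # Load the n*n kernel into the unit and fill the rest with 0.
    k = [[kernel[r][c] if r < n and c < n else 0 for c in range(unit)]
         for r in range(unit)]
    result = [[0] * out for _ in range(out)]
    for i in range(out):             # one cycle per output element
        for j in range(out):
            # Load the matching n*n input patch, zero-filled to 7*7.
            f = [[feature[i + r][j + c] if r < n and c < n else 0
                  for c in range(unit)] for r in range(unit)]
            result[i][j] = sum(k[r][c] * f[r][c]
                               for r in range(unit) for c in range(unit))
    return result
```

Running this for a 10*10 input and a 5*5 kernel executes 36 cycles and produces the same 6*6 output as a direct 5*5 convolution, which is the equivalence the embodiment relies on.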
It can be seen that with the above control method, the 25 values of the 5*5 region of the input feature map are loaded into the computing unit at once in the first cycle. Likewise, the 7th, 13th, 19th, 25th, and 31st cycles each load 25 values of the input feature map at once. Correspondingly, in the second through sixth cycles, only 5 values of the input feature map need to be loaded each time, reusing the 20 values from the previous cycle that are shifted left, while the values loaded for the convolution kernel in the computing unit are not modified at all. Similarly, the 8th-12th, 14th-18th, 20th-24th, 26th-30th, and 32nd-36th cycles load the elements of the input feature map in a manner similar to the second through sixth cycles.
This ensures that in each cycle, the position of each element of the input feature map in the computing unit corresponds one-to-one with the position of the corresponding element of the convolution kernel with which it is multiplied. Moreover, to other units apart from the unit implementing the control method of the invention, such as the computing unit itself or the processor, it is not apparent that the 7*7 convolution unit is actually performing a 5*5 convolution operation. In addition, with the above control method, the values of the input feature map loaded by the computing unit in each cycle do not directly depend on a sliding window. On the one hand, the arrangement of the values of the input feature map loaded into the computing unit does not depend on the actual arrangement of the values in a sliding window of size 5*5; on the other hand, the number of calculation cycles does not depend on the number of moves of a sliding window of size 7*7 (i.e. 4*4): the quantity and size of the output results can be controlled by the control method of the invention. It is thereby possible to use a 7*7 computing unit to perform a 5*5 convolution operation on a 10*10 input feature map and obtain a 6*6 output result.
Similarly, in a manner similar to the example of Fig. 3 above, convolution operations of sizes smaller than 7*7 can be executed in the same 7*7 computing unit, for example a 3*3 convolution operation. That is, a 3*3 convolution kernel is loaded into the 7*7 computing unit and the remaining values are filled with "0". The number of cycles required for the convolution operation is determined according to the size of the input feature map and the size 3*3 of the convolution operation to be performed. In each cycle, the corresponding values of the input feature map are loaded into the 7*7 computing unit to execute the convolution operation.
It can be appreciated that if a 7*7 computing unit performs only a single 3*3 convolution operation at a time, its hardware utilization is low. In view of this, the invention further provides a scheme for controlling the same 7*7 computing unit to execute the 3*3 convolution operations of four channels at once for the same input feature map.
According to one embodiment of the present invention, a control method is further provided that executes a 3*3 convolution operation with a 7*7 computing unit; with reference to Fig. 4, the specific control method is as follows:
The size of the input feature map is 10*10 and the size of the convolution operation to be performed is 3*3, from which it can be determined that the convolutional calculation requires 8 × 8 = 64 cycles in total.
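The cycle count follows the usual stride-1, no-padding output-size formula; a one-line sketch (the function name is illustrative):

```python
def cycles_needed(input_size, kernel_size):
    """One cycle per output element: (N - n + 1) positions per dimension
    for stride 1 and no padding."""
    out = input_size - kernel_size + 1
    return out * out
```

For the examples in this description, `cycles_needed(10, 5)` gives the 36 cycles of the 5*5 case and `cycles_needed(10, 3)` the 64 cycles of the 3*3 case.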
In the first cycle, the elements of rows 1-3, columns 1-3 of the input feature map are copied into 4 copies, which are loaded respectively into rows 1-3 columns 1-3, rows 1-3 columns 4-6, rows 4-6 columns 1-3, and rows 4-6 columns 4-6 of the 7*7 computing unit, and the remaining elements of row 7 and column 7 are filled with "0". The four 3*3 convolution kernels for the four channels are likewise loaded respectively into rows 1-3 columns 1-3, rows 1-3 columns 4-6, rows 4-6 columns 1-3, and rows 4-6 columns 4-6 of the 7*7 computing unit, and the remaining elements of row 7 and column 7 are filled with "0". In the embodiment illustrated in Fig. 3, each convolution kernel is used for one channel, and when computing the result of a multi-channel convolution operation, the convolution results at the same position in each channel can be accumulated together as the output result for that position. In the present invention, after the above elements of the input feature map and convolution kernels have been loaded, the 7*7 computing unit can be controlled to multiply and accumulate the elements at corresponding positions of the input feature map and the convolution kernels; the result obtained is consistent with the accumulated convolution results at the same position for the four channels, and the element at row 1, column 1 of the output feature map is thereby obtained.
In the second cycle, all elements of rows 1-3, columns 2-3 in the 7*7 computing unit (i.e. "0,0; 0,3; 0,0") are shifted left as a whole by 1 unit to become the new elements of rows 1-3, columns 1-2, and the elements of rows 1-3, column 4 of the input feature map (i.e. "2; -2; 0") are loaded into the computing unit as the new elements of rows 1-3, column 3. The same shift-and-load operations are likewise executed for the regions originally at rows 1-3 columns 4-6, rows 4-6 columns 1-3, and rows 4-6 columns 4-6. The values of the input feature map loaded in the 7*7 computing unit are thus updated. Further, in the present invention, the original rows 1-6 columns 2-3 and rows 1-6 columns 5-6 of the computing unit can each be moved as one whole, and/or the new elements to be loaded can be copied into two copies and loaded into the computing unit together as one whole, thereby reducing the number of control operations required. The computing unit is likewise controlled to perform multiplication and accumulation on the elements loaded into it, thereby obtaining the element at row 1, column 2 of the output feature map.
This continues until all cycles are completed and the 8*8 output feature map is obtained.
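The quadrant-packing equivalence underlying this embodiment can be sketched in software (an illustrative model under the assumptions of the description: the same input patch is replicated into each quadrant, and a single 7*7 multiply-accumulate replaces the per-channel accumulation; the function name is an assumption):

```python
def packed_3x3_mac(patch, kernels):
    """patch: one 3*3 input patch; kernels: up to four 3*3 convolution
    kernels. Pack both into the quadrants of a 7*7 unit (row 7 and
    column 7 stay 0) and do one multiply-accumulate; the result equals
    the per-channel 3*3 convolution results accumulated together."""
    unit_f = [[0] * 7 for _ in range(7)]   # input-side 7*7 array
    unit_k = [[0] * 7 for _ in range(7)]   # kernel-side 7*7 array
    # Top-left corners of the four 3*3 quadrants; unused quadrants stay 0.
    offsets = [(0, 0), (0, 3), (3, 0), (3, 3)][:len(kernels)]
    for (dr, dc), ker in zip(offsets, kernels):
        for r in range(3):
            for c in range(3):
                unit_f[dr + r][dc + c] = patch[r][c]   # replicate the patch
                unit_k[dr + r][dc + c] = ker[r][c]     # load this channel
    return sum(unit_f[r][c] * unit_k[r][c]
               for r in range(7) for c in range(7))
```

Passing fewer than four kernels models the "fill unused 3*3 regions with 0" case described below for channel counts other than four.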
It can be seen that with the above control method, the 3*3 convolution operations of four channels can be performed at once for an input feature map, which is particularly suitable when the number of channels is large. When the number of channels is not equal to four, at least one of the 3*3 regions of the 7*7 computing unit used for loading convolution kernels can be filled with "0", for example rows 4-6, columns 4-6 can be filled entirely with "0". When the number of channels is greater than four, for example seven channels, two calculations can be performed under control: 4 convolution kernels are loaded for the first calculation and 3 convolution kernels for the second, and the elements at the same position in the two calculation results are accumulated to obtain the output result for that position.
The foregoing embodiments of the present invention describe how to control a 7*7 computing unit to realize 5*5 and 3*3 convolutional calculations; the following explains how to control 7*7 computing units to realize convolutional calculations whose size exceeds 7*7.
According to one embodiment of the present invention, a control method is provided that executes an 11*11 convolution operation with 7*7 computing units; with reference to Fig. 5, the specific control method is as follows:
First, it is determined that 11 > 7, so the convolution operation of size 11*11 needs to be completed jointly by more than one 7*7 computing unit. Here, k computing units can be chosen such that they are just sufficient to load data of size 11*11. Here k is chosen as k = m², where m is the smallest positive integer satisfying 7m ≥ n. Of course, more 7*7 computing units than this quantity may also be selected to execute the 11*11 convolution operation. For the example illustrated in Fig. 5, k = 4 computing units are selected.
The values of the convolution kernel used are divided into four parts and loaded respectively into the four 7*7 computing units under control, with the remaining parts filled with "0"; moreover, in each cycle, the corresponding data of the input feature map are divided into four parts and loaded respectively into the four 7*7 computing units under control, with the remaining parts filled with "0". Within the four 7*7 computing units, the distribution of the values of the convolution kernel is consistent with the distribution of the values of the input feature map.
Each computing unit is controlled to perform multiplication and accumulation on the elements loaded into it, and the corresponding calculation results of all four computing units are accumulated to obtain the corresponding value of the output feature map.
In this embodiment, a manner similar to the previous embodiments can further be used in each cycle to update the values of the input feature map in each computing unit by shifting and loading the corresponding values of the input feature map. For example, in the second cycle: the values of columns 2-7 of the first 7*7 computing unit (in the upper-left corner of Fig. 5) are shifted left by 1 unit and new values are loaded into column 7; the values of columns 2-4 of the second 7*7 computing unit (in the upper-right corner of Fig. 5) are shifted left by 1 unit and new values are loaded into column 4; the values of columns 2-7, rows 1-4 of the third 7*7 computing unit (in the lower-left corner of Fig. 5) are shifted left by 1 unit and new values are loaded into column 7, rows 1-4; the values of columns 2-4, rows 1-4 of the fourth 7*7 computing unit (in the lower-right corner of Fig. 5) are shifted left by 1 unit and new values are loaded into column 4, rows 1-4.
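The four-unit split can be modeled in software to confirm that the cross-unit accumulation recovers the full 11*11 result (an illustrative sketch; the tile layout follows the four-corner division described above, and the function name is an assumption):

```python
def mac_11x11_via_four_units(patch, kernel):
    """Split an 11*11 input patch and kernel into four tiles (7*7, 7*4,
    4*7, 4*4), zero-pad each into a 7*7 unit, multiply-accumulate per
    unit, then sum the four partial results."""
    tiles = [(0, 0), (0, 7), (7, 0), (7, 7)]  # top-left corner of each tile
    total = 0
    for dr, dc in tiles:
        acc = 0
        for r in range(7):
            for c in range(7):
                # Positions past the 11*11 boundary stay 0 in the unit.
                if dr + r < 11 and dc + c < 11:
                    acc += patch[dr + r][dc + c] * kernel[dr + r][dc + c]
        total += acc                  # cross-unit accumulation (step 5)
    return total
```

Because the four tiles partition the 11*11 index range exactly, the sum of the four units' multiply-accumulates equals the direct 11*11 dot product for that output position.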
In the present invention, a corresponding control unit can be provided for the above control method. Such a control unit can be adapted to an existing convolutional neural network processor to multiplex its convolutional calculation units by implementing the above control method, or a matched convolutional neural network processor can be designed based on the hardware resources required by such a control unit, for example using the minimum amount of hardware resources that satisfies the above multiplexing scheme.
The scheme provided by the present invention relates to improving the reusability of the computing units used to execute convolution, thereby reducing the hardware computing units that must be provided in a convolutional neural network processor: the processor no longer needs to provide a large number of hardware computing units of different sizes for different convolutional layers that use convolution kernels of different sizes. When executing the calculation for a given convolutional layer, computing units of the same size can be used to realize the convolutional calculations of different convolutional layers, thereby increasing the utilization of the hardware computing units in the convolutional neural network processor.
It can be appreciated that the present invention does not exclude performing calculation processing in parallel with larger-scale hardware as described in the background art, while improving the reusability of the computing units by way of "time-division multiplexing".
Furthermore, it should be noted that not every step introduced in the above embodiments is necessary; those skilled in the art can make appropriate omissions, substitutions, modifications, and the like according to actual needs.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail above with reference to embodiments, those skilled in the art should understand that modifications or equivalent substitutions of the technical solution of the invention that do not depart from the spirit and scope of the technical solution of the invention should all be covered by the scope of the claims of the invention.
Claims (11)
1. A control method for a convolutional neural network processor, the convolutional neural network processor having 7*7 convolutional calculation units, the control method comprising:
1) determining the kernel size n*n of the convolution operation to be performed;
2) according to the kernel size n*n of the convolution operation to be performed, loading the values of the convolution kernel corresponding to that size into m² selected 7*7 convolutional calculation units, and filling each remaining value with 0, where 7m ≥ n;
3) determining the number of cycles required for the convolutional calculation process according to the size of the convolution operation to be performed and the size of the input feature map to be convolved; and
4) according to the number of cycles, in each cycle of the convolutional calculation, loading the corresponding values of the input feature map into the m² 7*7 convolutional calculation units, the distribution of the values of the input feature map within the m² 7*7 convolutional calculation units being consistent with the distribution of the values of the convolution kernel within the m² 7*7 convolutional calculation units;
controlling the m² 7*7 convolutional calculation units loaded with the values of the convolution kernel and the input feature map to respectively execute the convolutional calculations corresponding to the number of cycles; and
5) accumulating corresponding elements in the convolutional calculation results of the m² 7*7 convolutional calculation units to obtain the output feature map of the final convolution operation.
2. The method according to claim 1, wherein step 2) comprises:
if the size of the convolution operation to be performed is smaller than 7*7, loading the values of the convolution kernel corresponding to that size into a single 7*7 convolutional calculation unit and filling each remaining value with 0;
if the size of the convolution operation to be performed is larger than 7*7, loading the values of the convolution kernel corresponding to that size into a corresponding number of 7*7 convolutional calculation units and filling each remaining value with 0.
3. The method according to claim 1, wherein step 4) comprises:
in each cycle of the convolutional calculation, if the values of the input feature map to be loaded include elements of the leftmost first column of the input feature map, loading at once the elements of the input feature map matching the size of the convolution operation to be performed into the corresponding positions of the convolutional calculation unit and filling the values of all remaining positions with 0; otherwise, shifting the elements identical to those of the previous cycle left as a whole by one unit, and loading the elements of the input feature map that differ from the previous cycle and need to be updated into the positions vacated by the shift.
4. The method according to claim 1, wherein step 4) comprises:
in each cycle of the convolutional calculation, controlling each of the m² 7*7 convolutional calculation units to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the results of the multiplications, thereby obtaining the element at the corresponding position of the output feature map.
5. The method according to any one of claims 1-4, wherein step 2) comprises:
if the size of the convolution operation to be performed is 5*5, loading the values of a 5*5 convolution kernel into a single 7*7 convolutional calculation unit and filling each remaining value with 0;
and step 4) comprises:
in each of all the cycles for executing the convolutional calculation, loading the corresponding values of the input feature map into the 7*7 convolutional calculation unit, the distribution of the values of the input feature map in the 7*7 convolutional calculation unit being consistent with the distribution of the values of the 5*5 convolution kernel in the 7*7 convolutional calculation unit;
wherein, in each cycle of the convolutional calculation, if the values of the input feature map to be loaded include elements of the leftmost first column of the input feature map, the 25 elements of size 5*5 in the input feature map are loaded at once into the corresponding positions of the convolutional calculation unit and the values of all remaining positions are filled with 0; otherwise, the elements identical to those of the previous cycle are shifted left as a whole by one unit, and the 5 elements of the input feature map that differ from the previous cycle and need to be updated are loaded into the positions vacated by the shift.
6. The method according to any one of claims 1-4, wherein step 2) comprises:
if the size of the convolution operation to be performed is 3*3, loading into a single 7*7 convolutional calculation unit the values of 3*3 convolution kernels for 4 channels and filling each remaining value with 0;
and step 4) comprises:
in each of all the cycles for executing the convolutional calculation, loading the corresponding values of the input feature map into the 7*7 convolutional calculation unit, the values of the input feature map being loaded into the 7*7 convolutional calculation unit in the form of one or more copies equal in number to the 3*3 convolution kernels, the distribution of the input feature map in the 7*7 convolutional calculation unit corresponding to the distribution of the values of the 3*3 convolution kernels in the 7*7 convolutional calculation unit;
wherein, in each cycle of the convolutional calculation, if the values of the input feature map to be loaded include elements of the leftmost first column of the input feature map, the 9 elements of size 3*3 in the input feature map are loaded at once into the corresponding positions of the convolutional calculation unit and the values of all remaining positions are filled with 0; otherwise, the elements identical to those of the previous cycle are shifted left as a whole by one unit, and the 3 elements of the input feature map that differ from the previous cycle and need to be updated are loaded into the corresponding positions vacated by the shift.
7. The method according to claim 6, wherein step 4) further comprises:
if values of 3*3 convolution kernels for 2 or 4 channels are loaded into the same 7*7 convolutional calculation unit, and the values of the input feature map to be loaded do not include elements of the leftmost first column of the input feature map, then, for the 2 copies of the input feature map located in the same columns but different rows of the 7*7 convolutional calculation unit, the elements identical to those of the previous cycle are shifted left as a whole by one unit, and the 3 elements of the input feature map that differ from the previous cycle and need to be updated are loaded into the corresponding positions vacated by the shift.
8. The method according to any one of claims 1-4, wherein step 2) comprises:
if the size of the convolution operation to be performed is 11*11, controlling four 7*7 convolutional calculation units to jointly load the values of the 11*11 convolution kernel and filling each remaining value with 0;
and step 4) comprises:
in each of all the cycles for executing the convolutional calculation, loading the corresponding values of the input feature map into the four 7*7 convolutional calculation units, the distribution of the values of the input feature map in the four 7*7 convolutional calculation units being consistent with the distribution of the values of the 11*11 convolution kernel in the four 7*7 convolutional calculation units;
wherein, in each cycle of the convolutional calculation, if the values of the input feature map to be loaded include elements of the leftmost first column of the input feature map, the 121 elements of size 11*11 in the input feature map are loaded at once into the corresponding positions of the convolutional calculation units and the values of all remaining positions are filled with 0; otherwise, the elements identical to those of the previous cycle are shifted left as a whole by one unit, and the 11 elements of the input feature map that differ from the previous cycle and need to be updated are loaded into the corresponding positions vacated by the shift.
9. The method according to claim 8, wherein step 4) comprises:
in each cycle of the convolutional calculation, controlling each of the four 7*7 convolutional calculation units to multiply the elements at corresponding positions of the input feature map and the convolution kernel loaded into it and to accumulate the results of the multiplications;
and step 5) comprises: accumulating the calculation results of all four 7*7 convolutional calculation units to obtain the element at the corresponding position of the output feature map.
10. A control unit for realizing the control method according to any one of claims 1-9.
11. A convolutional neural network processor, comprising: 7*7 convolutional calculation units and a control unit, the control unit being used to realize the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810685989.2A CN108875925A (en) | 2018-06-28 | 2018-06-28 | A kind of control method and device for convolutional neural networks processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108875925A true CN108875925A (en) | 2018-11-23 |
Family
ID=64295468
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170192638A1 (en) * | 2016-01-05 | 2017-07-06 | Sentient Technologies (Barbados) Limited | Machine learning based webinterface production and deployment system |
US20170337467A1 (en) * | 2016-05-18 | 2017-11-23 | Nec Laboratories America, Inc. | Security system using a convolutional neural network with pruned filters |
CN107818367A (en) * | 2017-10-30 | 2018-03-20 | 中国科学院计算技术研究所 | Processing system and processing method for neutral net |
CN107844826A (en) * | 2017-10-30 | 2018-03-27 | 中国科学院计算技术研究所 | Neural-network processing unit and the processing system comprising the processing unit |
CN107918794A (en) * | 2017-11-15 | 2018-04-17 | 中国科学院计算技术研究所 | Neural network processor based on computing array |
CN108205700A (en) * | 2016-12-20 | 2018-06-26 | 上海寒武纪信息科技有限公司 | Neural network computing device and method |
Non-Patent Citations (3)
Title |
---|
KOTA ANDO et al., "A Multithreaded CGRA for Convolutional Neural", Scientific Research *
LI DU et al., "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", IEEE Transactions on Circuits and Systems I: Regular Papers *
ZHANG Qiangqiang et al., "Acceleration of Convolutional Neural Networks with Large Image Inputs Based on Block Convolution", Sciencepaper Online (中国科技论文在线) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109711367A (en) * | 2018-12-29 | 2019-05-03 | 北京中科寒武纪科技有限公司 | Operation method, device and Related product |
CN113052291A (en) * | 2019-12-27 | 2021-06-29 | 上海商汤智能科技有限公司 | Data processing method and device |
CN113052291B (en) * | 2019-12-27 | 2024-04-16 | 上海商汤智能科技有限公司 | Data processing method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181123 |