CN107153873B - Binary convolutional neural network processor and method of using the same - Google Patents
- Publication number
- CN107153873B (application CN201710316252.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- value
- convolution
- neural networks
- convolutional neural
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The present invention provides a binary convolutional neural network processor, comprising: a to-be-computed data storage device for storing, in binary form, the elements of the data to be convolved and the elements of the convolution kernel; a binary convolution device for performing a binary convolution operation on the binary-form convolution kernel elements and the corresponding elements of the binary-form data to be convolved; a data scheduling device for loading the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device; a pooling device for pooling the results obtained by convolution; and a normalization device for normalizing the pooled results.
Description
Technical field
The present invention relates to the storage and scheduling of data in neural network model computation.
Background art
With the development of artificial intelligence, deep neural network technology, and convolutional neural networks in particular, has advanced rapidly in recent years and is now widely applied in fields such as image recognition, speech recognition, natural language understanding, weather forecasting, gene expression, content recommendation, and intelligent robotics.
A deep neural network can be understood as a computational model containing a large number of data nodes. Each data node is connected to other data nodes, and the connection between nodes is represented by a weight. As deep neural networks continue to develop, their complexity keeps increasing.
To balance complexity against computational performance, the reference Courbariaux M, Hubara I, Soudry D, et al. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1 [J]. arXiv preprint arXiv:1602.02830, 2016, proposes using a "binary convolutional neural network model" to reduce the complexity of conventional neural networks. In such a network, the weights, input data, and output data are represented in "binary form", that is, approximated by "1" and "-1": for example, values greater than or equal to 0 are represented by "1" and values less than 0 by "-1". This reduces the bit width of the data operated on in the neural network and thereby greatly reduces the required parameter capacity, which makes binary convolutional neural networks particularly suitable for implementing image recognition, augmented reality, and virtual reality on end devices.
In the prior art, deep neural networks are usually run on general-purpose computer processors such as central processing units (CPUs) and graphics processing units (GPUs). However, there is no dedicated processor for binary convolutional neural networks. The computing units of general-purpose processors are typically multiple bits wide, so using them to compute binary neural networks wastes resources.
Summary of the invention
Therefore, the object of the present invention is to overcome the above defects of the prior art by providing a binary convolutional neural network processor, comprising:
a to-be-computed data storage device for storing, in binary form, the elements of the data to be convolved and the elements of the convolution kernel;
a binary convolution device for performing a binary convolution operation on the binary-form convolution kernel elements and the corresponding elements of the binary-form data to be convolved;
a data scheduling device for loading the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device;
a pooling device for pooling the results obtained by convolution; and
a normalization device for normalizing the pooled results.
Preferably, in the binary convolutional neural network processor, the binary convolution device comprises:
an XNOR gate, which takes as its inputs the binary-form convolution kernel elements and the corresponding elements of the binary-form data to be convolved;
an accumulation device, which takes the output of the XNOR gate as its input and accumulates the XNOR outputs to produce the result of the binary convolution operation;
wherein the accumulation device comprises an OR gate and/or a Hamming weight computing unit, wherein
at least one input of the OR gate is the output of the XNOR gate;
at least one input of the Hamming weight computing unit is the output of the XNOR gate.
Preferably, in the binary convolutional neural network processor, the to-be-computed data storage device is further used to store, online, the convolution kernel and/or the data to be convolved obtained through binary conversion.
Preferably, the binary convolutional neural network processor further comprises:
a binarization device for converting the obtained convolution kernel and/or data to be convolved into binary form.
Preferably, in the binary convolutional neural network processor, a register is provided in the data scheduling device, into which the convolution kernel elements that need to be reused are loaded during use.
Preferably, in the binary convolutional neural network processor according to any one of the above, the elements of the data to be convolved and the convolution kernel elements are stored in the to-be-computed data storage device in a layer-interleaved manner.
Preferably, in the binary convolutional neural network processor, the elements of the data to be convolved are stored in the to-be-computed data storage device according to the size of the convolution kernel and the order in which the elements of the data to be convolved take part in the convolution operation.
Preferably, in the binary convolutional neural network processor, the elements of the data to be convolved and/or the convolution kernel elements are stored in the to-be-computed data storage device in a manner that satisfies one or more of the following:
they are stored according to the matrix arrangement order of the convolution kernel and the data to be convolved;
the elements at the same position but in different channels of the matrix of the convolution kernel and/or the data to be convolved are stored contiguously in consecutive storage units;
all elements of the same weight in the same convolution kernel, and/or all elements of the sub-matrix of the same data to be convolved that is used for one convolution operation, are stored in consecutive storage units of the storage device.
The present invention also provides a method of using the binary convolutional neural network processor according to any one of the above, comprising:
1) loading the data to be convolved from the to-be-computed data storage device into a register;
2) loading the data to be convolved in the register, together with the elements in the to-be-computed data storage device that need to be multiplied with it, into the binary convolution device to perform the binary convolution operation;
3) pooling the output of the binary convolution device by means of the pooling device;
4) normalizing the output of the pooling device by means of the normalization device.
The present invention further provides a computer-readable storage medium storing a computer program which, when executed, implements the above method.
Compared with the prior art, the advantages of the present invention are as follows:
It provides a simplified hardware structure for performing convolution operations, a binary convolutional neural network processor based on that structure, and a corresponding computation method. By reducing the bit width of the data operated on during computation, it improves computational efficiency and reduces storage capacity and energy consumption.
Brief description of the drawings
Embodiments of the present invention are further described below with reference to the drawings, in which:
Fig. 1 is a schematic diagram of the multilayer structure of a neural network;
Fig. 2 is a schematic diagram of a convolution computation in two-dimensional space;
Fig. 3 is a schematic diagram of the hardware structure of a binary convolution device according to one embodiment of the present invention;
Fig. 4 is a schematic diagram of the hardware structure of a binary convolution device according to another embodiment of the present invention;
Fig. 5 is a schematic diagram of the hardware structure of a binary convolution device according to yet another embodiment of the present invention;
Figs. 6a to 6c are schematic diagrams of the hardware structure of binary convolution devices of the present invention that use a Hamming weight computing element;
Fig. 7 is a schematic diagram of storing multi-channel convolution kernels, namely weight 0 and weight 1, and the data to be convolved according to one embodiment of the present invention;
Fig. 8 is a schematic diagram of the structure of a binary convolutional neural network processor according to one embodiment of the present invention;
Fig. 9 is a schematic diagram of computation using the binary convolutional neural network processor according to one embodiment of the present invention;
Fig. 10 is a schematic diagram of computation using the binary convolutional neural network processor according to yet another embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and specific embodiments.
In computer science, a neural network is a mathematical model that imitates the synaptic connection structure of biological nervous systems. Application systems built from neural networks can implement functions such as machine learning and pattern recognition.
A neural network is structurally divided into multiple layers; Fig. 1 shows a schematic diagram of such a multilayer structure. Referring to Fig. 1, the first layer is the input layer, the last layer is the output layer, and the remaining layers are hidden layers. When the neural network is used, an original image, that is, the image layer of the input layer, is fed to the input layer (in the present invention, "image" and "image layer" refer to the raw data to be processed, not merely an image in the narrow sense of a photograph). Each layer of the neural network processes the image layer it receives and passes the result to the next layer, and the output of the output layer is finally taken as the result.
As described above, to cope with the increasingly complex structure of neural networks, the prior art has proposed the concept of the binary convolutional neural network. As the name suggests, the computation of a binary convolutional neural network includes a "convolution" operation on the input data, as well as operations such as "pooling", "normalization", and "binarization".
Convolution is an important operation in binary convolutional neural networks. The computation process of "convolution" is described in detail below with reference to Fig. 2.
Fig. 2 shows the process of convolving a 5×5 "binary" image with a 3×3 "binary" convolution kernel in two-dimensional space. Referring to Fig. 2, first, for the elements in rows 1 to 3 (counted from top to bottom) and columns 1 to 3 (counted from left to right) of the image, each element is multiplied by the corresponding element of the convolution kernel. For example, the element in row 1, column 1 of the convolution kernel (denoted "convolution kernel (1,1)") multiplied by the element in row 1, column 1 of the image (denoted "image (1,1)") gives 1 × 1 = 1; convolution kernel (1,2) multiplied by image (1,2) gives 1 × 0 = 0; similarly, convolution kernel (1,3) multiplied by image (1,3) gives 1 × 1 = 1; and so on, yielding 9 products. Adding these 9 products gives 1+0+1+0+1+0+0+0+1=4, which is the element in row 1, column 1 of the convolution result, convolution result (1,1). Similarly, multiplying convolution kernel (1,1) by image (1,2), convolution kernel (1,2) by image (1,3), convolution kernel (1,3) by image (1,4), convolution kernel (2,1) by image (2,2), and so on, gives 1+0+0+1+0+0+0+1=3 as convolution result (1,2). Proceeding in this way yields the 3×3 convolution result matrix shown in Fig. 2.
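The sliding-window computation described above can be sketched in Python. Fig. 2 is not reproduced in this text, so the 5×5 image and 3×3 kernel below are assumed values, chosen only because they reproduce the sums 4 and 3 quoted above; `convolve2d_valid` is a hypothetical helper name, not something defined by the patent.

```python
# Assumed example values (Fig. 2 is not available here); they reproduce the
# quoted results 4 and 3 for the first two output elements.
IMAGE = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
KERNEL = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    result = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        result.append(row)
    return result


RESULT = convolve2d_valid(IMAGE, KERNEL)
```

With these assumed inputs the output is a 3×3 matrix whose first row starts with 4 and 3, matching the two sums worked out in the text.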
The convolution result shown in Fig. 2 is buffered and binarized before being input to the next layer of the binary convolutional neural network.
The above example shows that convolution consists of "multiply" and "add" (accumulate-and-sum) operations. The inventors realized that, owing to the particular properties of binary multiplication, the "multiply" in a binary convolution can be replaced by an XNOR operation; that is, a single XNOR logic gate can perform the "multiply" that in the prior art requires a multiplier. The binary convolution process is also simpler than conventional convolution: it never needs a multiplication as complex as "2 × 4". When "multiplying", if any of the operands is "0" the result is "0", and if all of the operands are "1" the result is "1".
The principle by which an XNOR gate can replace a multiplier in the present invention is described in detail below through a specific example.
When binarized convolution is actually used, the non-binary values z in the image and the convolution kernel are first binarized, that is:

$$\mathrm{Binarize}(z) = \begin{cases} +1, & z \ge 0 \\ -1, & z < 0 \end{cases}$$

Here, a value z greater than or equal to 0 is binarized to "1", which is represented by the symbol "1" used for the convolution in Fig. 2, and a value z less than 0 is binarized to "-1", which is represented by the symbol "0" used for the convolution in Fig. 2.
An XNOR operation is applied to the binarized values of the image and the convolution kernel, that is, $F = \overline{A \oplus B}$, which gives the following cases:
Input A | Input B | Output F | Symbol |
-1 | -1 | 1 | 1 |
-1 | 1 | -1 | 0 |
1 | -1 | -1 | 0 |
1 | 1 | 1 | 1 |
The truth table above shows that, for binarized values, the "multiply" operation can be performed by an XNOR logic gate in place of a multiplier. As is known in the art, a multiplier is far more complex than an XNOR gate.
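As an illustrative check of the truth table (not part of the patent text), the following Python sketch binarizes values by the rule given earlier, encodes +1/-1 as the symbols 1/0, and confirms that a single-bit XNOR gives the same result as multiplying the ±1 values. All function names are hypothetical.

```python
def binarize(z):
    """Map a real value to +1 (z >= 0) or -1 (z < 0)."""
    return 1 if z >= 0 else -1


def to_bit(value):
    """Encode +1 as the symbol 1 and -1 as the symbol 0."""
    return 1 if value == 1 else 0


def to_value(bit):
    """Decode the symbol 1 as +1 and the symbol 0 as -1."""
    return 1 if bit == 1 else -1


def xnor(a_bit, b_bit):
    """Single-bit XNOR: 1 when both bits are equal, otherwise 0."""
    return 1 - (a_bit ^ b_bit)


# Rebuild the truth table and check each row against ordinary multiplication.
TRUTH_TABLE = []
for a in (-1, 1):
    for b in (-1, 1):
        f_bit = xnor(to_bit(a), to_bit(b))
        assert to_value(f_bit) == a * b
        TRUTH_TABLE.append((a, b, to_value(f_bit), f_bit))
```

Each tuple in `TRUTH_TABLE` holds (input A, input B, output F, symbol), in the same row order as the table above.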
The inventors therefore consider that replacing the multipliers of a conventional processor with XNOR logic gates can greatly reduce the complexity of the devices used in a binary convolutional neural network processor.
In addition, the inventors realized that, owing to the particular properties of binary addition, the "add" in the above binary convolution can be replaced by an OR operation; that is, an OR logic gate can replace the adder used in the prior art. This is because the result of ORing the outputs of the above XNOR gates can be expressed as G = F1 + F2 + ... + Fn, where "+" denotes logical OR, the final output G is a single bit, Fk denotes the output of the k-th XNOR gate, and n denotes the total number of XNOR gates whose outputs are fed to the OR gate.
Based on the above analysis, the present invention provides a binary convolution device that can be used in a binary convolutional neural network processor. It exploits the properties of binary multiplication and addition to simplify the hardware that performs convolution in the processor, thereby increasing the speed of convolution and reducing the overall energy consumption of the processor.
Fig. 3 shows the hardware structure of a binary convolution device according to one embodiment of the present invention. As shown in Fig. 3, the device comprises 9 XNOR gates and 1 OR gate, and the outputs of all 9 XNOR gates serve as the inputs of the OR gate. During a convolution, the XNOR gates compute n1×w1, n2×w2, ..., n9×w9 respectively to obtain the outputs F1 to F9; the OR gate takes F1 to F9 as its inputs and outputs the first element G1 of the convolution result. Other regions of the image are computed in the same way with the same convolution kernel to obtain the other elements of the convolution result, which is not repeated here.
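A minimal software model of the Fig. 3 structure, assuming the inputs are already encoded as the bits 1 and 0: nine XNOR results are formed independently (in hardware, in parallel) and then combined by one wide OR. The window and kernel values are hypothetical.

```python
def xnor(a_bit, b_bit):
    """Single-bit XNOR: 1 when both bits are equal, otherwise 0."""
    return 1 - (a_bit ^ b_bit)


def binary_conv_parallel_or(n_bits, w_bits):
    """Model of Fig. 3: one XNOR per element pair, then a single OR over all outputs."""
    f = [xnor(n, w) for n, w in zip(n_bits, w_bits)]  # F1..F9
    g = 0
    for f_k in f:
        g |= f_k  # the OR gate output stays a single bit
    return g


# Hypothetical 3x3 window (n1..n9) and kernel (w1..w9), flattened row by row.
window = [1, 0, 1, 0, 1, 0, 0, 0, 1]
kernel = [1, 1, 1, 0, 0, 0, 1, 1, 1]
g1 = binary_conv_parallel_or(window, kernel)
```

Because the accumulation is an OR, `g1` is 1 as soon as any pair matches and 0 only when every pair differs.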
In the embodiment shown in Fig. 3, multiple XNOR gates perform the multiplications in parallel, which increases the convolution rate. It should be understood, however, that the hardware structure of the binary convolution device can also be varied in the present invention, as illustrated by several other embodiments below.
Fig. 4 shows the hardware structure of a binary convolution device according to another embodiment of the present invention. As shown in Fig. 4, the device comprises 1 XNOR gate, 1 OR gate, and a register. The register stores the output of the OR gate, and the stored value serves as one input of the OR gate; the other input of the OR gate is the output of the XNOR gate. During a convolution, n1 and w1, n2 and w2, ..., n9 and w9 are applied as the inputs of the XNOR gate at the first to ninth time steps respectively, and the XNOR gate correspondingly outputs F1, F2, ..., F9 as one input of the OR gate, while the OR result from the previous time step, held in the register, serves as the other input. For example, when the XNOR gate outputs F1 (equal to n1×w1), the pre-stored symbol "0" is read from the register and fed to the OR gate together with F1, and the OR gate outputs F1; when the XNOR gate outputs F2 (equal to n2×w2), F1 is read from the register and fed to the OR gate together with F2, and the OR gate outputs F1+F2; and so on, until the accumulated result G1 for F1 to F9 is output.
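The time-multiplexed scheme of Fig. 4 can be modelled as a small state machine. This is a sketch under the assumption that one element pair is processed per time step; the class and function names are hypothetical.

```python
class SerialBinaryConv:
    """Model of Fig. 4: one XNOR gate, one two-input OR gate, one single-bit register."""

    def __init__(self):
        self.register = 0  # the pre-stored symbol "0"

    def step(self, n_bit, w_bit):
        f = 1 - (n_bit ^ w_bit)         # XNOR output for this time step
        self.register = self.register | f  # OR with the previous result
        return self.register


def run_serial(n_bits, w_bits):
    """Feed the pairs one per time step and record the OR gate output each time."""
    unit = SerialBinaryConv()
    return [unit.step(n, w) for n, w in zip(n_bits, w_bits)]


# Hypothetical inputs: only the third pair matches.
outputs = run_serial([0, 1, 1, 0, 0, 0, 0, 0, 0],
                     [1, 0, 1, 1, 1, 1, 1, 1, 1])
```

The last entry of `outputs` plays the role of G1; once the register holds 1 it stays 1 for the remaining time steps.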
In the embodiment shown in Fig. 4, increasing the reuse of the XNOR gate and the OR gate reduces the number of components, and because this scheme uses an OR gate with only two inputs, the hardware complexity is lower.
Fig. 5 shows the hardware structure of a binary convolution device according to yet another embodiment of the present invention. This embodiment is similar to that of Fig. 4 in that it uses only one XNOR gate, one OR gate, and one register. The difference is that in Fig. 5 the outputs of the XNOR gate are stored in a register that can hold multiple bits at once, and every bit in the register serves as an input of the OR gate. The embodiment is used in a similar way to that of Fig. 4, with the XNOR gate being reused; the difference is that in Fig. 5 the result output by the XNOR gate at each time step is saved in the multi-bit register, and after all of F1 to F9 have been obtained, the OR gate performs the OR operation to output G1.
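For comparison with the Fig. 4 model, the Fig. 5 variant can be sketched as follows: a single XNOR gate is reused over nine time steps, its outputs are collected in a multi-bit register, and the OR is taken once at the end. The function name and the list used as the register are illustrative assumptions.

```python
def binary_conv_buffered_or(n_bits, w_bits):
    """Model of Fig. 5: buffer every XNOR output, then OR the whole register once."""
    register = []  # multi-bit register holding F1..F9
    for n, w in zip(n_bits, w_bits):
        register.append(1 - (n ^ w))  # one reused XNOR gate, one result per time step
    g = 0
    for bit in register:
        g |= bit  # single OR over all buffered bits
    return register, g


buffered, g1 = binary_conv_buffered_or([0, 1, 1, 0, 0, 0, 0, 0, 0],
                                       [1, 0, 1, 1, 1, 1, 1, 1, 1])
```

The final value of `g1` is the same as in the Fig. 4 scheme; only the point at which the OR is evaluated differs.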
In the embodiments of Figs. 3, 4, and 5, an OR gate is used to implement "add" or "accumulate", and because the inputs of the OR gate all come from the outputs of XNOR gates, the result finally output by the OR gate is a single-bit value. This simplifies the computation and increases its speed. The hardware structure of this scheme is particularly suitable for a dedicated binary neural network processor: a binary neural network represents its weights and data with the values "1" and "-1", and neural network computation involves a large number of multiplications and additions, so reducing the bit width of the operands effectively reduces computational complexity.
However, because the OR-based implementation of "add" or "accumulate" computes on a single bit, it introduces a certain amount of error. The present invention therefore also provides an alternative scheme in which a Hamming weight computing element replaces the OR gate of Figs. 3, 4, and 5 to implement "add" or "accumulate". Figs. 6a to 6c show hardware structures that use a Hamming weight computing element. In this alternative, the Hamming weight computing element takes the outputs of the XNOR gates as its input and outputs the number of logic "1"s in that data, that is, its Hamming weight. Like the OR-based scheme, this scheme simplifies the computation, and it additionally achieves an exact summation.
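The difference between the two accumulators can be illustrated with a short Python sketch. The conversion `2 * count - n` from a Hamming weight to the signed sum of ±1 products is a standard identity added here for illustration; it is not stated in the source, and the helper names and input bits are hypothetical.

```python
def xnor_bits(n_bits, w_bits):
    """Element-wise XNOR of two equal-length bit lists."""
    return [1 - (n ^ w) for n, w in zip(n_bits, w_bits)]


def hamming_weight(bits):
    """Number of logic 1s in the input."""
    return sum(bits)


def signed_sum_from_count(count, n):
    """Each 1 stands for a +1 product and each 0 for -1, so the sum is 2*count - n."""
    return 2 * count - n


n_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]
w_bits = [1, 1, 0, 1, 0, 1, 1, 0, 0]
f = xnor_bits(n_bits, w_bits)
count = hamming_weight(f)                    # exact number of matching pairs
signed = signed_sum_from_count(count, len(f))  # equivalent +1/-1 dot product
or_result = 1 if any(f) else 0               # what the OR-based scheme would output
```

Here the Hamming weight retains how many pairs matched, whereas the OR result only records that at least one did.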
The inventors found that, with the binary convolution device provided by the present invention, each "multiply" and "accumulate" operates on single-bit data, and the output of the binary convolution device is also single-bit data. This property makes it particularly suitable to store and schedule the data involved in and produced by the convolution using a "layer-interleaved data mapping", which reduces the number of data loads and makes full use of data locality to increase data reuse.
In the present invention, "layer-interleaved data mapping" means that the elements of the convolution kernel and of the data to be convolved are stored into each row of the storage device in order along the channel direction; that is, data are stored in the storage device interleaved by layer, so that two adjacent data elements come from different channels rather than the same channel. As shown in Fig. 7, in the present invention the elements of the convolution kernel and of the data to be convolved that lie at the same position on the z axis correspond to the same "channel"; that is, elements with the same z value belong to the same channel.
To describe this data arrangement more concretely, Fig. 7 takes as an example convolution kernels weight 0 and weight 1 of size (x, y, z) = 2×2×2 and data to be convolved of size (x, y, z) = 2×3×2, and illustrates the layer-interleaved data mapping provided by the present invention for binary convolutional neural networks. Referring to Fig. 7, the elements of weight 0 and weight 1 are each divided into four groups according to their spatial position: the four groups of weight 0 are Az, Bz, Cz, and Dz, where z is 0 or 1 as shown in the figure; the four groups of weight 1 are az, bz, cz, and dz, where z is 0 or 1 as shown in the figure.
Referring to Fig. 7, according to one embodiment of the present invention, the elements of convolution kernel weight 0, convolution kernel weight 1, and the data to be convolved may be stored as follows.
In Fig. 7, for ease of explanation, the elements of the three-dimensional matrices of weight 0 and weight 1 are divided, according to the size and stride of each convolution kernel, into two two-dimensional matrices by channel; for example, weight 0 is divided into the two-dimensional matrix formed by A0, B0, C0, D0 and the two-dimensional matrix formed by A1, B1, C1, D1. Similarly, the elements of the three-dimensional matrix of the data to be convolved are divided by channel into two two-dimensional matrices: the one formed by X0, Y0, Z0, P0, Q0, R0 and the one formed by X1, Y1, Z1, P1, Q1, R1.
When convolution kernel weight 0 is stored, its elements A0, A1, B0, B1, C0, C1, D0, and D1 are stored in order in one row of consecutive storage units of the weight storage device, 8 bits in total. In the storage units, any two adjacent elements come from different channels: for example, A0 and A1 come from different channels, and so do A1 and B0. This is the layer-interleaved storage described above.
When convolution kernel weight 1 is stored, its elements a0, a1, b0, b1, c0, c1, d0, and d1 are stored in order in another row of consecutive storage units of the weight storage device, 8 bits in total. As with weight 0, any two adjacent elements come from different channels.
In the weight storage device, weight elements with the same x and y coordinates (for example A0 and A1) are stored one after another as adjacent elements; once the elements at one (x, y) position have been stored, the next group of weight elements sharing an (x, y) position (for example B0 and B1) is stored, and so on until all the remaining weight elements of the convolution kernel have been stored.
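The weight ordering described above can be reproduced with a few lines of Python. This is a sketch in which element labels stand in for the stored bits, and in which A and B are assumed to occupy the top row of the kernel and C and D the bottom row (Fig. 7 is not reproduced here; the assumption follows the order in which the text pairs kernel and data elements). `interleave_channels` is a hypothetical helper.

```python
def interleave_channels(tensor):
    """Flatten tensor[z][row][col] so all channels of one (row, col) position are adjacent."""
    channels = len(tensor)
    rows = len(tensor[0])
    cols = len(tensor[0][0])
    return [tensor[z][r][c]
            for r in range(rows)
            for c in range(cols)
            for z in range(channels)]


# Weight 0 as two 2x2 channel slices (z = 0 and z = 1), using labels instead of bits.
WEIGHT_0 = [
    [["A0", "B0"], ["C0", "D0"]],
    [["A1", "B1"], ["C1", "D1"]],
]
WEIGHT_0_ROW = interleave_channels(WEIGHT_0)
```

The resulting row lists A0, A1, B0, B1, C0, C1, D0, D1, so neighbouring entries always come from different channels. The same function works unchanged for more than two channels.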
When the data to be convolved are stored, the data elements can be stored according to the size of the convolution kernel and the order in which they take part in the convolution. From the convolution rule illustrated in Fig. 2, the products Az·Xz, Bz·Yz, Cz·Pz, and Dz·Qz are computed first, followed by Az·Yz, Bz·Zz, Cz·Qz, and Dz·Rz. Therefore, when storing the elements of the data to be convolved, the convolution rule should be taken into account in addition to layer interleaving, so that data elements that take part in the computation one after another are stored together: for example, Xz, Yz, Pz, Qz are stored in one row or column of consecutive storage units, and Yz, Zz, Qz, Rz are stored in another row or column of consecutive storage units.
Referring to Fig. 7, X0, X1, Y0, Y1, P0, P1, Q0, Q1 are stored in order in one row of consecutive storage units of the data storage device, and Y0, Y1, Z0, Z1, Q0, Q1, R0, R1 are stored in order in another row of consecutive storage units.
As with the convolution kernel elements, data elements in the data storage device with the same x and y coordinates (for example X0 and X1) form a group and are stored one after another as adjacent elements; once the elements at one (x, y) position have been stored, the next group of data elements sharing an (x, y) position (for example Y0 and Y1) is stored, and so on until all the other data elements of the sub-matrix of the data to be convolved that matches the convolution kernel in size (for example, the one marked with a dashed line in Fig. 7) have been stored.
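The two data rows listed above can be generated by combining window extraction with channel interleaving. The sketch below assumes a stride of 1 and a data layout of X, Y, Z in the top row and P, Q, R in the bottom row, which is inferred from the pairing order in the text rather than taken from Fig. 7; `window_rows` is a hypothetical helper and labels stand in for the stored bits.

```python
def window_rows(data, k_rows, k_cols):
    """For each stride-1 window, list its elements position by position, channels adjacent."""
    channels = len(data)
    rows = len(data[0])
    cols = len(data[0][0])
    stored = []
    for top in range(rows - k_rows + 1):
        for left in range(cols - k_cols + 1):
            stored.append([data[z][top + r][left + c]
                           for r in range(k_rows)
                           for c in range(k_cols)
                           for z in range(channels)])
    return stored


# Data to be convolved as two 2x3 channel slices (z = 0 and z = 1).
DATA = [
    [["X0", "Y0", "Z0"], ["P0", "Q0", "R0"]],
    [["X1", "Y1", "Z1"], ["P1", "Q1", "R1"]],
]
STORED_ROWS = window_rows(DATA, 2, 2)
```

Each entry of `STORED_ROWS` corresponds to one row of the data storage device and lines up element for element with the interleaved weight row A0, A1, B0, B1, C0, C1, D0, D1. Note that elements shared by overlapping windows, such as Y0 and Q0, are stored more than once in this layout.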
Although the convolution kernels and the data to be convolved in the example of Fig. 7 have 2 channels, it should be understood that in the present invention convolution kernels and data to be convolved with more than 2 channels can also be stored in the layer-interleaved manner.
Preferably, when storing, consecutive storage units in the storage device are filled in order, that is, the elements are stored in the storage device according to the matrix arrangement order of the convolution kernel and the data to be convolved.
Preferably, the elements at the same position but in different channels of the matrix of the convolution kernel and/or the data to be convolved are stored contiguously in consecutive storage units of the storage device.
Preferably, all elements of the same weight in the same convolution kernel, and/or all elements of the sub-matrix of the same data to be convolved that is used for one convolution operation, are stored in consecutive storage units of the storage device.
In Fig. 7, for ease of explanation, the weight storage device and the data storage device are shown as separate storage devices. It should be understood, however, that in the present invention the weight storage device and the data storage device may be provided on different memories, or may be located in different regions of the same memory, for example both in the to-be-computed data storage device.
Those skilled in the art will also appreciate that the storage arrangement described in the above embodiments can be prepared offline, outside the processor, before the binary neural network computation, or online on the processor, for example on the processor's on-chip circuitry; alternatively, it can be stored in the form of a computer program that is executed by the processor.
Storing each element of each convolution kernel and of the data to be convolved using the layer-interleaved
data mapping described above according to the present invention reduces the number of data loads and
improves data reuse.
It should also be understood that the purpose of using the above "layer-interleaved data mapping" to store
the convolution kernel elements and the corresponding elements of the data to be convolved is to make
reading convenient, so that the inputs of the binary convolution device can be determined quickly and
easily. Therefore, any manner that establishes a mapping between the storage locations of the convolution
kernel elements and the storage locations of the corresponding elements of the data to be convolved can be
used to store the convolution kernel elements and the elements of the data to be convolved.
For example, when the length of a run of consecutive storage units is less than 8 bits, e.g. only 4 bits,
the elements A0, A1, B0, B1, C0, C1, D0 and D1 of weight 0 are stored in folded form, i.e. A0, A1, B0 and B1
are stored in one row of consecutive storage units, and C0, C1, D0 and D1 are stored in another row of
consecutive storage units.
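The folded storage in this example can be sketched as follows. The helpers `fold` and `locate` are hypothetical names introduced only for illustration; the sketch assumes that folding simply cuts the interleaved sequence into rows of the available length.

```python
def fold(elements, row_len):
    """Cut a flat element sequence into rows of at most row_len units."""
    return [elements[i:i + row_len] for i in range(0, len(elements), row_len)]

def locate(index, row_len):
    """Map a position in the unfolded sequence to (row, offset) after folding."""
    return divmod(index, row_len)

weight0 = ["A0", "A1", "B0", "B1", "C0", "C1", "D0", "D1"]
folded = fold(weight0, 4)
print(folded)          # two rows of four elements each
print(locate(5, 4))    # where element C1 (index 5) ends up
```

With a 4-bit row, C1 lands in row 1 at offset 1, which is the kind of address mapping the data scheduling has to account for when the storage unit is narrower than the matrix.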
When a convolution operation is performed on convolution kernel elements and the corresponding elements of
the data to be convolved that are stored in the above manner, it is suitable to execute it in
single-instruction multiple-data (SIMD) fashion, i.e. multiple stored data items are loaded into the
arithmetic unit by a single instruction. The method of loading and computing on the stored data is
described in detail in subsequent embodiments. In this way, the bit width of the computing unit can be
reduced, which reduces the hardware overhead of the computing unit.
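As a software analogy only, the following sketch shows how several 1-bit elements packed in one storage word can be fetched together by a single operation. The 8-bit word, the 2-bit field width and the helper names `pack_bits` and `load_field` are assumptions made for illustration and do not describe the actual instruction set of the processor.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into an integer, first element in the high bit."""
    word = 0
    for b in bits:
        word = (word << 1) | (b & 1)
    return word

def load_field(word, word_len, offset, width):
    """Return `width` adjacent bits that start `offset` bits from the high end."""
    shift = word_len - offset - width
    return (word >> shift) & ((1 << width) - 1)

word = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])   # one hypothetical 8-bit storage word
first_pair = load_field(word, 8, 0, 2)        # the two high-order elements
second_pair = load_field(word, 8, 2, 2)       # the next two elements
print(bin(word), bin(first_pair), bin(second_pair))
```

Each call returns two elements at once, so a 2-bit arithmetic unit can be fed with one fetch per step instead of one fetch per element.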
By combining the binary convolution device mentioned above with the manner of storing and retrieving the
elements of the convolution kernel and of the data to be convolved, a dedicated processor for binary
convolutional neural networks can be provided whose computing units have a small bit width and whose
hardware structure is relatively simple.
With reference to Fig. 8, according to one embodiment of the present invention, a binary convolutional
neural network processor 10 is provided, comprising:
a data scheduling device 101, a to-be-computed data storage device 102, a binary convolution device 103, a
pooling device 104, a normalization device 105, and a binarization device 106.
The to-be-computed data storage device 102 is used to store convolution kernel elements in binary form and
data to be convolved in binary form. As described above, the storage manner should reflect the mapping
between the elements of the convolution kernel used for the convolution computation and the corresponding
elements of the data to be convolved. For example, the convolution kernel elements and the data to be
convolved are stored in the layer-interleaved manner, and the data to be convolved is stored according to
the size of the convolution kernel and the order in which its elements take part in the convolution
operation. For the specific storage manner, refer to the preceding embodiments.
The data scheduling device 101 is used to load, according to the mapping, the convolution kernel elements
and the corresponding elements of the data to be convolved into the binary convolution device. For example,
a register is provided in the data scheduling device 101, and the elements that need to be reused are
loaded into the register during use.
The binary convolution device 103 is used to perform a binary convolution operation on the convolution
kernel elements in binary form and the corresponding elements of the data to be convolved in binary form.
The binary convolution device 103 may adopt any of the structures of the preceding embodiments: XNOR gates
implement the "multiplication" of a convolution kernel element by the corresponding element of the data to
be convolved, and OR gates or a Hamming weight computing unit implement the "accumulation" of the results
of the multiplication.
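The XNOR-plus-Hamming-weight variant can be sketched in a few lines. The sketch assumes the common encoding in which bit 1 stands for +1 and bit 0 stands for -1, which is an assumption of this example; the conversion `2 * matches - width` follows from that encoding and is not stated in the text. The OR-gate variant is not modelled here.

```python
def xnor(a, b, width):
    """Bitwise XNOR of two packed words, limited to `width` bits."""
    return ~(a ^ b) & ((1 << width) - 1)

def hamming_weight(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def binary_dot(a, b, width):
    """Dot product of two +1/-1 vectors packed as bits (1 -> +1, 0 -> -1)."""
    matches = hamming_weight(xnor(a, b, width))   # positions whose product is +1
    return 2 * matches - width                    # (+1 * matches) + (-1 * mismatches)

kernel = 0b1011   # hypothetical packed kernel elements: +1, -1, +1, +1
data = 0b1101     # hypothetical packed data elements:   +1, +1, -1, +1
print(binary_dot(kernel, data, 4))
```

For these inputs the element-wise products are +1, -1, -1, +1, so the sum is 0, which matches the value returned by `binary_dot`.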
The pooling device 104 is used to pool the results obtained by the convolution.
The normalization device 105 is used to normalize the pooled results, so as to accelerate the parameter
training process of the neural network.
In some embodiments of the present invention, the convolution kernel and/or the data to be convolved used
for the binary convolution operation may be obtained online from a data source. Since the data obtained is
not necessarily binarized, in such embodiments a binarization device 106 may also be provided in the binary
convolutional neural network processor 10 to convert the obtained data into binary form. The data that has
undergone binary conversion may also be stored online by the to-be-computed data storage device 102.
It should be understood that, for embodiments in which the convolution kernel and/or the data to be
convolved are stored offline in the to-be-computed data storage device 102 before the convolutional neural
network computation is carried out, the binarization device 106 need not be provided in the binary
convolutional neural network processor 10.
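The roles of the pooling device 104, the normalization device 105 and the binarization device 106 can be illustrated with a minimal sketch. The text does not fix the pooling type, the normalization formula or the binarization threshold, so 2x2 max pooling, the usual batch-normalization formula and a sign threshold at 0 are all assumptions of this example, as are the parameter names `mean`, `var`, `gamma`, `beta` and `eps`.

```python
import math

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 over a 2-D list (assumed pooling type)."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch-normalization formula (assumed, not given in the text)."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta

def binarize(x):
    """Map a real value to one bit: 1 for non-negative, 0 otherwise (assumed)."""
    return 1 if x >= 0 else 0

# Hypothetical accumulator outputs forming a 4x4 output layer.
fmap = [[2, -4, 0, 2],
        [0, 2, -2, -4],
        [-2, 0, 4, 2],
        [-4, -2, 0, 2]]
pooled = max_pool_2x2(fmap)
bits = [[binarize(batch_norm(v, mean=1.0, var=4.0)) for v in row] for row in pooled]
print(pooled, bits)
```

The binarized output has the same 1-bit form as the stored data to be convolved, which is why it can be written back to the to-be-computed data storage device for the next layer.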
Below, with reference to Fig. 9 and Fig. 10, the process of computing with the binary convolutional neural
network processor 10 shown in Fig. 8 is described in detail by way of specific embodiments.
Fig. 9 shows, according to one embodiment of the present invention, the process of computing with the above
binary convolutional neural network processor. Fig. 9 uses the same symbols as Fig. 7 to denote the
convolution kernel elements and the elements of the data to be convolved, for example X0, X1, A0, A1, etc.
In the weight storage matrix, each row is one storage word that holds all the convolution kernel elements
of one row located in the same channel; as shown in the figure, the storage word is 8 bits wide and each
element occupies 1 bit. Similarly, a storage word in the matrix of the data to be convolved is also 8 bits
wide. In addition, in Fig. 9, the bit width of the XNOR gates and of the register group is 2 bits.
The computation follows the principle that data belonging to the same convolution kernel is accumulated in the same accumulator. The computation proceeds as follows:
Step 1: load the two high-order bits of the data to be convolved (i.e. X0 and X1) into the register group;
As can be understood from the convolution principle shown in Fig. 2, the elements X0 and X1 of the data to
be convolved in Fig. 9 will be used repeatedly to compute A0X0, B0X0, A1X1 and B1X1 in subsequent steps, so
these 2 bits of the data to be convolved need to be held in the register.
Step 2: load the data to be convolved held in the register group and the first two bits of weight data
(A0 and A1) of the first row of the weight matrix into the XNOR gates;
Step 3: have the addition unit perform an OR operation on the results of the XNOR gates, or compute their
Hamming weight;
As described above, the OR operation or the Hamming weight computation achieves the effect of "addition";
in this step, A0X0 and A1X1 are computed.
Step 4: input the result of the addition unit into accumulator 0;
Accumulator 0 accumulates the data belonging to the same convolution kernel.
Step 5: load the data to be convolved held in the register group and the first two bits of weight data
(a0 and a1) of the second row of the weight matrix into the XNOR gates;
Step 6: the addition unit performs an OR operation on the results of the XNOR gates, or computes their
Hamming weight, which yields a0X0 and a1X1;
Step 7: input the result of the addition unit into accumulator 1, and so on, so that X0 and X1 are computed
in turn with the first two weight bits of each of the eight rows of the weight storage array;
Step 8: similarly to the preceding steps, load the 3rd and 4th bits of the data to be convolved (Y0 and Y1)
into the register group;
Step 9: load the data to be convolved held in the register group and the 3rd and 4th bits of weight data
(B0 and B1) of the first row of the weight matrix into the XNOR gates;
Step 10: have the addition unit perform an OR operation on the results of the XNOR gates, or compute their
Hamming weight;
Step 11: input the result of the addition unit into accumulator 1; thereafter, similarly to steps 5 to 7,
b0, b1 and the other data in the same columns are computed in turn with Y0 and Y1;
Step 12: when an accumulator has obtained the data of the output layer, load the accumulator result into
the buffer unit;
Step 13: after the buffer unit has obtained the complete data of the output layer, load the output data
into the pooling unit for the pooling operation;
Step 14: load the result of the pooling operation into the batch normalization unit for the batch
normalization operation;
Step 15: load the result of the batch normalization into the binarization unit for the binarization
operation.
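The load-and-accumulate part of the steps above (steps 1 to 11, continued across the whole storage word) can be sketched as a software model. It assumes 8-bit storage words, a 2-bit register and XNOR width as described for Fig. 9, Hamming-weight accumulation, and one accumulator per weight row in keeping with the stated principle that data of the same convolution kernel is accumulated in the same accumulator. The names `field`, `hamming_weight` and `accumulate` and the sample words are hypothetical.

```python
def field(word, word_len, offset, width):
    """Extract `width` bits starting `offset` bits from the high end of a word."""
    return (word >> (word_len - offset - width)) & ((1 << width) - 1)

def hamming_weight(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def accumulate(data_word, weight_rows, word_len=8, width=2):
    """Model of the register / XNOR / accumulator loop.

    Each 2-bit slice of the data word is loaded into the register once and
    then reused against the matching slice of every weight row.
    """
    accumulators = [0] * len(weight_rows)       # one accumulator per kernel row
    mask = (1 << width) - 1
    for offset in range(0, word_len, width):
        register = field(data_word, word_len, offset, width)    # steps 1 and 8
        for k, row in enumerate(weight_rows):
            weights = field(row, word_len, offset, width)       # steps 2, 5, 9
            matches = hamming_weight(~(register ^ weights) & mask)  # steps 3, 6, 10
            accumulators[k] += matches                          # steps 4, 7, 11
    return accumulators

data_word = 0b11001010                              # hypothetical data to be convolved
weight_rows = [0b11001010, 0b00110101, 0b11000000]  # hypothetical kernel rows
result = accumulate(data_word, weight_rows)
print(result)
```

The accumulators here hold the number of matching bit positions per kernel row. Each data slice is fetched once and reused for every row, which is the reuse that holding X0 and X1 in the register group is meant to achieve.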
It can be seen that, with the storage manner described above, the mapping that exists between the storage
locations of the convolution kernel elements and the storage locations of the corresponding elements of
the data to be convolved makes it possible to quickly determine the corresponding elements that need to be
convolved and to input them into the XNOR gates.
When storage unit bit wide is less than the matrix bit wide shown in Fig. 9, the matrix can also be rolled over using piecemeal
Folded mode stores convolution nuclear element and treats convolved data element, as shown in Figure 10.Similarly, Figure 10 is also used and Fig. 7
In identical symbol state convolution nuclear element and treat convolved data element, difference lies in when needing to read to belong to treat with Fig. 9
It also needs to consider the position that the data are stored in the register bank during the same data of convolved data data in the block.
As the embodiments of the present invention show, the present invention exploits the characteristics of
binarized computation to provide a simplified hardware structure for performing convolution operations, a
binary convolutional neural network processor based on that structure, and a corresponding computation
method. By reducing the bit width of the data handled during computation, it improves computational
efficiency and reduces storage capacity and energy consumption.
Also, the present invention stores and computes on data using the layer-interleaved data mapping, which
simplifies the retrieval of the data to be convolved and of the convolution kernel data during convolution
computation, reduces hardware overhead and improves data utilization.
It should be noted that not every step described in the above embodiments is necessary; those skilled in
the art may make appropriate selections, substitutions, modifications and the like according to actual
needs.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the
present invention and do not limit it. Although the present invention has been described in detail above
with reference to the embodiments, those of ordinary skill in the art will understand that modifications
or equivalent substitutions of the technical solution of the present invention that do not depart from its
spirit and scope should all be covered by the claims of the present invention.
Claims (10)
1. A binary convolutional neural network processor, comprising:
a to-be-computed data storage device, for storing elements of data to be convolved in binary form and
convolution kernel elements in binary form;
a binary convolution device, for performing a binary convolution operation on the convolution kernel
elements in binary form and the corresponding elements of the data to be convolved in binary form;
a data scheduling device, for loading the convolution kernel elements and the corresponding elements of
the data to be convolved into the binary convolution device;
a pooling device, for pooling the results obtained by the convolution; and
a normalization device, for normalizing the pooled results.
2. The binary convolutional neural network processor according to claim 1, wherein the binary convolution
device comprises:
an XNOR gate, taking as its inputs a convolution kernel element in binary form and the corresponding
element of the data to be convolved in binary form;
an accumulation device, taking the output of the XNOR gate as its input, for accumulating the output of the
XNOR gate so as to output the result of the binary convolution operation;
wherein the accumulation device comprises an OR gate and/or a Hamming weight computing unit, wherein
at least one input of the OR gate is the output of the XNOR gate;
at least one input of the Hamming weight computing unit is the output of the XNOR gate.
3. The binary convolutional neural network processor according to claim 1, wherein the to-be-computed data
storage device is further used to store online the obtained convolution kernel and/or data to be convolved
that has undergone binary conversion.
4. The binary convolutional neural network processor according to claim 3, further comprising:
a binarization device, for converting the obtained convolution kernel and/or data to be convolved into
binary form.
5. The binary convolutional neural network processor according to claim 1, wherein a register is provided
in the data scheduling device, for loading, during use, the convolution kernel elements that need to be
reused.
6. The binary convolutional neural network processor according to any one of claims 1-5, wherein the
elements of the data to be convolved and the convolution kernel elements in the to-be-computed data storage
device are stored in a layer-interleaved manner.
7. The binary convolutional neural network processor according to claim 6, wherein the elements of the data
to be convolved in the to-be-computed data storage device are stored according to the size of the
convolution kernel and the order in which the elements of the data to be convolved take part in the
convolution operation.
8. The binary convolutional neural network processor according to claim 7, wherein the storage manner of
the elements of the data to be convolved and/or of the convolution kernel elements in the to-be-computed
data storage device satisfies one or more of the following:
they are stored following the matrix arrangement order of the convolution kernel and of the data to be
convolved;
the elements located at the same position in different channels of the matrix of the convolution kernel
and/or of the data to be convolved are stored consecutively in consecutive storage units;
all elements under the same weight in the same convolution kernel, and/or all elements in the submatrix of
the same data to be convolved that is used for a convolution operation, are stored in consecutive storage
units of the storage device.
9. A method of using the binary convolutional neural network processor according to any one of claims
1-8, comprising:
1) loading data to be convolved from the to-be-computed data storage device into a register;
2) loading the data to be convolved held in the register, and the elements in the to-be-computed data
storage device that need to be multiplied with the data to be convolved, into the binary convolution
device, so as to perform the binary convolution operation;
3) pooling the output of the binary convolution device by the pooling device;
4) normalizing the output of the pooling device by the normalization device.
10. A computer-readable storage medium in which a computer program is stored, the computer program, when
executed, being used to implement the method according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710316252.9A CN107153873B (en) | 2017-05-08 | 2017-05-08 | A kind of two-value convolutional neural networks processor and its application method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107153873A CN107153873A (en) | 2017-09-12 |
CN107153873B true CN107153873B (en) | 2018-06-01 |
Family
ID=59794343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710316252.9A Active CN107153873B (en) | 2017-05-08 | 2017-05-08 | A kind of two-value convolutional neural networks processor and its application method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107153873B (en) |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10839286B2 (en) * | 2017-09-14 | 2020-11-17 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
CN107657312B (en) * | 2017-09-18 | 2021-06-11 | 东南大学 | Binary network implementation system for speech common word recognition |
CN108205704B (en) * | 2017-09-27 | 2021-10-29 | 深圳市商汤科技有限公司 | Neural network chip |
CN109754061B (en) * | 2017-11-07 | 2023-11-24 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN107977704B (en) | 2017-11-10 | 2020-07-31 | 中国科学院计算技术研究所 | Weight data storage method and neural network processor based on same |
CN107967132B (en) * | 2017-11-27 | 2020-07-31 | 中国科学院计算技术研究所 | Adder and multiplier for neural network processor |
KR20190066473A (en) * | 2017-12-05 | 2019-06-13 | 삼성전자주식회사 | Method and apparatus for processing convolution operation in neural network |
CN108108811B (en) * | 2017-12-18 | 2021-07-30 | 南京地平线机器人技术有限公司 | Convolution calculation method in neural network and electronic device |
CN109978148B (en) * | 2017-12-28 | 2020-06-23 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN109993286B (en) * | 2017-12-29 | 2021-05-11 | 深圳云天励飞技术有限公司 | Sparse neural network computing method and related product |
CN110110283A (en) * | 2018-02-01 | 2019-08-09 | 北京中科晶上科技股份有限公司 | A kind of convolutional calculation method |
CN108829610B (en) * | 2018-04-02 | 2020-08-04 | 浙江大华技术股份有限公司 | Memory management method and device in neural network forward computing process |
CN108647777A (en) * | 2018-05-08 | 2018-10-12 | 济南浪潮高新科技投资发展有限公司 | A kind of data mapped system and method for realizing that parallel-convolution calculates |
CN110147873B (en) * | 2018-05-18 | 2020-02-18 | 中科寒武纪科技股份有限公司 | Convolutional neural network processor and training method |
CN108681773B (en) * | 2018-05-23 | 2020-01-10 | 腾讯科技(深圳)有限公司 | Data operation acceleration method, device, terminal and readable storage medium |
US11599785B2 (en) | 2018-11-13 | 2023-03-07 | International Business Machines Corporation | Inference focus for offline training of SRAM inference engine in binary neural network |
CN110059805B (en) * | 2019-04-15 | 2021-08-31 | 广州异构智能科技有限公司 | Method for a binary array tensor processor |
CN110033086B (en) * | 2019-04-15 | 2022-03-22 | 广州异构智能科技有限公司 | Hardware accelerator for neural network convolution operations |
CN110033085B (en) * | 2019-04-15 | 2021-08-31 | 广州异构智能科技有限公司 | Tensor processor |
CN110046705B (en) * | 2019-04-15 | 2022-03-22 | 广州异构智能科技有限公司 | Apparatus for convolutional neural network |
CN110263809B (en) * | 2019-05-16 | 2022-12-16 | 华南理工大学 | Pooling feature map processing method, target detection method, system, device and medium |
CN110265002B (en) * | 2019-06-04 | 2021-07-23 | 北京清微智能科技有限公司 | Speech recognition method, speech recognition device, computer equipment and computer readable storage medium |
CN111126579B (en) * | 2019-11-05 | 2023-06-27 | 复旦大学 | In-memory computing device suitable for binary convolutional neural network computation |
CN111340208B (en) * | 2020-03-04 | 2023-05-23 | 开放智能机器(上海)有限公司 | Vectorization calculation depth convolution calculation method and device |
CN112596912B (en) * | 2020-12-29 | 2023-03-28 | 清华大学 | Acceleration operation method and device for convolution calculation of binary or ternary neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005774A (en) * | 2015-07-28 | 2015-10-28 | 中国科学院自动化研究所 | Face relative relation recognition method based on convolutional neural network and device thereof |
CN105354568A (en) * | 2015-08-24 | 2016-02-24 | 西安电子科技大学 | Convolutional neural network based vehicle logo identification method |
CN105975931A (en) * | 2016-05-04 | 2016-09-28 | 浙江大学 | Convolutional neural network face recognition method based on multi-scale pooling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107153873B (en) | A kind of two-value convolutional neural networks processor and its application method | |
CN107203808B (en) | A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor | |
JP7065877B2 (en) | Chip equipment and related products | |
CN208061184U (en) | Vector processor unit | |
CN105930902B (en) | A kind of processing method of neural network, system | |
CN107578095B (en) | Neural computing device and processor comprising the computing device | |
EP3407266B1 (en) | Artificial neural network calculating device and method for sparse connection | |
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN108009106A (en) | Neural computing module | |
CN106951395A (en) | Towards the parallel convolution operations method and device of compression convolutional neural networks | |
CN106875011A (en) | The hardware structure and its calculation process of two-value weight convolutional neural networks accelerator | |
CN107918794A (en) | Neural network processor based on computing array | |
CN107301456A (en) | Deep neural network multinuclear based on vector processor speeds up to method | |
CN107704916A (en) | A kind of hardware accelerator and method that RNN neutral nets are realized based on FPGA | |
CN108108811A (en) | Convolutional calculation method and electronic equipment in neutral net | |
CN107862374A (en) | Processing with Neural Network system and processing method based on streamline | |
WO2017163208A1 (en) | In memory matrix multiplication and its usage in neural networks | |
CN106951962A (en) | Compound operation unit, method and electronic equipment for neutral net | |
CN112084038B (en) | Memory allocation method and device of neural network | |
CN107423816A (en) | A kind of more computational accuracy Processing with Neural Network method and systems | |
CN110766127B (en) | Neural network computing special circuit and related computing platform and implementation method thereof | |
CN110163356A (en) | A kind of computing device and method | |
CN107085562A (en) | A kind of neural network processor and design method based on efficient multiplexing data flow | |
CN108960414A (en) | Method for realizing single broadcast multiple operations based on deep learning accelerator | |
CN108320018A (en) | A kind of device and method of artificial neural network operation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||