CN107153873B - A kind of two-value convolutional neural networks processor and its application method - Google Patents


Info

Publication number
CN107153873B
Authority
CN
China
Prior art keywords
data
value
convolution
neural networks
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710316252.9A
Other languages
Chinese (zh)
Other versions
CN107153873A (en)
Inventor
韩银和
许浩博
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201710316252.9A priority Critical patent/CN107153873B/en
Publication of CN107153873A publication Critical patent/CN107153873A/en
Application granted granted Critical
Publication of CN107153873B publication Critical patent/CN107153873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides a binary convolutional neural network processor, comprising: a storage device for data to be computed, which stores the elements of the data to be convolved in binary form and the convolution kernel elements in binary form; a binary convolution device, which performs a binary convolution operation on the binary convolution kernel elements and the corresponding elements of the binary data to be convolved; a data scheduling device, which loads the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device; a pooling device, which pools the results of the convolution; and a normalization device, which normalizes the pooled results.

Description

A binary convolutional neural network processor and a method of using it
Technical field
The present invention relates to the storage and scheduling of data in neural network model computation.
Background art
With the development of artificial intelligence, techniques involving deep neural networks, and convolutional neural networks in particular, have advanced rapidly in recent years and are widely applied in fields such as image recognition, speech recognition, natural language understanding, weather forecasting, gene expression, content recommendation, and intelligent robotics.
A deep neural network can be understood as a computational model containing a large number of data nodes. Each data node is connected to other data nodes, and the connection between nodes is represented by a weight. As deep neural networks continue to develop, their complexity keeps increasing.
To balance complexity against computational performance, the reference Courbariaux M, Hubara I, Soudry D, et al. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1 [J]. arXiv preprint arXiv:1602.02830, 2016, proposes that a "binary convolutional neural network model" can be used to reduce the complexity of a conventional neural network. In a binary convolutional neural network, the weights, input data, and output data are held in "binary form", i.e., their magnitudes are approximated by "1" and "-1"; for example, values greater than or equal to 0 are represented by "1" and values less than 0 by "-1". This reduces the bit width of the data operated on in the neural network and thus greatly reduces the required parameter capacity, which makes binary convolutional neural networks particularly suitable for implementing image recognition, augmented reality, and virtual reality on end devices.
In the prior art, deep neural networks are usually run on general-purpose computer processors, such as central processing units (CPUs) and graphics processing units (GPUs). However, there is no dedicated processor for binary convolutional neural networks. The computing units of general-purpose processors are usually multiple bits wide, so using them to compute a binary neural network wastes resources.
Summary of the invention
Therefore, an object of the present invention is to overcome the above defects of the prior art and provide a binary convolutional neural network processor, comprising:
a storage device for data to be computed, which stores the elements of the data to be convolved in binary form and the convolution kernel elements in binary form;
a binary convolution device, which performs a binary convolution operation on the binary convolution kernel elements and the corresponding elements of the binary data to be convolved;
a data scheduling device, which loads the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device;
a pooling device, which pools the results of the convolution; and
a normalization device, which normalizes the pooled results.
Preferably, in the binary convolutional neural network processor, the binary convolution device comprises:
XNOR gates, which take the binary convolution kernel elements and the corresponding elements of the binary data to be convolved as their inputs;
an accumulation device, which takes the outputs of the XNOR gates as its inputs and accumulates them to output the result of the binary convolution operation;
wherein the accumulation device comprises an OR gate and/or a Hamming weight computing unit, wherein
at least one input of the OR gate is the output of an XNOR gate;
at least one input of the Hamming weight computing unit is the output of an XNOR gate.
Preferably, in the binary convolutional neural network processor, the storage device for data to be computed is further used to store the convolution kernel and/or the data to be convolved obtained through online binary conversion.
Preferably, the binary convolutional neural network processor further comprises:
a binarization device, which converts the obtained convolution kernel and/or data to be convolved into binary form.
Preferably, in the binary convolutional neural network processor, the data scheduling device is provided with a register into which convolution kernel elements that need to be reused are loaded during use.
Preferably, in the binary convolutional neural network processor according to any of the above, the elements of the data to be convolved and the convolution kernel elements are stored in the storage device for data to be computed in a layer-interleaved manner.
Preferably, in the binary convolutional neural network processor, the elements of the data to be convolved are stored in the storage device for data to be computed according to the size of the convolution kernel and the order in which those elements participate in the convolution operation.
Preferably, in the binary convolutional neural network processor, the way in which the elements of the data to be convolved and/or the convolution kernel elements are stored in the storage device for data to be computed satisfies one or more of the following:
the elements are stored in the order in which the matrices of the convolution kernel and the data to be convolved are arranged;
elements at the same position but in different channels of the matrix of the convolution kernel and/or the data to be convolved are stored contiguously in multiple consecutive storage units;
all elements under the same weight of a convolution kernel, and/or all elements of a submatrix of the data to be convolved that is used for one convolution operation, are stored in multiple consecutive storage units of the storage device.
The present invention also provides a method of using the binary convolutional neural network processor according to any of the above, comprising:
1) loading the data to be convolved from the storage device for data to be computed into a register;
2) loading the data to be convolved held in the register, together with the elements in the storage device for data to be computed that need to be multiplied with the elements of the data to be convolved, into the binary convolution device to perform the binary convolution operation;
3) pooling the output of the binary convolution device with the pooling device;
4) normalizing the output of the pooling device with the normalization device.
The present invention further provides a computer-readable storage medium storing a computer program which, when executed, implements the above method.
Compared with the prior art, the advantages of the present invention are as follows:
It provides a simplified hardware structure for performing convolution, a binary convolutional neural network processor based on that structure, and a corresponding computation method. By reducing the bit width of the data used during computation, it improves computational efficiency and reduces storage capacity and energy consumption.
Brief description of the drawings
Embodiments of the present invention are further described below with reference to the drawings, in which:
Fig. 1 is a schematic diagram of the multilayer structure of a neural network;
Fig. 2 is a schematic diagram of a convolution computed in two-dimensional space;
Fig. 3 is a schematic diagram of the hardware structure of a binary convolution device according to one embodiment of the present invention;
Fig. 4 is a schematic diagram of the hardware structure of a binary convolution device according to another embodiment of the present invention;
Fig. 5 is a schematic diagram of the hardware structure of a binary convolution device according to yet another embodiment of the present invention;
Fig. 6a~6c are schematic diagrams of the hardware structure of binary convolution devices of the present invention that use a Hamming weight computing element;
Fig. 7 is a schematic diagram of storing multi-channel convolution kernels, i.e., weight 0 and weight 1, and the data to be convolved, according to one embodiment of the present invention;
Fig. 8 is a schematic diagram of the structure of a binary convolutional neural network processor according to one embodiment of the present invention;
Fig. 9 is a schematic diagram of computation using the binary convolutional neural network processor according to one embodiment of the present invention;
Fig. 10 is a schematic diagram of computation using the binary convolutional neural network processor according to yet another embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and specific embodiments.
In computer science, a neural network is a mathematical model that imitates the synaptic connection structure of biological nerves. Application systems built from neural networks can implement functions such as machine learning and pattern recognition.
A neural network is structurally divided into multiple layers; Fig. 1 shows a schematic diagram of such a multilayer structure. Referring to Fig. 1, the first layer is the input layer, the last layer is the output layer, and the remaining layers are hidden layers. When the neural network is used, the original image, i.e., the layer data of the input layer, is fed to the input layer (in the present invention, "image" and "layer" refer to the raw data to be processed, not only to an image in the narrow sense of a photograph). Each layer of the neural network processes the layer data it receives and passes the result to the next layer, and the output of the output layer is finally taken as the result.
As described above, to cope with the increasingly complex structure of neural networks, the prior art proposes the concept of a binary convolutional neural network. As the name suggests, the computation of a binary convolutional neural network includes a "convolution" operation on the input data, and it also includes operations such as "pooling", "normalization", and "binarization".
Convolution is an important operation in binary convolutional neural networks. The computation of a "convolution" is described in detail below with reference to Fig. 2.
Fig. 2 shows the process of convolving a 5-by-5 "binary" image with a 3-by-3 "binary" convolution kernel in two-dimensional space. Referring to Fig. 2, first, each element of the image in rows 1-3 (top to bottom) and columns 1-3 (left to right) is multiplied by the corresponding element of the convolution kernel. For example, the element in row 1, column 1 of the convolution kernel (denoted "convolution kernel (1,1)") is multiplied by the element in row 1, column 1 of the image (denoted "image (1,1)") to give 1 × 1 = 1; convolution kernel (1,2) is multiplied by image (1,2) to give 1 × 0 = 0; similarly, convolution kernel (1,3) multiplied by image (1,3) gives 1 × 1 = 1; and so on, yielding 9 products. Adding these 9 products gives 1+0+1+0+1+0+0+0+1=4, which is the element in row 1, column 1 of the convolution result, i.e., convolution result (1,1). Similarly, convolution kernel (1,1) is multiplied by image (1,2), convolution kernel (1,2) by image (1,3), convolution kernel (1,3) by image (1,4), convolution kernel (2,1) by image (2,2), ..., and so on, giving 1+0+0+1+0+0+0+1=3 as convolution result (1,2). In this way, the 3-by-3 convolution result matrix shown in Fig. 2 can be computed.
The convolution result shown in Fig. 2 is buffered and binarized, and is then input into the next layer of the binary convolutional neural network.
The above example shows the "multiply" and "add" (or "accumulate and sum") operations involved in computing a convolution.
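The multiply-and-accumulate procedure described for Fig. 2 can be sketched in a few lines of Python. This is only an illustration: the function name `conv2d_valid` and the matrix values are hypothetical and are not the values drawn in Fig. 2; stride 1 and no padding are assumed because the text maps a 5-by-5 image and a 3-by-3 kernel to a 3-by-3 result.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # "multiply" followed by "accumulate"
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        result.append(row)
    return result


# Hypothetical 0/1 symbols, NOT the values shown in Fig. 2.
image = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
kernel = [
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
]
result = conv2d_valid(image, kernel)  # a 3-by-3 matrix of sums
```

Each output element costs nine multiplications and eight additions here, which is the work the XNOR-based hardware described next is meant to simplify.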
The inventors realized that, owing to the specific properties of binary multiplication, the "multiply" in a binary convolution can be replaced by an XNOR ("exclusive NOR") operation; that is, a "multiply" that in the prior art requires a multiplier can be completed using only XNOR logic gates. Convolution on binary values is thus simpler than conventional convolution: it does not need complex multiplications such as "2 × 4". When performing the "multiply" operation, if any of the elements being multiplied is "0", the result is "0"; if all of the elements being multiplied are "1", the result is "1".
The principle by which XNOR gates can replace multipliers in the present invention is described in detail below through a specific example.
When binarized convolution is actually used, the non-binary values z in the image and the convolution kernel are first binarized, i.e.:
$$\mathrm{Binarize}(z) = \begin{cases} 1, & z \ge 0 \\ -1, & z < 0 \end{cases}$$
That is, a value z greater than or equal to 0 is binarized to "1", which is represented by the symbol "1" used for the convolution in Fig. 2, and a value z less than 0 is binarized to "-1", which is represented by the symbol "0" used for the convolution in Fig. 2.
An XNOR operation is performed on the binarized values of the image and the convolution kernel, i.e., $F = \overline{A \oplus B}$, which gives the following cases:
Input A    Input B    Output F    Symbol
-1         -1          1          1
-1          1         -1          0
 1         -1         -1          0
 1          1          1          1
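The truth table can be checked mechanically. The sketch below assumes the encoding stated in the text (value +1 is stored as symbol 1, value -1 as symbol 0); the helper names `binarize`, `to_bit`, and `xnor_bit` are illustrative and do not come from the patent.

```python
def binarize(z):
    """Map a real value to +1 (z >= 0) or -1 (z < 0)."""
    return 1 if z >= 0 else -1


def to_bit(v):
    """Encode the value +1 as the symbol 1 and the value -1 as the symbol 0."""
    return 1 if v == 1 else 0


def xnor_bit(a, b):
    """XNOR of two single-bit symbols: 1 when they are equal, else 0."""
    return 1 - (a ^ b)


# For every pair of binarized values, the product's symbol equals the XNOR
# of the operands' symbols, which is what the truth table states.
for a in (-1, 1):
    for b in (-1, 1):
        assert to_bit(a * b) == xnor_bit(to_bit(a), to_bit(b))
```

The loop covers all four rows of the table, so a passing run confirms that multiplying in the {-1, +1} domain and XNOR in the {0, 1} domain agree under this encoding.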
As the above truth table shows, when binarized values are "multiplied", an XNOR logic gate can replace the multiplier. As is known in the art, a multiplier is far more complex than an XNOR logic gate.
Therefore, the inventors consider that replacing the multipliers of a conventional processor with XNOR logic gates can greatly reduce the complexity of the devices used in a processor for binary convolutional neural networks.
In addition, the inventors also realized that, owing to the specific properties of binary addition, the "add" in the above binary convolution can be replaced by an OR operation; that is, an OR logic gate can replace the adder used in the prior art. This is because the result of ORing the outputs of the above XNOR gates can be expressed as G = F1 + F2 + … + Fn, with a single-bit result G as the final output, where Fk denotes the output of the k-th XNOR gate and n denotes the total number of XNOR gates whose outputs are inputs of the OR gate.
Based on the above analysis, the present invention provides a binary convolution device usable in a binary convolutional neural network processor. It exploits the properties of binary multiplication and addition to simplify the hardware that performs convolution in the processor, thereby increasing the speed of convolution and reducing the overall energy consumption of the processor.
Fig. 3 shows the hardware structure of a binary convolution device according to one embodiment of the present invention. As shown in Fig. 3, the device includes 9 XNOR gates and 1 OR gate, and the outputs of all 9 XNOR gates are inputs of the OR gate. During convolution, the XNOR gates compute n1 × w1, n2 × w2, …, n9 × w9 respectively to obtain the outputs F1 to F9; the OR gate takes F1 to F9 as its inputs and outputs the first element G1 of the convolution result. Similarly, the same convolution kernel is applied to the other regions of the image to obtain the other elements of the convolution result, which is not repeated here.
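A software model of the Fig. 3 arrangement might look like the following. It is a behavioral sketch under the assumption that inputs are already encoded as 0/1 symbols; `binary_conv_or` is a made-up name, and real hardware would evaluate the nine XNOR gates in parallel rather than in a loop.

```python
def xnor(a, b):
    """Single-bit XNOR: 1 when the two symbols match, else 0."""
    return 1 - (a ^ b)


def binary_conv_or(window_bits, kernel_bits):
    """Model of nine XNOR gates feeding one OR gate.

    window_bits: the symbols n1..n9 of one image window (0 or 1 each)
    kernel_bits: the symbols w1..w9 of the convolution kernel (0 or 1 each)
    Returns the single-bit output G of the OR gate.
    """
    f = [xnor(n, w) for n, w in zip(window_bits, kernel_bits)]  # F1..F9
    g = 0
    for bit in f:
        g |= bit  # multi-input OR
    return g
```

For instance, `binary_conv_or([1] * 9, [1] * 9)` yields 1 because every XNOR output is 1, while a window that differs from the kernel in every position yields 0.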
In the embodiment shown in Fig. 3, multiple XNOR gates perform the multiplications in parallel, which increases the speed of the convolution. It should be understood, however, that the hardware structure of the binary convolution device can also be varied in the present invention, as illustrated by several other embodiments below.
Fig. 4 shows the hardware structure of a binary convolution device according to another embodiment of the present invention. As shown in Fig. 4, the device includes 1 XNOR gate, 1 OR gate, and a register. The register stores the output of the OR gate, and the stored value is used as one input of the OR gate; the other input of the OR gate is the output of the XNOR gate. During convolution, as time advances, n1 and w1, n2 and w2, …, n9 and w9 are applied as the inputs of the XNOR gate at the first to ninth time steps respectively. At each time step the XNOR gate outputs the corresponding F1, F2, …, F9 as one input of the OR gate, and the result output by the OR gate at the previous time step, which is held in the register, is used as the other input of the OR gate. For example, when the XNOR gate outputs F1 (equal to n1 × w1), the pre-stored symbol "0" is read from the register and used together with F1 as the input of the OR gate, which outputs F1; when the XNOR gate outputs F2 (equal to n2 × w2), F1 is read from the register and used together with F2 as the input of the OR gate, which outputs F1 + F2; and so on, until the accumulated result G1 of F1 to F9 is output.
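The time-multiplexed behavior of Fig. 4 can be modeled as a loop with one state variable standing in for the register. This is a sketch only; `binary_conv_sequential` and the returned trace are illustrative assumptions, not structures named in the patent.

```python
def binary_conv_sequential(window_bits, kernel_bits):
    """Model of one XNOR gate, one two-input OR gate, and a one-bit register.

    At each time step the XNOR output F_k is ORed with the register value,
    and the OR output is written back to the register.
    Returns the final bit G and the register value after every step.
    """
    register = 0  # the pre-stored symbol "0"
    trace = []
    for n, w in zip(window_bits, kernel_bits):
        f = 1 - (n ^ w)          # XNOR output for this time step
        register = register | f  # OR gate output, stored for the next step
        trace.append(register)
    return register, trace


g1, trace = binary_conv_sequential(
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
)
```

Nine time steps are needed instead of one, which is the speed-for-area trade-off this embodiment makes relative to the parallel structure of Fig. 3.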
In the embodiment shown in Fig. 4, increasing the reuse of the XNOR gate and the OR gate reduces the number of elements used. Moreover, this scheme uses an OR gate with only two inputs, so its hardware complexity is lower.
Fig. 5 shows the hardware structure of a binary convolution device according to yet another embodiment of the present invention. This embodiment is similar to that of Fig. 4 in that it uses only one XNOR gate, one OR gate, and one register. The difference is that in Fig. 5 the output of the XNOR gate is stored in a register that can hold multiple result bits at once, and each result in the register is used as an input of the OR gate. The embodiment is used in a similar way to that of Fig. 4, in that the XNOR gate is reused; the difference is that in Fig. 5 the result output by the XNOR gate at each time step is saved in the multi-bit register, and once all of F1 to F9 have been obtained, the OR gate performs the OR operation to output G1.
In the embodiments of Figs. 3, 4, and 5, an OR gate implements the "add" or "accumulate" function, and the inputs of the OR gate all come from the outputs of XNOR gates, so the result finally output by the OR gate is a single-bit value. This simplifies the computation and increases its speed. The hardware structure provided by this scheme is particularly suitable for a dedicated binary neural network processor, because a binary neural network represents its weights and data with the values "1" and "-1", neural network computation involves a large number of multiplications and additions, and reducing the bit width of the operands effectively reduces computational complexity.
However, because implementing "add" or "accumulate" with an OR gate is a single-bit computation, it introduces a certain degree of error. For this reason, the present invention also provides an alternative scheme, in which a Hamming weight computing element replaces the OR gate shown in Figs. 3, 4, and 5 to implement the "add" or "accumulate" function. Figs. 6a~6c show hardware structures that use a Hamming weight computing element. In this alternative scheme, the Hamming weight computing element takes the outputs of the XNOR gates as its inputs and outputs the number of logic "1"s in that data, i.e., its Hamming weight. Like the OR-gate scheme, this scheme simplifies the computation, and it can additionally produce an exact sum.
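The Hamming weight variant can be illustrated with bit-packed integers. The sketch below is not taken from the patent: `pack`, `xnor_popcount`, and `signed_sum` are hypothetical helpers, and the conversion 2 × count - n is the standard identity for recovering the sum of ±1 products from the number of matching bits (matches contribute +1, mismatches -1).

```python
def pack(bits):
    """Pack a list of 0/1 symbols into one integer, first element as the MSB."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


def xnor_popcount(a_word, b_word, n):
    """Hamming weight of the bitwise XNOR of two n-bit words."""
    mask = (1 << n) - 1
    return bin(~(a_word ^ b_word) & mask).count("1")


def signed_sum(count, n):
    """Convert a match count into the sum of n products of +1/-1 values."""
    return 2 * count - n


window = [1, 0, 1, 1, 0, 0, 1, 0, 1]
kernel = [1, 1, 1, 0, 0, 1, 1, 0, 0]
count = xnor_popcount(pack(window), pack(kernel), len(window))
total = signed_sum(count, len(window))
```

Unlike the OR-gate result, `count` distinguishes between one matching position and nine, which is why the text describes this scheme as an exact summation.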
The inventors found that, with the binary convolution device provided by the present invention, every "multiply" and "accumulate" operates on single-bit data, and the output of the binary convolution device is also single-bit data. This property is particularly well suited to a "layer-interleaved data mapping" for storing and scheduling the data that participate in, and result from, the convolution. Such a mapping reduces the number of data loads and makes full use of data locality to improve data reuse.
In the present invention, the "layer-interleaved data mapping" means that the elements of the convolution kernel and of the data to be convolved are stored in sequence, along the channel direction, into each row of the storage device. In other words, data are stored in the storage device interleaved by layer, so that two adjacent data elements come from different channels rather than from the same channel. As shown in Fig. 7, elements of the convolution kernel and of the data to be convolved that have the same z-axis coordinate correspond to the same "channel"; that is, elements with the same z value belong to the same channel.
To describe the data arrangement more concretely, Fig. 7 takes convolution kernels weight 0 and weight 1 of size (x, y, z) = 2*2*2 and data to be convolved of size (x, y, z) = 2*3*2 as an example to explain the layer-interleaved data mapping that the present invention provides for binary convolutional neural networks. Referring to Fig. 7, the elements of weight 0 and weight 1 are each divided into four groups according to their spatial positions: the four groups of weight 0 are Az, Bz, Cz, and Dz, where z is 0 or 1, as shown in the figure; the four groups of weight 1 are az, bz, cz, and dz, where z is 0 or 1, as shown in the figure.
Referring to Fig. 7, according to one embodiment of the present invention, the elements of convolution kernel weight 0, convolution kernel weight 1, and the data to be convolved may be stored as follows.
In Fig. 7, for ease of explanation, according to the size and stride of each convolution kernel, the elements of the three-dimensional matrices of weight 0 and weight 1 are divided by channel into two two-dimensional matrices; for example, weight 0 is divided into the two-dimensional matrix formed by A0, B0, C0, D0 and the two-dimensional matrix formed by A1, B1, C1, D1. Similarly, the elements of the three-dimensional matrix of the data to be convolved are divided by channel into two two-dimensional matrices, namely the one formed by X0, Y0, Z0, P0, Q0, R0 and the one formed by X1, Y1, Z1, P1, Q1, R1.
When convolution kernel weight 0 is stored, its elements A0, A1, B0, B1, C0, C1, D0, and D1 are stored in sequence in one row of consecutive storage units of the weight storage device, 8 bits in total. In the storage units, any two adjacent elements come from different channels: for example, A0 and A1 come from different channels, and A1 and B0 also come from different channels. This is the layer-interleaved storage described above.
When convolution kernel weight 1 is stored, its elements a0, a1, b0, b1, c0, c1, d0, and d1 are stored in sequence in another row of consecutive storage units of the weight storage device, 8 bits in total. As with weight 0, any two adjacent elements come from different channels.
In the weight storage device, weight elements at the same x and y coordinates (e.g., A0 and A1) are stored in sequence as adjacent elements. After the elements at one (x, y) position have been stored, the next group of weight elements sharing the same x and y coordinates (e.g., B0 and B1) is stored, and so on, until all weight elements of the convolution kernel have been stored.
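The weight ordering just described amounts to flattening a three-dimensional kernel with the channel index varying fastest. The following sketch assumes that A and B form the top row of each channel plane and C and D the bottom row, consistent with the pairing described for Fig. 7; the function name and the `volume[z][y][x]` indexing are illustrative choices.

```python
def interleave_channels(volume):
    """Flatten volume[z][y][x] into one row with the channel z varying fastest."""
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        volume[z][y][x]
        for y in range(height)
        for x in range(width)
        for z in range(depth)
    ]


# Labels stand in for single-bit weights; layout per channel is assumed.
weight0 = [
    [["A0", "B0"], ["C0", "D0"]],  # channel z = 0
    [["A1", "B1"], ["C1", "D1"]],  # channel z = 1
]
row = interleave_channels(weight0)
```

Here `row` holds the labels in the order A0, A1, B0, B1, C0, C1, D0, D1, matching the storage row described for weight 0; the same function would handle more than two channels.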
When the data to be convolved are stored, the data elements may be stored according to the size of the convolution kernel and the order in which they participate in the convolution operation. From the convolution rule shown in Fig. 2, Az×Xz, Bz×Yz, Cz×Pz, and Dz×Qz are computed first, and then Az×Yz, Bz×Zz, Cz×Qz, and Dz×Rz. Therefore, when the elements of the data to be convolved are stored, the convolution rule should be considered in addition to the layer-interleaved storage, so that data elements that participate in the computation one after another are stored together. For example, Xz, Yz, Pz, Qz are stored in the consecutive storage units of one row or column, and Yz, Zz, Qz, Rz are stored in the consecutive storage units of another row or column.
Referring to Fig. 7, X0, X1, Y0, Y1, P0, P1, Q0, Q1 are stored in sequence in one row of consecutive storage units of the data storage device. Y0, Y1, Z0, Z1, Q0, Q1, R0, R1 are stored in sequence in another row of consecutive storage units of the data storage device.
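The window-by-window layout of the data to be convolved can be generated in the same spirit. This sketch assumes stride 1, the `data[z][y][x]` indexing, and a top row X, Y, Z over a bottom row P, Q, R in each channel; `window_rows` is a hypothetical helper, not a component of the patent.

```python
def window_rows(data, kernel_h, kernel_w):
    """Produce one storage row per convolution window.

    Within each row, elements follow the order in which the window is
    traversed, with the channel index z varying fastest (layer-interleaved).
    """
    depth = len(data)
    height = len(data[0])
    width = len(data[0][0])
    rows = []
    for r in range(height - kernel_h + 1):
        for c in range(width - kernel_w + 1):
            rows.append([
                data[z][r + i][c + j]
                for i in range(kernel_h)
                for j in range(kernel_w)
                for z in range(depth)
            ])
    return rows


# Labels stand in for single-bit data elements.
data = [
    [["X0", "Y0", "Z0"], ["P0", "Q0", "R0"]],  # channel z = 0
    [["X1", "Y1", "Z1"], ["P1", "Q1", "R1"]],  # channel z = 1
]
rows = window_rows(data, 2, 2)
```

With these inputs the two rows come out as X0, X1, Y0, Y1, P0, P1, Q0, Q1 and Y0, Y1, Z0, Z1, Q0, Q1, R0, R1. Elements shared by overlapping windows (the Y and Q groups) are duplicated, so each window can be read as one contiguous row aligned with the stored kernel row.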
As with the convolution kernel elements, in the data storage device the data elements at the same x and y coordinates (e.g., X0 and X1) form one group and are stored in sequence as adjacent elements. After the elements at one (x, y) position have been stored, the next group of data elements sharing the same x and y coordinates (e.g., Y0 and Y1) is stored, and so on, until the other data elements in the submatrix of the data to be convolved whose size matches that of the convolution kernel (e.g., the submatrix marked with a dashed line in Fig. 7) have all been stored.
Although in the example of Fig. 7 the convolution kernels and the data to be convolved have 2 channels, it should be understood that, in the present invention, convolution kernels and data to be convolved with more than 2 channels can also be stored in the layer-interleaved manner.
Preferably, during storage, multiple consecutive storage units in the storage device are filled in sequence; that is, the elements are stored in the storage device in the order in which the matrices of the convolution kernel and the data to be convolved are arranged.
Preferably, elements at the same position but in different channels of the matrix of the convolution kernel and/or the data to be convolved are stored contiguously in multiple consecutive storage units of the storage device.
Preferably, all elements under the same weight of a convolution kernel, and/or all elements of a submatrix of the data to be convolved that is used for one convolution operation, are stored in multiple consecutive storage units of the storage device.
For convenience of explanation, Fig. 7 shows the weight storage device and the data storage device as two separate storage devices. It should be understood, however, that in the present invention the weight storage device and the data storage device may be located on different memories, or may occupy different regions of the same memory, for example by being stored together in the to-be-computed data storage device.
Those skilled in the art will also appreciate that the storage scheme described in the above embodiments may be carried out offline, outside the processor, before the binary neural network computation begins, or online on the processor, for example within the processor chip itself. The storage scheme may also be implemented as a computer program that is executed by the processor.
Storing the elements of each convolution kernel and of the data to be convolved with the layer-interleaved data mapping scheme of the present invention reduces the number of data loads and improves data reuse.
It should also be understood that the purpose of storing the convolution kernel elements and the corresponding elements of the data to be convolved with the above "layer-interleaved data mapping scheme" is to make reading convenient, so that the inputs of the binary convolution device can be determined quickly and easily. Therefore, any scheme that establishes a mapping relation between the storage locations of the convolution kernel elements and the storage locations of the corresponding elements of the data to be convolved can be used to store the convolution kernel elements and the elements of the data to be convolved.
For example, when the length of a run of contiguous storage units is less than 8 bits, for instance only 4 bits, the elements A0, A1, B0, B1, C0, C1, D0 and D1 of weight 0 are stored in folded form: A0, A1, B0 and B1 are stored in one row of contiguous storage units, and C0, C1, D0 and D1 are stored in another row of contiguous storage units.
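The folded storage in this example amounts to splitting one logical row into several narrower storage rows. A minimal sketch follows; the helper names `fold_row` and `locate` are hypothetical and introduced only for illustration.

```python
def fold_row(elements, word_bits):
    """Split one logical row of 1-bit elements into consecutive storage rows
    that each hold at most word_bits elements."""
    return [elements[i:i + word_bits]
            for i in range(0, len(elements), word_bits)]


def locate(index, word_bits):
    """Map a position in the logical row to (storage row, offset in that row)."""
    return divmod(index, word_bits)


weight0 = ["A0", "A1", "B0", "B1", "C0", "C1", "D0", "D1"]
folded = fold_row(weight0, 4)
print(folded)

# The element at logical position 5 (C1) ends up in storage row 1, offset 1.
row, offset = locate(5, 4)
print(folded[row][offset])
```

Because the fold is a fixed division of the logical index, the mapping between kernel elements and data elements is preserved as long as both are folded with the same storage width.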
Convolution on kernel elements stored in this way and on the corresponding elements of the data to be convolved is well suited to single-instruction multiple-data (SIMD) execution, in which a single instruction loads multiple stored data items into the arithmetic unit. The method for loading and computing on the stored data is described in detail in the subsequent embodiments. This approach reduces the bit width of the computing unit and therefore its hardware overhead.
Combining the binary convolution device described above with the storage scheme and loading method for the elements of the convolution kernel and the data to be convolved, a dedicated processor for binary convolutional neural networks can be provided that has a small computing-unit bit width and a relatively simple hardware structure.
Referring to Fig. 8, according to one embodiment of the present invention, a binary convolutional neural network processor 10 is provided, comprising:
a data scheduling device 101, a to-be-computed data storage device 102, a binary convolution device 103, a pooling device 104, a normalization device 105 and a binarization device 106.
The to-be-computed data storage device 102 stores the convolution kernel elements in binary form and the data to be convolved in binary form. As described above, the storage scheme should reflect the mapping relation between the convolution kernel elements used in the convolution computation and the corresponding elements of the data to be convolved. For example, the convolution kernel elements and the data to be convolved are stored in the layer-interleaved manner, and the data to be convolved is additionally stored according to the size of the convolution kernel and the order in which its elements participate in computation during the convolution operation. For the specific storage scheme, refer to the preceding embodiments.
The data scheduling device 101 loads the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device according to the mapping relation. For example, a register is provided in the data scheduling device 101, and during use the convolution kernel elements that need to be reused are loaded into the register.
The binary convolution device 103 performs the binary convolution operation on the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form. The binary convolution device 103 may adopt any of the structures described in the preceding embodiments: the multiplication of a convolution kernel element by the corresponding element of the data to be convolved is implemented by an XNOR gate, and the accumulation of the multiplication results is implemented by an OR gate or a Hamming weight computing unit.
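The XNOR multiplication and Hamming-weight accumulation performed by the binary convolution device 103 can be sketched in Python as follows. The sketch covers only the Hamming-weight option, not the OR-gate option. It assumes that bit 1 encodes the value +1 and bit 0 encodes the value -1, which this section does not state; the function names are likewise illustrative.

```python
def xnor_bits(a, b, n):
    """Bitwise XNOR of two n-bit words: a result bit is 1 where a and b agree."""
    mask = (1 << n) - 1
    return ~(a ^ b) & mask


def hamming_weight(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def binary_dot(a, b, n):
    """Dot product of two n-element vectors over {-1, +1}, each packed into
    an n-bit word (assumed encoding: bit 1 means +1, bit 0 means -1)."""
    matches = hamming_weight(xnor_bits(a, b, n))
    # Each agreeing position contributes +1, each disagreeing position -1.
    return 2 * matches - n


print(binary_dot(0b1011, 0b1101, 4))
```

Under the assumed encoding, a product of two elements is +1 exactly when their bits agree, so counting the 1 bits of the XNOR result is enough to recover the signed sum.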
The pooling device 104 performs pooling on the result obtained by the convolution.
The normalization device 105 normalizes the pooled result in order to accelerate the parameter training process of the neural network.
In some embodiments of the present invention, the convolution kernel and/or the data to be convolved used for the binary convolution operation may be obtained online from a data source. Because the data obtained is not necessarily binarized, in these embodiments a binarization device 106 may also be provided in the binary convolutional neural network processor 10 to convert the obtained data into binary form. The data that has undergone binary conversion can also be stored online by the to-be-computed data storage device 102.
It should be understood that in embodiments where the convolution kernel and/or the data to be convolved are stored offline in the to-be-computed data storage device 102 before the convolutional neural network computation is carried out, the binarization device 106 need not be provided in the binary convolutional neural network processor 10.
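The roles of the pooling device 104, the normalization device 105 and the binarization device 106 can be illustrated with a short sketch. The text does not fix the pooling window, the normalization formula or the binarization threshold, so the choices below (2x2 max pooling, an affine normalization with a supplied mean and variance, and a threshold at zero) are assumptions, as are all function and parameter names.

```python
import math


def max_pool_2x2(fmap):
    """Assumed pooling: maximum over non-overlapping 2x2 windows."""
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


def normalize(fmap, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Assumed normalization: (v - mean) / sqrt(var + eps) * gamma + beta."""
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in fmap]


def binarize(fmap):
    """Assumed binarization: 1 for values >= 0, otherwise 0."""
    return [[1 if v >= 0 else 0 for v in row] for row in fmap]


conv_out = [[4, -2, 0, 2],
            [0, 2, -4, -2],
            [-4, -2, 2, 0],
            [-2, 0, 4, 2]]
pooled = max_pool_2x2(conv_out)
print(pooled)
print(binarize(normalize(pooled, mean=2.0, var=1.0)))
```

The binarized output has the same 1-bit form as the stored data to be convolved, so it can be written back to the to-be-computed data storage device and consumed by the next layer.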
The process of performing computation with the binary convolutional neural network processor 10 shown in Fig. 8 is described in detail below through specific embodiments, with reference to Fig. 9 and Fig. 10.
Fig. 9 shows a process of performing computation with the above binary convolutional neural network processor according to one embodiment of the present invention. Fig. 9 uses the same symbols as Fig. 7 to denote the convolution kernel elements and the elements of the data to be convolved, for example X0, X1, A0, A1 and so on. In the weight storage matrix, one storage word holds one row, which contains all the convolution kernel elements belonging to the same convolution kernel; as shown in the figure, the storage word is 8 bits wide and each element occupies 1 bit. Similarly, a storage word in the matrix of the data to be convolved is also 8 bits wide. In addition, in Fig. 9 the XNOR gates and the register group are 2 bits wide. The computation follows the principle that data belonging to the same convolution kernel is accumulated in the same accumulator. The computation proceeds as follows:
Step 1: load the two high-order bits of the data to be convolved (that is, X0 and X1) into the register group;
As can be seen from the convolution principle shown in Fig. 2, the elements X0 and X1 of the data to be convolved in Fig. 9 will be used repeatedly to compute A0X0, B0X0, A1X1 and B1X1 in subsequent steps, so these 2 bits of the data to be convolved need to be held in the register.
Step 2: load the data to be convolved held in the register group and the first two bits of weight data in the first row of the weight matrix (A0 and A1) into the XNOR gates;
Step 3: the addition unit performs an OR operation on, or computes the Hamming weight of, the results of the XNOR gates;
As described above, the OR operation or the Hamming weight computation achieves the effect of "addition". In this step, A0X0 and A1X1 are computed.
Step 4: input the result of the addition unit into accumulator 0;
Accumulator 0 accumulates the data belonging to the same convolution kernel.
Step 5: load the data to be convolved held in the register group and the first two bits of weight data in the second row of the weight matrix (a0 and a1) into the XNOR gates;
Step 6: the addition unit performs an OR operation on, or computes the Hamming weight of, the results of the XNOR gates, yielding a0X0 and a1X1;
Step 7: input the result of the addition unit into accumulator 1, and so on, so that X0 and X1 are computed in turn with the first two weight bits of each of the eight rows of the weight storage array;
Step 8: as in the preceding steps, load the third and fourth bits of the data to be convolved (Y0 and Y1) into the register group;
Step 9: load the data to be convolved held in the register group and the third and fourth bits of weight data in the first row of the weight matrix (B0 and B1) into the XNOR gates;
Step 10: the addition unit performs an OR operation on, or computes the Hamming weight of, the results of the XNOR gates;
Step 11: input the result of the addition unit into accumulator 1; thereafter, as in steps 5 to 7, the data located in the same column, such as b0 and b1, are computed in turn with Y0 and Y1;
Step 12: once an accumulator has obtained the data of an output layer, load the accumulator result into the buffer unit;
Step 13: once the buffer unit holds the complete data of the output layer, load the output data into the pooling unit for the pooling operation;
Step 14: load the result of the pooling operation into the batch normalization unit for the batch normalization operation;
Step 15: load the result of the batch normalization into the binarization unit for the binarization operation.
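The data flow of steps 1 to 11 can be sketched in Python as a small simulation. The sketch uses the Hamming-weight variant of the addition unit and follows the stated principle that results for the same convolution kernel (the same weight row) go to the same accumulator. The constant and function names, and the use of Python integers to model 8-bit storage words, are assumptions for illustration.

```python
WORD_BITS = 8    # width of one storage word in Fig. 9
FIELD_BITS = 2   # width of the XNOR gates and the register group in Fig. 9
FIELD_MASK = (1 << FIELD_BITS) - 1


def field(word, index):
    """Return the index-th 2-bit field of a storage word, counted from the
    high-order end (field 0 holds X0 and X1 in the data word)."""
    shift = WORD_BITS - FIELD_BITS * (index + 1)
    return (word >> shift) & FIELD_MASK


def convolve_word(data_word, weight_words):
    """Process one data word against every weight row, 2 bits at a time.
    Accumulator k counts the bit positions where the data word agrees with
    weight row k."""
    accumulators = [0] * len(weight_words)
    for i in range(WORD_BITS // FIELD_BITS):
        register = field(data_word, i)                         # steps 1 and 8
        for k, weight_word in enumerate(weight_words):
            xnor = ~(register ^ field(weight_word, i)) & FIELD_MASK  # steps 2, 5, 9
            accumulators[k] += bin(xnor).count("1")            # Hamming weight, then accumulate
    return accumulators


data_word = 0b10110100
weight_words = [0b10110100, 0b01001011, 0b11110000]
print(convolve_word(data_word, weight_words))
```

Each 2-bit field of the data word is loaded into the register once and reused for every weight row, which is the reuse that the register in the data scheduling device is intended to provide.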
As can be seen, with the scheme described above, the mapping relation between the storage locations of the convolution kernel elements and the storage locations of the corresponding elements of the data to be convolved makes it possible to determine quickly which elements need to be convolved and to input them into the XNOR gates.
When the bit width of the storage unit is smaller than the matrix width shown in Fig. 9, the convolution kernel elements and the elements of the data to be convolved can also be stored by folding the matrix in blocks, as shown in Fig. 10. Fig. 10 likewise uses the same symbols as Fig. 7 to denote the convolution kernel elements and the elements of the data to be convolved. The difference from Fig. 9 is that, when data belonging to the same block of the data to be convolved needs to be read, the position at which that data is stored in the register group must also be taken into account.
As can be seen from the embodiments of the present invention, the present invention exploits the characteristics of binarized computation to provide a simplified hardware structure for performing convolution, a binary convolutional neural network processor based on that structure, and a corresponding computation method. By reducing the bit width of the data being computed, it improves computational efficiency and reduces storage capacity and energy consumption.
Furthermore, the present invention stores and computes data using the layer-interleaved data mapping scheme, which simplifies the retrieval of the data to be convolved and the convolution kernel data during convolution, reduces hardware overhead and improves data utilization.
It should be noted that not every step described in the above embodiments is required; those skilled in the art may select, replace or modify steps as appropriate according to actual needs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail above with reference to embodiments, those of ordinary skill in the art will understand that modifications or equivalent substitutions of the technical solution of the present invention that do not depart from its spirit and scope shall all be covered by the claims of the present invention.

Claims (10)

1. A binary convolutional neural network processor, comprising:
a to-be-computed data storage device, for storing elements of data to be convolved in binary form and convolution kernel elements in binary form;
a binary convolution device, for performing a binary convolution operation on the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form;
a data scheduling device, for loading the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device;
a pooling device, for performing pooling on the result obtained by the convolution; and
a normalization device, for normalizing the pooled result.
2. The binary convolutional neural network processor according to claim 1, wherein the binary convolution device comprises:
an XNOR gate, which takes as its inputs the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form;
an accumulation device, which takes the output of the XNOR gate as its input and accumulates the output of the XNOR gate so as to output the result of the binary convolution operation;
wherein the accumulation device comprises an OR gate and/or a Hamming weight computing unit, wherein
at least one input of the OR gate is the output of the XNOR gate;
at least one input of the Hamming weight computing unit is the output of the XNOR gate.
3. The binary convolutional neural network processor according to claim 1, wherein the to-be-computed data storage device is further used for storing online the obtained convolution kernel and/or data to be convolved that has undergone binary conversion.
4. The binary convolutional neural network processor according to claim 3, further comprising:
a binarization device, for converting the obtained convolution kernel and/or data to be convolved into binary form.
5. The binary convolutional neural network processor according to claim 1, wherein a register is provided in the data scheduling device, for loading, during use, the convolution kernel elements that need to be reused.
6. The binary convolutional neural network processor according to any one of claims 1 to 5, wherein the elements of the data to be convolved and the convolution kernel elements are stored in the to-be-computed data storage device in a layer-interleaved manner.
7. The binary convolutional neural network processor according to claim 6, wherein the elements of the data to be convolved are stored in the to-be-computed data storage device according to the size of the convolution kernel and the order in which the elements of the data to be convolved participate in computation during the convolution operation.
8. The binary convolutional neural network processor according to claim 7, wherein the storage scheme of the elements of the data to be convolved and/or the convolution kernel elements in the to-be-computed data storage device satisfies one or more of the following:
the elements are stored in the order in which they are arranged in the matrices of the convolution kernel and the data to be convolved;
elements located at the same position but in different channels of the matrix of the convolution kernel and/or the data to be convolved are stored consecutively in multiple contiguous storage units;
all elements belonging to the same weight of the same convolution kernel, and/or all elements of the submatrix used for a convolution operation within the same data to be convolved, are stored in multiple contiguous storage units of the storage device.
9. A method of using the binary convolutional neural network processor according to any one of claims 1 to 8, comprising:
1) loading data to be convolved from the to-be-computed data storage device into a register;
2) loading the data to be convolved held in the register, together with the elements in the to-be-computed data storage device that need to be multiplied with that data to be convolved, into the binary convolution device to perform the binary convolution operation;
3) performing pooling on the output of the binary convolution device by means of the pooling device;
4) normalizing the output of the pooling device by means of the normalization device.
10. A computer-readable storage medium storing a computer program which, when executed, implements the method according to claim 9.
CN201710316252.9A 2017-05-08 2017-05-08 A kind of two-value convolutional neural networks processor and its application method Active CN107153873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710316252.9A CN107153873B (en) 2017-05-08 2017-05-08 A kind of two-value convolutional neural networks processor and its application method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710316252.9A CN107153873B (en) 2017-05-08 2017-05-08 A kind of two-value convolutional neural networks processor and its application method

Publications (2)

Publication Number Publication Date
CN107153873A CN107153873A (en) 2017-09-12
CN107153873B true CN107153873B (en) 2018-06-01

Family

ID=59794343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710316252.9A Active CN107153873B (en) 2017-05-08 2017-05-08 A kind of two-value convolutional neural networks processor and its application method

Country Status (1)

Country Link
CN (1) CN107153873B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839286B2 (en) * 2017-09-14 2020-11-17 Xilinx, Inc. System and method for implementing neural networks in integrated circuits
CN107657312B (en) * 2017-09-18 2021-06-11 东南大学 Binary network implementation system for speech common word recognition
CN108205704B (en) * 2017-09-27 2021-10-29 深圳市商汤科技有限公司 Neural network chip
CN109754061B (en) * 2017-11-07 2023-11-24 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN107977704B (en) 2017-11-10 2020-07-31 中国科学院计算技术研究所 Weight data storage method and neural network processor based on same
CN107967132B (en) * 2017-11-27 2020-07-31 中国科学院计算技术研究所 Adder and multiplier for neural network processor
KR20190066473A (en) * 2017-12-05 2019-06-13 삼성전자주식회사 Method and apparatus for processing convolution operation in neural network
CN108108811B (en) * 2017-12-18 2021-07-30 南京地平线机器人技术有限公司 Convolution calculation method in neural network and electronic device
CN109978148B (en) * 2017-12-28 2020-06-23 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN109993286B (en) * 2017-12-29 2021-05-11 深圳云天励飞技术有限公司 Sparse neural network computing method and related product
CN110110283A (en) * 2018-02-01 2019-08-09 北京中科晶上科技股份有限公司 A kind of convolutional calculation method
CN108829610B (en) * 2018-04-02 2020-08-04 浙江大华技术股份有限公司 Memory management method and device in neural network forward computing process
CN108647777A (en) * 2018-05-08 2018-10-12 济南浪潮高新科技投资发展有限公司 A kind of data mapped system and method for realizing that parallel-convolution calculates
CN110147873B (en) * 2018-05-18 2020-02-18 中科寒武纪科技股份有限公司 Convolutional neural network processor and training method
CN108681773B (en) * 2018-05-23 2020-01-10 腾讯科技(深圳)有限公司 Data operation acceleration method, device, terminal and readable storage medium
US11599785B2 (en) 2018-11-13 2023-03-07 International Business Machines Corporation Inference focus for offline training of SRAM inference engine in binary neural network
CN110059805B (en) * 2019-04-15 2021-08-31 广州异构智能科技有限公司 Method for a binary array tensor processor
CN110033086B (en) * 2019-04-15 2022-03-22 广州异构智能科技有限公司 Hardware accelerator for neural network convolution operations
CN110033085B (en) * 2019-04-15 2021-08-31 广州异构智能科技有限公司 Tensor processor
CN110046705B (en) * 2019-04-15 2022-03-22 广州异构智能科技有限公司 Apparatus for convolutional neural network
CN110263809B (en) * 2019-05-16 2022-12-16 华南理工大学 Pooling feature map processing method, target detection method, system, device and medium
CN110265002B (en) * 2019-06-04 2021-07-23 北京清微智能科技有限公司 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
CN111126579B (en) * 2019-11-05 2023-06-27 复旦大学 In-memory computing device suitable for binary convolutional neural network computation
CN111340208B (en) * 2020-03-04 2023-05-23 开放智能机器(上海)有限公司 Vectorization calculation depth convolution calculation method and device
CN112596912B (en) * 2020-12-29 2023-03-28 清华大学 Acceleration operation method and device for convolution calculation of binary or ternary neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005774A (en) * 2015-07-28 2015-10-28 中国科学院自动化研究所 Face relative relation recognition method based on convolutional neural network and device thereof
CN105354568A (en) * 2015-08-24 2016-02-24 西安电子科技大学 Convolutional neural network based vehicle logo identification method
CN105975931A (en) * 2016-05-04 2016-09-28 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling

Also Published As

Publication number Publication date
CN107153873A (en) 2017-09-12

Similar Documents

Publication Publication Date Title
CN107153873B (en) A kind of two-value convolutional neural networks processor and its application method
CN107203808B (en) A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor
JP7065877B2 (en) Chip equipment and related products
CN208061184U (en) Vector processor unit
CN105930902B (en) A kind of processing method of neural network, system
CN107578095B (en) Neural computing device and processor comprising the computing device
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
CN107578098A (en) Neural network processor based on systolic arrays
CN108009106A (en) Neural computing module
CN106951395A (en) Towards the parallel convolution operations method and device of compression convolutional neural networks
CN106875011A (en) The hardware structure and its calculation process of two-value weight convolutional neural networks accelerator
CN107918794A (en) Neural network processor based on computing array
CN107301456A (en) Deep neural network multinuclear based on vector processor speeds up to method
CN107704916A (en) A kind of hardware accelerator and method that RNN neutral nets are realized based on FPGA
CN108108811A (en) Convolutional calculation method and electronic equipment in neutral net
CN107862374A (en) Processing with Neural Network system and processing method based on streamline
WO2017163208A1 (en) In memory matrix multiplication and its usage in neural networks
CN106951962A (en) Compound operation unit, method and electronic equipment for neutral net
CN112084038B (en) Memory allocation method and device of neural network
CN107423816A (en) A kind of more computational accuracy Processing with Neural Network method and systems
CN110766127B (en) Neural network computing special circuit and related computing platform and implementation method thereof
CN110163356A (en) A kind of computing device and method
CN107085562A (en) A kind of neural network processor and design method based on efficient multiplexing data flow
CN108960414A (en) Method for realizing single broadcast multiple operations based on deep learning accelerator
CN108320018A (en) A kind of device and method of artificial neural network operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant