CN107153873B

CN107153873B - Binary convolutional neural network processor and method of using the same

Info

Publication number: CN107153873B
Application number: CN201710316252.9A
Authority: CN
Inventors: 韩银和; 许浩博; 王颖
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2017-05-08
Filing date: 2017-05-08
Publication date: 2018-06-01
Anticipated expiration: 2037-05-08
Also published as: CN107153873A

Abstract

The present invention provides a binary convolutional neural network processor, comprising: a to-be-calculated data storage device for storing elements of the data to be convolved in binary form and convolution kernel elements in binary form; a binary convolution device for performing a binary convolution operation on the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form; a data scheduling device for loading the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device; a pooling device for pooling the results obtained by convolution; and a normalization device for normalizing the pooled results.

Description

Binary convolutional neural network processor and method of using the same

Technical field

The present invention relates to the storage and scheduling of data in neural network model computation.

Background art

With the development of artificial intelligence technology, techniques involving deep neural networks, and convolutional neural networks in particular, have advanced rapidly in recent years and have been widely applied in fields such as image recognition, speech recognition, natural language understanding, weather forecasting, gene expression, content recommendation, and intelligent robotics.

A deep neural network can be understood as a computational model containing a large number of data nodes, each connected to other data nodes, with the connections between nodes represented by weights. As deep neural networks have developed, their complexity has continued to increase.

To balance complexity against computational performance, the reference Courbariaux M, Hubara I, Soudry D, et al. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1 [J]. arXiv preprint arXiv:1602.02830, 2016, proposes that a "binary convolutional neural network model" may be used to reduce the complexity of a conventional neural network. In a binary convolutional neural network, the weights, input data, and output data are all expressed in "binary form", that is, their magnitudes are approximated by "1" and "-1": for example, values greater than or equal to 0 are represented by "1" and values less than 0 by "-1". This reduces the bit width of the data operated on in the neural network and thus greatly reduces the required parameter capacity, which makes binary convolutional neural networks particularly suitable for implementing image recognition, augmented reality, and virtual reality on end devices.

In the prior art, deep neural networks are generally run on general-purpose computer processors such as central processing units (CPUs) and graphics processing units (GPUs). However, there is no dedicated processor for binary convolutional neural networks. The computing units of a general-purpose processor are typically multiple bits wide, so using them to compute a binary neural network wastes resources.

Summary of the invention

Therefore, an object of the present invention is to overcome the above defects of the prior art by providing a binary convolutional neural network processor, comprising:

a to-be-calculated data storage device for storing elements of the data to be convolved in binary form and convolution kernel elements in binary form;

a binary convolution device for performing a binary convolution operation on the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form;

a data scheduling device for loading the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device;

a pooling device for pooling the results obtained by convolution; and

a normalization device for normalizing the pooled results.

Preferably, in the binary convolutional neural network processor, the binary convolution device comprises:

an XNOR gate, which takes as its inputs the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form;

an accumulation device, which takes the output of the XNOR gate as its input and accumulates the output of the XNOR gate to output the result of the binary convolution operation;

wherein the accumulation device comprises an OR gate and/or a Hamming weight calculation unit, wherein

at least one input of the OR gate is the output of the XNOR gate;

at least one input of the Hamming weight calculation unit is the output of the XNOR gate.

Preferably, in the binary convolutional neural network processor, the to-be-calculated data storage device is further used to store, online, the convolution kernel and/or the data to be convolved obtained through binary conversion.

Preferably, the binary convolutional neural network processor further comprises:

a binarization device for converting the obtained convolution kernel and/or data to be convolved into binary form.

Preferably, in the binary convolutional neural network processor, a register is provided in the data scheduling device, into which convolution kernel elements that need to be reused are loaded during use.

Preferably, in the binary convolutional neural network processor according to any one of the above, the elements of the data to be convolved and the convolution kernel elements are stored in the to-be-calculated data storage device in a layer-interleaved manner.

Preferably, in the binary convolutional neural network processor, the elements of the data to be convolved are stored in the to-be-calculated data storage device according to the size of the convolution kernel and the order in which the elements of the data to be convolved participate in the calculation during the convolution operation.

Preferably, in the binary convolutional neural network processor, the elements of the data to be convolved and/or the convolution kernel elements are stored in the to-be-calculated data storage device in a manner that satisfies one or more of the following:

they are stored in the order in which the matrices of the convolution kernel and of the data to be convolved are arranged;

elements at the same position but in different channels of the matrix of the convolution kernel and/or of the data to be convolved are stored contiguously in multiple consecutive storage units;

all elements of the same weight in the same convolution kernel, and/or all elements of a submatrix of the same data to be convolved on which the convolution operation is performed, are stored in multiple consecutive storage units of the storage device.

The present invention further provides a method of using the binary convolutional neural network processor according to any one of the above, comprising:

1) loading the data to be convolved from the to-be-calculated data storage device into a register;

2) loading the data to be convolved in the register, together with the elements in the to-be-calculated data storage device that need to be multiplied by the data to be convolved, into the binary convolution device to perform the binary convolution operation;

3) pooling the output of the binary convolution device by means of the pooling device;

4) normalizing the output of the pooling device by means of the normalization device.

The present invention further provides a computer-readable storage medium storing a computer program which, when executed, implements the above method.

Compared with the prior art, the advantages of the present invention are as follows:

A simplified hardware structure for performing convolution operations is provided, together with a binary convolutional neural network processor based on that structure and a corresponding calculation method. By reducing the bit width of the data involved in the calculation, the invention improves computational efficiency and reduces storage capacity and energy consumption.

Brief description of the drawings

Embodiments of the present invention are further described below with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the multilayer structure of a neural network;

Fig. 2 is a schematic diagram of a convolution calculation in two-dimensional space;

Fig. 3 is a schematic diagram of the hardware structure of a binary convolution device according to one embodiment of the present invention;

Fig. 4 is a schematic diagram of the hardware structure of a binary convolution device according to another embodiment of the present invention;

Fig. 5 is a schematic diagram of the hardware structure of a binary convolution device according to yet another embodiment of the present invention;

Figs. 6a to 6c are schematic diagrams of the hardware structure of binary convolution devices of the present invention that use a Hamming weight calculation element;

Fig. 7 is a schematic diagram of storing multi-channel convolution kernels, namely weight 0 and weight 1, and the data to be convolved according to one embodiment of the present invention;

Fig. 8 is a schematic diagram of the structure of a binary convolutional neural network processor according to one embodiment of the present invention;

Fig. 9 is a schematic diagram of a calculation performed using the binary convolutional neural network processor according to one embodiment of the present invention;

Fig. 10 is a schematic diagram of a calculation performed using the binary convolutional neural network processor according to yet another embodiment of the present invention.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

In computer science, a neural network is a mathematical model that imitates the synaptic connection structure of biological nerves. Application systems built from neural networks can implement functions such as machine learning and pattern recognition.

Structurally, a neural network is divided into multiple layers; Fig. 1 shows a schematic diagram of such a multilayer structure. Referring to Fig. 1, the first layer is the input layer, the last layer is the output layer, and the remaining layers are hidden layers. When the neural network is used, an original image, i.e., the input-layer map, is fed to the input layer (in the present invention, "image" and "map" refer to the raw data to be processed, not merely to images in the narrow sense obtained by taking photographs). Each layer of the neural network processes the map it receives and passes the result to the next layer, and the output of the output layer is finally taken as the result.

As described above, to cope with the increasingly complex structure of neural networks, the prior art has proposed the concept of a binary convolutional neural network. As its name suggests, the computation of a binary convolutional neural network includes a "convolution" operation on the input data, and it also includes operations such as "pooling", "normalization", and "binarization".

Convolution is an important operation in a binary convolutional neural network. The calculation process of "convolution" is described in detail below with reference to Fig. 2.

Fig. 2 shows the process of convolving a 5×5 "binary" image with a 3×3 "binary" convolution kernel in two-dimensional space. Referring to Fig. 2, first, for the elements of the image in rows 1 to 3 (counted from top to bottom) and columns 1 to 3 (counted from left to right), each element is multiplied by the corresponding element of the convolution kernel. For example, the element in row 1, column 1 of the convolution kernel (denoted "convolution kernel (1,1)") is multiplied by the element in row 1, column 1 of the image (denoted "image (1,1)") to obtain 1 × 1 = 1; convolution kernel (1,2) is multiplied by image (1,2) to obtain 1 × 0 = 0; similarly, convolution kernel (1,3) multiplied by image (1,3) gives 1 × 1 = 1; and so on, until 9 results have been calculated. These 9 results are added to obtain 1+0+1+0+1+0+0+0+1 = 4, which is the element in row 1, column 1 of the convolution result, i.e., convolution result (1,1). Similarly, convolution kernel (1,1) is multiplied by image (1,2), convolution kernel (1,2) by image (1,3), convolution kernel (1,3) by image (1,4), convolution kernel (2,1) by image (2,2), and so on, to calculate 1+0+0+1+0+0+0+1 = 3 as convolution result (1,2). In this way, the 3×3 convolution result matrix illustrated in Fig. 2 can be calculated.
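The sliding-window multiply-and-sum described above can be sketched in Python as follows. This is a minimal illustration, not the processor's implementation: the function name `convolve2d` and the matrix values are assumptions, chosen so that the first two outputs match the sums 4 and 3 quoted in the text; the actual contents of Fig. 2 are not reproduced here.

```python
def convolve2d(image, kernel):
    """Slide a square `kernel` over `image` (stride 1, no padding) and
    sum the element-wise products in each window."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    result = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[top + i][left + j]
            row.append(total)
        result.append(row)
    return result


# Assumed 0/1 example values (hypothetical, not taken from Fig. 2 itself).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

result = convolve2d(image, kernel)
print(result[0][0], result[0][1])
```

A 5×5 input and a 3×3 kernel give a 3×3 result, as stated in the text.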

The convolution result obtained as illustrated in Fig. 2 is buffered and binarized, and then input into the next layer of the binary convolutional neural network.

The above example shows the "multiply" and "add" (or "accumulate and sum") operations involved in the convolution calculation.

The inventors realized that, owing to characteristics specific to binary multiplication, the "multiplication" in a binary convolution operation can be replaced by an XNOR operation; that is, the "multiplication" that in the prior art requires a multiplier can be performed using only an XNOR logic gate. Convolution based on binary values is clearly simpler than conventional convolution: it requires no multiplication as complex as "2 × 4". When "multiplication" is performed, if any of the elements being multiplied is "0", the result is "0", and if all of the elements being multiplied are "1", the result is "1".

The principle by which an XNOR gate can replace the multiplier in the present invention is described in detail below by way of a specific example.

When binarized convolution is used in practice, the non-binary values z in the image and the convolution kernel may first be binarized, that is:

$$\mathrm{Binarize}(z) = \begin{cases} 1, & z \ge 0 \\ -1, & z < 0 \end{cases}$$

where values z greater than or equal to 0 are binarized to "1", which is represented by the symbol "1" used for the convolution operation in Fig. 2, and values z less than 0 are binarized to "-1", which is represented by the symbol "0" used for the convolution operation in Fig. 2.

An XNOR operation is performed on the binarized values of the image and the convolution kernel, that is, $F = \overline{A \oplus B}$, which gives the following cases:

Input A	Input B	Output F	Symbol
-1	-1	1	1
-1	1	-1	0
1	-1	-1	0
1	1	1	1

As the above truth table shows, when "multiplication" is performed on binarized values, the multiplier can be replaced by an XNOR logic gate, which performs the XNOR operation. As is known in the art, a multiplier is far more complex than an XNOR gate.
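The binarization rule and the truth table can be checked with a short Python sketch. The helper names (`binarize`, `to_symbol`, `to_value`, `xnor`) are hypothetical and introduced only for illustration; the encoding of "-1" as the symbol "0" follows the description above.

```python
def binarize(z):
    """Map a real value to +1 (z >= 0) or -1 (z < 0)."""
    return 1 if z >= 0 else -1


def to_symbol(value):
    """Encode a binarized value as a single bit: +1 -> 1, -1 -> 0."""
    return 1 if value == 1 else 0


def to_value(symbol):
    """Decode a single bit back to its binarized value: 1 -> +1, 0 -> -1."""
    return 1 if symbol == 1 else -1


def xnor(a_bit, b_bit):
    """XNOR of two single bits: 1 when the bits are equal, else 0."""
    return 1 - (a_bit ^ b_bit)


# Rebuild the truth table: (Input A, Input B, Output F, Symbol).
table = []
for a in (-1, 1):
    for b in (-1, 1):
        f_symbol = xnor(to_symbol(a), to_symbol(b))
        table.append((a, b, to_value(f_symbol), f_symbol))

for row in table:
    print(row)
```

In every row the decoded XNOR output equals the arithmetic product A × B, which is why the gate can stand in for the multiplier.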

Therefore, the inventors consider that replacing the multipliers of a conventional processor with XNOR logic gates can greatly reduce the complexity of the components used in a processor for binary convolutional neural networks.

In addition, the inventors also realized that, owing to the characteristics of binary addition, the "addition" in the above binary convolution operation can be replaced by an OR operation; that is, an OR logic gate can replace the adder used in the prior art. This is because the result of applying the OR operation to the outputs of the XNOR gates can be expressed as G = F₁ + F₂ + … + Fₙ (here "+" denotes the logical OR), with a single-bit result G as the final output, where Fₖ denotes the output of the k-th XNOR gate and n denotes the total number of XNOR gates whose outputs serve as inputs to the OR gate.

Based on the above analysis, the present invention provides a binary convolution device usable in a binary convolutional neural network processor. By exploiting the characteristics of binary multiplication and addition, it simplifies the hardware used in the processor to perform convolution operations, thereby increasing the speed of convolution and reducing the overall energy consumption of the processor.

Fig. 3 shows the hardware structure of a binary convolution device according to one embodiment of the present invention. As shown in Fig. 3, the binary convolution device includes 9 XNOR gates and 1 OR gate, and the outputs of all 9 XNOR gates serve as inputs to the OR gate. During a convolution operation, the XNOR gates compute n₁×w₁, n₂×w₂, …, n₉×w₉ respectively, yielding outputs F₁ to F₉; the OR gate takes F₁ to F₉ as its inputs and outputs the first element G₁ of the convolution result. Similarly, the same convolution kernel is applied to other regions of the image to obtain the values of the other elements of the convolution result, which is not described again here.
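The structure of Fig. 3 can be modelled in software as nine independent XNOR evaluations followed by one OR reduction. The sketch below is illustrative only; the function name `binary_conv_parallel` and the input bit patterns are assumptions, with bits encoded as 0/1 symbols.

```python
def xnor(a_bit, b_bit):
    """XNOR of two single bits: 1 when the bits are equal, else 0."""
    return 1 - (a_bit ^ b_bit)


def binary_conv_parallel(n_bits, w_bits):
    """Model of nine XNOR gates feeding one multi-input OR gate.

    Returns the single-bit OR result G and the list of XNOR outputs F.
    """
    f_outputs = [xnor(n, w) for n, w in zip(n_bits, w_bits)]
    g = int(any(f_outputs))
    return g, f_outputs


# Hypothetical window of data bits n1..n9 and kernel bits w1..w9.
n_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]
w_bits = [1, 1, 1, 0, 0, 1, 1, 0, 0]

g1, f_outputs = binary_conv_parallel(n_bits, w_bits)
print(f_outputs, g1)
```

In hardware the nine XNOR evaluations happen simultaneously; the list comprehension only models their combined result.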

In the embodiment shown in Fig. 3, multiple XNOR gates perform the multiplications in parallel, which increases the rate of the convolution calculation. It should be understood, however, that the hardware structure of the binary convolution device may also be varied in the present invention, as illustrated by several other embodiments below.

Fig. 4 shows the hardware structure of a binary convolution device according to another embodiment of the present invention. As shown in Fig. 4, the binary convolution device includes one XNOR gate, one OR gate, and one register. The register stores the output of the OR gate, and the value it stores is used as one input of the OR gate; the other input of the OR gate is the output of the XNOR gate. During a convolution operation, as time advances, n₁ and w₁, n₂ and w₂, …, n₉ and w₉ are supplied as the inputs of the XNOR gate at the first to ninth time steps respectively. At each time step, the XNOR gate outputs the corresponding F₁, F₂, …, F₉ as one input of the OR gate, and the result output by the OR gate at the previous time step, stored in the register, is used as the other input of the OR gate. For example, when the XNOR gate outputs F₁ (whose value equals n₁×w₁), the pre-stored symbol "0" is read from the register and supplied together with F₁ as the inputs of the OR gate, which outputs F₁; when the XNOR gate outputs F₂ (whose value equals n₂×w₂), F₁ is read from the register and supplied together with F₂ as the inputs of the OR gate, which outputs F₁+F₂; and so on, until the accumulated result G₁ for F₁ to F₉ is output.
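The time-multiplexed behaviour of Fig. 4 can be sketched as a loop in which a one-bit register holds the running OR. This is a simplified software model under assumed names (`binary_conv_serial`, `trace`) and hypothetical input bits, not a description of the actual circuit timing.

```python
def xnor(a_bit, b_bit):
    """XNOR of two single bits: 1 when the bits are equal, else 0."""
    return 1 - (a_bit ^ b_bit)


def binary_conv_serial(n_bits, w_bits):
    """Model of one XNOR gate, one two-input OR gate and a one-bit register.

    At each time step the XNOR output is ORed with the register content,
    and the OR output is written back to the register.
    """
    register = 0  # the pre-stored symbol "0"
    trace = []    # register content after each time step
    for n, w in zip(n_bits, w_bits):
        f = xnor(n, w)
        register = register | f
        trace.append(register)
    return register, trace


# Hypothetical data bits n1..n9 and kernel bits w1..w9.
n_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]
w_bits = [0, 1, 1, 0, 0, 1, 1, 0, 0]

g1, trace = binary_conv_serial(n_bits, w_bits)
print(trace, g1)
```

The final register value equals the OR of all nine XNOR outputs, so the serial device produces the same G₁ as the parallel one while using fewer gates and more time steps.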

In the embodiment shown in Fig. 4, increasing the reuse of the XNOR gate and the OR gate reduces the number of components used; moreover, this scheme uses an OR gate with only two inputs, so its hardware complexity is lower.

Fig. 5 shows the hardware structure of a binary convolution device according to yet another embodiment of the present invention. This embodiment is similar to that of Fig. 4 in using only one XNOR gate, one OR gate, and one register. The difference is that in Fig. 5 the output of the XNOR gate is stored in a register capable of holding multiple result bits at once, and each result in the register is used as an input of the OR gate. The embodiment is used in a manner similar to that of Fig. 4 in that the XNOR gate is reused; the difference is that in Fig. 5 the result output by the XNOR gate at each time step is saved in the multi-bit register, and after all of F₁ to F₉ have been obtained, the OR gate performs the OR operation to output G₁.

In the embodiments of Figs. 3, 4, and 5, an OR gate is used to perform the "addition" or "accumulation" function, and the inputs of the OR gate all come from the outputs of XNOR gates, so that the result finally output by the OR gate is a single-bit value. This simplifies the calculation and increases the operation speed. The hardware structure provided by this scheme is particularly suitable for a dedicated processor for binary neural networks, because a binary neural network represents its weights and data with the values "1" and "-1", and neural network computation involves a large number of multiplications and additions, so reducing the bit width of the operands effectively reduces computational complexity.

However, because the above scheme of using an OR gate for "addition" or "accumulation" is a single-bit calculation, it introduces a certain degree of error. For this reason, the present invention also provides an alternative scheme, in which a Hamming weight calculation element replaces the OR gate shown in Figs. 3, 4, and 5 to perform the "addition" or "accumulation" function. Figs. 6a to 6c show hardware structures using a Hamming weight calculation element. In this alternative scheme, the Hamming weight calculation element takes the outputs of the XNOR gates as its inputs and outputs the number of logic "1"s in that data, i.e., its Hamming weight. Like the OR gate scheme, this scheme simplifies the calculation, and it additionally achieves an exact summation.
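The difference between the OR gate and the Hamming weight element can be illustrated with a small Python sketch. The function names and input bits are hypothetical. The conversion `2 * matches - n` from a Hamming weight to a sum over ±1 values is a standard identity that is not stated in this document and is included only as an assumption-labelled aside.

```python
def xnor(a_bit, b_bit):
    """XNOR of two single bits: 1 when the bits are equal, else 0."""
    return 1 - (a_bit ^ b_bit)


def hamming_weight_conv(n_bits, w_bits):
    """Count the logic 1s among the XNOR outputs (their Hamming weight)."""
    return sum(xnor(n, w) for n, w in zip(n_bits, w_bits))


def to_value(symbol):
    """Decode a single bit to its binarized value: 1 -> +1, 0 -> -1."""
    return 1 if symbol == 1 else -1


# Hypothetical data bits n1..n9 and kernel bits w1..w9.
n_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]
w_bits = [1, 1, 1, 0, 0, 1, 1, 0, 0]

matches = hamming_weight_conv(n_bits, w_bits)   # exact count of 1s
or_result = int(matches > 0)                    # what a single OR gate reports

# Aside (assumption, not from the source): sum of the +1/-1 products.
signed_sum = 2 * matches - len(n_bits)
direct_sum = sum(to_value(n) * to_value(w) for n, w in zip(n_bits, w_bits))

print(matches, or_result, signed_sum)
```

The OR gate only reports whether any XNOR output is 1, whereas the Hamming weight keeps the full count, which is the sense in which the alternative scheme avoids the error of the single-bit result.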

The inventors found that, with the binary convolution device provided by the present invention, each "multiply" and "accumulate" calculation operates on single-bit data, and the output of the binary convolution device is also single-bit data. This property makes it particularly suitable to use a "layer-interleaved data mapping" to store and schedule the data participating in the convolution operation and the data obtained from it, thereby reducing the number of data loads and making full use of data locality to improve the data reuse rate.

In the present invention, "layer-interleaved data mapping" means that the elements of the convolution kernel and of the data to be convolved are stored in turn, along the channel direction, into each row of the storage device. That is, data in the storage device are interleaved by layer, so that two adjacent data elements come from different channels rather than from the same channel. As shown in Fig. 7, convolution kernel elements and elements of the data to be convolved that lie at the same position on the z-axis correspond to the same "channel"; that is, elements with the same z value belong to the same channel.

To describe the data arrangement more concretely, Fig. 7 takes convolution kernels weight 0 and weight 1 of size (x, y, z) = 2*2*2 and data to be convolved of size (x, y, z) = 2*3*2 as an example to illustrate the layer-interleaved data mapping provided by the present invention for binary convolutional neural networks. Referring to Fig. 7, the elements of weight 0 and weight 1 are each divided into four groups according to their spatial position: the four groups of weight 0 are A_z, B_z, C_z, and D_z, and the four groups of weight 1 are a_z, b_z, c_z, and d_z, where, as shown in the figure, z is 0 or 1.

Referring to Fig. 7, according to one embodiment of the present invention, the elements of convolution kernel weight 0, convolution kernel weight 1, and the data to be convolved may be stored as follows.

In Fig. 7, for ease of description, the elements of the three-dimensional matrices of weight 0 and weight 1 are divided, according to the size and stride of each convolution kernel, into two two-dimensional matrices by channel. For example, weight 0 is divided into the two-dimensional matrix formed by A₀, B₀, C₀, D₀ and the two-dimensional matrix formed by A₁, B₁, C₁, D₁. Similarly, the elements of the three-dimensional matrix of the data to be convolved are divided into two two-dimensional matrices by channel, namely the two-dimensional matrix formed by X₀, Y₀, Z₀, P₀, Q₀, R₀ and the two-dimensional matrix formed by X₁, Y₁, Z₁, P₁, Q₁, R₁.

When convolution kernel weight 0 is stored, the elements A₀, A₁, B₀, B₁, C₀, C₁, D₀, and D₁ of weight 0 are stored in sequence in one row of consecutive storage units of the weight storage device, 8 bits in total. In the storage units, two adjacent elements come from different channels: for example, A₀ and A₁ come from different channels, as do A₁ and B₀. This is the layer-interleaved storage described above.

When convolution kernel weight 1 is stored, the elements a₀, a₁, b₀, b₁, c₀, c₁, d₀, and d₁ of weight 1 are stored in sequence in another row of consecutive storage units of the weight storage device, 8 bits in total. As with weight 0, two adjacent elements come from different channels.

In the weight storage device, weight elements at the same x and y coordinates (for example A₀ and A₁) are stored one after another as adjacent elements. Once the elements at one (x, y) position have been stored, the next group of weight elements sharing the same x and y coordinates (for example B₀ and B₁) is stored, and so on, until the remaining weight elements of the convolution kernel have been stored.
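The ordering described above (position first, channel second) can be sketched as a flattening function. The function name `interleave_by_layer` and the nested-list representation `channels[z][y][x]` are assumptions made for illustration; the placement of A, B on the first row and C, D on the second is inferred from the pairings given in the text rather than read from Fig. 7.

```python
def interleave_by_layer(channels):
    """Flatten channels[z][y][x] so that all channels of one (y, x)
    position are adjacent, then move on to the next position."""
    depth = len(channels)
    rows = len(channels[0])
    cols = len(channels[0][0])
    return [
        channels[z][y][x]
        for y in range(rows)
        for x in range(cols)
        for z in range(depth)
    ]


# Weight 0 and weight 1 as two 2x2 channel planes each (labels only).
weight0 = [
    [["A0", "B0"], ["C0", "D0"]],  # channel z = 0
    [["A1", "B1"], ["C1", "D1"]],  # channel z = 1
]
weight1 = [
    [["a0", "b0"], ["c0", "d0"]],
    [["a1", "b1"], ["c1", "d1"]],
]

weight0_row = interleave_by_layer(weight0)
weight1_row = interleave_by_layer(weight1)
print(weight0_row)
print(weight1_row)
```

Each returned list corresponds to one row of consecutive storage units in the weight storage device.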

When the data to be convolved are stored, the data elements may be stored according to the size of the convolution kernel and the order in which they participate in the calculation during the convolution operation. From the convolution rule illustrated in Fig. 2, the calculation is performed first on A_z·X_z, B_z·Y_z, C_z·P_z, and D_z·Q_z, and then on A_z·Y_z, B_z·Z_z, C_z·Q_z, and D_z·R_z. Therefore, when the elements of the data to be convolved are stored, the rule of the convolution calculation should be taken into account in addition to layer interleaving, so that the data elements are stored in the order in which they participate in the calculation: for example, X_z, Y_z, P_z, Q_z are stored in one row or column of consecutive storage units, and Y_z, Z_z, Q_z, R_z in another row or column of consecutive storage units.

Referring to Fig. 7, X₀, X₁, Y₀, Y₁, P₀, P₁, Q₀, Q₁ are stored in sequence in one row of consecutive storage units of the data storage device, and Y₀, Y₁, Z₀, Z₁, Q₀, Q₁, R₀, R₁ are stored in sequence in another row of consecutive storage units of the data storage device.
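The two data rows above can be generated by sliding a kernel-sized window over the data and interleaving each window by channel. The sketch below assumes a stride of 1 and a hypothetical function name `window_rows`; the 2-row by 3-column arrangement of X, Y, Z over P, Q, R is inferred from the pairings listed in the text.

```python
def window_rows(data, k_rows, k_cols, stride=1):
    """For each kernel-sized window of data[z][y][x], emit one storage row
    holding the window's elements in layer-interleaved order."""
    depth = len(data)
    rows = len(data[0])
    cols = len(data[0][0])
    storage = []
    for top in range(0, rows - k_rows + 1, stride):
        for left in range(0, cols - k_cols + 1, stride):
            storage.append([
                data[z][top + dy][left + dx]
                for dy in range(k_rows)
                for dx in range(k_cols)
                for z in range(depth)
            ])
    return storage


# Data to be convolved: two channels, each 2 rows by 3 columns (labels only).
data = [
    [["X0", "Y0", "Z0"], ["P0", "Q0", "R0"]],  # channel z = 0
    [["X1", "Y1", "Z1"], ["P1", "Q1", "R1"]],  # channel z = 1
]

storage = window_rows(data, k_rows=2, k_cols=2)
for row in storage:
    print(row)
```

Elements shared by overlapping windows (here Y and Q) appear in both rows, so each row can be paired directly with a stored weight row.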

As with the storage of convolution kernel elements, in the data storage device the data elements at the same x and y coordinates (for example X₀ and X₁) are grouped together and stored one after another as adjacent elements. Once the elements at one (x, y) position have been stored, the next group of data elements sharing the same x and y coordinates (for example Y₀ and Y₁) is stored, and so on, until the remaining data elements of the submatrix of the data to be convolved that matches the size of the convolution kernel (marked with a dashed line in Fig. 7) have been stored.

Although the convolution kernels and the data to be convolved in the example of Fig. 7 have 2 channels, it should be understood that, in the present invention, convolution kernels and data to be convolved with more than 2 channels can likewise be stored in the layer-interleaved manner.

Preferably, during storage, multiple consecutive storage units of the storage device are filled in sequence; that is, the elements are stored in the storage device in the order in which the matrices of the convolution kernel and of the data to be convolved are arranged.

Preferably, elements at the same position but in different channels of the matrix of the convolution kernel and/or of the data to be convolved are stored contiguously in multiple consecutive storage units of the storage device.

Preferably, all elements of the same weight in the same convolution kernel, and/or all elements of a submatrix of the same data to be convolved on which the convolution operation is performed, are stored in multiple consecutive storage units of the storage device.

In Fig. 7, for ease of description, the weight storage device and the data storage device are shown as separate storage devices. It should be understood, however, that in the present invention the weight storage device and the data storage device may be located on different memories, or in different regions of the same memory, for example both in the to-be-calculated data storage device.

Those skilled in the art will also appreciate that the storage described in the above embodiments may be completed offline, outside the processor, before the binary neural network computation, or online on the processor, for example on-chip in the processor; alternatively, it may be implemented as a computer program that is executed by the processor.

Storing the elements of each convolution kernel and of the data to be convolved using the layer-interleaved data mapping of the present invention reduces the number of data loads and improves data reuse.

It should also be understood that the purpose of storing the convolution kernel elements and the corresponding elements of the data to be convolved using the above "layer-interleaved data mapping" is to facilitate reading, so that the inputs of the binary convolution device can be determined quickly and easily. Therefore, any scheme that establishes a mapping between the storage locations of the convolution kernel elements and the storage locations of the corresponding elements of the data to be convolved may be used to store them.

For example, when a row of consecutive storage units is shorter than 8 bits, say only 4 bits, A₀, A₁, B₀, B₁, C₀, C₁, D₀, and D₁ of weight 0 are stored folded: A₀, A₁, B₀, B₁ are stored in one row of consecutive storage units, and C₀, C₁, D₀, D₁ in another row of consecutive storage units.
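Folded storage amounts to splitting the interleaved sequence into rows of the available width. A minimal sketch, with the hypothetical helper name `fold_rows`:

```python
def fold_rows(elements, row_width):
    """Split an interleaved element sequence into consecutive rows of at
    most `row_width` storage units each."""
    return [
        elements[start:start + row_width]
        for start in range(0, len(elements), row_width)
    ]


weight0_row = ["A0", "A1", "B0", "B1", "C0", "C1", "D0", "D1"]
folded = fold_rows(weight0_row, row_width=4)
for row in folded:
    print(row)
```

Because the split preserves order, channel pairs such as A₀ and A₁ stay adjacent as long as the row width is a multiple of the channel count.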

When the convolution kernel elements and the corresponding elements of the data to be convolved stored in this way are used for a convolution operation, the operation is well suited to single-instruction, multiple-data (SIMD) execution, in which a single instruction loads multiple stored data items into the arithmetic unit. The method of loading and computing on the stored data is described in detail in a subsequent embodiment. In this way, the bit width of the computing unit and its hardware overhead can be reduced.
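As a software analogy for processing one stored row at a time, the bits of a weight row and a data row can each be packed into a single integer, so that one XNOR over the whole word yields all products at once. This is a sketch under assumptions: the helper names, the 8-bit row width, and the bit values are hypothetical, and the document's own SIMD loading method is described in a later embodiment rather than here.

```python
def pack_bits(bits):
    """Pack a list of 0/1 symbols into one integer, first element as MSB."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


def xnor_word(weight_word, data_word, width):
    """Bitwise XNOR of two packed rows, masked to `width` bits."""
    mask = (1 << width) - 1
    return ~(weight_word ^ data_word) & mask


def popcount(word):
    """Hamming weight of a non-negative integer."""
    return bin(word).count("1")


# Hypothetical contents of one 8-bit weight row and one 8-bit data row.
weight_row = [1, 0, 1, 1, 0, 0, 1, 0]
data_row = [1, 1, 1, 0, 0, 1, 1, 0]

matches_word = xnor_word(pack_bits(weight_row), pack_bits(data_row), width=8)
hamming = popcount(matches_word)     # Hamming weight accumulation
or_bit = int(matches_word != 0)      # OR gate accumulation

print(format(matches_word, "08b"), hamming, or_bit)
```

Either accumulation style from the earlier embodiments can then be applied to the packed XNOR result in a single step.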

Combining the binary convolution device described above with the described scheme for storing and fetching the elements of the convolution kernel and of the data to be convolved yields a dedicated processor for binary convolutional neural networks that has a small computing-unit bit width and a relatively simple hardware structure.

With reference to Fig. 8, according to one embodiment of the present invention, a binary convolutional neural network processor 10 is provided, comprising:

a data scheduling device 101, a to-be-calculated data storage device 102, a binary convolution device 103, a pooling device 104, a normalization device 105, and a binarization device 106.

The to-be-calculated data storage device 102 stores the convolution kernel elements in binary form and the data to be convolved in binary form. As described above, the storage scheme should reflect the mapping relationship between the elements of the convolution kernel used for the convolution calculation and the corresponding elements of the data to be convolved. For example, the convolution kernel elements and the data to be convolved are stored in the layer-interleaved manner, and the data to be convolved is stored according to the size of the convolution kernel and the elements of the data to be convolved that participate in the calculation in turn during the convolution operation. For the specific storage scheme, refer to the foregoing embodiments.

The data scheduling device 101 loads, according to the mapping relationship, the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device. For example, a register is provided in the data scheduling device 101, and the convolution kernel elements that need to be reused are loaded into the register during use.

The binary convolution device 103 performs a binary convolution operation on the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form. The binary convolution device 103 may adopt any of the structures in the foregoing embodiments: an XNOR gate implements the multiplication of a convolution kernel element by the corresponding element of the data to be convolved, and an OR gate or a Hamming weight calculation unit implements the accumulation of the multiplication results.

The pooling device 104 performs pooling on the results obtained by the convolution.

The normalization device 105 normalizes the pooled results, in order to accelerate the parameter training process of the neural network.
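
The XNOR multiplication and Hamming-weight accumulation attributed to the binary convolution device 103 can be sketched in software as follows. This is a sketch under assumptions, not the circuit itself: all function names are hypothetical, only the Hamming-weight variant of the accumulation is shown, and the mapping of bit 1 to +1 and bit 0 to -1 used in `signed_dot` is a common convention that the text does not state.

```python
def xnor(w, x, width):
    """Bitwise XNOR of two width-bit words: 1 wherever the bits agree."""
    mask = (1 << width) - 1
    return ~(w ^ x) & mask


def hamming_weight(value):
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def binary_dot(w, x, width):
    """Count of agreeing bit positions, i.e. the accumulated XNOR results."""
    return hamming_weight(xnor(w, x, width))


def signed_dot(w, x, width):
    """Dot product under the assumed encoding 1 -> +1, 0 -> -1."""
    # Each agreeing position contributes +1 and each disagreeing one -1.
    return 2 * binary_dot(w, x, width) - width
```

For example, for the 4-bit words `0b1010` and `0b1100` the XNOR result is `0b1001`, so two positions agree and the signed sum under the assumed encoding is zero.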

In some embodiments of the present invention, the convolution kernel and/or the data to be convolved used for the binary convolution operation may be obtained online from a data source. Since the data so obtained is not necessarily binarized, in such embodiments a binarization device 106 may also be provided in the binary convolutional neural network processor 10 to convert the obtained data into binary form. The binarized data may also be stored online by the to-be-calculated data storage device 102.

It should be understood that, for embodiments in which the convolution kernel and/or the data to be convolved is stored offline in the to-be-calculated data storage device 102 in advance, before the convolutional neural network calculation is performed, the binarization device 106 need not be provided in the binary convolutional neural network processor 10.
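
For embodiments in which the pooling device 104, the normalization device 105 and the binarization device 106 are all present, the way their operations chain together can be sketched as follows. The text does not specify the pooling type, the normalization formula or the binarization threshold; the one-dimensional max pooling, the affine normalization with supplied statistics, and the threshold at zero used here are all assumptions, as are the function names.

```python
def max_pool_1d(values, window):
    """Non-overlapping max pooling (assumed pooling type)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def normalize(values, mean, std, gamma=1.0, beta=0.0):
    """Affine normalization with externally supplied statistics (assumed form)."""
    if std <= 0:
        raise ValueError("std must be positive")
    return [gamma * (v - mean) / std + beta for v in values]


def binarize(values):
    """Map each value to one bit: 1 if non-negative, else 0 (assumed threshold)."""
    return [1 if v >= 0 else 0 for v in values]


# Hypothetical accumulator outputs for one output row.
conv_out = [3, 5, 2, 8, 1, 1]
pooled = max_pool_1d(conv_out, 2)
normed = normalize(pooled, mean=4.0, std=2.0)
bits = binarize(normed)
```

The final list of bits has the same 1-bit element format as the stored data to be convolved, which is what allows the output of one layer to be stored for the next.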

The process of performing calculation with the binary convolutional neural network processor 10 shown in Fig. 8 is described in detail below through specific embodiments, with reference to Fig. 9 and Fig. 10.

Fig. 9 shows, according to one embodiment of the present invention, the process of performing calculation with the above binary convolutional neural network processor. Fig. 9 uses the same symbols as Fig. 7 to denote the convolution kernel elements and the elements of the data to be convolved, for example X₀, X₁, A₀, A₁, etc. In the weight storage matrix, one storage word stores, as one row, all of the convolution kernel elements in the same channel; as shown in the figure, the storage word bit width is 8 bits and each element occupies 1 bit. Similarly, the bit width of one storage word in the matrix of data to be convolved is also 8 bits. In addition, in Fig. 9 the bit width of the XNOR gate and of the register group is 2 bits. During the calculation, the principle is followed that data belonging to the same convolution kernel is accumulated in the same accumulator. The calculation proceeds as follows:

Step 1, load the upper two bits of the data to be convolved (i.e. X₀ and X₁) into the register group;

As can be understood with reference to the convolution principle shown in Fig. 2, in Fig. 9 the elements X₀ and X₁ of the data to be convolved are used repeatedly, to calculate A₀X₀, B₀X₀, A₁X₁ and B₁X₁ in subsequent steps; these 2 bits of the data to be convolved therefore need to be held in the register.

Step 2, load the data to be convolved held in the register group and the first two bits of weight data in the first row of the weight matrix (A₀ and A₁) into the XNOR gate;

Step 3, have the addition unit perform an OR operation on the XNOR results or calculate their Hamming weight;

As described above, the OR operation or the Hamming weight calculation achieves the effect of "addition"; in this step, A₀X₀ and A₁X₁ can be calculated.

Step 4, input the calculation result of the addition unit into accumulator 0;

Accumulator 0 accumulates the data belonging to the same convolution kernel.

Step 5, load the data to be convolved held in the register group and the first two bits of weight data in the second row of the weight matrix (a₀ and a₁) into the XNOR gate;

Step 6, have the addition unit perform an OR operation on the XNOR results or calculate their Hamming weight, yielding a₀X₀ and a₁X₁;

Step 7, input the calculation result of the addition unit into accumulator 1, and so on, calculating X₀ and X₁ in turn with the first two weight bits of each of the eight specified rows of the weight storage array;

Step 8, similarly to the preceding steps, load the third and fourth bits of the data to be convolved (Y₀ and Y₁) into the register group;

Step 9, load the data to be convolved held in the register group and the third and fourth bits of weight data in the first row of the weight matrix (B₀ and B₁) into the XNOR gate;

Step 10, have the addition unit perform an OR operation on the XNOR results or calculate their Hamming weight;

Step 11, input the calculation result of the addition unit into accumulator 1; thereafter, similarly to steps 5 and 7, calculate b₀ and b₁ and the other data located in the same columns in turn with Y₀ and Y₁;

Step 12, when the accumulators have obtained the data of an output layer, load the accumulator results into the buffer unit;

Step 13, after the buffer unit has obtained the complete data of the output layer, load the output data to be convolved into the pooling unit for the pooling operation;

Step 14, load the result of the pooling operation into the batch normalization unit for the batch normalization operation;

Step 15, load the result of the batch normalization into the binarization unit for the binarization operation.
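
The load-and-accumulate portion of the steps above can be sketched as a small software model. It is a sketch under stated assumptions rather than a description of the circuit: it uses the Hamming-weight variant of the addition unit, it applies the stated principle that data of the same convolution kernel accumulates in the same accumulator by giving each weight row its own accumulator, and the names `field` and `accumulate_rows` are hypothetical.

```python
def hamming_weight(value):
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def field(word, word_width, offset, width):
    """Return `width` bits of `word`, starting `offset` bits from the top."""
    return (word >> (word_width - offset - width)) & ((1 << width) - 1)


def accumulate_rows(weight_rows, data_word, word_width=8, lane_width=2):
    """Accumulate XNOR results per weight row, lane_width bits at a time."""
    if word_width % lane_width != 0:
        raise ValueError("word_width must be a multiple of lane_width")
    mask = (1 << lane_width) - 1
    accumulators = [0] * len(weight_rows)  # one accumulator per weight row
    for offset in range(0, word_width, lane_width):
        # Hold the current bits of the data to be convolved in the register,
        # so they can be reused against every weight row.
        register = field(data_word, word_width, offset, lane_width)
        for row, weight_word in enumerate(weight_rows):
            weights = field(weight_word, word_width, offset, lane_width)
            products = ~(weights ^ register) & mask  # XNOR as multiplication
            accumulators[row] += hamming_weight(products)  # addition unit
    return accumulators


# Two hypothetical 8-bit weight rows and one 8-bit word of data to be convolved.
result = accumulate_rows([0b10110010, 0b01001101], 0b10110010)
```

In this example the first weight row equals the data word, so all eight positions agree, while the second row is its bitwise complement, so none do. The outer loop mirrors the reuse of the register contents across all weight rows before the next two bits of data are loaded.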

As can be seen, with the scheme described above, the mapping relationship between the storage locations of the convolution kernel elements and the storage locations of the corresponding elements of the data to be convolved makes it possible to quickly determine which elements need to be convolved and input them to the XNOR gate.

When the storage unit bit width is smaller than the matrix bit width shown in Fig. 9, the matrix can also be divided into blocks and folded in order to store the convolution kernel elements and the elements of the data to be convolved, as shown in Fig. 10. Likewise, Fig. 10 uses the same symbols as Fig. 7 to denote the convolution kernel elements and the elements of the data to be convolved. The difference from Fig. 9 is that, when data belonging to the same data block of the data to be convolved needs to be read, the position at which that data is stored in the register group must also be considered.

As can be seen from the embodiments of the present invention, the present invention, based on the characteristics of binarized computation, provides a simplified hardware structure for performing convolution operations, a binary convolutional neural network processor based on that structure, and a corresponding calculation method. By reducing the bit width of the data computed during the calculation, it improves computational efficiency and reduces storage capacity and energy consumption.

In addition, the present invention uses the layer-interleaved data mapping for data storage and calculation, which simplifies fetching the data to be convolved and the convolution kernel data during convolution, reduces hardware overhead, and improves data utilization.

It should be noted that not every step described in the above embodiments is required; those skilled in the art may select, replace or modify steps as actually needed.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail above with reference to embodiments, those of ordinary skill in the art will understand that modifications or equivalent replacements of the technical solutions of the present invention that do not depart from their spirit and scope shall all be covered by the claims of the present invention.

Claims

1. A binary convolutional neural network processor, comprising:

a to-be-calculated data storage device, for storing the elements of the data to be convolved in binary form and the convolution kernel elements in binary form;

a binary convolution device, for performing a binary convolution operation on the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form;

a data scheduling device, for loading the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device;

a normalization device, for normalizing the pooled results.

2. The binary convolutional neural network processor according to claim 1, wherein the binary convolution device comprises:

an XNOR gate, which takes as its inputs the convolution kernel elements in binary form and the corresponding elements of the data to be convolved in binary form;

an accumulation device, which takes the output of the XNOR gate as its input and accumulates the output of the XNOR gate to output the result of the binary convolution operation;

at least one input of the OR gate being the output of the XNOR gate;

3. The binary convolutional neural network processor according to claim 1, wherein the to-be-calculated data storage device is further configured to store online the convolution kernel and/or the data to be convolved obtained through binary conversion.

4. The binary convolutional neural network processor according to claim 3, further comprising:

5. The binary convolutional neural network processor according to claim 1, wherein a register is provided in the data scheduling device, for loading, during use, the convolution kernel elements that need to be reused.

6. The binary convolutional neural network processor according to any one of claims 1-5, wherein the elements of the data to be convolved and the convolution kernel elements are stored in the to-be-calculated data storage device in a layer-interleaved manner.

7. The binary convolutional neural network processor according to claim 6, wherein the elements of the data to be convolved are stored in the to-be-calculated data storage device according to the size of the convolution kernel and the elements of the data to be convolved that participate in the calculation in turn during the convolution operation.

8. The binary convolutional neural network processor according to claim 7, wherein the storage of the elements of the data to be convolved and/or of the convolution kernel elements in the to-be-calculated data storage device satisfies one or more of the following:

elements at the same position in different channels of the matrix of the convolution kernel and/or of the data to be convolved are stored consecutively in a plurality of contiguous storage units;

all elements of the same convolution kernel under the same weight, and/or all elements of a sub-matrix of the same data to be convolved that is used for the convolution operation, are stored in a plurality of contiguous storage units in the storage device.

9. A method of using the binary convolutional neural network processor according to any one of claims 1-8, comprising:

2) loading the data to be convolved held in the register, and the elements in the to-be-calculated data storage device that need to be multiplied with that data to be convolved, into the binary convolution device, to perform the binary convolution operation;

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed, implements the method according to claim 9.