CN107832841A - Power consumption optimization method and circuit for a neural network chip - Google Patents
Power consumption optimization method and circuit for a neural network chip
- Publication number
- CN107832841A (application number CN201711121900.1A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- unit
- power domain
- control unit
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Power Sources (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
The present invention provides a power consumption optimization method and circuit for a neural network chip. A separate power domain is provided for each convolution computing network layer, and a separate power domain is provided for each convolution operation unit; the data blocks of the matrix to be convolved are organized row by row, and the power supply of each row of data blocks is connected to a clock-gating unit. The n rows of data blocks of the matrix to be convolved are analyzed by a matrix analysis unit, and the analysis result, via a power consumption control unit, controls the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit, thereby switching each convolution computing network layer power domain, each convolution power domain and each clock-gating unit on or off. The present invention organizes the individual neuron processing units and the whole neural network layer into multi-level power domains that can each be dynamically switched off on demand, thereby effectively reducing the power consumed while the convolutional neural network circuit is computing.
Description
Technical field
The present invention relates to the field of chip technology, and in particular to a power consumption optimization method and circuit for a neural network chip.
Background art
With the rise of the AI industry, dedicated artificial intelligence chips are also developing rapidly. A major problem of current AI chips is that the complexity of deep learning neural networks makes the computing circuits very large, which in turn makes the chips costly and power-hungry. If the characteristics of deep learning can be exploited to further reduce the cost and power consumption of deep learning AI chips, the benefit would be considerable.
The information in the world is complex, but the information the human brain processes is sparse. We cannot directly process the cluttered input of the senses; a process of information extraction is needed, and in the human brain this process is called abstraction. Deep learning has spread so widely precisely because, to some extent, it simulates the brain's abstraction of information. In neuroscience, researchers have likewise observed that neuronal activity is sparse. In 2001, Attwell et al., based on observations of the brain's energy consumption, conjectured that the neural coding scheme is sparse and distributed. In 2003, Lennie et al. estimated that only 1-4% of the brain's neurons are active at the same time, further confirming the sparsity of neural activity. In signal terms, a neuron responds selectively to only a small fraction of its input signals, while a large number of signals are deliberately masked; this improves the precision of learning and extracts sparse features faster and better. Here, sparsity means that the mapped matrices contain many zero elements. For this reason, sparsity is a major characteristic of neural network computation, and the present invention proposes a targeted power consumption optimization method for the sparse matrix operations that arise during convolutional neural network computation. The method can effectively reduce the power consumed while the convolutional neural network circuit is computing.
Summary of the invention
The technical problem to be solved by the present invention is to provide a power consumption optimization method and circuit for a neural network chip, so as to effectively reduce the power consumed while the convolutional neural network circuit is computing.
The method of the present invention is realized as follows: a power consumption optimization method for a neural network chip, the neural network chip comprising a plurality of convolution computing network layers, each convolution computing network layer comprising a plurality of convolution operation units, each convolution operation unit being responsible for the operations on one full row of data blocks, corresponding to the convolution kernel height, of the matrix to be convolved; the matrix to be convolved comprises n rows of data blocks and is stored in a corresponding hidden-layer matrix storage unit. The power consumption optimization method is:
Step S1: a separate power domain, the convolution computing network layer power domain, is provided for each convolution computing network layer and connected to a convolution computing network layer power domain switch control unit;
a separate power domain, the convolution power domain, is provided for each convolution operation unit and connected to a convolution unit power domain switch control unit;
the data blocks of the matrix to be convolved are organized row by row, and the power supply of each row of data blocks is connected to a clock-gating unit; each clock-gating unit is in turn connected to the convolution unit clock switch control unit;
Step S2: the n rows of data blocks of the matrix to be convolved are analyzed by a matrix analysis unit, and the analysis result, via a power consumption control unit, controls the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit, thereby switching each convolution computing network layer power domain, each convolution power domain and each clock-gating unit on or off.
The process by which the matrix analysis unit analyzes the n rows of data blocks of the matrix to be convolved is:
(1) scan the matrix to be convolved block by block according to the size of the convolution kernel, judging one by one whether each data block in a full row is all zero; if a data block is all zero, mark that its clock can be closed;
(2) after all data blocks of a full row have been judged, make a further overall judgment on whether all data blocks of the row are all zero; if so, the clock of the whole row can be closed, and the convolution operation unit responsible for that full row of data blocks is marked as able to close its convolution power domain;
(3) finally, judge whether the entire matrix to be convolved is all zero; if so, mark that the whole convolution computing network layer power domain can be closed.
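In software terms, this three-level analysis can be sketched as follows (a minimal illustration only, not the claimed hardware; the function name and the layout of the marks are assumptions made for the sketch):

```python
import numpy as np

def analyze_matrix(m, k):
    """Three-level all-zero analysis of matrix `m` for a kxk convolution kernel.

    Returns:
      clock_off[i][j] - block j of row unit i is all zero (its clock may be gated)
      power_off[i]    - every block of row unit i is all zero (its power domain may close)
      layer_off       - the whole matrix is all zero (the layer power domain may close)
    """
    h, w = m.shape
    clock_off, power_off = [], []
    for i in range(h // k):                       # one row unit per convolution operation unit
        row = m[i * k:(i + 1) * k, :]             # one full row, k elements high
        marks = [not row[:, j * k:(j + 1) * k].any() for j in range(w // k)]
        clock_off.append(marks)                   # step (1): per-block clock marks
        power_off.append(all(marks))              # step (2): whole-row power-domain mark
    layer_off = not m.any()                       # step (3): whole-layer power-domain mark
    return clock_off, power_off, layer_off
```

For example, on an 8x8 matrix with a single nonzero element and k = 4, only the block containing that element keeps its clock; the other row unit may close its power domain, and the layer power domain stays on.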
The circuit of the present invention is realized as follows: a power consumption optimization circuit for a neural network chip, the neural network chip comprising a plurality of convolution computing network layers, each convolution computing network layer comprising a plurality of convolution operation units, each convolution operation unit being responsible for the operations on one full row of data blocks, corresponding to the convolution kernel height, of the matrix to be convolved; the matrix to be convolved comprises n rows of data blocks and is stored in a corresponding hidden-layer matrix storage unit.
The power consumption optimization circuit comprises power domain control circuits set in one-to-one correspondence with the plurality of convolution computing network layers. Each power domain control circuit comprises a matrix analysis unit, a power consumption control unit, a convolution computing network layer power domain switch control unit, a convolution unit power domain switch control unit, a convolution unit clock switch control unit, a convolution computing network layer power domain, n convolution power domains and n clock-gating units.
The matrix analysis unit is connected to the corresponding hidden-layer matrix storage unit and to the power consumption control unit; the power consumption control unit is connected to the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit; the convolution computing network layer power domain switch control unit is connected to the convolution computing network layer power domain; the convolution unit power domain switch control unit is connected to the n convolution power domains; the convolution unit clock switch control unit is connected to the n clock-gating units, and the n clock-gating units are connected to the n rows of data blocks, respectively.
Further, the matrix analysis units of the individual power domain control circuits may be merged into a single matrix analysis unit, and the power consumption control units of the individual power domain control circuits may be merged into a single power consumption control unit.
The present invention has the following advantage: the individual neuron processing units and the whole neural network layer are organized into multi-level power domains that can each be dynamically switched off on demand, thereby effectively reducing the power consumed while the convolutional neural network circuit is computing.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing and in conjunction with embodiments.
Fig. 1 is a schematic block diagram of the circuit of the present invention.
Detailed description of the embodiments
In the power consumption optimization method of the neural network chip of the present invention, the neural network chip comprises a plurality of convolution computing network layers, each convolution computing network layer comprises a plurality of convolution operation units, each convolution operation unit is responsible for the operations on one full row of data blocks, corresponding to the convolution kernel height, of the matrix to be convolved, and the matrix to be convolved comprises n rows of data blocks and is stored in a corresponding hidden-layer matrix storage unit. The power consumption optimization method is:
Step S1: a separate power domain, the convolution computing network layer power domain, is provided for each convolution computing network layer and connected to a convolution computing network layer power domain switch control unit;
a separate power domain, the convolution power domain, is provided for each convolution operation unit and connected to a convolution unit power domain switch control unit;
the data blocks of the matrix to be convolved are organized row by row, and the power supply of each row of data blocks is connected to a clock-gating unit; each clock-gating unit is in turn connected to the convolution unit clock switch control unit;
Step S2: the n rows of data blocks of the matrix to be convolved are analyzed by a matrix analysis unit, and the analysis result, via a power consumption control unit, controls the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit, thereby switching each convolution computing network layer power domain, each convolution power domain and each clock-gating unit on or off.
Wherein, the process by which the matrix analysis unit analyzes the n rows of data blocks of the matrix to be convolved is:
(1) scan the matrix to be convolved block by block according to the size of the convolution kernel, judging one by one whether each data block in a full row is all zero; if a data block is all zero, mark that its clock can be closed;
(2) after all data blocks of a full row have been judged, make a further overall judgment on whether all data blocks of the row are all zero; if so, the clock of the whole row can be closed, and the convolution operation unit responsible for that full row of data blocks is marked as able to close its convolution power domain;
(3) finally, judge whether the entire matrix to be convolved is all zero; if so, mark that the whole convolution computing network layer power domain can be closed.
Referring to Fig. 1, the neural network chip of the present invention comprises a plurality of convolution computing network layers, each convolution computing network layer comprises a plurality of convolution operation units, each convolution operation unit is responsible for the operations on one full row of data blocks, corresponding to the convolution kernel height, of the matrix to be convolved, and the matrix to be convolved comprises n rows of data blocks and is stored in a corresponding hidden-layer matrix storage unit. The neural network chip comprises a synapse input unit, input-layer convolution operation units and convolution kernels; each convolution computing network layer is further provided with an activation function operation unit, a pooling unit and a hidden-layer matrix storage unit.
The power consumption optimization circuit comprises power domain control circuits set in one-to-one correspondence with the plurality of convolution computing network layers. Each power domain control circuit comprises a matrix analysis unit, a power consumption control unit, a convolution computing network layer power domain switch control unit, a convolution unit power domain switch control unit, a convolution unit clock switch control unit, a convolution computing network layer power domain, n convolution power domains and n clock-gating units. In one specific embodiment, each power domain control circuit includes its own matrix analysis unit and power consumption control unit, but the invention is not limited to this: the matrix analysis units of the individual power domain control circuits may also be merged into a single matrix analysis unit, and the power consumption control units of the individual power domain control circuits may be merged into a single power consumption control unit.
The matrix analysis unit is connected to the corresponding hidden-layer matrix storage unit and to the power consumption control unit; the power consumption control unit is connected to the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit; the convolution computing network layer power domain switch control unit is connected to the convolution computing network layer power domain; the convolution unit power domain switch control unit is connected to the n convolution power domains; the convolution unit clock switch control unit is connected to the n clock-gating units, and the n clock-gating units are connected to the n rows of data blocks, respectively.
Wherein:
The synapse input unit is responsible for sending the values collected by the synapses to the input-layer convolution operation units.
The input-layer convolution operation units are responsible for convolving the synapse input data according to the convolution kernels and, once the convolution is completed, sending the convolution values to the activation function operation unit.
The activation function operation unit applies the activation function to the convolution values; owing to the characteristics of the activation function, the resulting data matrix is a sparse matrix.
The pooling unit is responsible for pooling the activated matrix and sending the resulting matrix to be convolved to the hidden-layer matrix storage unit for storage; the matrices produced by different convolution kernels are stored at different addresses, such as matrix A and matrix B in the figure.
The convolution computing network layer is responsible for the convolution operations of the corresponding neural network hidden layer and is a parallel processing structure, i.e. each convolution operation unit is responsible for the operations on one full row, corresponding to the convolution kernel height, of the matrix to be convolved.
A convolution kernel has a size, e.g. a 4x4 kernel. In a 320x180 image, if a full row is to be scanned, convolution kernel 1 is responsible for one full row of 320x4 in the image, and convolution kernel 2 for rows 5 to 8 of the image; this is what is meant by "one full row corresponding to the convolution kernel height". In the present invention, one row of the n rows of data blocks of the matrix to be convolved is exactly such a full row corresponding to the convolution kernel height.
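As a hedged numerical illustration of this partitioning (using the 320x180 image and 4x4 kernel from the example above; the helper name is an assumption, not part of the claimed circuit):

```python
def row_assignments(image_height, image_width, kernel_height):
    """List, per convolution operation unit, the image-row strip it handles.

    Unit i processes one full row of data blocks: image rows
    [i * kernel_height, (i + 1) * kernel_height), each image_width wide.
    """
    n = image_height // kernel_height  # n row units = n rows of data blocks
    return [(i * kernel_height, (i + 1) * kernel_height, image_width)
            for i in range(n)]

# A 320-wide, 180-high image with a 4x4 kernel yields n = 45 row units:
# unit 1 covers image rows 0-3 (a 320x4 strip), unit 2 covers rows 4-7, etc.
```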
The matrix analysis unit is responsible for analyzing the hidden-layer matrices in turn and determining, for the convolution to be performed on each matrix, whether the clock of a given data block can be closed, whether the power domain of the corresponding convolution operation unit can be closed, and whether the power domain of the convolution circuit of the whole neural network layer can be closed.
The power consumption control unit is responsible, when the convolution of a matrix begins, for controlling the convolution unit power domain switch control unit, the convolution unit clock switch control unit and the layer-wide neural network power domain switch control unit, so that the clock and power domain of each convolution operation unit and the layer-wide neural network power domain are finely controlled.
The convolution operation units begin the convolution of a matrix after the matrix analysis unit has completed its analysis. During the computation, the power consumption control unit, according to the marks output by the analysis, controls the switching-off of the clock and power domain of each convolution operation unit and of the layer-wide neural network power domain, thereby saving power.
Accordingly, the optimization process of the power consumption optimization circuit of the present invention is:
1. The synapse input unit sends the values collected by the synapses to the input-layer convolution operation units, and the input-layer convolution operation units convolve the synapse input data according to the convolution kernels. After the convolution is completed, the convolution values are sent to the first convolution computing network layer (i.e. the first hidden layer): they are sent to the first activation function operation unit for the activation function operation, and the first pooling unit then pools the activated matrix and sends the resulting matrix to be convolved to the first hidden-layer matrix storage unit for storage.
2. The matrix analysis unit then analyzes the matrices of the first convolution computing network layer in turn. Again taking a 4x4 convolution kernel as an example, the analysis process is:
Starting from the upper-left corner of the matrix, judge whether the first 4x4 data block is all zero; if so, mark that the first data block in the operation matrix of convolution operation unit 1 can close its clock. Then shift right by one data element, take the next 4x4 data block and judge whether it is all zero; if so, mark that the second data block in the operation matrix of convolution operation unit 1 can close its clock. Continue in this way until the last 4x4 data block of the row has been judged. After judging whether the last data block of convolution operation unit 1 can close its clock, make a further judgment on whether all the data blocks of convolution operation unit 1 can close their clocks as a whole; if so, convolution operation unit 1 can close its power domain while this matrix is being processed. Then judge convolution operation unit 2 by the same method as convolution operation unit 1, and so on until convolution operation unit n has been judged and marked. Finally, judge whether the entire matrix is all zero; if so, the power domain of the whole convolution computing network layer can be closed. The marking results are then sent to the power consumption control unit, which controls the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit, thereby switching each convolution computing network layer power domain, each convolution power domain and each clock-gating unit on or off, achieving the purpose of power optimization.
3. After the convolution of this layer is completed, the operation of the next layer of the neural network is started: after the first convolution computing network layer completes its convolution, the convolution results are sent to the second convolution computing network layer, i.e. to the second activation function operation unit and then to the second pooling unit, and the computation of the second layer begins. Each hidden layer can operate in the same way as the first layer, down to the final fully connected layer, where the result decision is completed.
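Returning to the marking step in item 2 above, the way the power consumption control unit turns the analysis marks into gating actions can be sketched as follows (a software illustration of the control flow only; the gate labels and function name are assumptions, not the claimed circuit):

```python
def gating_plan(clock_off, power_off, layer_off):
    """Translate analysis marks into a list of shut-off commands.

    clock_off[i][j] - clock of data block j of convolution unit i may be gated
    power_off[i]    - power domain of convolution unit i may be shut off
    layer_off       - the whole network-layer power domain may be shut off
    """
    if layer_off:
        # Coarsest level wins: switch off the entire layer power domain.
        return ["layer_power_domain: OFF"]
    plan = []
    for i, unit_off in enumerate(power_off):
        if unit_off:
            # Whole row unit is zero: shut off its convolution power domain.
            plan.append(f"conv_unit_{i}_power_domain: OFF")
        else:
            # Otherwise gate only the clocks of the all-zero data blocks.
            plan.extend(f"conv_unit_{i}_block_{j}_clock: OFF"
                        for j, off in enumerate(clock_off[i]) if off)
    return plan
```

This mirrors the hierarchy of the claims: the layer-level mark overrides the unit-level marks, which in turn override the block-level clock marks.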
It can thus be seen that the present invention organizes the individual neuron processing units and the whole neural network layer into multi-level power domains that can each be dynamically switched off on demand, thereby effectively reducing the power consumed while the convolutional neural network circuit is computing.
Although embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are merely illustrative and do not limit the scope of the present invention; equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the present invention shall all fall within the scope of protection claimed by the present invention.
Claims (4)
1. A power consumption optimization method for a neural network chip, the neural network chip comprising a plurality of convolution computing network layers, each convolution computing network layer comprising a plurality of convolution operation units, each convolution operation unit being responsible for the operations on one full row of data blocks, corresponding to the convolution kernel height, of the matrix to be convolved, the matrix to be convolved comprising n rows of data blocks stored in a corresponding hidden-layer matrix storage unit, where n is the height of the entire matrix to be convolved divided by the height of the data block handled by each convolution kernel, characterized in that the power consumption optimization method is:
Step S1: a separate power domain, the convolution computing network layer power domain, is provided for each convolution computing network layer and connected to a convolution computing network layer power domain switch control unit;
a separate power domain, the convolution power domain, is provided for each convolution operation unit and connected to a convolution unit power domain switch control unit;
the data blocks of the matrix to be convolved are organized row by row, and the power supply of each row of data blocks is connected to a clock-gating unit; each clock-gating unit is in turn connected to the convolution unit clock switch control unit;
Step S2: the n rows of data blocks of the matrix to be convolved are analyzed by a matrix analysis unit, and the analysis result, via a power consumption control unit, controls the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit, thereby switching each convolution computing network layer power domain, each convolution power domain and each clock-gating unit on or off.
2. The power consumption optimization method for a neural network chip according to claim 1, characterized in that the process by which the matrix analysis unit analyzes the n rows of data blocks of the matrix to be convolved is:
(1) scan the matrix to be convolved block by block according to the size of the convolution kernel, judging one by one whether each data block in a full row is all zero; if a data block is all zero, mark that its clock can be closed;
(2) after all data blocks of a full row have been judged, make a further overall judgment on whether all data blocks of the row are all zero; if so, the clock of the whole row can be closed, and the convolution operation unit responsible for that full row of data blocks is marked as able to close its convolution power domain;
(3) finally, judge whether the entire matrix to be convolved is all zero; if so, mark that the whole convolution computing network layer power domain can be closed.
3. A power consumption optimization circuit for a neural network chip, the neural network chip comprising a plurality of convolution computing network layers, each convolution computing network layer comprising a plurality of convolution operation units, each convolution operation unit being responsible for the operations on one full row of data blocks, corresponding to the convolution kernel height, of the matrix to be convolved, the matrix to be convolved comprising n rows of data blocks stored in a corresponding hidden-layer matrix storage unit, characterized in that:
the power consumption optimization circuit comprises power domain control circuits set in one-to-one correspondence with the plurality of convolution computing network layers, each power domain control circuit comprising a matrix analysis unit, a power consumption control unit, a convolution computing network layer power domain switch control unit, a convolution unit power domain switch control unit, a convolution unit clock switch control unit, a convolution computing network layer power domain, n convolution power domains and n clock-gating units;
the matrix analysis unit is connected to the corresponding hidden-layer matrix storage unit and to the power consumption control unit; the power consumption control unit is connected to the convolution computing network layer power domain switch control unit, the convolution unit power domain switch control unit and the convolution unit clock switch control unit; the convolution computing network layer power domain switch control unit is connected to the convolution computing network layer power domain; the convolution unit power domain switch control unit is connected to the n convolution power domains; the convolution unit clock switch control unit is connected to the n clock-gating units, and each clock-gating unit is connected to one row of data blocks.
4. The power consumption optimization circuit for a neural network chip according to claim 3, characterized in that the matrix analysis units of the individual power domain control circuits are merged into a single matrix analysis unit, and the power consumption control units of the individual power domain control circuits are merged into a single power consumption control unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711121900.1A CN107832841B (en) | 2017-11-14 | 2017-11-14 | Power consumption optimization method and circuit of neural network chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711121900.1A CN107832841B (en) | 2017-11-14 | 2017-11-14 | Power consumption optimization method and circuit of neural network chip |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107832841A true CN107832841A (en) | 2018-03-23 |
CN107832841B CN107832841B (en) | 2020-05-05 |
Family
ID=61655395
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711121900.1A Active CN107832841B (en) | 2017-11-14 | 2017-11-14 | Power consumption optimization method and circuit of neural network chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832841B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647774A (en) * | 2018-04-23 | 2018-10-12 | 福州瑞芯微电子股份有限公司 | A kind of neural network method and circuit of optimization sparsity matrix operation |
CN108712630A (en) * | 2018-04-19 | 2018-10-26 | 安凯(广州)微电子技术有限公司 | A kind of internet camera system and its implementation based on deep learning |
CN109948775A (en) * | 2019-02-21 | 2019-06-28 | 山东师范大学 | The configurable neural convolutional network chip system of one kind and its configuration method |
CN111199273A (en) * | 2019-12-31 | 2020-05-26 | 深圳云天励飞技术有限公司 | Convolution calculation method, device, equipment and storage medium |
WO2023246077A1 (en) * | 2022-06-20 | 2023-12-28 | 东南大学 | Digital integrated circuit optimization method |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150262057A1 (en) * | 2012-03-29 | 2015-09-17 | International Business Machines Corporation | Synaptic, dendritic, somatic, and axonal plasticity in a network of neural cores using a plastic multi-stage crossbar switching |
WO2017006512A1 (en) * | 2015-07-08 | 2017-01-12 | Denso Corporation | Arithmetic processing device |
CN106127302A (en) * | 2016-06-23 | 2016-11-16 | Hangzhou Huawei Digital Technologies Co., Ltd. | Circuit for processing data, image processing system, and method and apparatus for processing data |
Non-Patent Citations (5)
Title |
---|
COLIN LIN: "Design space exploration for sparse matrix-matrix multiplication on FPGAs", International Journal of Circuit Theory and Applications * |
DE LA ROCHE SAINT ANDRE: "Tunnel Effect in CNNs: Image Reconstruction From Max-Switch Locations", IEEE Signal Processing Letters * |
FU LONG: "Design and Application of the SPI Interface in a Neural Network Accelerator Chip", China Master's Theses Full-text Database, Information Science and Technology * |
FANG RUI, et al.: "Design of an FPGA Parallel Acceleration Scheme for Convolutional Neural Networks", Computer Engineering and Applications * |
LU ZHIJIAN: "Research on Parallel Structures of FPGA-based Convolutional Neural Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108712630A (en) * | 2018-04-19 | 2018-10-26 | Anyka (Guangzhou) Microelectronics Technology Co., Ltd. | Internet camera system based on deep learning and implementation method thereof |
CN108647774A (en) * | 2018-04-23 | 2018-10-12 | Fuzhou Rockchip Electronics Co., Ltd. | Neural network method and circuit for optimizing sparse matrix operations |
CN108647774B (en) * | 2018-04-23 | 2020-11-20 | Rockchip Electronics Co., Ltd. | Neural network method and circuit for optimizing sparse matrix operations |
CN109948775A (en) * | 2019-02-21 | 2019-06-28 | Shandong Normal University | Configurable convolutional neural network chip system and configuration method thereof |
CN111199273A (en) * | 2019-12-31 | 2020-05-26 | Shenzhen Intellifusion Technologies Co., Ltd. | Convolution calculation method, device, equipment and storage medium |
CN111199273B (en) * | 2019-12-31 | 2024-03-26 | Shenzhen Intellifusion Technologies Co., Ltd. | Convolution calculation method, device, equipment and storage medium |
WO2023246077A1 (en) * | 2022-06-20 | 2023-12-28 | Southeast University | Digital integrated circuit optimization method |
Also Published As
Publication number | Publication date |
---|---|
CN107832841B (en) | 2020-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832841A | Power consumption optimization method and circuit of a neural network chip | |
Mozafari et al. | Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks | |
Indiveri et al. | Memory and information processing in neuromorphic systems | |
US8515885B2 (en) | Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation | |
CN109460817A | On-chip learning system for convolutional neural networks based on non-volatile memory | |
CN105184366B | Time-multiplexed general-purpose neural network processor | |
Mendoza et al. | Interval type-2 fuzzy logic and modular neural networks for face recognition applications | |
JP6328787B2 (en) | Neuromorphological graph compression method, system, and computer program product using multiple associative memories | |
Yakopcic et al. | Energy efficient perceptron pattern recognition using segmented memristor crossbar arrays | |
CN106570516A | Obstacle recognition method using a convolutional neural network | |
Zhao et al. | A memristor-based spiking neural network with high scalability and learning efficiency | |
CN109886422A | Model configuration method and apparatus, electronic device, and readable storage medium | |
CN105913119B | Row-column-interconnected heterogeneous multi-core brain-like chip and method of using the same | |
CN108416327A | Object detection method and apparatus, computer device, and readable storage medium | |
CN106874688A | Intelligent lead-compound discovery method based on convolutional neural networks | |
CN101971166A | Neuromorphic circuit | |
CN107369108A | Multilayer artificial neural network and control method thereof | |
CN107273972A | Neuromorphic system based on resistive devices and adaptive-firing neurons, and implementation method thereof | |
CN106980830A | Affiliation recognition method and device based on a deep convolutional network | |
Ma et al. | A memristive neural network model with associative memory for modeling affections | |
CN109214048A | Fuzzy logic gate circuit using hybrid CMOS-memristor technology and design method thereof | |
Liu et al. | Numerical P systems with Boolean condition | |
Huang et al. | Differential evolution-based convolutional neural networks: An automatic architecture design method for intrusion detection in industrial control systems | |
Yang et al. | A novel parallel merge neural network with streams of spiking neural network and artificial neural network | |
Fritscher et al. | Mitigating the effects of RRAM process variation on the accuracy of artificial neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |
CP01 | Change in the name or title of a patent holder |

Address after: 350000 Building 18, No. 89 Software Avenue, Gulou District, Fuzhou, Fujian, China
Patentee after: Ruixin Microelectronics Co., Ltd.
Address before: 350000 Building 18, No. 89 Software Avenue, Gulou District, Fuzhou, Fujian, China
Patentee before: Fuzhou Rockchips Electronics Co., Ltd.