CN109472361A - Neural network optimization - Google Patents

Neural network optimization

Info

Publication number
CN109472361A
Authority
CN
China
Prior art keywords
energy consumption
neural network
time
model
modeling parameters
Prior art date
Legal status
Granted
Application number
CN201811344189.0A
Other languages
Chinese (zh)
Other versions
CN109472361B (en)
Inventor
张跃进
胡勇
喻蒙
Current Assignee
Zhongxiang Bo Qian Mdt Infotech Ltd
Original Assignee
Zhongxiang Bo Qian Mdt Infotech Ltd
Priority date
Filing date
Publication date
Application filed by Zhongxiang Bo Qian Mdt Infotech Ltd filed Critical Zhongxiang Bo Qian Mdt Infotech Ltd
Priority to CN201811344189.0A priority Critical patent/CN109472361B/en
Publication of CN109472361A publication Critical patent/CN109472361A/en
Application granted granted Critical
Publication of CN109472361B publication Critical patent/CN109472361B/en
Withdrawn after issue


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken


Abstract

This application relates to a neural network optimization method. The method includes: presetting modeling parameters, the modeling parameters including network parameters and hardware parameters; constructing a neural network energy consumption model based on the modeling parameters; constructing a neural network time model based on the modeling parameters; and performing dual-objective optimization on the neural network energy consumption model and the neural network time model. The application models the time and energy consumption of a neural network from the perspective of the hardware computation flow, predicts time and energy consumption layer by layer, analyzes the modeling parameters that dominate the time and energy overheads, and performs dual-objective optimization of time and energy consumption by improving the modeling parameters, the array partitioning method and the cache partitioning method, thereby improving the neural network model.

Description

Neural network optimization
Technical field
This application relates to the technical field of artificial neural networks, and in particular to a neural network optimization method.
Background art
With the rise of neural network techniques, neural network hardware adapted to different application scenarios has emerged. The inference and prediction capability of neural networks is strong but computationally intensive, so increasing the speed of neural network computation while reducing its energy consumption has become a key issue.
In the related art, both the training process and the inference process of neural networks have an urgent demand for accelerated network computation. Network training is basically completed on GPUs in the cloud, and the parallelization and communication methods of different hardware significantly affect training speed, so computation speed is mainly improved by increasing parallel computing capability and reducing communication overhead. The energy consumption of a neural network can be divided into computation energy and memory-access energy. Different data-reuse schemes, such as output stationary and weight stationary, can effectively reduce the memory-access energy of a neural network.
However, the above research on neural network computation focuses on either low energy consumption or faster computation alone, and does not consider that in complex application environments acceleration and energy saving may conflict: reducing time may come at the cost of energy, reducing energy may come at the cost of speed, and shortening the computation time may in turn generate more energy consumption.
Summary of the invention
To overcome, at least to some extent, the problem in the related art that research on neural network computation focuses on either low energy consumption or faster computation alone, ignoring that in complex application environments acceleration and energy saving may conflict (reducing energy consumption may sacrifice speed, while reducing computation time may generate more energy consumption), the present application provides a neural network optimization method. The method comprises:
presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
constructing a neural network energy consumption model based on the modeling parameters;
constructing a neural network time model based on the modeling parameters; and
performing dual-objective optimization on the neural network energy consumption model and the neural network time model.
Further, the energy consumption is calculated as E = V * T * e, where V is the volume of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
Further, the unit energy consumption includes read/write energy consumption and computation energy consumption.
Further, constructing the neural network energy consumption model includes modeling the neural network energy consumption layer by layer to obtain a convolutional layer energy model, a fully connected layer energy model, an RNN layer energy model and a pooling layer energy model.
Further, the neural network time model includes time overheads, the time overheads including a convolutional layer time overhead and a fully connected layer time overhead.
Further, the time overhead is calculated as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the computation time.
Further, performing dual-objective optimization on the neural network energy consumption model and the neural network time model includes:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy consumption model using the AP computation flow; and
selecting, among the solutions whose cache partitioning satisfies the time requirement, the array partitioning method with the smallest energy consumption.
Further, selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement includes:
setting a task time Tmax;
calculating the time T_TM required when the neural network time model is cache-partitioned using the TM computation flow;
calculating the energy E_AP required when the neural network energy consumption model is array-partitioned using the AP computation flow;
selecting the smallest required energy E_AP0 and the time T_AP0 required to compute at that energy;
comparing Tmax with T_AP0 and judging from the comparison whether the set task time meets the requirement; and
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison, the dual-objective optimization result for the neural network energy consumption model and the neural network time model.
Further, judging from the comparison whether the set task time meets the requirement includes:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time needs to be reset.
Further, outputting the dual-objective optimization result for the neural network energy consumption model and the neural network time model according to the comparison includes:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy consumption model again;
calculating the energy E_AP' required when the neural network energy consumption model is array-partitioned using the AP computation flow, and taking the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
otherwise, taking the modeling parameters and the cache partitioning method corresponding to the TM computation flow as the dual-objective optimization result.
The technical solutions provided by the embodiments of the present application can have the following beneficial effects:
The neural network optimization method of the present application includes presetting modeling parameters, constructing a neural network energy consumption model and a neural network time model based on the modeling parameters, and performing dual-objective optimization on the neural network energy consumption model and the neural network time model. The application models the time and energy consumption of a neural network from the perspective of the hardware computation flow, predicts time and energy consumption layer by layer, analyzes the modeling parameters that dominate the time and energy overheads, and performs dual-objective optimization of time and energy consumption by improving the modeling parameters, the array partitioning method and the cache partitioning method, thereby improving the neural network model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a neural network optimization method provided by an embodiment of the present application.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of a neural network optimization method provided by an embodiment of the present application.
As shown in Fig. 1, the neural network optimization method of this embodiment includes:
S1: presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
S2: constructing a neural network energy consumption model based on the modeling parameters;
S3: constructing a neural network time model based on the modeling parameters;
S4: performing dual-objective optimization on the neural network energy consumption model and the neural network time model.
A neural network structure contains three different kinds of layers, convolutional layers (CONV), fully connected layers (FC) and pooling layers (POOL), corresponding to the tasks of feature extraction, feature connection and feature compression respectively. Different network structures differ in depth and layer types, and the characteristics of the different layers also differ. For example, convolutional layers are computation-heavy while fully connected layers are data-heavy; this imbalance of resource requirements calls for dedicated hardware architecture design.
Some neural network structures also include RNN (Recurrent Neural Network) layers or CNN (Convolutional Neural Network) layers to complete particular tasks.
As the hardware, the Thinker chip is chosen by way of example; the Thinker chip consists of a PE array, an on-chip storage system, a control finite state machine, IO and a decoder.
The modeling parameters include network parameters and hardware parameters, as shown in Table 1.
Table 1. Modeling parameters
By constructing the neural network energy consumption model and the neural network time model, energy consumption and time are optimized as two objectives simultaneously, and the lowest-energy neural network algorithm under a given task-time condition is realized by adjusting the modeling parameters.
As an optional implementation of the invention, the energy consumption is calculated as E = V * T * e, where V is the volume of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
The data volume V and the repeat count T can be calculated from the network parameters and the hardware parameters. As for the unit energy consumption e, the memory-access energy differs between storage levels and therefore needs to be analyzed level by level.
The memory access energy consumption as shown in table 2 of different storage tiers
The memory access of the different storage tiers of table 2 normalizes energy consumption
If data are transmitted between two kinds of storing mechanisms, select energy consumption higher as this data transmission energy consumption. For example, energy consumption is determined as 200 energy consumption units when data are passed on piece caching SRAM from DRAM.
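To make the cost model concrete, the following Python sketch evaluates E = V * T * e with the per-level unit energies quoted in the text (200 units for a DRAM-to-cache transfer, 6 for cache-to-PE, 2 for register and PE transfers). The function and dictionary names are assumptions made for this illustration; the rule of charging the higher of the two levels for a cross-level transfer follows the paragraph above.

```python
# Illustrative sketch of the E = V * T * e cost model described above.
# Unit energies follow the normalized values quoted from Table 2; the
# names used here are assumptions made for this example.

UNIT_ENERGY = {
    "DRAM": 200,      # DRAM <-> on-chip cache (SRAM), per byte
    "buffer": 6,      # cache <-> register file / PE array, per byte
    "register": 2,    # register file <-> PE, per byte
    "PE": 2,          # PE-to-PE transfer or one multiply-add, per byte
}

def transfer_energy(volume_bytes, repeats, src, dst):
    """E = V * T * e, charging the more expensive of the two levels."""
    e = max(UNIT_ENERGY[src], UNIT_ENERGY[dst])
    return volume_bytes * repeats * e

# Example: weights read once from DRAM into the cache, then re-imported
# into the PE array 4 times because the cache cannot hold the whole layer.
n_w = 3 * 3 * 64 * 128                      # weight volume of one CONV layer
e_total = (transfer_energy(n_w, 1, "DRAM", "buffer")
           + transfer_energy(n_w, 4, "buffer", "PE"))
print(e_total)
```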
As an optional implementation of the invention, the unit energy consumption includes read/write energy consumption and computation energy consumption.
As an optional implementation of the invention, constructing the neural network energy consumption model includes modeling the neural network energy consumption layer by layer to obtain a convolutional layer energy model, a fully connected layer energy model, an RNN layer energy model and a pooling layer energy model.
As an optional implementation of the invention, calculating the unit energy consumption E0 includes:
constructing the convolutional layer energy model, the fully connected layer energy model, the RNN layer energy model and the pooling layer energy model respectively; and
calculating, for each of these models, the energy E0 of the corresponding unit read/write or computation.
Energy consumption in the convolutional layer energy model is calculated as follows.
For one complete sample, the energy consumed by one convolutional layer is:
E1 = E1_IO + E1_operation  (1)
where E1_IO is the read/write energy of the convolutional layer and E1_operation is its computation energy.
The read/write energy is
E1_IO = E1_weightIO + E1_inputIO + E1_outputIO  (2)
where E1_weightIO is the weight energy, E1_inputIO the input energy and E1_outputIO the output energy; the three terms are calculated separately as follows.
(1) Weight energy E1_weightIO:
E1_weightIO = E1_DRAM_buffer + E1_buffer_PE  (3)
where E1_DRAM_buffer is the energy of reading from DRAM into the cache and E1_buffer_PE the energy of moving data from the cache to the PE array, with
E1_DRAM_buffer = N_w * RD_DRAM_CONV * e_DRAM_buffer  (4)
where N_w is the total weight volume of the unit convolutional layer and RD_DRAM_CONV the number of rounds in which DRAM writes the weights into the cache.
The total weight volume of the unit convolutional layer is N_w = γ * Ch_out_i * K_i^2 * Ch_in_i (5), and e_DRAM_buffer, the energy of reading one byte of data from DRAM into the cache, is for example 200 energy units.
The number of times the weights are repeatedly imported is related to the size of the on-chip cache. If the on-chip cache can store all the weight data of the layer, each datum needs to be imported only once, and subsequent accesses involve only the interaction between the cache and the PE array; conversely, when a PE needs to reuse weight data that is not stored in the cache, repeated accesses must be performed. In one computation round of the PE array only part of the output points of the two-dimensional output map can be obtained, so the input data must be imported into the PE array over several rounds; when the cache capacity is insufficient, the total number of repeated imports of the same weight equals this number of rounds.
Since the weights must be read repeatedly this number of times, and e_buffer_PE, the energy of reading one byte of data from the cache into the PE array, is for example 6 energy units, E1_buffer_PE is the product of the weight volume, the number of repeated imports and e_buffer_PE.
(2) Input energy E1_inputIO:
Input data must travel from DRAM through the cache and the register file to reach the PE array, and must then be transmitted within the PE array. Therefore:
E1_inputIO = E1_DRAM_buffer + E1_buffer_register + E1_register_PE + E1_PE_tran  (8)
where E1_DRAM_buffer is the energy from DRAM to the cache, E1_buffer_register the energy from the cache to the register file, E1_register_PE the energy from the register file to the PE array, and E1_PE_tran the energy of the rightward transmission of each datum fed to the first PE on the left edge of the array.
It should be noted that, because of padding, the total input volume may differ from the number of points of the input feature map. If the padding is empty, no filling is performed and the original input size is unchanged. If 'same' padding is used, then even though the input and output map sizes stay equal, the effective input size changes, and the changed height and width are used in the subsequent calculations.
Here H_in_i * W_in_i * Ch_in_i * α is the input data volume of the unit convolutional layer and S_CONVdatabuffer is the data cache size. Since Thinker preferentially reuses each input point until it is fully consumed, the input points are imported into the buffer without repetition.
The register file added between the cache and the PE array avoids repeated imports of the input data. During the horizontal sliding of the window (with horizontal stride S_i) the data is read only once, so repeated reads arise only from the vertical sliding; e_buffer_register, the energy of transmitting one byte from the buffer to the register file, is 6.
The number of repeated imports of the input data is Ch_in_i * K_i^2 * H_out_i * W_out_i, i.e. the product of the number of output points and the number of input points corresponding to each output point; e_register_PE, the energy of transmitting one byte from the register file to the PE array, is 2.
Since the input data is reused, each datum fed to the first PE on the left edge of the array is transmitted rightward, so every input point is accessed repeatedly during this transmission; e_PE_tran, the energy of reading one byte of data, is for example 2.
(3) Output energy E1_outputIO:
The total output data volume is β * Ch_out_i * H_out_i * W_out_i. For the output, the data must first be transmitted between the PEs and then written into the cache; data that cannot be held in the cache is exported to DRAM to wait for the next round of computation. Therefore:
E1_outputIO = E1_PEout_tran + E1_PE_buffer + E1_buffer_DRAM  (13)
Every output point computed by a PE must first be transmitted to the leftmost PE of its row. Considering one row of PEs, all output points generated in one cycle must be transferred to the leftmost PE, so the average number of such accesses per output point follows from the number of points generated per row per cycle.
The total output data volume is β * Ch_out_i * H_out_i * W_out_i, and e_PEout_tran, the per-byte transmission energy, is for example 2 energy units.
Computation energy E1_operation:
Computing one CONV layer requires producing Ch_out_i * H_out_i * W_out_i output points in total, and each point requires K_i^2 * Ch_in_i multiply-add operations.
In some embodiments, for a given sample, the energy consumed to complete all convolution layer operations is the sum of the energies consumed by all convolutional layers.
In some embodiments, each sample is computed serially in both the AP and TM computation flows, so the energy of multiple samples is simply the sum of the energies of the individual samples.
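As a worked illustration of the decomposition E1 = E1_IO + E1_operation, the following Python sketch assembles the weight, input and output read/write terms and the multiply-add term for one convolutional layer. The parameter names mirror Table 1, but the helper itself is an assumption of this example rather than the patent's reference implementation; in particular, the weight repeat-import count rd_weight is passed in directly, since its closed form depends on the cache and array partitioning details.

```python
def conv_layer_energy(ch_in, ch_out, k, h_out, w_out, h_in, w_in,
                      alpha, beta, gamma, rd_weight,
                      e_dram=200, e_buf=6, e_reg=2, e_pe=2, e_op=1):
    """Energy of one CONV layer for one sample: E1 = E1_IO + E1_operation.

    A simplified aggregation of the terms above; rd_weight is the assumed
    number of repeated weight imports (1 if the cache holds the layer).
    """
    n_w = gamma * ch_out * k * k * ch_in              # weight volume, eq. (5)
    e_weight = n_w * e_dram + n_w * rd_weight * e_buf # eqs. (3)-(4)
    n_in = h_in * w_in * ch_in * alpha                # input volume
    reimports = ch_in * k * k * h_out * w_out         # register -> PE imports
    e_input = n_in * (e_dram + e_buf) + reimports * (e_reg + e_pe)  # eq. (8)
    n_out = beta * ch_out * h_out * w_out             # output volume
    e_output = n_out * (e_pe + e_buf)                 # PE transfer + write-back
    macs = ch_out * h_out * w_out * k * k * ch_in     # multiply-add count
    return e_weight + e_input + e_output + macs * e_op
```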
Energy consumption in the fully connected layer energy model is calculated as follows:
E2 = E2_IO + E2_operation  (18)
In the Thinker chip the fully connected layers are computed over multiple samples, layer by layer. The read/write energy of one batch of FC operations is:
E2_IO = E2_weightIO + E2_inputIO + E2_outputIO  (19)
(1) Calculation of E2_weightIO:
The FC weight read/write energy is divided into two parts:
E2_weightIO = E2_DRAM_buffer + E2_buffer_PE  (20)
which are obtained in the same way as for the convolutional layer, where BS is the batch size.
(2) Calculation of E2_inputIO:
Unlike a CONV layer, the input of an FC layer need not pass through the register file designed for the reuse of convolution inputs, so E2_inputIO consists of three parts:
E2_inputIO = E2_DRAM_buffer + E2_buffer_PE + E2_PE_tran  (24)
where e_PEinput_tran, the per-unit transmission energy, is for example 2 energy units.
(3) Calculation of E2_outputIO:
E2_outputIO = E2_PEout_tran + E2_PE_buffer + E2_buffer_DRAM  (29)
where:
E2_PE_buffer = FC_{i+1} * BS * β * e_PE_buffer  (31)
E2_buffer_DRAM = ReLU(FC_{i+1} * BS - S_FCdatabuffer) * β * e_buffer_DRAM  (32)
and e_PEoutput_tran, e_PE_buffer and e_buffer_DRAM, the per-unit transmission energies at the respective storage levels, are 2, 6 and 200 energy units.
When the network contains multiple FC layers, the total FC read/write energy of a batch is the sum of the read/write energies of all FC layers.
Calculation of the computation energy E2_operation:
During the computation there are FC_{i+1} * BS output points in total, and each output point requires FC_i multiply-add operations. With e_operation the energy of a unit-byte multiply-add operation, the energy of this part is:
E2_operation = FC_{i+1} * BS * FC_i * α * γ * e_operation  (33)
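A corresponding sketch for one batch of FC operations, under the same caveats as before (an illustrative helper with assumed names, the weight repeat count passed in):

```python
def fc_layer_energy(fc_in, fc_out, bs, alpha, beta, gamma,
                    s_databuffer, rd_weight=1,
                    e_dram=200, e_buf=6, e_pe=2, e_op=1):
    """Energy of one batch of FC operations: E2 = E2_IO + E2_operation."""
    relu = lambda x: max(x, 0)                   # ReLU as used in eq. (32)
    e_weight = fc_in * fc_out * gamma * rd_weight * (e_dram + e_buf)  # eq. (20)
    e_input = fc_in * bs * alpha * (e_dram + e_buf + e_pe)            # eq. (24)
    e_output = (fc_out * bs * beta * (e_pe + e_buf)                   # eq. (31)
                + relu(fc_out * bs - s_databuffer) * beta * e_dram)   # eq. (32)
    e_compute = fc_out * bs * fc_in * alpha * gamma * e_op            # eq. (33)
    return e_weight + e_input + e_output + e_compute
```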
Energy consumption in the RNN layer energy model is calculated as follows.
When the network contains multiple RNN layers, the total RNN computation energy of a batch is the sum of the per-layer computation energies.
Taking the most widespread sequence model among RNNs, the LSTM structure, the following RNN energy analysis is carried out for the LSTM.
In an RNN, the computation flow of one LSTM unit includes:
f_t = σ(W_xf * x_t + W_hf * h_{t-1} + b_f)
i_t = σ(W_xi * x_t + W_hi * h_{t-1} + b_i)
o_t = σ(W_xo * x_t + W_ho * h_{t-1} + b_o)
together with the gate activation g_t = tanh(W_xg * x_t + W_hg * h_{t-1} + b_g) and the state updates c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t and h_t = o_t ⊙ tanh(c_t) used below.
The Thinker chip computes RNN layers with two kinds of PEs, common PEs and super PEs. The part W_x_gate * x_t_gate + W_h_gate * h_{t-1,gate} + b_gate follows the principle of FC layer computation and is computed in the common PEs. When it is finished, the data is imported into the super PEs for the RNN-gating computation: the sigmoid or tanh functions are evaluated to obtain the gate vectors, and c_t and h_t are then obtained by multiply-add operations.
First consider the energy of one RNN layer over one batch. Since the RNN is split into an FC part and an RNN-gating part, the energy of one batch of RNN operations with Iteration iterations is the sum of these two parts accumulated over the iterations.
(1) Calculation of the RNN-FC energy:
This part is calculated on essentially the same principle as the FC layer energy; the difference is that the dimensions of the weight matrix, the input vector and the output vector must be adjusted. For the RNN, the FC computation takes the unified form:
FC_t = W_x_gate * x_t_gate + W_h_gate * h_{t-1,gate} + b_gate  (35)
where gate may take i, f, o or g, corresponding to the four gates of the LSTM. In a concrete implementation, W_x_gate and W_h_gate can be concatenated horizontally into a single W_gate, and x_t_gate and h_{t-1,gate} can be concatenated vertically into x_gate. W_gate therefore has dimension O_len_i * (I_len_i + O_len_i) and x_gate has dimension 1 * (I_len_i + O_len_i), which correspond to the FC layer parameters FC_i and FC_{i+1} as:
FC_i = I_len_i + O_len_i
FC_{i+1} = O_len_i
so that the analysis can be folded into the FC layer energy model. It is worth noting that for each LSTM unit the FC operation must be repeated four times, once per gate. Details are not repeated here.
(2) calculating of gating memory access energy consumption includes:
As described above, can be mainly divided into two kinds of calculating in gating calculating: (a) simple tanh/sigmoid letter Number calculates;(b) element calculates the multiplication of elementThereforeIt can further indicate that are as follows:
Calculating include:
Tanh/sigmoid is mainly used for calculating 4 gate functions and ct, that is, need to be O to vector lengthlen_i5 groups of data Operate by the tanh/sigmoid of element, the required sum operated is BS*5Olen_i.The calculating carries out in PE, number Buffer is imported from DRAM according to needs, then imports PE from buffer, is finally exported.It is arrived assuming that carrying out unit byte DRAM The energy consumption of buffer transmission is eDRAM_buffer, the energy consumption that unit byte is transmitted from buffer to PE is ebuffer_PE, unit byte from PE to buffer, the energy consumption from buffer to DRAM are respectively ebuffer_DRAMAnd ePE_buffer.Each data at most carry out once This operation, and interaction of the data between DRAM and buffer only can not accommodate all input point Shi Caifa in buffer size It is raw, therefore following formula can be summed up:
Calculating mainly include ctCalculating and htCalculate two parts.For htFor, it is only necessary to carry out once by The multiplication of element operates, therefore the reading of data pertains only to total 2Olen_iA input data, writing out for data are related to Olen_iA data. Calculating with tanh/sigmoid above is similarly:
For ctFor, also need to carry out the operation of one-accumulate in addition to multiplication.For an output data, need Multiplication operation twice is carried out, then two products are summed, therefore one of product needs to read into buffer and works as In, then import the register completion summation in PE.Similarly, the energy consumption of the part can be obtained are as follows:
The calculating of Gating calculating energy consumption
For gating, in fact it could happen that three kinds of operations be respectively: tanh/sigmoid, multiplication, addition.Assuming that three Kind operates corresponding unit byte operation energy consumption are as follows: etanh/sigmoid、emultiply、eaddValue is for example 1, can calculate three The number that kind operation needs to be implemented is respectively as follows: 5Olen_i、3Olen_i、Olen_i.For this purpose, can calculateAre as follows:
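The operation counts above translate directly into a small sketch (again an illustrative helper with assumed names; the per-operation energies default to the value 1 quoted in the text, and the batch factor BS is an assumption of this example):

```python
def gating_compute_energy(o_len, bs, e_tanh_sigmoid=1, e_multiply=1, e_add=1):
    """Gating compute energy from the operation counts in the text:
    5*O_len tanh/sigmoid evaluations (4 gates plus c_t), 3*O_len multiplies
    (one for h_t, two for c_t), and O_len additions (c_t accumulation)."""
    return bs * (5 * o_len * e_tanh_sigmoid
                 + 3 * o_len * e_multiply
                 + o_len * e_add)
```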
Energy consumption in the pooling layer energy model is calculated as follows.
For the pooling operation there is no data reuse between different layers or different samples, so the energy of a single layer and a single sample can be considered first. As before, the pooling energy is divided into read/write energy and computation energy, and the energy model of the pooling operation is established accordingly; if multiple layers or multiple samples must be computed, the results are simply summed:
E4 = E4_IO + E4_operation  (41)
Calculation of the pooling layer read/write energy E4_IO:
Thinker supports the max-pooling operation, whose role is to reduce the height and width of the output map while keeping the number of channels unchanged. The heights and widths of its input and output maps are related through the pooling size p_i, and the total number of pooling blocks is:
X = H_out_i * W_out_i * Ch_in_i  (42)
The data read/write energy of the pooling operation divides into input data and output data, and each type must complete the reads and writes among DRAM, the cache and the PE array. Since the data is imported without repetition, the energy model of this part is simpler: it is the sum of the read energy of the input data and the write-out energy of the output data.
The calculation of the computation energy involves the total number of input data. It is worth noting that this total is not equal to the size of the input feature map; it must be derived backwards from the size of the output map and the size of the kernel:
E4_operation = p_i * p_i * H_out_i * W_out_i * Ch_out_i * e_operation  (44)
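A minimal sketch of this pooling energy model, under the same assumptions as the previous helpers (the input volume is back-derived from the output size and the pooling size p, as the text prescribes):

```python
def pool_layer_energy(p, h_out, w_out, ch_in, ch_out, alpha, beta,
                      e_dram=200, e_buf=6, e_op=1):
    """Max-pooling energy: simple read/write (no data reuse) plus eq. (44)."""
    n_in = (p * h_out) * (p * w_out) * ch_in * alpha   # input, back-derived
    n_out = h_out * w_out * ch_out * beta              # output volume
    e_io = n_in * (e_dram + e_buf) + n_out * (e_buf + e_dram)
    e_compute = p * p * h_out * w_out * ch_out * e_op  # eq. (44)
    return e_io + e_compute
```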
As an optional implementation of the invention, the neural network time model includes time overheads, the time overheads including a convolutional layer time overhead and a fully connected layer time overhead; the time overhead is calculated as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the computation time.
As an optional implementation of the invention, the neural network time model includes a convolutional layer time model and a fully connected layer time model.
For the Thinker chip, the time overhead depends mainly on the computation-heavy convolution operations and the data-heavy fully connected operations (including the fully connected operations inside FC layers and RNNs). Because of the "blocking" effect of time, the less time-consuming RNN-gating and pooling operations need no time model. The time is therefore modeled and analyzed below for the convolutional layer and the fully connected layer.
The convolutional layer time model includes the following.
Calculation of the read/write time:
Because of the way Thinker computes, convolutional layers are computed one at a time for a single layer and a single sample, so the time required for one convolutional layer within one sample is analyzed first.
Data reads and writes pass through several levels: DRAM, the cache and the PEs. Once the data has been imported into the on-chip cache, further transmission is very fast and this part can be ignored, so T_IO reduces to the time consumed by the interaction between DRAM and the cache.
In the Thinker chip, the input/output data and the weights are imported from DRAM into two different caches; the bandwidths of these two parts differ and can be adapted to the architecture of the neural network. The input and output data share the bandwidth BW_dataconv, and the weight data has the bandwidth BW_weightconv. In one layer, the total volume of input and output data is H_in_i * W_in_i * Ch_in_i * α + H_out_i * W_out_i * Ch_out_i * β, and the weight volume is N_w as above. The read/write times of the input/output data and of the weight data can therefore be calculated separately, and T_IO is the larger of the two.
Calculation of the computation time T_operation:
For one convolutional layer of one sample, the computation requires the PE array to perform several rounds of operations. In one round the PE array computes several output points in parallel, and after several rounds all output points of the layer have been computed. If completing the layer requires round_convlayer rounds and each round takes t_roundconv, the computation time can be expressed as:
T_operation = round_convlayer * t_roundconv  (46)
Since every row of Thinker reuses the input data, H_out_i * W_out_i is the total number of rows that must be input; each round computes a fixed number of rows, which determines how many rounds are needed before all inputs have been processed, and for one group of input data the weights likewise need to be input several times.
Each output point involves K_i^2 * Ch_in_i multiply-add operations, so the time consumed by one round is the time required to perform these multiply-adds in sequence; the time to compute one convolutional layer within one sample then follows.
For the operation time of all convolutional layers in one sample, the per-layer times are simply summed; for the convolutional layer operation time of all samples in a batch, the single-sample total is simply multiplied by the batch size BS.
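The max-of-two structure of the layer time is easy to state as code; the sketch below leaves the round count and per-round time as inputs, since their closed forms depend on the array geometry (names assumed for illustration):

```python
def conv_layer_time(n_data_bytes, n_weight_bytes, bw_data, bw_weight,
                    rounds, t_round):
    """Time of one CONV layer: T_z = max(T_IO, T_operation).

    T_IO is the larger of the input/output and weight transfer times, and
    T_operation = rounds * t_round as in eq. (46)."""
    t_io = max(n_data_bytes / bw_data, n_weight_bytes / bw_weight)
    t_op = rounds * t_round
    return max(t_io, t_op)   # the "blocking" effect: IO and compute overlap
```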
The fully connected layer time model includes the following.
Since the different samples of a batch are computed in parallel in FC layers, the time of one layer of FC operations over one batch of data is analyzed directly.
The calculation of the read/write time again only considers the interaction of the data between DRAM and the cache. Let BW_data_FC be the bandwidth allocated to the input/output data and BW_weight_FC the bandwidth allocated to the weights; the total number of input/output data is (FC_i * α * RD_DRAM_in_FC + FC_{i+1} * β) * BS and the total number of weight inputs is FC_i * FC_{i+1} * γ * RD_DRAM_weight_FC, and the read/write time is the larger of the two resulting transfer times.
The calculation of the computation time involves round_FClayer and t_round_FC and proceeds as for the convolutional layer.
It can be understood that for the operation time of all FC layers in a batch, the per-layer times are simply summed.
The hardware parameters of the models are also variable, so the operation time and energy consumption of the same network under different hardware parameters can be explored during algorithm testing. By adjusting the various hardware parameters (such as bandwidths, cache sizes, PE array size and channel counts), an optimal solution under a constrained hardware area may be obtained, minimizing time and energy consumption from the hardware-design perspective, while the layer-by-layer prediction of time and energy identifies the modeling parameters that dominate the time and energy overheads; this can therefore provide real help to hardware design.
As an optional implementation of the invention, performing dual-objective optimization on the neural network energy consumption model and the neural network time model includes:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy consumption model using the AP computation flow; and
selecting, among the solutions whose cache partitioning satisfies the time requirement, the array partitioning method with the smallest energy consumption.
The Thinker chip supports two computation flows, AP and TM. AP stands for Array-Partitioning; TM stands for Time-Multiplexing. They differ in the order in which the same network inference is computed. In the TM flow the inference is executed layer by layer, and at any given time the entire PE array computes the same layer. In the AP flow the division of the array is adjusted, so the PE array may compute several layers at the same time. Because the AP flow can compute layers of several types at once, it balances the characteristics of convolution operations (high computation, small data volume) and fully connected operations (low computation, large data volume) and can thus shorten the time. The TM flow, on the other hand, reuses data more often, so data is retransmitted fewer times and the energy consumption is lower.
As an optional implementation of the invention, selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement includes:
setting a task time Tmax;
calculating the time T_TM required when the neural network time model is cache-partitioned using the TM computation flow;
calculating the energy E_AP required when the neural network energy consumption model is array-partitioned using the AP computation flow;
selecting the smallest required energy E_AP0 and the time T_AP0 required to compute at that energy;
comparing Tmax with T_AP0 and judging from the comparison whether the set task time meets the requirement; and
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison, the dual-objective optimization result for the neural network energy consumption model and the neural network time model.
As an optional implementation of the invention, judging from the comparison whether the set task time meets the requirement includes:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time needs to be reset.
As an optional implementation of the invention, outputting the dual-objective optimization result for the neural network energy consumption model and the neural network time model according to the comparison includes:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy consumption model again;
calculating the energy E_AP' required when the neural network energy consumption model is array-partitioned using the AP computation flow, and taking the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
otherwise, taking the modeling parameters and the cache partitioning method corresponding to the TM computation flow as the dual-objective optimization result.
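Putting the branching logic above into a single sketch (a minimal illustration; the candidate search and the retuning routine are assumed helpers, not part of the patent text):

```python
def dual_objective_select(t_max, t_tm, tm_config, ap_candidates, retune_ap):
    """Sketch of the selection flow described above. ap_candidates is a list
    of (energy, time, config) triples from AP-flow array partitioning, and
    retune_ap re-runs that search with adjusted modeling parameters; all of
    these names are assumptions made for this illustration."""
    e_ap0, t_ap0, _ = min(ap_candidates, key=lambda c: c[0])
    if t_max <= t_ap0:
        # even the lowest-energy AP solution misses the deadline
        raise ValueError("task cannot be completed in the specified time; reset Tmax")
    if t_max < t_tm:
        # TM flow is too slow: adjust parameters, re-partition with AP, and
        # keep the modeling parameters of the new minimum-energy solution
        e_ap, _, config = min(retune_ap(t_max), key=lambda c: c[0])
        return ("AP", config, e_ap)
    # TM flow meets the deadline: its parameters and cache partitioning give
    # the lower-energy result, since the TM flow reuses data more often
    return ("TM", tm_config, None)
```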
In actual execution, in order to execute different layers simultaneously, not only the number of PEs allocated to each layer but also the distribution of the cache must be adjusted. Since the total cache size and array size of the Thinker chip are fixed, and only longitudinal cuts are a reasonable partitioning manner, the parameters to be searched in the loop include the number of array columns allocated to CONV, the data cache size S_CONVdatabuffer and the weight cache size S_convweightbuffer.
It is known from the modeling process that the time overhead of network computation has a "hiding effect": the time of a layer splits into a computation time and a data transmission time, and the total time of the layer is the maximum of the two, while the data transmission time is the larger of the data transmission time and the weight transmission time. To reduce the time overhead further, the dominant factor of each layer's time consumption must be studied. Taking the typical LRCN neural network as an example, for its FC operations the weight read/write time overhead dominates in both the TM and AP0 computation flows, whereas for its CONV operations the computation time basically dominates; the fully connected layers have little computation but a large data volume.
In the TM computation flow the data read/write time, the weight read/write time and the computation time differ greatly. The computation time of the convolutional layers far exceeds the data and weight read/write times, so during the long periods in which the PE array is busy computing, the data transmission sits idle; conversely, during FC computation the PE array sits idle because the weight reads take too long. In the layer-by-layer time overheads of AP0, the data read/write time, the weight read/write time and the computation time are much better balanced. The total time can therefore be reduced effectively by substantially increasing the share of resources given to the convolution operations, whose original share was especially small. Since the FC layer time overhead is dominated by weight reads, shortening the FC time requires increasing the cache fraction allocated to FC weights, or reducing the total time by enlarging the total cache or increasing the data transfer bandwidth.
The hardware parameters of the neural network energy consumption model and the neural network time model are also variable, so exploring the operation time and energy consumption of the same network under different hardware parameters during algorithm testing can assist hardware design. By adjusting the various hardware parameters (such as bandwidths, cache sizes, PE array size and channel counts), an optimal solution under a constrained hardware area may be obtained, minimizing time and energy consumption from the hardware-design perspective.
Therefore, performing cache partitioning on the neural network time model using the TM computation flow, performing array partitioning on the neural network energy consumption model using the AP computation flow, and selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement can assist future hardware design, and different optimal solutions can be achieved under different task-time conditions.
In this embodiment, the time and energy consumption of the neural network are modeled from the perspective of the hardware computation flow; time and energy are predicted layer by layer while the modeling parameters that dominate the time and energy overheads are analyzed, and dual-objective optimization of time and energy is performed on the neural network by improving the modeling parameters, the array partitioning method and the cache partitioning method, thereby improving the neural network model. Further, thanks to the layer-by-layer modeling, different optimal solutions are obtained under different task-time conditions.
It can be understood that identical or similar parts of the above embodiments may be referred to mutually, and content not described in detail in one embodiment may refer to the identical or similar content in other embodiments.
It should be noted that in the description of the present application the terms "first", "second" and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. In addition, in the description of the present application, unless otherwise indicated, "multiple" means at least two.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be appreciated that the parts of the present application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium, and the program, when executed, includes one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present application, and those skilled in the art may change, modify, replace and vary the above embodiments within the scope of the present application.
It should be noted that the present invention is not limited to the above-described preferred embodiments; those skilled in the art can derive products in various other forms in light of the present invention, but any variation in shape or structure whose technical solution is identical or similar to that of the present application falls within the protection scope of the present invention.

Claims (10)

1. A neural network optimization method, characterized by comprising:
presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
constructing a neural network energy consumption model based on the modeling parameters;
constructing a neural network time model based on the modeling parameters; and
performing dual-objective optimization on the neural network energy consumption model and the neural network time model.
2. The neural network optimization method according to claim 1, characterized in that the energy consumption is calculated as E = V * T * e, where V is the volume of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
3. The neural network optimization method according to claim 2, characterized in that the unit energy consumption includes read/write energy consumption and computation energy consumption.
4. The neural network optimization method according to claim 1, characterized in that constructing the neural network energy consumption model includes modeling the neural network energy consumption layer by layer to obtain a convolutional layer energy model, a fully connected layer energy model, an RNN layer energy model and a pooling layer energy model.
5. The neural network optimization method according to claim 1, characterized in that the neural network time model includes time overheads, the time overheads including a convolutional layer time overhead and a fully connected layer time overhead.
6. The neural network optimization method according to claim 5, characterized in that the time overhead is calculated as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the computation time.
7. The neural network optimization method according to claim 1, characterized in that performing dual-objective optimization on the neural network energy consumption model and the neural network time model comprises:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy consumption model using the AP computation flow; and
selecting, among the solutions whose cache partitioning satisfies the time requirement, the array partitioning method with the smallest energy consumption.
8. The neural network optimization method according to claim 7, characterized in that selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement comprises:
setting a task time Tmax;
calculating the time T_TM required when the neural network time model is cache-partitioned using the TM computation flow;
calculating the energy E_AP required when the neural network energy consumption model is array-partitioned using the AP computation flow;
selecting the smallest required energy E_AP0 and the time T_AP0 required to compute at that energy;
comparing Tmax with T_AP0 and judging from the comparison whether the set task time meets the requirement; and
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison, the dual-objective optimization result for the neural network energy consumption model and the neural network time model.
9. The neural network optimization method according to claim 8, characterized in that judging from the comparison whether the set task time meets the requirement comprises:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time needs to be reset.
10. The neural network optimization method according to claim 8, characterized in that outputting the dual-objective optimization result for the neural network energy consumption model and the neural network time model according to the comparison comprises:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy consumption model again;
calculating the energy E_AP' required when the neural network energy consumption model is array-partitioned using the AP computation flow, and taking the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
otherwise, taking the modeling parameters and the cache partitioning method corresponding to the TM computation flow as the dual-objective optimization result.
CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B and later withdrawn after issue

Priority Applications (1)

CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B

Applications Claiming Priority (1)

CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B

Publications (2)

CN109472361A (publication date 2019-03-15)
CN109472361B (publication date 2020-08-28)

Family

ID=65671820

Family Applications (1)

CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B, withdrawn after issue

Country Status (1)

Country Link
CN (1) CN109472361B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102773981A (en) * 2012-07-16 2012-11-14 南京航空航天大学 Implementation method of energy-saving and optimizing system of injection molding machine
CN105302973A (en) * 2015-11-06 2016-02-03 重庆科技学院 MOEA/D algorithm based aluminum electrolysis production optimization method
US20180046894A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Method for optimizing an artificial neural network (ann)
CN106427589A (en) * 2016-10-17 2017-02-22 江苏大学 Electric car driving range estimation method based on prediction of working condition and fuzzy energy consumption

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738318A (en) * 2019-09-11 2020-01-31 北京百度网讯科技有限公司 Method, system and device for evaluating network structure running time and generating evaluation model
CN110929860A (en) * 2019-11-07 2020-03-27 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
WO2021088688A1 (en) * 2019-11-07 2021-05-14 深圳云天励飞技术股份有限公司 Convolution acceleration operation method and apparatus, storage medium and terminal device
US11748595B2 (en) 2019-11-07 2023-09-05 Shenzhen Intellifusion Technologies Co., Ltd. Convolution acceleration operation method and apparatus, storage medium and terminal device
CN111753950A (en) * 2020-01-19 2020-10-09 杭州海康威视数字技术股份有限公司 Method, device and equipment for determining forward time consumption
CN111753950B (en) * 2020-01-19 2024-02-27 杭州海康威视数字技术股份有限公司 Forward time consumption determination method, device and equipment
CN112085195A (en) * 2020-09-04 2020-12-15 西北工业大学 X-ADMM-based deep learning model environment self-adaption method
CN112468533A (en) * 2020-10-20 2021-03-09 安徽网萌科技发展股份有限公司 Agricultural product planting-oriented edge learning model online segmentation method and system
CN112468533B (en) * 2020-10-20 2023-01-10 安徽网萌科技发展股份有限公司 Agricultural product planting-oriented edge learning model online segmentation method and system
CN113377546A (en) * 2021-07-12 2021-09-10 中科弘云科技(北京)有限公司 Communication avoidance method, apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN109472361B (en) 2020-08-28


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
AV01: Patent right actively abandoned (granted publication date: 20200828; effective date of abandoning: 20210125)