CN109472361A - Neural network optimization - Google Patents
- Publication number: CN109472361A (application CN201811344189.0A)
- Authority
- CN
- China
- Prior art keywords
- energy consumption
- neural network
- time
- model
- modeling parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
Abstract
This application relates to a neural network optimization method, which includes: presetting modeling parameters, the modeling parameters comprising network parameters and hardware parameters; constructing a neural network energy model based on the modeling parameters; constructing a neural network time model based on the modeling parameters; and performing dual-objective optimization on the neural network energy model and the neural network time model. The application models the time and energy consumption of a neural network from the perspective of the network's hardware computation flow, predicts time and energy layer by layer, analyzes the modeling parameters that dominate time and energy overhead, and performs dual-objective optimization of time and energy by improving the modeling parameters, the array partitioning method, and the cache partitioning method, thereby improving the neural network model.
Description
Technical field
This application relates to the field of artificial neural network technology, and in particular to a neural network optimization method.
Background art
With the rise of neural network technology, neural network hardware adapted to different application scenarios has emerged. Neural networks have strong inference and prediction capability but are computationally intensive, so increasing the speed of neural network computation while reducing its energy consumption has become a key problem.
In the related art, both the training process and the inference process of a neural network have an urgent need for acceleration. Training is largely completed on GPUs in the cloud, and the parallelization and communication methods of different hardware significantly affect training speed, so computation speed is mainly improved by increasing parallel computing capability and reducing communication overhead. The energy consumption of a neural network can be divided into compute energy and memory-access energy. Data-reuse schemes such as output stationary and weight stationary can effectively reduce the memory-access energy of a neural network.
However, the above research on neural network computation focuses on either low energy consumption or faster computation alone. It does not consider that, in complex application environments, acceleration and energy awareness may conflict: reducing time consumption may come at the cost of energy, reducing energy may come at the cost of speed, and shortening computation time may in turn generate more energy consumption.
Summary of the invention
To overcome, at least to some extent, the problem that related research on neural network computation focuses on either low energy consumption or faster computation alone, ignoring that in complex application environments the acceleration and energy awareness of a neural network may conflict (reducing energy consumption may sacrifice speed, while reducing computation time may generate more energy consumption), this application provides a neural network optimization method. The method includes:
Presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
Constructing a neural network energy model based on the modeling parameters;
Constructing a neural network time model based on the modeling parameters;
Performing dual-objective optimization on the neural network energy model and the neural network time model.
Further, the energy consumption formula is E = V * T * e, where V is the amount of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
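As a minimal sketch, the generic formula E = V * T * e can be written as a function; the example values below are illustrative, not taken from the patent's tables:

```python
def energy(V, T, e):
    """Energy of one access/compute pattern: data volume V (bytes),
    repeat count T, per-byte energy e (energy units)."""
    return V * T * e

# e.g. reading 1 KB from DRAM once at an assumed 200 units/byte
dram_read_cost = energy(1024, 1, 200)
```

Every per-layer formula later in the description is an instance of this volume-times-repeats-times-unit-cost pattern.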
Further, the unit energy consumption includes read/write energy consumption and compute energy consumption.
Further, constructing the neural network energy model includes modeling the neural network's energy consumption layer by layer, obtaining a convolutional-layer energy model, a fully-connected-layer energy model, an RNN-layer energy model, and a pooling-layer energy model.
Further, the neural network time model includes time overhead, which comprises convolutional-layer time overhead and fully-connected-layer time overhead.
Further, the time overhead is computed as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the compute time.
Further, performing dual-objective optimization on the neural network energy model and the neural network time model includes:
Performing cache partitioning on the neural network time model using the TM computation flow;
Performing array partitioning on the neural network energy model using the AP computation flow;
Selecting the minimum-energy array partitioning method for which the cache partitioning satisfies the time requirement.
Further, selecting the minimum-energy array partitioning method for which the cache partitioning satisfies the time requirement comprises:
Setting a task time T_max;
Computing the time T_TM required when the TM computation flow's cache partitioning is applied to the neural network time model;
Computing the energy E_AP required when the AP computation flow's array partitioning is applied to the neural network energy model;
Choosing the minimum required energy E_AP0 and the time T_AP0 needed to achieve it;
Comparing T_max with T_AP0 and judging from the result whether the set task time is feasible;
If the set task time is feasible, comparing T_max with T_TM and outputting, according to the result, the dual-objective optimization result for the neural network energy model and the neural network time model.
Further, judging from the comparison result whether the set task time is feasible comprises:
If T_max > T_AP0, the set task time is judged feasible;
Otherwise the task cannot be completed within the specified time, and the task time must be reset.
Further, outputting the dual-objective optimization result for the neural network energy model and the neural network time model according to the comparison result comprises:
If T_max < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy model again;
Computing the energy E_AP' required by the AP computation flow's array partitioning of the energy model, and choosing the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
Otherwise, choosing the modeling parameters and cache partitioning method of the TM computation flow as the dual-objective optimization result.
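The selection procedure above can be sketched as follows. The candidate tuples, and the representation of "adjust parameters and re-partition" as filtering a set of precomputed AP candidates, are illustrative assumptions for this sketch, not the patent's implementation:

```python
def dual_objective_select(T_max, ap_candidates, tm_candidate):
    """ap_candidates: list of (energy, time, params) from AP-flow array partitioning.
    tm_candidate:  one (energy, time, params) tuple from TM-flow cache partitioning.
    Returns ("AP" or "TM", chosen params), or None if T_max is infeasible."""
    # Minimum-energy AP candidate and the time it needs (E_AP0, T_AP0)
    E_ap0, T_ap0, _ = min(ap_candidates, key=lambda c: c[0])
    if T_max <= T_ap0:
        return None  # task cannot finish within T_max; T_max must be reset
    E_tm, T_tm, tm_params = tm_candidate
    if T_max < T_tm:
        # TM flow too slow: pick the lowest-energy AP partitioning that meets T_max
        feasible = [c for c in ap_candidates if c[1] < T_max]
        best = min(feasible, key=lambda c: c[0])
        return ("AP", best[2])
    return ("TM", tm_params)
```

The `feasible` list is never empty in the AP branch, because the minimum-energy candidate itself already satisfies T_ap0 < T_max.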
The technical solutions provided by embodiments of this application can have the following beneficial effects:
The neural network optimization method of this application includes presetting modeling parameters, constructing a neural network energy model and a neural network time model based on those parameters, and performing dual-objective optimization on the two models. The application models time and energy from the perspective of the network's hardware computation flow, predicts time and energy layer by layer, analyzes the modeling parameters that dominate time and energy overhead, and performs dual-objective optimization of time and energy by improving the modeling parameters, the array partitioning method, and the cache partitioning method, thereby improving the neural network model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit this application.
Detailed description of the invention
The drawings herein are incorporated into and form part of this specification, illustrate embodiments consistent with this application, and together with the specification serve to explain its principles.
Fig. 1 is a flowchart of a neural network optimization method provided by an embodiment of this application.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of the neural network optimization method provided by an embodiment of this application.
As shown in Fig. 1, the neural network optimization method of this embodiment includes:
S1: Preset modeling parameters, the modeling parameters including network parameters and hardware parameters;
S2: Construct a neural network energy model based on the modeling parameters;
S3: Construct a neural network time model based on the modeling parameters;
S4: Perform dual-objective optimization on the neural network energy model and the neural network time model.
A neural network structure contains three different kinds of layers: convolutional layers (CONV), fully connected layers (FC), and pooling layers (POOL), corresponding respectively to feature extraction, feature connection, and feature compression. Networks differ in their number of layers and layer types, and the characteristics of different layers also differ: for example, convolutional layers are computation-heavy while fully connected layers are data-heavy. This non-uniformity of resource requirements calls for dedicated hardware architecture design.
Some neural network structures also include RNN (Recurrent Neural Network) layers or CNN (Convolutional Neural Network) layers to complete particular tasks.
As the hardware, the Thinker chip is chosen for example; it consists of a PE array, an on-chip storage system, a finite-state-machine controller, IO, and a decoder.
The modeling parameters include network parameters and hardware parameters, as shown in Table 1.
Table 1: Modeling parameters
By constructing the neural network energy model and the neural network time model, and performing dual-objective optimization of energy and time simultaneously, the method realizes, by adjusting the modeling parameters, the lowest-energy neural network under a given task-time constraint.
As an optional implementation of the invention, the energy consumption formula is E = V * T * e, where V is the amount of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
V and T can be computed from the network parameters and hardware parameters. The unit energy e, however, differs between storage tiers, because each tier has a different memory-access energy; it therefore has to be analyzed tier by tier.
Table 2 shows the normalized memory-access energy of the different storage tiers.
Table 2: Normalized memory-access energy of different storage tiers
If data is transferred between two storage mechanisms, the higher of the two energies is taken as the energy of that data transfer. For example, when data is moved from DRAM into the on-chip SRAM cache, the energy is taken to be 200 energy units.
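The tier lookup and the higher-of-the-two rule can be sketched as follows; the per-byte values are the example figures quoted in this description (200, 6, 2 energy units), which are illustrative assumptions rather than measurements:

```python
# Normalized per-byte memory-access energy of each storage tier
# (example values from the text; assumed, not measured)
TIER_ENERGY = {"DRAM": 200, "buffer": 6, "register": 2, "PE": 2}

def transfer_energy(src, dst):
    """Per-byte cost of moving data between two tiers: the higher of the two."""
    return max(TIER_ENERGY[src], TIER_ENERGY[dst])
```

So a DRAM-to-buffer move is charged at DRAM's 200 units, and a buffer-to-PE move at the buffer's 6 units, matching the examples used later.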
As an optional implementation of the invention, the unit energy consumption includes read/write energy consumption and compute energy consumption.
As an optional implementation of the invention, constructing the neural network energy model includes modeling the network's energy consumption layer by layer, obtaining a convolutional-layer energy model, a fully-connected-layer energy model, an RNN-layer energy model, and a pooling-layer energy model.
As an optional implementation of the invention, the method for computing the unit energy E0 includes:
Constructing the convolutional-layer, fully-connected-layer, RNN-layer, and pooling-layer energy models respectively;
Computing, for each of these models, the energy E0 of the corresponding unit read/write or unit computation.
Energy in the convolutional-layer energy model is computed as follows.
For one complete sample, the energy of one convolutional layer is:
E1 = E1_IO + E1_operation (1)
where E1_IO is the convolutional layer's read/write energy and E1_operation is its compute energy.
The read/write energy is
E1_IO = E1_weightIO + E1_inputIO + E1_outputIO (2)
where E1_weightIO is the weight energy, E1_inputIO the input energy, and E1_outputIO the output energy.
The weight energy E1_weightIO, input energy E1_inputIO, and output energy E1_outputIO are computed separately as follows.
(1) Weight energy E1_weightIO
E1_weightIO = E1_DRAM_buffer + E1_buffer_PE (3)
where E1_DRAM_buffer is the energy of reading from DRAM into the cache, and E1_buffer_PE is the energy from the cache to the PE array.
E1_DRAM_buffer = N_w * RD_DRAM_CONV * e_DRAM_buffer (4)
where N_w is the total weight volume of a unit convolutional layer and RD_DRAM_CONV is the number of rounds in which DRAM writes weights to the cache.
The total weight volume of a unit convolutional layer is N_w = γ * Ch_out_i * K_i^2 * Ch_in_i (5), and the energy e_DRAM_buffer of DRAM reading one byte into the cache is, for example, 200 energy units.
How often weights are re-imported depends on the size of the on-chip cache. If the cache can hold all of this layer's weight data, each datum need only be imported once, and subsequent accesses involve only the interaction between the cache and the PE array. Conversely, when a PE needs to reuse weight data that is no longer stored in the cache, the access must be repeated. In one PE array computation period, a fixed number of points of the two-dimensional output map can be produced, so the input data must be imported into the PE array in several passes; when the cache capacity is insufficient, this determines the total number of times the same weight is repeatedly imported, giving RD_DRAM_CONV (6).
Since the number of repeated reads is determined by RD_DRAM_CONV, and the energy e_buffer_PE of the cache reading one byte into the PE array is, for example, 6 energy units, E1_buffer_PE follows accordingly (7).
(2) Input energy E1_inputIO
Input data must travel from DRAM through the cache and registers to the PE array, and transmission must also be completed within the PE array. Therefore:
E1_inputIO = E1_DRAM_buffer + E1_buffer_register + E1_register_PE + E1_PE_tran (8)
where E1_DRAM_buffer is the energy from DRAM to the cache, E1_buffer_register the energy from the cache to the registers, E1_register_PE the energy from the registers to the PE array, and E1_PE_tran the energy of the rightward transmission of each datum fed into the leftmost PE of a row.
Note that, because of padding, the total input volume may differ from the number of points of the input feature map. With 'valid' padding nothing is padded and the original input size is unchanged; with 'same' padding, even though the input and output map sizes stay equal, the original input size changes, and the changed height and width are used in the subsequent calculations.
Here H_in_i * W_in_i * Ch_in_i * α is the total input data volume of a unit convolutional layer and S_CONVdatabuffer is the buffer's data capacity. Since Thinker preferentially finishes all reuse of each input point, input points are imported into the buffer without repetition.
A register file inserted between the cache and the PE array avoids repeated import of the input data. With lateral step S_i, data is read only once during lateral sliding, so repeats arise only from the longitudinal sliding; the energy e_buffer_register of a unit-byte transfer from the buffer to the registers is 6.
The repeated import count of the input data is Ch_in_i * K_i^2, and Ch_in_i * K_i^2 * H_out_i * W_out_i is the product of the number of output points and the number of input points corresponding to each output point; the energy e_register_PE of a unit-byte transfer from the registers to the PE array is 2.
Because the input data is reused, each datum fed into the leftmost PE of a row is transmitted rightward across the array, so each input point is accessed repeatedly; the energy e_PE_tran of reading one byte of data is, for example, 2.
(3) Output energy E1_outputIO
The total output data volume is β * Ch_out_i * H_out_i * W_out_i. Output must first be transmitted between PEs and then written into the cache; data that cannot be held in the cache is written out to DRAM to wait for the next computation. Therefore:
E1_outputIO = E1_PEout_tran + E1_PE_buffer + E1_buffer_DRAM (13)
Each output point produced by a PE must first be passed to the leftmost PE of its row. Considering one row of PEs, all the output points generated in one computation period must be transferred to the leftmost PE, and the average number of memory accesses per output point follows from the length of the row.
The total output data volume is β * Ch_out_i * H_out_i * W_out_i, and the unit-byte transmission energy e_PEout_tran is, for example, 2 energy units.
Compute energy:
Computing one CONV layer outputs Ch_out_i * H_out_i * W_out_i points in total, and each point requires K_i^2 * Ch_in_i multiply-add operations.
In some embodiments, for a given sample, the energy consumed to complete all convolution-layer operations is the sum of the energies consumed by all convolutional layers.
In some embodiments, in both the AP computation flow and the TM computation flow each sample is computed serially, so the energy of multiple samples is simply the sum of the energies of the individual samples.
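The convolutional compute-energy count above (one multiply-add per output point, kernel tap, and input channel) can be sketched as follows; the symbol names transliterate the patent's notation and the unit cost `e_op` is an assumed example value:

```python
def conv_mac_energy(H_out, W_out, Ch_out, K, Ch_in, e_op):
    """Compute energy of one conv layer: Ch_out*H_out*W_out output points,
    each needing K*K*Ch_in multiply-adds, at e_op energy units per MAC."""
    macs = H_out * W_out * Ch_out * K * K * Ch_in
    return macs * e_op

def conv_net_energy(layers, e_op):
    """Layers are computed serially, so per-layer energies simply add.
    layers: iterable of (H_out, W_out, Ch_out, K, Ch_in) tuples."""
    return sum(conv_mac_energy(*layer, e_op) for layer in layers)
```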
Energy in the fully-connected-layer energy model is computed as follows:
E2 = E2_IO + E2_operation (18)
In the Thinker chip, fully connected layers are computed layer by layer over multiple samples. The read/write energy of one batch of FC operations is:
E2_IO = E2_weightIO + E2_inputIO + E2_outputIO (19)
(1) The method for computing E2_weightIO:
The FC layer's weight read/write energy is divided into two parts:
E2_weightIO = E2_DRAM_buffer + E2_buffer_PE (20)
As with the convolutional layer, the two parts can be derived analogously, where BS is the batch size.
(2) The method for computing E2_inputIO:
Unlike the CONV layers, the FC layers' input does not pass through the registers designed for convolutional input reuse, so E2_inputIO consists of three parts:
E2_inputIO = E2_DRAM_buffer + E2_buffer_PE + E2_PE_tran (24)
where e_PEinput_tran is the unit transmission energy, for example 2 energy units.
(3) The method for computing E2_outputIO:
E2_outputIO = E2_PEout_tran + E2_PE_buffer + E2_buffer_DRAM (29)
where:
E2_PE_buffer = FC_{i+1} * BS * β * e_PE_buffer (31)
E2_buffer_DRAM = ReLU(FC_{i+1} * BS − S_FCdatabuffer) * β * e_buffer_DRAM (32)
Here e_PEout_tran, e_PE_buffer, and e_buffer_DRAM are the unit transmission energies of the respective storage tiers, namely 2, 6, and 200 energy units.
When the network contains multiple FC layers, the total FC energy of a batch is the sum of the read/write energies of all FC layers.
The method for computing the compute energy E2_operation:
During computation there are FC_{i+1} * BS output points in total, and each output point requires FC_i multiply-add operations. With e_operation the energy of a unit-byte multiply-add, the energy of this part is:
E2_operation = FC_{i+1} * BS * FC_i * α * γ * e_operation (33)
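Equations (32) and (33) can be sketched as follows. Note how (32) uses ReLU(x) = max(x, 0) purely as an arithmetic device, counting only the output bytes that overflow the on-chip FC buffer and must spill to DRAM; parameter names are transliterations of the patent's symbols:

```python
def relu(x):
    """ReLU(x) = max(x, 0), used in Eq. (32) to clamp the overflow at zero."""
    return max(x, 0)

def fc_output_spill_energy(fc_out, batch, beta, S_buf, e_buf_dram):
    """Eq. (32): only output data exceeding buffer capacity S_buf goes to DRAM."""
    return relu(fc_out * batch - S_buf) * beta * e_buf_dram

def fc_compute_energy(fc_in, fc_out, batch, alpha, gamma, e_op):
    """Eq. (33): fc_out*batch output points, fc_in multiply-adds each,
    scaled by the data-width factors alpha and gamma."""
    return fc_out * batch * fc_in * alpha * gamma * e_op
```

When the whole batch output fits in the buffer, the spill term is exactly zero, which is the point of the ReLU formulation.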
Energy in the RNN-layer energy model is computed as follows.
When the network contains multiple RNN layers, the total RNN energy of a batch is the sum of the compute energies of all layers.
Since the LSTM structure is the most widespread RNN variant, the following RNN energy analysis targets the LSTM structure.
In an RNN, the computation flow of one LSTM unit includes:
f_t = σ(W_xf * x_t + W_hf * h_{t-1} + b_f)
i_t = σ(W_xi * x_t + W_hi * h_{t-1} + b_i)
o_t = σ(W_xo * x_t + W_ho * h_{t-1} + b_o)
g_t = tanh(W_xg * x_t + W_hg * h_{t-1} + b_g)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)
The Thinker chip computes RNN layers using two kinds of PE: common PEs and super PEs. The W_x_gate * x_t_gate + W_h_gate * h_{t-1,gate} + b_gate part follows the FC-layer computation principle and is computed in the common PEs. Once it finishes, the data is passed to the super PEs for the RNN-gating computation: the sigmoid or tanh functions are evaluated to obtain the gate-function vectors, and c_t and h_t are then obtained by multiply-add operations.
The energy of a single RNN layer for one batch is computed first. Since an RNN splits into an FC part and an RNN-gating part, the energy of one batch of RNN operations with iteration count Iteration is the sum of these two parts (34).
(1) Computation of the RNN-FC energy:
The energy of this part is computed essentially the same way as the FC-layer energy. The difference is that the dimensions of the weight matrix, input vector, and output vector must be adjusted. For an RNN, the unified form of the FC-layer computation is:
FC_t = W_x_gate * x_t_gate + W_h_gate * h_{t-1,gate} + b_gate (35)
where gate can take i/f/o/g, corresponding to the four gates of the LSTM. In a concrete implementation, W_x_gate and W_h_gate can be laterally concatenated into one W_gate, and x_t_gate and h_{t-1,gate} vertically concatenated into one x_gate. The dimension of W_gate is therefore O_len_i × (I_len_i + O_len_i) and the dimension of x_gate is 1 × (I_len_i + O_len_i), which map onto the FC-layer parameters FC_i and FC_{i+1} as:
FC_i = I_len_i + O_len_i
FC_{i+1} = O_len_i
This allows the RNN-FC part to be folded into the FC-layer energy model. Note that for each LSTM unit the FC operation must be repeated four times, once per gate. Details are not repeated here.
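The dimension mapping above is simple enough to state directly in code; a minimal sketch, with the per-unit factor of four for the i/f/o/g gates:

```python
LSTM_GATES = ("i", "f", "o", "g")

def lstm_fc_params(I_len, O_len):
    """Map the concatenated LSTM gate computation onto the FC energy model's
    (FC_i, FC_{i+1}): W_gate is O_len x (I_len + O_len), x_gate is 1 x (I_len + O_len)."""
    fc_i = I_len + O_len    # FC input size
    fc_i1 = O_len           # FC output size
    return fc_i, fc_i1

def lstm_fc_repeats():
    """Per LSTM unit, the FC operation runs once per gate, i.e. four times."""
    return len(LSTM_GATES)
```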
(2) Computation of the gating memory-access energy:
As described above, the gating computation mainly involves two kinds of operations: (a) simple tanh/sigmoid function evaluations; (b) element-wise multiplications. The gating energy can therefore be expressed as the sum of these parts.
The tanh/sigmoid computation:
tanh/sigmoid is mainly used to compute the 4 gate functions and c_t, i.e. 5 groups of element-wise tanh/sigmoid operations on vectors of length O_len_i, for a total of BS * 5 * O_len_i operations. The computation is carried out in the PEs: data must be imported from DRAM into the buffer, then from the buffer into the PEs, and finally written out. Let the unit-byte DRAM-to-buffer transmission energy be e_DRAM_buffer, the unit-byte buffer-to-PE energy be e_buffer_PE, and the unit-byte PE-to-buffer and buffer-to-DRAM energies be e_PE_buffer and e_buffer_DRAM respectively. Each datum undergoes this operation at most once, and data moves between DRAM and the buffer only when the buffer cannot hold all input points, from which the corresponding formula follows.
The element-wise multiplication computation mainly comprises the computation of c_t and of h_t. For h_t, only one element-wise multiplication is needed, so reading the data involves a total of 2 * O_len_i input data and writing out involves O_len_i data; the energy is obtained analogously to the tanh/sigmoid case above.
For c_t, an accumulation is needed in addition to the multiplications. Each output datum requires two multiplications whose products are then summed, so one of the products must first be read into the buffer and then loaded into a register in the PE to complete the summation. The energy of this part is obtained analogously.
Computation of the gating compute energy:
Three kinds of operations can occur in gating: tanh/sigmoid, multiplication, and addition. Let their unit-byte operation energies be e_tanh/sigmoid, e_multiply, and e_add, each for example 1 energy unit. The numbers of times the three operations must be executed are 5 * O_len_i, 3 * O_len_i, and O_len_i respectively, from which the gating compute energy per sample is 5 * O_len_i * e_tanh/sigmoid + 3 * O_len_i * e_multiply + O_len_i * e_add.
Energy in the pooling-layer energy model is computed as follows.
Since pooling involves no data reuse between different layers or different samples, the energy of a single layer and a single sample can be considered first; if multiple layers or samples are computed, the results need only be summed. As before, the pooling energy is discussed as read/write energy plus compute energy; the energy model of the pooling operation is established on this basis.
E4 = E4_IO + E4_operation (41)
Computation of the pooling layer's read/write energy E4_IO:
Thinker supports max-pooling, which reduces the height and width of the output map while keeping the channel count unchanged. The input and output maps are related by H_out_i = H_in_i / p_i in height and W_out_i = W_in_i / p_i in width, so the total number of pooling blocks is:
X = H_out_i * W_out_i * Ch_in_i (42)
The data read/write energy of pooling divides into input data and output data, each of which must complete the DRAM, cache, and PE-array interactions. Since no data is imported repeatedly, the energy model of this part is simpler (43):
where the first term is the read energy of the input data and the second term is the write-out energy of the output data.
The compute energy involves the total number of input data. Note that this total is not equal to the size of the input feature map; it must be derived backwards from the size of the output map and the size of the kernel, i.e.:
E4_operation = p_i * p_i * H_out_i * W_out_i * Ch_out_i * e_operation (44)
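Equations (42) and (44) can be sketched as follows; each output point of a p x p max-pool touches p*p inputs, which is exactly the backward derivation the text describes:

```python
def pool_blocks(H_out, W_out, Ch_in):
    """Eq. (42): total number of pooling blocks X."""
    return H_out * W_out * Ch_in

def pool_compute_energy(p, H_out, W_out, Ch_out, e_op):
    """Eq. (44): each output point of a p x p max-pool compares p*p inputs."""
    return p * p * H_out * W_out * Ch_out * e_op
```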
As an optional implementation of the invention, the neural network time model includes time overhead, comprising convolutional-layer time overhead and fully-connected-layer time overhead.
As an optional implementation of the invention, the time overhead is computed as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the compute time.
As an optional implementation of the invention, the neural network time model includes a convolutional-layer time model and a fully-connected-layer time model.
For the Thinker chip, the time overhead depends mainly on the computation-heavy convolution operations and the data-heavy fully connected operations (including the fully connected operations inside both FC layers and RNNs). Because of the 'blocking' effect of time, the less time-consuming RNN-gating and pooling operations need no model construction. The time is analyzed below for the convolutional layer and the fully connected layer.
As an optional implementation of the invention, the convolutional-layer time model and the fully-connected-layer time model each include a time overhead, computed as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the compute time.
Convolutional layer time model includes:
The calculating of access time includes:
Due to the process that Thinker is calculated, convolutional layer is that single layer, single sample meter calculate one by one, therefore analysis one first
Time needed for one layer of convolution algorithm in a sample.
Reading and writing data passes through several levels: DRAM, the cache, and the PEs. Once the data has been loaded into the on-chip cache, the transfer is very fast, so that part can be ignored. T_IO therefore reduces to the time consumed by data interaction between DRAM and the cache.
In the Thinker chip, the input/output data and the weights are loaded from DRAM into two different caches. The bandwidths of these two parts differ and can be adapted to the architecture of the neural network. The input and output data share the bandwidth BW_data_conv, and the weight data use the bandwidth BW_weight_conv. In one layer, the total amount of input and output data is H_in_i * W_in_i * Ch_in_i * α + H_out_i * W_out_i * Ch_out_i * β, and the total amount of weight data is the product of the kernel area, the input and output channel counts, and the weight bit width γ. The access times of the input/output data and of the weight data can therefore be computed separately; the larger of the two is T_IO, hence:
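As a sketch of this access-time model (function and parameter names are illustrative; the weight-volume formula, whose image is not reproduced above, is assumed to be kernel area times input and output channel counts times the weight bit width γ):

```python
def conv_layer_io_time(H_in, W_in, Ch_in, H_out, W_out, Ch_out, k,
                       alpha, beta, gamma, BW_data_conv, BW_weight_conv):
    """T_IO for one conv layer: the larger of the input/output-data
    transfer time and the weight transfer time (DRAM <-> cache only)."""
    # total input + output data volume (alpha, beta: data bit widths)
    data_volume = H_in * W_in * Ch_in * alpha + H_out * W_out * Ch_out * beta
    # assumed total weight volume: k x k kernels for every in/out channel pair
    weight_volume = k * k * Ch_in * Ch_out * gamma
    return max(data_volume / BW_data_conv, weight_volume / BW_weight_conv)
```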
The calculation of the computation time T_operation includes:
For one convolutional layer of one sample, the computation requires several rounds of the PE array. In each round the PE array computes several output points in parallel, and after several rounds all output points of the layer have been computed. Assume the number of rounds needed to complete one layer is round_convlayer and the time needed for each round is t_round_conv; the computation time can then be expressed as:
T_operation = round_convlayer * t_round_conv (46).
Since every row of Thinker reuses the input data, H_out_i * W_out_i is the total number of rows that need to be input; each round computes a fixed number of rows, so a corresponding number of rounds is needed before all inputs have been processed. Similarly, for one group of input data, the weights also need to be input a corresponding number of times.
Each output point involves a sequence of multiply-accumulate operations, so the time consumed by one round is the time required to perform these multiply-accumulate operations in sequence:
Therefore, the time to compute one convolutional layer of one sample is:
The operation time of all convolutional layers in one sample is obtained simply by summing the layer times; the convolution operation time of all samples in a batch is obtained by multiplying the single-sample total by the batch size BS.
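The round-based computation time of Eq. (46) and the layer/batch summation above can be sketched as follows (names are illustrative):

```python
def conv_compute_time(round_convlayer, t_round_conv):
    # Eq. (46): T_operation = number of rounds * time per round
    return round_convlayer * t_round_conv

def batch_conv_time(layer_times, batch_size):
    # sum the per-layer times of one sample, then scale by batch size BS
    return sum(layer_times) * batch_size
```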
The fully-connected-layer time model includes:
Since the different samples of a batch are computed in parallel in the FC layers, we directly analyze the operation time of one FC layer over one batch of data.
The calculation of the access time only considers data interaction between DRAM and the cache. Assume the bandwidth allocated to the input/output data is BW_data_FC and the bandwidth allocated to the weights is BW_weight_FC; the total amount of input/output data is (FC_i * α * RD_DRAM_in_FC + FC_(i+1) * β) * BS, and the total amount of weight input is FC_i * FC_(i+1) * γ * RD_DRAM_weight_FC. The access-time formula is:
The calculation of the computation time involves round_FClayer and t_round_FC:
where:
It can be understood that the operation time of all FC layers in one batch is obtained simply by summing the layer times.
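A sketch of the FC-layer access time under the stated data and weight volumes; taking the larger of the two transfer times is an assumption by analogy with the convolutional layer, since the original formula image is not reproduced above, and the names are illustrative:

```python
def fc_io_time(FC_i, FC_next, alpha, beta, gamma, BS,
               RD_in, RD_weight, BW_data_FC, BW_weight_FC):
    """Access time of one FC layer over a batch of BS samples."""
    # total input/output data: (FC_i * alpha * RD_DRAM_in_FC + FC_(i+1) * beta) * BS
    data_volume = (FC_i * alpha * RD_in + FC_next * beta) * BS
    # total weight input: FC_i * FC_(i+1) * gamma * RD_DRAM_weight_FC
    weight_volume = FC_i * FC_next * gamma * RD_weight
    return max(data_volume / BW_data_FC, weight_volume / BW_weight_FC)
```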
The hardware parameters of the model are also variable, so the operation time and energy consumption of the same network can be tested under different hardware parameters. By adjusting the various hardware parameters (such as bandwidth, cache size, PE-array size, and channel count), an optimal solution under a hardware-area constraint may be obtained, minimizing time and energy consumption from the perspective of hardware design. At the same time, time and energy consumption are predicted layer by layer and the modeling parameters that dominate the time and energy overheads are analyzed, which can assist hardware design.
As an optional implementation of the invention, performing bi-objective optimization on the neural network energy-consumption model and the neural network time model comprises:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy-consumption model using the AP computation flow;
selecting the array-partitioning method with the smallest energy consumption among the cache partitions that satisfy the requirement.
The Thinker chip supports two computation flows: AP and TM. AP stands for Array Partitioning; TM stands for Time Multiplexing. The difference between them is the computation order when inferring the same network. In the TM flow, the network inference is executed layer by layer, and at any given time the entire PE array computes the same layer. In the AP flow, the partitioning of the array is adjusted, so at the same time the PE array may compute several layers simultaneously. Since the AP flow can compute several types of layers at once, it balances the characteristics of convolution operations (large computation, small data volume) and fully connected operations (small computation, large data volume), thereby shortening the time. On the other hand, the TM flow reuses data frequently, so the data are retransmitted fewer times and the energy consumption is smaller.
As an optional implementation of the invention, selecting the array-partitioning method with the smallest energy consumption among the cache partitions that satisfy the requirement comprises:
setting a task time Tmax;
calculating the time T_TM required when cache partitioning under the TM computation flow is applied to the neural network time model;
calculating the energy consumption E_AP required when array partitioning under the AP computation flow is applied to the neural network energy-consumption model;
selecting the smallest required energy consumption E_AP0 and the time T_AP0 needed to achieve it;
comparing Tmax with T_AP0 and judging from the result whether the set task time meets the requirement;
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison result, the bi-objective optimization result for the neural network energy-consumption model and the neural network time model.
As an optional implementation of the invention, judging according to the comparison result whether the set task time meets the requirement comprises:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time must be reset.
As an optional implementation of the invention, outputting according to the comparison result the bi-objective optimization result for the neural network energy-consumption model and the neural network time model comprises:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy-consumption model again;
calculating the energy consumption E_AP' required when array partitioning under the AP computation flow is applied to the neural network energy-consumption model, and selecting the modeling parameters and array-partitioning method corresponding to the smallest energy consumption E_AP0' as the bi-objective optimization result;
otherwise, selecting the modeling parameters and cache-partitioning method corresponding to the TM computation flow as the bi-objective optimization result.
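The selection logic of the optional implementations above can be sketched as follows. The candidate set, tuple layout, and function names are simplifications introduced for illustration; a real implementation would re-run the AP partitioning search after adjusting the modeling parameters rather than filter a fixed candidate list:

```python
def select_partitioning(T_max, t_tm, tm_config, ap_candidates):
    """Pick a partitioning result given a task-time budget T_max.

    ap_candidates: list of (energy, time, config) tuples produced by AP
    array partitioning; t_tm / tm_config come from TM cache partitioning.
    Returns the chosen config, or None when the deadline is infeasible.
    """
    # smallest AP energy E_AP0 and the time T_AP0 needed to reach it
    e_ap0, t_ap0, ap0_config = min(ap_candidates)
    if T_max <= t_ap0:
        # even the most energy-efficient AP schedule misses the deadline:
        # the task time must be reset
        return None
    if T_max < t_tm:
        # TM is too slow: choose the lowest-energy AP partitioning that
        # still finishes within T_max
        feasible = [c for c in ap_candidates if c[1] < T_max]
        return min(feasible)[2]
    # TM meets the deadline: prefer it for its lower energy consumption
    return tm_config
```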
During actual execution, in order to execute different layers simultaneously, not only the number of usable PEs but also the allocation of the cache must be adjusted. Since the total cache size and the array size of the Thinker chip are fixed, and only longitudinal cutting is a reasonable approach, the parameters to search in the loop include the number of array columns allocated to CONV, the data-buffer size S_conv_data_buffer, and the weight-buffer size S_conv_weight_buffer.
It can be seen from the modeling process that the time overhead of network computation exhibits a "hiding effect". The "hiding effect" means that the time of a layer can be split into a computation time and a data-transfer time, with the total time of the layer being the maximum of the two; the data-transfer time in turn is the larger of the data transfer time and the weight transfer time. To further reduce the time overhead, the dominant factor in each layer's time consumption must be studied. Taking the typical LRCN neural network as an example: for its FC operations, under both the TM and AP0 computation flows the weight read/write time overhead is large and dominant; for the CONV operations, the computation time is basically dominant. The fully connected layer has a small computation amount and a large data volume.
In the TM computation flow, the data read/write time, the weight access time, and the computation time differ greatly. The computation time overhead of the convolutional layer is far larger than its data read/write and weight read/write times; in this case, during the long periods when the PE array is busy computing, the data transfer sits idle, while during FC computation the PE array goes idle because the weight read/write takes too long. In the AP0 flow, the layer-by-layer data read/write time, weight access time, and computation time are better balanced. The total time can therefore be effectively reduced by substantially increasing the share given to the convolution operations whose original time proportion is especially small. The FC layers' time overhead is dominated by weight read/write; shortening the FC time thus requires increasing the cache share allocated to the FC weights, or reducing the total time by increasing the total cache or the data-transfer bandwidth.
The hardware parameters of the neural network energy-consumption model and the neural network time model are also variable, so testing the operation time and energy consumption of the same network under different hardware parameters can assist hardware design. By adjusting the various hardware parameters (such as bandwidth, cache size, PE-array size, and channel count), an optimal solution under a hardware-area constraint may be obtained, minimizing time and energy consumption from the perspective of hardware design.
Therefore, performing cache partitioning on the neural network time model using the TM computation flow, performing array partitioning on the neural network energy-consumption model using the AP computation flow, and selecting the array-partitioning method with the smallest energy consumption among the cache partitions that satisfy the requirement can assist later hardware design, and different optimal solutions can be achieved under different task-time conditions.
In this embodiment, the time and energy consumption of the neural network are modeled from the perspective of the hardware computation flow. Time and energy consumption are predicted layer by layer while the modeling parameters that dominate the time and energy overheads are analyzed, and the neural network model is improved by performing time/energy bi-objective optimization through improved modeling parameters, array-partitioning methods, and cache-partitioning methods. Further, through layer-by-layer modeling, different optimal solutions are obtained under different task-time conditions.
It can be understood that the same or similar parts of the above embodiments may refer to one another, and content not described in detail in some embodiments may refer to the same or similar content in other embodiments.
It should be noted that in the description of the application, the terms "first", "second", and so on are used for descriptive purposes only and must not be interpreted as indicating or implying relative importance. In addition, in the description of the application, unless otherwise indicated, "multiple" means at least two.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process; and the scope of the preferred embodiments of the application includes other implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the application belong.
It should be appreciated that each part of the application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction-execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will understand that all or part of the steps carried by the above embodiment methods may be completed by instructing relevant hardware through a program, which may be stored in a computer-readable storage medium; when executed, the program includes one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the application have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the application; those skilled in the art can change, modify, replace, and vary the above embodiments within the scope of the application.
It should be noted that the present invention is not limited to the above-described preferred embodiments; those skilled in the art can derive various other products under the enlightenment of the invention, but any variation in shape or structure that yields a technical solution identical or similar to the application falls within the protection scope of the present invention.
Claims (10)
1. A neural network optimization method, characterized by comprising:
presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
constructing a neural network energy-consumption model based on the modeling parameters;
constructing a neural network time model based on the modeling parameters;
performing bi-objective optimization on the neural network energy-consumption model and the neural network time model.
2. The neural network optimization method according to claim 1, characterized in that the energy-consumption formula is E = V * T * e, where V is the amount of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
3. The neural network optimization method according to claim 2, characterized in that the unit energy consumption includes a read/write energy consumption and a computation energy consumption.
4. The neural network optimization method according to claim 1, characterized in that constructing the neural network energy-consumption model includes: modeling the neural network energy consumption layer by layer to obtain a convolutional-layer energy-consumption model, a fully-connected-layer energy-consumption model, an RNN-layer energy-consumption model, and a pooling-layer energy-consumption model.
5. The neural network optimization method according to claim 1, characterized in that the neural network time model includes a time overhead, the time overhead including a convolutional-layer time overhead and a fully-connected-layer time overhead.
6. The neural network optimization method according to claim 5, characterized in that the time overhead is calculated as Tz = max(T_IO, T_operation), where T_IO is the access time and T_operation is the computation time.
7. The neural network optimization method according to claim 1, characterized in that performing bi-objective optimization on the neural network energy-consumption model and the neural network time model comprises:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy-consumption model using the AP computation flow;
selecting the array-partitioning method with the smallest energy consumption among the cache partitions that satisfy the requirement.
8. The neural network optimization method according to claim 7, characterized in that selecting the array-partitioning method with the smallest energy consumption among the cache partitions that satisfy the requirement comprises:
setting a task time Tmax;
calculating the time T_TM required when cache partitioning under the TM computation flow is applied to the neural network time model;
calculating the energy consumption E_AP required when array partitioning under the AP computation flow is applied to the neural network energy-consumption model;
selecting the smallest required energy consumption E_AP0 and the time T_AP0 needed to achieve it;
comparing Tmax with T_AP0 and judging from the result whether the set task time meets the requirement;
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison result, the bi-objective optimization result for the neural network energy-consumption model and the neural network time model.
9. The neural network optimization method according to claim 8, characterized in that judging according to the comparison result whether the set task time meets the requirement comprises:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time must be reset.
10. The neural network optimization method according to claim 8, characterized in that outputting according to the comparison result the bi-objective optimization result for the neural network energy-consumption model and the neural network time model comprises:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy-consumption model again;
calculating the energy consumption E_AP' required when array partitioning under the AP computation flow is applied to the neural network energy-consumption model, and selecting the modeling parameters and array-partitioning method corresponding to the smallest energy consumption E_AP0' as the bi-objective optimization result;
otherwise, selecting the modeling parameters and cache-partitioning method corresponding to the TM computation flow as the bi-objective optimization result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811344189.0A CN109472361B (en) | 2018-11-13 | 2018-11-13 | Neural network optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109472361A true CN109472361A (en) | 2019-03-15 |
CN109472361B CN109472361B (en) | 2020-08-28 |
Family
ID=65671820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811344189.0A Withdrawn - After Issue CN109472361B (en) | 2018-11-13 | 2018-11-13 | Neural network optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109472361B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102773981A (en) * | 2012-07-16 | 2012-11-14 | 南京航空航天大学 | Implementation method of energy-saving and optimizing system of injection molding machine |
CN105302973A (en) * | 2015-11-06 | 2016-02-03 | 重庆科技学院 | MOEA/D algorithm based aluminum electrolysis production optimization method |
CN106427589A (en) * | 2016-10-17 | 2017-02-22 | 江苏大学 | Electric car driving range estimation method based on prediction of working condition and fuzzy energy consumption |
US20180046894A1 (en) * | 2016-08-12 | 2018-02-15 | DeePhi Technology Co., Ltd. | Method for optimizing an artificial neural network (ann) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738318A (en) * | 2019-09-11 | 2020-01-31 | 北京百度网讯科技有限公司 | Method, system and device for evaluating network structure running time and generating evaluation model |
CN110929860A (en) * | 2019-11-07 | 2020-03-27 | 深圳云天励飞技术有限公司 | Convolution acceleration operation method and device, storage medium and terminal equipment |
WO2021088688A1 (en) * | 2019-11-07 | 2021-05-14 | 深圳云天励飞技术股份有限公司 | Convolution acceleration operation method and apparatus, storage medium and terminal device |
US11748595B2 (en) | 2019-11-07 | 2023-09-05 | Shenzhen Intellifusion Technologies Co., Ltd. | Convolution acceleration operation method and apparatus, storage medium and terminal device |
CN111753950A (en) * | 2020-01-19 | 2020-10-09 | 杭州海康威视数字技术股份有限公司 | Method, device and equipment for determining forward time consumption |
CN111753950B (en) * | 2020-01-19 | 2024-02-27 | 杭州海康威视数字技术股份有限公司 | Forward time consumption determination method, device and equipment |
CN112085195A (en) * | 2020-09-04 | 2020-12-15 | 西北工业大学 | X-ADMM-based deep learning model environment self-adaption method |
CN112468533A (en) * | 2020-10-20 | 2021-03-09 | 安徽网萌科技发展股份有限公司 | Agricultural product planting-oriented edge learning model online segmentation method and system |
CN112468533B (en) * | 2020-10-20 | 2023-01-10 | 安徽网萌科技发展股份有限公司 | Agricultural product planting-oriented edge learning model online segmentation method and system |
CN113377546A (en) * | 2021-07-12 | 2021-09-10 | 中科弘云科技(北京)有限公司 | Communication avoidance method, apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109472361B (en) | 2020-08-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
AV01 | Patent right actively abandoned |

Granted publication date: 20200828 Effective date of abandoning: 20210125 |