CN109472361A - Neural network optimization - Google Patents

Neural network optimization

Info

Publication number
CN109472361A
Authority
CN
China
Prior art keywords
energy consumption
neural network
time
model
modeling parameters
Prior art date
Legal status
Granted
Application number
CN201811344189.0A
Other languages
Chinese (zh)
Other versions
CN109472361B (en)
Inventor
张跃进
胡勇
喻蒙
Current Assignee
Zhongxiang Bo Qian Mdt Infotech Ltd
Original Assignee
Zhongxiang Bo Qian Mdt Infotech Ltd
Priority date
Filing date
Publication date
Application filed by Zhongxiang Bo Qian Mdt Infotech Ltd filed Critical Zhongxiang Bo Qian Mdt Infotech Ltd
Priority to CN201811344189.0A priority Critical patent/CN109472361B/en
Publication of CN109472361A publication Critical patent/CN109472361A/en
Application granted granted Critical
Publication of CN109472361B publication Critical patent/CN109472361B/en
Withdrawn after issue


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken


Abstract

This application relates to a neural network optimization method. The method includes: presetting modeling parameters, the modeling parameters including network parameters and hardware parameters; constructing a neural network energy consumption model based on the modeling parameters; constructing a neural network time model based on the modeling parameters; and performing dual-objective optimization on the neural network energy consumption model and the neural network time model. The application models the time and energy consumption of a neural network from the perspective of the hardware computation flow, predicts time and energy consumption layer by layer, analyzes the modeling parameters that dominate the time and energy overheads, and performs dual-objective optimization of time and energy consumption by improving the modeling parameters, the array partitioning method and the cache partitioning method, thereby improving the neural network model.

Description

Neural network optimization
Technical field
This application relates to the technical field of artificial neural networks, and in particular to a neural network optimization method.
Background art
With the rise of neural network techniques, neural network hardware adapted to different application scenarios has emerged. The inference and prediction capability of neural networks is strong but computationally intensive, so increasing the speed of neural network computation while reducing its energy consumption has become a key issue.
In the related art, both the training process and the inference process of neural networks have an urgent demand for accelerated network computation. Network training is basically completed on GPUs in the cloud, and the parallelization and communication methods of different hardware significantly affect training speed, so computation speed is mainly improved by increasing parallel computing capability and reducing communication overhead. The energy consumption of a neural network can be divided into computation energy and memory-access energy. Different data-reuse schemes, such as output stationary and weight stationary, can effectively reduce the memory-access energy of a neural network.
However, the above research on neural network computation focuses on either low energy consumption or faster computation alone, and does not consider that in complex application environments acceleration and energy saving may conflict: reducing time may come at the cost of energy, reducing energy may come at the cost of speed, and shortening the computation time may in turn generate more energy consumption.
Summary of the invention
To overcome, at least to some extent, the problem in the related art that research on neural network computation focuses on either low energy consumption or faster computation alone, ignoring that in complex application environments acceleration and energy saving may conflict (reducing energy consumption may sacrifice speed, while reducing computation time may generate more energy consumption), the present application provides a neural network optimization method. The method comprises:
presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
constructing a neural network energy consumption model based on the modeling parameters;
constructing a neural network time model based on the modeling parameters; and
performing dual-objective optimization on the neural network energy consumption model and the neural network time model.
Further, the energy consumption is calculated as E = V * T * e, where V is the volume of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
Further, the unit energy consumption includes read/write energy consumption and computation energy consumption.
Further, constructing the neural network energy consumption model includes modeling the neural network energy consumption layer by layer to obtain a convolutional layer energy model, a fully connected layer energy model, an RNN layer energy model and a pooling layer energy model.
Further, the neural network time model includes time overheads, the time overheads including a convolutional layer time overhead and a fully connected layer time overhead.
Further, the time overhead is calculated as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the computation time.
Further, performing dual-objective optimization on the neural network energy consumption model and the neural network time model includes:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy consumption model using the AP computation flow; and
selecting, among the solutions whose cache partitioning satisfies the time requirement, the array partitioning method with the smallest energy consumption.
Further, selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement includes:
setting a task time Tmax;
calculating the time T_TM required when the neural network time model is cache-partitioned using the TM computation flow;
calculating the energy E_AP required when the neural network energy consumption model is array-partitioned using the AP computation flow;
selecting the smallest required energy E_AP0 and the time T_AP0 required to compute at that energy;
comparing Tmax with T_AP0 and judging from the comparison whether the set task time meets the requirement; and
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison, the dual-objective optimization result for the neural network energy consumption model and the neural network time model.
Further, judging from the comparison whether the set task time meets the requirement includes:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time needs to be reset.
Further, outputting the dual-objective optimization result for the neural network energy consumption model and the neural network time model according to the comparison includes:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy consumption model again;
calculating the energy E_AP' required when the neural network energy consumption model is array-partitioned using the AP computation flow, and taking the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
otherwise, taking the modeling parameters and the cache partitioning method corresponding to the TM computation flow as the dual-objective optimization result.
The technical solutions provided by the embodiments of the present application can have the following beneficial effects:
The neural network optimization method of the present application includes presetting modeling parameters, constructing a neural network energy consumption model and a neural network time model based on the modeling parameters, and performing dual-objective optimization on the neural network energy consumption model and the neural network time model. The application models the time and energy consumption of a neural network from the perspective of the hardware computation flow, predicts time and energy consumption layer by layer, analyzes the modeling parameters that dominate the time and energy overheads, and performs dual-objective optimization of time and energy consumption by improving the modeling parameters, the array partitioning method and the cache partitioning method, thereby improving the neural network model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a neural network optimization method provided by an embodiment of the present application.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of a neural network optimization method provided by an embodiment of the present application.
As shown in Fig. 1, the neural network optimization method of this embodiment includes:
S1: presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
S2: constructing a neural network energy consumption model based on the modeling parameters;
S3: constructing a neural network time model based on the modeling parameters;
S4: performing dual-objective optimization on the neural network energy consumption model and the neural network time model.
A neural network structure contains three different kinds of layers, convolutional layers (CONV), fully connected layers (FC) and pooling layers (POOL), corresponding to the tasks of feature extraction, feature connection and feature compression respectively. Different network structures differ in depth and layer types, and the characteristics of the different layers also differ. For example, convolutional layers are computation-heavy while fully connected layers are data-heavy; this imbalance of resource requirements calls for dedicated hardware architecture design.
Some neural network structures also include RNN (Recurrent Neural Network) layers or CNN (Convolutional Neural Network) layers to complete particular tasks.
As the hardware, the Thinker chip is chosen by way of example; the Thinker chip consists of a PE array, an on-chip storage system, a control finite state machine, IO and a decoder.
The modeling parameters include network parameters and hardware parameters, as shown in Table 1.
Table 1. Modeling parameters
By constructing the neural network energy consumption model and the neural network time model, energy consumption and time are optimized as two objectives simultaneously, and the lowest-energy neural network algorithm under a given task-time condition is realized by adjusting the modeling parameters.
As an optional implementation of the invention, the energy consumption is calculated as E = V * T * e, where V is the volume of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
The data volume V and the repeat count T can be calculated from the network parameters and the hardware parameters. As for the unit energy consumption e, the memory-access energy differs between storage levels and therefore needs to be analyzed level by level.
The memory access energy consumption as shown in table 2 of different storage tiers
The memory access of the different storage tiers of table 2 normalizes energy consumption
If data are transmitted between two kinds of storing mechanisms, select energy consumption higher as this data transmission energy consumption. For example, energy consumption is determined as 200 energy consumption units when data are passed on piece caching SRAM from DRAM.
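To make the cost model concrete, the following Python sketch evaluates E = V * T * e with the per-level unit energies quoted in the text (200 units for a DRAM-to-cache transfer, 6 for cache-to-PE, 2 for register and PE transfers). The function and dictionary names are assumptions made for this illustration; the rule of charging the higher of the two levels for a cross-level transfer follows the paragraph above.

```python
# Illustrative sketch of the E = V * T * e cost model described above.
# Unit energies follow the normalized values quoted from Table 2; the
# names used here are assumptions made for this example.

UNIT_ENERGY = {
    "DRAM": 200,      # DRAM <-> on-chip cache (SRAM), per byte
    "buffer": 6,      # cache <-> register file / PE array, per byte
    "register": 2,    # register file <-> PE, per byte
    "PE": 2,          # PE-to-PE transfer or one multiply-add, per byte
}

def transfer_energy(volume_bytes, repeats, src, dst):
    """E = V * T * e, charging the more expensive of the two levels."""
    e = max(UNIT_ENERGY[src], UNIT_ENERGY[dst])
    return volume_bytes * repeats * e

# Example: weights read once from DRAM into the cache, then re-imported
# into the PE array 4 times because the cache cannot hold the whole layer.
n_w = 3 * 3 * 64 * 128                      # weight volume of one CONV layer
e_total = (transfer_energy(n_w, 1, "DRAM", "buffer")
           + transfer_energy(n_w, 4, "buffer", "PE"))
print(e_total)
```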
As an optional implementation of the invention, the unit energy consumption includes read/write energy consumption and computation energy consumption.
As an optional implementation of the invention, constructing the neural network energy consumption model includes modeling the neural network energy consumption layer by layer to obtain a convolutional layer energy model, a fully connected layer energy model, an RNN layer energy model and a pooling layer energy model.
As an optional implementation of the invention, calculating the unit energy consumption E0 includes:
constructing the convolutional layer energy model, the fully connected layer energy model, the RNN layer energy model and the pooling layer energy model respectively; and
calculating, for each of these models, the energy E0 of the corresponding unit read/write or computation.
Energy consumption in the convolutional layer energy model is calculated as follows.
For one complete sample, the energy consumed by one convolutional layer is:
E1 = E1_IO + E1_operation  (1)
where E1_IO is the read/write energy of the convolutional layer and E1_operation is its computation energy.
The read/write energy is
E1_IO = E1_weightIO + E1_inputIO + E1_outputIO  (2)
where E1_weightIO is the weight energy, E1_inputIO the input energy and E1_outputIO the output energy; the three terms are calculated separately as follows.
(1) Weight energy E1_weightIO:
E1_weightIO = E1_DRAM_buffer + E1_buffer_PE  (3)
where E1_DRAM_buffer is the energy of reading from DRAM into the cache and E1_buffer_PE the energy of moving data from the cache to the PE array, with
E1_DRAM_buffer = N_w * RD_DRAM_CONV * e_DRAM_buffer  (4)
where N_w is the total weight volume of the unit convolutional layer and RD_DRAM_CONV the number of rounds in which DRAM writes the weights into the cache.
The total weight volume of the unit convolutional layer is N_w = γ * Ch_out_i * K_i^2 * Ch_in_i (5), and e_DRAM_buffer, the energy of reading one byte of data from DRAM into the cache, is for example 200 energy units.
The number of times the weights are repeatedly imported is related to the size of the on-chip cache. If the on-chip cache can store all the weight data of the layer, each datum needs to be imported only once, and subsequent accesses involve only the interaction between the cache and the PE array; conversely, when a PE needs to reuse weight data that is not stored in the cache, repeated accesses must be performed. In one computation round of the PE array only part of the output points of the two-dimensional output map can be obtained, so the input data must be imported into the PE array over several rounds; when the cache capacity is insufficient, the total number of repeated imports of the same weight equals this number of rounds.
Since the weights must be read repeatedly this number of times, and e_buffer_PE, the energy of reading one byte of data from the cache into the PE array, is for example 6 energy units, E1_buffer_PE is the product of the weight volume, the number of repeated imports and e_buffer_PE.
(2) Input energy E1_inputIO:
Input data must travel from DRAM through the cache and the register file to reach the PE array, and must then be transmitted within the PE array. Therefore:
E1_inputIO = E1_DRAM_buffer + E1_buffer_register + E1_register_PE + E1_PE_tran  (8)
where E1_DRAM_buffer is the energy from DRAM to the cache, E1_buffer_register the energy from the cache to the register file, E1_register_PE the energy from the register file to the PE array, and E1_PE_tran the energy of the rightward transmission of each datum fed to the first PE on the left edge of the array.
It should be noted that, because of padding, the total input volume may differ from the number of points of the input feature map. If the padding is empty, no filling is performed and the original input size is unchanged. If 'same' padding is used, then even though the input and output map sizes stay equal, the effective input size changes, and the changed height and width are used in the subsequent calculations.
Here H_in_i * W_in_i * Ch_in_i * α is the input data volume of the unit convolutional layer and S_CONVdatabuffer is the data cache size. Since Thinker preferentially reuses each input point until it is fully consumed, the input points are imported into the buffer without repetition.
The register file added between the cache and the PE array avoids repeated imports of the input data. During the horizontal sliding of the window (with horizontal stride S_i) the data is read only once, so repeated reads arise only from the vertical sliding; e_buffer_register, the energy of transmitting one byte from the buffer to the register file, is 6.
The number of repeated imports of the input data is Ch_in_i * K_i^2 * H_out_i * W_out_i, i.e. the product of the number of output points and the number of input points corresponding to each output point; e_register_PE, the energy of transmitting one byte from the register file to the PE array, is 2.
Since the input data is reused, each datum fed to the first PE on the left edge of the array is transmitted rightward, so every input point is accessed repeatedly during this transmission; e_PE_tran, the energy of reading one byte of data, is for example 2.
(3) Output energy E1_outputIO:
The total output data volume is β * Ch_out_i * H_out_i * W_out_i. For the output, the data must first be transmitted between the PEs and then written into the cache; data that cannot be held in the cache is exported to DRAM to wait for the next round of computation. Therefore:
E1_outputIO = E1_PEout_tran + E1_PE_buffer + E1_buffer_DRAM  (13)
Every output point computed by a PE must first be transmitted to the leftmost PE of its row. Considering one row of PEs, all output points generated in one cycle must be transferred to the leftmost PE, so the average number of such accesses per output point follows from the number of points generated per row per cycle.
The total output data volume is β * Ch_out_i * H_out_i * W_out_i, and e_PEout_tran, the per-byte transmission energy, is for example 2 energy units.
Computation energy E1_operation:
Computing one CONV layer requires producing Ch_out_i * H_out_i * W_out_i output points in total, and each point requires K_i^2 * Ch_in_i multiply-add operations.
In some embodiments, for a given sample, the energy consumed to complete all convolution layer operations is the sum of the energies consumed by all convolutional layers.
In some embodiments, each sample is computed serially in both the AP and TM computation flows, so the energy of multiple samples is simply the sum of the energies of the individual samples.
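As a worked illustration of the decomposition E1 = E1_IO + E1_operation, the following Python sketch assembles the weight, input and output read/write terms and the multiply-add term for one convolutional layer. The parameter names mirror Table 1, but the helper itself is an assumption of this example rather than the patent's reference implementation; in particular, the weight repeat-import count rd_weight is passed in directly, since its closed form depends on the cache and array partitioning details.

```python
def conv_layer_energy(ch_in, ch_out, k, h_out, w_out, h_in, w_in,
                      alpha, beta, gamma, rd_weight,
                      e_dram=200, e_buf=6, e_reg=2, e_pe=2, e_op=1):
    """Energy of one CONV layer for one sample: E1 = E1_IO + E1_operation.

    A simplified aggregation of the terms above; rd_weight is the assumed
    number of repeated weight imports (1 if the cache holds the layer).
    """
    n_w = gamma * ch_out * k * k * ch_in              # weight volume, eq. (5)
    e_weight = n_w * e_dram + n_w * rd_weight * e_buf # eqs. (3)-(4)
    n_in = h_in * w_in * ch_in * alpha                # input volume
    reimports = ch_in * k * k * h_out * w_out         # register -> PE imports
    e_input = n_in * (e_dram + e_buf) + reimports * (e_reg + e_pe)  # eq. (8)
    n_out = beta * ch_out * h_out * w_out             # output volume
    e_output = n_out * (e_pe + e_buf)                 # PE transfer + write-back
    macs = ch_out * h_out * w_out * k * k * ch_in     # multiply-add count
    return e_weight + e_input + e_output + macs * e_op
```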
Energy consumption in the fully connected layer energy model is calculated as follows:
E2 = E2_IO + E2_operation  (18)
In the Thinker chip the fully connected layers are computed over multiple samples, layer by layer. The read/write energy of one batch of FC operations is:
E2_IO = E2_weightIO + E2_inputIO + E2_outputIO  (19)
(1) Calculation of E2_weightIO:
The FC weight read/write energy is divided into two parts:
E2_weightIO = E2_DRAM_buffer + E2_buffer_PE  (20)
which are obtained in the same way as for the convolutional layer, where BS is the batch size.
(2) Calculation of E2_inputIO:
Unlike a CONV layer, the input of an FC layer need not pass through the register file designed for the reuse of convolution inputs, so E2_inputIO consists of three parts:
E2_inputIO = E2_DRAM_buffer + E2_buffer_PE + E2_PE_tran  (24)
where e_PEinput_tran, the per-unit transmission energy, is for example 2 energy units.
(3) Calculation of E2_outputIO:
E2_outputIO = E2_PEout_tran + E2_PE_buffer + E2_buffer_DRAM  (29)
where:
E2_PE_buffer = FC_{i+1} * BS * β * e_PE_buffer  (31)
E2_buffer_DRAM = ReLU(FC_{i+1} * BS - S_FCdatabuffer) * β * e_buffer_DRAM  (32)
and e_PEoutput_tran, e_PE_buffer and e_buffer_DRAM, the per-unit transmission energies at the respective storage levels, are 2, 6 and 200 energy units.
When the network contains multiple FC layers, the total FC read/write energy of a batch is the sum of the read/write energies of all FC layers.
Calculation of the computation energy E2_operation:
During the computation there are FC_{i+1} * BS output points in total, and each output point requires FC_i multiply-add operations. With e_operation the energy of a unit-byte multiply-add operation, the energy of this part is:
E2_operation = FC_{i+1} * BS * FC_i * α * γ * e_operation  (33)
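A corresponding sketch for one batch of FC operations, under the same caveats as before (an illustrative helper with assumed names, the weight repeat count passed in):

```python
def fc_layer_energy(fc_in, fc_out, bs, alpha, beta, gamma,
                    s_databuffer, rd_weight=1,
                    e_dram=200, e_buf=6, e_pe=2, e_op=1):
    """Energy of one batch of FC operations: E2 = E2_IO + E2_operation."""
    relu = lambda x: max(x, 0)                   # ReLU as used in eq. (32)
    e_weight = fc_in * fc_out * gamma * rd_weight * (e_dram + e_buf)  # eq. (20)
    e_input = fc_in * bs * alpha * (e_dram + e_buf + e_pe)            # eq. (24)
    e_output = (fc_out * bs * beta * (e_pe + e_buf)                   # eq. (31)
                + relu(fc_out * bs - s_databuffer) * beta * e_dram)   # eq. (32)
    e_compute = fc_out * bs * fc_in * alpha * gamma * e_op            # eq. (33)
    return e_weight + e_input + e_output + e_compute
```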
Energy consumption in the RNN layer energy model is calculated as follows.
When the network contains multiple RNN layers, the total RNN computation energy of a batch is the sum of the per-layer computation energies.
Taking the most widespread sequence model among RNNs, the LSTM structure, the following RNN energy analysis is carried out for the LSTM.
In an RNN, the computation flow of one LSTM unit includes:
f_t = σ(W_xf * x_t + W_hf * h_{t-1} + b_f)
i_t = σ(W_xi * x_t + W_hi * h_{t-1} + b_i)
o_t = σ(W_xo * x_t + W_ho * h_{t-1} + b_o)
together with the gate activation g_t = tanh(W_xg * x_t + W_hg * h_{t-1} + b_g) and the state updates c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t and h_t = o_t ⊙ tanh(c_t) used below.
The Thinker chip computes RNN layers with two kinds of PEs, common PEs and super PEs. The part W_x_gate * x_t_gate + W_h_gate * h_{t-1,gate} + b_gate follows the principle of FC layer computation and is computed in the common PEs. When it is finished, the data is imported into the super PEs for the RNN-gating computation: the sigmoid or tanh functions are evaluated to obtain the gate vectors, and c_t and h_t are then obtained by multiply-add operations.
First consider the energy of one RNN layer over one batch. Since the RNN is split into an FC part and an RNN-gating part, the energy of one batch of RNN operations with Iteration iterations is the sum of these two parts accumulated over the iterations.
(1) Calculation of the RNN-FC energy:
This part is calculated on essentially the same principle as the FC layer energy; the difference is that the dimensions of the weight matrix, the input vector and the output vector must be adjusted. For the RNN, the FC computation takes the unified form:
FC_t = W_x_gate * x_t_gate + W_h_gate * h_{t-1,gate} + b_gate  (35)
where gate may take i, f, o or g, corresponding to the four gates of the LSTM. In a concrete implementation, W_x_gate and W_h_gate can be concatenated horizontally into a single W_gate, and x_t_gate and h_{t-1,gate} can be concatenated vertically into x_gate. W_gate therefore has dimension O_len_i * (I_len_i + O_len_i) and x_gate has dimension 1 * (I_len_i + O_len_i), which correspond to the FC layer parameters FC_i and FC_{i+1} as:
FC_i = I_len_i + O_len_i
FC_{i+1} = O_len_i
so that the analysis can be folded into the FC layer energy model. It is worth noting that for each LSTM unit the FC operation must be repeated four times, once per gate. Details are not repeated here.
(2) calculating of gating memory access energy consumption includes:
As described above, can be mainly divided into two kinds of calculating in gating calculating: (a) simple tanh/sigmoid letter Number calculates;(b) element calculates the multiplication of elementThereforeIt can further indicate that are as follows:
Calculating include:
Tanh/sigmoid is mainly used for calculating 4 gate functions and ct, that is, need to be O to vector lengthlen_i5 groups of data Operate by the tanh/sigmoid of element, the required sum operated is BS*5Olen_i.The calculating carries out in PE, number Buffer is imported from DRAM according to needs, then imports PE from buffer, is finally exported.It is arrived assuming that carrying out unit byte DRAM The energy consumption of buffer transmission is eDRAM_buffer, the energy consumption that unit byte is transmitted from buffer to PE is ebuffer_PE, unit byte from PE to buffer, the energy consumption from buffer to DRAM are respectively ebuffer_DRAMAnd ePE_buffer.Each data at most carry out once This operation, and interaction of the data between DRAM and buffer only can not accommodate all input point Shi Caifa in buffer size It is raw, therefore following formula can be summed up:
Calculating mainly include ctCalculating and htCalculate two parts.For htFor, it is only necessary to carry out once by The multiplication of element operates, therefore the reading of data pertains only to total 2Olen_iA input data, writing out for data are related to Olen_iA data. Calculating with tanh/sigmoid above is similarly:
For ctFor, also need to carry out the operation of one-accumulate in addition to multiplication.For an output data, need Multiplication operation twice is carried out, then two products are summed, therefore one of product needs to read into buffer and works as In, then import the register completion summation in PE.Similarly, the energy consumption of the part can be obtained are as follows:
The calculating of Gating calculating energy consumption
For gating, in fact it could happen that three kinds of operations be respectively: tanh/sigmoid, multiplication, addition.Assuming that three Kind operates corresponding unit byte operation energy consumption are as follows: etanh/sigmoid、emultiply、eaddValue is for example 1, can calculate three The number that kind operation needs to be implemented is respectively as follows: 5Olen_i、3Olen_i、Olen_i.For this purpose, can calculateAre as follows:
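The operation counts above translate directly into a small sketch (again an illustrative helper with assumed names; the per-operation energies default to the value 1 quoted in the text, and the batch factor BS is an assumption of this example):

```python
def gating_compute_energy(o_len, bs, e_tanh_sigmoid=1, e_multiply=1, e_add=1):
    """Gating compute energy from the operation counts in the text:
    5*O_len tanh/sigmoid evaluations (4 gates plus c_t), 3*O_len multiplies
    (one for h_t, two for c_t), and O_len additions (c_t accumulation)."""
    return bs * (5 * o_len * e_tanh_sigmoid
                 + 3 * o_len * e_multiply
                 + o_len * e_add)
```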
Energy consumption in the pooling layer energy model is calculated as follows.
For the pooling operation there is no data reuse between different layers or different samples, so the energy of a single layer and a single sample can be considered first. As before, the pooling energy is divided into read/write energy and computation energy, and the energy model of the pooling operation is established accordingly; if multiple layers or multiple samples must be computed, the results are simply summed:
E4 = E4_IO + E4_operation  (41)
Calculation of the pooling layer read/write energy E4_IO:
Thinker supports the max-pooling operation, whose role is to reduce the height and width of the output map while keeping the number of channels unchanged. The heights and widths of its input and output maps are related through the pooling size p_i, and the total number of pooling blocks is:
X = H_out_i * W_out_i * Ch_in_i  (42)
The data read/write energy of the pooling operation divides into input data and output data, and each type must complete the reads and writes among DRAM, the cache and the PE array. Since the data is imported without repetition, the energy model of this part is simpler: it is the sum of the read energy of the input data and the write-out energy of the output data.
The calculation of the computation energy involves the total number of input data. It is worth noting that this total is not equal to the size of the input feature map; it must be derived backwards from the size of the output map and the size of the kernel:
E4_operation = p_i * p_i * H_out_i * W_out_i * Ch_out_i * e_operation  (44)
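A minimal sketch of this pooling energy model, under the same assumptions as the previous helpers (the input volume is back-derived from the output size and the pooling size p, as the text prescribes):

```python
def pool_layer_energy(p, h_out, w_out, ch_in, ch_out, alpha, beta,
                      e_dram=200, e_buf=6, e_op=1):
    """Max-pooling energy: simple read/write (no data reuse) plus eq. (44)."""
    n_in = (p * h_out) * (p * w_out) * ch_in * alpha   # input, back-derived
    n_out = h_out * w_out * ch_out * beta              # output volume
    e_io = n_in * (e_dram + e_buf) + n_out * (e_buf + e_dram)
    e_compute = p * p * h_out * w_out * ch_out * e_op  # eq. (44)
    return e_io + e_compute
```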
As an optional implementation of the invention, the neural network time model includes time overheads, the time overheads including a convolutional layer time overhead and a fully connected layer time overhead; the time overhead is calculated as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the computation time.
As an optional implementation of the invention, the neural network time model includes a convolutional layer time model and a fully connected layer time model.
For the Thinker chip, the time overhead depends mainly on the computation-heavy convolution operations and the data-heavy fully connected operations (including the fully connected operations inside FC layers and RNNs). Because of the "blocking" effect of time, the less time-consuming RNN-gating and pooling operations need no time model. The time is therefore modeled and analyzed below for the convolutional layer and the fully connected layer.
The convolutional layer time model includes the following.
Calculation of the read/write time:
Because of the way Thinker computes, convolutional layers are computed one at a time for a single layer and a single sample, so the time required for one convolutional layer within one sample is analyzed first.
Data reads and writes pass through several levels: DRAM, the cache and the PEs. Once the data has been imported into the on-chip cache, further transmission is very fast and this part can be ignored, so T_IO reduces to the time consumed by the interaction between DRAM and the cache.
In the Thinker chip, the input/output data and the weights are imported from DRAM into two different caches; the bandwidths of these two parts differ and can be adapted to the architecture of the neural network. The input and output data share the bandwidth BW_dataconv, and the weight data has the bandwidth BW_weightconv. In one layer, the total volume of input and output data is H_in_i * W_in_i * Ch_in_i * α + H_out_i * W_out_i * Ch_out_i * β, and the weight volume is N_w as above. The read/write times of the input/output data and of the weight data can therefore be calculated separately, and T_IO is the larger of the two.
Calculation of the computation time T_operation:
For one convolutional layer of one sample, the computation requires the PE array to perform several rounds of operations. In one round the PE array computes several output points in parallel, and after several rounds all output points of the layer have been computed. If completing the layer requires round_convlayer rounds and each round takes t_roundconv, the computation time can be expressed as:
T_operation = round_convlayer * t_roundconv  (46)
Since every row of Thinker reuses the input data, H_out_i * W_out_i is the total number of rows that must be input; each round computes a fixed number of rows, which determines how many rounds are needed before all inputs have been processed, and for one group of input data the weights likewise need to be input several times.
Each output point involves K_i^2 * Ch_in_i multiply-add operations, so the time consumed by one round is the time required to perform these multiply-adds in sequence; the time to compute one convolutional layer within one sample then follows.
For the operation time of all convolutional layers in one sample, the per-layer times are simply summed; for the convolutional layer operation time of all samples in a batch, the single-sample total is simply multiplied by the batch size BS.
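The max-of-two structure of the layer time is easy to state as code; the sketch below leaves the round count and per-round time as inputs, since their closed forms depend on the array geometry (names assumed for illustration):

```python
def conv_layer_time(n_data_bytes, n_weight_bytes, bw_data, bw_weight,
                    rounds, t_round):
    """Time of one CONV layer: T_z = max(T_IO, T_operation).

    T_IO is the larger of the input/output and weight transfer times, and
    T_operation = rounds * t_round as in eq. (46)."""
    t_io = max(n_data_bytes / bw_data, n_weight_bytes / bw_weight)
    t_op = rounds * t_round
    return max(t_io, t_op)   # the "blocking" effect: IO and compute overlap
```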
The fully connected layer time model includes the following.
Since the different samples of a batch are computed in parallel in FC layers, the time of one layer of FC operations over one batch of data is analyzed directly.
The calculation of the read/write time again only considers the interaction of the data between DRAM and the cache. Let BW_data_FC be the bandwidth allocated to the input/output data and BW_weight_FC the bandwidth allocated to the weights; the total number of input/output data is (FC_i * α * RD_DRAM_in_FC + FC_{i+1} * β) * BS and the total number of weight inputs is FC_i * FC_{i+1} * γ * RD_DRAM_weight_FC, and the read/write time is the larger of the two resulting transfer times.
The calculation of the computation time involves round_FClayer and t_round_FC and proceeds as for the convolutional layer.
It can be understood that for the operation time of all FC layers in a batch, the per-layer times are simply summed.
The hardware parameters of the models are also variable, so the operation time and energy consumption of the same network under different hardware parameters can be explored during algorithm testing. By adjusting the various hardware parameters (such as bandwidths, cache sizes, PE array size and channel counts), an optimal solution under a constrained hardware area may be obtained, minimizing time and energy consumption from the hardware-design perspective, while the layer-by-layer prediction of time and energy identifies the modeling parameters that dominate the time and energy overheads; this can therefore provide real help to hardware design.
As an optional implementation of the invention, performing dual-objective optimization on the neural network energy consumption model and the neural network time model includes:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy consumption model using the AP computation flow; and
selecting, among the solutions whose cache partitioning satisfies the time requirement, the array partitioning method with the smallest energy consumption.
The Thinker chip supports two computation flows, AP and TM. AP stands for Array-Partitioning; TM stands for Time-Multiplexing. They differ in the order in which the same network inference is computed. In the TM flow the inference is executed layer by layer, and at any given time the entire PE array computes the same layer. In the AP flow the division of the array is adjusted, so the PE array may compute several layers at the same time. Because the AP flow can compute layers of several types at once, it balances the characteristics of convolution operations (high computation, small data volume) and fully connected operations (low computation, large data volume) and can thus shorten the time. The TM flow, on the other hand, reuses data more often, so data is retransmitted fewer times and the energy consumption is lower.
As an optional implementation of the invention, selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement includes:
setting a task time Tmax;
calculating the time T_TM required when the neural network time model is cache-partitioned using the TM computation flow;
calculating the energy E_AP required when the neural network energy consumption model is array-partitioned using the AP computation flow;
selecting the smallest required energy E_AP0 and the time T_AP0 required to compute at that energy;
comparing Tmax with T_AP0 and judging from the comparison whether the set task time meets the requirement; and
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison, the dual-objective optimization result for the neural network energy consumption model and the neural network time model.
As an optional implementation of the invention, judging from the comparison whether the set task time meets the requirement includes:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time needs to be reset.
As an optional implementation of the invention, outputting the dual-objective optimization result for the neural network energy consumption model and the neural network time model according to the comparison includes:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy consumption model again;
calculating the energy E_AP' required when the neural network energy consumption model is array-partitioned using the AP computation flow, and taking the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
otherwise, taking the modeling parameters and the cache partitioning method corresponding to the TM computation flow as the dual-objective optimization result.
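Putting the branching logic above into a single sketch (a minimal illustration; the candidate search and the retuning routine are assumed helpers, not part of the patent text):

```python
def dual_objective_select(t_max, t_tm, tm_config, ap_candidates, retune_ap):
    """Sketch of the selection flow described above. ap_candidates is a list
    of (energy, time, config) triples from AP-flow array partitioning, and
    retune_ap re-runs that search with adjusted modeling parameters; all of
    these names are assumptions made for this illustration."""
    e_ap0, t_ap0, _ = min(ap_candidates, key=lambda c: c[0])
    if t_max <= t_ap0:
        # even the lowest-energy AP solution misses the deadline
        raise ValueError("task cannot be completed in the specified time; reset Tmax")
    if t_max < t_tm:
        # TM flow is too slow: adjust parameters, re-partition with AP, and
        # keep the modeling parameters of the new minimum-energy solution
        e_ap, _, config = min(retune_ap(t_max), key=lambda c: c[0])
        return ("AP", config, e_ap)
    # TM flow meets the deadline: its parameters and cache partitioning give
    # the lower-energy result, since the TM flow reuses data more often
    return ("TM", tm_config, None)
```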
In actual execution, in order to execute different layers simultaneously, not only the number of PEs allocated to each layer but also the distribution of the cache must be adjusted. Since the total cache size and array size of the Thinker chip are fixed, and only longitudinal cuts are a reasonable partitioning manner, the parameters to be searched in the loop include the number of array columns allocated to CONV, the data cache size S_CONVdatabuffer and the weight cache size S_convweightbuffer.
It is known from the modeling process that the time overhead of network computation has a "hiding effect": the time of a layer splits into a computation time and a data transmission time, and the total time of the layer is the maximum of the two, while the data transmission time is the larger of the data transmission time and the weight transmission time. To reduce the time overhead further, the dominant factor of each layer's time consumption must be studied. Taking the typical LRCN neural network as an example, for its FC operations the weight read/write time overhead dominates in both the TM and AP0 computation flows, whereas for its CONV operations the computation time basically dominates; the fully connected layers have little computation but a large data volume.
In the TM computation flow the data read/write time, the weight read/write time and the computation time differ greatly. The computation time of the convolutional layers far exceeds the data and weight read/write times, so during the long periods in which the PE array is busy computing, the data transmission sits idle; conversely, during FC computation the PE array sits idle because the weight reads take too long. In the layer-by-layer time overheads of AP0, the data read/write time, the weight read/write time and the computation time are much better balanced. The total time can therefore be reduced effectively by substantially increasing the share of resources given to the convolution operations, whose original share was especially small. Since the FC layer time overhead is dominated by weight reads, shortening the FC time requires increasing the cache fraction allocated to FC weights, or reducing the total time by enlarging the total cache or increasing the data transfer bandwidth.
The hardware parameters of the neural network energy consumption model and the neural network time model are also variable, so exploring the operation time and energy consumption of the same network under different hardware parameters during algorithm testing can assist hardware design. By adjusting the various hardware parameters (such as bandwidths, cache sizes, PE array size and channel counts), an optimal solution under a constrained hardware area may be obtained, minimizing time and energy consumption from the hardware-design perspective.
Therefore, performing cache partitioning on the neural network time model using the TM computation flow, performing array partitioning on the neural network energy consumption model using the AP computation flow, and selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement can assist future hardware design, and different optimal solutions can be achieved under different task-time conditions.
In this embodiment, the time and energy consumption of the neural network are modeled from the perspective of the hardware computation flow; time and energy are predicted layer by layer while the modeling parameters that dominate the time and energy overheads are analyzed, and dual-objective optimization of time and energy is performed on the neural network by improving the modeling parameters, the array partitioning method and the cache partitioning method, thereby improving the neural network model. Further, thanks to the layer-by-layer modeling, different optimal solutions are obtained under different task-time conditions.
It can be understood that identical or similar parts of the above embodiments may be referred to mutually, and content not described in detail in one embodiment may refer to the identical or similar content in other embodiments.
It should be noted that in the description of the present application the terms "first", "second" and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. In addition, in the description of the present application, unless otherwise indicated, "multiple" means at least two.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be appreciated that the parts of the present application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium, and the program, when executed, includes one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present application, and those skilled in the art may change, modify, replace and vary the above embodiments within the scope of the present application.
It should be noted that the present invention is not limited to the above-described preferred embodiments; those skilled in the art can derive products in various other forms in light of the present invention, but any variation in shape or structure whose technical solution is identical or similar to that of the present application falls within the protection scope of the present invention.

Claims (10)

1. A neural network optimization method, characterized by comprising:
presetting modeling parameters, the modeling parameters including network parameters and hardware parameters;
constructing a neural network energy consumption model based on the modeling parameters;
constructing a neural network time model based on the modeling parameters; and
performing dual-objective optimization on the neural network energy consumption model and the neural network time model.
2. The neural network optimization method according to claim 1, characterized in that the energy consumption is calculated as E = V * T * e, where V is the volume of data to be read/written or computed, T is the number of repeated reads/writes or computations, and e is the unit energy consumption.
3. The neural network optimization method according to claim 2, characterized in that the unit energy consumption includes read/write energy consumption and computation energy consumption.
4. The neural network optimization method according to claim 1, characterized in that constructing the neural network energy consumption model includes modeling the neural network energy consumption layer by layer to obtain a convolutional layer energy model, a fully connected layer energy model, an RNN layer energy model and a pooling layer energy model.
5. The neural network optimization method according to claim 1, characterized in that the neural network time model includes time overheads, the time overheads including a convolutional layer time overhead and a fully connected layer time overhead.
6. The neural network optimization method according to claim 5, characterized in that the time overhead is calculated as T_z = max(T_IO, T_operation), where T_IO is the read/write time and T_operation is the computation time.
7. The neural network optimization method according to claim 1, characterized in that performing dual-objective optimization on the neural network energy consumption model and the neural network time model comprises:
performing cache partitioning on the neural network time model using the TM computation flow;
performing array partitioning on the neural network energy consumption model using the AP computation flow; and
selecting, among the solutions whose cache partitioning satisfies the time requirement, the array partitioning method with the smallest energy consumption.
8. The neural network optimization method according to claim 7, characterized in that selecting the array partitioning method with the smallest energy consumption among the solutions whose cache partitioning satisfies the time requirement comprises:
setting a task time Tmax;
calculating the time T_TM required when the neural network time model is cache-partitioned using the TM computation flow;
calculating the energy E_AP required when the neural network energy consumption model is array-partitioned using the AP computation flow;
selecting the smallest required energy E_AP0 and the time T_AP0 required to compute at that energy;
comparing Tmax with T_AP0 and judging from the comparison whether the set task time meets the requirement; and
if the set task time meets the requirement, comparing Tmax with T_TM and outputting, according to the comparison, the dual-objective optimization result for the neural network energy consumption model and the neural network time model.
9. The neural network optimization method according to claim 8, characterized in that judging from the comparison whether the set task time meets the requirement comprises:
if Tmax > T_AP0, determining that the set task time meets the requirement;
otherwise, determining that the task cannot be completed within the specified time and that the task time needs to be reset.
10. The neural network optimization method according to claim 8, characterized in that outputting the dual-objective optimization result for the neural network energy consumption model and the neural network time model according to the comparison comprises:
if Tmax < T_TM, adjusting the modeling parameters and performing array partitioning on the neural network energy consumption model again;
calculating the energy E_AP' required when the neural network energy consumption model is array-partitioned using the AP computation flow, and taking the modeling parameters and array partitioning method corresponding to the minimum energy E_AP0' as the dual-objective optimization result;
otherwise, taking the modeling parameters and the cache partitioning method corresponding to the TM computation flow as the dual-objective optimization result.
CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B and later withdrawn after issue

Priority Applications (1)

CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B

Applications Claiming Priority (1)

CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B

Publications (2)

CN109472361A (publication date 2019-03-15)
CN109472361B (publication date 2020-08-28)

Family

ID=65671820

Family Applications (1)

CN201811344189.0A (priority and filing date 2018-11-13): Neural network optimization method, granted as CN109472361B, withdrawn after issue

Country Status (1)

Country Link
CN (1) CN109472361B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102773981A (en) * 2012-07-16 2012-11-14 南京航空航天大学 Implementation method of energy-saving and optimizing system of injection molding machine
CN105302973A (en) * 2015-11-06 2016-02-03 重庆科技学院 MOEA/D algorithm based aluminum electrolysis production optimization method
US20180046894A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Method for optimizing an artificial neural network (ann)
CN106427589A (en) * 2016-10-17 2017-02-22 江苏大学 Electric car driving range estimation method based on prediction of working condition and fuzzy energy consumption

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738318A (en) * 2019-09-11 2020-01-31 北京百度网讯科技有限公司 Method, system and device for evaluating network structure running time and generating evaluation model
CN110929860A (en) * 2019-11-07 2020-03-27 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
WO2021088688A1 (en) * 2019-11-07 2021-05-14 深圳云天励飞技术股份有限公司 Convolution acceleration operation method and apparatus, storage medium and terminal device
US11748595B2 (en) 2019-11-07 2023-09-05 Shenzhen Intellifusion Technologies Co., Ltd. Convolution acceleration operation method and apparatus, storage medium and terminal device
CN111753950A (en) * 2020-01-19 2020-10-09 杭州海康威视数字技术股份有限公司 Method, device and equipment for determining forward time consumption
CN111753950B (en) * 2020-01-19 2024-02-27 杭州海康威视数字技术股份有限公司 Forward time consumption determination method, device and equipment
CN112085195A (en) * 2020-09-04 2020-12-15 西北工业大学 X-ADMM-based deep learning model environment self-adaption method
CN112468533A (en) * 2020-10-20 2021-03-09 安徽网萌科技发展股份有限公司 Agricultural product planting-oriented edge learning model online segmentation method and system
CN112468533B (en) * 2020-10-20 2023-01-10 安徽网萌科技发展股份有限公司 Agricultural product planting-oriented edge learning model online segmentation method and system
CN113377546A (en) * 2021-07-12 2021-09-10 中科弘云科技(北京)有限公司 Communication avoidance method, apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN109472361B (en) 2020-08-28


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
AV01: Patent right actively abandoned (granted publication date: 20200828; effective date of abandoning: 20210125)