CN110515739A - Deep learning neural network model load calculating method, device, equipment and medium

Info

Publication number: CN110515739A (granted as CN110515739B)
Application number: CN201911008660.3A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: calculating, resource, network model, subtask, resource allocation
Inventor: 黎兴民
Applicant and current assignee: Shanghai Suiyuan Intelligent Technology Co Ltd
Legal status: Granted, active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06N 3/02: Neural networks
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06F 2209/5017: Indexing scheme relating to resource allocation; task decomposition

Abstract

Embodiments of the present invention disclose a deep learning neural network model load calculating method, device, equipment and medium. The method includes: parsing a pre-constructed network model and decomposing the calculation flow of the network model into at least two calculating tasks; dividing each calculating task to form at least one calculating subtask; allocating resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the allocation data set of the calculating task under each resource allocation policy; aggregating the allocation data sets of each calculating task under each resource allocation policy to form the load matrix of the network model; and, according to the performance parameter set of a chip to be assessed, calculating the running time of each calculating subtask to determine a performance matrix. Embodiments of the present invention can improve the speed of performance simulation for a chip that runs a deep learning network model, so that architectural design defects of the chip can be found quickly.

Description

Deep learning neural network model load calculating method, device, equipment and medium
Technical field
Embodiments of the present invention relate to the field of data processing, and in particular to a deep learning neural network model load calculating method, device, equipment and medium.
Background art
The rapid development of the AI industry places ever higher demands on the computing capability of computers, and major semiconductor manufacturers are actively developing, and eager to release, special-purpose chips that accelerate the training and inference of deep learning.
The development and manufacture of a chip is a rather long process. In general, the verification of the rationality of the chip architecture design and the assessment of its computing performance can only be carried out after small-batch tape-out samples have been produced, which greatly lengthens the iteration cycle of product development and may even delay the time to market (TTM) indefinitely, which is unacceptable for any semiconductor manufacturer.
The existing solution is to simulate the chip architecture on a dedicated server, with the service provider supplying a complete matched hardware and software solution on which the performance verification of the chip is then carried out. This scheme, however, is expensive, and the simulation software runs slowly; a single simple test sample usually takes hours to run. Furthermore, when verifying a chip architecture that supports accelerated parallel computing, different ways of cutting the calculating tasks and different strategies for scheduling the on-chip hardware resources lead to different chip loads and therefore different performance, and trying out and exploring these strategies helps to find architectural defects at the early stage of chip design. Since there are usually thousands of candidate strategies, the exploration process demands a high iteration speed, which a complicated and heavyweight dedicated performance simulation service cannot deliver.
Summary of the invention
Embodiments of the present invention provide a deep learning neural network model load calculating method, device, equipment and medium, which can improve the speed of performance simulation for a chip that runs a deep learning network model.
In a first aspect, an embodiment of the present invention provides a deep learning neural network model load calculating method, comprising:
parsing a pre-constructed network model, and decomposing the calculation flow of the network model into at least two calculating tasks, wherein dependence relations exist among the at least two calculating tasks;
dividing each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;
allocating resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain an allocation data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources;
aggregating the allocation data sets of each calculating task under each resource allocation policy to form a load matrix of the network model;
according to a performance parameter set of a chip to be assessed and the load matrix, calculating the running time, under each resource allocation policy, of each calculating subtask into which the network model is decomposed, and determining a performance matrix so as to evaluate the performance of the chip running the network model.
In a second aspect, an embodiment of the present invention provides a deep learning neural network model load computing device, comprising:
a calculating task parsing module, configured to parse a pre-constructed network model and decompose the calculation flow of the network model into at least two calculating tasks, wherein dependence relations exist among the at least two calculating tasks;
a calculating task division module, configured to divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;
a resource allocation module, configured to allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain an allocation data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources;
a load matrix generation module, configured to aggregate the allocation data sets of each calculating task under each resource allocation policy to form a load matrix of the network model;
a performance matrix computing module, configured to calculate, according to a performance parameter set of a chip to be assessed and the load matrix, the running time under each resource allocation policy of each calculating subtask into which the network model is decomposed, and to determine a performance matrix so as to evaluate the performance of the chip running the network model.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements the deep learning neural network model load calculating method of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the deep learning neural network model load calculating method of any embodiment of the present invention.
In embodiments of the present invention, a deep learning neural network model is parsed automatically to form at least two calculating tasks; each calculating task is further divided, according to preconfigured resource allocation policies, into calculating subtasks; resources are allocated to the calculating subtasks under the different resource allocation policies to obtain the load matrix under those policies; and, based on the performance parameter set of the chip to be assessed, the running time of each calculating subtask under each policy is calculated, thereby determining the performance matrix of the chip running the network model, which is used to evaluate that performance. This solves the prior-art problems of the high economic cost and low efficiency of running a network model on a chip emulator, and can improve the speed of performance simulation for a chip that runs a deep learning network model.
Brief description of the drawings
Fig. 1 is a flowchart of a deep learning neural network model load calculating method in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a deep learning neural network model load calculating method in Embodiment 2 of the present invention;
Fig. 3 is a structural schematic diagram of a deep learning neural network model load computing device in Embodiment 3 of the present invention;
Fig. 4 is a structural schematic diagram of a computer device in Embodiment 4 of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a deep learning neural network model load calculating method in Embodiment 1 of the present invention. This embodiment is applicable to the case of simulating the process of a chip running a network model. The method can be executed by the deep learning neural network model load computing device provided by an embodiment of the present invention; the device can be implemented in software and/or hardware and can generally be integrated into a computer device, for example a terminal device or a server. As shown in Fig. 1, the method of this embodiment specifically includes:
S110: parse the pre-constructed network model, and decompose the calculation flow of the network model into at least two calculating tasks; wherein dependence relations exist among the at least two calculating tasks.
The network model refers to a deep learning neural network model (Deep Learning Neural Networks).
The calculation flow of the network model indicates the multiple consecutive calculation steps that the network model needs to execute at runtime; that is, the calculation flow can be converted into multiple consecutive calculation steps.
A calculating task represents one or several of these calculation steps, and all calculating tasks are distinct from one another.
A dependence relation between two calculating tasks means that the calculation result (output) of one calculating task can serve as the input of the other. Multiple calculating tasks with dependence relations, combined in order, constitute the calculation flow of the network model. In other words, the calculation flow of the network model is a sequence of calculating tasks, and the order of the tasks in the sequence is their execution order.
In fact, a neural network model comprises multiple layers of data operations, each layer executing a different data processing operation. Exemplarily, the data processing operations may include padding, reshape, convolution and pooling.
The network model can be constructed through a predetermined programming interface: the user inputs the data associated with the network model through this interface to establish the neural network model. Exemplarily, the established network model is Net = (L_0, L_1, ..., L_(S-1)), where L_i denotes one layer of the neural network.
It can be understood that once the user passes the network model to the programming interface, the structure of the network model can be obtained through that interface; that is, the structure of every layer in the network model is determined and remains fixed during subsequent processing. In fact, the calculation flow of the network model consists of the data processing done by each layer, so the data processing done by one layer can serve as one calculating task of the network model. Continuing the example above, the S-layer network model structure yields a sequence of S consecutive calculating tasks T_0 -> T_1 -> ... -> T_(S-1), where the input of calculating task T_i is the output of T_(i-1). The arrow represents the dependence relation between calculating tasks, i.e. the output of the previous calculating task is the input of the next one, and the S calculating tasks execute in sequence to guarantee the correctness of the calculation result.
In fact, calculating tasks are divided from the perspective of the running timing of the network model; that is, a calculating task is a division of the calculation flow of the network model in time.
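To make S110 concrete, the following is a minimal Python sketch of mapping a purely sequential layered model to an ordered chain of mutually dependent calculating tasks. The names (Layer, CalculatingTask, decompose) are illustrative assumptions, not the patent's code:

```python
# Minimal sketch of S110: each layer becomes one calculating task,
# and task i depends on task i-1 (its input is the previous output).
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str          # e.g. "conv1"
    op: str            # e.g. "Convolution", "Pooling"

@dataclass
class CalculatingTask:
    step: int                                       # position in the calculation flow
    op: str                                         # data processing operation of this layer
    depends_on: list = field(default_factory=list)  # tasks whose output is our input

def decompose(model: list) -> list:
    """Map each layer to one calculating task; tasks execute in sequence."""
    tasks = []
    for i, layer in enumerate(model):
        deps = [tasks[i - 1]] if i > 0 else []
        tasks.append(CalculatingTask(step=i, op=layer.op, depends_on=deps))
    return tasks

net = [Layer("pad1", "Padding"), Layer("conv1", "Convolution"), Layer("pool1", "Pooling")]
tasks = decompose(net)
print(tasks[2].op, "depends on", tasks[2].depends_on[0].op)  # Pooling depends on Convolution
```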
S120: divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask.
A resource allocation policy is usually used to allocate the resources needed to execute a calculating task, and refers to a way of allocating resources, where the resources specifically include computing resources and storage resources. Computing resources are used to execute calculating tasks; storage resources are used to store the data associated with executing calculating tasks. Calculating subtasks compose a calculating task: each is a partial calculation within the task.
In fact, a calculating task can be divided further, for example subdivided into multiple calculating subtasks. The calculating subtasks are all distinct and mutually independent, and together they compose the complete calculating task. One division method is to split the calculation amount of the calculating task evenly into n calculating subtasks, where n is greater than or equal to 1 and can be set as needed; the embodiment of the present invention imposes no particular limit here. Exemplarily, if the calculating task is to perform convolution on 10 feature maps, it can be divided into 10 calculating subtasks, each performing convolution on 1 feature map; or into 5 calculating subtasks, each performing convolution on 2 feature maps, with each subtask convolving different feature maps.
Continuing the example above, for a calculating task T_step, the relationship between the calculating task and its calculating subtasks is T_step = {P_step^0, P_step^1, ..., P_step^(Q-1)}, where P_step^q is a calculating subtask. The shared subscript step indicates that the calculating subtasks jointly belong to the same calculating task; the different superscripts distinguish the Q items split out of T_step; and the curly braces indicate that no dependence relation exists among the calculating subtasks, which exist in parallel.
The way each calculating task is divided can be determined according to the resource allocation policy; that is, the policy determines the number of calculating subtasks into which each calculating task can be divided, i.e. the value of n.
Specifically, the computing units in the computing resources can serve as the basis for dividing calculating subtasks: the computing resources include multiple computing units, the number of computing units is taken as n, and the calculating task is divided into n calculating subtasks, so that one computing unit executes one calculating subtask.
Alternatively, the space size of the storage resources can serve as the basis for dividing calculating subtasks, or the division can be determined jointly from the computing resources and the storage resources. This can be set as needed, and the embodiment of the present invention imposes no specific limit here.
In summary, a calculating subtask is really a unit of work into which a calculating task is divided for parallel execution at the same time; that is, calculating subtasks divide a calculating task spatially. During the execution time of the calculating task, the multiple calculating subtasks into which it is divided execute in parallel.
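As a concrete illustration of this spatial division, the following Python sketch splits the feature-map convolution example above into independent calculating subtasks; the split_task helper is a hypothetical name, not part of the patent:

```python
# Minimal sketch of S120: divide one calculating task evenly into at most n
# parallel calculating subtasks with no dependence among them.
def split_task(feature_maps: list, n: int) -> list:
    """Split the work items of a task into independent subtasks of (nearly) equal size."""
    assert 1 <= n <= len(feature_maps)
    chunk = (len(feature_maps) + n - 1) // n          # ceiling division
    return [feature_maps[i:i + chunk] for i in range(0, len(feature_maps), chunk)]

maps = [f"fmap{i}" for i in range(10)]
print(split_task(maps, 5))   # 5 subtasks, 2 feature maps each, executed in parallel
```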
S130: allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the allocation data set of the calculating task under each resource allocation policy; the resources include computing resources and storage resources.
The allocation data set describes the resource allocation of each calculating subtask into which the calculating task is divided: it records the mapping relations between the calculating subtasks and the resources.
Allocating resources to a calculating subtask really means allocating computing resources and storage resources to it, i.e. allocating processors and storage space to the calculating subtask.
In general, different resource allocation policies allocate resources differently: the computing resources and/or storage resources are quantized and distributed to the calculating subtasks in different quantities.
Specifically, the computing resources can be divided evenly into N computing units; that is, the computing resource set is C = {c_0, c_1, ..., c_(N-1)}, where c_i is one computing unit. It should be noted that the computing resources typically refer to the processors on the chip, and a computing unit is an integer number of processors: one computing unit may include one processor, two processors, or even more, and the embodiment of the present invention imposes no specific limit here. The number of processors of the chip to be assessed is determined and taken as the maximum number of computing units. That is, the number N of computing units is an integer, and N is less than or equal to the number of processors of the chip to be assessed.
Likewise, the storage resources are divided evenly into M equal portions, giving the storage resource set D = {d_0, d_1, ..., d_(M-1)}, where d_j is one storage unit. The storage resources are the storage space allocated on the chip for running the network model, and this space can be divided evenly. It should be noted that the storage space corresponding to each storage unit must be at least large enough to hold all the data associated with executing one calculating subtask.
In fact, a resource allocation policy specifies the number of computing units and the number of storage units that each calculating subtask is allocated. It should be noted that every resource allocation policy must satisfy the constraint sum_q |C_q| = N and sum_q |D_q| = M, where C_q and D_q are the computing units and storage units allocated to subtask P_step^q; that is, the calculating subtasks into which a calculating task is divided exactly partition the whole computing resource C and storage resource D, no more and no less, with no overlap between one another.
Under one resource allocation policy, each calculating subtask into which the calculating task is divided is allocated at least one computing unit and at least one storage unit. Correspondingly, under different resource allocation policies the same calculating subtask is allocated different numbers of computing units and different numbers of storage units. For each resource allocation policy, the numbers of computing units and storage units allocated to each calculating subtask are recorded as one sequence; collecting these statistics across the different policies yields a set, which is the allocation data set of the calculating task under each resource allocation policy.
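The following minimal Python sketch, under assumed names (allocate, the c_i/d_j unit labels), illustrates how one resource allocation policy assigns each calculating subtask a disjoint share of the N computing units and M storage units:

```python
# Minimal sketch of S130: one policy gives subtask q a share (n_q, m_q) of the
# computing and storage units; the shares exactly partition C and D.
def allocate(policy: list, N: int, M: int) -> list:
    """policy[q] = (n_q, m_q). No surplus, no shortfall, no overlap."""
    assert sum(n for n, _ in policy) == N and sum(m for _, m in policy) == M
    alloc, c0, d0 = [], 0, 0
    for q, (n_q, m_q) in enumerate(policy):
        alloc.append({"subtask": q,
                      "compute": [f"c{i}" for i in range(c0, c0 + n_q)],
                      "storage": [f"d{j}" for j in range(d0, d0 + m_q)]})
        c0, d0 = c0 + n_q, d0 + m_q
    return alloc

# e.g. Q = 2 subtasks sharing N = 4 computing units and M = 2 storage units
print(allocate([(2, 1), (2, 1)], N=4, M=2))
```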
S140: aggregate the allocation data set of each calculating task under each resource allocation policy to form the load matrix of the network model.
The load matrix describes the resource allocation of the network model under the different resource allocation policies. It records the mapping relations between each calculation stage of the network model at runtime and the resources allocated to it.
Following the method above, the allocation data set of each calculating task under each resource allocation policy is obtained, and together these sets form the load matrix of the network model.
S150: according to the performance parameter set of the chip to be assessed and the load matrix, calculate the running time, under each resource allocation policy, of each calculating subtask obtained by decomposing the network model, and determine the performance matrix, so as to evaluate the performance of the chip running the network model.
The performance parameter set describes the performance of the chip to be assessed and records the chip's performance parameters; exemplarily, the performance parameter set is a performance dictionary (PD). The running time describes the duration consumed by executing a calculating subtask. The performance matrix is used to evaluate the running performance of the chip to be assessed. From the performance parameters of the chip and the computing units and storage units allocated to each calculating task under the different resource allocation policies, the running time of each calculating subtask is calculated, forming a performance matrix corresponding to the load matrix. The running time, under the different resource allocation policies, of each calculating subtask into which the network model is decomposed, and hence the running time of the network model under each policy, can thus be determined, which determines the performance of the chip running the network model. It can be understood that the shorter the running time of the network model, the higher the performance of the chip running it.
Optionally, the calculating, according to the performance parameter set of the chip to be assessed and the load matrix, of the running time under each resource allocation policy of each calculating subtask obtained by decomposing the network model, and the determining of the performance matrix, comprise: according to the performance parameter set of the chip to be assessed, calculating the input data transport time, the input data processing time and the result data transport time of each calculating subtask in the load matrix under each resource allocation policy; taking the sum of a calculating subtask's input data transport time, input data processing time and result data transport time under a resource allocation policy as the running time of that calculating subtask under that policy; and forming the performance matrix of the network model from the running times of each calculating task under each resource allocation policy.
Executing a calculating subtask to completion usually involves three processes: reading the input data from the storage space, performing data processing on the input data, and writing the result data obtained by the data processing back into the storage space. Correspondingly, the running time of a calculating subtask consists of the input data transport time, the input data processing time and the result data transport time.
The input data transport time describes the duration for a calculating subtask to obtain its input data, i.e. the duration of carrying the input data from the allocated storage resource to the allocated computing resource. It can be calculated by the formula t_input = data_in_size / input_bandwidth, where data_in_size is the size of the subtask's input data, measured in bytes (Byte), and input_bandwidth is the data transfer bandwidth for carrying data from the storage resource d to the computing resource c, in Byte/s, which can be obtained from the performance parameter set.
The input data processing time describes the duration consumed by a calculating subtask in processing its input data, i.e. the duration for the computing resource c to process the input data. It can be calculated by the formula t_process = data_in_size x cost_per_byte, where cost_per_byte is the time consumed in processing each byte of data; it is determined by the characteristics of the calculating task T_step and can likewise be obtained from the performance parameter set.
The result data transport time describes the duration for a calculating subtask to output its result data, i.e. the duration of carrying the result data from the allocated computing resource back to the allocated storage resource. It can be calculated by the formula t_output = data_out_size / output_bandwidth, where data_out_size is the size of the result data obtained by the subtask, measured in bytes (Byte), and output_bandwidth is the data transfer bandwidth for carrying the calculation result from the computing resource c to the storage resource d, in Byte/s, which can be obtained from the performance parameter set.
The sum of a calculating subtask's input data transport time, input data processing time and result data transport time, t_input + t_process + t_output, is taken as the running time of that calculating subtask under the resource allocation policy.
The running times of the different calculating subtasks under the different resource allocation policies are collected, and the running time of the slowest calculating subtask is taken as the running time of the calculating task. It can be understood that the calculating subtasks into which a calculating task is decomposed execute in parallel, so the running time of the calculating task equals the longest running time among its calculating subtasks. It is determined by the formula t_step = max(t_0, t_1, ..., t_(Q-1)), where t_q is the running time of calculating subtask P_step^q and t_step is the running time of the calculating task of step step.
By calculating, for each calculating subtask under each resource allocation policy, the input data transport time, the input data processing time and the result data transport time, the running time of each calculating subtask under each resource allocation policy is determined accurately, and the time consumed in running the whole network model can thus be counted accurately.
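A minimal Python sketch of this running-time model follows; the performance-dictionary keys (input_bandwidth, output_bandwidth, cost_per_byte) are assumed names standing in for entries of the performance parameter set PD:

```python
# Minimal sketch of S150: a subtask's running time is
# t_input + t_process + t_output, and a task's running time is the
# maximum over its parallel subtasks.
def subtask_time(data_in: int, data_out: int, pd: dict) -> float:
    """data_in / data_out in bytes; pd is the chip's performance parameter set."""
    t_input = data_in / pd["input_bandwidth"]     # Byte / (Byte/s) -> seconds
    t_process = data_in * pd["cost_per_byte"]     # seconds per byte processed
    t_output = data_out / pd["output_bandwidth"]
    return t_input + t_process + t_output

def task_time(subtasks: list, pd: dict) -> float:
    """Subtasks run in parallel, so the slowest one bounds the task."""
    return max(subtask_time(i, o, pd) for i, o in subtasks)

pd = {"input_bandwidth": 1e9, "output_bandwidth": 1e9, "cost_per_byte": 1e-9}
print(task_time([(1 << 20, 1 << 18), (1 << 20, 1 << 18)], pd))
```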
In the embodiment of the present invention, a deep learning neural network model is parsed automatically to form at least two calculating tasks; each calculating task is further divided, according to the preconfigured resource allocation policies, into calculating subtasks; resources are allocated to the calculating subtasks under the different resource allocation policies to obtain the load matrix under those policies; and, based on the performance parameter set of the chip to be assessed, the running time of each calculating subtask under each policy is calculated, thereby determining the performance matrix of the chip running the network model, which is used to evaluate that performance. This solves the prior-art problems of the high economic cost and low efficiency of running a network model on a chip emulator, and can improve the speed of performance simulation for a chip that runs a deep learning network model.
Embodiment two
Fig. 2 is a flowchart of a deep learning neural network model load calculating method in Embodiment 2 of the present invention. This embodiment refines the above embodiment: the parsing of the pre-constructed network model and the decomposition of its calculation flow into at least two calculating tasks are embodied as parsing the network model to determine its hierarchical structure, the hierarchical structure of the network model including at least two layers, and taking the data processing operation associated with each layer as one calculating task, forming at least two calculating tasks.
Specifically, the method of this embodiment includes:
S210: parse the pre-constructed network model and determine the hierarchical structure of the network model, the hierarchical structure including at least two layers.
In fact, the network model is a model with a hierarchical structure, for example a neural network model. A hierarchical structure means that dependence relations exist between the layers, and that each layer, comprising multiple layer units (such as neurons), executes one data processing operation. The hierarchical structure of the network model includes at least two layers; that is, the calculation flow of the network model can be decomposed into at least two calculating tasks.
For the network model, calculating tasks, dependence relations, resource allocation policies, calculating subtasks, allocation data sets, performance parameter set and performance matrix in this embodiment, refer to the description of the previous embodiment.
S220: take the data processing operation associated with each layer as one calculating task, forming at least two calculating tasks; wherein dependence relations exist among the at least two calculating tasks.
Each layer of the hierarchical structure can be treated as a node, so the network model is a network structure formed by multiple nodes. Taking the data processing operation associated with a layer as one calculating task means taking the data processing operation executed by one node as one calculating task: the result obtained by one node's data processing is transferred to the next node as its input, so that the next node continues the data processing. Correspondingly, the calculation result of one calculating task serves as the input data of the next calculating task. The calculation flow of the network model is thereby mapped into multiple calculating tasks with dependence relations.
S230: divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask.
Optionally, the dividing of each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask comprises: determining the allocation quantity of the computing resources according to the at least one preconfigured resource allocation policy; and dividing each calculating task according to the allocation quantity to form at least one calculating subtask, wherein the number of calculating subtasks into which each calculating task is divided is less than or equal to the allocation quantity.
Specifically, the allocation quantity of the computing resources refers to the number of computing resources the chip can call. In general, the larger the allocation quantity, the more calculating subtasks a calculating task is divided into; the allocation quantity determines the division method for the calculating subtasks. The computing resources can be divided evenly into computing units according to the allocation quantity, the number of computing units being equal to the allocation quantity, so that in each resource allocation policy the allocation quantity equals the number of computing units to be allocated, which likewise equals the maximum number of computing units allocated under each policy. It can be understood that the allocation quantity of the computing resources describes the running capability of the chip, so the allocation quantity can be obtained from the performance parameter set of the chip.
Each calculating subtask needs at least one computing unit to execute its data processing, so a calculating task can be divided into at most allocation-quantity calculating subtasks, and usually one calculating subtask is executed by one computing unit. The allocation quantity may exceed the number of calculating subtasks, in which case at least one computing unit stays idle and executes no calculating subtask.
The allocation quantity is determined by the resource allocation policy, and the number of calculating subtasks is determined from the allocation quantity, so that the calculating subtasks produced by the division fit the computing resources and execute correctly, thereby guaranteeing the correct simulated running of the network model.
Optionally, before each calculating task is divided according to at least one preconfigured resource allocation policy to form at least one calculating subtask, the method further comprises: receiving at least one resource configuration table and parsing it to obtain the combination relation between the computing resources and storage resources corresponding to each resource configuration table; and taking the combination relation of the computing resources and storage resources corresponding to one resource configuration table as one resource allocation policy.
A resource configuration table is used to determine the allocation mode of the computing resources and storage resources. It records the combination relations between computing resources and storage resources, where a combination relation is used to determine the computing resource and storage resource allocated to each calculating subtask, and describes the relationship between the computing units and the storage units that are mapped to the same calculating subtask.
In fact, the resource configuration table configures a computing resource and a storage resource for every calculating subtask, and the computing resource and storage resource allocated to the same calculating subtask stand in a combination relation.
Exemplarily, a resource configuration table assigns each calculating subtask a number of computing units and a number of storage units. Correspondingly, a resource allocation policy for the Q calculating subtasks of a calculating task over the computing resources and storage resources is A_step = {(P_step^0, C_0, D_0), (P_step^1, C_1, D_1), ..., (P_step^(Q-1), C_(Q-1), D_(Q-1))}, where C_q is a subset of C and D_q is a subset of D, the C_q are pairwise disjoint with their union equal to C, and the D_q are pairwise disjoint with their union equal to D.
The resource configuration tables can be input by testers to specify the resource allocation policies to be tested. The resource configuration tables input by the user are received and parsed to obtain the resource allocation policies.
All allocation strategies (i.e. all ways of allocating resources to each calculating subtask) are enumerated exhaustively; assume K allocation methods are available. A row of the load matrix, describing the chip load at step step of the whole network model, is thereby obtained: W_step = [A_step^0, A_step^1, ..., A_step^(K-1)], where A_step^k is the allocation data set of the calculating task of step step under the k-th allocation method.
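The patent does not spell out the enumeration procedure, so the following Python sketch is only one plausible way to exhaust the K allocation methods for one task: it enumerates every split of the N computing units into Q positive shares, paired here, for simplicity, with an even storage split:

```python
# A sketch (an assumption, not the patent's enumeration) of exhausting the
# K candidate allocation strategies for one calculating task.
from itertools import combinations

def compositions(total: int, parts: int):
    """All ways to split `total` units into `parts` positive integer shares."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0, *cuts, total)
        yield [bounds[i + 1] - bounds[i] for i in range(parts)]

N, M, Q = 4, 4, 2
policies = [list(zip(comp, [M // Q] * Q)) for comp in compositions(N, Q)]
K = len(policies)          # here K = 3: computing-unit splits (1,3), (2,2), (3,1)
print(K, policies)
```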
By receiving and parsing the resource configuration tables, the resource allocation for each calculating subtask is determined. This realizes flexible resource allocation, enlarges the set of chip load conditions and hence the test scope of the chip, and thus improves the accuracy of the chip performance test.
Optionally, the allocation quantity of the computing resources included in each resource configuration table equals the number of processors in the chip performance parameter set; that is, the allocation quantity of the computing resources is the number of processors, i.e. the number of computing units included in the computing resources equals the number of processors. Making the allocation quantity of the computing resources equal to the number of processors in the chip adapts the allocation of computing resources to the chip's actual processors, improving the rationality of the resource allocation and thus the accuracy of the chip performance test.
Optionally, the allocating of resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the allocation data set of the calculating task under each resource allocation policy, comprises: traversing each resource configuration table starting from a target resource configuration table, until all resource configuration tables have been traversed; during the traversal of a resource configuration table, traversing each calculating subtask of the calculating task starting from a target calculating subtask, and, for the current calculating subtask traversed, selecting a target computing resource from the combination relations corresponding to the resource configuration table and obtaining the corresponding target storage resource, establishing the correspondence among the current calculating subtask, the target computing resource and at least one target storage resource, until all calculating subtasks have been traversed, wherein the computing resources corresponding to the respective calculating subtasks of the calculating task are distinct and the storage resources corresponding to the respective calculating subtasks are distinct; generating, from the computing resource and storage resource corresponding to each calculating subtask, the allocation data set of the calculating task under that resource configuration table; and, after all resource configuration tables have been traversed, obtaining the allocation data sets of the calculating task under each resource configuration table.
Specifically, the target resource configuration table is one of all the resource configuration tables. Exemplarily, it is any resource configuration table selected at random; or the resource configuration tables are numbered and selected in numbered order, for example the table numbered 1 is selected first.
The allocation data describe the resource allocation of each calculating subtask into which the calculating task is decomposed under the different resource configuration tables; that is, they determine the computing resource and storage resource corresponding to each calculating subtask under each resource configuration table.
Each resource configuration table is traversed in turn so as to obtain the resource allocation specified by each table.
When the current resource configuration table is traversed, a combination of one computing resource and one storage resource is selected; that is, the target computing resource and its matched storage resource are determined and allocated to one calculating subtask, where the target computing resource includes at least one computing unit and the storage resource includes at least one storage unit. All calculating subtasks of the current calculating task are traversed, and the correspondence of each calculating subtask with its computing resource and storage resource is established. In this way, every calculating subtask into which the current calculating task is decomposed is allocated computing resources and storage resources, and omissions are reduced.
The above steps are repeated until all resource configuration tables have been traversed. The allocation data of each calculating subtask are thereby obtained, forming the allocation data sets of the calculating task under the different resource configuration tables, which determine the resource allocation of each calculating subtask under each table.
Exemplarily, the allocation data set of the calculating task of step step under the k-th resource configuration table is A_step^k = {(P_step^0, C_0^k, D_0^k), (P_step^1, C_1^k, D_1^k), ..., (P_step^(Q-1), C_(Q-1)^k, D_(Q-1)^k)}.
By traversing the resource configuration tables and, while traversing the current table, establishing the correspondence of every calculating subtask of the calculating task with the computing resource and storage resource configured by the current table, resource allocation for the calculating subtasks is carried out on the basis of the current resource configuration table. This guarantees that every calculating subtask is allocated computing resources and storage resources, realizes flexible resource allocation, and improves the accuracy of the allocation and thus the accuracy of the chip performance test.
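The following Python sketch illustrates, under assumed names, the two-level traversal just described: the outer loop walks the resource configuration tables and the inner loop binds each calculating subtask to the computing/storage combination the current table configures for it:

```python
# Minimal sketch of the traversal: every table, then every subtask within it,
# establishing the subtask -> (computing resource, storage resource) correspondence.
def bind_all(tables: list, subtasks: list) -> list:
    """tables[k] is a list of (computing_units, storage_units) combinations,
    one per subtask, already disjoint within a table."""
    data_sets = []
    for k, table in enumerate(tables):                 # outer traversal: configuration tables
        binding = {}
        for q, sub in enumerate(subtasks):             # inner traversal: calculating subtasks
            compute, storage = table[q]                # target resources for this subtask
            binding[sub] = {"compute": compute, "storage": storage}
        data_sets.append({"table": k, "allocation": binding})
    return data_sets

tables = [[(["c0", "c1"], ["d0"]), (["c2", "c3"], ["d1"])],   # table 0
          [(["c0"], ["d0"]), (["c1", "c2", "c3"], ["d1"])]]   # table 1
print(bind_all(tables, ["P0", "P1"]))
```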
S240: allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the allocation data set of the calculating task under each resource allocation policy; the resources include computing resources and storage resources.
S250: aggregate the allocation data set of each calculating task under each resource allocation policy to form the load matrix of the network model.
Continuing the example above, the allocation data sets of the calculating tasks of steps 0 to S-1 under the K resource configuration tables are collected, and the load matrix W is the S x K matrix whose element in row step and column k is the allocation data set A_step^k, i.e. W = [A_step^k] for step = 0, ..., S-1 and k = 0, ..., K-1.
S260: according to the performance parameter set of the chip to be assessed and the load matrix, calculate the running time, under each resource allocation policy, of each calculating subtask obtained by decomposing the network model, and determine the performance matrix, so as to evaluate the performance of the chip running the network model.
Exemplarily, the performance parameter set is the performance dictionary PD.
Based on the definitions above, the running time of the allocation data set of the calculating task of step step under the k-th resource configuration table is calculated as t_step^k = max over q = 0, ..., Q-1 of (t_input(P_step^q) + t_process(P_step^q) + t_output(P_step^q)), evaluated under the allocation A_step^k.
In turn, the performance matrix is the S x K matrix R = [t_step^k] for step = 0, ..., S-1 and k = 0, ..., K-1.
Each row [t_step^0, t_step^1, ..., t_step^(K-1)] shares the same step: it gives the running durations, under each of the K resource allocation policies, of the step-th (step = 0, 1, ..., S-1) calculating task decomposed from the calculation flow of the whole network model.
Each column [t_0^k, t_1^k, ..., t_(S-1)^k] gives the running times of the calculating tasks T_0, T_1, ..., T_(S-1) of the entire neural network computation under the k-th resource allocation policy. Since dependence relations exist among the calculating tasks, summing this column yields the overall running time of the network model under the k-th allocation policy.
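As an illustration of how the performance matrix is used (the best_policy helper is hypothetical, not from the patent), the column sums give each policy's total model running time, and the smallest column identifies the most favorable resource allocation policy for the chip:

```python
# Minimal sketch: tasks run in sequence, so each column sum of the performance
# matrix is one policy's total model runtime; the argmin column is the best policy.
def best_policy(R: list) -> tuple:
    """R[step][k] = running time of task `step` under policy k (S x K)."""
    K = len(R[0])
    totals = [sum(row[k] for row in R) for k in range(K)]
    k_best = min(range(K), key=totals.__getitem__)
    return k_best, totals[k_best]

R = [[3.0, 2.5, 4.0],    # task 0 under K = 3 policies
     [1.0, 2.0, 0.5]]    # task 1
print(best_policy(R))    # -> (0, 4.0)
```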
Optionally, the calculating, according to the performance parameter set of the chip to be assessed and the load matrix, of the running time under each resource allocation policy of each calculating subtask obtained by decomposing the network model, and the determining of the performance matrix, comprise: according to the performance parameter set of the chip to be assessed, calculating the input data transport time, the input data processing time and the result data transport time of each calculating subtask in the load matrix; taking the sum of a calculating subtask's input data transport time, input data processing time and result data transport time as the running time of that calculating subtask under the resource allocation policy; and forming the performance matrix of the network model from the running times of each calculating subtask under each resource allocation policy.
In the embodiment of the present invention, the network structure of the network model is parsed, the layers included in the hierarchical structure are determined, and the data processing operation associated with each layer is taken as a calculating task, forming at least two calculating tasks. The calculation flow of the network model is thus decomposed into calculating tasks automatically, without relying on specific hardware devices to support the simulation of the network model, which reduces the cost of simulating the running of the network model; at the same time, obtaining the calculating tasks by calibrating against the network structure improves the accuracy of the decomposition of the network model's calculation flow.
Embodiment three
Fig. 3 is a schematic diagram of a deep learning neural network model load computing device in Embodiment 3 of the present invention. This embodiment provides the device corresponding to the deep learning neural network model load calculating method provided by the above embodiments of the present invention; the device can be implemented in software and/or hardware and can generally be integrated into a computer device or the like.
Correspondingly, the device of this embodiment may include:
a calculating task parsing module 310, configured to parse the pre-constructed network model and decompose the calculation flow of the network model into at least two calculating tasks, wherein dependence relations exist among the at least two calculating tasks;
a calculating task division module 320, configured to divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;
a resource allocation module 330, configured to allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the allocation data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources;
a load matrix generation module 340, configured to aggregate the allocation data sets of each calculating task under each resource allocation policy to form the load matrix of the network model;
a performance matrix computing module 350, configured to calculate, according to the performance parameter set of the chip to be assessed and the load matrix, the running time under each resource allocation policy of each calculating subtask into which the network model is decomposed, and to determine the performance matrix so as to evaluate the performance of the chip running the network model.
In the embodiment of the present invention, a deep learning neural network model is parsed automatically to form at least two calculating tasks; each calculating task is further divided, according to the preconfigured resource allocation policies, into calculating subtasks; resources are allocated to the calculating subtasks under the different resource allocation policies to obtain the load matrix under those policies; and, based on the performance parameter set of the chip to be assessed, the running time of each calculating subtask under each policy is calculated, thereby determining the performance matrix of the chip running the network model, which is used to evaluate that performance. This solves the prior-art problems of the high economic cost and low efficiency of running a network model on a chip emulator, and can improve the speed of performance simulation for a chip that runs a deep learning network model.
Further, the calculating task parsing module 310 comprises a network model hierarchy parsing unit, configured to parse the network model and determine the hierarchical structure of the network model, the hierarchical structure including at least two layers, and to take the data processing operation associated with each layer as one calculating task, forming at least two calculating tasks.
Further, the calculating task division module 320 comprises a resource allocation policy division unit, configured to determine the allocation quantity of the computing resources according to the at least one preconfigured resource allocation policy, and to divide each calculating task according to the allocation quantity to form at least one calculating subtask, wherein the number of calculating subtasks into which each calculating task is divided is less than or equal to the allocation quantity.
Further, the deep learning neural network model load computing device further comprises a resource configuration table receiving module, configured to, before each calculating task is divided according to at least one preconfigured resource allocation policy to form at least one calculating subtask, receive at least one resource configuration table and parse it to obtain the combination relation between the computing resources and storage resources corresponding to each resource configuration table, and to take the combination relation of the computing resources and storage resources corresponding to one resource configuration table as one resource allocation policy.
Further, the resource allocation module 330 comprises a resource configuration table traversal parsing unit, configured to traverse each resource configuration table starting from a target resource configuration table until all resource configuration tables have been traversed; during the traversal of a resource configuration table, to traverse each calculating subtask of the calculating task starting from a target calculating subtask and, for the current calculating subtask traversed, to select a target computing resource from the combination relations corresponding to the resource configuration table and obtain the corresponding target storage resource, establishing the correspondence among the current calculating subtask, the target computing resource and at least one target storage resource, until all calculating subtasks have been traversed, wherein the computing resources corresponding to the respective calculating subtasks of the calculating task are distinct and the storage resources corresponding to the respective calculating subtasks are distinct; to generate, from the computing resource and storage resource corresponding to each calculating subtask, the allocation data set of the calculating task under that resource configuration table; and, after all resource configuration tables have been traversed, to obtain the allocation data sets of the calculating task under each resource configuration table.
Further, the performance matrix computing module 350 comprises a running time computing unit, configured to calculate, according to the performance parameter set of the chip to be assessed, the input data transport time, input data processing time and result data transport time of each calculating subtask in the load matrix under each resource allocation policy; to take the sum of a calculating subtask's input data transport time, input data processing time and result data transport time under a resource allocation policy as the running time of that calculating subtask under that policy; and to form the performance matrix of the network model from the running times of each calculating subtask under each resource allocation policy.
Further, the allocation quantity of the computing resources included in each resource configuration table equals the number of processors in the chip performance parameter set.
The above deep learning neural network model load computing device can execute the deep learning neural network model load calculating method provided by embodiments of the present invention, and possesses the functional modules and beneficial effects corresponding to executing that method.
Embodiment four
Fig. 4 is a structural schematic diagram of a computer device in Embodiment 4 of the present invention. Fig. 4 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in Fig. 4 is only an example and should impose no limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16), the components being attached to the bus.
Bus 18 indicates one of a few class bus structures or a variety of, including memory bus or Memory Controller, Peripheral bus, graphics acceleration port, processor or the local bus using any bus structures in a variety of bus structures.It lifts For example, these architectures include but is not limited to industry standard architecture (Industry Standard Architecture, ISA) bus, microchannel architecture (Micro Channel Architecture, MCA) bus, increasing Strong type isa bus, Video Electronics Standards Association (Video Electronics Standards Association, VESA) office Domain bus and peripheral component interconnection (Peripheral Component Interconnect, PCI) bus.
Computer device 12 typically comprises a variety of computer-system-readable media. These media may be any available media accessible by computer device 12, including volatile and non-volatile media, and removable and non-removable media.
System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to bus 18 through one or more data media interfaces. System memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methods of the embodiments described in the present invention.
Computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication can occur via input/output (I/O) interfaces 22. Moreover, computer device 12 can communicate with one or more networks (such as a local area network (LAN) or a wide area network (WAN)) through network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 through bus 18. It should be understood that, although not shown in Fig. 4, other hardware and/or software modules can be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Inexpensive Disks (RAID) systems, tape drives, and data backup storage systems.
Processing unit 16 executes various functional applications and data processing by running the programs stored in system memory 28, for example, implementing the deep learning neural network model load calculation method provided by any embodiment of the present invention.
Embodiment five
Embodiment five of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the deep learning neural network model load calculation method provided by any embodiment of this application. That is, when executed by a processor, the program implements: parsing a pre-constructed network model, and decomposing the calculation flow of the network model into at least two calculating tasks, wherein dependencies exist among the at least two calculating tasks; dividing each calculating task according to at least one preconfigured resource allocation policy, to form at least one calculating subtask; allocating resources to all calculating subtasks associated with the calculating task according to each resource allocation policy, to obtain the distribution data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources; compiling the distribution data sets of each calculating task under each resource allocation policy, to form the load matrix of the network model; and calculating, according to the performance parameter set of the chip to be assessed and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determining a performance matrix, so as to evaluate the performance of the chip running the network model.
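Read end to end, the stored program amounts to the pipeline sketched below. The sketch is purely illustrative: the layer labels, the encoding of a policy as an allotted quantity, and the time_model callback are assumptions, not the claimed implementation.

    def evaluate_chip(layers, policies, time_model):
        # Each layer of the parsed network model becomes one calculating task;
        # each policy maps to an allotted quantity of computing resources.
        load_matrix = {}
        for policy, quantity in policies.items():
            # Divide every calculating task into `quantity` calculating
            # subtasks and record its distribution data under this policy.
            load_matrix[policy] = {
                task: ["%s/sub%d" % (task, i) for i in range(quantity)]
                for task in layers
            }
        # Performance matrix: running time of every subtask under every policy.
        performance = {
            policy: {task: [time_model(task, policy, len(subs)) for _ in subs]
                     for task, subs in tasks.items()}
            for policy, tasks in load_matrix.items()
        }
        return load_matrix, performance

    # Toy usage: three layers, two policies, a perfectly scaling time model.
    load, perf = evaluate_chip(
        layers=["conv1", "pool1", "fc1"],
        policies={"4-core": 4, "8-core": 8},
        time_model=lambda task, policy, n: 1.0 / n,
    )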
The computer storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more conductors, a portable computer diskette, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages or a combination thereof; the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or a WAN, or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the present invention is not limited to the specific embodiments described herein, and that various apparent changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments; without departing from the inventive concept, it may also include more other equivalent embodiments, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A deep learning neural network model load calculation method, characterized by comprising:
parsing a pre-constructed network model, and decomposing the calculation flow of the network model into at least two calculating tasks; wherein dependencies exist among the at least two calculating tasks;
dividing each calculating task according to at least one preconfigured resource allocation policy, to form at least one calculating subtask;
allocating resources to all calculating subtasks associated with the calculating task according to each resource allocation policy, and obtaining a distribution data set of the calculating task under each resource allocation policy; the resources comprising computing resources and storage resources;
compiling the distribution data sets of each calculating task under each resource allocation policy, to form a load matrix of the network model;
calculating, according to a performance parameter set of a chip to be assessed and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determining a performance matrix, so as to evaluate the performance of the chip running the network model.
2. The method according to claim 1, characterized in that parsing the pre-constructed network model and decomposing the calculation flow of the network model into at least two calculating tasks comprises:
parsing the network model, and determining the hierarchical structure of the network model, the hierarchical structure of the network model comprising at least two layers;
taking the data processing operation associated with each layer as one calculating task, to form the at least two calculating tasks.
3. The method according to claim 2, characterized in that dividing each calculating task according to at least one preconfigured resource allocation policy, to form at least one calculating subtask, comprises:
determining an allotted quantity of computing resources according to the at least one preconfigured resource allocation policy;
dividing each calculating task according to the allotted quantity, to form at least one calculating subtask; wherein the number of calculating subtasks into which each calculating task is divided is less than or equal to the allotted quantity.
4. The method according to claim 3, characterized in that, before dividing each calculating task according to the at least one preconfigured resource allocation policy to form at least one calculating subtask, the method further comprises:
receiving at least one resource distribution table and parsing it, to obtain the combination relationship between the computing resources and the storage resources corresponding to each resource distribution table;
taking the combination relationship between the computing resources and the storage resources corresponding to one resource distribution table as one resource allocation policy.
5. The method according to claim 4, characterized in that allocating resources to all calculating subtasks associated with the calculating task according to each resource allocation policy, and obtaining the distribution data set of the calculating task under each resource allocation policy, comprises:
traversing each resource distribution table, starting from a target resource distribution table among the resource distribution tables, until all resource distribution tables have been traversed;
during the traversal of a resource distribution table, traversing each calculating subtask of the calculating task, starting from a target calculating subtask among the calculating subtasks; for the currently traversed calculating subtask, selecting a target computing resource from the combination relationship corresponding to the resource distribution table, obtaining a corresponding target storage resource, and establishing the correspondence between the current calculating subtask, the target computing resource, and at least one target storage resource, until all calculating subtasks have been traversed;
wherein the computing resource corresponding to each calculating subtask of the calculating task is different, and the storage resource corresponding to each calculating subtask of the calculating task is different;
generating, according to the computing resource and the storage resource corresponding to each calculating subtask, the distribution data set of the calculating task under the resource distribution table;
after all resource distribution tables have been traversed, obtaining the distribution data sets of the calculating task under each resource distribution table.
6. The method according to claim 1, characterized in that calculating, according to the performance parameter set of the chip to be assessed and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determining the performance matrix, comprises:
calculating, according to the performance parameter set of the chip to be assessed, the input data handling time, the input data processing elapsed time, and the result data handling time of each calculating subtask in the load matrix under each resource allocation policy;
taking the sum of the input data handling time, the input data processing elapsed time, and the result data handling time of the calculating subtask under the resource allocation policy as the running time of the calculating subtask under the resource allocation policy;
forming the performance matrix of the network model according to the running time of each calculating subtask under each resource allocation policy.
7. The method according to claim 4, characterized in that the allotted quantity of computing resources included in each resource distribution table is equal to the number of processors in the chip performance parameter set.
8. A deep learning neural network model load computing device, characterized by comprising:
a calculating task parsing module, configured to parse a pre-constructed network model and decompose the calculation flow of the network model into at least two calculating tasks; wherein dependencies exist among the at least two calculating tasks;
a calculating task division module, configured to divide each calculating task according to at least one preconfigured resource allocation policy, to form at least one calculating subtask;
a resource distribution module, configured to allocate resources to all calculating subtasks associated with the calculating task according to each resource allocation policy, and to obtain the distribution data set of the calculating task under each resource allocation policy; the resources comprising computing resources and storage resources;
a load matrix generation module, configured to compile the distribution data sets of each calculating task under each resource allocation policy, to form the load matrix of the network model;
a performance matrix computing module, configured to calculate, according to the performance parameter set of the chip to be assessed and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and to determine the performance matrix, so as to evaluate the performance of the chip running the network model.
9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the deep learning neural network model load calculation method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the deep learning neural network model load calculation method according to any one of claims 1-7.
CN201911008660.3A 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium Active CN110515739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911008660.3A CN110515739B (en) 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium


Publications (2)

Publication Number Publication Date
CN110515739A (en) 2019-11-29
CN110515739B CN110515739B (en) 2020-01-31

Family

ID=68633608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911008660.3A Active CN110515739B (en) 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110515739B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120109864A1 (en) * 2010-10-29 2012-05-03 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation
CN106649060A (en) * 2015-11-02 2017-05-10 中国移动通信集团公司 Equipment performance testing method and device
US10019668B1 (en) * 2017-05-19 2018-07-10 Google Llc Scheduling neural network processing
CN108197083A (en) * 2018-01-31 2018-06-22 湖南农业大学 Short-term job load prediction method for data centers fusing linear regression with a wavelet neural network
CN109901878A (en) * 2019-02-25 2019-06-18 北京灵汐科技有限公司 Brain-inspired computing chip and computing device
CN110333945A (en) * 2019-05-09 2019-10-15 成都信息工程大学 Dynamic load balancing method, system and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
M. SCHWARZ et al.: "A parallel neural network emulator based on application-specific VLSI communication chips", Proceedings of the Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111158901B (en) * 2019-12-09 2023-09-08 爱芯元智半导体(宁波)有限公司 Optimization method, optimization device, computer equipment and storage medium for calculation graph
CN111158901A (en) * 2019-12-09 2020-05-15 北京迈格威科技有限公司 Optimization method and device of computation graph, computer equipment and storage medium
CN111047017B (en) * 2019-12-18 2023-06-23 北京安兔兔科技有限公司 Neural network algorithm evaluation method and device and electronic equipment
CN111047017A (en) * 2019-12-18 2020-04-21 北京安兔兔科技有限公司 Neural network algorithm evaluation method and device and electronic equipment
CN111162946A (en) * 2019-12-30 2020-05-15 北京奇艺世纪科技有限公司 Method for constructing model inference network, data processing method, data processing device and storage medium
CN111162946B (en) * 2019-12-30 2022-07-12 北京奇艺世纪科技有限公司 Method for constructing model inference network, data processing method, data processing device and storage medium
CN111340237A (en) * 2020-03-05 2020-06-26 腾讯科技(深圳)有限公司 Data processing and model operation method, device and computer equipment
CN111340237B (en) * 2020-03-05 2024-04-26 腾讯科技(深圳)有限公司 Data processing and model running method, device and computer equipment
CN111860758A (en) * 2020-04-07 2020-10-30 北京嘀嘀无限科技发展有限公司 Operation method and device of deep learning model, electronic equipment and medium
CN111860758B (en) * 2020-04-07 2024-05-03 北京嘀嘀无限科技发展有限公司 Deep learning model operation method and device, electronic equipment and medium
CN111738434A (en) * 2020-06-03 2020-10-02 中国科学院计算技术研究所 Method for executing deep neural network on heterogeneous processing unit
CN111738434B (en) * 2020-06-03 2023-04-07 中国科学院计算技术研究所 Method for executing deep neural network on heterogeneous processing unit
WO2021259106A1 (en) * 2020-06-22 2021-12-30 深圳鲲云信息科技有限公司 Method, system, and device for optimizing neural network chip, and storage medium
CN111858070B (en) * 2020-08-05 2023-12-01 中国工商银行股份有限公司 Computing resource allocation method, device, equipment and storage medium
CN111858070A (en) * 2020-08-05 2020-10-30 中国工商银行股份有限公司 Computing resource allocation method, device, equipment and storage medium
WO2022042519A1 (en) * 2020-08-27 2022-03-03 北京灵汐科技有限公司 Resource allocation method and apparatus, and computer device and computer-readable storage medium
CN112598112A (en) * 2020-12-04 2021-04-02 深圳大学 Resource scheduling method based on graph neural network
CN112598112B (en) * 2020-12-04 2021-09-10 深圳大学 Resource scheduling method based on graph neural network
WO2022116142A1 (en) * 2020-12-04 2022-06-09 深圳大学 Resource scheduling method based on graph neural network
CN113268404A (en) * 2021-05-28 2021-08-17 曙光信息产业(北京)有限公司 Performance analysis and optimization method and device, computer equipment and storage medium
CN113884857A (en) * 2021-09-29 2022-01-04 上海阵量智能科技有限公司 Chip, chip pressure testing method and device, electronic equipment and storage medium
CN113884857B (en) * 2021-09-29 2024-03-08 上海阵量智能科技有限公司 Chip, chip pressure testing method and device, electronic equipment and storage medium
US11907098B2 (en) * 2022-04-01 2024-02-20 Rebellions Inc. Method for measuring performance of neural processing device and device for measuring performance
WO2024022046A1 (en) * 2022-07-28 2024-02-01 华为技术有限公司 Deep learning system and method
CN116501594A (en) * 2023-06-27 2023-07-28 上海燧原科技有限公司 System modeling evaluation method and device, electronic equipment and storage medium
CN116501505B (en) * 2023-06-27 2023-09-12 上海燧原科技有限公司 Method, device, equipment and medium for generating data stream of load task
CN116501594B (en) * 2023-06-27 2023-09-08 上海燧原科技有限公司 System modeling evaluation method and device, electronic equipment and storage medium
CN116501505A (en) * 2023-06-27 2023-07-28 上海燧原科技有限公司 Method, device, equipment and medium for generating data stream of load task
CN116737605B (en) * 2023-08-11 2023-11-14 上海燧原科技有限公司 Data prefetching method, device, equipment and medium based on chip multilevel storage
CN116737605A (en) * 2023-08-11 2023-09-12 上海燧原科技有限公司 Data prefetching method, device, equipment and medium based on chip multilevel storage

Also Published As

Publication number Publication date
CN110515739B (en) 2020-01-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant