CN110515739A - Deep learning neural network model load calculation method, device, equipment and medium - Google Patents
- Publication number
- CN110515739A CN110515739A CN201911008660.3A CN201911008660A CN110515739A CN 110515739 A CN110515739 A CN 110515739A CN 201911008660 A CN201911008660 A CN 201911008660A CN 110515739 A CN110515739 A CN 110515739A
- Authority
- CN
- China
- Prior art keywords
- calculating
- resource
- network model
- subtask
- resource allocation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5027—Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/02—Neural networks
- G06N3/10—Interfaces, programming languages or software development kits, e.g. for simulating neural networks
- G06F2209/5017—Indexing scheme relating to G06F9/50: task decomposition
Abstract
Embodiments of the invention disclose a deep learning neural network model load calculation method, device, equipment and media. The method includes: parsing a pre-constructed network model and decomposing its calculation flow into at least two calculating tasks; dividing each calculating task to form at least one calculating subtask; allocating resources, according to each resource allocation policy, to all calculating subtasks associated with a calculating task, to obtain the task's distribution data set under each policy; aggregating the distribution data sets of all calculating tasks under all policies to form the load matrix of the network model; and, according to the performance parameter set of a chip to be assessed, computing the run time of each calculating subtask to determine a performance matrix. Embodiments of the invention can increase the speed of performance simulation for chips that run deep learning network models, so that architectural design defects of a chip can be found quickly.
Description
Technical field
Embodiments of the present invention relate to the field of data processing, and in particular to a deep learning neural network model load calculation method, device, equipment and medium.
Background technique
The rapid development of the AI industry places ever more stringent requirements on the computing capability of computers, and major semiconductor manufacturers are actively developing and releasing special-purpose chips that accelerate the training and inference of deep learning models.

The research, development and manufacture of a chip is a long process. Usually, verification of the reasonableness of the chip architecture and assessment of its computing performance can only be carried out after a small production run has been obtained, which greatly lengthens the iteration cycle of product development and can even delay time to market (TTM) indefinitely; this is unacceptable to any semiconductor manufacturer.

The existing solution is to emulate the chip architecture on a dedicated server, with the service provider supplying a complete matched software and hardware solution on which the performance verification of the chip is carried out. This scheme, however, is expensive, and the simulation software runs slowly: a single simple test sample usually takes several hours to run. Furthermore, when verifying architectures that support parallel computation, differences in how calculating tasks are cut and scheduled onto on-chip hardware resources lead to different chip operating loads, and hence to different performance; trying out and exploring these strategies helps uncover architectural defects early in chip design. There are usually thousands of candidate strategies, which places demands on iteration speed that a complex, heavyweight dedicated performance simulation service cannot meet.
Summary of the invention
Embodiments of the invention provide a deep learning neural network model load calculation method, device, equipment and medium that can increase the speed of performance simulation for chips running deep learning network models.

In a first aspect, an embodiment of the invention provides a deep learning neural network model load calculation method, comprising:

parsing a pre-constructed network model and decomposing the calculation flow of the network model into at least two calculating tasks, wherein dependences exist among the at least two calculating tasks;

dividing each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;

allocating resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the distribution data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources;

aggregating the distribution data set of each calculating task under each resource allocation policy to form the load matrix of the network model;

according to the performance parameter set of a chip to be assessed and the load matrix, computing the run time, under each resource allocation policy, of each calculating subtask obtained by decomposing the network model, and determining a performance matrix with which to evaluate the performance of the chip running the network model.
In a second aspect, an embodiment of the invention provides a deep learning neural network model load computing device, comprising:

a calculating task parsing module, configured to parse a pre-constructed network model and decompose the calculation flow of the network model into at least two calculating tasks, wherein dependences exist among the at least two calculating tasks;

a calculating task division module, configured to divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;

a resource allocation module, configured to allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the distribution data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources;

a load matrix generation module, configured to aggregate the distribution data set of each calculating task under each resource allocation policy to form the load matrix of the network model;

a performance matrix computing module, configured to compute, according to the performance parameter set of a chip to be assessed and the load matrix, the run time under each resource allocation policy of each calculating subtask obtained by decomposing the network model, and to determine a performance matrix with which to evaluate the performance of the chip running the network model.
In a third aspect, an embodiment of the invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements the deep learning neural network model load calculation method of any embodiment of the invention.

In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the deep learning neural network model load calculation method of any embodiment of the invention.
Embodiments of the invention automatically parse a deep learning neural network model to form at least two calculating tasks, further divide the calculating tasks according to preconfigured resource allocation policies to form calculating subtasks, allocate resources to the calculating subtasks separately under each resource allocation policy to obtain the load matrix under the different policies, and compute, based on the performance parameter set of the chip to be assessed, the run time of each calculating subtask under each policy, thereby determining the performance matrix of the chip running the network model, which is used to evaluate that performance. This solves the prior-art problems of the high economic cost and low efficiency of verifying a network model through chip emulation, and can increase the speed of performance simulation for chips running deep learning network models.
Detailed description of the invention
Fig. 1 is a flowchart of a deep learning neural network model load calculation method according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a deep learning neural network model load calculation method according to Embodiment 2 of the present invention;
Fig. 3 is a structural diagram of a deep learning neural network model load computing device according to Embodiment 3 of the present invention;
Fig. 4 is a structural diagram of a computer device according to Embodiment 4 of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a deep learning neural network model load calculation method according to Embodiment 1 of the present invention. This embodiment is applicable to emulating the process of a chip running a network model. The method can be executed by the deep learning neural network model load computing device provided by embodiments of the invention; the device can be implemented in software and/or hardware, and is generally integrated in a computer device, for example a terminal device or a server. As shown in Fig. 1, the method of this embodiment specifically includes:
S110: parse the pre-constructed network model and decompose the calculation flow of the network model into at least two calculating tasks, among which dependences exist.
Here, the network model refers to a deep learning neural network model (Deep Learning Neural Network). The calculation flow of the network model indicates the multiple consecutive calculation steps that the network model performs at run time; the calculation flow can thus be converted into multiple consecutive calculation steps.

A calculating task represents one or several of these calculation steps, and the calculating tasks are pairwise distinct. A dependence between two calculating tasks means that the calculated result (or output) of one calculating task serves as the input of the other. The calculating tasks linked by dependences, combined in order, constitute the calculation flow of the network model; that is, the calculation flow is a calculating task sequence, in which the order of the tasks is their execution order.
In fact, a neural network model comprises multiple layers of data operations, each layer performing a different data processing operation. Illustratively, the data processing operations may include padding (Padding), reshaping (Reshape), convolution (Convolution) and pooling (Pooling).

The network model can be constructed through a predetermined programming interface: the user inputs the data associated with the network model through the interface to establish the neural network model. Illustratively, the established network model is F = f_S(f_{S-1}(... f_1(x) ...)), where f_i denotes one layer of the neural network.
It can be understood that once the user passes the network model to the programming interface, the structure of the network model, that is, the structure of each of its layers, is determined through the interface and remains fixed in subsequent processing. Since the calculation flow of the network model consists of the data processing done by each layer, the data processing process of each layer can be taken as one calculating task of the network model. Continuing the example, an S-layer network model structure correspondingly yields a sequence of S consecutive calculating tasks [T_1, T_2, ..., T_S], where, for i from 1 to S-1, the input of calculating task T_{i+1} is the output of calculating task T_i. The brackets represent the dependences between the calculating tasks, i.e. the output of the previous calculating task is the input of the next, and the S calculating tasks execute in sequence so as to guarantee the correctness of the calculated result.

In effect, calculating tasks are divided from the viewpoint of the operation timing of the network model; that is, the calculation flow of the network model is divided in time.
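The layer-to-task decomposition described above can be sketched as follows. This is an illustrative sketch under the assumptions stated in its comments, not code from the patent; all class and function names are invented for the example.

```python
# Minimal sketch of S110: map each layer of a pre-constructed model to
# one calculating task T_1..T_S with a chain of dependences.
# Layer, CalculatingTask and decompose are illustrative names.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Layer:
    name: str                          # e.g. "Convolution", "Pooling"

@dataclass
class CalculatingTask:
    index: int                         # 1-based position in the sequence
    op: str                            # the layer's data processing operation
    depends_on: Optional[int] = None   # index of the predecessor task

def decompose(layers):
    """Decompose the model's calculation flow into a calculating task
    sequence: task i+1 consumes the output of task i."""
    return [CalculatingTask(index=i, op=layer.name,
                            depends_on=i - 1 if i > 1 else None)
            for i, layer in enumerate(layers, start=1)]

model = [Layer("Padding"), Layer("Convolution"), Layer("Pooling")]
tasks = decompose(model)
```

The resulting list is the dependent task sequence [T_1, ..., T_S]; executing it in index order preserves the dependences.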
S120: divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask.
A resource allocation policy is usually used to distribute the resources needed to execute a calculating task; it specifies the manner of resource allocation, where the resources specifically include computing resources, which execute the calculating tasks, and storage resources, which store the data associated with executing them. Calculating subtasks compose a calculating task; each is a partial calculation within the task.

In fact, a calculating task can be divided further, for example subdivided into multiple calculating subtasks. The calculating subtasks are pairwise distinct and mutually independent, and together they form the complete calculating task. One possible division is to split the computation of the calculating task evenly into n calculating subtasks, where n is greater than or equal to 1 and can be set as needed; the embodiments of the invention place no particular limit on this. Illustratively, if the calculating task is to perform convolution on 10 feature maps, it can be divided into 10 calculating subtasks, each performing convolution on 1 feature map, or into 5 calculating subtasks, each performing convolution on 2 feature maps, with the feature maps convolved by different subtasks being disjoint.
Continuing the example, a calculating task T_step and its calculating subtasks are related as T_step = {t_step^1, t_step^2, ..., t_step^Q}, where t_step^q is a calculating subtask. The shared subscript step indicates that all of these subtasks belong to the same calculating task; the superscript q distinguishes the Q subtasks split out of T_step; and the curly braces indicate that no dependence exists between the calculating subtasks, which exist in parallel.

The way each calculating task is divided can be determined from the resource allocation policy; that is, the policy determines how many calculating subtasks each calculating task may be divided into, i.e. the value of n.
Specifically, the computing units within the computing resources can serve as the basis for dividing calculating subtasks: the computing resources comprise multiple computing units, the number of computing units is taken as n, and the calculating task is divided into n calculating subtasks, so that each computing unit executes one calculating subtask. Alternatively, the space of the storage resources can serve as the basis for the division, or the division can be determined jointly from the computing and storage resources; this can be set as needed, and the embodiments of the invention place no specific limit on it.

In summary, a calculating subtask is really a unit of work into which a calculating task is divided for parallel execution at the same time; that is, calculating subtasks divide the calculating task in space. During the execution time of a calculating task, the multiple calculating subtasks into which it divides execute in parallel.
S130: allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the distribution data set of the calculating task under each resource allocation policy; the resources include computing resources and storage resources.
The distribution data set describes the resource allocation situation of each calculating subtask into which the calculating task is divided; it records the mapping between the calculating subtasks and the resources. Allocating resources to a calculating subtask really means allocating computing resources and storage resources (that is, processors and storage space) to it. In general, different resource allocation policies allocate resources differently: the computing and/or storage resources are quantified and distributed to the calculating subtasks in different amounts.
Specifically, the computing resources can be divided evenly into N computing units, i.e. the computing resource set is C = {c_1, c_2, ..., c_N}, where c_i is one computing unit. It should be noted that the computing resources usually refer to the processors on the chip, and one computing unit is an integer number of processors: a computing unit may comprise one processor, two processors, or even more, which the embodiments of the invention do not specifically limit. The number of processors of the chip to be assessed is obtained and taken as the maximum number of computing units; that is, the number of computing units N is an integer no greater than the number of processors of the chip to be assessed.

Likewise, the storage resources are divided evenly into M equal shares, giving the storage resource set D = {d_1, d_2, ..., d_M}, where d_j is one storage unit. The storage resources are the storage space allocated on the chip for running the network model, divided equally; the storage space corresponding to each storage unit must be at least large enough to hold all the data associated with executing one calculating subtask.

In fact, a resource allocation policy specifies the quantity of computing units and the quantity of storage units that one calculating subtask can be allocated. It should be noted that each resource allocation policy must satisfy the restrictive condition sum_q N_q = N and sum_q M_q = M, where N_q and M_q are the numbers of computing and storage units allocated to subtask q; that is, the calculating subtasks into which a calculating task is divided exactly exhaust the computing resources C and the storage resources D: no more may be allocated, no less, and the allocations may not overlap.
Under one resource allocation policy, each calculating subtask of a calculating task is allocated at least one computing unit and at least one storage unit; correspondingly, under different policies each calculating subtask is allocated different numbers of computing units and of storage units. Taking, for each resource allocation policy, the number of computing units and the number of storage units allocated to each calculating subtask as one sequence, and collecting these data across the different policies, yields a set: the distribution data set of the calculating task under the resource allocation policies.
S140: aggregate the distribution data set of each calculating task under each resource allocation policy to form the load matrix of the network model.

The load matrix describes the resource allocation situation of the network model under the different resource allocation policies; it records the mapping between each calculation stage of the running network model and the resource allocation. Following the method above, the distribution data set of each calculating task under each resource allocation policy is obtained, and together these form the load matrix of the network model.
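The tasks-by-policies structure of the load matrix can be sketched as follows. Representing a policy by the subtask count it induces, and a cell by per-subtask unit counts, are simplifying assumptions for illustration.

```python
# Sketch of S140: rows are calculating tasks, columns are resource
# allocation policies, and cell (i, j) is task i's distribution data
# under policy j (here reduced to per-subtask unit counts).
def build_load_matrix(n_tasks, policies, N, M):
    """policies: per-policy number of subtasks each task is divided
    into (an even split of N computing and M storage units)."""
    return [[{"subtasks": n_sub,
              "compute_per_subtask": N // n_sub,
              "storage_per_subtask": M // n_sub}
             for n_sub in policies]
            for _ in range(n_tasks)]

lm = build_load_matrix(n_tasks=3, policies=[1, 2, 4], N=8, M=8)
```

Each row aggregates one task's distribution data sets across all policies, which is exactly the statistic S140 collects.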
S150: according to the performance parameter set of the chip to be assessed and the load matrix, compute the run time, under each resource allocation policy, of each calculating subtask obtained by decomposing the network model, and determine a performance matrix with which to evaluate the performance of the chip running the network model.

The performance parameter set describes the performance of the chip to be assessed; it records the chip's performance parameters and is, illustratively, a performance dictionary (PD). The run time describes the duration consumed in executing a calculating subtask, and the performance matrix is used to evaluate the running performance of the chip to be assessed. From the chip's performance parameters and the computing and storage units allocated to each calculating task under the different resource allocation policies, the run time of each calculating subtask is computed, forming a performance matrix corresponding to the load matrix. From this, the run time of each calculating subtask decomposed from the network model under each policy, and hence the run time of the network model under each policy, can be determined, which in turn determines the performance of the chip running the network model. Understandably, the shorter the run time of the network model, the higher the performance of the chip running it.
Optionally, computing the run times and determining the performance matrix comprises: according to the performance parameter set of the chip to be assessed, computing for each calculating subtask in the load matrix, under each resource allocation policy, the input data transfer time, the input data processing time and the result data transfer time; taking the sum of the three as the run time of the calculating subtask under that resource allocation policy; and forming the performance matrix of the network model from the run times of the calculating tasks under the resource allocation policies.

A calculating subtask usually completes by going through three processes: reading the input data from the storage space, performing data processing on the input data, and writing the result data obtained by the data processing into the storage space. Correspondingly, the run time of a calculating subtask comprises the input data transfer time, the input data processing time and the result data transfer time.
The input data transfer time describes the duration for a calculating subtask to acquire its input data; it is the time to move the input data from the allocated storage resources to the allocated computing resources, and can be computed as

t_in = input_data_size / input_bandwidth

where input_data_size is the size of the subtask's input data, counted in bytes (Byte), and input_bandwidth is the data transfer bandwidth, in Byte/s, for moving data from the storage resources D_i to the computing resources C_i; it can be obtained from the performance parameter set.
The input data processing time describes the duration consumed by a calculating subtask in processing its input data, i.e. the time the computing resources C_i take to process the input data, and can be computed as

t_proc = input_data_size x cost_per_byte

where cost_per_byte is the time consumed in processing each byte of data; it can be determined from the characteristics of the calculating task, or obtained from the performance parameter set.
The result data transfer time describes the duration for a calculating subtask to output its result data; it is the time to move the result data from the allocated computing resources to the allocated storage resources, and can be computed as

t_out = result_data_size / output_bandwidth

where result_data_size is the size of the result data obtained by the subtask, counted in bytes (Byte), and output_bandwidth is the data transfer bandwidth, in Byte/s, for carrying the calculated result data from the computing resources C_i to the storage resources D_i; it too can be obtained from the performance parameter set.
The sum of the input data transfer time, the input data processing time and the result data transfer time of a calculating subtask is taken as the run time of that subtask under the resource allocation policy. The run times of the different calculating subtasks under the different policies are then tallied, and the run time of the slowest calculating subtask is taken as the run time of the calculating task: since the calculating subtasks into which a calculating task decomposes execute in parallel, the run time of the calculating task equals the longest run time among its subtasks, i.e.

T_step = max_q ( t_in^q + t_proc^q + t_out^q )

where T_step is the run time of the calculating task of step step.
By computing, for each calculating subtask under each resource allocation policy, the input data transfer time, the input data processing time and the result data transfer time, the run time of every calculating subtask under every policy is determined accurately, and from this the time consumed by running the whole network model can be tallied accurately.
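The run-time model just described can be sketched as follows; the performance dictionary keys are assumptions, not names given in the patent.

```python
# Sketch of the optional S150 computation: per-subtask run time is
# t_in + t_proc + t_out, and a task's run time is the maximum over
# its parallel subtasks.
def subtask_runtime(in_bytes, out_bytes, pd):
    t_in = in_bytes / pd["input_bandwidth"]     # D_i -> C_i transfer
    t_proc = in_bytes * pd["cost_per_byte"]     # processing on C_i
    t_out = out_bytes / pd["output_bandwidth"]  # C_i -> D_i transfer
    return t_in + t_proc + t_out

def task_runtime(subtasks, pd):
    """Subtasks run in parallel, so the calculating task takes as
    long as its slowest subtask (T_step = max over q)."""
    return max(subtask_runtime(i, o, pd) for i, o in subtasks)

# Performance dictionary (PD) with assumed units: Byte/s and s/Byte.
pd = {"input_bandwidth": 1e9, "output_bandwidth": 1e9, "cost_per_byte": 1e-9}
t = task_runtime([(1_000_000, 500_000), (2_000_000, 1_000_000)], pd)
```

Evaluating every cell of the load matrix this way yields the corresponding performance matrix.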
Embodiments of the invention automatically parse a deep learning neural network model to form at least two calculating tasks, further divide the calculating tasks according to preconfigured resource allocation policies to form calculating subtasks, allocate resources to the calculating subtasks separately under each resource allocation policy to obtain the load matrix under the different policies, and compute, based on the performance parameter set of the chip to be assessed, the run time of each calculating subtask under each policy, thereby determining the performance matrix of the chip running the network model, which is used to evaluate that performance. This solves the prior-art problems of the high economic cost and low efficiency of verifying a network model through chip emulation, and can increase the speed of performance simulation for chips running deep learning network models.
Embodiment two
Fig. 2 is a flowchart of a deep learning neural network model load calculation method according to Embodiment 2 of the present invention. This embodiment is elaborated on the basis of the embodiment above: parsing the pre-constructed network model and decomposing its calculation flow into at least two calculating tasks is embodied as parsing the network model to determine its hierarchical structure, the hierarchical structure including at least two layers, and taking the data processing operation associated with each layer as one calculating task, thereby forming at least two calculating tasks.

Specifically, the method of this embodiment includes:
S210: parse the pre-constructed network model and determine the hierarchical structure of the network model, the hierarchical structure including at least two layers.

In fact, the network model is a model with a hierarchical structure, for example a neural network model. Hierarchical structure means that dependences exist between the layers, and that each layer comprises multiple layer units (such as neurons) performing one data processing operation. The hierarchical structure of the network model includes at least two layers; that is, the calculation flow of the network model can be decomposed into at least two calculating tasks.
Network model, calculating task, dependence, resource allocation policy, calculating subtask, distribution number in the present embodiment
The description of previous embodiment can be referred to according to set, performance parameter set and performance matrix.
S220: treat the data processing operations associated with each layer as one calculating task, forming at least two calculating tasks, wherein dependencies exist between the at least two calculating tasks.

Each layer in the hierarchical structure can be treated as a node, so the network model can be regarded as a network structure formed by multiple nodes. Treating a layer's associated data processing operations as one calculating task means that the data processing operations executed by one node constitute one calculating task; the result that one node obtains by processing data is passed to the next node as input, so that the next node can continue its data processing operations. Correspondingly, the calculated result of one calculating task serves as the input data of the next calculating task. The calculation process of the network model is thereby mapped to multiple calculating tasks between which dependencies exist.
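As an illustrative sketch (all names are hypothetical, not from the patent), the mapping of layers to dependent calculating tasks described above could look like this:

```python
def decompose_model(layers):
    """layers: ordered layer names; returns one calculating task per layer,
    each task depending on the previous layer's task for its input data."""
    tasks = []
    for i, layer in enumerate(layers):
        tasks.append({
            "task_id": i,
            "layer": layer,
            "depends_on": i - 1 if i > 0 else None,  # predecessor's output
        })
    return tasks

tasks = decompose_model(["conv1", "relu1", "fc1"])
print(tasks[1]["depends_on"])  # 0: task 1 consumes task 0's result
```

Here each task's `depends_on` field records the dependency chain, mirroring how one node's result becomes the next node's input.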
S230: divide each calculating task according to at least one preconfigured resource allocation policy, forming at least one calculating subtask.

Optionally, dividing each calculating task according to the at least one preconfigured resource allocation policy to form at least one calculating subtask comprises: determining the allotted quantity of computing resources according to the at least one preconfigured resource allocation policy; and dividing each calculating task according to the allotted quantity, forming at least one calculating subtask, wherein the number of calculating subtasks into which each calculating task is divided is less than or equal to the allotted quantity.
Specifically, the allotted quantity of computing resources can refer to the number of computing resources the chip is able to call. Generally, the larger the allotted quantity, the more calculating subtasks a calculating task is divided into; the allotted quantity therefore determines the division method for calculating subtasks. The computing resources can be evenly divided into multiple computing units according to the allotted quantity, the number of computing units being identical to the allotted quantity, so that in each resource allocation policy the allotted quantity equals the number of computing units to be allocated, which likewise equals the maximum number of computing units allocated under that policy. Understandably, since the allotted quantity of computing resources describes the running capability of the chip, the allotted quantity can be obtained from the chip's performance parameter set.

Each calculating subtask needs at least one computing unit to execute its data processing. Therefore, a calculating task can be divided into at most the allotted quantity of calculating subtasks, and one calculating subtask is usually executed by one computing unit. The allotted quantity may be greater than the number of calculating subtasks, in which case at least one computing unit is idle and executes no calculating subtask.

The allotted quantity is determined by the resource allocation policy, and the number of calculating subtasks is determined according to the allotted quantity, so that the divided calculating subtasks are adapted to the computing resources and execute correctly, thereby guaranteeing a correct simulation run of the network model.
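A minimal sketch of the division rule described above, under the stated constraint that a task never splits into more subtasks than the allotted quantity (function and variable names are hypothetical):

```python
def divide_task(task_units, allotted):
    """Split a calculating task of `task_units` work items into at most
    `allotted` calculating subtasks (one subtask per computing unit)."""
    n_sub = min(task_units, allotted)  # subtask count <= allotted quantity
    base, extra = divmod(task_units, n_sub)
    # spread the work as evenly as possible across the subtasks
    return [base + (1 if i < extra else 0) for i in range(n_sub)]

print(divide_task(10, 4))  # [3, 3, 2, 2]: 4 subtasks, sizes sum to 10
```

Note that `divide_task(2, 5)` yields only 2 subtasks, matching the case above where the allotted quantity exceeds the subtask count and some computing units stay idle.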
Optionally, before each calculating task is divided according to the at least one preconfigured resource allocation policy to form at least one calculating subtask, the method further comprises: receiving at least one resource distribution table and parsing it to obtain the combination relationship between the computing resources and storage resources corresponding to each resource distribution table; the combination relationship corresponding to one resource distribution table serves as one resource allocation policy.

The resource distribution table is used to determine how computing resources and storage resources are allocated. It records the combination relationships between computing resources and storage resources, where a combination relationship determines the computing resource and storage resource allocated to each calculating subtask and describes the relationship between a computing unit and a storage unit that are each mapped to the same calculating subtask.

In effect, the resource distribution table configures a computing resource and a storage resource for each calculating subtask, and a combination relationship exists between the computing resource and storage resource allocated to the same calculating subtask.
Illustratively, a resource distribution table is as follows:
…
Correspondingly, a resource allocation policy, in terms of computing resources and storage resources, for the Q calculating subtasks of a calculating task is as follows:
The resource distribution tables can be input by testers to specify the resource allocation policies to be tested. The received resource distribution tables are parsed to obtain the resource allocation policies.

All allocation strategies (i.e., every process by which the calculating subtasks may be assigned) are enumerated exhaustively; suppose K allocation methods are available. A load matrix can thereby be obtained that describes the chip's loading condition at each step of the whole network model.

By receiving and parsing the resource distribution tables, the resource allocation condition of each calculating subtask is determined, realizing flexible resource allocation and enlarging the set of chip loading conditions and the test scope of the chip, thereby improving the accuracy of the chip's performance test.
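A hedged sketch of parsing one received resource distribution table into a resource allocation policy; the row layout and unit names are assumptions, since the patent's example table is elided:

```python
def parse_table(rows):
    """rows: (subtask_id, computing_unit, storage_unit) triples taken from
    one resource distribution table; returns the combination relationship,
    i.e. one resource allocation policy."""
    return {sub: {"compute": cu, "storage": su} for sub, cu, su in rows}

# hypothetical table: two subtasks mapped to paired compute/storage units
policy = parse_table([(0, "PE0", "SRAM0"), (1, "PE1", "SRAM1")])
print(policy[0])  # {'compute': 'PE0', 'storage': 'SRAM0'}
```

Each parsed table yields one such policy; repeating this for every received table produces the K candidate policies mentioned below.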
Optionally, the allotted quantity of computing resources included in each resource distribution table equals the number of processors in the chip's performance parameter set; that is, the allotted quantity of computing resources, i.e., the number of computing units contained in the computing resources, is the number of processors. Making the allotted quantity equal to the number of processors in the chip adapts the allocation of computing resources to the chip's actual processor configuration, improving the reasonableness of resource allocation and thereby the accuracy of the chip's performance test.
Optionally, allocating resources to all calculating subtasks associated with each calculating task according to each resource allocation policy, to obtain the distribution data set of the calculating task under each resource allocation policy, comprises: traversing each resource distribution table, starting from a target resource distribution table, until all resource distribution tables have been traversed; during the traversal of a resource distribution table, traversing each calculating subtask of the calculating task, starting from a target calculating subtask, and, for the currently traversed calculating subtask, selecting a target computing resource from the combination relationships of the resource distribution table, obtaining the corresponding target storage resource, and establishing the correspondence between the current calculating subtask, the target computing resource, and at least one target storage resource, until all calculating subtasks have been traversed, wherein the computing resources corresponding to the respective calculating subtasks of the calculating task differ from one another, as do the corresponding storage resources; generating, according to each calculating subtask's computing resource and storage resource, the distribution data set of the calculating task under the resource distribution table; and, after all resource distribution tables have been traversed, obtaining the distribution data sets of the calculating task under each resource distribution table.
Specifically, the target resource distribution table is one of the resource distribution tables. Illustratively, it is a randomly selected table, or the tables are numbered and selected in numerical order, for example starting with the table numbered 1.

The distribution data describe the resource allocation condition of each calculating subtask of the calculating task under the different resource distribution tables; that is, they determine each calculating subtask's computing resource and storage resource under each table. Traversing the tables one by one obtains the resource allocation condition specified by each resource distribution table.
When the current resource distribution table is traversed, one combination of a computing resource and a storage resource is selected; that is, a target computing resource and its matched storage resource are determined and allocated to one calculating subtask. The target computing resource includes at least one computing unit, and the storage resource includes at least one storage unit. All calculating subtasks of the current calculating task are traversed, and each calculating subtask's correspondence with a computing resource and a storage resource is established, so that every calculating subtask into which the current calculating task is decomposed is allocated a computing resource and a storage resource, reducing omissions.

The above steps are repeated until all resource distribution tables have been traversed. The distribution data of each calculating subtask can thus be obtained, forming the distribution data sets of the calculating task under the different resource distribution tables, which determine each calculating subtask's resource allocation condition under each table.
Illustratively, the distribution data set of the step-th calculating task under one resource distribution table is as follows:
By traversing the resource distribution tables and, while traversing the current resource distribution table, establishing the correspondence of every calculating subtask of the calculating task with the computing resource and storage resource configured by that table, resource allocation to the calculating subtasks based on the current table is realized. Every calculating subtask is guaranteed to be assigned a computing resource and a storage resource, flexible resource allocation is achieved, and the accuracy of resource allocation is improved, thereby improving the accuracy of the chip's performance test.
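The double traversal described above, an outer loop over the K resource distribution tables and an inner loop over the calculating subtasks, can be sketched as follows; the table representation is an assumption:

```python
def allocate(subtasks, tables):
    """For each resource distribution table, traverse every calculating
    subtask and record its computing-resource / storage-resource pair."""
    distribution_sets = []
    for table in tables:              # traverse all K tables
        assignment = {}
        for sub in subtasks:          # traverse all subtasks of the task
            cu, su = table[sub]       # combination relationship in the table
            assignment[sub] = {"compute": cu, "storage": su}
        distribution_sets.append(assignment)
    return distribution_sets

tables = [
    {0: ("PE0", "SRAM0"), 1: ("PE1", "SRAM1")},  # policy k = 0
    {0: ("PE1", "SRAM1"), 1: ("PE0", "SRAM0")},  # policy k = 1
]
sets = allocate([0, 1], tables)
print(sets[1][0]["compute"])  # PE1
```

Each element of the result is one distribution data set of the calculating task under one table, matching the per-table correspondence-building described in the text.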
S240: allocate resources to all calculating subtasks associated with each calculating task according to each resource allocation policy, obtaining the distribution data set of the calculating task under each resource allocation policy; the resources include computing resources and storage resources.
S250: count the distribution data sets of each calculating task under each resource allocation policy, forming the load matrix of the network model.
Following the previous example, the distribution data sets of the step-th calculating task under the K resource distribution tables are counted, and the load matrix is as follows:
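Since the example matrix itself is elided here, the following purely illustrative construction shows only the shape: S rows of calculating tasks by K columns of resource allocation policies, where entry [step][k] holds the distribution data set of the step-th task under the k-th policy:

```python
S, K = 3, 2  # hypothetical: 3 calculating tasks (steps), 2 policies

def dist(step, k):
    # stand-in for the distribution data set of task `step` under policy `k`
    return {"step": step, "policy": k}

load_matrix = [[dist(step, k) for k in range(K)] for step in range(S)]
print(len(load_matrix), len(load_matrix[0]))  # 3 2
```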
S260: according to the performance parameter set of the chip to be assessed and the load matrix, calculate the running time under each resource allocation policy of each calculating subtask decomposed from the network model, and determine the performance matrix for evaluating the performance of the chip running the network model.
Illustratively, the performance parameter set is denoted PD. Based on the following formula, the running time of the distribution data set of the step-th calculating task under the k-th resource distribution table is calculated:
In turn, the performance matrix is calculated based on the following formula, wherein the elements of a given row share the same step value; that is, a row represents the running durations, under the K different resource allocation policies, of the step-th (step = 0, 1, ..., S-1) calculating task decomposed from the calculation process of the whole network model. The elements of a given column represent the running times of the successive calculating tasks of the entire neural network computation under the k-th resource allocation policy; because dependencies exist between the calculating tasks, summing a column yields the overall running time of the network model under the k-th allocation policy.
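With made-up numbers, the row/column reading of the performance matrix and the column-sum rule just described can be checked as follows:

```python
# T[step][k]: running time of the step-th calculating task under policy k
T = [
    [4.0, 3.5],  # task 0 under policies k = 0 and k = 1
    [2.0, 2.5],  # task 1
    [1.0, 1.5],  # task 2
]
# tasks run in dependency order, so a column sum is the model's total time
totals = [sum(row[k] for row in T) for k in range(len(T[0]))]
best = min(range(len(totals)), key=totals.__getitem__)
print(totals, "best policy:", best)  # [7.0, 7.5] best policy: 0
```

Comparing the column sums in this way is what allows the policies to be ranked when evaluating the chip.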
Optionally, calculating, according to the performance parameter set of the chip to be assessed and the load matrix, the running time under each resource allocation policy of each calculating subtask decomposed from the network model, and determining the performance matrix, comprises: calculating, according to the performance parameter set of the chip to be assessed, the input data transfer time, the input data processing time, and the result data transfer time of each calculating subtask in the load matrix; taking the sum of the input data transfer time, the input data processing time, and the result data transfer time as the calculating subtask's running time under the resource allocation policy; and forming the performance matrix of the network model according to the running time of each calculating subtask under each resource allocation policy.
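The three-term running-time rule just stated reduces to a simple sum; the numeric values below are illustrative only:

```python
def subtask_runtime(input_transfer, processing, result_transfer):
    """Running time of one calculating subtask under a policy: input data
    transfer time + input processing time + result data transfer time,
    each derived from the chip's performance parameter set."""
    return input_transfer + processing + result_transfer

print(subtask_runtime(1.0, 2.0, 3.0))  # 6.0
```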
In the embodiment of the present invention, the network structure of the network model is parsed to determine the layers contained in its hierarchical structure, and the data processing operations associated with each layer are treated as one calculating task, forming at least two calculating tasks. The calculation process of the network model is thus decomposed into calculating tasks automatically, without relying on specific hardware devices to support simulation of the network model, which reduces the cost of simulating the network model; meanwhile, deriving the calculating tasks by delimiting the network structure can improve the accuracy of the decomposition of the network model's calculation process.
Embodiment three
Fig. 3 is a schematic diagram of a deep learning neural network model load computing device according to Embodiment 3 of the present invention. This embodiment provides a device related to the deep learning neural network model load calculating method provided by the above embodiments of the present invention. The device may be implemented in software and/or hardware and may generally be integrated into a computer device or the like.
Correspondingly, the device of the present embodiment may include:
a calculating task parsing module 310, configured to parse the pre-constructed network model and decompose the calculation process of the network model into at least two calculating tasks, wherein dependencies exist between the at least two calculating tasks;
a calculating task division module 320, configured to divide each calculating task according to at least one preconfigured resource allocation policy, forming at least one calculating subtask;
a resource distribution module 330, configured to allocate resources to all calculating subtasks associated with each calculating task according to each resource allocation policy, obtaining the distribution data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources;
a load matrix generation module 340, configured to count the distribution data sets of each calculating task under each resource allocation policy, forming the load matrix of the network model;
a performance matrix computing module 350, configured to calculate, according to the performance parameter set of the chip to be assessed and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and to determine the performance matrix for evaluating the performance of the chip running the network model.
In the embodiment of the present invention, the deep learning neural network model is parsed automatically to form at least two calculating tasks, each calculating task is further divided according to the preconfigured resource allocation policies to form calculating subtasks, and resources are allocated to the calculating subtasks according to the different resource allocation policies to obtain the load matrix under the different policies. The running time of each calculating subtask under each resource allocation policy is then calculated based on the performance parameter set of the chip to be assessed, thereby determining the performance matrix of the chip running the network model, which is used to evaluate that performance. This solves the prior-art problems of high economic cost and low efficiency when simulating a chip running a network model, and can improve the speed of performance simulation for chips that run deep learning network models.
Further, the calculating task parsing module 310 comprises a network model hierarchical structure resolution unit, configured to parse the network model, determine the hierarchical structure of the network model (which includes at least two layers), and treat the data processing operations associated with each layer as one calculating task, forming at least two calculating tasks.
Further, the calculating task division module 320 comprises a resource allocation policy division unit, configured to determine the allotted quantity of computing resources according to the at least one preconfigured resource allocation policy, and to divide each calculating task according to the allotted quantity, forming at least one calculating subtask, wherein the number of calculating subtasks into which each calculating task is divided is less than or equal to the allotted quantity.
Further, the deep learning neural network model load computing device further includes a resource distribution table receiving module, configured to, before each calculating task is divided according to the at least one preconfigured resource allocation policy to form at least one calculating subtask, receive at least one resource distribution table and parse it to obtain the combination relationship between the computing resources and storage resources corresponding to each resource distribution table, the combination relationship corresponding to one resource distribution table serving as one resource allocation policy.
Further, the resource distribution module 330 comprises a resource distribution table traversal resolution unit, configured to traverse each resource distribution table, starting from a target resource distribution table, until all resource distribution tables have been traversed; during the traversal of a resource distribution table, to traverse each calculating subtask of the calculating task, starting from a target calculating subtask, and, for the currently traversed calculating subtask, to select a target computing resource from the combination relationships of the resource distribution table, obtain the corresponding target storage resource, and establish the correspondence between the current calculating subtask, the target computing resource, and at least one target storage resource, until all calculating subtasks have been traversed, wherein the computing resources corresponding to the respective calculating subtasks of the calculating task differ from one another, as do the corresponding storage resources; to generate, according to each calculating subtask's computing resource and storage resource, the distribution data set of the calculating task under the resource distribution table; and, after all resource distribution tables have been traversed, to obtain the distribution data sets of the calculating task under each resource distribution table.
Further, the performance matrix computing module 350 comprises a running time computing unit, configured to calculate, according to the performance parameter set of the chip to be assessed, the input data transfer time, the input data processing time, and the result data transfer time of each calculating subtask in the load matrix under each resource allocation policy; to take the sum of these three times as the calculating subtask's running time under the resource allocation policy; and to form the performance matrix of the network model according to the running time of each calculating subtask under each resource allocation policy.
Further, the allotted quantity of computing resources included in each resource distribution table equals the number of processors in the chip's performance parameter set.
The deep learning neural network model load computing device described above can execute the deep learning neural network model load calculating method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to executing that method.
Embodiment four
Fig. 4 is a structural schematic diagram of a computer device according to Embodiment 4 of the present invention, showing a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in Fig. 4 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16). The computer device 12 may be a device attached to the bus.
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically comprises a variety of computer-system-readable media. These media may be any usable media accessible by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read and write a non-removable, non-volatile magnetic medium (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as may an optical disk drive for reading and writing a removable non-volatile optical disk (such as a compact disc read-only memory (CD-ROM), a digital video disc (DVD-ROM), or another optical medium). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set (for example, at least one) of program modules configured to perform the functions of the various embodiments of the present invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card or a modem) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may be carried out through an input/output (I/O) interface 22. Moreover, the computer device 12 may communicate with one or more networks (such as a local area network (LAN) or a wide area network (WAN)) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in Fig. 4, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, redundant arrays of inexpensive disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing a deep learning neural network model load calculating method provided by any embodiment of the present invention.
Embodiment five
Embodiment 5 of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the deep learning neural network model load calculating method provided by any embodiment of the present application, that is: parsing a pre-constructed network model and decomposing the calculation process of the network model into at least two calculating tasks, wherein dependencies exist between the at least two calculating tasks; dividing each calculating task according to at least one preconfigured resource allocation policy, forming at least one calculating subtask; allocating resources to all calculating subtasks associated with each calculating task according to each resource allocation policy, obtaining the distribution data set of the calculating task under each resource allocation policy, the resources including computing resources and storage resources; counting the distribution data sets of each calculating task under each resource allocation policy, forming the load matrix of the network model; and calculating, according to the performance parameter set of the chip to be assessed and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determining the performance matrix for evaluating the performance of the chip running the network model.
The computer storage medium of the embodiments of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the above.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments; it may also include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A deep learning neural network model load calculation method, comprising:
parsing a pre-built network model, and decomposing a calculation flow of the network model into at least two calculating tasks, wherein a dependence relation exists between the at least two calculating tasks;
dividing each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;
allocating resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain an allocation data set of the calculating task under each resource allocation policy, wherein the resources comprise computing resources and storage resources;
collecting the allocation data sets of each calculating task under each resource allocation policy to form a load matrix of the network model; and
calculating, according to a performance parameter set of a chip to be evaluated and the load matrix, a running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determining a performance matrix, so as to evaluate the performance of the chip running the network model.
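For illustration only (this is not part of the claims), the flow of claim 1 might be sketched as below. All names, policy tuples, and operation counts are hypothetical placeholders, and the runtime model (slowest parallel subtask per cell) is an assumed simplification, not the patent's implementation:

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    task: str           # parent calculating task (e.g. one network layer)
    memory_bytes: int   # storage resource assigned under the policy
    ops: int            # operation count of this slice of the task

def build_load_matrix(tasks, policies):
    """Rows: calculating tasks; columns: resource allocation policies.
    Each cell is the allocation data set (the task's subtasks)."""
    matrix = {}
    for name, total_ops in tasks.items():
        for policy, (units, mem) in policies.items():
            per = -(-total_ops // units)  # ceil: ops per subtask
            matrix[(name, policy)] = [
                Subtask(name, mem // units, min(per, total_ops - i * per))
                for i in range(units) if i * per < total_ops
            ]
    return matrix

def performance_matrix(load_matrix, ops_per_second):
    """Runtime of each cell = slowest of its parallel subtasks."""
    return {cell: max(s.ops / ops_per_second for s in subs)
            for cell, subs in load_matrix.items()}

tasks = {"conv1": 1_000_000, "fc1": 200_000}          # task -> op count
policies = {"P1": (4, 1 << 20), "P2": (8, 1 << 20)}   # policy -> (units, bytes)
load = build_load_matrix(tasks, policies)
perf = performance_matrix(load, ops_per_second=1e9)
```

Comparing the `perf` entries across policies is one way such a performance matrix could rank allocation policies for a given chip.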
2. The method according to claim 1, wherein parsing the pre-built network model and decomposing the calculation flow of the network model into the at least two calculating tasks comprises:
parsing the network model and determining a hierarchical structure of the network model, the hierarchical structure of the network model comprising at least two layers; and
taking the data processing operation associated with each layer as one calculating task, to form the at least two calculating tasks.
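As a rough, non-claimed illustration of claim 2 (all names are hypothetical), each layer of the parsed hierarchy becomes one calculating task, with the dependence relation following layer order:

```python
def layers_to_tasks(model):
    """model: ordered (layer_name, operation) pairs; each layer's task
    depends on the previous layer's output (the dependence relation)."""
    return [{"task": name, "op": op,
             "depends_on": model[i - 1][0] if i > 0 else None}
            for i, (name, op) in enumerate(model)]

net = [("conv1", "conv2d"), ("relu1", "relu"), ("fc1", "matmul")]
tasks = layers_to_tasks(net)
```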
3. The method according to claim 2, wherein dividing each calculating task according to the at least one preconfigured resource allocation policy to form the at least one calculating subtask comprises:
determining an allotted quantity of computing resources according to the at least one preconfigured resource allocation policy; and
dividing each calculating task according to the allotted quantity to form the at least one calculating subtask, wherein the number of calculating subtasks into which each calculating task is divided is less than or equal to the allotted quantity.
4. The method according to claim 3, further comprising, before dividing each calculating task according to the at least one preconfigured resource allocation policy to form the at least one calculating subtask:
receiving and parsing at least one resource configuration table, and obtaining a combination relation between computing resources and storage resources corresponding to each resource configuration table; and
taking the combination relation between the computing resources and the storage resources corresponding to one resource configuration table as one resource allocation policy.
5. The method according to claim 4, wherein allocating resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain the allocation data set of the calculating task under each resource allocation policy comprises:
traversing each resource configuration table, starting from a target resource configuration table among the resource configuration tables, until the traversal of all resource configuration tables is completed;
during the traversal of a resource configuration table, traversing each calculating subtask, starting from a target calculating subtask among the calculating subtasks of the calculating task; for the currently traversed calculating subtask, selecting a target computing resource from the combination relation corresponding to the resource configuration table and obtaining the corresponding target storage resource; and establishing a correspondence between the current calculating subtask, the target computing resource, and at least one target storage resource, until the traversal of all calculating subtasks is completed, wherein the computing resources corresponding to the calculating subtasks of the calculating task differ from one another, and the storage resources corresponding to the calculating subtasks of the calculating task differ from one another;
generating, according to the computing resource and the storage resource corresponding to each calculating subtask, the allocation data set of the calculating task under the resource configuration table; and
obtaining, after the traversal of all resource configuration tables is completed, the allocation data sets of the calculating task under each resource configuration table.
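A minimal, non-claimed sketch of claim 5's double traversal follows; the table contents, subtask names, and the round-robin pairing are all invented for the example. Each resource configuration table is walked in turn, and within it each subtask of each task is paired with a distinct (compute, storage) combination:

```python
def allocate(config_tables, subtasks_by_task):
    """config_tables: {table: [(compute_id, storage_id), ...]} -- the
    combination relations of claim 4.  Returns, per table and per task,
    a mapping subtask -> (target compute resource, target storage
    resource) in which no two subtasks of the same task share a
    compute or storage resource."""
    allocations = {}
    for table, combos in config_tables.items():
        allocations[table] = {
            # zip pairs each subtask with the next unused combination,
            # so resources are distinct within one task
            task: dict(zip(subs, combos))
            for task, subs in subtasks_by_task.items()
        }
    return allocations

tables = {"T1": [("pe0", "m0"), ("pe1", "m1"), ("pe2", "m2")]}
subs = {"conv1": ["conv1_0", "conv1_1"], "fc1": ["fc1_0"]}
result = allocate(tables, subs)
```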
6. The method according to claim 1, wherein calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determining the performance matrix comprises:
calculating, according to the performance parameter set of the chip to be evaluated, an input data handling time, an input data processing elapsed time, and a result data handling time of each calculating subtask in the load matrix under each resource allocation policy;
taking the sum of the input data handling time, the input data processing elapsed time, and the result data handling time of a calculating subtask under a resource allocation policy as the running time of the calculating subtask under the resource allocation policy; and
forming the performance matrix of the network model according to the running time of each calculating subtask under each resource allocation policy.
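The three-term running-time model of claim 6 can be illustrated with a small non-claimed sketch; the bandwidth and throughput figures below are invented for the example and are not chip parameters from the patent:

```python
def subtask_runtime(input_bytes, ops, output_bytes,
                    read_bw, ops_per_s, write_bw):
    """Running time = input data handling time + input data processing
    elapsed time + result data handling time (the three terms of
    claim 6), each derived from chip performance parameters."""
    t_input = input_bytes / read_bw      # input data handling time
    t_compute = ops / ops_per_s          # processing elapsed time
    t_output = output_bytes / write_bw   # result data handling time
    return t_input + t_compute + t_output

# 1 MB in, 2 M ops, 0.5 MB out on a chip with 1 GB/s I/O and 1 Gop/s
t = subtask_runtime(1e6, 2e6, 5e5, read_bw=1e9, ops_per_s=1e9, write_bw=1e9)
```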
7. The method according to claim 4, wherein the allotted quantity of the computing resources included in each resource configuration table is equal to the number of processors in the chip performance parameter set.
8. A deep learning neural network model load calculation apparatus, comprising:
a calculating task parsing module, configured to parse a pre-built network model and decompose a calculation flow of the network model into at least two calculating tasks, wherein a dependence relation exists between the at least two calculating tasks;
a calculating task division module, configured to divide each calculating task according to at least one preconfigured resource allocation policy to form at least one calculating subtask;
a resource allocation module, configured to allocate resources, according to each resource allocation policy, to all calculating subtasks associated with the calculating task, to obtain an allocation data set of the calculating task under each resource allocation policy, wherein the resources comprise computing resources and storage resources;
a load matrix generation module, configured to collect the allocation data sets of each calculating task under each resource allocation policy to form a load matrix of the network model; and
a performance matrix calculation module, configured to calculate, according to a performance parameter set of a chip to be evaluated and the load matrix, a running time of each calculating subtask decomposed from the network model under each resource allocation policy, and determine a performance matrix, so as to evaluate the performance of the chip running the network model.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when executing the program, the processor implements the deep learning neural network model load calculation method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the deep learning neural network model load calculation method according to any one of claims 1-7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911008660.3A CN110515739B (en) | 2019-10-23 | 2019-10-23 | Deep learning neural network model load calculation method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110515739A true CN110515739A (en) | 2019-11-29 |
CN110515739B CN110515739B (en) | 2020-01-31 |
Family
ID=68633608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911008660.3A Active CN110515739B (en) | 2019-10-23 | 2019-10-23 | Deep learning neural network model load calculation method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110515739B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120109864A1 (en) * | 2010-10-29 | 2012-05-03 | International Business Machines Corporation | Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation |
CN106649060A (en) * | 2015-11-02 | 2017-05-10 | 中国移动通信集团公司 | Equipment performance testing method and device |
CN108197083A (en) * | 2018-01-31 | 2018-06-22 | 湖南农业大学 | Short-term job load prediction method for data centers fusing linear regression with a wavelet neural network |
US10019668B1 (en) * | 2017-05-19 | 2018-07-10 | Google Llc | Scheduling neural network processing |
CN109901878A (en) * | 2019-02-25 | 2019-06-18 | 北京灵汐科技有限公司 | Brain-inspired computing chip and computing device |
CN110333945A (en) * | 2019-05-09 | 2019-10-15 | 成都信息工程大学 | Dynamic load balancing method, system and terminal |
Non-Patent Citations (1)
Title |
---|
M. SCHWARZ et al.: "A parallel neural network emulator based on application-specific VLSI communication chips", Proceedings of the Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems *
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111158901B (en) * | 2019-12-09 | 2023-09-08 | 爱芯元智半导体(宁波)有限公司 | Optimization method, optimization device, computer equipment and storage medium for calculation graph |
CN111158901A (en) * | 2019-12-09 | 2020-05-15 | 北京迈格威科技有限公司 | Optimization method and device of computation graph, computer equipment and storage medium |
CN111047017B (en) * | 2019-12-18 | 2023-06-23 | 北京安兔兔科技有限公司 | Neural network algorithm evaluation method and device and electronic equipment |
CN111047017A (en) * | 2019-12-18 | 2020-04-21 | 北京安兔兔科技有限公司 | Neural network algorithm evaluation method and device and electronic equipment |
CN111162946A (en) * | 2019-12-30 | 2020-05-15 | 北京奇艺世纪科技有限公司 | Method for constructing model inference network, data processing method, data processing device and storage medium |
CN111162946B (en) * | 2019-12-30 | 2022-07-12 | 北京奇艺世纪科技有限公司 | Method for constructing model inference network, data processing method, data processing device and storage medium |
CN111340237A (en) * | 2020-03-05 | 2020-06-26 | 腾讯科技(深圳)有限公司 | Data processing and model operation method, device and computer equipment |
CN111340237B (en) * | 2020-03-05 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Data processing and model running method, device and computer equipment |
CN111860758A (en) * | 2020-04-07 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Operation method and device of deep learning model, electronic equipment and medium |
CN111860758B (en) * | 2020-04-07 | 2024-05-03 | 北京嘀嘀无限科技发展有限公司 | Deep learning model operation method and device, electronic equipment and medium |
CN111738434A (en) * | 2020-06-03 | 2020-10-02 | 中国科学院计算技术研究所 | Method for executing deep neural network on heterogeneous processing unit |
CN111738434B (en) * | 2020-06-03 | 2023-04-07 | 中国科学院计算技术研究所 | Method for executing deep neural network on heterogeneous processing unit |
WO2021259106A1 (en) * | 2020-06-22 | 2021-12-30 | 深圳鲲云信息科技有限公司 | Method, system, and device for optimizing neural network chip, and storage medium |
CN111858070B (en) * | 2020-08-05 | 2023-12-01 | 中国工商银行股份有限公司 | Computing resource allocation method, device, equipment and storage medium |
CN111858070A (en) * | 2020-08-05 | 2020-10-30 | 中国工商银行股份有限公司 | Computing resource allocation method, device, equipment and storage medium |
WO2022042519A1 (en) * | 2020-08-27 | 2022-03-03 | 北京灵汐科技有限公司 | Resource allocation method and apparatus, and computer device and computer-readable storage medium |
CN112598112A (en) * | 2020-12-04 | 2021-04-02 | 深圳大学 | Resource scheduling method based on graph neural network |
CN112598112B (en) * | 2020-12-04 | 2021-09-10 | 深圳大学 | Resource scheduling method based on graph neural network |
WO2022116142A1 (en) * | 2020-12-04 | 2022-06-09 | 深圳大学 | Resource scheduling method based on graph neural network |
CN113268404A (en) * | 2021-05-28 | 2021-08-17 | 曙光信息产业(北京)有限公司 | Performance analysis and optimization method and device, computer equipment and storage medium |
CN113884857A (en) * | 2021-09-29 | 2022-01-04 | 上海阵量智能科技有限公司 | Chip, chip pressure testing method and device, electronic equipment and storage medium |
CN113884857B (en) * | 2021-09-29 | 2024-03-08 | 上海阵量智能科技有限公司 | Chip, chip pressure testing method and device, electronic equipment and storage medium |
US11907098B2 (en) * | 2022-04-01 | 2024-02-20 | Rebellions Inc. | Method for measuring performance of neural processing device and device for measuring performance |
WO2024022046A1 (en) * | 2022-07-28 | 2024-02-01 | 华为技术有限公司 | Deep learning system and method |
CN116501594A (en) * | 2023-06-27 | 2023-07-28 | 上海燧原科技有限公司 | System modeling evaluation method and device, electronic equipment and storage medium |
CN116501505B (en) * | 2023-06-27 | 2023-09-12 | 上海燧原科技有限公司 | Method, device, equipment and medium for generating data stream of load task |
CN116501594B (en) * | 2023-06-27 | 2023-09-08 | 上海燧原科技有限公司 | System modeling evaluation method and device, electronic equipment and storage medium |
CN116501505A (en) * | 2023-06-27 | 2023-07-28 | 上海燧原科技有限公司 | Method, device, equipment and medium for generating data stream of load task |
CN116737605B (en) * | 2023-08-11 | 2023-11-14 | 上海燧原科技有限公司 | Data prefetching method, device, equipment and medium based on chip multilevel storage |
CN116737605A (en) * | 2023-08-11 | 2023-09-12 | 上海燧原科技有限公司 | Data prefetching method, device, equipment and medium based on chip multilevel storage |
Also Published As
Publication number | Publication date |
---|---|
CN110515739B (en) | 2020-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110515739A (en) | Deep learning neural network model load calculating method, device, equipment and medium | |
CN110705705B (en) | Convolutional neural network model synchronous training method, cluster and readable storage medium | |
CN104243617B (en) | Task scheduling method and system for mixed loads in a heterogeneous cluster | |
CN106951926A (en) | Deep learning system method and device with a hybrid architecture | |
CN107766148A (en) | Heterogeneous cluster and task processing method and device | |
Castiglione et al. | Modeling performances of concurrent big data applications | |
US8271252B2 (en) | Automatic verification of device models | |
CN110502323B (en) | Real-time scheduling method for cloud computing tasks | |
US20230099117A1 (en) | Spiking neural network-based data processing method, computing core circuit, and chip | |
CN110825522A (en) | Spark parameter self-adaptive optimization method and system | |
CN106250933A (en) | FPGA-based data clustering method, system and FPGA processor | |
CN109710406A (en) | Data distribution and model training method and device therefor, and computing cluster | |
CN107360026A (en) | Distributed message middleware performance prediction and modeling method | |
CN106406820B (en) | Multi-issue parallel instruction processing method and device for a network processor micro-engine | |
CN115640851A (en) | Efficient neural network inference method suitable for test instruments | |
CN114925854A (en) | Federated learning node selection method and system based on gradient similarity measurement | |
CN106844024B (en) | GPU/CPU scheduling method and system based on a self-learning runtime prediction model | |
JPWO2007043144A1 (en) | Load test apparatus and method | |
CN111367632B (en) | Container cloud scheduling method based on periodic characteristics | |
CN110222410A (en) | Electromagnetic environment simulation method based on Hadoop MapReduce | |
CN106294146B (en) | Parameter replacement test method and device | |
Wang et al. | A deep reinforcement learning method for solving task mapping problems with dynamic traffic on parallel systems | |
CN104657534A (en) | Methods and systems for reporting realistic kinetic energy of multi-part finite element analysis model | |
CN109684067B (en) | Automatic generation and operation system and method for task scheduling plan | |
CN109788061A (en) | Computing task deployment method and device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||