CN115016938A - Calculation graph automatic partitioning method based on reinforcement learning - Google Patents
- Publication number
- CN115016938A (application number CN202210650630.8A)
- Authority
- CN
- China
- Prior art keywords
- graph
- core
- reinforcement learning
- action
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a reinforcement-learning-based automatic computation graph partitioning method, which comprises the following steps: step 1, performing topological sorting on the computation graph to convert it into a linear table; step 2, modeling the computation graph to be partitioned and the many-core processor as a Markov decision process in reinforcement learning, extracting the subgraph partitioning and core-resource allocation of the current computation graph as the state, treating the adjustment of layer allocation between two adjacent cores as the action, and using the running time and storage of the computation graph on the many-core processor as the reward; and step 3, solving the Markov decision process with the REINFORCE algorithm and training the graph-partitioning algorithm model.
Description
Technical Field
The invention belongs to the field of resource allocation and reinforcement learning, and relates to a method for solving the problem of deep learning calculation graph partitioning by using a reinforcement learning algorithm.
Background
In recent years, deep learning has achieved dramatic success in fields such as image analysis, natural language processing, speech recognition, and video classification. However, deep learning relies on powerful computing capacity, and optimizing the deep learning system framework to reduce computing requirements plays an important role in deep learning applications. To meet the explosive computing demands of deep learning models, AI chips designed specifically for the AI domain have been widely adopted.
AI chips generally adopt a many-core architecture. An AI chip is dedicated to handling the large number of computational tasks in AI applications, while other non-computational tasks are still handled by the CPU. An AI chip integrates multiple cores and can purposefully accelerate certain algorithms or tasks. In the market, AI chips are defined more by their function than by their architecture. In recent years, AI chip products related to deep learning have appeared in succession, with solutions introduced by technology giants such as Google, Intel, and NVIDIA as well as startups such as Cambricon and Horizon Robotics. With the development and maturity of chip design technology, AI chip architectures have iterated rapidly alongside AI technology.
A deep learning compiler often splits a model implemented in different frameworks into multiple subtasks to be deployed on a many-core chip, and a pipeline structure is used to process the computing task so as to achieve optimal performance. In a pipeline structure, a large computing task can be divided into several parallel subtask sets, where each subtask set is processed in parallel by multiple cores of the many-core chip, and the mapping from subtask sets to core-resource sets is completed by the run-time system. When a deep learning model is run as a computing task, the task is divided into multiple subgraphs, i.e., the computation graph of the deep learning model is partitioned. In a computation graph, nodes represent the computation process; the advantage is that operations can be partitioned and even executed across multiple GPUs.
In a pipeline architecture, it is important to distribute computing tasks to the individual processors in an optimized manner; this problem is commonly referred to as load balancing. Load balancing is a fundamental problem in parallel computing, which maximizes parallel application performance by minimizing processor idle time and inter-processor communication time.
The scheduling of the pipeline structure can reduce the idle time of the processor, improve the program execution performance and increase the utilization rate of hardware resources. However, limitations of system resources such as processor core performance, on-chip storage, communication bandwidth, etc., can impact software pipelining performance.
Reinforcement learning has achieved notable results in resource scheduling. In 2017, Mirhoseini et al. proposed using reinforcement learning to optimize the placement of computation-graph nodes for TensorFlow models in a distributed system. That work uses a sequence-to-sequence (Seq2Seq) model consisting of an encoder and a decoder: the nodes of the computation graph are fed into the model in topological order, and the device assigned to each node is produced as output. Since then, many resource scheduling schemes based on reinforcement learning have been proposed. Addanki et al. use a reinforcement learning algorithm to schedule neural networks over distributed resources; their approach iteratively refines the resource allocation scheme rather than producing the node allocation of the computation graph all at once. Subsequent research likewise applies reinforcement learning to task scheduling in distributed systems, with the differences mostly lying in the specific deep learning model used. However, the above reinforcement-learning methods allocate resources according to the resource layout and are not suitable when the core-resource layout of the processor changes. For the resource scheduling problem of distributed stream processing systems, Luo proposed an algorithm based on deep reinforcement learning and multi-level graph partitioning that allocates resources according to the number of resources: graph coarsening reduces the complexity of the graph, the large-scale dataflow graph is reduced to a small-scale one, reinforcement learning is used for training, and the result is mapped back to the large-scale dataflow graph. However, after the dataflow graph is coarsened, the space in which reinforcement learning can search for the optimal solution is restricted, which limits the effectiveness of reinforcement learning to a certain extent.
The present invention mainly studies how to effectively partition the computation graph corresponding to a model and achieve load balance when training a deep learning model on a many-core chip architecture. To this end, an algorithm is designed that automatically partitions the deep learning computation graph and allocates a number of core resources to each subgraph, so that the running time of the deep learning model on the many-core chip is minimized; it is a method that allocates resources according to the number of resources.
Disclosure of Invention
The invention aims to provide a computation graph automatic partitioning method based on reinforcement learning, which can automatically partition a deep learning computation graph corresponding to a deep learning model into sub-graphs and allocate core resources to each sub-graph according to the number of resources so as to achieve the aim of shortening the running time of the deep learning model.
In order to achieve the above object, the method for automatically partitioning a computational graph based on reinforcement learning according to the present invention comprises the following steps:
step 1, performing topological sorting on a calculation graph to convert the calculation graph into a linear table;
step 2, modeling the computation graph to be partitioned and the many-core processor as a Markov decision process in reinforcement learning, extracting the subgraph partitioning and core-resource allocation of the current computation graph as the state, treating the adjustment of layer allocation between two adjacent cores as the action, and using the running time and storage of the computation graph on the many-core processor as the reward;
and step 3, solving the Markov decision process with the REINFORCE algorithm and training the graph-partitioning algorithm model.
The specific process of the step 1 is as follows:
step 11, performing topological sorting on the deep learning computation graph and converting it into a linear table structure, where the order of the elements in the linear table is consistent with the execution order of the nodes and the data elements in the linear table correspond to the layers of the deep learning model;
and step 12, recording for each layer the operation type and hyper-parameters as well as the data volume on each edge of the graph, thereby obtaining the total number of nodes in the computation graph and the operation count, required storage, and routing volume of each operation (an illustrative sketch of these two steps is given below).
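The following is a minimal sketch of steps 11 and 12 in Python; the patent does not publish code, so the Layer fields and function names here are assumptions made purely for illustration, though the recorded quantities (operation count, storage, routing volume) follow step 12.

```python
# Illustrative sketch only; names and fields are assumptions, not the patent's code.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    op_type: str          # operation type of the layer, e.g. "conv" or "fc"
    macs: int             # operation count of the layer
    storage: int          # storage required by the layer
    routing: int          # data volume sent along the layer's outgoing edges
    succs: list = field(default_factory=list)  # names of successor layers

def topological_sort(graph: dict[str, Layer]) -> list[Layer]:
    """Convert the computation graph into a linear table (Kahn's algorithm)."""
    indeg = {name: 0 for name in graph}
    for layer in graph.values():
        for s in layer.succs:
            indeg[s] += 1
    ready = deque(n for n, d in indeg.items() if d == 0)
    linear_table = []
    while ready:
        n = ready.popleft()
        linear_table.append(graph[n])
        for s in graph[n].succs:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return linear_table
```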
The step 2 specifically comprises the following steps:
step 21, extracting the subgraph division condition and the core resource allocation condition of the current computation graph as the states in reinforcement learning: the state is composed of two parts, the first part is a node division and resource allocation state of the computational graph, and the second part is an operand state of each subgraph;
step 22, treating the adjustment of layer allocation between two adjacent cores as the action in reinforcement learning, of which there are four types: merging all layers of two adjacent cores onto the next core for processing, merging all layers of two adjacent cores onto the previous core for processing, handing the last layer processed by the previous core (i.e., the latest of its layers in linear-table order) to the next core for processing, and handing the first layer processed by the next core (i.e., the earliest of its layers in linear-table order) to the previous core for processing;
and step 23, using the running time and storage of the computation graph on the many-core chip as the reward in reinforcement learning: the reward value is set to reward = a × max(T) + b, where T = {t_1, t_2, ..., t_k} is the running time of the computation graph G on the many-core processor M and S = {s_1, s_2, ..., s_k} denotes the data storage on the many-core processor M; if max(S) exceeds the limit, b is assigned a penalty value. The goal of training is to make the reward value as large as possible (an illustrative sketch of this reward is given below).
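As one hedged illustration of the reward in step 23, the sketch below assumes a negative coefficient a, so that a larger reward corresponds to a shorter bottleneck running time, and assumes b acts as a fixed penalty; the function name and default constants are not taken from the patent.

```python
# Minimal sketch of the reward, under stated assumptions about a and b.
def reward(run_times, storages, storage_limit, a=-1.0, penalty=-100.0):
    """reward = a * max(T) + b, with b set to a penalty if max(S) exceeds the limit.

    run_times : per-subgraph running times t_1..t_k on the many-core processor
    storages  : per-core-group storage amounts s_1..s_k
    """
    b = penalty if max(storages) > storage_limit else 0.0
    return a * max(run_times) + b
```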
The step 3 specifically includes the following steps:
step 31, initializing the whole graph-partitioning environment: importing the deep learning computation graph converted into the linear table structure, counting the total number of nodes, initializing the core-resource allocation, initializing the selectable actions according to the total number of core resources, resetting the variables that record the number of cores in each subgraph, the state of each core resource, the number of partitioned nodes, and the reward value to their initial states, and initializing the probabilities of all actions;
step 32, selecting an action a according to the action probability, and changing the current environment state s after executing the action a;
step 33, calculating the calculation amount, the storage amount and the routing amount of each sub-graph according to the current state, respectively comparing the three values in each sub-graph to obtain three values with the worst condition, and performing weighting operation to calculate the reward value r;
step 34, judging whether the reward value meets the preset requirement: if it does, updating the action probabilities and ending the current episode; if it does not, continuing to select actions and repeating steps 32-34;
step 35, inputting all s, a, and r of the episode into a neural network for training, and updating the probabilities of selecting actions;
and step 36, ending the training process after the set number of episodes is reached, and saving the neural network model (a minimal REINFORCE training-loop sketch is given below).
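To make the flow of steps 31 through 36 concrete, the following is a minimal REINFORCE sketch under stated assumptions: the environment object `env` and its reset/step interface are hypothetical, and a simple state-independent softmax policy stands in for the neural network of FIG. 3.

```python
# Minimal REINFORCE sketch of steps 31-36; env and the softmax policy are assumptions.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def train(env, n_actions, episodes=1000, lr=0.01, gamma=1.0):
    theta = np.zeros(n_actions)                    # action-preference parameters
    for _ in range(episodes):                      # step 36: run a set number of episodes
        env.reset()                                # step 31: initialize the environment
        actions, rewards, done = [], [], False
        while not done:
            a = np.random.choice(n_actions, p=softmax(theta))  # step 32: sample an action
            _state, r, done = env.step(a)          # step 33: execute it, observe reward r
            actions.append(a)
            rewards.append(r)
        # steps 34-35: update action probabilities with the REINFORCE rule,
        # using grad log pi(a) = onehot(a) - softmax(theta) and the returns G_t
        G, grad = 0.0, np.zeros_like(theta)
        for t in reversed(range(len(actions))):
            G = rewards[t] + gamma * G
            step_grad = -softmax(theta)
            step_grad[actions[t]] += 1.0
            grad += G * step_grad
        theta += lr * grad
    return theta
```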
Drawings
FIG. 1 is a flow chart of a computational graph automatic partitioning method based on reinforcement learning;
FIG. 2 is a schematic diagram of a many-core chip oriented computation graph automatic partitioning method structure based on deep reinforcement learning;
FIG. 3 is a diagram of the neural network architecture.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
The invention provides a reinforcement-learning-based automatic computation graph partitioning method comprising three parts: performing topological sorting on the computation graph, modeling the computation-graph partitioning problem as a Markov decision process in reinforcement learning, and training a neural network model with a REINFORCE-based automatic computation graph partitioning algorithm oriented to many-core processors. The specific implementation is as follows:
The many-core-processor-oriented computation graph partitioning problem is defined as follows. For a computation graph G = (O, E), O = {op_1, op_2, ..., op_m} is the set of operators on the graph and E is the set of edges. The computation graph G is divided into a set of subgraphs G' = {g_1, g_2, ..., g_k}, where k ≤ m, i.e., each subgraph g consists of one or more operators. The computation graph G is deployed and run on a many-core processor M, on which n core resources are integrated. Each subgraph is assigned a number of cores: C = {c_1, c_2, ..., c_k} is the set of core-resource counts, meaning that subgraph g_i is assigned c_i core resources, where c_1 + c_2 + ... + c_k = n, and the c_i core resources responsible for subgraph g_i form one core resource group. Each partitioning scheme P = (G', C) means that the computation graph G is partitioned into subgraphs according to G' and core resources are allocated to each subgraph according to the core-count allocation C. Under partitioning scheme P, S(P) = {s_1, s_2, ..., s_k} denotes the data storage on the many-core processor M, where s_i is the storage amount of a single core in the i-th core resource group, and T(P) = {t_1, t_2, ..., t_k} is the running time of the computation graph G on M, where t_i is the running time of subgraph g_i. The training goal is to find an allocation scheme P = (G', C) such that max(T) is minimized while max(S) does not exceed the storage upper limit of the many-core processor M.
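This objective can be expressed as a small sketch; the cost functions time_of and storage_of below are hypothetical placeholders for whatever profiling or modeling yields t_i and s_i for subgraph g_i on c_i cores, and are not part of the patent.

```python
# Sketch of the partitioning objective under stated assumptions.
from dataclasses import dataclass

@dataclass
class PartitionScheme:
    subgraphs: list[list[str]]   # G' = {g_1..g_k}: operator names of each subgraph
    cores: list[int]             # C  = {c_1..c_k}: cores assigned to each subgraph

def objective(p: PartitionScheme, n_cores: int, storage_limit: float,
              time_of, storage_of):
    """Check c_1 + ... + c_k == n and max(S) <= limit; return max(T) to be minimized."""
    assert sum(p.cores) == n_cores
    T = [time_of(g, c) for g, c in zip(p.subgraphs, p.cores)]
    S = [storage_of(g, c) for g, c in zip(p.subgraphs, p.cores)]
    if max(S) > storage_limit:
        return None               # infeasible: storage limit exceeded
    return max(T)                 # the pipeline-stage bottleneck time
```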
According to the analysis, the invention relates to a calculation graph automatic partitioning method based on reinforcement learning, which comprises the following steps:
step A, performing topological sorting on a calculation graph to convert the calculation graph into a linear table;
step B, modeling a computational graph to be divided and a many-core processor as a Markov decision process in reinforcement learning;
and step C, solving the Markov decision process with the REINFORCE algorithm and training the graph-partitioning algorithm model.
Further, the detailed description of step A is provided in the summary of the invention.
The step B specifically comprises the following steps:
and step B1, modeling the state as a subgraph division condition and a core resource allocation condition of the current computation graph. Because the main factor influencing load balance is the operand of the subgraph, the state is composed of two parts, the first part is the node division and resource allocation state of the computational graph, and the second part is the operand state of each subgraph.
The node-partitioning and resource-allocation state of the computation graph is recorded by a list arranged_core, which represents how each core resource on the many-core processor handles the computation graph. The processor contains n core resources core_1, core_2, ..., core_n, and the list arranged_core is written as [layers_1, layers_2, layers_3, layers_4, ..., layers_n], where 0 ≤ layers_i ≤ m and layers_i is the number of layers processed by core_i. If no layers_i is 0, the whole computation graph is divided into n subgraphs whose layer counts are layers_1, layers_2, layers_3, layers_4, ..., layers_n. When layers_i, layers_{i+1}, ..., layers_{i+p} are all 0 and layers_{i+p+1} ≠ 0, i.e., when p+1 consecutive zeros appear in arranged_core (0 ≤ p ≤ m-1), the i-th through (i+p+1)-th core resources form one core resource group, and the layers_{i+p+1} layers constitute one subgraph. It should be emphasized that, in actual operation, core_i, core_{i+1}, ..., core_{i+p} do not process 0 layers; this state setting is merely a modeling convenience for reinforcement learning.
The operation-count state of each subgraph is recorded by a list allocated_macs, which represents the amount of computation of the layers_i layers in the arranged_core list. allocated_macs is set up in correspondence with arranged_core and is written as [macs_1, macs_2, macs_3, macs_4, ..., macs_n], where macs_i is the total operation count of the layers_i layers. The meaning of macs_i = 0 is defined analogously to layers_i.
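A hedged sketch of building these two state lists is given below. The list names follow the (inconsistently translated) names in the text, and the convention of placing a group's zeros before its single non-zero entry is an assumption drawn from the description above.

```python
# Sketch of the state lists; names and the zero-placement convention are assumptions.
def build_state(layer_macs, layers_per_group, cores_per_group):
    """layer_macs: MACs of each layer in linear-table order.
    layers_per_group / cores_per_group: layer count and core count of each subgraph."""
    arranged_core, allocated_macs = [], []
    idx = 0
    for n_layers, n_cores in zip(layers_per_group, cores_per_group):
        # a group's layer count and total MACs are attached to one core entry;
        # the remaining cores of the group are marked with 0 (modeling convention)
        arranged_core += [0] * (n_cores - 1) + [n_layers]
        allocated_macs += [0] * (n_cores - 1) + [sum(layer_macs[idx: idx + n_layers])]
        idx += n_layers
    return arranged_core, allocated_macs
```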
And step B2, under the current dividing state, adjusting the layer number distribution between two adjacent cores to be modeled as one action.
There are four types of actions for adjusting the number of layers of two adjacent cores. For adjacent cores core_i and core_{i+1}, the four types of actions that can be taken are: (a) merging all layers on the two adjacent cores core_i, core_{i+1} onto the next core core_{i+1} for processing, i.e., core_i is merged into core_{i+1}'s core resource group and the new core resource group takes over the layers_i layers; (b) merging all layers on the two adjacent cores core_i, core_{i+1} onto the previous core core_i for processing, i.e., core_i's core resource group processes all layers of layers_i and layers_{i+1}, and core_{i+1} joins core_{i+2}'s core resource group; (c) the previous core resource core_i hands over the last of the layers_i layers it processes (i.e., the latest of them in linear-table order) to the next core resource core_{i+1}; (d) the next core resource core_{i+1} hands over the first of the layers_{i+1} layers it processes (i.e., the earliest of them in linear-table order) to the previous core resource core_i.
The action space is the set of all selectable actions. Actions are taken on two adjacent cores; for n cores there are n-1 pairs of adjacent cores, so each type of action has n-1 choices. There are four types of actions, so the action space size is 4 × (n-1). Table 1 shows the definitions of these four types of actions; the first two types are merge actions and the last two types are split actions. To facilitate code writing, action numbering starts at 0.
TABLE 1 Definitions of four classes of actions
When an action is executed, it may turn out to be invalid for the current environment state, in which case the environment state is left unchanged. For example, if the two cores are already in the same core resource group, adjusting the layer allocation between them has no effect, and the action does not change the state. An illustrative sketch of applying the split actions is given below.
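The sketch covers only the split actions (c) and (d) and simplifies by treating each list entry as the layer count of one core group, ignoring the zero-entry grouping convention and the allocated_macs bookkeeping; it is an assumption-laden illustration, not the patent's implementation.

```python
# Hedged sketch of split actions on a simplified arranged_core list.
def move_last_layer_forward(arranged_core, i):
    """Action (c): group i hands its last layer to the next group.
    Returns the state unchanged if the action is invalid (e.g. too few layers)."""
    state = list(arranged_core)
    if i + 1 >= len(state) or state[i] <= 1:
        return state                      # invalid action: leave the state unchanged
    state[i] -= 1
    state[i + 1] += 1
    return state

def move_first_layer_back(arranged_core, i):
    """Action (d): group i+1 hands its first layer back to group i."""
    state = list(arranged_core)
    if i + 1 >= len(state) or state[i + 1] <= 1:
        return state
    state[i + 1] -= 1
    state[i] += 1
    return state
```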
Step B3 models the running time and storage of the computation graph on the many-core chip as the reward. The goal here is to find an allocation scheme P = (G', C) such that max(T) is minimized while max(S) does not exceed the storage upper limit of the many-core processor M. The reward value is set to reward = a × max(T) + b, and b is assigned a penalty value if max(S) exceeds the limit. The goal of training is to make the reward value as large as possible.
Further, the detailed steps of training the graph-partitioning algorithm model in step C are as follows:
step C1, initializing the whole graph-partitioning environment: importing the deep learning computation graph converted into the linear table structure, counting the total number of nodes, initializing the core-resource allocation, initializing the selectable actions according to the total number of core resources, resetting the variables that record the number of cores in each subgraph, the state of each core resource, the number of partitioned nodes, and the reward value to their initial states, initializing the probabilities of all actions, recording the total number of cores as n, and setting m = n - 1;
step C2, selecting an action a according to the action probabilities and executing it, which changes the current environment state s;
step C3, calculating the computation amount, storage amount, and routing amount of each subgraph according to the current state s, taking the worst of each of the three quantities across the subgraphs, and combining them by a weighting operation to obtain the reward value r;
step C4, judging whether r meets the preset requirement: if it does, updating the action probabilities and ending the current episode; if it does not, continuing to select actions and repeating steps C2-C4;
step C5, inputting all s, a, and r of the episode into the neural network for training, and updating the probabilities of selecting actions (FIG. 3 shows the structure of the neural network);
and step C6, ending the training process after the set number of episodes is reached, and saving the neural network model (a hedged sketch of a stand-in policy network and its REINFORCE update is given below).
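The architecture of the network in FIG. 3 is not reproduced in the text, so the following stand-in uses a small multilayer perceptron over the two state lists; the layer sizes, the use of PyTorch, and the wiring shown in the comments are assumptions, not the patent's published design.

```python
# Hypothetical policy network and REINFORCE update (steps C5-C6); layer sizes assumed.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, n_cores: int):
        super().__init__()
        # state = arranged_core (n values) concatenated with allocated_macs (n values)
        # output = probabilities over the 4 * (n - 1) actions
        self.net = nn.Sequential(
            nn.Linear(2 * n_cores, 128),
            nn.ReLU(),
            nn.Linear(128, 4 * (n_cores - 1)),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(state), dim=-1)

def update(policy, optimizer, states, actions, rewards, gamma=1.0):
    """One-episode REINFORCE update: loss = -sum_t G_t * log pi(a_t | s_t)."""
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    loss = 0.0
    for s, a, G in zip(states, actions, returns):
        probs = policy(s)                 # s: 1-D float tensor of length 2 * n_cores
        loss = loss - G * torch.log(probs[a])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# example wiring (hypothetical):
#   policy = PolicyNet(n_cores=16)
#   optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
```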
The above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same, and any modifications made on the basis of the technical solutions according to the technical ideas presented by the present invention are within the scope of the present invention.
Claims (4)
1. A computational graph automatic partitioning method based on reinforcement learning is characterized by comprising the following steps:
step 1, performing topological sorting on a calculation graph to convert the calculation graph into a linear table;
step 2, modeling a computation graph to be divided and a many-core processor as a Markov decision process in reinforcement learning, extracting a sub-graph division condition and a core resource distribution condition of the current computation graph as states in the reinforcement learning, adjusting layer number distribution between two adjacent cores as actions in the reinforcement learning, and using the running time and the storage condition of the computation graph on the many-core processor as rewards in the reinforcement learning;
and step 3, solving the Markov decision process with the REINFORCE algorithm and training the graph-partitioning algorithm model.
2. The method for automatically partitioning a computational graph based on reinforcement learning according to claim 1, wherein the specific process of the step 1 is as follows:
step 11, performing topological sorting on the deep learning computation graph and converting it into a linear table structure, where the order of the elements in the linear table is consistent with the execution order of the nodes and the data elements in the linear table correspond to the layers of the deep learning model;
and step 12, recording for each layer the operation type and hyper-parameters as well as the data volume on each edge of the graph, thereby obtaining the total number of nodes in the computation graph and the operation count, required storage, and routing volume of each operation.
3. The method for automatically partitioning a computational graph based on reinforcement learning according to claim 1, wherein the specific process of the step 2 is:
step 21, extracting the subgraph division condition and the core resource allocation condition of the current computation graph as the states in reinforcement learning: the state is composed of two parts, the first part is a node division and resource allocation state of the computational graph, and the second part is an operand state of each subgraph;
step 22, treating the adjustment of layer allocation between two adjacent cores as the action in reinforcement learning, of which there are four types: merging all layers of two adjacent cores onto the next core for processing, merging all layers of two adjacent cores onto the previous core for processing, handing the last layer processed by the previous core (i.e., the latest of its layers in linear-table order) to the next core for processing, and handing the first layer processed by the next core (i.e., the earliest of its layers in linear-table order) to the previous core for processing;
and step 23, using the running time and storage of the computation graph on the many-core chip as the reward in reinforcement learning: the reward value is set to reward = a × max(T) + b, where T = {t_1, t_2, ..., t_k} is the running time of the computation graph G on the many-core processor M and S = {s_1, s_2, ..., s_k} denotes the data storage on the many-core processor M; if max(S) exceeds the limit, b is assigned a penalty value, and the goal of training is to make the reward value as large as possible.
4. The method for automatically partitioning a computational graph based on reinforcement learning according to claim 1, wherein the specific process of the step 3 is:
step 31, initializing the whole graph-partitioning environment: importing the deep learning computation graph converted into the linear table structure, counting the total number of nodes, initializing the core-resource allocation, initializing the selectable actions according to the total number of core resources, resetting the variables that record the number of cores in each subgraph, the state of each core resource, the number of partitioned nodes, and the reward value to their initial states, and initializing the probabilities of all actions;
step 32, selecting an action a according to the action probability, and changing the current environment state s after executing the action a;
step 33, calculating the calculation amount, the storage amount and the routing amount of each sub-graph according to the current state, respectively comparing the three values in each sub-graph to obtain three values with the worst condition, and performing weighting operation to calculate the reward value r;
step 34, judging whether the reward value meets the preset requirement: if it does, updating the action probabilities and ending the current episode; if it does not, continuing to select actions and repeating steps 32-34;
step 35, inputting all s, a, and r of the episode into a neural network for training, and updating the probabilities of selecting actions;
and step 36, ending the training process after the set number of episodes is reached, and saving the neural network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210650630.8A CN115016938A (en) | 2022-06-09 | 2022-06-09 | Calculation graph automatic partitioning method based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210650630.8A CN115016938A (en) | 2022-06-09 | 2022-06-09 | Calculation graph automatic partitioning method based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115016938A true CN115016938A (en) | 2022-09-06 |
Family
ID=83073421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210650630.8A Pending CN115016938A (en) | 2022-06-09 | 2022-06-09 | Calculation graph automatic partitioning method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115016938A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024065525A1 (en) * | 2022-09-29 | 2024-04-04 | Intel Corporation | Method and apparatus for optimizing deep learning computation graph |
CN115374914A (en) * | 2022-10-24 | 2022-11-22 | 北京白海科技有限公司 | Distributed training method, parallel deep learning framework and electronic equipment |
CN115374914B (en) * | 2022-10-24 | 2023-04-07 | 北京白海科技有限公司 | Distributed training method, parallel deep learning framework and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||