CN115016938A - Calculation graph automatic partitioning method based on reinforcement learning - Google Patents

Calculation graph automatic partitioning method based on reinforcement learning

Info

Publication number
CN115016938A
CN115016938A (application CN202210650630.8A)
Authority
CN
China
Prior art keywords
graph
core
reinforcement learning
action
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210650630.8A
Other languages
Chinese (zh)
Inventor
崔毅东
林孟群
雷友珣
陈莉萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210650630.8A priority Critical patent/CN115016938A/en
Publication of CN115016938A publication Critical patent/CN115016938A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Neurology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a reinforcement-learning-based method for automatically partitioning a computation graph, comprising the following steps: step 1, topologically sort the computation graph to convert it into a linear table; step 2, model the computation graph to be partitioned and the many-core processor as a Markov decision process in reinforcement learning, extracting the sub-graph partitioning and core resource allocation of the current computation graph as the state, treating adjustments of the layer assignment between two adjacent cores as the actions, and using the running time and storage usage of the computation graph on the many-core processor as the reward; and step 3, solve the Markov decision process with the REINFORCE algorithm and train the graph partitioning algorithm model.

Description

Calculation graph automatic partitioning method based on reinforcement learning
Technical Field
The invention belongs to the fields of resource allocation and reinforcement learning, and relates to a method that uses a reinforcement learning algorithm to solve the problem of partitioning deep learning computation graphs.
Background
In recent years, deep learning has achieved dramatic success in image analysis, natural language processing, speech recognition, video classification, and other fields. However, deep learning depends on substantial computing power, so optimizing the deep learning system stack to reduce compute requirements plays an important role in deep learning applications. To meet the rapidly growing compute demands of deep learning models, AI chips designed specifically for the field have been widely adopted.
The AI chip generally adopts a many-core architecture. It is dedicated to the large volume of computational tasks in AI applications, while other, non-computational tasks are still handled by the CPU. An AI chip integrates multiple cores and can purposefully accelerate particular algorithms or tasks; in the market, AI chips are defined more by function than by architecture. In recent years, AI chip products for deep learning have appeared in quick succession, with solutions introduced by technology giants such as Google, Intel, and Nvidia as well as startups such as Cambricon and Horizon Robotics. As chip design technology has matured, AI chip architectures have iterated rapidly alongside AI technology itself.
The deep learning compiler typically splits a model implemented in any of several frameworks into multiple subtasks deployed on a many-core chip, and uses a pipeline structure to process the computing task for optimal performance. In a pipeline structure, a large computing task is divided into several sets of parallel subtasks, each set processed in parallel by multiple cores of the many-core chip; the mapping from subtask sets to core resource sets is completed by the run-time system. Running a deep learning model as a computing task therefore requires dividing it into multiple subgraphs, that is, partitioning the model's computation graph. In a computation graph, nodes represent the computational operations; the advantage is that the computation can be divided and even executed across multiple GPUs.
How to distribute computing tasks across the individual processors in an optimized manner is a key question in a pipeline architecture, a problem commonly referred to as load balancing. Load balancing is a fundamental problem in parallel computing: it maximizes parallel application performance by minimizing processor idle time and inter-processor communication time.
Scheduling the pipeline structure can reduce processor idle time, improve program execution performance, and increase hardware resource utilization. However, limits on system resources such as processor core performance, on-chip storage, and communication bandwidth constrain software pipelining performance.
Reinforcement learning has shown promise for resource scheduling. In 2017, Mirhoseini et al. proposed using reinforcement learning to optimize the assignment of computation graph nodes to devices in a distributed system for TensorFlow models. Their paper uses a sequence-to-sequence (Seq2Seq) model consisting of an encoder and a decoder: the nodes of the computation graph are fed into the model in topological order, and the device assigned to each node is the output. Many reinforcement-learning-based resource scheduling schemes have been proposed since. Addanki et al. used a reinforcement learning algorithm to schedule neural networks over distributed resources; their approach iteratively refines the resource allocation scheme rather than producing the node assignment of the computation graph in one shot. Subsequent work likewise applies reinforcement learning to task scheduling in distributed systems, differing mostly in the specific deep learning model used. However, these reinforcement learning approaches allocate resources according to a fixed resource layout, so they do not suit situations where the core resource layout of the processor changes. For the resource scheduling problem of distributed stream processing systems, Luo proposed an algorithm based on deep reinforcement learning and multi-level graph partitioning that allocates resources by resource count: graph coarsening reduces a large-scale dataflow graph to a small-scale one, reinforcement learning is trained on the small graph, and the result is mapped back onto the large graph. After coarsening, however, the space in which reinforcement learning can search for the optimal solution is restricted, which limits its effectiveness to a certain extent.
The invention mainly researches how to effectively partition the computation graph of a deep learning model and achieve load balance when training the model on a many-core chip architecture. To this end, an algorithm is designed that automatically partitions the deep learning computation graph and allocates a number of core resources to each subgraph so as to minimize the running time of the deep learning model on the many-core chip; it is a method that allocates resources by resource count.
Disclosure of Invention
The invention aims to provide a reinforcement-learning-based method for automatically partitioning a computation graph, which automatically partitions the computation graph of a deep learning model into subgraphs and allocates core resources to each subgraph by resource count, so as to shorten the running time of the deep learning model.
In order to achieve the above object, the method for automatically partitioning a computational graph based on reinforcement learning according to the present invention comprises the following steps:
step 1, performing topological sorting on a calculation graph to convert the calculation graph into a linear table;
step 2, modeling the computation graph to be partitioned and the many-core processor as a Markov decision process in reinforcement learning: the sub-graph partitioning and core resource allocation of the current computation graph are extracted as the state, adjustments of the layer assignment between two adjacent cores serve as the actions, and the running time and storage usage of the computation graph on the many-core processor serve as the reward;
and step 3, solving the Markov decision process with the REINFORCE algorithm and training the graph partitioning algorithm model.
The specific process of the step 1 is as follows:
step 11, topologically sort the deep learning computation graph and convert it into a linear table structure; the order of elements in the linear table is consistent with the execution order of the nodes, and the data elements in the table correspond to the layers of the deep learning model;
and step 12, obtain the operation type and hyper-parameters of each layer and the data volume on each edge of the graph, and from these compute the total number of nodes in the computation graph and, for each operation, its computation amount, required storage amount, and routing amount.
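As a minimal sketch (not the patent's implementation) of steps 11 and 12, the following Python fragment converts a computation graph into a linear table with Kahn's topological sort; the node and edge representation is an assumption made for the example, and the per-layer statistics of step 12 would then be attached to each element of the returned table.

```python
from collections import deque

def topological_linear_table(nodes, edges):
    """Convert a computation graph to a linear table (Kahn's algorithm).

    nodes: iterable of node identifiers (one per model layer);
    edges: list of (src, dst) pairs, data flowing src -> dst.
    """
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(v for v in indegree if indegree[v] == 0)
    table = []  # element order matches node execution order (step 11)
    while ready:
        v = ready.popleft()
        table.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    assert len(table) == len(indegree), "computation graph must be acyclic"
    return table
```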
The step 2 specifically comprises the following steps:
step 21, extracting the sub-graph partitioning and core resource allocation of the current computation graph as the state in reinforcement learning: the state consists of two parts, the first being the node partitioning and resource allocation state of the computation graph, and the second being the computation amount of each subgraph;
step 22, treating adjustments of the layer assignment between two adjacent cores as the actions in reinforcement learning, of which there are four types: merge all layers of the two adjacent cores onto the next core; merge all layers of the two adjacent cores onto the previous core; hand the last layer processed by the previous core (i.e. the last of its layers in the linear table) to the next core; or hand the first layer processed by the next core (i.e. the first of its layers in the linear table) to the previous core;
and step 23, using the running time and storage usage of the computation graph on the many-core chip as the reward in reinforcement learning: the reward value is set to reward = a × max(T) + b, where T = {t_1, t_2, …, t_k} is the running time of the computation graph G on the many-core processor M and S = {s_1, s_2, …, s_k} is the data storage on M; if max(S) exceeds the storage limit, b is assigned a penalty value. The goal of training is to make the reward value as large as possible.
The step 3 specifically includes the following steps:
step 31, initialize the whole graph partitioning environment: import the deep learning computation graph converted into the linear table structure, count the total number of nodes, initialize the core resource allocation, initialize the selectable actions according to the total number of core resources, reset to their initial states the variables recording the number of cores per subgraph, the state of each core resource, the number of partitioned nodes, and the reward value, and initialize the probabilities of all actions;
step 32, select an action a according to the action probabilities; executing a changes the current environment state s;
step 33, compute the computation amount, storage amount, and routing amount of each subgraph from the current state, take the worst case of each of the three quantities across the subgraphs, and compute the reward value r as their weighted combination;
step 34, judge whether the reward value meets the preset requirement: if it does, update the action probabilities and end the current episode; if not, continue selecting actions, repeating steps 32 to 34;
step 35, feed all s, a, and r of the episode into a neural network for training, updating the probabilities with which actions are selected;
and step 36, end the training process after the set number of episodes is reached, and save the neural network model.
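Steps 31 to 36 amount to an episodic, REINFORCE-style training loop. The sketch below is schematic and rests on assumed interfaces (env.reset and env.step, and a policy object with sample, update, and save methods); it is not the patent's implementation.

```python
def train_partitioner(env, policy, episodes):
    """Schematic episode loop for steps 31-36 (assumed interfaces)."""
    for _ in range(episodes):                    # step 36 bounds the loop
        states, actions, rewards = [], [], []
        s = env.reset()                          # step 31: initialize environment
        done = False
        while not done:
            a = policy.sample(s)                 # step 32: sample by action probability
            s_next, r, done = env.step(a)        # steps 32-33: act, observe reward
            states.append(s); actions.append(a); rewards.append(r)
            s = s_next                           # step 34: env decides when to end
        policy.update(states, actions, rewards)  # step 35: policy-gradient update
    policy.save()                                # step 36: store the trained model
```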
Drawings
FIG. 1 is a flow chart of a computational graph automatic partitioning method based on reinforcement learning;
FIG. 2 is a schematic diagram of a many-core chip oriented computation graph automatic partitioning method structure based on deep reinforcement learning;
FIG. 3 is a diagram of the neural network architecture.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
the invention provides a computation graph automatic partitioning method based on reinforcement learning, which comprises three parts, namely a Markov decision process for performing topological sequencing on a computation graph and modeling a computation graph partitioning problem into reinforcement learning, and a neural network model is trained by utilizing a REINFORCE algorithm-based many-core processor-oriented computation graph automatic partitioning algorithm. The specific implementation method comprises the following steps:
The many-core-processor-oriented computation graph partitioning problem is defined as follows. A computation graph is G = (O, E), where O = {op_1, op_2, …, op_m} is the set of operators on the graph and E is the set of edges. The computation graph G is partitioned into a set of subgraphs G' = {g_1, g_2, …, g_k}, where k ≤ m, i.e. each subgraph g consists of one or more operators. G is deployed and run on a many-core processor M that integrates n core resources. Each subgraph is assigned a number of cores: C = {c_1, c_2, …, c_k} is the set of core resource group sizes, meaning that subgraph g_i is assigned c_i core resources, with c_1 + c_2 + ··· + c_k = n; the c_i core resources responsible for subgraph g_i form one core resource group. A partitioning scheme P = (G', C) means that the computation graph G is partitioned into subgraphs according to G' and core resources are allocated to each subgraph according to the core count allocation C. Under scheme P, S(P) = {s_1, s_2, …, s_k} denotes the data storage on the many-core processor M, where s_i is the storage amount of a single core in the i-th core resource group, and T(P) = {t_1, t_2, …, t_k} is the running time of the computation graph G on M, where t_i is the running time of subgraph g_i. The training goal is to find a scheme P = (G', C) such that max(T) is minimized while max(S) does not exceed the memory limit of the many-core processor M.
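For concreteness, the formalization above can be mirrored in code. The sketch below assumes simple list-based containers; the names PartitionScheme and evaluate are illustrative and do not come from the patent.

```python
from dataclasses import dataclass

@dataclass
class PartitionScheme:
    """P = (G', C): subgraphs plus a core count per subgraph."""
    subgraphs: list    # G' = [g_1, ..., g_k]; each g_i is a list of operators
    core_counts: list  # C = [c_1, ..., c_k]; sum(core_counts) == n

def evaluate(run_times, storage, mem_limit):
    """run_times: T(P) = [t_1, ..., t_k]; storage: S(P) = [s_1, ..., s_k].

    Returns the objective max(T) and whether max(S) respects the
    per-core memory limit of the many-core processor M.
    """
    return max(run_times), max(storage) <= mem_limit
```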
According to the analysis, the invention relates to a calculation graph automatic partitioning method based on reinforcement learning, which comprises the following steps:
step A, performing topological sorting on a calculation graph to convert the calculation graph into a linear table;
step B, modeling a computational graph to be divided and a many-core processor as a Markov decision process in reinforcement learning;
and step C, solving the Markov decision process with the REINFORCE algorithm and training the graph partitioning algorithm model.
Further, the detailed description of step A is provided in the summary of the invention.
The step B specifically comprises the following steps:
and step B1, modeling the state as a subgraph division condition and a core resource allocation condition of the current computation graph. Because the main factor influencing load balance is the operand of the subgraph, the state is composed of two parts, the first part is the node division and resource allocation state of the computational graph, and the second part is the operand state of each subgraph.
The node partitioning and resource allocation state of the computation graph is recorded by a list allocated_core, which represents how each core resource on the many-core processor handles the computation graph. For a processor with n core resources core_1, core_2, …, core_n, the list allocated_core is written [layers_1, layers_2, layers_3, layers_4, …, layers_n], where
layers_1 + layers_2 + ··· + layers_n = m and 0 ≤ layers_i ≤ m,
with layers_i denoting the number of layers handled by core_i. If no layers_i is 0, the whole computation graph is divided into n subgraphs of layers_1, layers_2, layers_3, layers_4, …, layers_n layers respectively. When layers_i, layers_{i+1}, …, layers_{i+p} are all 0 and layers_{i+p+1} ≠ 0, i.e. when p+1 consecutive zeros appear in allocated_core (0 ≤ p ≤ m-1), the core resources from the i-th through the (i+p+1)-th form one core resource group, and the layers_{i+p+1} layers it handles constitute one subgraph. It should be emphasized that, in actual operation, core_i, core_{i+1}, …, core_{i+p} do not process 0 layers; this state encoding is merely a modeling device to facilitate reinforcement learning.
The computation amount of each subgraph is recorded by a list allocated_macs, which represents the computation of the layers_i layers in the allocated_core list. allocated_macs corresponds entry-by-entry to allocated_core and is written [macs_1, macs_2, macs_3, macs_4, …, macs_n], where macs_i is the total amount of operations of the layers_i layers handled by core_i. The treatment of macs_i = 0 parallels that of layers_i.
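The zero-run convention can be decoded with a small helper; this is a hypothetical illustration of the state encoding just described, under the assumption that each run of zeros joins the group of the next nonzero core.

```python
def decode_core_groups(allocated_core):
    """Decode core resource groups from the zero-run convention.

    Each run of zeros is merged with the next nonzero entry: those
    cores jointly form one group handling that entry's layers.
    """
    groups = []    # (cores in group, layers handled by the group)
    zero_run = 0
    for layers_i in allocated_core:
        if layers_i == 0:
            zero_run += 1
        else:
            groups.append((zero_run + 1, layers_i))
            zero_run = 0
    return groups

# Example with n = 5 cores and m = 10 layers:
# decode_core_groups([3, 2, 0, 0, 5]) -> [(1, 3), (1, 2), (3, 5)]
```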
Step B2, in the current partitioning state, an adjustment of the layer assignment between two adjacent cores is modeled as one action.
There are four types of actions for adjusting the layer counts of two adjacent cores. For the adjacent cores core_i and core_{i+1}, the four types of actions that can be taken are: (a) merge the layers of the two adjacent cores onto the next core, core_{i+1}; that is, core_i is merged into core_{i+1}'s core resource group, and the new group takes over the layers_i layers. (b) Merge the layers of the two adjacent cores onto the previous core, core_i; that is, core_i's core resource group processes all layers of layers_i and layers_{i+1}, and core_{i+1} joins core_{i+2}'s core resource group. (c) The previous core resource, core_i, hands the last of its layers_i layers (i.e. the last in the linear table) to the next core resource, core_{i+1}. (d) The next core resource, core_{i+1}, hands the first of its layers_{i+1} layers (i.e. the first in the linear table) to the previous core resource, core_i.
The action space is the set of all selectable actions. Actions act on two adjacent cores; for n cores there are n-1 adjacent pairs, so each action type offers n-1 choices. With four action types, the action space size is 4 × (n-1). Table 1 defines the four action types; the first two are merge actions and the last two are split actions. To facilitate coding, action numbering starts at 0.
TABLE 1 Definitions of the four action types

Type 0: merge into the next core; all layers of core_i move to core_{i+1}.
Type 1: merge into the previous core; all layers of core_{i+1} move to core_i.
Type 2: split forward; the last layer of core_i moves to core_{i+1}.
Type 3: split backward; the first layer of core_{i+1} moves to core_i.
When an action is executed, it may be invalid for the current environment state, in which case the environment state is left unchanged. For example, if the two cores are already in the same core resource group, an adjustment of their layer counts has no effect, and the action does not change the state.
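A sketch of how the four action types might transform the allocated_core list follows, with the invalid-action rule applied (the state is returned unchanged when the action cannot take effect). The validity conditions are assumptions consistent with the description above, not the patent's exact rules.

```python
def apply_action(allocated_core, action_type, i):
    """Apply one of the four action types to the pair (core_i, core_i+1).

    action_type: 0 merge into next, 1 merge into previous,
                 2 move last layer forward, 3 move first layer backward.
    Returns the new state; an invalid action leaves the state unchanged.
    """
    state = list(allocated_core)
    left, right = state[i], state[i + 1]
    if action_type == 0 and left > 0:        # merge all of core_i forward
        state[i], state[i + 1] = 0, left + right
    elif action_type == 1 and right > 0:     # merge all of core_i+1 backward
        state[i], state[i + 1] = left + right, 0
    elif action_type == 2 and left > 1:      # split: hand last layer forward
        state[i], state[i + 1] = left - 1, right + 1
    elif action_type == 3 and right > 1:     # split: hand first layer backward
        state[i], state[i + 1] = left + 1, right - 1
    return state
```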
Step B3, model the running time and storage of the computation graph on the many-core chip as the reward. The goal is to find an allocation scheme P = (G', C) such that max(T) is minimized while max(S) does not exceed the memory limit of the many-core processor M. The reward value is set to reward = a × max(T) + b, and b is assigned a penalty value if max(S) exceeds the limit. The goal of training is to make the reward value as large as possible.
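As a hedged illustration, the reward of step B3 might be computed as below; the sign of a (negative, so that a shorter max(T) yields a larger reward) and the magnitude of the penalty are assumptions, not values from the patent.

```python
def reward_fn(run_times, storage, mem_limit, a=-1.0, penalty=-100.0):
    """reward = a * max(T) + b, with b a penalty when storage overflows."""
    b = penalty if max(storage) > mem_limit else 0.0
    return a * max(run_times) + b
```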
Further, the detailed steps for training the graph partitioning algorithm model in step C are as follows:
Step C1, initialize the whole graph partitioning environment: import the deep learning computation graph converted into the linear table structure, count the total number of nodes, initialize the core resource allocation, initialize the selectable actions according to the total number of core resources, reset to their initial states the variables recording the number of cores per subgraph, the state of each core resource, the number of partitioned nodes, and the reward value, initialize the probabilities of all actions, denote the total number of cores as n, and set m to n-1;
step C2, select an action a according to the action probabilities;
step C3, select the corresponding action according to the value of a, and execute action a;
step C4, compute the computation amount, storage amount, and routing amount of each subgraph from the current state s, take the worst case of each of the three quantities across the subgraphs, and compute the reward value r as their weighted combination;
step C5, judge whether r meets the preset requirement: if it does, update the action probabilities and end the current episode; if not, continue selecting actions, repeating steps C2 to C5;
step C6, feed all s, a, and r of the episode into the neural network for training, updating the probabilities with which actions are selected (FIG. 3 shows the structure of the neural network);
and step C7, end the training process after the set number of episodes is reached, and save the neural network model.
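Step C6's neural network and policy update can be sketched with PyTorch as follows; the two-layer MLP merely stands in for the architecture of FIG. 3, and all dimensions and hyper-parameters are assumptions. The output dimension would be the action space size 4 × (n-1) derived above.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    """Maps the environment state to probabilities over 4*(n-1) actions."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1))

    def forward(self, state):
        return self.net(state)

def reinforce_update(policy, optimizer, states, actions, rewards, gamma=0.99):
    """One REINFORCE update from an episode's (s, a, r) triples.

    states: list of 1-D float tensors; actions: list of scalar long tensors.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted return G_t, back to front
        g = r + gamma * g
        returns.insert(0, g)
    loss = torch.tensor(0.0)
    for s, a, g in zip(states, actions, returns):
        dist = Categorical(policy(s))
        loss = loss - dist.log_prob(a) * g   # maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```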
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them; any modification made to these technical solutions in accordance with the technical ideas presented by the present invention falls within the scope of the present invention.

Claims (4)

1. A computational graph automatic partitioning method based on reinforcement learning is characterized by comprising the following steps:
step 1, performing topological sorting on a calculation graph to convert the calculation graph into a linear table;
step 2, modeling the computation graph to be partitioned and the many-core processor as a Markov decision process in reinforcement learning: the sub-graph partitioning and core resource allocation of the current computation graph are extracted as the state, adjustments of the layer assignment between two adjacent cores serve as the actions, and the running time and storage usage of the computation graph on the many-core processor serve as the reward;
and step 3, solving the Markov decision process with the REINFORCE algorithm and training the graph partitioning algorithm model.
2. The method for automatically partitioning a computational graph based on reinforcement learning according to claim 1, wherein the specific process of the step 1 is as follows:
step 11, topologically sort the deep learning computation graph and convert it into a linear table structure; the order of elements in the linear table is consistent with the execution order of the nodes, and the data elements in the table correspond to the layers of the deep learning model;
and step 12, obtain the operation type and hyper-parameters of each layer and the data volume on each edge of the graph, and from these compute the total number of nodes in the computation graph and, for each operation, its computation amount, required storage amount, and routing amount.
3. The method for automatically partitioning a computational graph based on reinforcement learning according to claim 1, wherein the specific process of the step 2 is:
step 21, extracting the sub-graph partitioning and core resource allocation of the current computation graph as the state in reinforcement learning: the state consists of two parts, the first being the node partitioning and resource allocation state of the computation graph, and the second being the computation amount of each subgraph;
step 22, treating adjustments of the layer assignment between two adjacent cores as the actions in reinforcement learning, of which there are four types: merge all layers of the two adjacent cores onto the next core; merge all layers of the two adjacent cores onto the previous core; hand the last layer processed by the previous core (i.e. the last of its layers in the linear table) to the next core; or hand the first layer processed by the next core (i.e. the first of its layers in the linear table) to the previous core;
and step 23, using the running time and storage usage of the computation graph on the many-core chip as the reward in reinforcement learning: the reward value is set to reward = a × max(T) + b, where T = {t_1, t_2, …, t_k} is the running time of the computation graph G on the many-core processor M and S = {s_1, s_2, …, s_k} is the data storage on M; if max(S) exceeds the storage limit, b is assigned a penalty value. The goal of training is to make the reward value as large as possible.
4. The method for automatically partitioning a computational graph based on reinforcement learning according to claim 1, wherein the specific process of the step 3 is:
step 31, initialize the whole graph partitioning environment: import the deep learning computation graph converted into the linear table structure, count the total number of nodes, initialize the core resource allocation, initialize the selectable actions according to the total number of core resources, reset to their initial states the variables recording the number of cores per subgraph, the state of each core resource, the number of partitioned nodes, and the reward value, and initialize the probabilities of all actions;
step 32, select an action a according to the action probabilities; executing a changes the current environment state s;
step 33, compute the computation amount, storage amount, and routing amount of each subgraph from the current state, take the worst case of each of the three quantities across the subgraphs, and compute the reward value r as their weighted combination;
step 34, judge whether the reward value meets the preset requirement: if it does, update the action probabilities and end the current episode; if not, continue selecting actions, repeating steps 32 to 34;
step 35, feed all s, a, and r of the episode into a neural network for training, updating the probabilities with which actions are selected;
and step 36, end the training process after the set number of episodes is reached, and save the neural network model.
CN202210650630.8A 2022-06-09 2022-06-09 Calculation graph automatic partitioning method based on reinforcement learning Pending CN115016938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210650630.8A CN115016938A (en) 2022-06-09 2022-06-09 Calculation graph automatic partitioning method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210650630.8A CN115016938A (en) 2022-06-09 2022-06-09 Calculation graph automatic partitioning method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN115016938A true CN115016938A (en) 2022-09-06

Family

ID=83073421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210650630.8A Pending CN115016938A (en) 2022-06-09 2022-06-09 Calculation graph automatic partitioning method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115016938A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024065525A1 (en) * 2022-09-29 2024-04-04 Intel Corporation Method and apparatus for optimizing deep learning computation graph
CN115374914A (en) * 2022-10-24 2022-11-22 北京白海科技有限公司 Distributed training method, parallel deep learning framework and electronic equipment
CN115374914B (en) * 2022-10-24 2023-04-07 北京白海科技有限公司 Distributed training method, parallel deep learning framework and electronic equipment

Similar Documents

Publication Publication Date Title
Mirhoseini et al. A hierarchical model for device placement
CN110737529A (en) cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
Chen et al. Deep learning research and development platform: Characterizing and scheduling with qos guarantees on gpu clusters
WO2022068663A1 (en) Memory allocation method, related device, and computer readable storage medium
US12093791B2 (en) Partitioning for an execution pipeline
CN112328380A (en) Task scheduling method and device based on heterogeneous computing
CN115016938A (en) Calculation graph automatic partitioning method based on reinforcement learning
Mahmoud et al. Multiobjective task scheduling in cloud environment using decision tree algorithm
CN115543639A (en) Optimization method for distributed execution of deep learning task and distributed system
CN113742089B (en) Method, device and equipment for distributing neural network computing tasks in heterogeneous resources
CN112101525A (en) Method, device and system for designing neural network through NAS
Noorian Talouki et al. A hybrid meta-heuristic scheduler algorithm for optimization of workflow scheduling in cloud heterogeneous computing environment
CN113220450A (en) Load prediction method, resource scheduling method and device for cloud-side multi-data center
CN112434785B (en) Distributed parallel deep neural network performance evaluation method for supercomputer
CN112764893B (en) Data processing method and data processing system
Dublish et al. Poise: Balancing thread-level parallelism and memory system performance in GPUs using machine learning
Jalalian et al. A hierarchical multi-objective task scheduling approach for fast big data processing
Ahmed et al. Fuzzy active learning to detect OpenCL kernel heterogeneous machines in cyber physical systems
CN116680063A (en) Task scheduling method, device, computing system, electronic equipment and storage medium
Shirazi et al. PARSA: A parallel program scheduling and assessment environment
CN116800610A (en) Distributed data plane resource optimization method and system
Li et al. Performance optimization algorithm of radar signal processing system
Li et al. Dynamic data replacement and adaptive scheduling policies in spark
CN116048759A (en) Data processing method, device, computer and storage medium for data stream
Do et al. Co-scheduling ensembles of in situ workflows

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination