CN113821323B - Offline job task scheduling algorithm for mixed deployment data center scene - Google Patents

Offline job task scheduling algorithm for mixed deployment data center scene

Info

Publication number
CN113821323B
CN113821323B, CN202111089490.3A, CN202111089490A
Authority
CN
China
Prior art keywords
data center
attention
offline
task scheduling
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111089490.3A
Other languages
Chinese (zh)
Other versions
CN113821323A (en)
Inventor
李嘉伦
吴维刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202111089490.3A
Publication of CN113821323A
Application granted
Publication of CN113821323B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an offline job task scheduling algorithm for a hybrid deployment data center scenario. It applies an advantage actor-critic (A2C) reinforcement learning model, more advanced than Deep Q-Learning and Policy Gradient, to the data center task scheduling scenario in cloud computing resource management. To strengthen the model's understanding during reinforcement learning training, the load state information of the data center is efficiently encoded and ultimately converted into a relational inductive bias. This bias continuously improves the reinforcement learning model's understanding of the environment and of the actions it executes during training, so that a better scheduling policy is formed, and the model as a whole is applied to the task scheduling problem in the hybrid deployment data center scenario.

Description

Offline job task scheduling algorithm for mixed deployment data center scene
Technical Field
The invention relates to the field of large-scale data center resource management in cloud computing, and in particular to an offline job task scheduling algorithm for a hybrid deployment data center scenario.
Background
Improving the resource utilization of a data center is critical to reducing the infrastructure investment of cloud resource providers. However, current real world production data centers typically exhibit relatively low resource utilization.
For data centers with a huge number of nodes, considerable cost savings can be achieved even if the average utilization is only marginally increased. Therefore, improving resource utilization is a major concern for cloud providers seeking to achieve their economic goals. At the same time, it is common practice to co-locate different workloads in the same data center to increase resource utilization, the so-called hybrid deployment.
In a hybrid deployment scenario, there are mainly two kinds of workloads: long-running applications (LRAs, i.e., online jobs) and batch applications (offline jobs). Online jobs have higher priority in a hybrid deployment scenario, while offline jobs are best-effort and may be suspended to reserve resources for the online jobs. Furthermore, the runtime behavior of online jobs typically changes dynamically.
DeepRM is a task scheduling algorithm that uses deep reinforcement learning (DRL) to train a policy network for multidimensional resource scheduling in a data center. It uses a convolutional neural network (CNN) module to extract information from the data center represented as images, and its objectives and methods are relatively close to those of the present work at a high level. However, DeepRM can only treat the data center as a single physical resource pool, without deployed LRAs. Furthermore, DeepRM treats all tasks running in the data center as stateless batch offline tasks. DeepJS also uses DRL for data center task scheduling. It is formulated within the framework of the bin-packing problem, treating the entire data center as a collection of multiple compute nodes. However, DeepJS still does not address the task scheduling problem of the hybrid deployment scenario. Decima combines a graph neural network (GNN) with DRL, processes state information into feature embeddings, and then passes them to its policy network. In addition, Decima is a scheduling algorithm that targets DAG (directed acyclic graph) dependent task scheduling in Spark distributed environments. DeepRM Plus is an extended version of DeepRM: it mainly enhances DeepRM's CNN module while introducing imitation learning, further exploiting expert knowledge (e.g., shortest-task-first, first-come-first-served) to accelerate convergence during DRL training.
Compared with existing task scheduling algorithms, task scheduling in a hybrid deployment data center scenario must consider not only the data center state changes caused by the task scheduling process itself, but also those caused by the deployed online jobs. For example, online jobs deployed in a hybrid deployment data center typically have diverse runtime features, and these features change dynamically; they must be handled properly to ensure the final scheduling quality. However, accounting for a large number of online jobs with unknown, dynamically changing runtime behavior during task scheduling is very challenging, because fully analyzing all online jobs and collecting their runtime behavior features is labor-intensive, and the labor cost outweighs the possible benefit. Therefore, new scheduling algorithms are needed for the co-located data center scenario that do not require prior knowledge of the scheduling scenario's information profile and that can automatically learn the scheduling policy from historical experience.
Disclosure of Invention
The invention provides an offline job task scheduling algorithm for a hybrid deployment data center scenario that can form a better scheduling policy.
To achieve the above technical effects, the technical solution of the invention is as follows:
An offline job task scheduling algorithm for a hybrid deployment data center scenario comprises the following steps:
S1: treating each piece of node state information and offline job information in the data center as an entity, and constructing an entity set E;
S2: converting the entity set into a feature representation of the data center node states and the offline job runtime behavior;
S3: converting the feature representation into a relational inductive bias during deep reinforcement learning (DRL) training.
Further, each entity in the entity set computes pairwise interactions with all other entities, and the accumulation of these pairwise interactions becomes the behavior feature representation of the original entity.
Further, the constructed entity set E is mapped into three matrices Q, K, V; the attention output value is calculated using the matrices Q, K, V; and the resulting set of attention output vectors is used to enhance the decision-making ability of the reinforcement learning agent.
Further, the attention output value is calculated from the matrices Q, K, V as follows: attention_output is the finally computed offline-task-to-policy attention weight, and its value is derived from the three matrices Q, K, V obtained by linear mappings of the original entity set E. The term √d_k is a scaling factor for normalization; it prevents the output distribution of the softmax() function from becoming too steep and thus keeps gradient values stable during deep learning training. Here d_k is the number of columns of the Q and K matrices, i.e., the vector dimension, and the softmax() function normalizes the computed values into the (0, 1) interval, which likewise keeps the gradient values stable during training:
attention_output = Attention(Q, K, V) = softmax(QK^T / √d_k)V
further, in step S3, based on the attention weight vector set output by the attention module, under the feedback mechanism driven by the reward signal, the DRL model captures the attention weight beneficial to promoting the positive reward, and gradually filters out the relationship links that cause the scheduling result to be worse under the driving of the feedback reward, and the required relationship induction bias is obtained by performing the above-mentioned process iteratively.
Further, in step S3, the training process of the DRL model is:
1) Collecting the node states of the data center and the offline job information in the waiting queue at each time point during data center operation;
2) Integrating and combining this information into an entity set E for subsequent use;
3) The self-attention module performs weight calculation and encoding on E and outputs a set of weight vectors;
4) The policy of the DRL model is generated and optimized based on the set of attention weight vectors output by the self-attention module; guided by the reward-signal incentive mechanism, the DRL model gradually filters out invalid attention weights and retains the valid ones, thereby gradually constructing high-quality DRL model policy parameters;
5) Based on the continuously updated policy parameters, selecting appropriate scheduling actions according to the current data center state and obtaining the corresponding reward feedback signals, so that the DRL policy is optimized iteratively.
Further, the relational inductive bias allows the DRL model, during DRL training, to capture the different hybrid deployment state patterns formed jointly by the runtime behavior of currently running online jobs and by offline job tasks arriving with unknown characteristics, without analyzing them manually.
Further, the set of attention weight vectors output by the attention module captures the relational links between the data center node states and the offline job tasks waiting to be scheduled.
Further, each entity in the set of entities will calculate pairwise interactions with all other entities, including itself.
Further, the accumulation of the pairwise interactions with all other entities, including itself, becomes the final representation of the original entity.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
The invention applies an advantage actor-critic (A2C) reinforcement learning model, more advanced than Deep Q-Learning and Policy Gradient, to the data center task scheduling scenario in cloud computing resource management. To strengthen the model's understanding during reinforcement learning training, the load state information of the data center is efficiently encoded and ultimately converted into a relational inductive bias. This bias continuously improves the reinforcement learning model's understanding of the environment and of the actions it executes during training, so that a better scheduling policy is formed, and the model as a whole is applied to the task scheduling problem in the hybrid deployment data center scenario.
Drawings
FIG. 1 is a diagram of a model structure employed in the method of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The invention provides an offline job task scheduling algorithm, Co-ScheRRL, for the hybrid deployment data center scenario. Co-ScheRRL adopts a deep reinforcement learning (DRL) method and automatically learns a scheduling policy from historical experience. The DRL method is well suited to automatically learning scheduling policies without prior knowledge of workload characteristics. Co-ScheRRL encodes its scheduling policy in a neural network through past scheduling experience; during this process, Co-ScheRRL schedules offline jobs, observes the effects, and gradually refines the scheduling policy. In addition, to learn a high-quality scheduling policy effectively in a hybrid deployment data center scenario, a new information-processing representation must be developed and integrated into the DRL method, improving the DRL training technique.
First, the scheduling algorithm must be able to generalize to hundreds of online jobs and millions of offline jobs, and decisions must be made over the tens of possible data center runtime behaviors that may exist at each timestamp. This makes the problem more complex and larger in scale than existing DRL application scenarios (e.g., gaming and robotic control). In addition, the amount of information and the number of available choices in this scenario are also of a larger scale than in existing DRL application scenarios. Therefore, an improved self-attention module is introduced to process information about incoming offline jobs and the runtime behaviors of already deployed online jobs efficiently and effectively, without manual feature engineering.
The improved self-attention module is based on Google's self-attention model, the Transformer. The Transformer structure is as follows. The Transformer model is essentially a seq2seq model with attention, but whereas existing seq2seq models combine an RNN with an attention mechanism, the Transformer replaces the RNN layers entirely with a self-attention structure. Like most seq2seq models, the Transformer consists of an encoder and a decoder. The encoder consists of N identical layers, shown on the left side of FIG. 1. Each layer consists of two sub-layers: a multi-head self-attention mechanism and a fully connected feed-forward network. A residual connection and normalization are added around each sub-layer. The structure of the decoder is similar to that of the encoder.
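For illustration only, the following is a minimal sketch of one encoder layer of the kind described above, written in PyTorch: multi-head self-attention followed by a position-wise feed-forward network, each sub-layer wrapped with a residual connection and layer normalization. The dimensions (d_model = 64, 4 heads, d_ff = 256) are assumed example values and are not taken from the patent.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: multi-head self-attention and a
    feed-forward network, each wrapped with a residual connection + LayerNorm."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)     # sub-layer 1: self-attention
        x = self.norm1(x + attn_out)         # residual connection + normalization
        x = self.norm2(x + self.ff(x))       # sub-layer 2: feed-forward
        return x

# Example: one batch of 10 entities, each embedded in 64 dimensions.
layer = EncoderLayer()
out = layer(torch.randn(1, 10, 64))          # output shape: (1, 10, 64)
```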
The multi-head self-attention mechanism is the core of the self-attention module. Each piece of node state information and each piece of offline job information in the data center is treated as one entity, and an entity set E is constructed, which is then converted into the final representation of the data center node states and the offline job runtime behavior characteristics.
In particular, each entity in the entity set computes pairwise interactions with all other entities (including itself), and the accumulation of these pairwise interactions becomes the final representation of the original entity. This is similar to the message-passing mechanism in a graph neural network (GNN); a naive sketch of this accumulation is given below, and the actual calculation procedure used by the module follows it.
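The sketch below assumes a purely illustrative feature layout (3 node entities and 2 offline-job entities with 4 features each); the explicit double loop mirrors the message-passing description, while the module itself performs the vectorized Q, K, V computation shown next.

```python
import numpy as np

def build_entity_set(node_states, job_infos):
    """Stack per-node state vectors and per-job descriptors into one entity
    matrix E of shape (num_entities, feature_dim); both inputs are assumed
    to already be padded to the same feature width."""
    return np.vstack([np.asarray(node_states), np.asarray(job_infos)])

def accumulate_pairwise(E):
    """Every entity i computes a pairwise interaction with every entity j
    (itself included) and accumulates the interaction-weighted messages,
    which become entity i's new representation."""
    n, _ = E.shape
    out = np.zeros_like(E)
    for i in range(n):
        for j in range(n):                    # j == i is included
            interaction = float(E[i] @ E[j])  # simple dot-product interaction
            out[i] += interaction * E[j]      # accumulate the message from j
    return out

# Hypothetical example: 3 node-state entities and 2 offline-job entities.
E = build_entity_set(np.random.rand(3, 4), np.random.rand(2, 4))
reps = accumulate_pairwise(E)                 # shape: (5, 4)
```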
The constructed entity set E is first mapped into three matrices Q, K, V. The attention output value is then calculated with the following formula. The set of attention weight vectors obtained from this calculation is used to enhance the decision-making capability of the reinforcement learning agent.
attention_output = Attention(Q, K, V) = softmax(QK^T / √d_k)V
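Written out in code, the formula can be sketched as follows (NumPy); the projection matrices W_q, W_k, W_v stand in for the learned linear mappings of E, and all dimensions are assumed for illustration. The rows of the result are the attention output vectors referred to above.

```python
import numpy as np

def attention_output(E, W_q, W_k, W_v):
    """Scaled dot-product self-attention over the entity set E:
    softmax(Q K^T / sqrt(d_k)) V, with Q, K, V linear mappings of E."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # scaled pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over entities
    return weights @ V                               # attention output vectors

# Hypothetical dimensions: 5 entities with 4 features each, projected to d_k = 8.
E = np.random.rand(5, 4)
W_q, W_k, W_v = (np.random.rand(4, 8) for _ in range(3))
out = attention_output(E, W_q, W_k, W_v)             # shape: (5, 8)
```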
Second, existing DRL algorithms cannot handle scenarios with continuously arriving offline job distributions, nor decision-making scenarios in which already deployed LRAs may change their runtime behavior. For example, the quality of a decision may be attributable either to the quality of the current policy or to the state pattern formed by the offline and online jobs currently co-located in the same hybrid deployment. Therefore, effectively understanding the different hybrid deployment state patterns during DRL model training is key to obtaining a high-quality scheduling policy. To handle different hybrid deployment state patterns during training, feature representations are extracted from the output of the improved self-attention module described above and then converted into a relational inductive bias during DRL training.
Specifically, the relational inductive bias is generated as follows. The set of attention weight vectors output by the attention module captures the relational links between the data center node states and the offline job tasks waiting to be scheduled. Under a feedback mechanism driven by the reward signal, the DRL model captures attention weight values that help promote positive rewards and, driven by the feedback rewards, gradually filters out relational links that make the scheduling results worse. By performing this process iteratively, the desired relational inductive bias can be obtained. During feedback-reward training, the relational inductive bias is believed to learn some general, abstract concepts, allowing successful scheduling and generalization to data center node state patterns it has not observed before. In this way, the relational inductive bias helps the DRL model capture the different hybrid deployment state patterns formed jointly by the runtime behavior of currently running online jobs and by offline job tasks arriving with unknown characteristics, without analyzing them manually during DRL training. This helps build a high-quality scheduling policy over the various hybrid deployment states. The Co-ScheRRL training procedure is as follows.
1) At each time point during data center operation, Co-ScheRRL first gathers the node states of the data center and the offline job information in the waiting queue.
2) Co-ScheRRL consolidates this information into one entity set E for subsequent use.
3) The Co-ScheRRL self-attention module performs weight calculation and encoding on E and outputs a set of weight vectors.
4) The DRL model of Co-ScheRRL receives the set of attention weight vectors output by the attention module; its policy is generated and optimized based on this set, and, guided by the reward-signal incentive mechanism, the DRL model gradually filters out invalid attention weights and retains the valid ones, thereby gradually constructing high-quality DRL model policy parameters.
5) Based on the continuously updated policy parameters, Co-ScheRRL selects appropriate scheduling actions according to the current data center state and obtains the corresponding reward feedback signals, so that the DRL policy is optimized iteratively; a minimal training-loop sketch in this spirit is given below.
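To make the loop above concrete, the following is a minimal, self-contained advantage actor-critic (A2C) sketch in the spirit of steps 1) to 5). The toy environment, network sizes, and hyper-parameters are invented stand-ins for the real hybrid-deployment simulator and the Co-ScheRRL policy network; in the actual algorithm, the state fed to the actor-critic would be the set of attention vectors produced by the self-attention module rather than a random vector.

```python
import torch
import torch.nn as nn

class ToyDatacenterEnv:
    """Stand-in for the hybrid-deployment simulator: the 'state' is a random
    8-dimensional summary vector and the reward favours one of two actions."""
    def reset(self):
        self.state = torch.randn(8)
        return self.state

    def step(self, action):
        reward = 1.0 if action == int(self.state.sum() > 0) else -1.0
        self.state = torch.randn(8)
        return self.state, reward

class ActorCritic(nn.Module):
    """Shared body with a policy (actor) head and a value (critic) head."""
    def __init__(self, state_dim=8, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh())
        self.pi = nn.Linear(32, n_actions)    # policy head
        self.v = nn.Linear(32, 1)             # value head

    def forward(self, s):
        h = self.body(s)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h).squeeze()

env, net = ToyDatacenterEnv(), ActorCritic()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
state = env.reset()
for _ in range(500):                               # steps 1) to 5), iterated
    dist, value = net(state)                       # encode state, evaluate it
    action = dist.sample()                         # choose a scheduling action
    next_state, reward = env.step(action.item())   # apply it, observe the reward
    with torch.no_grad():
        _, next_value = net(next_state)
    advantage = reward + 0.99 * next_value - value          # one-step A2C advantage
    loss = (-dist.log_prob(action) * advantage.detach()     # policy-gradient term
            + advantage.pow(2)                              # critic (value) term
            - 0.01 * dist.entropy())                        # exploration bonus
    opt.zero_grad()
    loss.backward()
    opt.step()
    state = next_state
```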
The same or similar reference numerals correspond to the same or similar components;
the positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (7)

1. An offline job task scheduling algorithm for a hybrid deployment data center scenario, characterized by comprising the following steps:
S1: treating each piece of node state information and offline job information in the data center as an entity, and constructing an entity set E;
S2: converting the entity set into a feature representation of the data center node states and the offline job runtime behavior;
S3: converting the feature representation into a relational inductive bias during deep reinforcement learning (DRL) training;
mapping the constructed entity set E into three matrices Q, K, V, calculating attention output values using the matrices Q, K, V, and using the resulting set of attention output vectors to enhance the decision-making capability of the reinforcement learning agent;
the attention output value is calculated from the matrices Q, K, V as follows: attention_output is the finally computed offline-task-to-policy attention weight, and its value is derived from the three matrices Q, K, V obtained by linear mappings of the original entity set E; √d_k is a scaling factor for normalization that prevents the output distribution of the softmax() function from becoming too steep and thus keeps gradient values stable during deep learning training; d_k is the number of columns of the Q and K matrices, i.e., the vector dimension; and the softmax() function normalizes the computed values into the (0, 1) interval, which likewise keeps the gradient values stable during training:
attention_output = Attention(Q, K, V) = softmax(QK^T / √d_k)V
in step S3, based on the set of attention weight vectors output by the attention module, and under the feedback mechanism driven by the reward signal, the DRL model captures the attention weight values that help promote positive rewards and, driven by the feedback rewards, gradually filters out the relational links that make the scheduling results worse; the required relational inductive bias is obtained by performing this process iteratively.
2. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 1, wherein each entity in the entity set computes pairwise interactions with all other entities, and the accumulation of these pairwise interactions becomes the behavior feature representation of the original entity.
3. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 1, wherein in step S3, the training process of the DRL model is:
1) Collecting the node states of the data center and the offline job information in the waiting queue at each time point during data center operation;
2) Integrating and combining this information into an entity set E for subsequent use;
3) The self-attention module performs weight calculation and encoding on E and outputs a set of weight vectors;
4) The policy of the DRL model is generated and optimized based on the set of attention weight vectors output by the self-attention module; guided by the reward-signal incentive mechanism, the DRL model gradually filters out invalid attention weights and retains the valid ones, thereby gradually constructing high-quality DRL model policy parameters;
5) Based on the continuously updated policy parameters, selecting appropriate scheduling actions according to the current data center state and obtaining the corresponding reward feedback signals, so that the DRL policy is optimized iteratively.
4. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 3, wherein the relational inductive bias enables the DRL model, during DRL training, to capture the different hybrid deployment state patterns formed jointly by the runtime behavior of currently running online jobs and by offline job tasks arriving with unknown characteristics, without analyzing them manually.
5. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 3, wherein the set of attention weight vectors output by the attention module captures the relational links between the data center node states and the offline job tasks waiting to be scheduled.
6. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 5, wherein each entity in the entity set computes pairwise interactions with all other entities, including itself.
7. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 5, wherein the accumulation of the pairwise interactions with all other entities, including itself, becomes the final representation of the original entity.
CN202111089490.3A 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene Active CN113821323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111089490.3A CN113821323B (en) 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111089490.3A CN113821323B (en) 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene

Publications (2)

Publication Number Publication Date
CN113821323A CN113821323A (en) 2021-12-21
CN113821323B (en) 2023-09-19

Family

ID=78922347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111089490.3A Active CN113821323B (en) 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene

Country Status (1)

Country Link
CN (1) CN113821323B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580937B (en) * 2022-03-10 2023-04-28 暨南大学 Intelligent job scheduling system based on reinforcement learning and attention mechanism


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11620158B2 (en) * 2020-01-15 2023-04-04 B.G. Negev Technologies & Applications Ltd. At Ben-Gurion University Multi-objective scheduling system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948070A (en) * 2019-12-10 2021-06-11 百度(美国)有限责任公司 Method for processing data by a data processing accelerator and data processing accelerator
CN111258734A (en) * 2020-01-16 2020-06-09 中国人民解放军国防科技大学 Deep learning task scheduling method based on reinforcement learning
CN112486641A (en) * 2020-11-18 2021-03-12 鹏城实验室 Task scheduling method based on graph neural network
CN112631750A (en) * 2020-12-21 2021-04-09 中山大学 Predictive online scheduling and mixed task deployment method based on compressed sensing and oriented to cloud data center
CN112929849A (en) * 2021-01-27 2021-06-08 南京航空航天大学 Reliable vehicle-mounted edge calculation unloading method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Analysis of workload characteristics and task scheduling optimization in co-located data centers; Wang Jiwei et al.; Computer Engineering & Science; Vol. 42, No. 1; full text *

Also Published As

Publication number Publication date
CN113821323A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
Xin et al. Application of deep reinforcement learning in mobile robot path planning
CN108900358B (en) Virtual network function dynamic migration method based on deep belief network resource demand prediction
CN113191484B (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
Xu et al. Evolutionary extreme learning machine–based on particle swarm optimization
CN111966484A (en) Cluster resource management and task scheduling method and system based on deep reinforcement learning
Zhu et al. A deep-reinforcement-learning-based optimization approach for real-time scheduling in cloud manufacturing
CN112052948B (en) Network model compression method and device, storage medium and electronic equipment
CN113361680A (en) Neural network architecture searching method, device, equipment and medium
CN113821323B (en) Offline job task scheduling algorithm for mixed deployment data center scene
CN112732436B (en) Deep reinforcement learning acceleration method of multi-core processor-single graphics processor
CN111198550A (en) Cloud intelligent production optimization scheduling on-line decision method and system based on case reasoning
Moon et al. Smart manufacturing scheduling system: DQN based on cooperative edge computing
Long et al. Complexity-aware adaptive training and inference for edge-cloud distributed AI systems
Bi et al. Multi-swarm Genetic Gray Wolf Optimizer with Embedded Autoencoders for High-dimensional Expensive Problems
Dasgupta et al. Adaptive computational chemotaxis in bacterial foraging algorithm
CN110351561A (en) A kind of efficient intensified learning training method for video encoding optimization
CN111950690A (en) Efficient reinforcement learning strategy model with self-adaptive capacity
CN111950691A (en) Reinforced learning strategy learning method based on potential action representation space
CN110046746B (en) Scheduling method of online public opinion device based on reinforcement learning
CN115989503A (en) Method, computer program product and Reinforced Learning (RL) system for state engineering of the RL system
CN112306641B (en) Training method for virtual machine migration model
Zhai et al. Multi-swarm genetic gray wolf optimizer with embedded autoencoders for high-dimensional expensive problems
Zhao et al. A hybrid approach based on artificial neural network and genetic algorithm for job-shop scheduling problem
CN110618626B (en) Communication energy consumption balancing method and device for multi-unmanned platform cooperative formation maintenance
Ding et al. Guest Editorial Evolutionary Computation Meets Deep Learning

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant