CN117667360B - Intelligent computing network scheduling method for computing and communication fusion of large model task - Google Patents


Info

Publication number: CN117667360B
Authority: CN (China)
Prior art keywords: computing, task, network, representing, time
Legal status: Active
Application number: CN202410130270.8A
Other languages: Chinese (zh)
Other versions: CN117667360A (en)
Inventors: 陈晓红, 唐鸿凯, 梁伟, 杨秋月, 黄素珍
Current Assignee: Xiangjiang Laboratory
Original Assignee: Xiangjiang Laboratory
Application filed by Xiangjiang Laboratory
Priority to CN202410130270.8A
Publication of CN117667360A
Application granted
Publication of CN117667360B


Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02D — Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention provides an intelligent computing network scheduling method for computation and communication fusion of large model tasks, which belongs to the technical field of data processing and specifically comprises the following steps: establishing a scheduling optimization objective function; designing a deep reinforcement learning environment, using the scheduling optimization objective function as the reward function, and forming a Markov process; extracting state characteristics from the current large model task; the deep reinforcement learning agent makes a scheduling strategy according to the time sequence characteristics and the state characteristics; calculating a predicted reward value from the reward function; after the large model task has been executed on the computing node, obtaining a complete Markov process and storing it in an experience pool; and constructing a layered experience pool, jointly training the multi-head attention layer and the prediction network, and jointly training the multi-head attention layer and the Q network on the new Markov processes whose predicted rewards are calculated from the prediction feedback. By the scheme of the invention, the scheduling efficiency, precision and adaptability for large model tasks are improved.

Description

Intelligent computing network scheduling method for computing and communication fusion of large model task
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to an intelligent computing network scheduling method for computing and communication fusion of large model tasks.
Background
At present, with the development of generative AI large model technology, training large models in the cloud has become the norm. Geographically dispersed computing nodes form a complex computing power network. The computing power network dispatching center needs to dispatch large model tasks to suitable computing nodes and allocate computing resources of an appropriate specification and size to complete the training and inference of the large models. However, the heterogeneity of the computing resources and the dynamic nature of the network resources present a significant challenge to the real-time scheduling of tasks, and the heterogeneity and geographic distribution of cloud computing resources bring a series of challenges for resource management and task scheduling. The task scheduling problem over distributed computing nodes is widely recognized as NP-hard. The load of the computing resources and the states of the optical fiber links change dynamically, and it is difficult for existing heuristic or meta-heuristic algorithms to capture these dynamically changing states and to find a satisfactory solution within polynomial time. Although deep reinforcement learning can solve resource allocation optimization in a dynamic environment, a large model task scheduled for training on a cloud server can take anywhere from hours to days to execute. The reward signals required for deep reinforcement learning are therefore discrete and sparse; it is difficult for an agent to accurately estimate the long-term return of each action, and the difficulty of algorithm convergence may lead to a relatively suboptimal scheduling strategy.
Therefore, an intelligent computing network scheduling method for integrating computation and communication of large model-oriented tasks with high scheduling efficiency, accuracy and adaptability is needed.
Disclosure of Invention
In view of the above, the embodiment of the invention provides an intelligent computing network scheduling method for integrating computation and communication of large model tasks, which at least partially solves the problems of poor scheduling efficiency, accuracy and adaptability in the prior art.
The embodiment of the invention provides an intelligent computing network scheduling method for computing and communication fusion of large model tasks, which comprises the following steps:
step 1, establishing a scheduling optimization objective function with the aim of minimizing the energy efficiency of the computing power network system according to the states of the computing resources and communication resources of the computing power network;
step 2, designing a deep reinforcement learning environment, using the scheduling optimization objective function as the reward function, and combining the states of the computing resources and communication resources of the computing power network, the scheduling strategy, the predicted reward value, the real reward value and the discount factor to jointly form a Markov process;
step 3, based on the multi-head attention layer, carrying out time sequence modeling on the states of the communication resources and computing resources in the computing power network over the past T time slots to obtain time sequence characteristics, and extracting state characteristics from the current large model task;
step 4, the deep reinforcement learning agent makes a scheduling strategy according to the time sequence characteristics and the state characteristics, wherein the scheduling strategy comprises determining the computing node of the computing power network, the type and size of the computing resources, and the size of the communication resources;
step 5, the prediction network predicts the execution time and energy consumption of the large model task according to the scheduling strategy and the state of the computing resources, and a predicted reward value is calculated by the reward function in combination with the load balancing information directly fed back by the computing power network;
step 6, after execution of the large model task on the computing node is finished, a real reward value is calculated by the reward function according to the real execution time, energy consumption and load balancing information, and a complete Markov process is obtained and stored in the experience pool;
and step 7, after the deep reinforcement learning has interacted with the real computing power network environment a preset number of times, extracting continuous experience samples with a window larger than T from the experience pool to construct a layered experience pool, jointly training the multi-head attention layer and the prediction network with the computing-node working trajectories formed by the real rewards in the layered experience pool, and jointly training the multi-head attention layer and the Q network with the new Markov processes formed from the predicted rewards.
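For illustration only, the seven steps above can be sketched as the following scheduling loop in Python; env, agent, prediction_net, replay_pool and all of their methods are hypothetical stand-ins introduced here for readability and are not part of the claimed method.

def run_scheduler(env, agent, prediction_net, replay_pool, T=8, warmup=1000):
    history = []                                   # states of the past T time slots (step 3)
    for slot in range(env.num_slots):
        state = env.observe()                      # computing + communication resource state
        history = (history + [state])[-T:]
        task = env.current_task()

        temporal_feat, task_feat = agent.extract_features(history, task)      # step 3
        action = agent.select_action(temporal_feat, task_feat)                # step 4

        pred_time, pred_energy = prediction_net.predict(task, state, action)  # step 5
        load_balance = env.instant_load_feedback(action)
        pred_reward = env.reward_fn(pred_time, pred_energy, load_balance)

        env.dispatch(task, action)                 # run the task on the chosen node
        if env.task_finished(task):                # step 6: delayed, real feedback
            real_reward = env.reward_fn(*env.real_cost(task), env.load_feedback(task))
            replay_pool.store(state, action, pred_reward, real_reward, env.observe())

        if slot > warmup:                          # step 7: hierarchical training
            agent.train_attention_and_prediction(replay_pool.real_trajectories(window=T))
            agent.train_attention_and_q_network(replay_pool.predicted_transitions())
    return agent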
According to a specific implementation manner of the embodiment of the present invention, the step 1 specifically includes:
step 1.1, calculating the minimum total power loss of a single wavelength from the computing power network dispatching center to each computing node, using the minimum sensitivity of the optical receiver, according to the length of the optical fiber link and the insertion loss of the optical fiber equipment on the link;
step 1.2, calculating the number of time slots occupied by the data transmission of the current large model task according to the task sent from the computing power network dispatching center to the target computing node;
step 1.3, calculating the total energy consumption of the data transmission of the current large model task according to the minimum total power loss and the number of time slots occupied by the data transmission;
step 1.4, calculating the task execution time according to the type of the current large model task, the floating point computation amount required by the task, the time scale parameter with which the selected type of computing resource processes this task type, and the floating point processing rate of the computing node;
step 1.5, calculating the energy consumption of task execution according to the power consumption ratio parameter of the selected computing resource when executing this task type and the task execution time;
step 1.6, calculating the load balancing condition of the computing power network according to the load of the computing node executing the current task and the capacity constraint boundary of the computing resource;
step 1.7, establishing the scheduling optimization objective function according to the above information; the objective function minimizes, over all tasks entering the dispatching center for scheduling, a weighted sum of transmission and execution delay and transmission and execution energy consumption, where the two trade-off coefficients between the shortest task delay and the total energy consumption determine the relative importance of delay and energy consumption in the scheduling strategy, and the quantities involved include the bandwidth allocated to each transmitted task, the maximum transmission bandwidth of the optical fiber link from the dispatching center to each computing node, the load of each type of computing resource at the beginning of the time slot, the tasks currently being transmitted and currently being computed in the computing power network, the maximum bandwidth a single beam can carry, the maximum available beams on each optical fiber link, the maximum capacity of each computing resource on each computing node, and the total number of computing resource types in the computing power network; constraint C1 constrains the offloading policy variables of a large model task, constraint C2 states that each large model task can only be scheduled to one computing node and can only be allocated one type of computing resource of that node, and that each task can only be scheduled once, constraint C3 states that the transmission bandwidth allocated to each task is limited so that the maximum bandwidth of the optical fiber link to the computing node is not exceeded, avoiding network congestion and communication performance degradation, constraint C4 states that the beams required to transmit the task data cannot exceed the maximum available beams on the optical fiber link, constraint C5 states that the sum of the allocations of each computing resource type on each computing node must not exceed the maximum computing capacity of that computing resource type, and constraint C6 states that the allocation of large model tasks keeps the load of each type of computing resource within a reasonable range.
According to a specific implementation manner of the embodiment of the present invention, the step 2 specifically includes:
step 2.1, expressing the state of the computing power network in a time slot as the combination of the task to be scheduled in the current time slot, the state of each computing node and the state of each optical fiber; the state of a computing node is the set of states of each type of computing resource in that node, and the state of each optical fiber between the dispatching center and a computing node is the set of states of the DWDM beams on that fiber;
step 2.2, defining the action space as the combination of the possible computing resource scheduling actions for the current task, including the selection of the computing node, the selection of the computing resource type and the selection of the corresponding specification size, and the possible communication resource scheduling actions;
step 2.3, designing the deep reinforcement learning reward function according to the scheduling optimization objective function, wherein a hyper-parameter controls the habit preference learned by the agent: a larger value makes the agent more inclined to execute tasks with minimal time and energy consumption during learning, whereas a smaller value makes the agent, whose aim is to improve the overall energy efficiency of the computing power network, focus more on load balancing of the computing resources; the reward function about time and energy consumption uses a further hyper-parameter to keep the time term and the energy term at the same order of magnitude, and is combined with a reward function related to load balancing, in which the agent obtains a fixed reward in each time slot; the goal is to make the neural network learn a strategy that maximizes the discounted cumulative reward over the tasks already scheduled up to the current time slot;
step 2.4, modeling the scheduling problem as a Markov process consisting of the state space of the environment, the action space of the scheduler, the state transition probability, the reward function and the discount factor; in each time slot, the scheduler selects an action according to the current state, and the environment then transitions to a new state according to the state transition probability; the argument of the reward function comes either from the reward value predicted by the multi-layer perceptron or from the real reward value fed back by the environment.
According to a specific implementation manner of the embodiment of the present invention, the step 3 specifically includes:
step 3.1, in each time slot, generating the embedded vectors of the computing resources and the communication resources of each computing node according to the states of the communication resources and the computing resources in the computing power network;
step 3.2, inputting the past T time sequence states and the state of the current time slot of each computing node into the first multi-head attention layer for time sequence modeling, and splicing the computing and communication resource embedded vectors of each computing node c in each time slot to obtain the time sequence characteristics;
step 3.3, obtaining the embedded representation of the state of the current large model task through the embedding layer, and extracting the state characteristics using the second multi-head attention layer.
According to a specific implementation of an embodiment of the present invention, the prediction network includes two multi-layer perceptrons.
According to a specific implementation manner of the embodiment of the present invention, the step 5 specifically includes:
step 5.1, taking the features of the current large model task and the features of the computing node to which the current large model task is directed as the common input of the two multi-layer perceptrons;
step 5.2, inputting the other computation-related actions taken by the agent, namely the selected computing resource type in the computing node and the size of the computing resource, into the first multi-layer perceptron, whose output is the predicted task execution time and energy consumption;
step 5.3, inputting the bandwidth allocated for transmitting the current large model task into the second multi-layer perceptron, whose output is the predicted number of time slots occupied by the data transmission and the total transmission energy consumption; these predictions are input into the reward function together with the load balancing state of the computing nodes of the computing power network to form the predicted reward value r.
According to a specific implementation manner of the embodiment of the present invention, the step 6 specifically includes:
after the large model task has been executed, obtaining the real execution time and energy consumption of the task and the real transmission time and energy consumption of the task, inputting them into the reward function to obtain the real reward value, and storing the Markov process of the task in that time slot into the experience pool used by the deep reinforcement learning.
According to a specific implementation manner of the embodiment of the present invention, the step 7 specifically includes:
step 7.1, jointly training the multi-head attention layer responsible for environment state feature extraction and the prediction network with the real computation and communication time and energy consumption of the large model tasks fed back by the environment; the stored experience is organized, per computing node, into the time sequence states of that node, comprising the states of each type of computing resource on the node and the states of the DWDM beams on the optical fiber link between the dispatching center and the node, together with the tasks scheduled in the past time slots, in the order of the computing nodes, to constitute the data set on which the prediction network is trained with a mean square error loss;
step 7.2, storing the learning experience of each time slot in the data set, from which the agent randomly extracts continuous experience samples with a window larger than T to train the multi-head attention layer, and randomly extracts a batch of experience samples to form a training data set with which the parameters of the Q network are updated.
The intelligent computing network scheduling scheme for computation and communication fusion of large-model-oriented tasks in the embodiment of the invention comprises the following steps: step 1, establishing a scheduling optimization objective function with the aim of minimizing the energy efficiency of the computing power network system according to the states of the computing resources and communication resources of the computing power network; step 2, designing a deep reinforcement learning environment, using the scheduling optimization objective function as the reward function, and combining the states of the computing resources and communication resources of the computing power network, the scheduling strategy, the predicted reward value, the real reward value and the discount factor to jointly form a Markov process; step 3, based on the multi-head attention layer, carrying out time sequence modeling on the states of the communication resources and computing resources in the computing power network over the past T time slots to obtain time sequence characteristics, and extracting state characteristics from the current large model task; step 4, the deep reinforcement learning agent makes a scheduling strategy according to the time sequence characteristics and the state characteristics, wherein the scheduling strategy comprises determining the computing node of the computing power network, the type and size of the computing resources, and the size of the communication resources; step 5, the prediction network predicts the execution time and energy consumption of the large model task according to the scheduling strategy and the state of the computing resources, and a predicted reward value is calculated by the reward function in combination with the load balancing information directly fed back by the computing power network; step 6, after execution of the large model task on the computing node is finished, a real reward value is calculated by the reward function according to the real execution time, energy consumption and load balancing information, and a complete Markov process is obtained and stored in the experience pool; and step 7, after the deep reinforcement learning has interacted with the real computing power network environment a preset number of times, extracting continuous experience samples with a window larger than T from the experience pool to construct a layered experience pool, jointly training the multi-head attention layer and the prediction network with the computing-node working trajectories formed by the real rewards in the layered experience pool, and jointly training the multi-head attention layer and the Q network with the new Markov processes formed from the predicted rewards.
The embodiment of the invention has the beneficial effects that: according to the scheme of the invention, a multi-head attention mechanism extracts time sequence characteristics of dynamic change conditions of a network, deep reinforcement learning makes a scheduling strategy according to the characteristics of tasks and the time sequence characteristics of the network, and meanwhile, a large model execution time and energy consumption prediction network based on a Markov chain is developed and guides reinforcement learning decision-making process together with instant feedback signals of the network about task deployment. The experience playback mechanism is redesigned to better train the intelligent agent, the prediction network and the Q network are trained in layers, the multi-head attention layer for extracting the characteristics adopts a shared mechanism, the real delayed reward signals given by the environment are utilized, the prediction accuracy of time and cost is remarkably improved, the multi-head attention layer for extracting the characteristics and modeling in time sequence is effectively trained, the task scheduling process is optimized, and the scheduling efficiency, the precision and the adaptability are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an intelligent computing network scheduling method for integrating computation and communication of tasks facing a large model according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a corresponding system model of an intelligent computing network scheduling method for integrating computation and communication of tasks oriented to a large model according to an embodiment of the present disclosure;
FIG. 3 is a model overview of a method for computing and communication fusion intelligent computing network scheduling for large model tasks according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of an empirical playback mechanism for hierarchical training of a prediction network and a Q network according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. The invention may also be practiced or carried out in other, different embodiments, and the details of the present description may be modified or varied in various ways without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without making any inventive effort are within the scope of the invention.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the invention provides an intelligent computing network scheduling method for computing and communication fusion of large model tasks, which can be applied to the cloud computing process of an Internet scene.
Referring to fig. 1, a flow chart of an intelligent computing network scheduling method for integrating computation and communication of tasks oriented to a large model is provided in an embodiment of the present invention. As shown in fig. 1, the method mainly comprises the following steps:
step 1, establishing a dispatching optimization objective function with the aim of minimizing the energy efficiency of a computing power network system according to the states of computing resources and communication resources of the computing power network;
further, the step 1 specifically includes:
step 1.1, calculating the minimum total power loss of a single wavelength from a computing power network dispatching center to each computing node by using the minimum sensitivity of an optical receiver according to the length of an optical fiber link and the insertion loss of optical fiber equipment on the link
step 1.2, calculating the number of time slots occupied by the data transmission of the current large model task according to the task sent from the computing power network dispatching center to the target computing node;
step 1.3, calculating the total energy consumption of the data transmission of the current large model task according to the minimum total power loss and the number of time slots occupied by the data transmission;
step 1.4, calculating the task execution time according to the type of the current large model task, the floating point computation amount required by the task, the time scale parameter with which the selected type of computing resource processes this task type, and the floating point processing rate of the computing node;
step 1.5, calculating the energy consumption of task execution according to the power consumption ratio parameter of the selected computing resource when executing this task type and the task execution time;
step 1.6, calculating the load balancing condition of the computing power network according to the load of the computing node executing the current task and the capacity constraint boundary of the computing resource;
step 1.7, establishing the scheduling optimization objective function according to the above information; the objective function minimizes, over all tasks entering the dispatching center for scheduling, a weighted sum of transmission and execution delay and transmission and execution energy consumption, where the two trade-off coefficients between the shortest task delay and the total energy consumption determine the relative importance of delay and energy consumption in the scheduling strategy, and the quantities involved include the bandwidth allocated to each transmitted task, the maximum transmission bandwidth of the optical fiber link from the dispatching center to each computing node, the load of each type of computing resource at the beginning of the time slot, the tasks currently being transmitted and currently being computed in the computing power network, the maximum bandwidth a single beam can carry, the maximum available beams on each optical fiber link, the maximum capacity of each computing resource on each computing node, and the total number of computing resource types in the computing power network; constraint C1 constrains the offloading policy variables of a large model task, constraint C2 states that each large model task can only be scheduled to one computing node and can only be allocated one type of computing resource of that node, and that each task can only be scheduled once, constraint C3 states that the transmission bandwidth allocated to each task is limited so that the maximum bandwidth of the optical fiber link to the computing node is not exceeded, avoiding network congestion and communication performance degradation, constraint C4 states that the beams required to transmit the task data cannot exceed the maximum available beams on the optical fiber link, constraint C5 states that the sum of the allocations of each computing resource type on each computing node must not exceed the maximum computing capacity of that computing resource type, and constraint C6 states that the allocation of large model tasks keeps the load of each type of computing resource within a reasonable range.
In particular, considering that communication between computing nodes distributed at different geographic locations is carried by an all-optical base network formed of optical fibers, dense wavelength division multiplexing (DWDM) is used in single-mode optical fiber to support high-bandwidth, long-distance data transmission. Each computing node is connected to the computing power network dispatching center by an optical fiber link of a given length; each fiber link supports a number of beams, each beam has a fixed bandwidth, and the data transmission rate that a single beam can carry is a fixed number of Gbps, providing an end-to-end data transport service.
The loss of the optical fiber link mainly comprises the insertion losses of the Dispersion Compensation Module (DCM), the Erbium-Doped Fiber Amplifier (EDFA), the Reconfigurable Optical Add-Drop Multiplexer (ROADM) and similar devices. The length and loss of the fiber links and the need for network flexibility affect the number and locations of EDFA and ROADM deployments on the fiber.
In the optical transmitter, a Mux-Demux Unit (MDU) directly provides multiple beams to the user; the multiplexed signal modifies the routing of beams through the Add Ports and Drop Ports of ROADMs deployed at the network edge and switches wavelengths as required by the network, while ROADMs are also deployed at hub sections of the optical fiber communication network. The degree of a ROADM refers to its number of Add Ports and Drop Ports and indicates how many beams the ROADM device can handle simultaneously; increasing the degree introduces additional optical elements and optical paths, resulting in higher insertion loss. Furthermore, directionless ROADMs, typically co-deployed with conventional ROADMs, can increase the flexibility of the optical communication network, enabling it to handle routing and wavelength management of optical signals more efficiently.
When the dispersion loss exceeds a certain threshold during transmission, a DCM needs to be deployed at a suitable position for dispersion compensation, and an EDFA is used to increase the gain of the optical signal. The positions of the DCM and the EDFA in the fiber are critical: each additional DCM in the link introduces an insertion loss of 4 dB, which needs to be compensated in the pre-amplifier (PA) of the EDFA. For a task sent from the computing power network dispatching center to a computing node, the number of beams actually used for transmitting the task data is the allocated bandwidth divided by the maximum bandwidth that a single beam can carry, rounded up to an integer. The output power of each beam amplified by the EDFA before the optical receiver is computed accordingly, and, using the lowest sensitivity of the optical receiver, the minimum total power loss of the fiber link used to transmit the task is obtained by accounting for the insertion losses caused by the ROADM, the directionless ROADM and the MDU, all expressed in dB.
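As a hedged illustration of the two calculations just described, the beam count and an accumulated link-loss figure might be computed as follows in Python; the per-kilometre fiber attenuation argument, the additive loss form and all function and parameter names are assumptions for illustration, not values taken from the patent.

import math

def beams_required(allocated_bandwidth: float, beam_bandwidth: float) -> int:
    # Number of DWDM beams used for one task: allocated bandwidth divided by the
    # maximum bandwidth one beam can carry, rounded up ("upward integer" above).
    return math.ceil(allocated_bandwidth / beam_bandwidth)

def min_total_power_loss_db(fiber_length_km: float, fiber_loss_db_per_km: float,
                            num_dcm: int, roadm_loss_db: float,
                            directionless_roadm_loss_db: float, mdu_loss_db: float) -> float:
    # Illustrative accumulation of the link losses named above; the per-kilometre
    # attenuation term and the additive form are assumptions, not patent values.
    dcm_loss_db = 4.0 * num_dcm   # each additional DCM adds 4 dB of insertion loss
    return (fiber_length_km * fiber_loss_db_per_km + dcm_loss_db
            + roadm_loss_db + directionless_roadm_loss_db + mdu_loss_db)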
The number of time slots occupied by the data transmission of the large model task is then calculated, and the total energy consumption of the task data transmission follows from the minimum total power loss and the number of occupied time slots.
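A minimal sketch of how the occupied transmission slots and the transmission energy could be derived from these quantities is given below; the exact formulas are not reproduced in this text, so the linear forms, function names and parameters here are assumptions only.

import math

def transmission_slots(task_data_gbit: float, beams: int,
                       beam_rate_gbps: float, slot_seconds: float) -> int:
    # Assumed form: slots = ceil(data volume / aggregate beam rate / slot length).
    return math.ceil(task_data_gbit / (beams * beam_rate_gbps) / slot_seconds)

def transmission_energy_j(slots: int, slot_seconds: float, transmit_power_w: float) -> float:
    # Assumed form: energy grows linearly with the occupied slots; the transmit power
    # needed to overcome the minimum total link loss is taken as an input here.
    return slots * slot_seconds * transmit_power_w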
The execution of a large model computing task is mainly influenced by two factors: the type of the large model task and the number of floating point operations required by the task. A computing node executes the large model task with a computing resource of a given type architecture that has a certain floating point processing rate, and the task execution time follows from these quantities.
For the same large model, there is often a significant difference in processing rate and power consumption on different types of chips. A fitting parameter characterizes a computing node executing a given task type on a given computing resource, a time scale parameter characterizes the execution time of that task type on that resource type, and a proportional parameter characterizes its power consumption. The energy consumption of task execution is then expressed in terms of this power-consumption parameter and the task execution time.
task scheduling problems in a computational network can be modeled as an MDP, usingAnd (3) representing. We define at slot +.>The environmental state of (2) is->Wherein->Is the task to be scheduled for the current slot. The state of each computing node is denoted +.>Wherein->Indicate->Status of individual computing nodes->Is the total number of computing nodes, +.>Wherein->Indicate->First->Status of individual computing resources- >Is the total number of computing resources in the computing node. The state of each fiber is denoted +.>Wherein->Representing the computing network dispatch center->And->Status of fiber links between computing nodes, +.>Wherein->Indicating optical fiber->Upper firstStatus of individual DWDM beams, +.>Is the total number of beams on the fiber.
The action space contains the operation of allocating the currently available communication and computing resources to tasks, we are in time slotsThe action taken by a task is defined as +.>Wherein->Representing the current task->Possible computing resource scheduling actions, including selection of computing nodes, selection of computing resource types, and selection of corresponding specification sizes, determine the time and corresponding energy consumption of tasks during the computing process. />Representing possible communication resource scheduling actions, including selection of bandwidth size for data transmission, which will affect time and energy consumption during transmission.
In a given time slot, a power network dispatch centerResponsible for big model tasks of the user->Offloading to computing node->Go up and use->Representing task->Has been successfully offloaded to the computing nodePoint->And (3) upper part. Furthermore, use->Representing execution of the task->The type used in the execution is +. >Computing resources. Task->Related data->Transmission from a computational network dispatch center to computational nodes by DWDM technology, which occupies bandwidth +.>And (3) representing.
The load imposed on a computing node by executing a large model computing task with a given computing resource, and the total load of that computing resource on the node, are accumulated accordingly; at the beginning of a time slot, the total load of computing resources of the same type across the computing power network is the sum over all nodes.
To ensure that computing tasks are evenly distributed in the computing power network and to improve the utilization, performance and availability of computing resources, the computational load of each type of computing resource in each computing node needs to satisfy a load balancing condition, bounded by two boundary factors of the computing power network that control the minimum and maximum boundaries of load balancing relative to the maximum capacity of that computing resource on the node.
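A hedged sketch of this load-balancing condition, assuming the two boundary factors bound the load relative to the maximum capacity of the resource, is:

def load_balanced(resource_load: float, resource_capacity: float,
                  phi_min: float = 0.1, phi_max: float = 0.9) -> bool:
    # Assumed reading: the load of a computing resource type on a node must stay between
    # the two boundary factors times its maximum capacity; 0.1/0.9 are illustrative only.
    return phi_min * resource_capacity <= resource_load <= phi_max * resource_capacity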
To maximize the utilization of available communication and computing resources, we present an important optimization problem, namely latency and energy consumption minimization, also known as energy efficiency minimization. Our goal is to minimize the total delay and total energy consumption in the task scheduling process to improve the resource utilization efficiency of the system. The scheduling optimization objective function for this large model task scheduling problem minimizes, over all tasks entering the dispatching center for scheduling, a weighted sum of transmission and execution delay and transmission and execution energy consumption, where the two trade-off coefficients between the shortest task delay and the total energy consumption determine the relative importance of delay and energy consumption in the scheduling strategy. Constraint C1 constrains the offloading policy variables of a large model task; constraint C2 states that each large model task can only be scheduled to one computing node and can only be allocated one type of computing resource of that node, and that each task can only be scheduled once; constraint C3 states that the transmission bandwidth allocated to each task is limited so that the maximum bandwidth of the optical fiber link to the computing node is not exceeded, avoiding network congestion and communication performance degradation; constraint C4 states that the beams required to transmit the task data cannot exceed the maximum available beams on the optical fiber link; constraint C5 states that the sum of the allocations of each computing resource type on each computing node must not exceed the maximum computing capacity of that computing resource type; constraint C6 states that the allocation of large model tasks keeps the load of each type of computing resource within a reasonable range.
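Purely as an illustration of the weighted delay-plus-energy objective described above (the constraint set C1-C6 is omitted and all names are assumptions), the objective value of a candidate schedule could be evaluated as:

def scheduling_objective(tasks, omega_t: float, omega_e: float) -> float:
    # Assumed weighted-sum form: for every task entering the dispatch centre, add the
    # transmission + execution delay weighted by omega_t and the transmission +
    # execution energy weighted by omega_e (the two trade-off coefficients).
    total = 0.0
    for task in tasks:  # each task dict carries its (measured or predicted) costs
        delay = task["t_transmit"] + task["t_execute"]
        energy = task["e_transmit"] + task["e_execute"]
        total += omega_t * delay + omega_e * energy
    return total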
Step 2, designing a deep reinforcement learning environment, designing a dispatching optimization objective function as a reward function, and combining the state of computing resources and communication resources of a computing power network, a dispatching strategy, a predicted reward value, a real reward value and a discount factor to jointly form a Markov process;
on the basis of the above embodiment, the step 2 specifically includes:
step 2.1, expressing the state of the computing power network in a time slot as the combination of the task to be scheduled in the current time slot, the state of each computing node and the state of each optical fiber; the state of a computing node is the set of states of each type of computing resource in that node, and the state of each optical fiber between the dispatching center and a computing node is the set of states of the DWDM beams on that fiber;
step 2.2, defining the action space as the combination of the possible computing resource scheduling actions for the current task, including the selection of the computing node, the selection of the computing resource type and the selection of the corresponding specification size, and the possible communication resource scheduling actions;
step 2.3, designing the deep reinforcement learning reward function according to the scheduling optimization objective function, wherein a hyper-parameter controls the habit preference learned by the agent: a larger value makes the agent more inclined to execute tasks with minimal time and energy consumption during learning, whereas a smaller value makes the agent, whose aim is to improve the overall energy efficiency of the computing power network, focus more on load balancing of the computing resources; the reward function about time and energy consumption uses a further hyper-parameter to keep the time term and the energy term at the same order of magnitude, and is combined with a reward function related to load balancing, in which the agent obtains a fixed reward in each time slot; the goal is to make the neural network learn a strategy that maximizes the discounted cumulative reward over the tasks already scheduled up to the current time slot;
step 2.4, modeling the scheduling problem as a Markov process consisting of the state space of the environment, the action space of the scheduler, the state transition probability, the reward function and the discount factor; in each time slot, the scheduler selects an action according to the current state, and the environment then transitions to a new state according to the state transition probability; the argument of the reward function comes either from the reward value predicted by the multi-layer perceptron or from the real reward value fed back by the environment.
In particular, the reward function is designed so that the sum of the time and energy consumption of task computation and communication is minimized under the condition that the resource constraints are met. We want to ensure that the task scheduling policy strictly satisfies the capacity constraints of the communication and computing resources, i.e. the computing and communication resources allocated to a single task must not exceed the maximum capacity of those resources, and the resources allocated to the task in the current time slot must not cause the load of the corresponding computing resource on the computing node to exceed its maximum capacity; otherwise the task scheduling is considered to have failed. Violating the load balancing constraint does not cause a single task to fail, but it affects the efficiency of the overall computing power network. The reward function is therefore expressed in terms of these conditions.
The scheduling strategy first satisfies the resource constraints so that the task can execute, and then reduces execution time and energy consumption as much as possible on that basis. To this end we devise a reward function with respect to time and energy consumption, indicating the reward given to the agent by the task execution time and energy consumption, together with a reward function related to load balancing.
Among them, a fixed reward is obtained by the agent in each time slot, the goal being to make the neural network learn a strategy that maximizes the cumulative reward over the tasks that have already been scheduled up to the current time slot.
The scheduling problem can then be modeled as a Markov process consisting of the state space of the environment, the action space of the scheduler, the state transition probability, the reward function and the discount factor. In each time slot, the scheduler selects an action according to the current state, and the environment then transitions to a new state according to the state transition probability. The argument of the reward function comes either from the reward value predicted by the multi-layer perceptron or from the real reward value fed back by the environment.
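The structure of this reward, as described above, can be sketched as follows; the concrete functional forms, the default hyper-parameter values and the treatment of the load-balancing term are assumptions made for illustration only.

def reward(constraints_met: bool, time_cost: float, energy_cost: float,
           load_balance_score: float, alpha: float = 0.5, beta: float = 1e-3,
           fixed_reward: float = 1.0) -> float:
    # Sketch of the reward structure only; the exact functional forms are not given in
    # this text, so the penalty, the scaling and the blending below are assumptions.
    if not constraints_met:
        return -fixed_reward                                # failed scheduling is penalised
    r_time_energy = -beta * (time_cost + energy_cost)       # beta keeps magnitudes comparable
    r_load = load_balance_score                             # e.g. the fixed reward when load is balanced
    return alpha * r_time_energy + (1 - alpha) * r_load + fixed_reward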
Step 3, based on the multi-head attention layer, carrying out time sequence modeling on the states of the communication resources and the calculation resources in the power network in the past T time slots to obtain time sequence characteristics, and extracting state characteristics from the current large model task;
On the basis of the above embodiment, the step 3 specifically includes:
step 3.1, in each time slot, generating the embedded vectors of the computing resources and the communication resources of each computing node according to the states of the communication resources and the computing resources in the computing power network;
step 3.2, inputting the past T time sequence states and the state of the current time slot of each computing node into the first multi-head attention layer for time sequence modeling, and splicing the computing and communication resource embedded vectors of each computing node c in each time slot to obtain the time sequence characteristics;
step 3.3, obtaining the embedded representation of the state of the current large model task through the embedding layer, and extracting the state characteristics using the second multi-head attention layer.
In practice, to cope with the complex dynamic changes in the computing power network, including resource utilization fluctuations, load imbalance and unpredictable workloads, we design two multi-head attention layers to help the agent understand the diversity and dynamics among different types of resources and the characteristics of different types of tasks. The first multi-head attention layer is specifically responsible for time sequence modeling of the communication and computing resource states of the computing power network, and the other multi-head attention layer is responsible for extracting the characteristics of the current task.
In time slot t, we consider the state of the computing resources and the state of the communication resources of the computing power network. Each row of the state matrix represents an independent computing node. First, the embedded vectors of each node relating to the computing resources and to the communication resources are generated in the order of the computing nodes. Then, for each computing node c in time slot t, the computing and communication resource embedded vectors are spliced together. Notably, we do not use only the information of the current moment: by combining it with the states of the past T time slots, a time-ordered sequence of embeddings is formed.
In a time slot, the state of the large model task is turned into an embedded representation by an embedding layer. Unlike the time sequence modeling of resource states, the time sequence information of task states is not a critical consideration affecting task scheduling.
The time sequence states of each computing node are input into the first multi-head attention layer for time sequence modeling, so as to capture the characteristics of each computing node more effectively and enhance the agent's understanding of the dynamically changing computing power network. At the same time, the second multi-head attention layer focuses on extracting task-related features. Finally, the task characteristics and the dynamic resource characteristics are fused by splicing the outputs of the two multi-head attention layers to form the feature representation used for decision making.
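A minimal PyTorch sketch of the two attention blocks described above is given below; the embedding dimensions, head counts and the way the two outputs are spliced are illustrative assumptions, not the claimed architecture.

import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    # First attention block: time sequence modelling of per-node computing+communication
    # embeddings over the past T slots plus the current slot. Second block: encoding of
    # the current task. Dimensions and head counts are illustrative assumptions.
    def __init__(self, resource_dim=64, task_dim=32, num_heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(resource_dim, num_heads, batch_first=True)
        self.task_attn = nn.MultiheadAttention(task_dim, num_heads, batch_first=True)

    def forward(self, resource_seq, task_embed):
        # resource_seq: (batch, T+1, resource_dim); task_embed: (batch, 1, task_dim)
        temporal_feat, _ = self.temporal_attn(resource_seq, resource_seq, resource_seq)
        task_feat, _ = self.task_attn(task_embed, task_embed, task_embed)
        # splice dynamic resource features with task features, as described above
        return torch.cat([temporal_feat[:, -1, :], task_feat[:, -1, :]], dim=-1)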
Step 4, the deep reinforcement learning agent makes a scheduling strategy according to the time sequence characteristics and the state characteristics, wherein the scheduling strategy comprises determining the computing node of the computing power network, the type and size of the computing resources, and the size of the communication resources;
in particular, the action space contains the operation of allocating the currently available communication and computing resources to tasks, we are in time slotsThe action taken by a task is defined as +.>Wherein->Representing the current task->Possible computing resource scheduling actions, including selection of computing nodes, selection of computing resource types, and selection of corresponding specification sizes, determine the time and corresponding energy consumption of tasks during the computing process. />Representing possible communication resource scheduling actions, including selection of bandwidth size for data transmission, which will affect time and energy consumption during transmission.
Step 5, the prediction network predicts the execution time and energy consumption of the large model task according to the scheduling strategy and the state of the computing resource, combines the load balancing information directly fed back by the computing power network and calculates a predicted rewarding value by a rewarding function;
optionally, the prediction network includes two multi-layer perceptrons.
Further, the step 5 specifically includes:
step 5.1, taking the features of the current large model task and the features of the computing node to which the current large model task is directed as the common input of the two multi-layer perceptrons;
step 5.2, inputting the other computation-related actions taken by the agent, namely the selected computing resource type in the computing node and the size of the computing resource, into the first multi-layer perceptron, whose output is the predicted task execution time and energy consumption;
step 5.3, inputting the bandwidth allocated for transmitting the current large model task into the second multi-layer perceptron, whose output is the predicted number of time slots occupied by the data transmission and the total transmission energy consumption; these predictions are input into the reward function together with the load balancing state of the computing nodes of the computing power network to form the predicted reward value r.
In particular, scheduling large model tasks with DRL faces an unavoidable problem: the uncertainty of the time and energy consumption of large model tasks, which are the objects being scheduled, has a long-term impact on the DRL rewards. In the reward function related to time and energy consumption (equation 12), the communication energy consumption depends on the transmission duration (equation 4) and the energy consumption of a computing node executing the task depends on the execution time (equation 6), and the communication-side quantities are often much smaller than the computation-side quantities, which are also more difficult to determine. The part of the reward that judges whether the constraints are met (equation 11) can be given by the environment in time, while part of the concrete reward value is a long-delay reward (equation 12). Therefore, we design a prediction network.
The prediction network is deeply coupled with the deep reinforcement learning. The main idea is that, under the primary constraint that the agent's scheduling policy is feasible, we use the predicted time and energy consumption as a substitute for the long-delay reward. The prediction network is formed by two multi-layer perceptrons (MLPs) that predict the time and energy consumption of task execution and of task data transmission. In the decision making process, the features of the current task and the features of the computing node to which the task is directed are a common input to both MLPs. The first MLP additionally takes the other computation-related actions of the agent, namely the selected computing resource type in the computing node and the size of the computing resource, and predicts the task execution time and energy consumption of the current scheduling policy. The second MLP additionally takes the communication-related action, i.e. the bandwidth allocated for transmitting the task, and predicts the task data transmission time and energy consumption of the current scheduling policy. These predictions, together with the load balancing state of the computing nodes of the computing power network, are input into the reward function to form the predicted reward value r.
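A minimal PyTorch sketch of this two-MLP prediction network follows; the hidden sizes, input dimensions and output layout are illustrative assumptions.

import torch
import torch.nn as nn

def mlp(in_dim: int, hidden: int = 128, out_dim: int = 2) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

class PredictionNetwork(nn.Module):
    # Both heads see the task features and the features of the targeted computing node;
    # MLP1 additionally sees the computing actions and predicts (execution time, energy);
    # MLP2 additionally sees the allocated bandwidth and predicts (transmission time, energy).
    def __init__(self, task_dim=32, node_dim=64, compute_action_dim=2, comm_action_dim=1):
        super().__init__()
        self.mlp_compute = mlp(task_dim + node_dim + compute_action_dim)
        self.mlp_comm = mlp(task_dim + node_dim + comm_action_dim)

    def forward(self, task_feat, node_feat, compute_action, comm_action):
        shared = torch.cat([task_feat, node_feat], dim=-1)
        t_exec, e_exec = self.mlp_compute(torch.cat([shared, compute_action], dim=-1)).unbind(-1)
        t_comm, e_comm = self.mlp_comm(torch.cat([shared, comm_action], dim=-1)).unbind(-1)
        return t_exec, e_exec, t_comm, e_comm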
Step 6, after the execution of the large model task on the computing node has finished, calculating the real reward value through the reward function according to the real execution time, energy consumption, and load balancing information, obtaining a complete Markov process, and storing it in the experience pool;
further, the step 6 specifically includes:
After the large model task has been executed, the real execution time and energy consumption of the task and the real transmission time and energy consumption of the task data are obtained and input into the reward function to obtain the true reward value; the complete Markov process of the task in this time slot for the deep reinforcement learning is then stored in the experience pool.
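A small sketch of an experience pool that keeps both the predicted reward used at decision time and the real reward filled in once the task finishes is shown below; the record layout, the list-backed buffer, and the two-phase write are assumptions made for illustration.

```python
from typing import Any, List, NamedTuple, Optional

class Transition(NamedTuple):
    state: Any
    action: Any
    reward_pred: float            # reward computed from the prediction network at decision time
    reward_real: Optional[float]  # reward from real environmental feedback, filled in later
    next_state: Any

class ExperiencePool:
    def __init__(self, capacity: int = 100_000):
        self.buffer: List[Transition] = []
        self.capacity = capacity

    def push(self, state, action, reward_pred, next_state) -> int:
        """Store a transition with its predicted reward; the real reward is not known yet.
        Returned indices are only valid until the oldest entries are evicted."""
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append(Transition(state, action, reward_pred, None, next_state))
        return len(self.buffer) - 1

    def set_real_reward(self, index: int, reward_real: float) -> None:
        """Complete the Markov transition once the environment reports the true reward."""
        self.buffer[index] = self.buffer[index]._replace(reward_real=reward_real)
```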
Step 7, after the deep reinforcement learning has interacted with the real environment of the computing power network a preset number of times, extracting a segment of continuous experience samples with a window larger than T from the experience pool and constructing a layered experience pool; the multi-head attention layer and the prediction network are jointly trained with the working trajectories of the computing nodes formed by the real rewards in the layered experience pool, and the multi-head attention layer and the Q network are jointly trained with the new Markov processes whose rewards are the predicted rewards calculated from the prediction feedback.
On the basis of the above embodiment, the step 7 specifically includes:
step 7.1, using the real computation and communication times and energy consumptions of the large model tasks fed back by the environment to jointly train the multi-head attention layer responsible for environmental state feature extraction and the prediction network: the stored experience is reorganized into a working trajectory for each computing node, comprising the state of the computing node (which in turn comprises the states of each of its computing resources), the state of the optical fiber link between the computing power network scheduling center and the computing node (which comprises the states of the DWDM beams on the fiber), and the tasks scheduled to that node in past time slots, for all tasks that have entered the scheduling center and participated in scheduling, ordered by computing node; these trajectories constitute the data set on which the prediction network is trained with a mean squared error loss;
step 7.2, the learning experience of each time slot is stored in the experience data set; the agent randomly extracts from the experience pool a segment of continuous experience samples with a window larger than T to train the multi-head attention layer, and randomly extracts a batch of experience samples to form a training data set with which the parameters of the Q network are updated.
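The sampling side of this layered experience pool could look like the following sketch, in which the pool is assumed to be a plain list of completed transitions; the window and batch sizes are illustrative parameters.

```python
import random

def sample_layered(pool, window_T: int, batch_size: int):
    """Draw the two kinds of samples used by the layered experience pool:
    a continuous slice longer than T for the shared attention layer and the
    prediction network, and an i.i.d. mini-batch for the Q-network update."""
    assert len(pool) > window_T + 1 and len(pool) >= batch_size
    start = random.randint(0, len(pool) - window_T - 2)
    trajectory = pool[start:start + window_T + 1]  # consecutive slots, keeps temporal order
    minibatch = random.sample(pool, batch_size)    # decorrelated samples for the Q update
    return trajectory, minibatch
```

Sampling a consecutive window preserves the temporal order the multi-head attention layer needs, while the independently drawn mini-batch keeps the Q-network update decorrelated.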
In a specific implementation, while interacting with the environment the agent uses only the combined reward function designed above, which fuses the predictions with the real-time feedback of the environment, in order to accommodate such long-delay rewards. Taking Double-DQN as an example, in the decision process the online network and the target network of Double-DQN perform gradient descent on a mean squared error loss so as to improve the estimate of the best-action Q value. Meanwhile, in the experience replay step, the long-delay rewards on time and energy consumption fed back by the environment are used to update the agent in time.
We improve the experience replay mechanism of DRL and split it into two parts with finer granularity. The experience of the agent in each time slot is stored in an experience pool. Unlike other DRL-based task scheduling algorithms, the reward here is designed as the combination of the reward built from predictions and partial real-time environmental feedback (equation 11) and the reward given by the real feedback of the environment after the task has been executed on the computing power network.
Specifically, the first step of the experience replay mechanism uses the real computation and communication times and energy consumptions of the tasks fed back by the environment to jointly train the multi-head attention layer responsible for environmental state feature extraction and the prediction network. The corresponding experience is constructed as a working trajectory for each computing node, comprising the state of the computing node (the states of its computing resources), the state of the optical fiber link between the computing power network scheduling center and the computing node (the states of the DWDM beams on the fiber), and the tasks scheduled in past time slots, for all tasks that have entered the scheduling center and participated in scheduling, ordered by computing node; these trajectories constitute the data set on which the mean squared error loss of the prediction network is trained.
The second step is the update of the Double-DQN network parameters. The learning experience of each time slot is stored in the experience data set. In the experience replay step, the agent randomly extracts a segment of continuous experience samples with a window larger than T to strengthen the environmental feature extraction capability of the multi-head attention layer used for time series modeling and the accuracy of the prediction network, while the parameters of the Q network are updated by randomly extracting a batch of experience samples from the experience pool and training on them.
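For completeness, the Double-DQN target referred to above is the standard one, sketched below in PyTorch; the tensor shapes and the presence of a terminal-state mask are assumptions of the sketch.

```python
import torch

@torch.no_grad()
def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma: float = 0.99):
    """Standard Double-DQN target: the online network selects the greedy action,
    the target network evaluates it. Shapes assume a batch of states, 1-D rewards,
    and a 0/1 terminal mask."""
    best_actions = q_online(next_states).argmax(dim=1, keepdim=True)   # [B, 1]
    next_q = q_target(next_states).gather(1, best_actions).squeeze(1)  # [B]
    return rewards + gamma * (1.0 - dones) * next_q
```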
According to the intelligent computing network scheduling method for computation and communication fusion of large model tasks provided by the embodiments of the invention, the time sequence features of the dynamically changing computing power network are extracted through a multi-head attention mechanism, and the deep reinforcement learning agent makes the scheduling strategy according to the features of the task and the time sequence features of the computing power network. At the same time, a prediction network for large model execution time and energy consumption based on a Markov chain is developed and, together with the instant feedback signals of the computing power network related to task deployment, jointly guides the reinforcement learning decision process. The experience replay mechanism is redesigned to train the agent better: the prediction network and the Q network are trained in layers, the multi-head attention layer used for feature extraction is shared between them, and the real delayed reward signals given by the environment are exploited. This significantly improves the prediction accuracy of time and cost, effectively trains the multi-head attention layer used for feature extraction and time series modeling, optimizes the task scheduling process, and improves scheduling efficiency, accuracy, and adaptability.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (7)

1. The intelligent computing network scheduling method for the calculation and communication fusion of the large model task is characterized by comprising the following steps of:
step 1, establishing a dispatching optimization objective function with the aim of minimizing the energy efficiency of a computing power network system according to the states of computing resources and communication resources of the computing power network;
the step 1 specifically includes:
step 1.1, calculating the minimum total power loss of a single wavelength from a computing power network dispatching center to each computing node by using the minimum sensitivity of an optical receiver according to the length of an optical fiber link and the insertion loss of optical fiber equipment on the link
Step 1.2, according to the data size of the task sent from the computing power network scheduling center to the computing node and the bandwidth allocated to it, calculating the number of time slots occupied by the data transmission of the current large model task;
Step 1.3, according to the minimum total power loss and the number of time slots occupied by the data transmission, calculating the total energy consumption of the data transmission of the current large model task;
Step 1.4, according to the type of the current large model task, the floating-point operation amount required to compute the task, the time scale parameter of the selected type of computing resource for handling that task type, and the floating-point operation processing rate of the computing node, calculating the task execution time;
Step 1.5, according to the power consumption ratio parameter of executing the task on the selected type of computing resource and the task execution time, calculating the energy consumption of executing the computing task;
Step 1.6, calculating the load balancing condition of the computing power network according to the load condition of the computing node executing the current task and the capacity constraint boundary of the computing resource;
step 1.7, establishing a scheduling optimization objective function according to the information, wherein the expression of the optimization objective function is as follows
wherein the quantities appearing in the objective function denote, respectively: the bandwidth allocated for transmitting the task; the maximum transmission bandwidth on the optical fiber link from the computing power network scheduling center to the computing node; the remaining amount of the same type of computing resource in the computing power network at the beginning of the time slot; the load of a computing node using a given computing resource to execute a large model computing task; the trade-off coefficients between the shortest task delay and the total energy consumption, which determine the relative importance of delay and energy consumption in the scheduling strategy; the set of tasks being transmitted in the current computing power network; the set of tasks being computed in the current computing power network; the total number of tasks entering the computing power network scheduling center and participating in scheduling; the maximum bandwidth that a single beam can carry; the maximum number of available beams on the optical fiber link from the scheduling center to the computing node; the maximum capacity of a computing resource on a computing node; and the total number of computing resource types in the computing power network. Constraint C1 represents the offloading policy of a large model task; constraint C2 indicates that each large model task can only be scheduled to one computing node and can only be allocated one type of computing resource of that node, and in addition each task can only be scheduled once; constraint C3 indicates that the transmission bandwidth allocated to each task is limited, ensuring that the maximum bandwidth of the fiber link to the computing node is not exceeded, avoiding network congestion and communication performance degradation; constraint C4 indicates that the beams required to transmit the task data cannot exceed the maximum available beams on the fiber link; constraint C5 indicates that the sum of the allocations of each computing resource type on each computing node must not exceed the maximum computing capacity of that resource type; constraint C6 indicates that the allocation of large model tasks guarantees that the load of each type of computing resource stays within a reasonable range;
step 2, designing a deep reinforcement learning environment, designing a dispatching optimization objective function as a reward function, and combining the state of computing resources and communication resources of a computing power network, a dispatching strategy, a predicted reward value, a real reward value and a discount factor to jointly form a Markov process;
Step 3, based on the multi-head attention layer, carrying out time sequence modeling on the states of the communication resources and the calculation resources in the power network in the past T time slots to obtain time sequence characteristics, and extracting state characteristics from the current large model task;
step 4, the deep reinforcement learning agent makes a scheduling strategy according to the time sequence features and the state features, wherein the scheduling strategy comprises the determination of the computing node of the computing power network, the type and size of the computing resources, and the size of the communication resources;
step 5, the prediction network predicts the execution time and energy consumption of the large model task according to the scheduling strategy and the state of the computing resources, combines the load balancing information directly fed back by the computing power network, and calculates a predicted reward value through the reward function;
step 6, after the execution of the large model task on the computing node has finished, calculating the real reward value through the reward function according to the real execution time, energy consumption, and load balancing information, obtaining a complete Markov process, and storing it in the experience pool;
step 7, after the deep reinforcement learning has interacted with the real environment of the computing power network a preset number of times, extracting a segment of continuous experience samples with a window larger than T from the experience pool and constructing a layered experience pool; the multi-head attention layer and the prediction network are jointly trained with the working trajectories of the computing nodes formed by the real rewards in the layered experience pool, and the multi-head attention layer and the Q network are jointly trained with the new Markov processes formed with the predicted rewards.
2. The method according to claim 1, wherein the step 2 specifically comprises:
step 2.1, in each time slot, the state of the computing power network is expressed as the task about to be scheduled in the current time slot together with the state of each computing node and the state of each optical fiber, wherein the state of each computing node comprises the states of each of its computing resources, and the state of each optical fiber is the state of the fiber link between the computing power network scheduling center and the corresponding computing node, comprising the states of the DWDM beams on that fiber;
step 2.2, defining the action space as the combination of a computing resource scheduling action and a communication resource scheduling action, wherein the computing resource scheduling action of the current task comprises the selection of the computing node, the selection of the computing resource type, and the selection of the corresponding specification size, and the communication resource scheduling action represents a possible allocation of transmission bandwidth;
step 2.3, designing a deep reinforcement learning reward function according to the scheduling optimization objective function,
wherein a hyper-parameter controls the preference learned by the agent: when it takes a larger value, the agent tends during learning to execute every task with the minimum time and energy consumption; otherwise the agent aims at improving the overall energy efficiency of the computing power network and therefore pays more attention during learning to the load balance of each computing resource of the computing power network; a reward function regarding time and energy consumption is defined,
wherein a further hyper-parameter controls the order of magnitude so that the time term and the energy consumption term are of the same order of magnitude; a reward function related to load balancing is defined,
wherein a fixed reward is obtained by the agent in each time slot, the goal being to make the neural network learn a strategy that maximizes the cumulative reward,
wherein the cumulative reward is taken over the tasks already scheduled up to the current time slot, with a discount factor applied;
step 2.4, modeling the scheduling problem as a Markov process represented by the state space of the environment, the action space available to the scheduler, the state transition probability, the reward function, and a discount factor; in each time slot the scheduler selects an action based on the current state, and the environmental state then transitions to a new state according to the state transition probability; the arguments of the reward function of the new environmental state can be classified by source into the reward value predicted by the multi-layer perceptrons and the true reward value fed back by the environment;
3. The method according to claim 2, wherein the step 3 specifically comprises:
step 3.1, in each time slot, generating embedding vectors for each computing node, the computing resources, and the communication resources respectively, according to the states of the communication resources and the computing resources in the computing power network;
step 3.2, inputting the time sequence states of each computing node over the past T time slots together with the state of the current time slot into the first multi-head attention layer for time sequence modeling, and concatenating the result for each computing node c in each time slot with the communication resource embedding vector to obtain the time sequence features;
step 3.3, obtaining an embedded representation of the state of the current large model task through the embedding layer, and extracting the state features using the second multi-head attention layer.
4. A method according to claim 3, wherein the predictive network comprises two multi-layer perceptrons.
5. The method according to claim 4, wherein the step 5 specifically comprises:
step 5.1, taking the features of the current large model task and the features of the computing node to which the current large model task is directed as the common input of the two multi-layer perceptrons;
step 5.2, inputting the other computing-related actions taken by the agent, namely the selected computing resource type in the computing node and the size of the allocated computing resource, into the first multi-layer perceptron, whose outputs are the predicted task execution time and energy consumption;
Step 5.3, inputting the bandwidth allocated for transmitting the current large model task into the second multi-layer perceptron, which outputs the predicted number of time slots occupied by the data transmission and the total transmission energy consumption; these predictions, together with the load-balancing state of the current computing nodes of the computing power network, are input into the reward function to form the predicted reward value r.
6. The method according to claim 5, wherein the step 6 specifically includes:
after the large model task has been executed, the real execution time and energy consumption of the task and the real transmission time and energy consumption of the task data are obtained and input into the reward function to obtain the true reward value; the complete Markov process of the task in this time slot for the deep reinforcement learning is then stored in the experience pool.
7. The method according to claim 6, wherein the step 7 specifically includes:
step 7.1, using the real computation and communication times and energy consumptions of the large model tasks fed back by the environment to jointly train the multi-head attention layer responsible for environmental state feature extraction and the prediction network: the stored experience is reorganized into a working trajectory for each computing node, comprising the state of the computing node (which in turn comprises the states of each of its computing resources), the state of the optical fiber link between the computing power network scheduling center and the computing node (which comprises the states of the DWDM beams on the fiber), and the tasks scheduled to that node in past time slots, for all tasks that have entered the scheduling center and participated in scheduling, ordered by computing node; these trajectories constitute the data set on which the prediction network is trained with a mean squared error loss;
step 7.2, the learning experience of each time slot is stored in the experience data set; the agent randomly extracts from the experience pool a segment of continuous experience samples with a window larger than T to train the multi-head attention layer, and randomly extracts a batch of experience samples to form a training data set with which the parameters of the Q network are updated.