CN114168328A - Mobile edge node calculation task scheduling method and system based on federal learning - Google Patents
- Publication number: CN114168328A
- Application number: CN202111478407.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/10—Interfaces, programming languages or software development kits, e.g. for simulating neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/502—Proximity
Abstract
The application discloses a mobile edge node computation task scheduling method based on federal learning and a corresponding system. The method specifically comprises the following steps: initializing information parameters; in response to completing initialization, locally training the DQN network deployed at each mobile edge node; during DQN network training, judging whether the number of update rounds meets the aggregation frequency; if it does, updating the global parameters; in response to completing the global parameter update, judging whether the number of training rounds has reached the specified number; and if it has, outputting the result. The application schedules the computation tasks in a mobile edge computing system from the viewpoint of their execution order, and shortens the computation task completion time by exploiting the cooperation of a plurality of mobile edge nodes.
Description
Technical Field
The present application relates to the field of mobile communications technologies, and in particular, to a method and a system for scheduling a computation task of a mobile edge node based on federal learning.
Background
The long communication distance between the terminal user and the remote cloud is a limitation that cloud computing has always faced, so cloud computing increasingly fails to satisfy novel mobile applications with ever stricter delay requirements. As cloud functionality increasingly migrates to the edge of the network, Mobile Edge Computing (MEC) is seen as a potential solution. Utilizing the large amount of free computing resources and storage space distributed at the edge of the network, mobile devices located there can also be given the ability to handle computation-intensive and delay-sensitive tasks.
However, the computing power of a single device is limited and cannot meet the performance requirements of computation-intensive services such as automatic driving, Virtual Reality (VR), and Augmented Reality (AR); for this reason, the concept of mobile edge collaborative computing is introduced. On one hand it reduces the computing pressure on a single node; on the other hand, the cooperation among various edge devices provides more possibilities for improving system performance. For example, the naturally distributed structure of mobile edge computing matches the idea of Federated Learning (FL) well: introducing federated learning into the collaborative computing scenario alleviates, to a certain extent, the huge communication overhead caused by solutions such as deep learning, while easing users' concerns about privacy and data security.
In collaborative computing, an efficient computation task dispatching and scheduling mechanism needs to be introduced for overall management, so as to improve the user experience. Considering the computation offloading problem only from the viewpoint of task dispatching is not sufficient to optimize the execution process of computation tasks; reasonably planning the execution order of the tasks queued at the edge is also a key link in shortening the average delay. The scheduling policy must be able to adapt to a dynamically changing external environment and differentiated service requirements; a reasonable and effective policy can fully exploit the potential of the system, improve the execution efficiency of computation tasks, and improve the quality of service.
In this context, we combine the distributed characteristics of MEC with federated learning and, aiming at the computation task scheduling problem of mobile edge nodes in the MEC system, provide a federated-learning-based computation task scheduling scheme for the multi-mobile-edge-node scenario. On the basis of independent Deep Q Network (DQN) training at each single mobile edge node, the scheme further optimizes the user experience of delay-sensitive computation tasks through global parameter aggregation.
Although collaborative schemes based on computation task offloading provide a solution to insufficient terminal device resources and strict task delay requirements, existing methods do not consider the order in which a single mobile edge node executes its multiple pending computation tasks, i.e., the computation task scheduling problem, and neglect the influence of task scheduling on computing service delay. Meanwhile, the ability of the MEC system to aggregate information from a plurality of mobile edge nodes to achieve better computation task scheduling has not been fully exploited.
Therefore, how to obtain a method for scheduling a computation task of a mobile edge node based on federal learning is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a mobile edge node computation task scheduling method based on federal learning, which realizes the cooperative scheduling of computation tasks across a plurality of mobile edge nodes in an MEC system and shortens the completion delay of computation tasks. For the scenario in which mobile edge nodes process the task requests generated by the sensors within their coverage, a reasonable task execution order is planned for a dynamically changing task queue, and by aggregating the differentiated information of a plurality of mobile edge nodes the computation task completion delay is reduced compared with independent training at each node.
A mobile edge node calculation task scheduling method based on federal learning specifically comprises the following steps: initializing information parameters; in response to completing initialization, locally training a DQN network deployed at each mobile edge node; judging whether the number of updating rounds meets the aggregation frequency or not in the DQN network training process; if the number of the updating rounds meets the aggregation frequency, carrying out global parameter aggregation and updating; in response to the completion of global parameter updating, judging whether the number of training rounds reaches the specified number in the DQN network training process; and if the number of training rounds reaches the specified number, outputting a result.
As above, wherein the process of initializing the information parameters also includes setting the parameters required by each part of the system, including the sensor transmission power, communication bandwidth, mobile edge node computing power, wireless-channel-related parameters, the experience pool capacity in the DQN network, the mini-batch size, the learning rate, the discount factor, the aggregation frequency, and the number of computation tasks executed per round.
As above, after the parameters required by the system are set, the mobile edge node, the local and global network parameters and the experience pool in the environment are initialized.
As above, the local training of the DQN network deployed at each mobile edge node specifically comprises the following sub-steps: initializing a sensor in a coverage range of a mobile edge node; updating the environmental status information in response to completing initialization of the sensor; after responding to the updated environment state information, performing action selection on the current state; executing the selected action, and acquiring experience generated by interaction with the environment; storing experiences generated by interaction with the environment as samples in an experience pool; judging whether the number of samples in the experience pool reaches a set value or not; if the set value is reached, randomly selecting a specified number of samples from the experience pool, and training the DQN network; judging whether the decision in the round is in a termination state or not in response to the completion of the DQN network training; if the training is in the termination state, the training round is finished.
As above, wherein initializing the sensors within the coverage of the mobile edge node comprises obtaining the computation tasks generated by the sensors, wherein a computation task g_m(t_v) is expressed as:

g_m(t_v) = <u_m, b_m, c_m>

wherein u_m represents the generation time of computation task m, b_m represents the communication data amount of computation task m, and c_m represents the computation amount of computation task m.

As above, wherein after the sensor-generated computation tasks g_m(t_v) are obtained, the computation task queue managed by the mobile edge node at time t_v is modeled as a 3×M matrix g(t_v), concretely expressed as:

g(t_v) = [ u_1 … u_m … u_M ; b_1 … b_m … b_M ; c_1 … c_m … c_M ]

wherein g_1(t_v), …, g_m(t_v), …, g_M(t_v) represent the M computation tasks generated by the sensors, u_1 … u_m … u_M represent the generation times of the M computation tasks, b_1 … b_m … b_M represent the communication data amounts of the M computation tasks, and c_1 … c_m … c_M represent the computation amounts of the M computation tasks.
As above, wherein the total delay of a computation task from generation to completion of processing is composed of the execution delay and the waiting delay; wherein the execution delay de_m(t_v) is concretely expressed as:

de_m(t_v) = cm_m(t_v) + cp_m(t_v) = b_m / r_m(t_v) + c_m / f_c,   with r_m(t_v) = B · log2(1 + p_m · h_m(t_v) / σ²)

wherein cm_m(t_v) denotes the communication time required to execute task m, cp_m(t_v) denotes the computation time required to execute task m, B is the communication bandwidth, p_m is the sensor transmit power, h_m(t_v) is the channel gain obtained according to the current position relationship between the sensor and the mobile edge node and the channel state, r_m(t_v) denotes the current data transmission rate, f_c is the computing power of the mobile edge node, b_m represents the communication data amount of the computation task, c_m represents its computation amount, and σ² represents the channel noise power.
As above, wherein the waiting delay refers to the residence time of the computation task in the queue from its generation to the beginning of its execution; for a computation task g_m(t_v) = <u_m, b_m, c_m> processed in the v-th round, its waiting delay dw_m(t_v) is concretely expressed as:

dw_m(t_v) = t_v − u_m

wherein u_m denotes the generation time of the computation task and t_v denotes the current time.
As above, wherein the completion delay d_m(t_v) is concretely expressed as:

d_m(t_v) = de_m(t_v) + dw_m(t_v) = b_m / r_m(t_v) + c_m / f_c + t_v − u_m

wherein de_m(t_v) denotes the execution delay of the computation task, dw_m(t_v) denotes its waiting delay, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, f_c is the computing power of the mobile edge node, u_m denotes the generation time of the computation task, t_v denotes the current time, and r_m(t_v) denotes the current data transmission rate.
A mobile edge node calculation task scheduling system based on federal learning specifically comprises: the device comprises an initialization unit, a training unit, a first judgment unit, a global updating unit, a second judgment unit and an output unit; an initialization unit for initializing information parameters; a training unit, configured to perform local training on the DQN network deployed at each mobile edge node; the first judgment unit is used for judging whether the updating round number meets the aggregation frequency in the DQN training process; the global updating unit is used for aggregating and updating global parameters if the number of updating rounds meets the aggregation frequency; the second judgment unit is used for judging whether the number of training rounds reaches the specified number of times in the DQN training process; and the output unit is used for outputting the result if the training round number reaches the set specified number.
The application has the following beneficial effects:
(1) the application provides a method for scheduling a computation task in an MEC system from the viewpoint of a computation task execution sequence, and the computation task completion time is shortened by utilizing the cooperation of a plurality of mobile edge nodes in the MEC system.
(2) The method and the device can flexibly adapt to dynamically changing environment states such as calculation task queue information, sensor positions and the like, can adjust the balance coefficient lambda in the instant reward function according to different requirements, and change the weights of waiting time delay and execution time delay to deal with different application scenes.
(3) According to the method, system performance is improved through global aggregation at a certain frequency by using federal learning, and compared with the method that each mobile edge node independently performs DQN training, the method has the advantage that a lower calculation task completion time delay is brought to each mobile edge node participating in the federal by an action selection strategy under global parameters.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a flowchart of a method for scheduling a computation task of a mobile edge node based on federal learning according to an embodiment of the present application;
fig. 2 is an internal structure diagram of a mobile edge node computation task scheduling system based on federal learning according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The main purpose of the present application is to design a computation task scheduling policy for the MEC system and to introduce federated learning in the multi-mobile-edge-node scenario so as to further optimize system performance, such that each party to the federation (i.e., each edge node, each of which is one participant in the federated learning) can obtain a lower delay than under independent training.
Scene assumption: in the MEC system there are N mobile edge nodes, N = {1, 2, …, N}, and within the coverage area of each mobile edge node n there are M sensors, M = {1, 2, …, M}. The mobile edge node processes V computation tasks in each round, i.e., makes V decisions per round. The computation tasks generated by the sensors constitute a dynamically changing queue g(t_v) = {g_1(t_v), …, g_m(t_v), …, g_M(t_v)}, where t_v denotes the decision time of the current v-th round.
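For illustration only, the scenario parameters above can be grouped into a small configuration object. The following is a minimal Python sketch; every name and default value in it is an assumption for illustration, not a value taken from the patent.

```python
from dataclasses import dataclass

# Illustrative container for the scene assumption above.
# All names and default values are assumptions, not from the patent.
@dataclass
class MecScenario:
    N: int = 4            # number of mobile edge nodes
    M: int = 10           # sensors (queued computation tasks) per edge node
    V: int = 50           # computation tasks processed per training round
    B: float = 1e6        # communication bandwidth (Hz)
    f_c: float = 2e9      # edge node computing power (CPU cycles/s)
    sigma2: float = 1e-9  # channel noise power (W)
    agg_freq: int = 10    # global aggregation frequency (update rounds)

scenario = MecScenario()
print(scenario.N, scenario.M, scenario.V)
```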
Example one
As shown in fig. 1, a method for scheduling a calculation task of a mobile edge node based on federal learning provided in the present application specifically includes the following steps:
step S110: information parameters are initialized.
Specifically, the N mobile edge nodes in the environment, their local network parameters, the global network parameters, and the corresponding experience pools are initialized.
Before the information parameters are initialized, the parameters required by each part of the system are set, including the sensor transmit power, communication bandwidth, mobile edge node computing power (CPU frequency), wireless-channel-related parameters, the experience pool capacity in the DQN, the mini-batch size (i.e., the number of samples per gradient-descent training step), the learning rate, the discount factor, the aggregation frequency, and the number of computation tasks executed per round.
And after the parameters required by the system are set, initializing the mobile edge nodes, the local and global network parameters and the experience pool in the environment.
Step S120: in response to completing the initialization, the DQN network deployed at each mobile edge node is locally trained.
Specifically, the essence of the local training of the DQN network deployed at each mobile edge node is that each mobile edge node performs DQN training that minimizes the task completion delay according to the dynamic changes of its local computation task queue.
The DQN training is specifically as follows: the intelligent agent deployed on the mobile edge node selects an action via an ε-greedy strategy according to the state information of each round and the current network parameters, and obtains the instant reward of the action, while the computation task queue in the environment is updated to the next state. The results of interaction with the environment are stored in an experience pool, and after the capacity of the experience pool reaches a certain scale, gradient-descent training is performed on the DQN network so that it gradually converges to the optimal action selection strategy.
DQN training takes place in multiple rounds simultaneously at each mobile edge node; the purpose of each round of training is to minimize the average task completion delay within the round, so the optimization objective of each round is expressed as:

min (1/V) · Σ_{v=1}^{V} Σ_{m=1}^{M} a_{v,m} · d_m(t_v),   with a_{v,m} ∈ {0,1} and Σ_{m=1}^{M} a_{v,m} = 1   (Equation 1)

wherein a_{v,m} = 1 denotes the decision of the mobile edge node to execute the computation task from sensor m in the v-th round, and a_{v,m} = 0 denotes not executing that task; V indicates that the mobile edge node processes V computation tasks per round, i.e., makes V decisions per round; t_v denotes the decision time of the current v-th round; and d_m(t_v) denotes the completion delay of computation task g_m(t_v) after it is executed in the v-th round.
Based on the optimization objective provided above, step S120 specifically includes the following sub-steps:
step S1201: sensors within the coverage of the mobile edge node are initialized.
It is assumed that the mobile edge node processes a total of V computation tasks in one round, i.e., makes V decisions in one round.
At time t_v the node decides which computation task in the queue to process in this round; after the selected task is processed, the corresponding sensor immediately generates a new computation task, while the other unprocessed tasks remain in the queue awaiting the next round of decision. A sensor-generated computation task g_m(t_v) is described as:

g_m(t_v) = <u_m, b_m, c_m>   (Equation 2)

wherein u_m represents the generation time of the computation task, b_m represents the communication data amount (bits) of the computation task, and c_m represents the computation amount (number of CPU cycles required) of the computation task.
Thus, the computation task queue managed by the mobile edge node at time t_v can be modeled as a 3×M matrix g(t_v), concretely expressed as:

g(t_v) = [ u_1 … u_m … u_M ; b_1 … b_m … b_M ; c_1 … c_m … c_M ]   (Equation 3)

wherein g_1(t_v), …, g_m(t_v), …, g_M(t_v) represent the M computation tasks generated by the sensors, u_1 … u_m … u_M represent the generation times of the M computation tasks, b_1 … b_m … b_M represent the communication data amounts of the M computation tasks, and c_1 … c_m … c_M represent the computation amounts of the M computation tasks.
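As a concrete illustration of Equations 2 and 3, the task tuple and the 3×M queue matrix can be sketched in a few lines of Python; the value ranges used below are placeholder assumptions.

```python
import numpy as np

# Sketch of a sensor task g_m(t_v) = <u_m, b_m, c_m> and the 3 x M queue
# matrix g(t_v) of Equation 3. All magnitudes are placeholder assumptions.
def make_task(rng, t_now):
    u_m = t_now - rng.uniform(0.0, 5.0)                # generation time (s)
    b_m = float(rng.integers(10_000, 100_000))         # data amount (bits)
    c_m = float(rng.integers(1_000_000, 10_000_000))   # computation (cycles)
    return np.array([u_m, b_m, c_m])

def queue_matrix(tasks):
    # Column m holds task m; row 0 = generation times u, row 1 = data
    # amounts b, row 2 = computation amounts c, i.e. the matrix g(t_v).
    return np.stack(tasks, axis=1)

rng = np.random.default_rng(0)
g_tv = queue_matrix([make_task(rng, t_now=10.0) for _ in range(8)])
print(g_tv.shape)  # (3, 8)
```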
Further, a completion delay from generation to completion of a computational task is comprised of an execution delay and a wait delay.
The execution delay is related to the scale of the computation task and is the sum of the communication time for the mobile edge node to collect the data from the corresponding sensor and the computation time required to execute it; the execution delay de_m(t_v) is concretely expressed as:

de_m(t_v) = cm_m(t_v) + cp_m(t_v) = b_m / r_m(t_v) + c_m / f_c   (Equation 4)

wherein cm_m(t_v) denotes the communication time required to execute task m, cp_m(t_v) denotes the computation time required to execute task m, B is the communication bandwidth, p_m is the sensor transmit power, h_m(t_v) is the channel gain obtained from the current position relationship between the sensor and the mobile edge node and the channel state, the current data transmission rate r_m(t_v) = B · log2(1 + p_m · h_m(t_v) / σ²) can be calculated by the Shannon formula, f_c is the computing power of the mobile edge node, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, and σ² represents the channel noise power.
The waiting delay refers to the residence time in the queue from the generation of the computation task to the beginning of its execution. For a computation task g_m(t_v) = <u_m, b_m, c_m> processed in the v-th round, its waiting delay dw_m(t_v) is concretely expressed as:

dw_m(t_v) = t_v − u_m   (Equation 5)

wherein u_m denotes the generation time of the computation task and t_v denotes the current time.
Therefore, if computation task g_m(t_v) is executed in the v-th round, its completion delay d_m(t_v) can be expressed as:

d_m(t_v) = de_m(t_v) + dw_m(t_v) = b_m / r_m(t_v) + c_m / f_c + t_v − u_m   (Equation 6)

wherein de_m(t_v) denotes the execution delay of the computation task, dw_m(t_v) denotes its waiting delay, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, f_c is the computing power of the mobile edge node, u_m denotes the generation time of the computation task, and t_v denotes the current time.
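Equations 4-6 translate directly into code. The sketch below computes the transmission rate with the Shannon formula and then the execution, waiting, and completion delays; the numeric values in the example call are placeholders.

```python
import math

# Sketch of Equations 4-6; all numeric values below are placeholders.
def transmission_rate(B, p_m, h_m, sigma2):
    # Shannon formula: r_m(t_v) = B * log2(1 + p_m * h_m(t_v) / sigma^2)
    return B * math.log2(1.0 + p_m * h_m / sigma2)

def execution_delay(b_m, c_m, r_m, f_c):
    # Equation 4: de_m(t_v) = b_m / r_m(t_v) + c_m / f_c
    return b_m / r_m + c_m / f_c

def waiting_delay(t_v, u_m):
    # Equation 5: dw_m(t_v) = t_v - u_m
    return t_v - u_m

def completion_delay(t_v, u_m, b_m, c_m, r_m, f_c):
    # Equation 6: d_m(t_v) = de_m(t_v) + dw_m(t_v)
    return execution_delay(b_m, c_m, r_m, f_c) + waiting_delay(t_v, u_m)

r = transmission_rate(B=1e6, p_m=0.1, h_m=1e-6, sigma2=1e-9)
print(completion_delay(t_v=10.0, u_m=7.5, b_m=5e4, c_m=5e6, r_m=r, f_c=2e9))
```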
Step S1202: in response to completing initialization of the sensor, the environmental status information is updated.
The environment state information consists of the computation task queue information and the channel state information managed by the mobile edge node; since these vary with the time t_v, the matrix g(t_v) is updated accordingly.
Step S1203: and selecting the action of the current state by an epsilon-greedy strategy.
The intelligent agent deployed on the mobile edge node performs action selection via an ε-greedy strategy according to the state information of each round and the current network parameters.
The current state refers to the current system state s(t_v) when the v-th round decision is made, and the action indicates which computation task in the queue the mobile edge node processes in this round; the action space size is M, and the action variable a(t_v) = m, m ∈ M, indicates that the mobile edge node processes the m-th computation task in the task queue in this round.
Specifically, the system state s(t_v) consists of the respective execution delays and waiting delays of the M computation tasks in the current queue; s(t_v) is concretely expressed as:

s(t_v) = {de_1(t_v), …, de_M(t_v), dw_1(t_v), …, dw_M(t_v)}   (Equation 7)

wherein de_1(t_v) denotes the execution delay of the 1st computation task, de_M(t_v) that of the M-th, dw_1(t_v) the waiting delay of the 1st computation task, and dw_M(t_v) that of the M-th.
At each round of decision, the agent selects, with probability ε, the action with the maximum Q value in the current state; otherwise it chooses randomly within the action space so as to strike a balance between exploration and exploitation. This is called the ε-greedy strategy.

The probability value ε is a preset fixed value and does not change with the number of decision iterations. At the first iteration the Q values are generated from the initialized random network parameters; in the remaining iterations the Q values are output directly by the DQN network.
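A minimal sketch of this selection rule follows. Note that, matching the convention of this description, ε is the probability of exploitation (taking the greedy action) rather than of exploration; the q_net interface is an assumption.

```python
import random
import torch

# epsilon-greedy action selection as described above: with probability
# epsilon take the action with the maximum Q value, otherwise pick a task
# uniformly at random from the action space of size M.
def select_action(q_net, state, epsilon, num_actions):
    if random.random() < epsilon:
        with torch.no_grad():
            q_values = q_net(state.unsqueeze(0))    # shape (1, M)
        return int(q_values.argmax(dim=1).item())   # exploit: argmax Q
    return random.randrange(num_actions)            # explore: random task
```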
Step S1204: and executing the selected action and acquiring experience generated by interaction with the environment.
After the action selected by the ε-greedy strategy in step S1203 is executed, the instant reward of the action is obtained, and meanwhile the computation task queue g(t_v) in the environment is updated to the next state.
Specifically, an experience comprises the current system state s(t_v), the currently executed action a(t_v) = m, m ∈ M, the reward obtained by executing the current action, and the system state at the next decision.
The current system state s(t_v) is given by Equation 7, and the instant reward r(t_v) obtained by the current action is concretely expressed as:

r(t_v) = −[ λ · dw_m(t_v) + (1 − λ) · Σ_{j∈M, j≠m} de_m(t_v) ]   (Equation 8)

wherein λ is a balance coefficient set according to different usage scenarios and requirements, dw_m(t_v) denotes the waiting delay of the selected computation task m, de_m(t_v) denotes its execution delay, j indexes the other computation tasks in the queue (each of which waits at least de_m(t_v) longer), and M denotes the number of sensors (equivalently, the number of computation tasks).
In this embodiment, the design of the reward function jointly considers the execution delay and the waiting delay of the computation tasks: if a computation task g_m(t_v) in the current round's queue is selected, its waiting time dw_m(t_v) stops accumulating and will not grow further, but every other computation task in the queue will wait at least de_m(t_v) longer. A task with a longer execution time prolongs the waiting of the other M−1 tasks in the queue by more, so both influences are reflected simultaneously in each round's instant reward in order to minimize the computation task completion delay.
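Because the published form of Equation 8 is only partially recoverable from this translation, the sketch below implements one consistent reading of the reward just described: the waiting delay of the chosen task plus the extension it imposes on the other M−1 queued tasks, balanced by λ.

```python
# Hedged sketch of the instant reward (one reading of Equation 8):
# selecting task m ends its waiting delay dw[m], while each of the other
# M-1 queued tasks waits at least de[m] longer; lam balances the terms.
def instant_reward(m, de, dw, lam):
    M = len(de)
    cost = lam * dw[m] + (1.0 - lam) * (M - 1) * de[m]
    return -cost  # smaller combined delay -> larger (less negative) reward

print(instant_reward(m=2, de=[0.3, 0.5, 0.2], dw=[1.0, 2.0, 1.5], lam=0.5))
```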
Step S1205: and storing the experience generated by the interaction with the environment as a sample to an experience pool.
Wherein, in each decision round, the experience generated by the interaction between the action and the environment is stored as a sample in an experience pool.
Further, after the sample is stored in the experience pool, the method also comprises accumulating the instant rewards and the task delays.

The instant reward and the task delay of each decision round are accumulated: the instant reward of the action executed in each decision round is obtained from Equation 8, and the task delay is the total delay of the computation task processed in that round, given by Equation 6.
Step S1206: and judging whether the number of samples in the experience pool reaches a set value.
If yes, go to step S1207, otherwise go to step S1202.
Step S1207: and randomly selecting a specified number of samples from the experience pool, and training the DQN network.
Wherein the selected specified number of samples is the preset mini-batch size.
Specifically, after the experience pool reaches a certain scale, gradient-descent training is performed on the DQN network using the specified number of samples, so that the DQN network gradually converges to the optimal action selection strategy.
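The gradient-descent step of S1207 corresponds to a standard DQN mini-batch update, sketched below in PyTorch; the network architecture, the use of a separate target network, and all hyperparameters are assumptions, since the patent does not fix them.

```python
import random
import torch
import torch.nn as nn

# Sketch of step S1207: one gradient-descent update of the DQN network on
# a random mini-batch from the experience pool. Sizes are assumptions;
# for the state of Equation 7, state_dim = 2 * M and num_actions = M.
class QNet(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

    def forward(self, x):
        return self.net(x)

def train_step(q_net, target_net, pool, optimizer, batch_size=32, gamma=0.99):
    # pool holds (s, a, r, s_next) tuples of tensors from step S1205.
    batch = random.sample(pool, batch_size)
    s, a, r, s_next = (torch.stack(x) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                       # bootstrapped TD target
        target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```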
Step S1208: and judging whether the decision in the round is in a termination state.
Wherein the termination state refers to whether the value of v has reached its maximum V.
If it is in the termination state, this training round ends; otherwise v is increased by 1 and steps S1202-S1208 are executed again.
Step S130: and judging whether the updating round number meets the aggregation frequency or not in the DQN network training process.
Wherein the aggregation frequency is a predetermined fixed value.
The v rounds of decision in step S120 can be regarded as the update rounds, i.e., each round of decision performs an update of g(t_v).
When the v value reaches the preset aggregation frequency, that is, the number of update rounds satisfies the aggregation frequency, step S140 is executed. Otherwise, the step S120 is executed again until the v value reaches the preset aggregation frequency.
Step S140: and carrying out global parameter updating.
After each mobile edge node has been updated for a certain number of rounds, global aggregation begins: each mobile edge node uploads its current DQN network parameters to the parameter server, and the parameter server updates the global parameters.
The step S140 specifically includes the following sub-steps:
step S1401: the network parameters of the respective mobile edge node at the time are input.
Specifically, each mobile edge node uploads the DQN network parameters at the time deployed within each mobile edge node to the parameter server.
Step S1402: aggregating the network parameters and updating the global parameters.
The process of global aggregation is concretely expressed as:

ω_g(v) = (1/N) Σ_{n=1}^{N} ω_n(v)   (Equation 9)

wherein ω_n(v) represents the network parameters of mobile edge node n at the v-th round, ω_g(v) represents the global parameters at this time, and N denotes the number of mobile edge nodes.
Step S1403: and outputting the updated global parameters to each edge node.
The updated global parameters are then taken as the updated network parameters with which local DQN training continues.
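Equation 9 is an unweighted average of the node parameters. A minimal FedAvg-style sketch over PyTorch state dicts follows; the function names are illustrative.

```python
import torch

# Sketch of Equation 9: the parameter server averages the DQN parameters
# uploaded by the N mobile edge nodes, then each node loads the result
# and continues local training from the global parameters.
def aggregate(node_state_dicts):
    global_state = {}
    for key in node_state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in node_state_dicts])
        global_state[key] = stacked.mean(dim=0)
    return global_state

def broadcast(global_state, node_nets):
    for net in node_nets:
        net.load_state_dict(global_state)
```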
In this embodiment, by exploiting the distributed characteristics of mobile edge device deployment, federated learning is introduced to aggregate the current network parameters of each mobile edge node at a certain frequency; a global network with better performance is aggregated from the experience contained in the heterogeneous local information of each participant, which promotes the training process of every participant in the federation.
Step S150: and responding to the completion of global parameter updating, and judging whether the training round number reaches the specified number in the DQN network training process.
Specifically, steps S1201-S1208 constitute one training round; when the value of v reaches the preset aggregation frequency the global parameters are updated, and at this point it is judged whether the number of training rounds has reached the set maximum value.

If the number of training rounds has not reached the set maximum value, the count of training rounds is increased by 1 and the next training round proceeds, i.e., steps S120-S140 are executed again, with the updated global parameters used as the network parameters for continued local DQN training, until the maximum number of rounds is reached.

If the number of training rounds has reached the set maximum value, all training ends, and step S160 is executed:
step S160: and outputting the result.
The output result is the optimal action selection strategy obtained by the training up to this point, together with the corresponding computation task completion delay.

Steps S120-S150 are iterated continuously until the set maximum number of training rounds is reached, whereby the minimum computation task completion delay under federated learning across a plurality of mobile edge nodes is obtained.
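Putting steps S120-S150 together, the overall procedure can be summarized by a short driver loop; every object and method name below is illustrative structure rather than an interface defined by the patent.

```python
# Illustrative driver for the overall flow (steps S120-S150).
def federated_scheduling(nodes, server, max_rounds, agg_freq):
    for rnd in range(1, max_rounds + 1):
        for node in nodes:
            node.local_dqn_round()         # step S120: steps S1201-S1208
        if rnd % agg_freq == 0:            # step S130: aggregation frequency
            uploaded = [n.upload_parameters() for n in nodes]
            global_params = server.aggregate(uploaded)   # step S140
            for node in nodes:
                node.load_parameters(global_params)
    # step S160: each node's learned policy and its task completion delay
    return [node.best_policy() for node in nodes]
```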
Example two
As shown in fig. 2, the present application provides a federate learning-based mobile edge node computation task scheduling system, which specifically includes: initialization unit 210, training unit 220, first judgment unit 230, global update unit 240, second judgment unit 250, and output unit 260.
The initialization unit 210 is used to initialize information parameters.
The training unit 220 is connected to the initialization unit 210 for locally training the DQN network deployed at each mobile edge node.
An intelligent agent exists in the training unit 220; it can perform action selection via an ε-greedy strategy according to the state information of each round and the current network parameters, obtain the instant reward of the action, and meanwhile the computation task queue in the environment is updated to the next state. The results of interaction with the environment are stored in an experience pool, and after the capacity of the experience pool reaches a certain scale, gradient-descent training is performed on the DQN network so that it gradually converges to the optimal action selection strategy.
The first determining unit 230 is connected to the training unit 220, and is configured to determine whether the number of update rounds meets the aggregation frequency in the DQN training process.
The global updating unit 240 is connected to the first determining unit 230, and configured to perform global parameter updating if the number of updating rounds satisfies the aggregation frequency.
The global updating unit 240 acts as the parameter server when performing the global parameter update.
The second determining unit 250 is connected to the global updating unit 240, and is configured to determine whether the number of training rounds reaches a specified number of times in the DQN training process.
The output unit 260 is connected to the second determining unit 250, and is configured to output the result if the number of training rounds reaches the set specified number.
Although the present application has been described with reference to examples, which are intended to be illustrative only and not to be limiting of the application, changes, additions and/or deletions may be made to the embodiments without departing from the scope of the application.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A mobile edge node calculation task scheduling method based on federal learning is characterized by specifically comprising the following steps:
initializing information parameters;
in response to completing initialization, locally training a DQN network deployed at each mobile edge node;
judging whether the number of updating rounds meets the aggregation frequency or not in the DQN network training process;
if the number of the updating rounds meets the aggregation frequency, carrying out global parameter aggregation and updating;
in response to the completion of global parameter updating, judging whether the number of training rounds reaches the specified number in the DQN network training process;
and if the number of training rounds reaches the specified number, outputting a result.
2. The method of claim 1, wherein in initializing the information parameters, the method further comprises setting the parameters required by each part of the system, including sensor transmit power, communication bandwidth, mobile edge node computing power, wireless-channel-related parameters, experience pool capacity in the DQN network, mini-batch size, learning rate, discount factor, aggregation frequency, and the number of computation tasks performed per round.
3. The method according to claim 2, wherein the initialization of the mobile edge nodes, local and global network parameters, and experience pools in the environment is performed in response to the system required parameters being set.
4. The method of claim 2, wherein the local training of the DQN network deployed at each mobile edge node specifically comprises the sub-steps of:
initializing a sensor in a coverage range of a mobile edge node;
updating the environmental status information in response to completing initialization of the sensor;
after responding to the updated environment state information, performing action selection on the current state;
executing the selected action, and acquiring experience generated by interaction with the environment;
storing experiences generated by interaction with the environment as samples in an experience pool;
judging whether the number of samples in the experience pool reaches a set value or not;
if the set value is reached, randomly selecting a specified number of samples from the experience pool, and training the DQN network;
judging whether the decision in the round is in a termination state or not in response to the completion of the DQN network training;
if the training is in the termination state, the training round is finished.
5. The method of claim 4, wherein initializing the sensors within the coverage area of the mobile edge node comprises obtaining the computation tasks generated by the sensors, wherein a computation task g_m(t_v) is expressed as:

g_m(t_v) = <u_m, b_m, c_m>

wherein u_m represents the generation time of computation task m, b_m represents the communication data amount of computation task m, and c_m represents the computation amount of computation task m.
6. The method of claim 5, wherein after the sensor-generated computation tasks g_m(t_v) are obtained, the computation task queue managed by the mobile edge node at time t_v is modeled as a 3×M matrix g(t_v), concretely expressed as:

g(t_v) = [ u_1 … u_m … u_M ; b_1 … b_m … b_M ; c_1 … c_m … c_M ]

wherein g_1(t_v), …, g_m(t_v), …, g_M(t_v) represent the M computation tasks generated by the sensors, u_1 … u_m … u_M represent the generation times of the M computation tasks, b_1 … b_m … b_M represent the communication data amounts of the M computation tasks, and c_1 … c_m … c_M represent the computation amounts of the M computation tasks.
7. The method of claim 6, wherein the total delay of a computation task from generation to completion of processing is composed of the execution delay and the waiting delay;

wherein the execution delay de_m(t_v) is concretely expressed as:

de_m(t_v) = cm_m(t_v) + cp_m(t_v) = b_m / r_m(t_v) + c_m / f_c,   with r_m(t_v) = B · log2(1 + p_m · h_m(t_v) / σ²)

wherein cm_m(t_v) denotes the communication time required to execute task m, cp_m(t_v) denotes the computation time required to execute task m, B is the communication bandwidth, p_m is the sensor transmit power, h_m(t_v) is the channel gain obtained according to the current position relationship between the sensor and the mobile edge node and the channel state, r_m(t_v) denotes the current data transmission rate, f_c is the computing power of the mobile edge node, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, and σ² represents the channel noise power.
8. The method of claim 7, wherein the waiting delay refers to the residence time of the computation task in the queue from its generation to the start of its execution; for a computation task g_m(t_v) = <u_m, b_m, c_m> processed in the v-th round, its waiting delay dw_m(t_v) is concretely expressed as:

dw_m(t_v) = t_v − u_m

wherein u_m denotes the generation time of the computation task and t_v denotes the current time.
9. The method of claim 8, wherein the completion delay d_m(t_v) is concretely expressed as:

d_m(t_v) = de_m(t_v) + dw_m(t_v) = b_m / r_m(t_v) + c_m / f_c + t_v − u_m

wherein de_m(t_v) denotes the execution delay of the computation task, dw_m(t_v) denotes its waiting delay, b_m represents the communication data amount of the computation task, c_m represents its computation amount, f_c is the computing power of the mobile edge node, u_m denotes the generation time of the computation task, t_v denotes the current time, and r_m(t_v) denotes the current data transmission rate.
10. A mobile edge node calculation task scheduling system based on federal learning is characterized by specifically comprising: the device comprises an initialization unit, a training unit, a first judgment unit, a global updating unit, a second judgment unit and an output unit;
an initialization unit for initializing information parameters;
a training unit, configured to perform local training on the DQN network deployed at each mobile edge node;
the first judgment unit is used for judging whether the updating round number meets the aggregation frequency in the DQN training process;
the global updating unit is used for aggregating and updating global parameters if the number of updating rounds meets the aggregation frequency;
the second judgment unit is used for judging whether the number of training rounds reaches the specified number of times in the DQN training process;
and the output unit is used for outputting the result if the training round number reaches the set specified number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111478407.1A CN114168328B (en) | 2021-12-06 | 2021-12-06 | Mobile edge node calculation task scheduling method and system based on federal learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111478407.1A CN114168328B (en) | 2021-12-06 | 2021-12-06 | Mobile edge node calculation task scheduling method and system based on federal learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114168328A true CN114168328A (en) | 2022-03-11 |
CN114168328B CN114168328B (en) | 2024-09-10 |
Family
ID=80483585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111478407.1A Active CN114168328B (en) | 2021-12-06 | 2021-12-06 | Mobile edge node calculation task scheduling method and system based on federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114168328B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10666342B1 (en) * | 2019-05-01 | 2020-05-26 | Qualcomm Incorporated | Beam management using adaptive learning |
CN111339554A (en) * | 2020-02-17 | 2020-06-26 | 电子科技大学 | User data privacy protection method based on mobile edge calculation |
WO2021169577A1 (en) * | 2020-02-27 | 2021-09-02 | 山东大学 | Wireless service traffic prediction method based on weighted federated learning |
WO2021155671A1 (en) * | 2020-08-24 | 2021-08-12 | 平安科技(深圳)有限公司 | High-latency network environment robust federated learning training method and apparatus, computer device, and storage medium |
CN113467952A (en) * | 2021-07-15 | 2021-10-01 | 北京邮电大学 | Distributed federated learning collaborative computing method and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114938372A (en) * | 2022-05-20 | 2022-08-23 | 天津大学 | Federal learning-based micro-grid group request dynamic migration scheduling method and device |
CN114938372B (en) * | 2022-05-20 | 2023-04-18 | 天津大学 | Federal learning-based micro-grid group request dynamic migration scheduling method and device |
CN115357402A (en) * | 2022-10-20 | 2022-11-18 | 北京理工大学 | Intelligent edge optimization method and device |
CN115357402B (en) * | 2022-10-20 | 2023-01-24 | 北京理工大学 | Intelligent edge optimization method and device |
Also Published As
Publication number | Publication date |
---|---|
CN114168328B (en) | 2024-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111835827B (en) | Internet of things edge computing task unloading method and system | |
JP6942397B2 (en) | How to develop a singletasking offload strategy in a mobile edge computing scenario | |
CN109947545B (en) | Task unloading and migration decision method based on user mobility | |
CN114168328A (en) | Mobile edge node calculation task scheduling method and system based on federal learning | |
Heydari et al. | Dynamic task offloading in multi-agent mobile edge computing networks | |
CN111401744B (en) | Dynamic task unloading method in uncertainty environment in mobile edge calculation | |
CN108566242B (en) | Spatial information network resource scheduling system for remote sensing data transmission service | |
CN113946423B (en) | Multi-task edge computing, scheduling and optimizing method based on graph attention network | |
CN113867843A (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN112596910B (en) | Cloud computing resource scheduling method in multi-user MEC system | |
CN114490057A (en) | MEC unloaded task resource allocation method based on deep reinforcement learning | |
CN112799823A (en) | Online dispatching and scheduling method and system for edge computing tasks | |
CN116489708B (en) | Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method | |
CN116886703A (en) | Cloud edge end cooperative computing unloading method based on priority and reinforcement learning | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN115408072A (en) | Rapid adaptation model construction method based on deep reinforcement learning and related device | |
CN117749796A (en) | Cloud edge computing power network system calculation unloading method and system | |
CN117202264A (en) | 5G network slice oriented computing and unloading method in MEC environment | |
CN114615705B (en) | Single-user resource allocation strategy method based on 5G network | |
CN116431326A (en) | Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning | |
CN115756873A (en) | Mobile edge computing unloading method and platform based on federal reinforcement learning | |
CN109874154A (en) | A kind of C-RAN user-association and computational resource allocation method based on deeply study | |
CN114706673A (en) | Task allocation method considering task delay and server cost in mobile edge computing network | |
CN117891532B (en) | Terminal energy efficiency optimization unloading method based on attention multi-index sorting | |
Hlophe et al. | Prospect-theoretic DRL Approach for Container Provisioning in Energy-constrained Edge Platforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |