CN114168328A - Mobile edge node calculation task scheduling method and system based on federal learning - Google Patents
- Publication number: CN114168328A
- Application number: CN202111478407.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/10—Interfaces, programming languages or software development kits, e.g. for simulating neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/502—Proximity
Abstract
The application discloses a mobile edge node computation task scheduling method based on federal learning and a corresponding system. The method specifically comprises the following steps: initializing information parameters; in response to completing initialization, locally training the DQN network deployed at each mobile edge node; during DQN network training, judging whether the number of update rounds meets the aggregation frequency; if it does, updating the global parameters; in response to completing the global parameter update, judging whether the number of training rounds has reached the specified number; and if it has, outputting the result. The application schedules the computation tasks in a mobile edge computing system from the viewpoint of their execution order, and shortens the computation task completion time by exploiting the cooperation of a plurality of mobile edge nodes.
Description
Technical Field
The present application relates to the field of mobile communications technologies, and in particular, to a method and a system for scheduling a computation task of a mobile edge node based on federal learning.
Background
The long communication distance between the terminal user and the remote cloud is a limitation that cloud computing has always faced, so cloud computing increasingly fails to satisfy novel mobile applications with ever stricter delay requirements. As cloud functionality increasingly migrates to the edge of the network, Mobile Edge Computing (MEC) is seen as a potential solution. Utilizing the large amount of free computing resources and storage space distributed at the edge of the network, mobile devices located there can also be given the ability to handle computation-intensive and delay-sensitive tasks.
However, the computing power of a single device is limited and cannot meet the performance requirements of computation-intensive services such as automatic driving, Virtual Reality (VR), and Augmented Reality (AR); for this reason, the concept of mobile edge collaborative computing is introduced. On one hand it reduces the computing pressure on a single node; on the other hand, the cooperation among various edge devices provides more possibilities for improving system performance. For example, the naturally distributed structure of mobile edge computing matches the idea of Federated Learning (FL) well: introducing federated learning into the collaborative computing scenario alleviates, to a certain extent, the huge communication overhead caused by solutions such as deep learning, while easing users' concerns about privacy and data security.
In collaborative computing, an efficient computation task dispatching and scheduling mechanism needs to be introduced for overall management, so as to improve the user experience. Considering the computation offloading problem only from the viewpoint of task dispatching is not sufficient to optimize the execution process of computation tasks; reasonably planning the execution order of the tasks queued at the edge is also a key link in shortening the average delay. The scheduling policy must be able to adapt to a dynamically changing external environment and differentiated service requirements; a reasonable and effective policy can fully exploit the potential of the system, improve the execution efficiency of computation tasks, and improve the quality of service.
In this context, we combine the distributed characteristics of MEC with federated learning and, aiming at the computation task scheduling problem of mobile edge nodes in the MEC system, provide a federated-learning-based computation task scheduling scheme for the multi-mobile-edge-node scenario. On the basis of independent Deep Q Network (DQN) training at each single mobile edge node, the scheme further optimizes the user experience of delay-sensitive computation tasks through global parameter aggregation.
Although collaborative schemes based on computation task offloading provide a solution to insufficient terminal device resources and strict task delay requirements, existing methods do not consider the order in which a single mobile edge node executes its multiple pending computation tasks, i.e., the computation task scheduling problem, and neglect the influence of task scheduling on computing service delay. Meanwhile, the ability of the MEC system to aggregate information from a plurality of mobile edge nodes to achieve better computation task scheduling has not been fully exploited.
Therefore, how to obtain a method for scheduling a computation task of a mobile edge node based on federal learning is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a mobile edge node computation task scheduling method based on federal learning, which realizes the cooperative scheduling of computation tasks across a plurality of mobile edge nodes in an MEC system and shortens the completion delay of computation tasks. For the scenario in which mobile edge nodes process the task requests generated by the sensors within their coverage, a reasonable task execution order is planned for a dynamically changing task queue, and by aggregating the differentiated information of a plurality of mobile edge nodes the computation task completion delay is reduced compared with independent training at each node.
A mobile edge node calculation task scheduling method based on federal learning specifically comprises the following steps: initializing information parameters; in response to completing initialization, locally training a DQN network deployed at each mobile edge node; judging whether the number of updating rounds meets the aggregation frequency or not in the DQN network training process; if the number of the updating rounds meets the aggregation frequency, carrying out global parameter aggregation and updating; in response to the completion of global parameter updating, judging whether the number of training rounds reaches the specified number in the DQN network training process; and if the number of training rounds reaches the specified number, outputting a result.
As above, wherein the process of initializing the information parameters also includes setting the parameters required by each part of the system, including the sensor transmission power, communication bandwidth, mobile edge node computing power, wireless-channel-related parameters, the experience pool capacity in the DQN network, the mini-batch size, the learning rate, the discount factor, the aggregation frequency, and the number of computation tasks executed per round.
As above, after the parameters required by the system are set, the mobile edge node, the local and global network parameters and the experience pool in the environment are initialized.
As above, the local training of the DQN network deployed at each mobile edge node specifically comprises the following sub-steps: initializing a sensor in a coverage range of a mobile edge node; updating the environmental status information in response to completing initialization of the sensor; after responding to the updated environment state information, performing action selection on the current state; executing the selected action, and acquiring experience generated by interaction with the environment; storing experiences generated by interaction with the environment as samples in an experience pool; judging whether the number of samples in the experience pool reaches a set value or not; if the set value is reached, randomly selecting a specified number of samples from the experience pool, and training the DQN network; judging whether the decision in the round is in a termination state or not in response to the completion of the DQN network training; if the training is in the termination state, the training round is finished.
As above, wherein initializing the sensors within the coverage of the mobile edge node comprises obtaining the computation tasks generated by the sensors, wherein a computation task g_m(t_v) is expressed as:

g_m(t_v) = <u_m, b_m, c_m>

wherein u_m represents the generation time of computation task m, b_m represents the communication data amount of computation task m, and c_m represents the computation amount of computation task m.

As above, wherein after the sensor-generated computation tasks g_m(t_v) are obtained, the computation task queue managed by the mobile edge node at time t_v is modeled as a 3×M matrix g(t_v), concretely expressed as:

g(t_v) = [ u_1 … u_m … u_M ; b_1 … b_m … b_M ; c_1 … c_m … c_M ]

wherein g_1(t_v), …, g_m(t_v), …, g_M(t_v) represent the M computation tasks generated by the sensors, u_1 … u_m … u_M represent the generation times of the M computation tasks, b_1 … b_m … b_M represent the communication data amounts of the M computation tasks, and c_1 … c_m … c_M represent the computation amounts of the M computation tasks.
As above, wherein the total delay of a computation task from generation to completion of processing is composed of the execution delay and the waiting delay; wherein the execution delay de_m(t_v) is concretely expressed as:

de_m(t_v) = cm_m(t_v) + cp_m(t_v) = b_m / r_m(t_v) + c_m / f_c,   with r_m(t_v) = B · log2(1 + p_m · h_m(t_v) / σ²)

wherein cm_m(t_v) denotes the communication time required to execute task m, cp_m(t_v) denotes the computation time required to execute task m, B is the communication bandwidth, p_m is the sensor transmit power, h_m(t_v) is the channel gain obtained according to the current position relationship between the sensor and the mobile edge node and the channel state, r_m(t_v) denotes the current data transmission rate, f_c is the computing power of the mobile edge node, b_m represents the communication data amount of the computation task, c_m represents its computation amount, and σ² represents the channel noise power.
As above, wherein the waiting delay refers to the residence time of the computation task in the queue from its generation to the beginning of its execution; for a computation task g_m(t_v) = <u_m, b_m, c_m> processed in the v-th round, its waiting delay dw_m(t_v) is concretely expressed as:

dw_m(t_v) = t_v − u_m

wherein u_m denotes the generation time of the computation task and t_v denotes the current time.
As above, wherein the completion delay d_m(t_v) is concretely expressed as:

d_m(t_v) = de_m(t_v) + dw_m(t_v) = b_m / r_m(t_v) + c_m / f_c + t_v − u_m

wherein de_m(t_v) denotes the execution delay of the computation task, dw_m(t_v) denotes its waiting delay, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, f_c is the computing power of the mobile edge node, u_m denotes the generation time of the computation task, t_v denotes the current time, and r_m(t_v) denotes the current data transmission rate.
A mobile edge node calculation task scheduling system based on federal learning specifically comprises: the device comprises an initialization unit, a training unit, a first judgment unit, a global updating unit, a second judgment unit and an output unit; an initialization unit for initializing information parameters; a training unit, configured to perform local training on the DQN network deployed at each mobile edge node; the first judgment unit is used for judging whether the updating round number meets the aggregation frequency in the DQN training process; the global updating unit is used for aggregating and updating global parameters if the number of updating rounds meets the aggregation frequency; the second judgment unit is used for judging whether the number of training rounds reaches the specified number of times in the DQN training process; and the output unit is used for outputting the result if the training round number reaches the set specified number.
The application has the following beneficial effects:
(1) the application provides a method for scheduling a computation task in an MEC system from the viewpoint of a computation task execution sequence, and the computation task completion time is shortened by utilizing the cooperation of a plurality of mobile edge nodes in the MEC system.
(2) The method and the device can flexibly adapt to dynamically changing environment states such as calculation task queue information, sensor positions and the like, can adjust the balance coefficient lambda in the instant reward function according to different requirements, and change the weights of waiting time delay and execution time delay to deal with different application scenes.
(3) According to the method, system performance is improved through global aggregation at a certain frequency by using federal learning, and compared with the method that each mobile edge node independently performs DQN training, the method has the advantage that a lower calculation task completion time delay is brought to each mobile edge node participating in the federal by an action selection strategy under global parameters.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a flowchart of a method for scheduling a computation task of a mobile edge node based on federal learning according to an embodiment of the present application;
fig. 2 is an internal structure diagram of a mobile edge node computation task scheduling system based on federal learning according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The main purpose of the present application is to design a computation task scheduling policy for the MEC system and to introduce federated learning in the multi-mobile-edge-node scenario so as to further optimize system performance, such that each party to the federation (i.e., each edge node, each of which is one participant in the federated learning) can obtain a lower delay than under independent training.
Scene assumption: in the MEC system there are N mobile edge nodes, N = {1, 2, …, N}, and within the coverage area of each mobile edge node n there are M sensors, M = {1, 2, …, M}. The mobile edge node processes V computation tasks in each round, i.e., makes V decisions per round. The computation tasks generated by the sensors constitute a dynamically changing queue g(t_v) = {g_1(t_v), …, g_m(t_v), …, g_M(t_v)}, where t_v denotes the decision time of the current v-th round.
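For illustration only, the scenario parameters above can be grouped into a small configuration object. The following is a minimal Python sketch; every name and default value in it is an assumption for illustration, not a value taken from the patent.

```python
from dataclasses import dataclass

# Illustrative container for the scene assumption above.
# All names and default values are assumptions, not from the patent.
@dataclass
class MecScenario:
    N: int = 4            # number of mobile edge nodes
    M: int = 10           # sensors (queued computation tasks) per edge node
    V: int = 50           # computation tasks processed per training round
    B: float = 1e6        # communication bandwidth (Hz)
    f_c: float = 2e9      # edge node computing power (CPU cycles/s)
    sigma2: float = 1e-9  # channel noise power (W)
    agg_freq: int = 10    # global aggregation frequency (update rounds)

scenario = MecScenario()
print(scenario.N, scenario.M, scenario.V)
```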
Example one
As shown in fig. 1, a method for scheduling a calculation task of a mobile edge node based on federal learning provided in the present application specifically includes the following steps:
step S110: information parameters are initialized.
Specifically, the N mobile edge nodes in the environment, their local network parameters, the global network parameters, and the corresponding experience pools are initialized.
Before the information parameters are initialized, the parameters required by each part of the system are set, including the sensor transmit power, communication bandwidth, mobile edge node computing power (CPU frequency), wireless-channel-related parameters, the experience pool capacity in the DQN, the mini-batch size (i.e., the number of samples per gradient-descent training step), the learning rate, the discount factor, the aggregation frequency, and the number of computation tasks executed per round.
And after the parameters required by the system are set, initializing the mobile edge nodes, the local and global network parameters and the experience pool in the environment.
Step S120: in response to completing the initialization, the DQN network deployed at each mobile edge node is locally trained.
Specifically, the essence of the local training of the DQN network deployed at each mobile edge node is that each mobile edge node performs DQN training that minimizes the task completion delay according to the dynamic changes of its local computation task queue.
The DQN training is specifically as follows: the intelligent agent deployed on the mobile edge node selects an action via an ε-greedy strategy according to the state information of each round and the current network parameters, and obtains the instant reward of the action, while the computation task queue in the environment is updated to the next state. The results of interaction with the environment are stored in an experience pool, and after the capacity of the experience pool reaches a certain scale, gradient-descent training is performed on the DQN network so that it gradually converges to the optimal action selection strategy.
DQN training takes place in multiple rounds simultaneously at each mobile edge node; the purpose of each round of training is to minimize the average task completion delay within the round, so the optimization objective of each round is expressed as:

min (1/V) · Σ_{v=1}^{V} Σ_{m=1}^{M} a_{v,m} · d_m(t_v),   with a_{v,m} ∈ {0,1} and Σ_{m=1}^{M} a_{v,m} = 1   (Equation 1)

wherein a_{v,m} = 1 denotes the decision of the mobile edge node to execute the computation task from sensor m in the v-th round, and a_{v,m} = 0 denotes not executing that task; V indicates that the mobile edge node processes V computation tasks per round, i.e., makes V decisions per round; t_v denotes the decision time of the current v-th round; and d_m(t_v) denotes the completion delay of computation task g_m(t_v) after it is executed in the v-th round.
Based on the optimization objective provided above, step S120 specifically includes the following sub-steps:
step S1201: sensors within the coverage of the mobile edge node are initialized.
It is assumed that the mobile edge node processes a total of V computation tasks in one round, i.e., makes V decisions in one round.
At time t_v the node decides which computation task in the queue to process in this round; after the selected task is processed, the corresponding sensor immediately generates a new computation task, while the other unprocessed tasks remain in the queue awaiting the next round of decision. A sensor-generated computation task g_m(t_v) is described as:

g_m(t_v) = <u_m, b_m, c_m>   (Equation 2)

wherein u_m represents the generation time of the computation task, b_m represents the communication data amount (bits) of the computation task, and c_m represents the computation amount (number of CPU cycles required) of the computation task.
Thus, the computation task queue managed by the mobile edge node at time t_v can be modeled as a 3×M matrix g(t_v), concretely expressed as:

g(t_v) = [ u_1 … u_m … u_M ; b_1 … b_m … b_M ; c_1 … c_m … c_M ]   (Equation 3)

wherein g_1(t_v), …, g_m(t_v), …, g_M(t_v) represent the M computation tasks generated by the sensors, u_1 … u_m … u_M represent the generation times of the M computation tasks, b_1 … b_m … b_M represent the communication data amounts of the M computation tasks, and c_1 … c_m … c_M represent the computation amounts of the M computation tasks.
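As a concrete illustration of Equations 2 and 3, the task tuple and the 3×M queue matrix can be sketched in a few lines of Python; the value ranges used below are placeholder assumptions.

```python
import numpy as np

# Sketch of a sensor task g_m(t_v) = <u_m, b_m, c_m> and the 3 x M queue
# matrix g(t_v) of Equation 3. All magnitudes are placeholder assumptions.
def make_task(rng, t_now):
    u_m = t_now - rng.uniform(0.0, 5.0)                # generation time (s)
    b_m = float(rng.integers(10_000, 100_000))         # data amount (bits)
    c_m = float(rng.integers(1_000_000, 10_000_000))   # computation (cycles)
    return np.array([u_m, b_m, c_m])

def queue_matrix(tasks):
    # Column m holds task m; row 0 = generation times u, row 1 = data
    # amounts b, row 2 = computation amounts c, i.e. the matrix g(t_v).
    return np.stack(tasks, axis=1)

rng = np.random.default_rng(0)
g_tv = queue_matrix([make_task(rng, t_now=10.0) for _ in range(8)])
print(g_tv.shape)  # (3, 8)
```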
Further, a completion delay from generation to completion of a computational task is comprised of an execution delay and a wait delay.
The execution delay is related to the scale of the computation task and is the sum of the communication time for the mobile edge node to collect the data from the corresponding sensor and the computation time required to execute it; the execution delay de_m(t_v) is concretely expressed as:

de_m(t_v) = cm_m(t_v) + cp_m(t_v) = b_m / r_m(t_v) + c_m / f_c   (Equation 4)

wherein cm_m(t_v) denotes the communication time required to execute task m, cp_m(t_v) denotes the computation time required to execute task m, B is the communication bandwidth, p_m is the sensor transmit power, h_m(t_v) is the channel gain obtained from the current position relationship between the sensor and the mobile edge node and the channel state, the current data transmission rate r_m(t_v) = B · log2(1 + p_m · h_m(t_v) / σ²) can be calculated by the Shannon formula, f_c is the computing power of the mobile edge node, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, and σ² represents the channel noise power.
The waiting delay refers to the residence time in the queue from the generation of the computation task to the beginning of its execution. For a computation task g_m(t_v) = <u_m, b_m, c_m> processed in the v-th round, its waiting delay dw_m(t_v) is concretely expressed as:

dw_m(t_v) = t_v − u_m   (Equation 5)

wherein u_m denotes the generation time of the computation task and t_v denotes the current time.
Therefore, if computation task g_m(t_v) is executed in the v-th round, its completion delay d_m(t_v) can be expressed as:

d_m(t_v) = de_m(t_v) + dw_m(t_v) = b_m / r_m(t_v) + c_m / f_c + t_v − u_m   (Equation 6)

wherein de_m(t_v) denotes the execution delay of the computation task, dw_m(t_v) denotes its waiting delay, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, f_c is the computing power of the mobile edge node, u_m denotes the generation time of the computation task, and t_v denotes the current time.
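Equations 4-6 translate directly into code. The sketch below computes the transmission rate with the Shannon formula and then the execution, waiting, and completion delays; the numeric values in the example call are placeholders.

```python
import math

# Sketch of Equations 4-6; all numeric values below are placeholders.
def transmission_rate(B, p_m, h_m, sigma2):
    # Shannon formula: r_m(t_v) = B * log2(1 + p_m * h_m(t_v) / sigma^2)
    return B * math.log2(1.0 + p_m * h_m / sigma2)

def execution_delay(b_m, c_m, r_m, f_c):
    # Equation 4: de_m(t_v) = b_m / r_m(t_v) + c_m / f_c
    return b_m / r_m + c_m / f_c

def waiting_delay(t_v, u_m):
    # Equation 5: dw_m(t_v) = t_v - u_m
    return t_v - u_m

def completion_delay(t_v, u_m, b_m, c_m, r_m, f_c):
    # Equation 6: d_m(t_v) = de_m(t_v) + dw_m(t_v)
    return execution_delay(b_m, c_m, r_m, f_c) + waiting_delay(t_v, u_m)

r = transmission_rate(B=1e6, p_m=0.1, h_m=1e-6, sigma2=1e-9)
print(completion_delay(t_v=10.0, u_m=7.5, b_m=5e4, c_m=5e6, r_m=r, f_c=2e9))
```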
Step S1202: in response to completing initialization of the sensor, the environmental status information is updated.
The environment state information consists of the computation task queue information and the channel state information managed by the mobile edge node; since these vary with the time t_v, the matrix g(t_v) is updated accordingly.
Step S1203: and selecting the action of the current state by an epsilon-greedy strategy.
The intelligent agent deployed on the mobile edge node performs action selection via an ε-greedy strategy according to the state information of each round and the current network parameters.
The current state refers to the current system state s(t_v) when the v-th round decision is made, and the action indicates which computation task in the queue the mobile edge node processes in this round; the action space size is M, and the action variable a(t_v) = m, m ∈ M, indicates that the mobile edge node processes the m-th computation task in the task queue in this round.
Specifically, the system state s(t_v) consists of the respective execution delays and waiting delays of the M computation tasks in the current queue; s(t_v) is concretely expressed as:

s(t_v) = {de_1(t_v), …, de_M(t_v), dw_1(t_v), …, dw_M(t_v)}   (Equation 7)

wherein de_1(t_v) denotes the execution delay of the 1st computation task, de_M(t_v) that of the M-th, dw_1(t_v) the waiting delay of the 1st computation task, and dw_M(t_v) that of the M-th.
At each round of decision, the agent selects, with probability ε, the action with the maximum Q value in the current state; otherwise it chooses randomly within the action space so as to strike a balance between exploration and exploitation. This is called the ε-greedy strategy.

The probability value ε is a preset fixed value and does not change with the number of decision iterations. At the first iteration the Q values are generated from the initialized random network parameters; in the remaining iterations the Q values are output directly by the DQN network.
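A minimal sketch of this selection rule follows. Note that, matching the convention of this description, ε is the probability of exploitation (taking the greedy action) rather than of exploration; the q_net interface is an assumption.

```python
import random
import torch

# epsilon-greedy action selection as described above: with probability
# epsilon take the action with the maximum Q value, otherwise pick a task
# uniformly at random from the action space of size M.
def select_action(q_net, state, epsilon, num_actions):
    if random.random() < epsilon:
        with torch.no_grad():
            q_values = q_net(state.unsqueeze(0))    # shape (1, M)
        return int(q_values.argmax(dim=1).item())   # exploit: argmax Q
    return random.randrange(num_actions)            # explore: random task
```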
Step S1204: and executing the selected action and acquiring experience generated by interaction with the environment.
After the action selected by the ε-greedy strategy in step S1203 is executed, the instant reward of the action is obtained, and meanwhile the computation task queue g(t_v) in the environment is updated to the next state.
Specifically, an experience comprises the current system state s(t_v), the currently executed action a(t_v) = m, m ∈ M, the reward obtained by executing the current action, and the system state at the next decision.
The current system state s(t_v) is given by Equation 7, and the instant reward r(t_v) obtained by the current action is concretely expressed as:

r(t_v) = −[ λ · dw_m(t_v) + (1 − λ) · Σ_{j∈M, j≠m} de_m(t_v) ]   (Equation 8)

wherein λ is a balance coefficient set according to different usage scenarios and requirements, dw_m(t_v) denotes the waiting delay of the selected computation task m, de_m(t_v) denotes its execution delay, j indexes the other computation tasks in the queue (each of which waits at least de_m(t_v) longer), and M denotes the number of sensors (equivalently, the number of computation tasks).
In this embodiment, the design of the reward function jointly considers the execution delay and the waiting delay of the computation tasks: if a computation task g_m(t_v) in the current round's queue is selected, its waiting time dw_m(t_v) stops accumulating and will not grow further, but every other computation task in the queue will wait at least de_m(t_v) longer. A task with a longer execution time prolongs the waiting of the other M−1 tasks in the queue by more, so both influences are reflected simultaneously in each round's instant reward in order to minimize the computation task completion delay.
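Because the published form of Equation 8 is only partially recoverable from this translation, the sketch below implements one consistent reading of the reward just described: the waiting delay of the chosen task plus the extension it imposes on the other M−1 queued tasks, balanced by λ.

```python
# Hedged sketch of the instant reward (one reading of Equation 8):
# selecting task m ends its waiting delay dw[m], while each of the other
# M-1 queued tasks waits at least de[m] longer; lam balances the terms.
def instant_reward(m, de, dw, lam):
    M = len(de)
    cost = lam * dw[m] + (1.0 - lam) * (M - 1) * de[m]
    return -cost  # smaller combined delay -> larger (less negative) reward

print(instant_reward(m=2, de=[0.3, 0.5, 0.2], dw=[1.0, 2.0, 1.5], lam=0.5))
```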
Step S1205: and storing the experience generated by the interaction with the environment as a sample to an experience pool.
Wherein, in each decision round, the experience generated by the interaction between the action and the environment is stored as a sample in an experience pool.
Further, after the sample is stored in the experience pool, the method also comprises accumulating the instant rewards and the task delays.

The instant reward and the task delay of each decision round are accumulated: the instant reward of the action executed in each decision round is obtained from Equation 8, and the task delay is the total delay of the computation task processed in that round, given by Equation 6.
Step S1206: and judging whether the number of samples in the experience pool reaches a set value.
If yes, go to step S1207, otherwise go to step S1202.
Step S1207: and randomly selecting a specified number of samples from the experience pool, and training the DQN network.
Wherein the selected specified number of samples is the preset mini-batch size.
Specifically, after the experience pool reaches a certain scale, gradient-descent training is performed on the DQN network using the specified number of samples, so that the DQN network gradually converges to the optimal action selection strategy.
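The gradient-descent step of S1207 corresponds to a standard DQN mini-batch update, sketched below in PyTorch; the network architecture, the use of a separate target network, and all hyperparameters are assumptions, since the patent does not fix them.

```python
import random
import torch
import torch.nn as nn

# Sketch of step S1207: one gradient-descent update of the DQN network on
# a random mini-batch from the experience pool. Sizes are assumptions;
# for the state of Equation 7, state_dim = 2 * M and num_actions = M.
class QNet(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

    def forward(self, x):
        return self.net(x)

def train_step(q_net, target_net, pool, optimizer, batch_size=32, gamma=0.99):
    # pool holds (s, a, r, s_next) tuples of tensors from step S1205.
    batch = random.sample(pool, batch_size)
    s, a, r, s_next = (torch.stack(x) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                       # bootstrapped TD target
        target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```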
Step S1208: and judging whether the decision in the round is in a termination state.
Wherein the termination state refers to whether the value of v has reached its maximum V.
If it is in the termination state, this training round ends; otherwise v is increased by 1 and steps S1202-S1208 are executed again.
Step S130: and judging whether the updating round number meets the aggregation frequency or not in the DQN network training process.
Wherein the aggregation frequency is a predetermined fixed value.
The v rounds of decision in step S120 can be regarded as the update rounds, i.e., each round of decision performs an update of g(t_v).
When the v value reaches the preset aggregation frequency, that is, the number of update rounds satisfies the aggregation frequency, step S140 is executed. Otherwise, the step S120 is executed again until the v value reaches the preset aggregation frequency.
Step S140: and carrying out global parameter updating.
After each mobile edge node has been updated for a certain number of rounds, global aggregation begins: each mobile edge node uploads its current DQN network parameters to the parameter server, and the parameter server updates the global parameters.
The step S140 specifically includes the following sub-steps:
step S1401: the network parameters of the respective mobile edge node at the time are input.
Specifically, each mobile edge node uploads the DQN network parameters at the time deployed within each mobile edge node to the parameter server.
Step S1402: aggregating the network parameters and updating the global parameters.
The process of global aggregation is concretely expressed as:

ω_g(v) = (1/N) Σ_{n=1}^{N} ω_n(v)   (Equation 9)

wherein ω_n(v) represents the network parameters of mobile edge node n at the v-th round, ω_g(v) represents the global parameters at this time, and N denotes the number of mobile edge nodes.
Step S1403: and outputting the updated global parameters to each edge node.
The updated global parameters are then taken as the updated network parameters with which local DQN training continues.
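Equation 9 is an unweighted average of the node parameters. A minimal FedAvg-style sketch over PyTorch state dicts follows; the function names are illustrative.

```python
import torch

# Sketch of Equation 9: the parameter server averages the DQN parameters
# uploaded by the N mobile edge nodes, then each node loads the result
# and continues local training from the global parameters.
def aggregate(node_state_dicts):
    global_state = {}
    for key in node_state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in node_state_dicts])
        global_state[key] = stacked.mean(dim=0)
    return global_state

def broadcast(global_state, node_nets):
    for net in node_nets:
        net.load_state_dict(global_state)
```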
In this embodiment, by exploiting the distributed characteristics of mobile edge device deployment, federated learning is introduced to aggregate the current network parameters of each mobile edge node at a certain frequency; a global network with better performance is aggregated from the experience contained in the heterogeneous local information of each participant, which promotes the training process of every participant in the federation.
Step S150: and responding to the completion of global parameter updating, and judging whether the training round number reaches the specified number in the DQN network training process.
Specifically, steps S1201-S1208 constitute one training round; when the value of v reaches the preset aggregation frequency the global parameters are updated, and at this point it is judged whether the number of training rounds has reached the set maximum value.

If the number of training rounds has not reached the set maximum value, the count of training rounds is increased by 1 and the next training round proceeds, i.e., steps S120-S140 are executed again, with the updated global parameters used as the network parameters for continued local DQN training, until the maximum number of rounds is reached.

If the number of training rounds has reached the set maximum value, all training ends, and step S160 is executed:
step S160: and outputting the result.
The output result is the optimal action selection strategy obtained by the training up to this point, together with the corresponding computation task completion delay.

Steps S120-S150 are iterated continuously until the set maximum number of training rounds is reached, whereby the minimum computation task completion delay under federated learning across a plurality of mobile edge nodes is obtained.
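Putting steps S120-S150 together, the overall procedure can be summarized by a short driver loop; every object and method name below is illustrative structure rather than an interface defined by the patent.

```python
# Illustrative driver for the overall flow (steps S120-S150).
def federated_scheduling(nodes, server, max_rounds, agg_freq):
    for rnd in range(1, max_rounds + 1):
        for node in nodes:
            node.local_dqn_round()         # step S120: steps S1201-S1208
        if rnd % agg_freq == 0:            # step S130: aggregation frequency
            uploaded = [n.upload_parameters() for n in nodes]
            global_params = server.aggregate(uploaded)   # step S140
            for node in nodes:
                node.load_parameters(global_params)
    # step S160: each node's learned policy and its task completion delay
    return [node.best_policy() for node in nodes]
```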
Example two
As shown in fig. 2, the present application provides a federate learning-based mobile edge node computation task scheduling system, which specifically includes: initialization unit 210, training unit 220, first judgment unit 230, global update unit 240, second judgment unit 250, and output unit 260.
The initialization unit 210 is used to initialize information parameters.
The training unit 220 is connected to the initialization unit 210 for locally training the DQN network deployed at each mobile edge node.
An intelligent agent exists in the training unit 220; it can perform action selection via an ε-greedy strategy according to the state information of each round and the current network parameters, obtain the instant reward of the action, and meanwhile the computation task queue in the environment is updated to the next state. The results of interaction with the environment are stored in an experience pool, and after the capacity of the experience pool reaches a certain scale, gradient-descent training is performed on the DQN network so that it gradually converges to the optimal action selection strategy.
The first determining unit 230 is connected to the training unit 220, and is configured to determine whether the number of update rounds meets the aggregation frequency in the DQN training process.
The global updating unit 240 is connected to the first determining unit 230, and configured to perform global parameter updating if the number of updating rounds satisfies the aggregation frequency.
The global updating unit 240 acts as the parameter server when performing the global parameter update.
The second determining unit 250 is connected to the global updating unit 240, and is configured to determine whether the number of training rounds reaches a specified number of times in the DQN training process.
The output unit 260 is connected to the second determining unit 250, and is configured to output the result if the number of training rounds reaches the set specified number.
Although the present application has been described with reference to examples, which are intended to be illustrative only and not to be limiting of the application, changes, additions and/or deletions may be made to the embodiments without departing from the scope of the application.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A mobile edge node calculation task scheduling method based on federal learning is characterized by specifically comprising the following steps:
initializing information parameters;
in response to completing initialization, locally training a DQN network deployed at each mobile edge node;
judging whether the number of updating rounds meets the aggregation frequency or not in the DQN network training process;
if the number of the updating rounds meets the aggregation frequency, carrying out global parameter aggregation and updating;
in response to the completion of global parameter updating, judging whether the number of training rounds reaches the specified number in the DQN network training process;
and if the number of training rounds reaches the specified number, outputting a result.
2. The method of claim 1, wherein in initializing the information parameters, the method further comprises setting the parameters required by each part of the system, including sensor transmit power, communication bandwidth, mobile edge node computing power, wireless-channel-related parameters, experience pool capacity in the DQN network, mini-batch size, learning rate, discount factor, aggregation frequency, and the number of computation tasks performed per round.
3. The method according to claim 2, wherein the initialization of the mobile edge nodes, local and global network parameters, and experience pools in the environment is performed in response to the system required parameters being set.
4. The method of claim 2, wherein the local training of the DQN network deployed at each mobile edge node specifically comprises the sub-steps of:
initializing a sensor in a coverage range of a mobile edge node;
updating the environmental status information in response to completing initialization of the sensor;
after responding to the updated environment state information, performing action selection on the current state;
executing the selected action, and acquiring experience generated by interaction with the environment;
storing experiences generated by interaction with the environment as samples in an experience pool;
judging whether the number of samples in the experience pool reaches a set value or not;
if the set value is reached, randomly selecting a specified number of samples from the experience pool, and training the DQN network;
judging whether the decision in the round is in a termination state or not in response to the completion of the DQN network training;
if the training is in the termination state, the training round is finished.
5. The method of claim 4, wherein initializing the sensors within the coverage area of the mobile edge node comprises obtaining the computation tasks generated by the sensors, wherein a computation task g_m(t_v) is expressed as:

g_m(t_v) = <u_m, b_m, c_m>

wherein u_m represents the generation time of computation task m, b_m represents the communication data amount of computation task m, and c_m represents the computation amount of computation task m.
6. The method of claim 5, wherein after the sensor-generated computation tasks g_m(t_v) are obtained, the computation task queue managed by the mobile edge node at time t_v is modeled as a 3×M matrix g(t_v), concretely expressed as:

g(t_v) = [ u_1 … u_m … u_M ; b_1 … b_m … b_M ; c_1 … c_m … c_M ]

wherein g_1(t_v), …, g_m(t_v), …, g_M(t_v) represent the M computation tasks generated by the sensors, u_1 … u_m … u_M represent the generation times of the M computation tasks, b_1 … b_m … b_M represent the communication data amounts of the M computation tasks, and c_1 … c_m … c_M represent the computation amounts of the M computation tasks.
7. The method of claim 6, wherein the total delay of a computation task from generation to completion of processing is composed of the execution delay and the waiting delay;

wherein the execution delay de_m(t_v) is concretely expressed as:

de_m(t_v) = cm_m(t_v) + cp_m(t_v) = b_m / r_m(t_v) + c_m / f_c,   with r_m(t_v) = B · log2(1 + p_m · h_m(t_v) / σ²)

wherein cm_m(t_v) denotes the communication time required to execute task m, cp_m(t_v) denotes the computation time required to execute task m, B is the communication bandwidth, p_m is the sensor transmit power, h_m(t_v) is the channel gain obtained according to the current position relationship between the sensor and the mobile edge node and the channel state, r_m(t_v) denotes the current data transmission rate, f_c is the computing power of the mobile edge node, b_m represents the communication data amount (bits) of the computation task, c_m represents its computation amount, and σ² represents the channel noise power.
8. The method of claim 7, wherein the waiting delay refers to the residence time of the computation task in the queue from its generation to the start of its execution; for a computation task g_m(t_v) = <u_m, b_m, c_m> processed in the v-th round, its waiting delay dw_m(t_v) is concretely expressed as:

dw_m(t_v) = t_v − u_m

wherein u_m denotes the generation time of the computation task and t_v denotes the current time.
9. The method of claim 8, wherein the completion delay d_m(t_v) is concretely expressed as:

d_m(t_v) = de_m(t_v) + dw_m(t_v) = b_m / r_m(t_v) + c_m / f_c + t_v − u_m

wherein de_m(t_v) denotes the execution delay of the computation task, dw_m(t_v) denotes its waiting delay, b_m represents the communication data amount of the computation task, c_m represents its computation amount, f_c is the computing power of the mobile edge node, u_m denotes the generation time of the computation task, t_v denotes the current time, and r_m(t_v) denotes the current data transmission rate.
10. A mobile edge node calculation task scheduling system based on federal learning is characterized by specifically comprising: the device comprises an initialization unit, a training unit, a first judgment unit, a global updating unit, a second judgment unit and an output unit;
an initialization unit for initializing information parameters;
a training unit, configured to perform local training on the DQN network deployed at each mobile edge node;
the first judgment unit is used for judging whether the updating round number meets the aggregation frequency in the DQN training process;
the global updating unit is used for aggregating and updating global parameters if the number of updating rounds meets the aggregation frequency;
the second judgment unit is used for judging whether the number of training rounds reaches the specified number of times in the DQN training process;
and the output unit is used for outputting the result if the training round number reaches the set specified number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111478407.1A CN114168328B (en) | 2021-12-06 | 2021-12-06 | Mobile edge node calculation task scheduling method and system based on federal learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111478407.1A CN114168328B (en) | 2021-12-06 | 2021-12-06 | Mobile edge node calculation task scheduling method and system based on federal learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114168328A true CN114168328A (en) | 2022-03-11 |
CN114168328B CN114168328B (en) | 2024-09-10 |
Family
ID=80483585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111478407.1A Active CN114168328B (en) | 2021-12-06 | 2021-12-06 | Mobile edge node calculation task scheduling method and system based on federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114168328B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10666342B1 (en) * | 2019-05-01 | 2020-05-26 | Qualcomm Incorporated | Beam management using adaptive learning |
CN111339554A (en) * | 2020-02-17 | 2020-06-26 | 电子科技大学 | User data privacy protection method based on mobile edge calculation |
WO2021169577A1 (en) * | 2020-02-27 | 2021-09-02 | 山东大学 | Wireless service traffic prediction method based on weighted federated learning |
WO2021155671A1 (en) * | 2020-08-24 | 2021-08-12 | 平安科技(深圳)有限公司 | High-latency network environment robust federated learning training method and apparatus, computer device, and storage medium |
CN113467952A (en) * | 2021-07-15 | 2021-10-01 | 北京邮电大学 | Distributed federated learning collaborative computing method and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114938372A (en) * | 2022-05-20 | 2022-08-23 | 天津大学 | Federal learning-based micro-grid group request dynamic migration scheduling method and device |
CN114938372B (en) * | 2022-05-20 | 2023-04-18 | 天津大学 | Federal learning-based micro-grid group request dynamic migration scheduling method and device |
CN115357402A (en) * | 2022-10-20 | 2022-11-18 | 北京理工大学 | Intelligent edge optimization method and device |
CN115357402B (en) * | 2022-10-20 | 2023-01-24 | 北京理工大学 | Intelligent edge optimization method and device |
Also Published As
Publication number | Publication date |
---|---|
CN114168328B (en) | 2024-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111835827B (en) | Internet of things edge computing task unloading method and system | |
JP6942397B2 (en) | How to develop a singletasking offload strategy in a mobile edge computing scenario | |
CN109947545B (en) | Task unloading and migration decision method based on user mobility | |
CN114168328A (en) | Mobile edge node calculation task scheduling method and system based on federal learning | |
Heydari et al. | Dynamic task offloading in multi-agent mobile edge computing networks | |
CN111401744B (en) | Dynamic task unloading method in uncertainty environment in mobile edge calculation | |
CN108566242B (en) | Spatial information network resource scheduling system for remote sensing data transmission service | |
CN113946423B (en) | Multi-task edge computing, scheduling and optimizing method based on graph attention network | |
CN113867843A (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN112596910B (en) | Cloud computing resource scheduling method in multi-user MEC system | |
CN114490057A (en) | MEC unloaded task resource allocation method based on deep reinforcement learning | |
CN112799823A (en) | Online dispatching and scheduling method and system for edge computing tasks | |
CN116489708B (en) | Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method | |
CN116886703A (en) | Cloud edge end cooperative computing unloading method based on priority and reinforcement learning | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN115408072A (en) | Rapid adaptation model construction method based on deep reinforcement learning and related device | |
CN117749796A (en) | Cloud edge computing power network system calculation unloading method and system | |
CN117202264A (en) | 5G network slice oriented computing and unloading method in MEC environment | |
CN114615705B (en) | Single-user resource allocation strategy method based on 5G network | |
CN116431326A (en) | Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning | |
CN115756873A (en) | Mobile edge computing unloading method and platform based on federal reinforcement learning | |
CN109874154A (en) | A kind of C-RAN user-association and computational resource allocation method based on deeply study | |
CN114706673A (en) | Task allocation method considering task delay and server cost in mobile edge computing network | |
CN117891532B (en) | Terminal energy efficiency optimization unloading method based on attention multi-index sorting | |
Hlophe et al. | Prospect-theoretic DRL Approach for Container Provisioning in Energy-constrained Edge Platforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |