CN111367657A - Computing resource collaborative cooperation method based on deep reinforcement learning - Google Patents
Computing resource collaborative cooperation method based on deep reinforcement learning
- Publication number
- CN111367657A (application CN202010107300.5A)
- Authority
- CN
- China
- Prior art keywords
- state
- edge server
- cpu
- experience
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention relates to a computing resource collaborative cooperation method based on deep reinforcement learning, and belongs to the field of edge computing resource allocation. The method comprises the following steps: deploying edge servers in a cellular (honeycomb) layout in a dense 5G area; regarding each edge server as an agent and recording, over a period of time, samples of its computing-resource state and the corresponding action; at each time t, randomly selecting a state sample from the experience replay to obtain an experience tuple, and storing each experience tuple back into the experience replay to accumulate experience; obtaining a new experience tuple from the Q value at the same time and filling the experience replay; and iterating the Q value by training with a target network (target net) and an evaluation network (eval net) to obtain a near-optimal cooperation decision. The invention breaks the correlation between state samples, makes the samples mutually independent, and improves the utilization rate of computing resources in cooperative cooperation.
Description
Technical Field
The invention belongs to the field of edge computing resource allocation, and relates to a computing resource collaborative cooperation method based on deep reinforcement learning.
Background
Currently, the Internet of Things (IoT) extends Internet technology to connect ubiquitous Mobile Devices (MDs) and sensors over wireless networks. The IoT has been widely adopted in many fields, and the amount of data in mobile Internet applications is growing exponentially. To improve efficiency, the pursuit of low latency has become a trend. In the traditional model, however, data is uploaded from the terminal device to the cloud and the result is transmitted back to the terminal device only after computation; such traditional cloud computing cannot meet today's high requirements on computing efficiency.
5G can connect countless intelligent devices to realize data sharing and interaction. In addition, 5G extends the reach of the Internet of Things, bringing billions of MDs onto the Internet. The demand for data services has proliferated, creating new challenges for service providers and mobile network operators. Many 5G applications, such as face recognition and natural language processing, can run on the terminal. Therefore, to exploit computation offloading, the computing load offload and the related radio resource allocation need to be managed jointly, a problem that has attracted much attention from researchers. This is exactly where edge computing comes in: Edge Computing (EC) enables a mobile terminal to actively transfer a computing task to a nearby edge server, which also allows edge artificial-intelligence cooperation, and provides an effective way to meet the growing demand for large-scale cluster computing and to realize efficient computation transfer and cooperation.
Disclosure of Invention
In view of the above, the present invention provides a computing resource collaborative method based on deep reinforcement learning.
In order to achieve the purpose, the invention provides the following technical scheme:
a computing resource collaborative method based on deep reinforcement learning comprises the following steps:
step one: for seamless connection, the edge servers are arranged in a cellular (honeycomb) layout and deployed in a dense 5G network area;
step two: regard each edge server as an agent, and record the computing-resource state at a certain moment and the corresponding action as a sample to be placed into the experience replay;
step three: to increase the independence of the samples, randomly select a state sample from the experience replay at each time step t to obtain an experience tuple, and then store each experience tuple back into the experience replay to accumulate experience;
step four: iterate the Q value through the target network target net and the evaluation network eval net to obtain a new state, put the new state back into the experience replay, update the weight parameters using the loss function, and finally obtain a near-optimal solution that yields the optimal decision for edge server cooperation.
Optionally, in the first step, the energy and time spent by the edge server in receiving the collaborative computation results are ignored;
considering the system model, N mobile users offload computation tasks to the edge server over the wireless link; each user has M independent tasks to be completed;
to model the tasks, a cellular network shape is used to maximize the coverage utilization of the edge servers; by cooperatively optimizing the offloading decision of each edge task, the allocation of server computing resources, and the transmission and reception of tasks, an optimization problem is formulated that aims to minimize the energy consumption for completing the computing tasks while fully utilizing the computing resources.
Optionally, in the second step, each edge server is regarded as an agent, and the computing-resource states of the CPU, the task amount and the energy consumption at each moment are taken as a state sample, where the CPU idle profile of a partner is defined as the amount of the terminal device's data that the partner can compute within the duration t ∈ [0, T], denoted U_bit(t);
the information of cooperating edge servers with idle CPUs is recorded as follows: the cooperative edge server CPU state information refers to the state of the CPU over time and is recorded by defining a cooperative CPU event space, process and epochs, where α = {α_1, α_2} denotes the CPU state space of the cooperating edge servers, with α_1 and α_2 denoting the cooperating edge server going from busy to idle and from idle to busy, respectively; the cooperative edge processor process is then defined as the sequence of time instants of these coprocessor events:
the CPU process allows an offline design of the cooperative computing strategy; given a sample path of the partner CPU process, let I_k denote the CPU status indicator for each epoch k, where the values 1 and 0 represent the idle and busy states, respectively; the CPU idle profile of the server is as follows:
the edge collaborator has non-causal knowledge of the edge servers' CPU idle profiles; it is assumed that a q-bit buffer is reserved for storing offloaded data before it is processed by the partner's CPU;
two forms of data arrival at an edge server are considered; one-shot task arrival assumes that the L-bit input arrives at time t = 0, so the event space and process of the edge server CPU follow the definitions above; bursty data arrival, on the other hand, forms a random process; for bursty arrivals a combined event space {α_1, α_2, α_3} is used, where α_3 denotes a new task arriving at the edge server, and the corresponding process is the sequence of variables:
{α_1, α_2, α_3, …} representing the events at the corresponding time instants; moreover, for every instant let L_k denote the size of the arriving data, where L_k = 0 corresponds to an α_1 or α_2 event and L_k ≠ 0 corresponds to an α_3 event; arrivals must occur early enough that the task can be computed before the deadline, and the total input data is the sum of the arriving L_k; the state of the computing resources at each moment is then taken as a state sample;
selecting a state sample also selects an action that represents how two different adjacent edge servers collaborate, corresponding to a particular change/shift between two different adjacent states; the variable v ∈ {1, 2, …, NM + 3N} indexes the different states, the action a(t) = {a_v(t)} is a 1 × (NM + 3N) vector whose effect depends on the selected v, and the following actions are available;
when 1 ≤ v ≤ NM, the corresponding action a_v(t) changes the offloading decision of task x_nm; specifically, use is made of:
an integer-division operation together with the remainder operation mod(v, M), from which the corresponding server task v is found;
when NM + 1 ≤ v ≤ NM + 2N, the corresponding action a_v(t) arranges the collaborative computing resources for the edge server, the action being:
where the computing resources are updated as a function of C_co, C_co,max and N_do,tot, in which C_co is the number of CPU cycles needed by the CPU of an edge server to process a computing task, C_co,max is the maximum number of cycles the CPU can compute, and N_do,tot is the number of CPU cores.
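The exact index-to-task formula and the resource-update formula appear as figures in the original and are not reproduced here. Purely as an illustrative sketch of the index decoding described above, the following snippet shows one way such a flat action index v could be interpreted; the function name decode_action, the 1-based indexing, and the split of the indices above NM are assumptions made for illustration, not part of the method.

```python
def decode_action(v: int, n_users: int, n_tasks: int):
    """Map a flat action index v (1-based) onto its meaning (illustrative only)."""
    nm = n_users * n_tasks
    if 1 <= v <= nm:
        # Offloading decision for task x_nm: recover (n, m) from v using
        # integer division and the mod(v, M) remainder operation.
        n = (v - 1) // n_tasks + 1
        m = (v - 1) % n_tasks + 1
        return ("offload", n, m)
    elif nm < v <= nm + 2 * n_users:
        # Adjust the collaborative computing resources of one edge server.
        server = (v - nm - 1) % n_users + 1
        return ("resources", server, None)
    else:
        # Remaining indices up to NM + 3N; assumed here to address, e.g.,
        # task transmission/reception scheduling.
        server = (v - nm - 2 * n_users - 1) % n_users + 1
        return ("reserved", server, None)


if __name__ == "__main__":
    print(decode_action(7, n_users=3, n_tasks=4))    # -> ('offload', 2, 3)
    print(decode_action(14, n_users=3, n_tasks=4))   # -> ('resources', 2, None)
```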
Optionally, in the third step, with the CPU and the computation tasks as the state sample and the action selection defined above, the system state of any edge server is defined; in the initial stage the corresponding action is taken for the corresponding state, a state sample is selected as the state at the given time, and a specific action is taken so as to obtain the maximum accumulated reward, that is, the Q value;
Q_π(s, a) = E_π[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | A_t = a, S_t = s ]
Q(s, a) ← Q(s, a) + δ [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
wherein Q(s, a) is the action-state value function, rewards after time t are discounted, and γ controls the decay of the Q function: the closer γ is to 1, the more later decisions are taken into account, while the closer γ is to 0, the more the immediate reward is emphasized;
at this point there is an experience tuple table D_t = (e_1, …, e_t) that records the value of each pair of state and action; while the experience replay of tuples e_t = (s_t, a_t, r_t, s_{t+1}) is not yet full, a state sample is randomly selected at each time step t to obtain an experience tuple, and each experience tuple is then stored in the experience replay to accumulate experience.
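For concreteness, a minimal experience-replay sketch consistent with the description above is given below; the ReplayBuffer class name, the capacity, and the batch size are illustrative assumptions rather than values taken from the method.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # Each tuple e_t = (s_t, a_t, r_t, s_{t+1}) is appended to accumulate experience.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int = 32):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive state samples, making the training samples (nearly) independent.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```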
Optionally, in the fourth step, differently from the previous step, an experience tuple e_t = (s_t, a_t, r_t, s_{t+1}) drawn from the experience replay is carried into two identical neural networks for training, namely the target network target net and the evaluation network eval net; the output value Q_tar of the target net represents the discounted score when behavior a is selected under the current state sample s of the edge server, i.e.:
wherein r and s′ respectively represent the reward and the next observed state when action a is taken in the current state s of the edge server; γ is the attenuation (discount) factor; a′ is the action taken by the edge server in state s′, and w′ is the weight parameter of the target net;
the output value Q_eval of the eval net represents the score when action a is taken under the current state sample s of the edge server:
wherein w is a weight parameter of eval net;
an ε-greedy strategy is adopted to obtain the behavior a, so that, while decisions generated by the network cooperate with a single adjacent edge server with a certain probability, several candidate cooperating edge servers can still be explored; the experience tuples in the experience pool are continuously updated and used as the input of the target net and the eval net to obtain Q_eval and Q_tar; the difference between Q_eval and Q_tar is used as the Loss function, and the weight parameters of the evaluation network are updated by gradient descent; for training convergence, the weight parameters of the target network are updated by copying the weight parameters of the evaluation network at regular intervals, and the model is as follows:
wherein s_t and a respectively represent the current state of the edge server and the action currently taken, r represents the reward obtained by taking this action, γ is the discount factor, s_{t+1} represents the state at the next step, and w is the parameter vector used to fit the deep neural network;
then, the gradient descent algorithm is used to minimize the difference between the target network output and the prediction, i.e. the main network output:
Loss = (Q_target(s_t, a) − Q_pre(s_t, a, w))²
and finally, the two neural networks are trained with the experience tuples and the Q value is iterated continuously, so that the edge server obtains a near-optimal solution under its limited computing-resource state, which serves as the optimal strategy for the edge servers to cooperate with each other.
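The target-net/eval-net update of the fourth step can be sketched as follows. This is a hedged illustration in PyTorch assuming small fully-connected Q-networks (the embodiment itself mentions three convolutional and two fully-connected layers); the layer sizes, optimizer, and discount value are assumptions and not part of the method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Small fully-connected Q-network (assumed architecture for illustration)."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

def train_step(eval_net, target_net, optimizer, batch, gamma=0.9):
    """One gradient step on Loss = (Q_target - Q_eval)^2 for a replayed batch.

    `batch` is (s, a, r, s_next) with s, s_next float tensors of shape (B, state_dim),
    a a long tensor of shape (B,), and r a float tensor of shape (B,).
    """
    s, a, r, s_next = batch
    q_eval = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q_eval(s_t, a; w)
    with torch.no_grad():                                       # target net is not trained directly
        q_next = target_net(s_next).max(dim=1).values
    q_target = r + gamma * q_next                               # r + gamma * max_a' Q_tar(s', a'; w')
    loss = F.mse_loss(q_eval, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically copy the evaluation weights into the target network so the
# bootstrap target changes slowly and training stays stable, e.g.:
#   target_net.load_state_dict(eval_net.state_dict())
# with an optimizer such as torch.optim.Adam(eval_net.parameters(), lr=1e-3).
```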
The invention has the following beneficial effects. The acquired computing-resource state and the corresponding selected action are put into an experience pool as samples, and a sample is then drawn at random from the pool to train the two neural networks. This first step breaks the correlation between samples, making them independent of each other. Second, a fixed Q target network is used: computing the target value of the network requires an existing Q value, which is provided by a network that is updated more slowly. This improves the stability and convergence of training, and thus further improves the efficiency and reduces the cost of collaboration between edge servers.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a computing resource collaborative method based on deep reinforcement learning;
FIG. 2 is a schematic diagram of an edge server deployment;
FIG. 3 is a schematic diagram of cooperation under the optimal solution of the demand.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
The technical scheme for solving the technical problems is as follows:
referring to fig. 1 to 3, the following description is made with reference to the accompanying drawings, and the present invention includes the following steps:
Step one: the energy and time spent by the edge servers in receiving the collaborative computing results are negligible because they are typically much smaller than their offloading counterparts; extending the present analysis to include this overhead would be straightforward but tedious. Considering the system model, N mobile users offload their computing tasks to the edge server over the wireless link. Each user has M independent tasks to be completed. To model the tasks, a cellular network shape is used to maximize the coverage utilization of the edge servers. Then, by cooperatively optimizing the offloading decision of each edge task, the allocation of server computing resources, and the transmission and reception of tasks, an optimization problem is formulated that aims to minimize the energy consumption for completing the computing tasks while fully utilizing the computing resources, as shown in fig. 2.
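The embodiment states this optimization goal in words only. Purely as a hedged illustration, one generic form consistent with that description, in which x_nm denotes the offloading decision of task m of user n, E_nm the energy consumed to complete that task, f_n the computing resource allocated to edge server n, and t_nm^max the task deadline (all of these symbols are assumptions introduced here, not notation from the embodiment), would be:

```latex
\min_{\{x_{nm}\},\,\{f_n\}} \; \sum_{n=1}^{N}\sum_{m=1}^{M} E_{nm}\bigl(x_{nm}, f_n\bigr)
\quad \text{s.t.} \quad t_{nm}\bigl(x_{nm}, f_n\bigr) \le t_{nm}^{\max},\qquad
0 \le f_n \le f_n^{\max},\qquad x_{nm} \in \{0, 1\}.
```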
Step two: regard each edge server as an agent and take the state of its computing resources, such as the CPU, the task amount and the energy consumption, as a state sample. The CPU idle profile of a partner is defined as the amount of the terminal device's data (in bits) that the partner can compute within the duration t ∈ [0, T], denoted U_bit(t). The information of cooperating edge servers with idle CPUs is recorded as follows: the cooperative edge server CPU state information refers to the state of the CPU over time and is recorded by defining a cooperative CPU event space, process and epochs, where α = {α_1, α_2} denotes the CPU state space of the cooperating edge servers, with α_1 and α_2 denoting the cooperating edge server going from busy to idle and from idle to busy, respectively. The cooperative edge processor process can then be defined as the sequence of time instants of these coprocessor events:
The CPU process allows an offline design of the collaborative computing strategy. Given a sample path of the partner CPU process, let I_k denote the CPU status indicator for each epoch k, with the values 1 and 0 indicating the idle and busy states, respectively. Furthermore, f_h denotes the constant CPU frequency of the collaborator, and C denotes the number of CPU cycles required to compute 1 bit of the user's input data. Based on the above definitions, the CPU idle profile of the server is as follows:
The edge collaborator has non-causal knowledge of the edge servers' CPU idle profiles. Finally, assume that a q-bit buffer is reserved for storing offloaded data before it is processed by the partner's CPU.
Two forms of data arrival at an edge server are considered. One-shot task arrival assumes that the L-bit input arrives at time t = 0, so the event space and process of the edge server CPU follow the definitions above. Bursty data arrival, on the other hand, forms a random process. For brevity, it is useful to define a single random process that combines the data arrivals with the edge server CPU process, giving a combined random process for bursty task arrivals. For bursty data arrival, a combined event space {α_1, α_2, α_3} is used, where α_1 and α_2 have been introduced above and α_3 denotes a new task arriving at the edge server; the corresponding process is the sequence of variables:
{α_1, α_2, α_3, …} representing the events at the corresponding time instants. Moreover, for every instant let L_k denote the size of the arriving data, where L_k = 0 corresponds to an α_1 or α_2 event and L_k ≠ 0 corresponds to an α_3 event. In addition, arrivals must occur early enough that the task can be computed before the deadline, and the total input data is the sum of the arriving L_k. Then, the state of the computing resources, such as the CPU and the task amount, at each moment is taken as a state sample.
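A small simulation sketch of the combined event process described above is given below; the exponential inter-event times, the rates, and the arrival-size distribution are assumptions introduced only to produce a concrete sample path of α_1/α_2 CPU transitions and α_3 bursty arrivals with sizes L_k.

```python
import random

def sample_path(horizon: float, rate_idle=1.0, rate_busy=1.0, rate_arrival=0.5):
    """Return a list of (time, event, L_k) epochs on [0, horizon] (illustrative only)."""
    t, idle, events = 0.0, False, []
    while t < horizon:
        # Time until the next CPU transition and the next bursty arrival.
        dt_cpu = random.expovariate(rate_idle if not idle else rate_busy)
        dt_arr = random.expovariate(rate_arrival)
        if dt_cpu <= dt_arr:
            t += dt_cpu
            idle = not idle
            events.append((t, "alpha_1" if idle else "alpha_2", 0))      # L_k = 0
        else:
            t += dt_arr
            events.append((t, "alpha_3", random.randint(1, 8) * 1024))   # L_k != 0 bits
    return [e for e in events if e[0] <= horizon]

if __name__ == "__main__":
    for epoch in sample_path(horizon=5.0):
        print(epoch)
```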
Selecting a state sample also selects an action that represents how two different adjacent edge servers collaborate; in particular, it corresponds to a particular change/movement between two different adjacent states. The variable v ∈ {1, 2, …, NM + 3N} indexes the different states, and the action a(t) = {a_v(t)} is a 1 × (NM + 3N) vector whose effect depends on the selected v, for which the following actions exist.
When 1 ≤ v ≤ NM, the corresponding action a_v(t) changes the offloading decision of task x_nm. Specifically, use is made of:
An integer-division operation together with the remainder operation mod(v, M), from which the corresponding server task v can be found.
When NM + 1 ≤ v ≤ NM + 2N, the corresponding action a_v(t) arranges the collaborative computing resources for the edge server, the action being:
Where the computing resources are updated as a function of C_co, C_co,max and N_do,tot, in which C_co is the number of CPU cycles needed by the CPU of an edge server to process a computing task, C_co,max is the maximum number of cycles the CPU can compute, and N_do,tot is the number of CPU cores.
Step three: with the CPU and the computation tasks as the state sample and the action selection defined above, the system state of any edge server is defined. In the initial stage the corresponding action is taken for the corresponding state, a state sample is selected as the state at a given time, and a specific action is then taken so as to obtain the maximum accumulated reward, i.e., the Q value.
Q_π(s, a) = E_π[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | A_t = a, S_t = s ]
Q(s, a) ← Q(s, a) + δ [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
Wherein Q(s, a) is the action-state value function, rewards after time t are discounted, and γ controls the decay of the Q function: the closer γ is to 1, the more later decisions are taken into account, while the closer γ is to 0, the more the immediate reward is emphasized.
At this point, there is an experience tuple table D_t = (e_1, …, e_t) to record the value of each pair of state and action. While the experience replay of tuples e_t = (s_t, a_t, r_t, s_{t+1}) is not yet full, a state sample is randomly selected at each time step t to obtain an experience tuple, and each experience tuple is then stored in the experience replay to accumulate experience.
Step four: differently from the previous step, an experience tuple e_t = (s_t, a_t, r_t, s_{t+1}) drawn from the experience replay is carried into two identical neural networks (three convolutional layers and two fully-connected layers) for training, namely the target net and the eval net. The output value Q_tar of the target net represents the discounted score when behavior a is selected under the current state sample s of the edge server, i.e.:
Wherein r and s′ respectively represent the reward and the next observed state when action a is taken in the current state s of the edge server; γ is the attenuation (discount) factor; a′ is the action taken by the edge server in state s′, and w′ is the weight parameter of the target net.
The output value Q_eval of the eval net represents the score when action a is taken under the current state sample s of the edge server:
where w is the weight parameter of eval net.
And an ε-greedy strategy (with ε gradually reduced from 1 to 0) is adopted to obtain the behavior a, so that, while decisions generated by the network cooperate with a single adjacent edge server with a certain probability, several candidate cooperating edge servers can still be explored, which avoids settling on a suboptimal solution. Here, the experience tuples in the experience pool are continuously updated and used as the input of the target net and the eval net to obtain Q_eval and Q_tar. The difference between Q_eval and Q_tar is used as the Loss function, and the weight parameters of the evaluation network are updated by gradient descent. For training convergence, the weight parameters of the target network are updated by copying the weight parameters of the evaluation network at regular intervals; the model is as follows:
Wherein s_t and a respectively represent the current state of the edge server and the action currently being taken, r represents the reward obtained by taking this action, γ is the discount factor, s_{t+1} represents the state at the next step, and w is the parameter vector used to fit the deep neural network.
Then, the difference between the target network output and the prediction is minimized using a gradient descent algorithm:
Loss = (Q_target(s_t, a) − Q_eval(s_t, a, w))²
And finally, the two neural networks are trained with the experience tuples and the Q value is iterated continuously, so that the edge server obtains a near-optimal solution under its limited computing-resource state, which serves as the optimal strategy for the edge servers to cooperate with each other.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (5)
1. A computing resource collaborative cooperation method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step one: for seamless connection, the edge servers are arranged in a cellular (honeycomb) layout and deployed in a dense 5G network area;
step two: regarding each edge server as an agent, recording the state of the computing resource at a certain moment and the corresponding action as a sample and putting the sample into an experience replay;
step three: in order to increase the independence of the samples, randomly selecting a state sample from the experience replay at each time step t to obtain an experience tuple, and then storing each experience tuple into the experience replay to accumulate experience;
step four: iterating the Q value through the target network target net and the evaluation network eval net to obtain a new state, putting the new state back into the experience replay, updating the weight parameters using the loss function, and finally obtaining a near-optimal solution that yields the optimal decision for edge server cooperation.
2. The computing resource collaborative method based on deep reinforcement learning according to claim 1, wherein: in the first step, the energy and time spent by the edge server on receiving the collaborative computation results are ignored;
considering the system model, N mobile users offload computation tasks to the edge server over the wireless link; each user has M independent tasks to be completed;
to model the tasks, a cellular network shape is used to maximize the coverage utilization of the edge servers; by cooperatively optimizing the offloading decision of each edge task, the allocation of server computing resources, and the transmission and reception of tasks, an optimization problem is formulated that aims to minimize the energy consumption for completing the computing tasks while fully utilizing the computing resources.
3. The method according to claim 1, wherein, in the second step, each edge server is regarded as an agent and the computing-resource state of the CPU, the task amount and the energy consumption at each moment is taken as a state sample, wherein the CPU idle profile of a partner is defined as the amount of the terminal device's data that the partner can compute within the duration t ∈ [0, T], denoted U_bit(t);
the information of cooperating edge servers with idle CPUs is recorded as follows: the cooperative edge server CPU state information refers to the state of the CPU over time and is recorded by defining a cooperative CPU event space, process and epochs, where α = {α_1, α_2} denotes the CPU state space of the cooperating edge servers, with α_1 and α_2 denoting the cooperating edge server going from busy to idle and from idle to busy, respectively; the cooperative edge processor process is then defined as the sequence of time instants of these coprocessor events:
the CPU process allows an offline design of the cooperative computing strategy; given a sample path of the partner CPU process, let I_k denote the CPU status indicator for each epoch k, where the values 1 and 0 represent the idle and busy states, respectively; the CPU idle profile of the server is as follows:
the edge collaborator has non-causal knowledge of the edge servers' CPU idle profiles; it is assumed that a q-bit buffer is reserved for storing offloaded data before it is processed by the partner's CPU;
two forms of data arrival at an edge server are considered; one-shot task arrival assumes that the L-bit input arrives at time t = 0, so the event space and process of the edge server CPU follow the definitions above; bursty data arrival, on the other hand, forms a random process; for bursty arrivals a combined event space {α_1, α_2, α_3} is used, where α_3 denotes a new task arriving at the edge server, and the corresponding process is the sequence of variables:
{α_1, α_2, α_3, …} representing the events at the corresponding time instants; moreover, for every instant let L_k denote the size of the arriving data, where L_k = 0 corresponds to an α_1 or α_2 event and L_k ≠ 0 corresponds to an α_3 event; arrivals must occur early enough that the task can be computed before the deadline, and the total input data is the sum of the arriving L_k; the state of the computing resources at each moment is then taken as a state sample;
selecting a state sample also selects an action that represents how two different adjacent edge servers collaborate, corresponding to a particular change/shift between two different adjacent states; the variable v ∈ {1, 2, …, NM + 3N} indexes the different states, the action a(t) = {a_v(t)} is a 1 × (NM + 3N) vector whose effect depends on the selected v, and the following actions are available;
when 1 ≤ v ≤ NM, the corresponding action a_v(t) changes the offloading decision of task x_nm; specifically, use is made of:
an integer-division operation together with the remainder operation mod(v, M), from which the corresponding server task v is found;
when NM + 1 ≤ v ≤ NM + 2N, the corresponding action a_v(t) arranges the collaborative computing resources for the edge server, the action being:
4. The computing resource collaborative method based on deep reinforcement learning according to claim 1, wherein: in the third step, with the CPU and the computation tasks as the state sample and the action selection defined above, the system state of any edge server is defined; in the initial stage the corresponding action is taken for the corresponding state, a state sample is selected as the state at the given time, and a specific action is taken so as to obtain the maximum accumulated reward, namely the Q value;
Q_π(s, a) = E_π[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | A_t = a, S_t = s ]
Q(s, a) ← Q(s, a) + δ [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
wherein Q(s, a) is the action-state value function, rewards after time t are discounted, and γ controls the decay of the Q function: the closer γ is to 1, the more later decisions are taken into account, while the closer γ is to 0, the more the immediate reward is emphasized;
at this point there is an experience tuple table D_t = (e_1, …, e_t) that records the value of each pair of state and action; while the experience replay of tuples e_t = (s_t, a_t, r_t, s_{t+1}) is not yet full, a state sample is randomly selected at each time step t to obtain an experience tuple, and each experience tuple is then stored in the experience replay to accumulate experience.
5. The computing resource collaborative method based on deep reinforcement learning according to claim 1, wherein: in the fourth step, differently from the previous step, an experience tuple e_t = (s_t, a_t, r_t, s_{t+1}) drawn from the experience replay is carried into two identical neural networks for training, namely the target network target net and the evaluation network eval net; the output value Q_tar of the target net represents the discounted score when behavior a is selected under the current state sample s of the edge server, i.e.:
wherein r and s′ respectively represent the reward and the next observed state when action a is taken in the current state s of the edge server; γ is the attenuation (discount) factor; a′ is the action taken by the edge server in state s′, and w′ is the weight parameter of the target net;
the output value Q_eval of the eval net represents the score when action a is taken under the current state sample s of the edge server:
wherein w is a weight parameter of eval net;
an ε-greedy strategy is adopted to obtain the behavior a, so that, while decisions generated by the network cooperate with a single adjacent edge server with a certain probability, several candidate cooperating edge servers can still be explored; the experience tuples in the experience pool are continuously updated and used as the input of the target net and the eval net to obtain Q_eval and Q_tar; the difference between Q_eval and Q_tar is used as the Loss function, and the weight parameters of the evaluation network are updated by gradient descent; for training convergence, the weight parameters of the target network are updated by copying the weight parameters of the evaluation network at regular intervals, and the model is as follows:
wherein s_t and a respectively represent the current state of the edge server and the action currently being taken, r represents the reward obtained by taking this action, γ is the discount factor, s_{t+1} represents the state at the next step, and w is the parameter vector used to fit the deep neural network;
then, the gradient descent algorithm is used to minimize the difference between the target network output and the prediction, i.e. the main network output:
Loss = (Q_target(s_t, a) − Q_pre(s_t, a, w))²
and finally, the two neural networks are trained with the experience tuples and the Q value is iterated continuously, so that the edge server obtains a near-optimal solution under its limited computing-resource state, which serves as the optimal strategy for the edge servers to cooperate with each other.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010107300.5A CN111367657B (en) | 2020-02-21 | 2020-02-21 | Computing resource collaborative cooperation method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010107300.5A CN111367657B (en) | 2020-02-21 | 2020-02-21 | Computing resource collaborative cooperation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111367657A true CN111367657A (en) | 2020-07-03 |
CN111367657B CN111367657B (en) | 2022-04-19 |
Family
ID=71206211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010107300.5A Active CN111367657B (en) | 2020-02-21 | 2020-02-21 | Computing resource collaborative cooperation method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111367657B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112099510A (en) * | 2020-09-25 | 2020-12-18 | 东南大学 | Intelligent agent control method based on end edge cloud cooperation |
CN112134916A (en) * | 2020-07-21 | 2020-12-25 | 南京邮电大学 | Cloud edge collaborative computing migration method based on deep reinforcement learning |
CN112506673A (en) * | 2021-02-04 | 2021-03-16 | 国网江苏省电力有限公司信息通信分公司 | Intelligent edge calculation-oriented collaborative model training task configuration method |
CN112511336A (en) * | 2020-11-05 | 2021-03-16 | 上海大学 | Online service placement method in edge computing system |
CN112948112A (en) * | 2021-02-26 | 2021-06-11 | 杭州电子科技大学 | Edge computing workload scheduling method based on reinforcement learning |
CN113836788A (en) * | 2021-08-24 | 2021-12-24 | 浙江大学 | Acceleration method for flow industry reinforcement learning control based on local data enhancement |
CN113835878A (en) * | 2021-08-24 | 2021-12-24 | 润联软件系统(深圳)有限公司 | Resource allocation method and device, computer equipment and storage medium |
CN114140033A (en) * | 2022-01-29 | 2022-03-04 | 北京新唐思创教育科技有限公司 | Service personnel allocation method and device, electronic equipment and storage medium |
CN115496208A (en) * | 2022-11-15 | 2022-12-20 | 清华大学 | Unsupervised multi-agent reinforcement learning method with collaborative mode diversity guidance |
CN116821693A (en) * | 2023-08-29 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Model training method and device for virtual scene, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170085488A1 (en) * | 2015-09-22 | 2017-03-23 | Brocade Communications Systems, Inc. | Intelligent, load adaptive, and self optimizing master node selection in an extended bridge |
CN108920280A (en) * | 2018-07-13 | 2018-11-30 | 哈尔滨工业大学 | A kind of mobile edge calculations task discharging method under single user scene |
CN109583582A (en) * | 2017-09-28 | 2019-04-05 | 中国石油化工股份有限公司 | Neural network intensified learning method and system based on Eligibility traces |
CN109710404A (en) * | 2018-12-20 | 2019-05-03 | 上海交通大学 | Method for scheduling task in distributed system |
CN109976909A (en) * | 2019-03-18 | 2019-07-05 | 中南大学 | Low delay method for scheduling task in edge calculations network based on study |
CN110427261A (en) * | 2019-08-12 | 2019-11-08 | 电子科技大学 | A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree |
CN110619385A (en) * | 2019-08-31 | 2019-12-27 | 电子科技大学 | Structured network model compression acceleration method based on multi-stage pruning |
-
2020
- 2020-02-21 CN CN202010107300.5A patent/CN111367657B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170085488A1 (en) * | 2015-09-22 | 2017-03-23 | Brocade Communications Systems, Inc. | Intelligent, load adaptive, and self optimizing master node selection in an extended bridge |
CN109583582A (en) * | 2017-09-28 | 2019-04-05 | 中国石油化工股份有限公司 | Neural network intensified learning method and system based on Eligibility traces |
CN108920280A (en) * | 2018-07-13 | 2018-11-30 | 哈尔滨工业大学 | A kind of mobile edge calculations task discharging method under single user scene |
CN109710404A (en) * | 2018-12-20 | 2019-05-03 | 上海交通大学 | Method for scheduling task in distributed system |
CN109976909A (en) * | 2019-03-18 | 2019-07-05 | 中南大学 | Low delay method for scheduling task in edge calculations network based on study |
CN110427261A (en) * | 2019-08-12 | 2019-11-08 | 电子科技大学 | A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree |
CN110619385A (en) * | 2019-08-31 | 2019-12-27 | 电子科技大学 | Structured network model compression acceleration method based on multi-stage pruning |
Non-Patent Citations (2)
Title |
---|
EJAZ AHMED: "Bringing Computation Closer toward the User Network: Is Edge Computing the Solution?", IEEE Communications Magazine *
戴亚盛 (DAI Yasheng): "边缘计算可信协同服务机制研究" [Research on Trusted Collaborative Service Mechanism for Edge Computing], 中国优秀硕士学位论文全文数据库 信息科技辑 [China Master's Theses Full-text Database, Information Science and Technology Series] *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112134916A (en) * | 2020-07-21 | 2020-12-25 | 南京邮电大学 | Cloud edge collaborative computing migration method based on deep reinforcement learning |
CN112099510A (en) * | 2020-09-25 | 2020-12-18 | 东南大学 | Intelligent agent control method based on end edge cloud cooperation |
CN112511336A (en) * | 2020-11-05 | 2021-03-16 | 上海大学 | Online service placement method in edge computing system |
CN112506673A (en) * | 2021-02-04 | 2021-03-16 | 国网江苏省电力有限公司信息通信分公司 | Intelligent edge calculation-oriented collaborative model training task configuration method |
CN112948112A (en) * | 2021-02-26 | 2021-06-11 | 杭州电子科技大学 | Edge computing workload scheduling method based on reinforcement learning |
CN113835878A (en) * | 2021-08-24 | 2021-12-24 | 润联软件系统(深圳)有限公司 | Resource allocation method and device, computer equipment and storage medium |
CN113836788A (en) * | 2021-08-24 | 2021-12-24 | 浙江大学 | Acceleration method for flow industry reinforcement learning control based on local data enhancement |
CN113836788B (en) * | 2021-08-24 | 2023-10-27 | 浙江大学 | Acceleration method for flow industrial reinforcement learning control based on local data enhancement |
CN114140033A (en) * | 2022-01-29 | 2022-03-04 | 北京新唐思创教育科技有限公司 | Service personnel allocation method and device, electronic equipment and storage medium |
CN114140033B (en) * | 2022-01-29 | 2022-04-12 | 北京新唐思创教育科技有限公司 | Service personnel allocation method and device, electronic equipment and storage medium |
CN115496208A (en) * | 2022-11-15 | 2022-12-20 | 清华大学 | Unsupervised multi-agent reinforcement learning method with collaborative mode diversity guidance |
CN116821693A (en) * | 2023-08-29 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Model training method and device for virtual scene, electronic equipment and storage medium |
CN116821693B (en) * | 2023-08-29 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Model training method and device for virtual scene, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111367657B (en) | 2022-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111367657B (en) | Computing resource collaborative cooperation method based on deep reinforcement learning | |
CN113254197B (en) | Network resource scheduling method and system based on deep reinforcement learning | |
CN113950066B (en) | Single server part calculation unloading method, system and equipment under mobile edge environment | |
CN109753751B (en) | MEC random task migration method based on machine learning | |
CN110809306B (en) | Terminal access selection method based on deep reinforcement learning | |
CN113543176B (en) | Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance | |
CN110113190A (en) | Time delay optimization method is unloaded in a kind of mobile edge calculations scene | |
CN111858009A (en) | Task scheduling method of mobile edge computing system based on migration and reinforcement learning | |
CN109947545A (en) | A kind of decision-making technique of task unloading and migration based on user mobility | |
CN113098714A (en) | Low-delay network slicing method based on deep reinforcement learning | |
CN113469325A (en) | Layered federated learning method, computer equipment and storage medium for edge aggregation interval adaptive control | |
CN113286317B (en) | Task scheduling method based on wireless energy supply edge network | |
CN114390057B (en) | Multi-interface self-adaptive data unloading method based on reinforcement learning under MEC environment | |
CN114546608B (en) | Task scheduling method based on edge calculation | |
CN115277689A (en) | Yun Bianwang network communication optimization method and system based on distributed federal learning | |
CN114205353B (en) | Calculation unloading method based on hybrid action space reinforcement learning algorithm | |
Yang et al. | Deep reinforcement learning based wireless network optimization: A comparative study | |
CN116489712B (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN114938372B (en) | Federal learning-based micro-grid group request dynamic migration scheduling method and device | |
Tao et al. | Drl-driven digital twin function virtualization for adaptive service response in 6g networks | |
CN116321307A (en) | Bidirectional cache placement method based on deep reinforcement learning in non-cellular network | |
Henna et al. | Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
CN114154685A (en) | Electric energy data scheduling method in smart power grid | |
CN117749796A (en) | Cloud edge computing power network system calculation unloading method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |