CN111813538B - Edge computing resource allocation method - Google Patents
Edge computing resource allocation method
- Publication number: CN111813538B
- Application number: CN202010460707.6A
- Authority
- CN
- China
- Prior art keywords
- edge computing
- network
- task
- neural network
- resource allocation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method. To allocate edge computing resources efficiently, the mobile edge computing environment can be modeled as a Markov decision process; the resulting model is highly random and very complex. The edge computing resource allocation method comprises: 1) defining the states, actions, and rewards of the edge computing model; 2) analyzing the states, actions, and rewards defined in 1) to define the structure of the neural network and its input/output structure; 3) updating, training, and applying the neural network defined in 2) according to a given training method. Deep reinforcement learning, a machine learning technique with strong ability to perceive the environment and make decisions, is well suited to this model and to solving the maximum-return problem.
Description
Technical Field
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method.
Background
With the rise of fifth-generation mobile communication technology (5G), mobile applications and services place higher demands on latency, and the conventional cloud computing mode can no longer fully satisfy the low-latency requirement; mobile edge computing (MEC) technology has therefore emerged.
In the traditional cloud computing mode, a user uploads a computation-intensive task through the core network to a cloud server for processing. Although the cloud server's computing resources are sufficient to complete the computation quickly, factors such as limited core-network bandwidth and network jitter make the transmission delay large. To reduce transmission delay, the mobile edge computing mode places computing resources at the network edge, for example at the wireless access point of a base station; a user only needs to offload tasks to an edge computing server for processing, which avoids the large transmission delay incurred when data traverses the core network. This also saves core-network bandwidth, lets operators provide personalized service strategies for different locations, and further protects privacy. However, because computing resources at the network edge are very limited relative to a cloud server, effectively allocating and utilizing edge computing resources has become one of the keys of mobile edge computing technology.
To allocate edge computing resources efficiently, the mobile edge computing environment can be modeled as a Markov decision process; the resulting model is highly random and very complex.
Disclosure of Invention
1. Technical problem to be solved
Given that the mobile edge computing environment can be modeled as a Markov decision process, and that this model is highly random and very complex, the application provides an edge computing resource allocation method oriented to 5G communication requirements.
2. Technical proposal
To achieve the above object, the present application provides a method for allocating edge computing resources, the method including:
1): defining edge computing model states, actions, and rewards;
2): defining the structure of a neural network and the structure of input and output;
3): the neural network is updated, trained and applied according to a given training method.
Another embodiment provided herein is: the edge computing model state, action, and reward in 1) are defined as follows:
in the k-th frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), η^(k)], where x^(k) is defined as an observation of the environment at the k-th frame (not a state), and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the buffer queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the buffer queues;
q^(k): a vector formed by the lengths of the buffer queues;
η^(k): a vector formed by the signal-to-noise ratios of the channels.
Another embodiment provided herein is: the state observed by the agent comprises the observations of the previous W frames together with the current frame.
Another embodiment provided herein is: the policy adopted by the agent is denoted π, so that a_k = π(s_k). In the edge computing model, the edge computing server has C CPU cores and serves N users and terminals; using the partition (stars-and-bars) method from combinatorics, there are (C−1 choose N−1) different allocation schemes in total, so action a_k has (C−1 choose N−1) possible choices.
Another embodiment provided herein is: rewards and penalties are defined with respect to a maximum tolerable delay d_r = αT_f and a maximum tolerable error probability ε_max.
Another embodiment provided herein is: the rewards and penalties cover three cases:
1) task latency < d_r and error probability < ε_max: the task is completed successfully, and a reward R = +1 is obtained;
2) task latency < d_r and error probability > ε_max: the task is processed, but its error probability is too high; the reward is R = −1, i.e., a penalty;
3) task latency > d_r: the task waits too long in the buffer queue and is not processed; the reward is R = −1.5, i.e., a penalty.
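The three cases above can be sketched as a small reward function (a sketch only; the function name and argument order are illustrative, while the values R ∈ {+1, −1, −1.5} come from the text):

```python
def reward(latency, error_prob, d_r, eps_max):
    """Reward rule from the three cases above."""
    if latency > d_r:
        return -1.5      # task waited too long in the buffer queue, never processed
    if error_prob > eps_max:
        return -1.0      # processed, but decoding error probability too high
    return 1.0           # completed successfully within both tolerances
```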
Another embodiment provided herein is: the neural network in 2) is a multi-layer fully connected neural network in which each output node corresponds to an output scalar; every layer except the output layer uses the ReLU activation function, and the output layer uses no activation function. With the agent's neural network parameters denoted θ_k, the action-state value function is written Q(s_k, a_k; θ_k), and the agent's policy is π(s_k; θ_k); for brevity, Q_k(s_k, a_k) denotes Q(s_k, a_k; θ_k) when no ambiguity arises. The Q function is estimated by a neural network, which is called the Q-network.
Another embodiment provided herein is: the training method in 3) includes experience replay, in which samples of the agent-environment interaction are stored in a memory bank and, during training, a batch of samples is randomly drawn from it to update and iterate the network. The experience replay method and the fixed Q-target network method are techniques used in Algorithm 1.
Another embodiment provided herein is: the training method uses the fixed Q-target technique when updating, and the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively; during iteration, Q_target changes far less frequently than Q_policy.
3. Advantageous effects
Compared with the prior art, the edge computing resource allocation method provided by the application has the beneficial effects that:
the edge computing resource allocation method provided by the application is very suitable for being applied to the model due to the strong perceptibility and decision capability of the edge computing resource allocation method to the environment in the machine learning so as to solve the problem of maximum return.
The edge computing resource allocation method based on deep reinforcement learning is provided for the scenario of a single edge computing node serving multiple users and multiple terminals with finite-blocklength network coding.
According to the edge computing resource allocation method, the computing resources in the edge computing server, particularly the CPU cores, are allocated reasonably, so that the processing success rate of tasks offloaded to the edge computing server by users is improved.
By designing the states, actions, rewards, neural network structure, input/output structure, and training and application methods, the edge computing resource allocation method can effectively perceive and make decisions about the edge computing environment.
According to the edge computing resource allocation method, the state is decoupled into the states of the individual queue buffers, which are fed into a specially designed neural network; the network outputs the action-state value of every possible allocation scheme, and the scheme with the highest action-state value is selected to allocate the CPU cores, improving the probability of task success. The neural network architecture shown in fig. 1, including its input arrangement, is specifically designed for this purpose: the input layer is divided into several independent neural network blocks, one designed for the state of each buffer.
Drawings
FIG. 1 is a schematic diagram of the DQN network architecture of the present application;
fig. 2 is a schematic diagram of simulation results of the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and according to these detailed descriptions, those skilled in the art can clearly understand the present application and can practice the present application. Features from various embodiments may be combined to obtain new implementations or to replace certain features from certain embodiments to obtain other preferred implementations without departing from the principles of the present application.
Referring to figs. 1-2, the present application provides a method for allocating edge computing resources, the method comprising:
1): defining edge computing model states, actions, and rewards;
2): analyzing and utilizing the state, the action and the rewards defined in the 1) to define the structure of the neural network and the structure of input and output;
3): updating, training and applying the neural network defined in the step 2) according to a given training method.
Further, the edge computing model state, action, and reward in 1) are defined as follows:
in the k-th frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), η^(k)], where x^(k) is defined as an observation of the environment at the k-th frame (not a state), and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the buffer queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the buffer queues;
q^(k): a vector formed by the lengths of the buffer queues;
η^(k): a vector formed by the signal-to-noise ratios of the channels.
Further, the state observed by the agent comprises the observations of the previous W frames and the current frame, i.e., s_k = [x^(k−W), x^(k−W+1), ..., x^(k)]. Because of channel correlation, each decision is related not only to the currently observed environment but also to the observations of several preceding frames.
further, if the policy adopted by the agent is recorded as pi, then a k =π(s k ) The method comprises the steps of carrying out a first treatment on the surface of the In the edge computing model, the edge computing server has C CPU cores, N users and terminals, and the total of the N users and terminals is obtained by using a partition method in permutation and combinationDifferent allocation schemes, thus action a k There may be->And (5) seed selection. Each action a k Is an N-dimensional vector, each action in the vector being the number of CPU cores allocated to the corresponding task.
Further, rewards and penalties are defined with respect to a maximum tolerable delay d_r = αT_f and a maximum tolerable error probability ε_max.
Further, the rewards and penalties cover three cases:
1) task latency < d_r and error probability < ε_max: the task is completed successfully, and a reward R = +1 is obtained;
2) task latency < d_r and error probability > ε_max: the task is processed, but its error probability is too high; the reward is R = −1, i.e., a penalty;
3) task latency > d_r: the task waits too long in the buffer queue and is not processed; the reward is R = −1.5, i.e., a penalty.
Further, the neural network in 2) is a multi-layer fully connected neural network in which each output node corresponds to an output scalar; every layer except the output layer uses the ReLU activation function, and the output layer uses no activation function.
Further, with the agent's neural network parameters denoted θ_k, the action-state value function is written Q(s_k, a_k; θ_k), and the agent's policy is π(s_k; θ_k); for brevity, Q_k(s_k, a_k) denotes Q(s_k, a_k; θ_k) when no ambiguity arises. The Q function is estimated by a neural network, which is called the Q-network.
Further, the updating method in 3) includes experience replay: samples of the agent-environment interaction are stored in a memory bank and, during training, a batch of samples is randomly drawn from it to update and iterate the network.
Further, the updating method adopts the fixed Q-target technique, and the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively; during iteration, Q_target changes far less frequently than Q_policy.
2) The structure of the neural network and its input/output structure are as follows.
In this algorithm, we use the deep neural network structure shown in figure 1. In the figure, the input is on the left and the output on the right; the two layers of cubes on the left represent multi-layer fully connected neural networks (DNNs), and the small cubes on the far right represent output nodes, each corresponding to an output scalar. The ReLU activation function is used in every layer except the output layer, which uses no activation function. After normalization, the state vectors are regrouped into the input structure shown in the figure, i.e., each group of inputs is the state corresponding to Buffer n; the network first reads the state of each buffer, the extracted features are then aggregated and further analyzed by the next part of the network, and finally |A| values are output, representing the action-state values of the |A| different actions.
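The per-buffer input blocks and the aggregation stage can be sketched with plain NumPy. The sizes N, S, and |A| and the layer widths below are illustrative, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def make_dnn(sizes):
    # Random parameters for a fully connected stack with the given layer sizes.
    return [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, linear_output=True):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1 or not linear_output:
            x = relu(x)          # ReLU everywhere except (optionally) the last layer
    return x

# Illustrative sizes: N buffers, per-buffer state of dimension S, |A| actions.
N, S, A = 3, 8, 10
subnets = [make_dnn([S, 16, 16]) for _ in range(N)]  # one independent block per buffer
head = make_dnn([N * 16, 32, A])                     # aggregation and output stage

def q_values(buffer_states):
    # buffer_states: (N, S) array, one row per buffer state.
    feats = np.concatenate([forward(p, s, linear_output=False)
                            for p, s in zip(subnets, buffer_states)])
    return forward(head, feats)  # |A| scalars; no activation at the output layer

q = q_values(rng.normal(size=(N, S)))
```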
3) The training method and the application process are as follows:
in the course of the neural network update we introduced a method that we introduced a fixed Q-target and empirical replay.
With the fixed Q-target approach, the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target. This separates the network being updated from the network used to compute the target value y_i: y_i is computed with the Q-target network, while the network actually updated is the Q-policy network. During iteration, Q_target changes far less frequently than Q_policy; specifically, every M iterations we set θ_target ← θ_policy to update Q_target once. The loss function may be defined as
L(θ_policy) = E[(y_i − Q(s_i, a_i; θ_policy))²],
where y_i = R_i + γ max_{a'} Q(s_{i+1}, a'; θ_target).
Using the experience replay method, samples of the agent-environment interaction are stored in a memory bank, and during training a batch of samples is randomly drawn from it to update and iterate the network. Denote e_k = (s_k, a_k, R_k, s_{k+1}) as one transition; the memory bank D stores these transitions, i.e., D = {e_1, e_2, ..., e_k, ...}.
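The memory bank D of transitions can be sketched as follows (class and method names are illustrative):

```python
import random
from collections import deque

class ReplayMemory:
    """Memory bank D of transitions e_k = (s_k, a_k, R_k, s_{k+1})."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions evicted when full

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # A random batch drawn uniformly for the update step.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```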
The specific update procedure is shown as Algorithm 1 (the edge computing resource allocation algorithm based on deep reinforcement learning), in which the widely used ε-greedy exploration strategy is also employed.
After the neural network training is finished, in application a decision is made with the neural network at each frame of the edge computing environment, with input and output as in 2); the action selected during application is a_k = argmax_a Q(s_k, a; θ).
Examples
In this section, we consider a specific edge calculation model as follows:
the edge computing model is a scenario of a mobile edge computing node (MEC node) with C CPU cores, N buffers, and N users, N terminals. In this scenario, the user sends tasks to the edge computing node, which allocates CPU cores for it and processes the tasks, and then sends the processing results to the terminal. Each frame of the model is divided into T f Each symbol time t s One frame can be divided into three phases again. Taking the kth frame as an example, in the first stage, user U n (n.epsilon {1,2,.,. N }) send MEC nodes a size ofIs occupied in time>The length of each symbol, after the task is sent to the MEC node, the node stores the task in a Buffer queue Buffer n; in the second stage, the MEC node distributes the computing resources of the C CPU cores to users, so that the users of the computing resources can process the first task positioned at the head of the queue in the corresponding buffer, and the processing time length occupies +.>A number of symbols; in the third stage, after the task has been processed, the MEC node sends the calculation result to the terminal T corresponding to the user n Occupy->And a symbol.
In the downlink, the channel gain is treated per frame and is modeled as h_DL,n^(k) = ρ_DL,n h_DL,n^(k−1) + sqrt(1 − ρ_DL,n²) e, where ρ_DL,n is the channel correlation coefficient and e is a complex Gaussian random variable with zero mean and unit variance. Thus, the downlink signal-to-noise ratio (SNR) can be expressed as η_DL,n^(k) = |h_DL,n^(k)|² η̄_DL,n, where η̄_DL,n is the average SNR.
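One step of this first-order correlated-fading model with correlation coefficient ρ_DL,n can be sketched as follows (function names are illustrative):

```python
import math
import random

def next_channel_gain(h_prev, rho, rng=random):
    """One step of the correlated fading model: h_k = ρ h_{k-1} + sqrt(1-ρ²) e,
    where e is complex Gaussian with zero mean and unit variance."""
    e = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    return rho * h_prev + math.sqrt(1.0 - rho ** 2) * e

def snr(h, avg_snr):
    """Instantaneous downlink SNR: η = |h|² η̄."""
    return abs(h) ** 2 * avg_snr
```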
Because T_f and m_DL,n are of limited length, we consider the decoding error probability problem in finite-blocklength coding. Assuming the error probability is ε_DL,n^(k), the following formula can be obtained:
D_DL,n^(k) = m_DL,n^(k) C(η_DL,n^(k)) − Q^{−1}(ε_DL,n^(k)) sqrt(m_DL,n^(k) V(η_DL,n^(k))),
where C(η) = log₂(1 + η) is the channel capacity and V(η) = (1 − (1 + η)^{−2})(log₂ e)² is the channel dispersion. Therefore, the decoding error probability is deduced to be
ε_DL,n^(k) = Q( (m_DL,n^(k) C(η_DL,n^(k)) − D_DL,n^(k)) / sqrt(m_DL,n^(k) V(η_DL,n^(k))) ).
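Assuming the standard normal-approximation form for the finite-blocklength decoding error probability, with capacity C(η) = log₂(1+η) and dispersion V(η) = (1 − (1+η)^{−2})(log₂ e)², the computation can be sketched as:

```python
import math

def gaussian_q(x):
    """Gaussian Q-function: Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def decoding_error_prob(D, m, eta):
    """Normal approximation of the finite-blocklength decoding error probability
    for D bits sent over m symbols at SNR eta (a sketch under the assumption above)."""
    C = math.log2(1.0 + eta)                                   # channel capacity
    V = (1.0 - (1.0 + eta) ** -2) * math.log2(math.e) ** 2     # channel dispersion
    return gaussian_q((m * C - D) / math.sqrt(m * V))
```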
In mobile edge computing, the computation result is typically smaller than the uploaded data, so we assume in this model that for a given task D_DL,n = βD_UL,n, where β is a positive number less than 1.
In the second stage, at the k-th frame the mobile edge computing node allocates c_n^(k) CPU cores to user U_n, where c_n^(k) is an integer that must satisfy Σ_n c_n^(k) ≤ C. The processing time can then be obtained as
m_C,n^(k) = ceil( L · D_UL,n^(k) / (c_n^(k) f_0 t_s) ),
where L represents how many CPU cycles are needed per bit of task data, f_0 represents the frequency of each CPU core, and ceil(x) represents the smallest integer not less than x. Therefore, to ensure the synchronization of frames, with T_f given, the time available for the third stage can be calculated as
m_DL,n^(k) = T_f − m_UL,n^(k) − m_C,n^(k).
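The symbol-count bookkeeping of the second and third stages can be sketched as follows (parameter names are illustrative):

```python
import math

def processing_symbols(D_ul, c_n, L, f0, t_s):
    """Symbols m_C,n occupied by processing: ceil(L * D_ul / (c_n * f0 * t_s)),
    where L is CPU cycles per bit, f0 the per-core frequency, t_s the symbol time."""
    return math.ceil(L * D_ul / (c_n * f0 * t_s))

def downlink_symbols(T_f, m_ul, m_c):
    """Symbols left for the third (downlink) stage under frame synchronization."""
    return T_f - m_ul - m_c
```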
In deep reinforcement learning, the edge computing model is the environment with which the agent interacts; it can be modeled as a Markov decision process, and a reinforcement learning algorithm solves the problem of maximizing the success rate. In this algorithm, the agent changes the environment by deciding the allocation policy of the computing resources, i.e., the CPU cores, and the environment determines rewards or penalties by task success or failure.
In the k-th frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), η^(k)], where x^(k) is defined as an observation (not a state) of the environment at the k-th frame, and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the buffer queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the buffer queues;
q^(k): a vector formed by the lengths of the buffer queues;
η^(k): a vector formed by the signal-to-noise ratios of the channels.
However, considering channel correlation, each decision is related not only to the currently observed environment but also to the observations of several preceding frames, so we define the current state to include the previous W frames and the current frame, i.e., s_k = [x^(k−W), x^(k−W+1), ..., x^(k)]. We denote the agent's policy by π, i.e., a_k = π(s_k). In the edge computing model, the edge computing server has C CPU cores and serves N users and terminals; using the partition (stars-and-bars) method from combinatorics, there are (C−1 choose N−1) different allocation schemes in total, so action a_k has (C−1 choose N−1) possible choices. Each action a_k is an N-dimensional vector whose elements are the numbers of CPU cores allocated to the corresponding tasks.
For rewards and penalties, we set a maximum tolerable delay d_r = αT_f and a maximum tolerable error probability ε_max. We consider three cases:
1) Task latency < d_r and error probability < ε_max: the task is completed successfully, and a reward R = +1 is obtained.
2) Task latency < d_r and error probability > ε_max: the task is processed, but its error probability is too high; the reward is R = −1, i.e., a penalty.
3) Task latency > d_r: the task waits too long in the buffer queue and is not processed; the reward is R = −1.5, i.e., a penalty.
In calculating rewards, we use the γ-discounted return; that is, the return at the k-th frame is
G_k = R_k + γR_{k+1} + γ²R_{k+2} + ... = R_k + γG_{k+1}.   (6)
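The recursion G_k = R_k + γG_{k+1} can be evaluated backwards over a finite reward sequence:

```python
def discounted_return(rewards, gamma):
    """G_k = R_k + γ R_{k+1} + γ² R_{k+2} + ..., computed via the recursion
    G_k = R_k + γ G_{k+1} by sweeping the reward sequence from the end."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G
```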
In this algorithm, we use the deep neural network structure shown in figure 1. In the figure, the input is on the left and the output on the right; the two layers of cubes on the left represent multi-layer fully connected neural networks (DNNs), and the small cubes on the far right represent output nodes, each corresponding to an output scalar. The ReLU activation function is used in every layer except the output layer, which uses no activation function. After normalization, the state vectors are regrouped into the input structure shown in the figure, i.e., each group of inputs is the state corresponding to Buffer n; the network first reads the state of each buffer, the extracted features are then aggregated and further analyzed by the next part of the network, and finally |A| values are output, representing the action-state values of the |A| different actions.
During the neural network update, two techniques are introduced: the fixed Q-target and experience replay. With the fixed Q-target approach, the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target. This separates the network being updated from the network used to compute the target value y_i: y_i is computed with the Q-target network, while the network actually updated is the Q-policy network. During iteration, Q_target changes far less frequently than Q_policy; specifically, every M iterations we set θ_target ← θ_policy to update Q_target once. The loss function may be defined as
L(θ_policy) = E[(y_i − Q(s_i, a_i; θ_policy))²],
where y_i = R_i + γ max_{a'} Q(s_{i+1}, a'; θ_target).
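The target values y_i for this loss can be sketched as follows; the `q_target` callable is a hypothetical stand-in for the Q-target network, returning one value per action:

```python
def td_targets(batch, gamma, q_target):
    """Compute y_i = R_i + γ max_{a'} Q_target(s_{i+1}, a') for each transition
    (s, a, R, s_next) in the sampled batch."""
    return [r + gamma * max(q_target(s_next)) for (s, a, r, s_next) in batch]
```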
Using the experience replay method, samples of the agent-environment interaction are stored in a memory bank, and during training a batch of samples is randomly drawn from it to update and iterate the network. Denote e_k = (s_k, a_k, R_k, s_{k+1}) as one transition; the memory bank D stores these transitions, i.e., D = {e_1, e_2, ..., e_k, ...}.
The specific updating algorithm is shown as Algorithm 1, in which the widely used ε-greedy exploration strategy is also employed.
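The ε-greedy exploration strategy mentioned here can be sketched as:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """ε-greedy exploration: with probability ε pick a uniformly random action,
    otherwise pick the action with the highest Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```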
In fig. 2, the abscissa is the training episode number and the ordinate is the task success rate. The two horizontal lines represent the task success rates under even allocation and random allocation, respectively. The polyline shows the success rate of the present invention during training. It can be seen that after 40-60 training rounds, the task success rate of the method of the present invention exceeds the baseline schemes, at which point the neural network can stop training and be applied.
Although the present application has been described with reference to particular embodiments, those skilled in the art will appreciate that many modifications are possible in the principles and scope of the disclosure. The scope of the application is to be determined by the appended claims, and it is intended that the claims cover all modifications that are within the literal meaning or range of equivalents of the technical features of the claims.
Claims (4)
1. An edge computing resource allocation method, characterized in that: the method comprises the following steps:
1): defining edge computing model states, actions, and rewards;
2): analyzing and utilizing the state, the action and the rewards defined in the 1) to define the structure of the neural network and the structure of input and output;
3): updating and training the neural network defined in the step 2) according to a given training method and applying the neural network; the policy adopted by the agent is recorded as pi, then a k =π(s k ) The method comprises the steps of carrying out a first treatment on the surface of the In the edge computing model, the edge computing server has C CPU cores, N users and terminals, and the total of the N users and terminals is obtained by using a partition method in permutation and combinationDifferent allocation schemes, thus action a k There may be->Seed selection; the edge computation model state, action, and prize definition process of 1) is as follows:
in the kth frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), η^(k)], and x^(k) is defined as an observation of the environment at the kth frame, not a state, where each element has the following meaning:
d^(k): a vector formed by the data sizes of the tasks at the heads of the buffer queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the buffer queues;
q^(k): a vector formed by the lengths of the buffer queues;
η^(k): a vector formed by the signal-to-noise ratios of the channels;
the rewards and penalties cover three cases:
1) task latency < d_r and error probability < ε_max: the task is completed successfully, and the reward R = +1 is obtained;
2) task latency < d_r and error probability > ε_max: the task is processed, but the error probability is too high, so the reward R = -1 is obtained as a penalty;
3) task latency > d_r: the task waits too long in the buffer queue and is not processed, so the reward R = -1.5 is obtained as a penalty; the neural network parameters of the agent are θ_k, and the action-state value function is denoted Q(s_k, a_k; θ_k); the strategy of the agent is then π(s_k; θ_k); for simplicity of description, Q_k(s_k, a_k) is used to denote Q(s_k, a_k; θ_k) when no ambiguity arises; the Q function is estimated by a neural network, called the Q-network; the updating method in step 3) includes experience replay, in which samples of the interaction between the agent and the environment are stored in a memory bank, and during training a batch of samples is randomly drawn from it to update and iterate the network; the updating method also adopts a fixed Q-network technique, so the training process needs two Q-networks: a Q-policy network Q_policy and a Q-target network Q_target, with parameters θ_policy and θ_target respectively; during iteration, Q_target changes far less frequently than Q_policy.
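The training procedure of step 3), experience replay plus a fixed, slowly-synced Q-target network, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a tiny linear Q-function and a toy environment stand in for the multi-layer Q-network and the edge-computing model, and the hyper-parameters (GAMMA, SYNC_EVERY, BATCH, N_ACTIONS, STATE_DIM, learning rate) are invented for the sketch. Only the three-case reward follows the claim.

```python
import random
import numpy as np

GAMMA = 0.9        # discount factor (assumed)
SYNC_EVERY = 50    # Q_target is synced far less often than Q_policy is updated
BATCH = 32         # replay mini-batch size (assumed)
N_ACTIONS = 4      # stand-in for the (C+N-1 choose N-1) allocation schemes
STATE_DIM = 8      # stand-in for the observation dimension

rng = np.random.default_rng(0)

def reward(latency, err_prob, d_r=1.0, eps_max=0.1):
    """Three-case reward of claim 1: +1 success, -1 error too high, -1.5 timeout."""
    if latency > d_r:
        return -1.5
    return 1.0 if err_prob < eps_max else -1.0

# Linear Q-function Q(s, .) = theta @ s stands in for the multi-layer Q-network.
theta_policy = rng.normal(0, 0.1, (N_ACTIONS, STATE_DIM))
theta_target = theta_policy.copy()

replay = []  # memory bank of (s, a, r, s') interaction samples

def step_env(s, a):
    """Toy environment transition (an assumption, not the edge-computing model)."""
    s2 = np.clip(s + rng.normal(0, 0.1, STATE_DIM), -1, 1)
    r = reward(latency=abs(s2[0]), err_prob=abs(s2[1]), d_r=0.9, eps_max=0.5)
    return r, s2

s = rng.uniform(-1, 1, STATE_DIM)
lr = 0.01
for t in range(1, 2001):
    # epsilon-greedy action from the policy network
    a = rng.integers(N_ACTIONS) if rng.random() < 0.1 else int(np.argmax(theta_policy @ s))
    r, s2 = step_env(s, a)
    replay.append((s, a, r, s2))
    s = s2
    if len(replay) >= BATCH:
        # experience replay: random mini-batch from the memory bank
        for (si, ai, ri, si2) in random.sample(replay, BATCH):
            target = ri + GAMMA * np.max(theta_target @ si2)  # fixed Q-target
            td_err = target - theta_policy[ai] @ si
            theta_policy[ai] += lr * td_err * si              # gradient step on Q_policy
    if t % SYNC_EVERY == 0:
        theta_target = theta_policy.copy()  # Q_target changes far less often
```

Freezing `theta_target` between syncs is what makes the regression target stationary within each window, which is the point of the fixed Q-network technique in the claim.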
2. The edge computing resource allocation method of claim 1, wherein: the environment observed by the agent comprises the observations of the previous W frames together with the current frame.
3. The edge computing resource allocation method of claim 1, wherein: the rewards and penalties are set with the maximum tolerable delay d_r = αT_f and the maximum tolerable error probability ε_max.
4. The edge computing resource allocation method of claim 1, wherein: the neural network in step 2) is a multi-layer fully-connected neural network in which each output node corresponds to an output scalar; the ReLU function is chosen as the activation function for all layers except the output layer, and the output layer uses no activation function.
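The architecture of claim 4, a multi-layer fully-connected network with ReLU on every hidden layer and a linear output layer whose nodes each emit one scalar (e.g. one Q-value per action), can be sketched in NumPy; the layer sizes and He-style initialization here are illustrative assumptions.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Fully-connected layers; e.g. sizes = [state_dim, hidden..., n_actions]."""
    return [(rng.normal(0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU on every layer except the last; the output layer has no activation."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:   # hidden layers: ReLU activation
            x = np.maximum(x, 0.0)
    return x                      # linear outputs: one scalar per output node

rng = np.random.default_rng(0)
params = init_mlp([8, 64, 64, 5], rng)     # illustrative layer sizes
q = forward(params, rng.uniform(-1, 1, 8))
print(q.shape)  # → (5,)
```

Keeping the output layer linear lets each output node represent an unbounded Q-value estimate, which a ReLU or sigmoid output would clip.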
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010460707.6A CN111813538B (en) | 2020-05-27 | 2020-05-27 | Edge computing resource allocation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111813538A CN111813538A (en) | 2020-10-23 |
CN111813538B true CN111813538B (en) | 2024-03-29 |
Family
ID=72847752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010460707.6A Active CN111813538B (en) | 2020-05-27 | 2020-05-27 | Edge computing resource allocation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111813538B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114126066B (en) * | 2021-11-27 | 2022-07-19 | 云南大学 | MEC-oriented server resource allocation and address selection joint optimization decision method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019002465A1 (en) * | 2017-06-28 | 2019-01-03 | Deepmind Technologies Limited | Training action selection neural networks using apprenticeship |
CN109976909A (en) * | 2019-03-18 | 2019-07-05 | 中南大学 | Low delay method for scheduling task in edge calculations network based on study |
CN110312231A (en) * | 2019-06-28 | 2019-10-08 | 重庆邮电大学 | Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking |
CN110427261A (en) * | 2019-08-12 | 2019-11-08 | 电子科技大学 | A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10762424B2 (en) * | 2017-09-11 | 2020-09-01 | Sas Institute Inc. | Methods and systems for reinforcement learning |
2020-05-27: application CN202010460707.6A filed in China (CN); patent CN111813538B, status Active.
Non-Patent Citations (3)
Title |
---|
Liu Qingjie; Lin Youyong; Li Shaoli. Research on Deep Reinforcement Learning for Intelligent Obstacle Avoidance Scenarios. Intelligent IoT Technology. 2018, (02), full text. *
Peng Jun et al. A Fast Deep Q-Learning Network Edge-Cloud Migration Strategy for In-Vehicle Services. Journal of Electronics & Information Technology. 2020, (No. 01), full text. *
Tan Junjie; Liang Yingchang. Deep Reinforcement Learning Methods for Intelligent Communication. Journal of University of Electronic Science and Technology of China. 2020, (No. 02), full text. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sun et al. | Adaptive federated learning with gradient compression in uplink NOMA | |
CN111918339B (en) | AR task unloading and resource allocation method based on reinforcement learning in mobile edge network | |
CN111414252B (en) | Task unloading method based on deep reinforcement learning | |
CN111800828B (en) | Mobile edge computing resource allocation method for ultra-dense network | |
CN112512070B (en) | Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning | |
CN113543342B (en) | NOMA-MEC-based reinforcement learning resource allocation and task unloading method | |
CN114285853B (en) | Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things | |
CN113867843B (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN116390125A (en) | Industrial Internet of things cloud edge cooperative unloading and resource allocation method based on DDPG-D3QN | |
CN116456493A (en) | D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm | |
CN113784410A (en) | Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm | |
CN111813538B (en) | Edge computing resource allocation method | |
CN114885420A (en) | User grouping and resource allocation method and device in NOMA-MEC system | |
CN113626104A (en) | Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture | |
CN113760511A (en) | Vehicle edge calculation task unloading method based on depth certainty strategy | |
CN116321293A (en) | Edge computing unloading and resource allocation method based on multi-agent reinforcement learning | |
CN115134778A (en) | Internet of vehicles calculation unloading method based on multi-user game and federal learning | |
CN114828018A (en) | Multi-user mobile edge computing unloading method based on depth certainty strategy gradient | |
CN113811009A (en) | Multi-base-station cooperative wireless network resource allocation method based on space-time feature extraction reinforcement learning | |
CN117354934A (en) | Double-time-scale task unloading and resource allocation method for multi-time-slot MEC system | |
Yu et al. | Virtual reality in metaverse over wireless networks with user-centered deep reinforcement learning | |
CN116367231A (en) | Edge computing Internet of vehicles resource management joint optimization method based on DDPG algorithm | |
KR102409974B1 (en) | Method and system for optimizing source, channel code rate and power control based on artificial neural network | |
CN116367190A (en) | Digital twin function virtualization method for 6G mobile network | |
CN116112488A (en) | Fine-grained task unloading and resource allocation method for MEC network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||