CN111813538B - Edge computing resource allocation method - Google Patents

Edge computing resource allocation method

Info

Publication number
CN111813538B
CN111813538B CN202010460707.6A
Authority
CN
China
Prior art keywords
edge computing
network
task
neural network
resource allocation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010460707.6A
Other languages
Chinese (zh)
Other versions
CN111813538A (en)
Inventor
袁新杰
杜清河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010460707.6A priority Critical patent/CN111813538B/en
Publication of CN111813538A publication Critical patent/CN111813538A/en
Application granted granted Critical
Publication of CN111813538B publication Critical patent/CN111813538B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application belongs to the technical field of resource allocation strategies and particularly relates to an edge computing resource allocation method. To allocate edge computing resources efficiently, the mobile edge computing environment can be modeled as a Markov decision process; this model is highly random and very complex. The edge computing resource allocation method comprises: 1) defining the states, actions, and rewards of the edge computing model; 2) analyzing the states, actions, and rewards defined in 1) to define the structure of the neural network and its input and output structure; 3) updating, training, and applying the neural network defined in 2) according to a given training method. Deep reinforcement learning, with its strong ability to perceive the environment and make decisions, is well suited to this model and to solving the return-maximization problem.

Description

Edge computing resource allocation method
Technical Field
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method.
Background
With the rise of fifth-generation mobile communication technology (5G), mobile applications and services place higher demands on latency. The conventional cloud computing mode can no longer fully meet these low-latency requirements, which has given rise to mobile edge computing (MEC) technology.
In the traditional cloud computing mode, a user uploads a computation-intensive task through the core network to a cloud server for processing. Although the computing resources of the cloud server are sufficient to complete the computation in a short time, factors such as the limited bandwidth of the core network and network jitter lead to a large transmission delay. To reduce transmission delay, the mobile edge computing mode places computing resources at the network edge, for example at the wireless access point of a base station, so that a user only needs to offload tasks to an edge computing server for processing. This avoids the large transmission delay incurred when data traverses the core network, saves core-network bandwidth, makes it easier for operators to provide location-specific service strategies, and better protects privacy. However, because computing resources at the network edge are very limited compared with a cloud server, how to allocate and utilize edge computing resources effectively becomes one of the keys of mobile edge computing technology.
To study how to allocate edge computing resources efficiently, the mobile edge computing environment can be modeled as a Markov decision process; this model is highly random and very complex.
Disclosure of Invention
1. Technical problem to be solved
To allocate edge computing resources effectively, the mobile edge computing environment can be modeled as a Markov decision process, which is highly random and very complex. The application therefore provides an edge computing resource allocation method oriented to 5G communication requirements.
2. Technical proposal
To achieve the above object, the present application provides a method for allocating edge computing resources, the method including:
1): defining edge computing model states, actions, and rewards;
2): defining the structure of a neural network and the structure of input and output;
3): the neural network is updated, trained and applied according to a given training method.
Another embodiment provided herein is: the edge computing model state, action, and reward definition process in 1) is as follows:
in the kth frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), γ^(k)], where x^(k) is defined as the observation of the environment at the kth frame (not a state), and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the cache queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the cache queues;
q^(k): a vector formed by the lengths of the cache queues;
γ^(k): a vector formed by the signal-to-noise ratios of the channels.
Another embodiment provided herein is: the state of the agent comprises the observations of the previous W frames and the current frame.
Another embodiment provided herein is: the policy adopted by the agent is denoted π, so that a_k = π(s_k); in the edge computing model, the edge computing server has C CPU cores and there are N users and terminals, and a total of |A| different allocation schemes can be obtained using the partition (stars-and-bars) method from combinatorics, so action a_k has |A| possible choices.
Another embodiment provided herein is: the rewards and penalties are defined in terms of the maximum tolerable delay d_r = α·T_f and the maximum tolerable error probability ε_max.
Another embodiment provided herein is: the rewards and penalties cover three cases:
1) task latency < d_r and error probability < ε_max: the task is successfully completed, and the reward R = +1 is obtained;
2) task latency < d_r and error probability > ε_max: the task is processed, but the error probability is too high, and the reward R = -1 is obtained, i.e. a penalty;
3) task latency > d_r: the task waits too long in the cache queue and is not processed, and the reward R = -1.5 is obtained, i.e. a penalty.
Another embodiment provided herein is: the neural network in 2) is a multi-layer fully connected neural network in which each output node corresponds to an output scalar; the ReLU activation function is used in all layers except the output layer, and the output layer uses no activation function. With the neural network parameters of the agent denoted θ_k, the action-state value function is written as Q(s_k, a_k; θ_k), and the strategy of the agent is π(s_k; θ_k); for simplicity of description, Q_k(s_k, a_k) is used to denote Q(s_k, a_k; θ_k) when no ambiguity arises. The Q function is estimated by a neural network, and this network is called the Q-network.
Another embodiment provided herein is: the training method in 3) includes an experience replay method, in which samples of the agent's interaction with the environment are stored in a memory bank, and a batch of samples is randomly selected from the memory bank during training to update and iterate the network. The experience replay method and the fixed Q-target network method are techniques used within Algorithm 1.
Another embodiment provided herein is: the training method adopts the fixed Q-target method when updating, and the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively; during iteration, Q_target changes far less frequently than Q_policy.
3. Advantageous effects
Compared with the prior art, the edge computing resource allocation method provided by the application has the beneficial effects that:
the edge computing resource allocation method provided by the application is very suitable for being applied to the model due to the strong perceptibility and decision capability of the edge computing resource allocation method to the environment in the machine learning so as to solve the problem of maximum return.
The edge computing resource allocation method provides a deep-reinforcement-learning-based allocation scheme for the scenario of a single edge computing node serving multiple users and multiple terminals under finite-blocklength coding.
According to the edge computing resource allocation method, the computing resources in the edge computing server, in particular the CPU cores, are allocated reasonably, so that the processing success rate of tasks offloaded by users to the edge computing server is improved.
The edge computing resource allocation method defines the states, actions, rewards, neural network structure, neural network input and output structure, and training and application methods so that the edge computing environment can be effectively perceived and decided upon.
According to the edge computing resource allocation method, the state is decoupled into the states of the queue buffers and fed into a specially designed neural network, which outputs the action-state value functions of all possible allocation schemes; the allocation scheme with the highest action-state value function is then selected to allocate the CPU cores, thereby improving the probability of task success. The neural network architecture shown in FIG. 1, including the manner of input, is specially designed: the input layer of the neural network is divided into several independent neural network blocks, one for each buffer state.
Drawings
FIG. 1 is a schematic diagram of the DQN network architecture of the present application;
FIG. 2 is a schematic diagram of simulation results of the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and according to these detailed descriptions, those skilled in the art can clearly understand the present application and can practice the present application. Features from various embodiments may be combined to obtain new implementations or to replace certain features from certain embodiments to obtain other preferred implementations without departing from the principles of the present application.
As shown in FIGS. 1-2, the present application provides a method for allocating edge computing resources, the method comprising:
1): defining edge computing model states, actions, and rewards;
2): analyzing and utilizing the state, the action and the rewards defined in the 1) to define the structure of the neural network and the structure of input and output;
3): updating, training and applying the neural network defined in the step 2) according to a given training method.
Further, the edge computing model state, action, and reward definition process in 1) is as follows:
in the kth frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), γ^(k)], where x^(k) is defined as the observation of the environment at the kth frame (not a state), and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the cache queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the cache queues;
q^(k): a vector formed by the lengths of the cache queues;
γ^(k): a vector formed by the signal-to-noise ratios of the channels.
Further, the state of the agent comprises the observations of the previous W frames and the current frame, namely s_k = [x^(k-W), x^(k-W+1), ..., x^(k)]. Because of the correlation of the channel, each decision is related not only to the currently observed environment but also to the observations of the several preceding frames.
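As an illustration of this windowed state, the following minimal sketch (not taken from the patent; the class and helper names are assumptions) stacks the W+1 most recent observations x^(k-W), ..., x^(k) into s_k, with each observation laid out as the four length-N vectors defined above.

```python
from collections import deque
import numpy as np

class StateStacker:
    """Keeps the W+1 most recent observations and concatenates them into s_k."""

    def __init__(self, window_w: int, n_users: int):
        self.window_w = window_w
        self.n_users = n_users
        # Each observation x^(k) = [d, w, q, gamma] has 4*N entries.
        self.obs_dim = 4 * n_users
        self.frames = deque(maxlen=window_w + 1)

    def reset(self, first_obs: np.ndarray) -> np.ndarray:
        # Before W frames have elapsed, pad the window by repeating the first observation.
        self.frames.clear()
        for _ in range(self.window_w + 1):
            self.frames.append(first_obs)
        return self.state()

    def push(self, obs: np.ndarray) -> np.ndarray:
        self.frames.append(obs)  # the oldest frame is dropped automatically
        return self.state()

    def state(self) -> np.ndarray:
        # s_k = [x^(k-W), ..., x^(k)], flattened for the neural network input.
        return np.concatenate(list(self.frames))

# Example: N = 3 users, window W = 2
stacker = StateStacker(window_w=2, n_users=3)
s = stacker.reset(np.zeros(4 * 3))
s = stacker.push(np.random.rand(4 * 3))
print(s.shape)  # (36,) = (W + 1) * 4 * N
```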
further, if the policy adopted by the agent is recorded as pi, then a k =π(s k ) The method comprises the steps of carrying out a first treatment on the surface of the In the edge computing model, the edge computing server has C CPU cores, N users and terminals, and the total of the N users and terminals is obtained by using a partition method in permutation and combinationDifferent allocation schemes, thus action a k There may be->And (5) seed selection. Each action a k Is an N-dimensional vector, each action in the vector being the number of CPU cores allocated to the corresponding task.
Further, the rewards and penalties are defined in terms of the maximum tolerable delay d_r = α·T_f and the maximum tolerable error probability ε_max.
Further, the rewards and penalties cover three cases (sketched in code after this list):
1) task latency < d_r and error probability < ε_max: the task is successfully completed, and the reward R = +1 is obtained;
2) task latency < d_r and error probability > ε_max: the task is processed, but the error probability is too high, and the reward R = -1 is obtained, i.e. a penalty;
3) task latency > d_r: the task waits too long in the cache queue and is not processed, and the reward R = -1.5 is obtained, i.e. a penalty.
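The three cases above translate directly into a small reward function; the sketch below is illustrative, with d_r and ε_max passed in as parameters and the reward values +1, -1, and -1.5 taken from the cases above.

```python
def task_reward(latency: float, error_prob: float,
                d_r: float, eps_max: float) -> float:
    """Reward or penalty for a single task, following the three cases above."""
    if latency > d_r:
        # Case 3: waited too long in the cache queue and was never processed.
        return -1.5
    if error_prob > eps_max:
        # Case 2: processed, but the decoding error probability is too high.
        return -1.0
    # Case 1: processed within the deadline with acceptable error probability.
    return 1.0

# Example with d_r = 0.8 * T_f (alpha = 0.8, illustrative) and eps_max = 1e-3
print(task_reward(latency=0.5, error_prob=1e-4, d_r=0.8, eps_max=1e-3))  # 1.0
```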
Further, the neural network in 2) is a multi-layer fully connected neural network in which each output node corresponds to an output scalar; the ReLU activation function is used in all layers except the output layer, and the output layer uses no activation function.
Further, with the neural network parameters of the agent denoted θ_k, the action-state value function is written as Q(s_k, a_k; θ_k), and the strategy of the agent is π(s_k; θ_k); for simplicity of description, Q_k(s_k, a_k) is used to denote Q(s_k, a_k; θ_k) when no ambiguity arises. The Q function is estimated by a neural network, and this network is called the Q-network.
Further, the updating method in 3) includes an experience replay method: samples of the agent's interaction with the environment are stored in a memory bank, and a batch of samples is randomly selected from the memory bank during training to update and iterate the network.
Further, the updating method adopts the fixed Q-target method: the training process requires two Q-networks, the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively; during iteration, Q_target changes far less frequently than Q_policy.
The structure of the neural network and the input/output structure in 2) are as follows:
In this algorithm, we use the deep neural network structure shown in FIG. 1. In the figure, the input is on the left and the output on the right; the two layers of cubes on the left represent multi-layer fully connected neural networks (DNNs), and the small cubes on the far right represent the output nodes, each corresponding to an output scalar. The ReLU activation function is used in all layers except the output layer, and the output layer uses no activation function. After normalization, the state vector is regrouped according to the input structure shown in the figure, i.e. the nth group of inputs is the state corresponding to Buffer n. The network first reads the state of each buffer separately; the extracted features are then concatenated and further analyzed by the next part of the neural network, and finally |A| values are output, representing the action-state values of the |A| different actions.
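The per-buffer input blocks followed by a shared trunk can be sketched as follows. This is a minimal PyTorch rendition of the structure in FIG. 1, not the patented implementation: the layer widths, feature sizes, and class name are assumptions, while the per-buffer grouping, ReLU activations, and activation-free output layer of size |A| follow the description above.

```python
import torch
import torch.nn as nn

class DQNAllocationNet(nn.Module):
    """Per-buffer feature blocks followed by a shared trunk, as in FIG. 1."""

    def __init__(self, n_users: int, per_buffer_dim: int, n_actions: int,
                 feat_dim: int = 32, hidden_dim: int = 128):
        super().__init__()
        # One independent fully connected block per buffer state.
        self.buffer_blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(per_buffer_dim, feat_dim), nn.ReLU(),
                nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            )
            for _ in range(n_users)
        ])
        # Shared trunk that merges the per-buffer features.
        self.trunk = nn.Sequential(
            nn.Linear(n_users * feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),  # no activation on the output layer
        )

    def forward(self, per_buffer_states: torch.Tensor) -> torch.Tensor:
        # per_buffer_states: (batch, N, per_buffer_dim), one slice per buffer.
        feats = [blk(per_buffer_states[:, n, :])
                 for n, blk in enumerate(self.buffer_blocks)]
        return self.trunk(torch.cat(feats, dim=-1))  # (batch, |A|) action-state values

# Example: N = 3 buffers, window W = 2 -> per-buffer input of (W+1)*4 = 12 entries, |A| = 15
net = DQNAllocationNet(n_users=3, per_buffer_dim=12, n_actions=15)
q_values = net(torch.randn(8, 3, 12))
print(q_values.shape)  # torch.Size([8, 15])
```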
3) The training method and the application process are as follows:
in the course of the neural network update we introduced a method that we introduced a fixed Q-target and empirical replay.
With the fixed Q-target approach, the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively. In this way, the network being updated is separated from the network used to compute the target value y_i: the Q-target network is used when computing y_i, and the Q-policy network is the one being updated. During iteration, Q_target changes far less frequently than Q_policy; specifically, every M iterations we set θ_target ← θ_policy to update Q_target once. The loss function may be defined as
L(θ_policy) = E[ (y_i - Q_policy(s_i, a_i; θ_policy))^2 ],
where the target value is y_i = R_i + γ·max_a' Q_target(s_(i+1), a'; θ_target).
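A minimal sketch of this loss, assuming the standard DQN squared-error form with the target value y_i produced by the periodically frozen Q-target network; the function names and tensor layout are illustrative.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_policy, q_target, batch, gamma: float) -> torch.Tensor:
    """L = E[(y_i - Q_policy(s_i, a_i))^2], with y_i from the fixed Q-target network."""
    states, actions, rewards, next_states = batch  # tensors built from replay samples
    # Q_policy(s_i, a_i): value of the action actually taken (actions is an int64 tensor).
    q_taken = q_policy(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y_i = R_i + gamma * max_a' Q_target(s_{i+1}, a')
        y = rewards + gamma * q_target(next_states).max(dim=1).values
    return F.mse_loss(q_taken, y)

def sync_target(q_policy, q_target) -> None:
    # Every M iterations: theta_target <- theta_policy.
    q_target.load_state_dict(q_policy.state_dict())
```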
The experience replay method stores samples of the agent's interaction with the environment in a memory bank, and a batch of samples is randomly selected from the memory bank during training to update and iterate the network. Denote e_k = (s_k, a_k, R_k, s_(k+1)) as one transition; what is stored in memory bank D are these transitions, i.e. D = {e_1, e_2, ..., e_k, ...}.
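The memory bank D of transitions e_k = (s_k, a_k, R_k, s_(k+1)) with uniform random minibatch sampling can be sketched as follows; the capacity and return format are illustrative choices.

```python
import random
from collections import deque

class ReplayMemory:
    """Memory bank D storing transitions e_k = (s_k, a_k, R_k, s_{k+1})."""

    def __init__(self, capacity: int = 100_000):
        self.storage = deque(maxlen=capacity)  # oldest transitions are discarded

    def push(self, state, action, reward, next_state) -> None:
        self.storage.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniformly sample a minibatch of transitions for one update step.
        batch = random.sample(self.storage, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self) -> int:
        return len(self.storage)
```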
The specific update algorithm is shown as Algorithm 1 (the deep-reinforcement-learning-based edge computing resource allocation algorithm), in which the widely used ε-greedy exploration strategy is also adopted.
After the neural network training is finished, the neural network is used to make a decision at each frame of the edge computing environment during application, with the input and output handled as described in step 2); the action selected during application is a_k = argmax_a Q_policy(s_k, a; θ_policy).
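The ε-greedy exploration used in Algorithm 1 and the greedy rule used at application time differ only in the value of ε; a minimal sketch under that assumption (the function name and the example ε are illustrative):

```python
import random
import torch

def select_action(q_network, state: torch.Tensor, n_actions: int,
                  epsilon: float = 0.0) -> int:
    """epsilon-greedy choice; with epsilon = 0 this is the application rule
    a_k = argmax_a Q(s_k, a) used after training has finished."""
    if random.random() < epsilon:
        return random.randrange(n_actions)        # explore
    with torch.no_grad():
        q_values = q_network(state.unsqueeze(0))  # add a batch dimension
        return int(q_values.argmax(dim=1).item())

# During training:    a_k = select_action(q_policy, s_k, n_actions, epsilon=0.1)
# During application: a_k = select_action(q_policy, s_k, n_actions, epsilon=0.0)
```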
Examples
In this section, we consider a specific edge computing model as follows:
the edge computing model is a scenario with a mobile edge computing node (MEC node) having C CPU cores, N buffers, N users, and N terminals. In this scenario, a user sends tasks to the edge computing node, which allocates CPU cores to them, processes the tasks, and then sends the processing results to the terminal. Each frame of the model consists of T_f symbol times t_s and can be further divided into three stages. Taking the kth frame as an example: in the first stage, user U_n (n ∈ {1, 2, ..., N}) sends the MEC node a task of size D_UL,n^(k), whose transmission occupies m_UL,n^(k) symbols; after the task reaches the MEC node, the node stores it in cache queue Buffer n. In the second stage, the MEC node distributes the computing resources of the C CPU cores among the users, so that a user receiving computing resources has the task at the head of its corresponding buffer processed, the processing occupying m_c,n^(k) symbols. In the third stage, after the task has been processed, the MEC node sends the computation result to the terminal T_n corresponding to the user, occupying m_DL,n^(k) symbols.
In the downlink, the channel gain h_DL,n^(k) seen at each frame is modeled as h_DL,n^(k) = ρ_DL,n·h_DL,n^(k-1) + sqrt(1 - ρ_DL,n^2)·e_DL,n^(k), where ρ_DL,n is the channel correlation coefficient and e_DL,n^(k) is a complex Gaussian random variable with zero mean and unit variance. Thus, the downlink signal-to-noise ratio (SNR) can be expressed as γ_DL,n^(k) = γ̄_DL,n·|h_DL,n^(k)|^2, where γ̄_DL,n is the average signal-to-noise ratio.
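A short sketch of the downlink channel and SNR evolution, assuming the first-order Gauss-Markov recursion reconstructed above; since the exact recursion in the original filing is not reproduced here, treat the model and parameter values as illustrative.

```python
import numpy as np

def simulate_downlink_snr(n_frames: int, rho: float, avg_snr: float,
                          rng: np.random.Generator) -> np.ndarray:
    """gamma^(k) = avg_snr * |h^(k)|^2 with h^(k) = rho*h^(k-1) + sqrt(1-rho^2)*e^(k)."""
    def cn01() -> complex:
        # Complex Gaussian with zero mean and unit variance.
        return (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)

    h = cn01()
    snr = np.empty(n_frames)
    for k in range(n_frames):
        h = rho * h + np.sqrt(1 - rho**2) * cn01()
        snr[k] = avg_snr * np.abs(h) ** 2
    return snr

rng = np.random.default_rng(0)
print(simulate_downlink_snr(5, rho=0.9, avg_snr=10.0, rng=rng))
```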
Because T_f and m_DL,n are of finite length, we consider the decoding error probability problem in finite-blocklength coding. Denoting the decoding error probability by ε_DL,n^(k), it is obtained from the finite-blocklength (normal-approximation) capacity formula as a function of the SNR γ_DL,n^(k), the blocklength m_DL,n^(k), and the number of transmitted bits D_DL,n^(k).
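If the widely used normal approximation for finite-blocklength coding is assumed (the exact expression in the filing is not reproduced here), the decoding error probability can be sketched as a function of the SNR, the blocklength, and the number of transmitted bits:

```python
import math

def fbl_error_probability(snr: float, blocklength: int, n_bits: int) -> float:
    """Normal approximation: eps ~ Q( sqrt(m/V) * (C(snr) - D/m) ), rates in bits/use."""
    capacity = math.log2(1.0 + snr)  # Shannon capacity C(gamma) in bits per channel use
    # Channel dispersion V(gamma), also in bits^2.
    dispersion = (1.0 - 1.0 / (1.0 + snr) ** 2) * (math.log2(math.e) ** 2)
    if dispersion == 0.0:
        return 1.0 if n_bits > 0 else 0.0  # snr == 0 corner case
    x = math.sqrt(blocklength / dispersion) * (capacity - n_bits / blocklength)
    return 0.5 * math.erfc(x / math.sqrt(2.0))  # Gaussian Q-function

# Illustrative numbers: SNR of 10 (10 dB), 200 symbols, 400 information bits.
print(fbl_error_probability(snr=10.0, blocklength=200, n_bits=400))
```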
In mobile edge computing, the computation result is typically smaller than the uploaded data, so we assume in this model that for a given task D_DL,n = β·D_UL,n, where β is a positive number less than 1.
In the second stage, the mobile edge computing node allocates c_n^(k) CPU cores to user U_n at the kth frame, where c_n^(k) is an integer and the allocation needs to satisfy c_1^(k) + c_2^(k) + ... + c_N^(k) ≤ C. The processing time, in symbols, is then obtained as
m_c,n^(k) = ceil( L·D_UL,n^(k) / (c_n^(k)·f_0·t_s) ),
where L represents how many CPU cycles are needed per bit of the task, f_0 represents the frequency of each CPU core, and ceil(x) denotes the smallest integer not less than x. Therefore, to keep the frames synchronized, with T_f given, the time available for the third stage can be calculated as
m_DL,n^(k) = T_f - m_UL,n^(k) - m_c,n^(k) symbols.
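The second- and third-stage timing can be checked with a few lines; the symbol names mirror the expressions above and the numeric values are purely illustrative.

```python
import math

def processing_symbols(d_ul_bits: float, cores: int,
                       cycles_per_bit: float, f0_hz: float, t_s: float) -> int:
    """m_c = ceil(L * D_UL / (c * f0 * t_s)), the second-stage duration in symbols."""
    return math.ceil(cycles_per_bit * d_ul_bits / (cores * f0_hz * t_s))

def downlink_symbols(t_f: int, m_ul: int, m_c: int) -> int:
    """m_DL = T_f - m_UL - m_c, symbols left for the third stage."""
    return t_f - m_ul - m_c

# Illustrative numbers: a 4000-bit task, 2 cores, L = 500 cycles/bit,
# f0 = 1 GHz, symbol time t_s = 10 microseconds, frame of T_f = 300 symbols.
m_c = processing_symbols(4000, cores=2, cycles_per_bit=500, f0_hz=1e9, t_s=1e-5)
print(m_c)                                       # 100 symbols of processing
print(downlink_symbols(300, m_ul=50, m_c=m_c))   # 150 symbols for the downlink
```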
In deep reinforcement learning, the edge computing model is the environment with which the agent interacts; it can be modeled as a Markov decision process, and the reinforcement learning algorithm solves the success-rate maximization problem. In this algorithm, the agent changes the environment by deciding the allocation policy for the computing resources, i.e. the CPU cores, and the environment returns rewards or penalties according to whether tasks succeed or fail.
In the kth frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), γ^(k)], where x^(k) is the observation (not the state) of the environment at the kth frame, and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the cache queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the cache queues;
q^(k): a vector formed by the lengths of the cache queues;
γ^(k): a vector formed by the signal-to-noise ratios of the channels.
However, considering the correlation of the channel, each decision is related not only to the currently observed environment but also to the observations of the several preceding frames, so we define the state as including the previous W frames and the current frame, i.e. s_k = [x^(k-W), x^(k-W+1), ..., x^(k)]. We denote the policy adopted by the agent as π, i.e. a_k = π(s_k). In the edge computing model, the edge computing server has C CPU cores and the number of users and terminals is N; a total of |A| different allocation schemes can be obtained using the partition (stars-and-bars) method from combinatorics, so action a_k has |A| possible choices. Each action a_k is an N-dimensional vector whose nth element is the number of CPU cores allocated to the corresponding task.
For the rewards and penalties, we set the maximum tolerable delay d_r = α·T_f and the maximum tolerable error probability ε_max. We consider three cases:
1) task latency < d_r and error probability < ε_max: the task is successfully completed, and the reward R = +1 is obtained.
2) task latency < d_r and error probability > ε_max: the task is processed, but the error probability is too high, and the reward R = -1 is obtained, i.e. a penalty.
3) task latency > d_r: the task waits too long in the cache queue and is not processed, and the reward R = -1.5 is obtained, i.e. a penalty.
In calculating the reward, we use the γ-discounted return, i.e. the return at the kth frame is
G_k = R_k + γ·R_(k+1) + γ^2·R_(k+2) + ... = R_k + γ·G_(k+1).
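The γ-discounted return can be computed recursively from the end of a finite episode backwards, as in the short sketch below; the episode rewards and γ are illustrative.

```python
from typing import List

def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """G_k = R_k + gamma * G_{k+1}, computed backwards over a finite episode."""
    returns = [0.0] * len(rewards)
    g_next = 0.0
    for k in range(len(rewards) - 1, -1, -1):
        g_next = rewards[k] + gamma * g_next
        returns[k] = g_next
    return returns

print(discounted_returns([1.0, -1.0, 1.0, -1.5], gamma=0.9))
# approximately [-0.1835, -1.315, -0.35, -1.5]
```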
In this algorithm, we use the deep neural network structure shown in FIG. 1. In the figure, the input is on the left and the output on the right; the two layers of cubes on the left represent multi-layer fully connected neural networks (DNNs), and the small cubes on the far right represent the output nodes, each corresponding to an output scalar. The ReLU activation function is used in all layers except the output layer, and the output layer uses no activation function. After normalization, the state vector is regrouped according to the input structure shown in the figure, i.e. the nth group of inputs is the state corresponding to Buffer n. The network first reads the state of each buffer separately; the extracted features are then concatenated and further analyzed by the next part of the neural network, and finally |A| values are output, representing the action-state values of the |A| different actions.
In the course of the neural network update, we introduce the fixed Q-target and experience replay methods. With the fixed Q-target approach, the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively. In this way, the network being updated is separated from the network used to compute the target value y_i: the Q-target network is used when computing y_i, and the Q-policy network is the one being updated. During iteration, Q_target changes far less frequently than Q_policy; specifically, every M iterations we set θ_target ← θ_policy to update Q_target once. The loss function may be defined as
L(θ_policy) = E[ (y_i - Q_policy(s_i, a_i; θ_policy))^2 ],
where the target value is y_i = R_i + γ·max_a' Q_target(s_(i+1), a'; θ_target).
The experience replay method stores samples of the agent's interaction with the environment in a memory bank, and a batch of samples is randomly selected from the memory bank during training to update and iterate the network. Denote e_k = (s_k, a_k, R_k, s_(k+1)) as one transition; what is stored in memory bank D are these transitions, i.e. D = {e_1, e_2, ..., e_k, ...}.
The specific update algorithm is shown as Algorithm 1, in which the widely used ε-greedy exploration strategy is also adopted.
In FIG. 2, the abscissa represents the training episode number and the ordinate represents the task success rate. The two horizontal lines represent the task success rates of equal allocation and random allocation, respectively, and the curve represents the success rate of the present invention as training proceeds. It can be seen that after training for 40-60 episodes the task success rate of the method of the present invention exceeds the baseline schemes, at which point the neural network can stop training and be applied.
Although the present application has been described with reference to particular embodiments, those skilled in the art will appreciate that many modifications are possible within the principles and scope of the disclosure. The scope of the application is to be determined by the appended claims, and the claims are intended to cover all modifications that fall within the literal meaning or range of equivalents of the technical features of the claims.

Claims (4)

1. An edge computing resource allocation method, characterized in that: the method comprises the following steps:
1): defining edge computing model states, actions, and rewards;
2): analyzing and utilizing the state, the action and the rewards defined in the 1) to define the structure of the neural network and the structure of input and output;
3): updating, training, and applying the neural network defined in step 2) according to a given training method; the policy adopted by the agent is denoted π, so that a_k = π(s_k); in the edge computing model, the edge computing server has C CPU cores and there are N users and terminals, and a total of |A| different allocation schemes can be obtained using the partition (stars-and-bars) method from combinatorics, so action a_k has |A| possible choices; the edge computing model state, action, and reward definition process in 1) is as follows:
in the kth frame, the environment observed by the agent is x^(k) = [d^(k), w^(k), q^(k), γ^(k)], where x^(k) is defined as the observation of the environment at the kth frame, not a state, and each element has the following meaning:
d^(k): a vector formed by the sizes of the task data at the heads of the cache queues;
w^(k): a vector formed by the waiting times of the tasks at the heads of the cache queues;
q^(k): a vector formed by the lengths of the cache queues;
γ^(k): a vector formed by the signal-to-noise ratios of the channels;
the rewards and penalties cover three cases:
1) task latency < d_r and error probability < ε_max: the task is successfully completed, and the reward R = +1 is obtained;
2) task latency < d_r and error probability > ε_max: the task is processed, but the error probability is too high, and the reward R = -1 is obtained, i.e. a penalty;
3) task latency > d_r: the task waits too long in the cache queue and is not processed, and the reward R = -1.5 is obtained, i.e. a penalty; with the neural network parameters of the agent denoted θ_k, the action-state value function is written as Q(s_k, a_k; θ_k), and the strategy of the agent is π(s_k; θ_k); for simplicity of description, Q_k(s_k, a_k) is used to denote Q(s_k, a_k; θ_k) when no ambiguity arises; the Q function is estimated by a neural network, which is called the Q-network; the updating method in 3) includes an experience replay method, in which samples of the agent's interaction with the environment are stored in a memory bank, and a batch of samples is randomly selected from the memory bank during training to update and iterate the network; the updating method adopts the fixed Q-target method when updating, and the training process requires two Q-networks: the Q-policy network Q_policy and the Q-target network Q_target, with parameters θ_policy and θ_target respectively; during iteration, Q_target changes far less frequently than Q_policy.
2. The edge computing resource allocation method of claim 1, wherein: the state of the agent comprises the observations of the previous W frames and the current frame.
3. The edge computing resource allocation method of claim 1, wherein: the rewards and penalties are defined in terms of the maximum tolerable delay d_r = α·T_f and the maximum tolerable error probability ε_max.
4. The edge computing resource allocation method of claim 1, wherein: the neural network in 2) is a multi-layer fully connected neural network in which each output node corresponds to an output scalar; the ReLU activation function is used in all layers except the output layer, and the output layer uses no activation function.
CN202010460707.6A 2020-05-27 2020-05-27 Edge computing resource allocation method Active CN111813538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460707.6A CN111813538B (en) 2020-05-27 2020-05-27 Edge computing resource allocation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460707.6A CN111813538B (en) 2020-05-27 2020-05-27 Edge computing resource allocation method

Publications (2)

Publication Number Publication Date
CN111813538A CN111813538A (en) 2020-10-23
CN111813538B true CN111813538B (en) 2024-03-29

Family

ID=72847752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460707.6A Active CN111813538B (en) 2020-05-27 2020-05-27 Edge computing resource allocation method

Country Status (1)

Country Link
CN (1) CN111813538B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114126066B (en) * 2021-11-27 2022-07-19 云南大学 MEC-oriented server resource allocation and address selection joint optimization decision method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019002465A1 (en) * 2017-06-28 2019-01-03 Deepmind Technologies Limited Training action selection neural networks using apprenticeship
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN110312231A (en) * 2019-06-28 2019-10-08 重庆邮电大学 Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN110427261A (en) * 2019-08-12 2019-11-08 电子科技大学 A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762424B2 (en) * 2017-09-11 2020-09-01 Sas Institute Inc. Methods and systems for reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019002465A1 (en) * 2017-06-28 2019-01-03 Deepmind Technologies Limited Training action selection neural networks using apprenticeship
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN110312231A (en) * 2019-06-28 2019-10-08 重庆邮电大学 Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN110427261A (en) * 2019-08-12 2019-11-08 电子科技大学 A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘庆杰; 林友勇; 李少利. Research on deep reinforcement learning for intelligent obstacle-avoidance scenarios. 智能物联技术 (Intelligent IoT Technology), 2018, (02), full text. *
彭军 et al. A fast deep Q-learning network edge-cloud migration strategy for vehicle-mounted services. 电子与信息学报 (Journal of Electronics & Information Technology), 2020, (01), full text. *
谭俊杰; 梁应敞. Deep reinforcement learning methods for intelligent communication. 电子科技大学学报 (Journal of University of Electronic Science and Technology of China), 2020, (02), full text. *

Also Published As

Publication number Publication date
CN111813538A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
Sun et al. Adaptive federated learning with gradient compression in uplink NOMA
CN111918339B (en) AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN111414252B (en) Task unloading method based on deep reinforcement learning
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN112512070B (en) Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning
CN113543342B (en) NOMA-MEC-based reinforcement learning resource allocation and task unloading method
CN114285853B (en) Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116390125A (en) Industrial Internet of things cloud edge cooperative unloading and resource allocation method based on DDPG-D3QN
CN116456493A (en) D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm
CN113784410A (en) Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm
CN111813538B (en) Edge computing resource allocation method
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
CN113626104A (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN113760511A (en) Vehicle edge calculation task unloading method based on depth certainty strategy
CN116321293A (en) Edge computing unloading and resource allocation method based on multi-agent reinforcement learning
CN115134778A (en) Internet of vehicles calculation unloading method based on multi-user game and federal learning
CN114828018A (en) Multi-user mobile edge computing unloading method based on depth certainty strategy gradient
CN113811009A (en) Multi-base-station cooperative wireless network resource allocation method based on space-time feature extraction reinforcement learning
CN117354934A (en) Double-time-scale task unloading and resource allocation method for multi-time-slot MEC system
Yu et al. Virtual reality in metaverse over wireless networks with user-centered deep reinforcement learning
CN116367231A (en) Edge computing Internet of vehicles resource management joint optimization method based on DDPG algorithm
KR102409974B1 (en) Method and system for optimizing source, channel code rate and power control based on artificial neural network
CN116367190A (en) Digital twin function virtualization method for 6G mobile network
CN116112488A (en) Fine-grained task unloading and resource allocation method for MEC network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant