CN108632860B - Mobile edge calculation rate maximization method based on deep reinforcement learning - Google Patents

Mobile edge calculation rate maximization method based on deep reinforcement learning Download PDF

Info

Publication number
CN108632860B
Authority
CN
China
Prior art keywords
wireless device
wireless devices
energy
reinforcement learning
base station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810342359.5A
Other languages
Chinese (zh)
Other versions
CN108632860A (en)
Inventor
黄亮
冯旭
钱丽萍
吴远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810342359.5A priority Critical patent/CN108632860B/en
Publication of CN108632860A publication Critical patent/CN108632860A/en
Application granted granted Critical
Publication of CN108632860B publication Critical patent/CN108632860B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/02: Arrangements for optimising operational condition
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/02: Traffic management, e.g. flow control or congestion control
    • H04W28/06: Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A mobile edge computation rate maximization method based on deep reinforcement learning comprises the following steps: 1) in a wirelessly powered edge computing system comprising a base station and a plurality of wireless devices, compute the sum of the computation rates of all wireless devices in the system for a given mode selection; 2) find the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by a reinforcement learning algorithm; 3) take the mode selection M_0 and M_1 of all wireless devices as the reinforcement learning system state x_t, where an action a is a modification of the system state x_t; if the total computation rate of the modified system is greater than before, the current reward r(x_t, a) is set to a positive value, otherwise to a negative value, and the system enters the next state x_{t+1}; this iterative process is repeated until the best mode selection M_0 and M_1 is obtained. The invention maximizes the total computation rate of all wireless devices while guaranteeing user experience.

Description

Mobile edge calculation rate maximization method based on deep reinforcement learning
Technical Field
The invention belongs to the field of communication, and particularly relates to a communication system for mobile edge calculation and a mobile edge calculation rate maximization method based on deep reinforcement learning.
Background
The recent development of Internet of Things technology is a key step toward true intelligence and autonomous control, and it is particularly prominent in many important industrial and commercial systems. In an Internet of Things network, a large number of wireless devices (WDs) capable of communication and computing are deployed. Due to device size limitations and manufacturing cost considerations, Internet of Things devices (e.g., sensors) often carry batteries with limited capacity and energy-efficient low-performance processors; the resulting limited device lifetime and low computing power cannot support the growing number of new applications that require sustained high-performance computing, such as autonomous driving and augmented reality. Deploying wireless power transfer (WPT) can alleviate these two performance problems: frequent device battery depletion not only disrupts the normal operation of individual wireless devices but can also significantly degrade overall network performance, e.g., the sensing accuracy in a wireless sensor network, and conventional wireless systems require frequent manual battery replacement, which is expensive and inconvenient. Owing to severe battery capacity limitations, minimizing energy consumption and extending the operational life of the wireless devices are critical design goals in battery-powered wireless systems. Each energy-harvesting wireless device follows a binary computation offloading policy, i.e., the data set of one task is either executed locally or offloaded to a remote server. In order to maximize the total computation rate of all wireless devices, the optimal individual computation mode selection must be found.
Disclosure of Invention
In order to overcome the low sum computation rate of existing wireless power transfer systems, and to maximize the sum computation rate of all wireless devices by finding the optimal individual computation mode selection and system transmission time allocation, the invention provides a mobile edge computation rate maximization method based on deep reinforcement learning that maximizes the sum computation rate of all wireless devices while guaranteeing user experience.
The technical solution adopted by the invention to solve this technical problem is as follows:
A mobile edge computation rate maximization method based on deep reinforcement learning, the method comprising the following steps:
1) In a wirelessly powered edge computing system comprising a base station and a plurality of wireless devices, the base station and each wireless device each have their own antenna; a radio frequency energy transmitter and an edge computing server are integrated in the base station, the base station is assumed to have a stable energy supply, and it broadcasts radio frequency energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery and performs its tasks using the stored harvested energy; in this wireless communication system, each wireless device needs to establish a connection with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:
[The expression for h_i appears as an equation image in the original publication and is not reproduced here.]
wherein each parameter is defined as follows:
A_d: antenna gain;
π: the circumferential ratio (pi);
f_c: carrier frequency;
d_i: distance between wireless device i and the base station;
d_e: path loss exponent;
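The exact expression for h_i is only available as an image in the original publication. As a hedged reconstruction, assuming the free-space path-loss model commonly used in the wirelessly powered edge computing literature that this patent cites (Bi et al., listed under Non-Patent Citations), the channel gain would take the form

h_i = A_d * ( 3×10^8 / (4 π f_c d_i) )^{d_e}

where 3×10^8 m/s is the speed of light; this form matches the parameters listed above but is an assumption, not the verified original equation.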
2) It is assumed that the computing task of each wireless device can either be executed on a local low-performance microprocessor or be offloaded to the edge computing server, which has far stronger processing power, processes the task, and then sends the result back to the wireless device; each wireless device is assumed to follow a binary computation offloading rule, i.e., it must choose either the local computation mode or the offloading mode; two non-overlapping sets M_0 and M_1 denote the wireless devices in the local computation mode and in the offloading mode, respectively, and the set N of all wireless devices is expressed as N = M_0 ∪ M_1.
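For illustration only, the binary offloading rule above can be represented as one bit per wireless device, which is also a convenient encoding for the reinforcement learning state used in steps 4) and 5); the helper below is a sketch with assumed names, not notation taken from the patent.

# Mode selection of the wireless devices as a binary vector:
# 0 = local computation mode (set M_0), 1 = offloading mode (set M_1).
def split_modes(x):
    M0 = [i for i, m in enumerate(x) if m == 0]  # devices computing locally
    M1 = [i for i, m in enumerate(x) if m == 1]  # devices offloading to the base station
    return M0, M1

# Example: devices 0 and 3 compute locally, devices 1 and 2 offload.
M0, M1 = split_modes([0, 1, 1, 0])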
3) A wireless device in set M_0 can harvest energy and process its local task at the same time, while a wireless device in set M_1 can only offload its task to the base station for processing after harvesting energy; since the computing and transmission capabilities of the base station are assumed to be much stronger than those of the energy-harvesting wireless devices, each offloading wireless device exhausts its harvested energy during task offloading; the computation rate sum maximization problem over all wireless devices is described as:
[Objective function: appears as an equation image in the original publication and is not reproduced here.]
the constraint conditions are as follows:
[Constraints: appear as equation images in the original publication and are not reproduced here.]
in the formula:
[The local-computation and offloading rate expressions appear as equation images in the original publication and are not reproduced here.]
wherein each parameter is defined as follows:
ω_i: a transition weight for the i-th wireless device;
μ: energy collection efficiency;
p: radio frequency energy transmission power;
φ: the number of computation cycles required to process each bit of data;
h_i: channel gain of the i-th wireless device;
k_i: energy efficiency coefficient of the i-th wireless device;
a: a time coefficient;
v_μ: conversion efficiency;
b: bandwidth;
τ_j: a time coefficient for the j-th wireless device;
N_0: the number of wireless devices in the local processing mode;
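The objective and constraints above are only available as equation images. Purely as an illustrative sketch, the snippet below evaluates a weighted sum computation rate for a fixed mode selection using the local-computing and offloading rate expressions from the binary-offloading reference cited under Non-Patent Citations (Bi et al.); the exact expressions, the interpretation of N_0 as noise power in the log term, and the time constraint a + Στ_j ≤ 1 are assumptions here and may differ from the patent's unreproduced equations.

# Illustrative sketch of the step-3 objective for a fixed mode selection.
# Assumed rate models (not reproduced from the patent's equation images):
#   local device i:      r_i = (1/phi) * (mu * p * h[i] * a / k[i]) ** (1/3)
#   offloading device j: r_j = (b * tau[j] / v_u) * log2(1 + mu * p * h[j]**2 * a / (tau[j] * noise))
import math

def sum_computation_rate(M0, M1, a, tau, w, h, k, mu, p, phi, b, v_u, noise):
    total = 0.0
    for i in M0:   # devices in local computation mode
        total += w[i] * (mu * p * h[i] * a / k[i]) ** (1.0 / 3.0) / phi
    for j in M1:   # devices in offloading mode
        total += w[j] * (b * tau[j] / v_u) * math.log2(1.0 + mu * p * h[j] ** 2 * a / (tau[j] * noise))
    return total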
4) Find the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by a reinforcement learning algorithm; the reinforcement learning system is composed of an agent and an environment; the mode selection M_0 and M_1 of all users is encoded as the current system state x_t; the agent takes an action a in the current state, enters the next state x_{t+1}, and receives the reward r(x_t, a) returned by the environment; through the continuous interaction of the agent with the environment, the mode selection M_0 and M_1 is optimized until the optimum is found; the agent is updated as:
Q_θ(x_t, a) = r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′)    (4)
wherein each parameter is defined as follows:
θ: parameter of the evaluation network;
θ′: parameter of the target network;
x_t: the system state at time t;
Q_θ(x_t, a): the Q value obtained by taking action a in state x_t;
r(x_t, a): the reward obtained by taking action a in state x_t;
γ: the discount factor applied to future rewards;
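Equation (4) is the one-step target used to update the agent; in deep Q-learning the evaluation network is trained by minimizing the squared difference between this target and its own prediction (see steps 5.5 and 5.6 below), which with the symbols already defined can be written as

L(θ) = ( r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′) − Q_θ(x_t, a) )²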
5) Take the mode selection M_0 and M_1 of all wireless devices as the deep reinforcement learning system state x_t; an action a is a modification of the system state x_t; if the total computation rate of the modified system is greater than before, the current reward r(x_t, a) is set to a positive value, otherwise it is set to a negative value, and the system enters the next state x_{t+1}.
Further, in step 5), the iterative process of reinforcement learning is as follows:
step 5.1: initialize the evaluation network, the target network and the memory base used in reinforcement learning; the current system state is x_t, t is initialized to 1, and the iteration counter k is initialized to 1;
step 5.2: while k is less than or equal to the given iteration number K, randomly draw a probability p;
step 5.3: if p is less than or equal to ε, select the action a(t) output by the evaluation network; otherwise, randomly select an action;
step 5.4: after action a(t) is taken, obtain the reward r(t) and the next state x(t+1), and store the tuple (x(t), a(t), r(t), x(t+1)) in the memory base;
step 5.5: combining the output of the target network, calculate the target of the evaluation network y = r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′);
step 5.6: minimize the error (y − Q(x(t), a(t); θ))² and update the parameter θ of the evaluation network so that the next prediction is more accurate;
step 5.7: every S steps, copy the parameters of the evaluation network to the target network; set k = k + 1 and return to step 5.2;
step 5.8: when k is greater than the given iteration number K, the learning process ends and the best mode selection M_0 and M_1 is obtained.
the technical conception of the invention is as follows: first, in an internet of things network, a large number of Wireless Devices (WDs) capable of communication and computation are deployed, and due to device size constraints and manufacturing cost considerations, internet of things devices (e.g., sensors) often carry batteries with limited capacity and energy-saving low-performance processors, so that the limited device lifetime and low computing power cannot support more and more sustainable new applications requiring high-performance computation, and due to strict battery capacity constraints, in a battery-powered wireless system, minimizing energy consumption and extending the wireless device operational life cycle is a critical design. Each energy harvesting wireless device follows a binary computation offload policy, i.e., the data set for one task may be performed locally or by remote server offload. To maximize the total computation rate of all wireless devices, an optimal individual computation mode selection method is proposed.
The invention has the following beneficial effects: the optimal mode selection method is found through deep reinforcement learning, the total calculation rate of all wireless devices is maximized, the energy consumption is minimized, and the operation life cycle of the wireless devices is prolonged.
Drawings
FIG. 1 is a system model diagram.
Fig. 2 is a flow chart of a method of finding an optimal mode selection.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
Referring to fig. 1 and 2, a mobile edge computation rate maximization method based on deep reinforcement learning maximizes the sum computation rate of all wireless devices, minimizes energy consumption, and prolongs the operational life cycle of the wireless devices. Based on a system model with multiple wireless devices (as shown in fig. 1), the invention provides an optimal individual computation mode selection method to decide which wireless devices offload their tasks to the base station. The method comprises the following steps (as shown in fig. 2):
1) In a wirelessly powered edge computing system comprising a base station and a plurality of wireless devices, the base station and each wireless device each have their own antenna; a radio frequency energy transmitter and an edge computing server are integrated in the base station, the base station is assumed to have a stable energy supply, and it broadcasts radio frequency energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery and performs its tasks using the stored harvested energy; in this wireless communication system, each wireless device needs to establish a connection with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:
[The expression for h_i appears as an equation image in the original publication and is not reproduced here.]
wherein each parameter is defined as follows:
A_d: antenna gain;
π: the circumferential ratio (pi);
f_c: carrier frequency;
d_i: distance between wireless device i and the base station;
d_e: path loss exponent;
2) It is assumed that the computing task of each wireless device can either be executed on a local low-performance microprocessor or be offloaded to the edge computing server, which has far stronger processing power, processes the task, and then sends the result back to the wireless device; each wireless device is assumed to follow a binary computation offloading rule, i.e., it must choose either the local computation mode or the offloading mode; two non-overlapping sets M_0 and M_1 denote the wireless devices in the local computation mode and in the offloading mode, respectively, and the set N of all wireless devices is expressed as N = M_0 ∪ M_1.
3) A wireless device in set M_0 can harvest energy and process its local task at the same time, while a wireless device in set M_1 can only offload its task to the base station for processing after harvesting energy; since the computing and transmission capabilities of the base station are assumed to be much stronger than those of the energy-harvesting wireless devices, each offloading wireless device exhausts its harvested energy during task offloading; the computation rate sum maximization problem over all wireless devices is described as:
[Objective function: appears as an equation image in the original publication and is not reproduced here.]
the constraint conditions are as follows:
[Constraints: appear as equation images in the original publication and are not reproduced here.]
in the formula:
[The local-computation and offloading rate expressions appear as equation images in the original publication and are not reproduced here.]
wherein each parameter is defined as follows:
ω_i: a transition weight for the i-th wireless device;
μ: energy collection efficiency;
p: radio frequency energy transmission power;
φ: the number of computation cycles required to process each bit of data;
h_i: channel gain of the i-th wireless device;
k_i: energy efficiency coefficient of the i-th wireless device;
a: a time coefficient;
v_μ: conversion efficiency;
b: bandwidth;
τ_j: a time coefficient for the j-th wireless device;
N_0: the number of wireless devices in the local processing mode;
4) Find the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by a reinforcement learning algorithm; the reinforcement learning system is composed of an agent and an environment; the mode selection M_0 and M_1 of all users is encoded as the current system state x_t; the agent takes an action a in the current state, enters the next state x_{t+1}, and receives the reward r(x_t, a) returned by the environment; through the continuous interaction of the agent with the environment, the mode selection M_0 and M_1 is optimized until the optimum is found; the agent is updated as:
Q_θ(x_t, a) = r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′)    (4)
wherein each parameter is defined as follows:
θ: parameter of the evaluation network;
θ′: parameter of the target network;
x_t: the system state at time t;
Q_θ(x_t, a): the Q value obtained by taking action a in state x_t;
r(x_t, a): the reward obtained by taking action a in state x_t;
γ: the discount factor applied to future rewards;
5) Take the mode selection M_0 and M_1 of all wireless devices as the deep reinforcement learning system state x_t; an action a is a modification of the system state x_t; if the total computation rate of the modified system is greater than before, the current reward r(x_t, a) is set to a positive value, otherwise it is set to a negative value, and the system enters the next state x_{t+1}.
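As a minimal sketch of the reward rule just described, assuming a helper total_rate that returns the sum computation rate of step 3) for a given mode selection (an illustrative placeholder, not a function defined by the patent):

# Reward r(x_t, a): positive when the action increases the total computation rate,
# negative otherwise, as described in step 5).
def reward(total_rate, x_t, x_next, pos=1.0, neg=-1.0):
    return pos if total_rate(x_next) > total_rate(x_t) else neg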
In step 5), the iterative process of reinforcement learning is as follows (a minimal sketch of this loop, under stated assumptions, is given after step 5.8):
step 5.1: initialize the evaluation network, the target network and the memory base used in reinforcement learning; the current system state is x_t, t is initialized to 1, and the iteration counter k is initialized to 1;
step 5.2: while k is less than or equal to the given iteration number K, randomly draw a probability p;
step 5.3: if p is less than or equal to ε, select the action a(t) output by the evaluation network; otherwise, randomly select an action;
step 5.4: after action a(t) is taken, obtain the reward r(t) and the next state x(t+1), and store the tuple (x(t), a(t), r(t), x(t+1)) in the memory base;
step 5.5: combining the output of the target network, calculate the target of the evaluation network y = r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′);
step 5.6: minimize the error (y − Q(x(t), a(t); θ))² and update the parameter θ of the evaluation network so that the next prediction is more accurate;
step 5.7: every S steps, copy the parameters of the evaluation network to the target network; set k = k + 1 and return to step 5.2;
step 5.8: when k is greater than the given iteration number K, the learning process ends and the best mode selection M_0 and M_1 is obtained.
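The following is a minimal sketch of the iterative procedure in steps 5.1-5.8, written with PyTorch purely for illustration; the network sizes, the toy stand-in for the computation rate, the action definition (toggling one device's mode), and all numeric settings are assumptions and would need to be replaced by the quantities defined in steps 1)-3).

# Sketch of steps 5.1-5.8: evaluation network, target network, memory base,
# epsilon rule of step 5.3 (p <= EPS selects the network's action), and
# target-network synchronization every S steps.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

N_DEVICES = 5          # number of wireless devices (placeholder)
N_ACTIONS = N_DEVICES  # an action toggles the mode of one device (assumption)
GAMMA = 0.9            # reward discount γ
EPS = 0.9              # probability threshold ε of step 5.3
S_SYNC = 20            # copy evaluation net -> target net every S steps (step 5.7)
K_ITER = 500           # total iteration number K
BATCH = 32

def total_rate(x):
    """Toy stand-in for the weighted sum computation rate of step 3)."""
    return float(sum(x))

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(N_DEVICES, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    def forward(self, x):
        return self.f(x)

eval_net, target_net = QNet(), QNet()                 # step 5.1
target_net.load_state_dict(eval_net.state_dict())
memory = deque(maxlen=1000)                           # memory base
opt = optim.Adam(eval_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = [0] * N_DEVICES                                   # initial mode selection (all local)
for k in range(1, K_ITER + 1):                        # step 5.2
    p = random.random()
    if p <= EPS:                                      # step 5.3: use the evaluation network
        with torch.no_grad():
            a = int(torch.argmax(eval_net(torch.tensor(x, dtype=torch.float32))))
    else:
        a = random.randrange(N_ACTIONS)
    x_next = list(x)
    x_next[a] ^= 1                                    # toggle the selected device's mode
    r = 1.0 if total_rate(x_next) > total_rate(x) else -1.0   # reward rule of step 5)
    memory.append((x, a, r, x_next))                  # step 5.4
    x = x_next

    if len(memory) >= BATCH:
        xs, acts, rs, xns = zip(*random.sample(memory, BATCH))
        xs = torch.tensor(xs, dtype=torch.float32)
        xns = torch.tensor(xns, dtype=torch.float32)
        acts = torch.tensor(acts)
        rs = torch.tensor(rs, dtype=torch.float32)
        with torch.no_grad():                         # step 5.5: target from the target network
            y = rs + GAMMA * target_net(xns).max(dim=1).values
        q = eval_net(xs).gather(1, acts.unsqueeze(1)).squeeze(1)
        loss = loss_fn(q, y)                          # step 5.6: minimize (y - Q(x,a;θ))²
        opt.zero_grad()
        loss.backward()
        opt.step()

    if k % S_SYNC == 0:                               # step 5.7: synchronize the target network
        target_net.load_state_dict(eval_net.state_dict())

# step 5.8: read off the final mode selection as the sets M_0 and M_1.
best_M0 = [i for i, m in enumerate(x) if m == 0]
best_M1 = [i for i, m in enumerate(x) if m == 1]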

Claims (2)

1. A mobile edge computation rate maximization method based on deep reinforcement learning, characterized by comprising the following steps:
1) In a wirelessly powered edge computing system comprising a base station and a plurality of wireless devices, the base station and each wireless device each have their own antenna; a radio frequency energy transmitter and an edge computing server are integrated in the base station, the base station is assumed to have a stable energy supply, and it broadcasts radio frequency energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery and performs its tasks using the stored harvested energy; in this wireless communication system, each wireless device needs to establish a connection with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:
[The expression for h_i appears as an equation image in the original publication and is not reproduced here.]
wherein each parameter is defined as follows:
A_d: antenna gain;
π: the circumferential ratio (pi);
f_c: carrier frequency;
d_i: distance between wireless device i and the base station;
d_e: path loss exponent;
2) It is assumed that the computing task of each wireless device can either be executed on a local low-performance microprocessor or be offloaded to the edge computing server, which has far stronger processing power, processes the task, and then sends the result back to the wireless device; each wireless device is assumed to follow a binary computation offloading rule, i.e., it must choose either the local computation mode or the offloading mode; two non-overlapping sets M_0 and M_1 denote the wireless devices in the local computation mode and in the offloading mode, respectively, and the set N of all wireless devices is expressed as N = M_0 ∪ M_1.
3) A wireless device in set M_0 can harvest energy and process its local task at the same time, while a wireless device in set M_1 can only offload its task to the base station for processing after harvesting energy; since the computing and transmission capabilities of the base station are assumed to be much stronger than those of the energy-harvesting wireless devices, each offloading wireless device exhausts its harvested energy during task offloading; the computation rate sum maximization problem over all wireless devices is described as:
[Objective function: appears as an equation image in the original publication and is not reproduced here.]
the constraint conditions are as follows:
[Constraints: appear as equation images in the original publication and are not reproduced here.]
in the formula:
[The local-computation and offloading rate expressions appear as equation images in the original publication and are not reproduced here.]
wherein each parameter is defined as follows:
ω_i: a transition weight for the i-th wireless device;
μ: energy collection efficiency;
p: radio frequency energy transmission power;
φ: the number of computation cycles required to process each bit of data;
h_i: channel gain of the i-th wireless device;
k_i: energy efficiency coefficient of the i-th wireless device;
α: a time coefficient;
v_μ: conversion efficiency;
b: bandwidth;
τ_j: a time coefficient for the j-th wireless device;
N_0: the number of wireless devices in the local processing mode;
4) Find the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by a reinforcement learning algorithm; the reinforcement learning system is composed of an agent and an environment; the mode selection M_0 and M_1 of all users is encoded as the current system state x_t; the agent takes an action a in the current state, enters the next state x_{t+1}, and receives the reward r(x_t, a) returned by the environment; through the continuous interaction of the agent with the environment, the mode selection M_0 and M_1 is optimized until the optimum is found; the agent is updated as:
Q_θ(x_t, a) = r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′)    (4)
wherein each parameter is defined as follows:
θ: parameter of the evaluation network;
θ′: parameter of the target network;
x_t: the system state at time t;
Q_θ(x_t, a): the Q value obtained by taking action a in state x_t;
r(x_t, a): the reward obtained by taking action a in state x_t;
γ: the discount factor applied to future rewards;
5) Take the mode selection M_0 and M_1 of all wireless devices as the deep reinforcement learning system state x_t; an action a is a modification of the system state x_t; if the total computation rate of the modified system is greater than before, the current reward r(x_t, a) is set to a positive value, otherwise it is set to a negative value, and the system enters the next state x_{t+1}.
2. The method according to claim 1, characterized in that in step 5), the iterative process of reinforcement learning is as follows:
step 5.1: initialize the evaluation network, the target network and the memory base used in reinforcement learning; the current system state is x_t, t is initialized to 1, and the iteration counter k is initialized to 1;
step 5.2: while k is less than or equal to the given iteration number K, randomly draw a probability p;
step 5.3: if p is less than or equal to ε, select the action a(t) output by the evaluation network; otherwise, randomly select an action;
step 5.4: after action a(t) is taken, obtain the reward r(t) and the next state x(t+1), and store the tuple (x(t), a(t), r(t), x(t+1)) in the memory base;
step 5.5: combining the output of the target network, calculate the target of the evaluation network y = r(x_t, a) + γ max_{a′} Q_{θ′}(x_{t+1}, a′);
step 5.6: minimize the error (y − Q_θ(x_t, a))² and update the parameter θ of the evaluation network so that the next prediction is more accurate;
step 5.7: every S steps, copy the parameters of the evaluation network to the target network; set k = k + 1 and return to step 5.2;
step 5.8: when k is greater than the given iteration number K, the learning process ends and the best mode selection M_0 and M_1 is obtained.
CN201810342359.5A 2018-04-17 2018-04-17 Mobile edge calculation rate maximization method based on deep reinforcement learning Active CN108632860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810342359.5A CN108632860B (en) 2018-04-17 2018-04-17 Mobile edge calculation rate maximization method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810342359.5A CN108632860B (en) 2018-04-17 2018-04-17 Mobile edge calculation rate maximization method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN108632860A CN108632860A (en) 2018-10-09
CN108632860B true CN108632860B (en) 2021-06-18

Family

ID=63705383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810342359.5A Active CN108632860B (en) 2018-04-17 2018-04-17 Mobile edge calculation rate maximization method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN108632860B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109618399A (en) * 2018-12-26 2019-04-12 东华大学 Distributed energy management solutions optimization method in the mobile edge calculations system of multi-user
CN109803292B (en) * 2018-12-26 2022-03-04 佛山市顺德区中山大学研究院 Multi-level user moving edge calculation method based on reinforcement learning
CN109756371B (en) * 2018-12-27 2022-04-29 上海无线通信研究中心 Game-based network node resource perception excitation method and system
CN110809306B (en) * 2019-11-04 2021-03-16 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN113222166A (en) * 2020-01-21 2021-08-06 厦门邑通软件科技有限公司 Machine heuristic learning method, system and device for operation behavior record management
CN111556461B (en) * 2020-04-29 2023-04-21 南京邮电大学 Vehicle-mounted edge network task distribution and unloading method based on deep Q network
CN113727362B (en) * 2021-05-31 2022-10-28 南京邮电大学 Unloading strategy method of wireless power supply system based on deep reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107708135A (en) * 2017-07-21 2018-02-16 上海交通大学 A kind of resource allocation methods for being applied to mobile edge calculations scene
CN107734558A (en) * 2017-10-26 2018-02-23 北京邮电大学 A kind of control of mobile edge calculations and resource regulating method based on multiserver
CN107846704A (en) * 2017-10-26 2018-03-27 北京邮电大学 A kind of resource allocation and base station service arrangement method based on mobile edge calculations
CN107872823A (en) * 2016-09-28 2018-04-03 维布络有限公司 The method and system of communication operational mode in the mobile edge calculations environment of identification
US9942825B1 (en) * 2017-03-27 2018-04-10 Verizon Patent And Licensing Inc. System and method for lawful interception (LI) of Network traffic in a mobile edge computing environment
CN107911242A (en) * 2017-11-15 2018-04-13 北京工业大学 A kind of cognitive radio based on industry wireless network and edge calculations method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107872823A (en) * 2016-09-28 2018-04-03 维布络有限公司 The method and system of communication operational mode in the mobile edge calculations environment of identification
US9942825B1 (en) * 2017-03-27 2018-04-10 Verizon Patent And Licensing Inc. System and method for lawful interception (LI) of Network traffic in a mobile edge computing environment
CN107708135A (en) * 2017-07-21 2018-02-16 上海交通大学 A kind of resource allocation methods for being applied to mobile edge calculations scene
CN107734558A (en) * 2017-10-26 2018-02-23 北京邮电大学 A kind of control of mobile edge calculations and resource regulating method based on multiserver
CN107846704A (en) * 2017-10-26 2018-03-27 北京邮电大学 A kind of resource allocation and base station service arrangement method based on mobile edge calculations
CN107911242A (en) * 2017-11-15 2018-04-13 北京工业大学 A kind of cognitive radio based on industry wireless network and edge calculations method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Computation Rate Maximization for Wireless Powered Mobile-Edge Computing With Binary Computation Offloading; Suzhi Bi et al.; IEEE Transactions on Wireless Communications; 2018-04-09; full text *

Also Published As

Publication number Publication date
CN108632860A (en) 2018-10-09

Similar Documents

Publication Publication Date Title
CN108632860B (en) Mobile edge calculation rate maximization method based on deep reinforcement learning
Engmann et al. Prolonging the lifetime of wireless sensor networks: a review of current techniques
Adu-Manu et al. Energy-harvesting wireless sensor networks (EH-WSNs) A review
CN107743308B (en) Node clustering data collection method and device for environmental monitoring
Zhang et al. An analytical approach to the design of energy harvesting wireless sensor nodes
Xie et al. Backscatter-assisted computation offloading for energy harvesting IoT devices via policy-based deep reinforcement learning
CN102316496A (en) Data merging method based on Kalman filtering in wireless sensor network
Siew et al. Cluster heads distribution of wireless sensor networks via adaptive particle swarm optimization
CN113286317B (en) Task scheduling method based on wireless energy supply edge network
WO2022242468A1 (en) Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium
CN108738045B (en) Moving edge calculation rate maximization method based on depth certainty strategy gradient
CN114727359A (en) Unmanned aerial vehicle-assisted post-disaster clustering mine Internet of things data acquisition method
Chen et al. Learning aided joint sensor activation and mobile charging vehicle scheduling for energy-efficient WRSN-based industrial IoT
CN115562756A (en) Multi-access edge computing vehicle task unloading method and system
Koulali et al. Dynamic power control for energy harvesting wireless multimedia sensor networks
CN108738046B (en) Mobile edge calculation rate maximization method based on semi-supervised learning
CN115175347A (en) Wireless energy-carrying communication network resource allocation optimization method
Thiyagarajan et al. An investigation on energy consumption in wireless sensor network
CN114521023A (en) SWIPT-assisted NOMA-MEC system resource allocation modeling method
Benmad et al. Data collection in UAV-assisted wireless sensor networks powered by harvested energy
Alageswaran et al. Design and implementation of dynamic sink node placement using Particle Swarm Optimization for life time maximization of WSN applications
Liu et al. Learning-based multi-UAV assisted data acquisition and computation for information freshness in WPT enabled space-air-ground PIoT
Lin et al. Maximum data collection rate routing for data gather trees with data aggregation in rechargeable wireless sensor networks
CN111162852B (en) Ubiquitous power Internet of things access method based on matching learning
Alsharif et al. Notice of Retraction: Enabling Hardware Green Internet of Things: A review of Substantial Issues

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant