CN108738045B - Mobile edge computing rate maximization method based on deep deterministic policy gradient

Info

Publication number: CN108738045B
Authority: CN (China)
Prior art keywords: wireless device, state, execution unit, action, energy
Legal status: Active
Application number: CN201810342357.6A
Other languages: Chinese (zh)
Other versions: CN108738045A
Inventors: 黄亮 (Huang Liang), 冯旭 (Feng Xu), 钱丽萍 (Qian Liping), 吴远 (Wu Yuan)
Current assignee: Zhejiang University of Technology (ZJUT)
Original assignee: Zhejiang University of Technology (ZJUT)
Priority date: 2018-04-17
Filing date: 2018-04-17
Publication date: 2021-04-06

Events:
    • 2018-04-17: application CN201810342357.6A filed by Zhejiang University of Technology (ZJUT); priority to CN201810342357.6A
    • 2018-11-02: publication of CN108738045A
    • Application granted
    • 2021-04-06: publication of CN108738045B
    • Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/06 Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • H04W28/10 Flow control between communication endpoints

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A mobile edge computing rate maximization method based on the deep deterministic policy gradient (DDPG) method comprises the following steps: 1) calculating the sum of the computation rates of all wireless devices in the system for a given mode selection; 2) expressing the set of all wireless devices; 3) formulating the problem of maximizing the sum computation rate of all wireless devices; 4) finding the optimal mode selection by the DDPG method; 5) taking the mode selection M_0 and M_1 of all wireless devices as the state x_t of the DDPG method, where an action a changes the state x_t and the total computation rate of the system after the change is compared with a set standard value: if the total computation rate is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise to a negative value, and the system enters the next state x_{t+1}. The invention maximizes the sum computation rate of all wireless devices while ensuring user experience.

Description

Mobile edge computing rate maximization method based on deep deterministic policy gradient
Technical Field
The invention belongs to the field of communications, and particularly relates to a mobile edge computing communication system and a mobile edge computing rate maximization method based on the deep deterministic policy gradient method.
Background
Recent developments in Internet of Things (IoT) technology are a key step toward true intelligence and autonomous control, and are particularly prominent in many important industrial and commercial systems. In an IoT network, a large number of wireless devices (WDs) capable of communication and computing are deployed. Owing to device size limitations and manufacturing cost, IoT devices (e.g., sensors) often carry batteries of limited capacity and energy-efficient, low-performance processors; consequently, their limited lifetime and low computing power cannot support the growing number of new applications that require sustainable high-performance computing, such as autonomous driving and augmented reality. Deploying wireless power transfer (WPT) systems can address both of these performance problems; otherwise, frequent battery depletion not only disrupts the normal operation of individual wireless devices but can also significantly degrade overall network performance, e.g., the sensing accuracy of wireless sensor networks. Conventional wireless systems require frequent manual battery replacement, which is expensive and inconvenient, and because of severe battery capacity limitations, minimizing energy consumption and extending the operational lifetime of wireless devices is a critical design goal in battery-powered wireless systems. Each energy-harvesting wireless device follows a binary computation offloading policy, i.e., the data set of a task is either processed locally or offloaded to a remote server. To maximize the total computation rate of all wireless devices, the optimal individual computation mode selection must be found.
Disclosure of Invention
To overcome the low sum computation rate of existing wireless devices, and to maximize the sum computation rate of all wireless devices by finding the optimal individual computation mode selection and system transmission time allocation, the invention provides a mobile edge computing rate maximization method based on the deep deterministic policy gradient method, which maximizes the sum computation rate of all wireless devices while ensuring user experience.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A mobile edge computing rate maximization method based on deep deterministic policy gradient comprises the following steps:
1) In a wirelessly powered edge computing system consisting of a base station and a plurality of wireless devices, the base station and each wireless device have a single antenna. A radio-frequency (RF) energy transmitter and an edge computing server are integrated in the base station, which is assumed to have a stable energy supply and can broadcast RF energy to all wireless devices. Each wireless device has an energy harvesting circuit and a rechargeable battery and performs its tasks using the stored harvested energy. In this wireless communication system, each wireless device must establish an association with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:
[Formula image: channel gain h_i]
where the parameters are defined as follows:
A_d: antenna gain;
π: pi, the ratio of a circle's circumference to its diameter;
f_c: carrier frequency;
d_i: distance between wireless device i and the base station;
d_e: path loss exponent;
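The channel-gain formula itself appears only as an image. As a minimal sketch, assuming the free-space path-loss form h_i = A_d · (3×10^8 / (4π f_c d_i))^{d_e} (an assumption that is consistent with the listed parameters, with 3×10^8 m/s the speed of light), the gain could be computed as below. The default values a_d = 4.11, f_c = 915 MHz and d_e = 2.8 are illustrative assumptions, not values given in the text.

```python
import math

SPEED_OF_LIGHT = 3e8  # m/s; the constant assumed in the free-space model


def channel_gain(d_i: float, a_d: float = 4.11, f_c: float = 915e6, d_e: float = 2.8) -> float:
    """Assumed free-space path loss: h_i = A_d * (c / (4 * pi * f_c * d_i)) ** d_e.

    d_i is the device-to-base-station distance in metres; the defaults for the
    antenna gain a_d, carrier frequency f_c (Hz) and path loss exponent d_e are
    illustrative values only.
    """
    if d_i <= 0.0:
        raise ValueError("distance must be positive")
    return a_d * (SPEED_OF_LIGHT / (4.0 * math.pi * f_c * d_i)) ** d_e
```

Under this assumed model, doubling the distance d_i scales h_i by 2^(-d_e), so devices farther from the base station harvest less energy and see a weaker uplink.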
2) Assume that the computing task of each wireless device is either executed on its local low-performance microprocessor or offloaded to the edge computing server, which has greater processing power, processes the task, and sends the result back to the wireless device. Assume further that the wireless devices follow a binary computation offloading rule, i.e., each wireless device must choose either the local computing mode or the offloading mode. Two non-overlapping sets M_0 and M_1 denote the wireless devices in the local computing mode and in the offloading mode, respectively, and the set M of all wireless devices is expressed as:
$$M = M_0 \cup M_1, \qquad M_0 \cap M_1 = \emptyset$$
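One convenient encoding, assumed here rather than taken from the text, represents a mode selection as a binary vector with m_i = 0 for local computing and m_i = 1 for offloading. The sets M_0 and M_1 then follow directly, and they are disjoint and cover all devices by construction. The function name `split_modes` is hypothetical.

```python
from typing import List, Set, Tuple


def split_modes(modes: List[int]) -> Tuple[Set[int], Set[int]]:
    """Return (M_0, M_1): indices of devices in local mode (0) and offload mode (1)."""
    if any(m not in (0, 1) for m in modes):
        raise ValueError("binary offloading: each mode must be 0 or 1")
    m0 = {i for i, m in enumerate(modes) if m == 0}
    m1 = {i for i, m in enumerate(modes) if m == 1}
    # Every device lands in exactly one of the two sets.
    return m0, m1


# Example: devices 0 and 3 compute locally, devices 1 and 2 offload.
local_set, offload_set = split_modes([0, 1, 1, 0])
```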
3) The wireless devices in set M_0 can harvest energy and process their local tasks simultaneously, whereas the wireless devices in set M_1 can offload their tasks to the base station for processing only after harvesting energy. Assuming that the computing power and transmission capability of the base station are much stronger than those of the energy-harvesting wireless devices, each offloading wireless device exhausts its harvested energy during task offloading. The problem of maximizing the sum computation rate of all wireless devices is formulated as follows:
[Formula image: objective, the sum computation rate of all wireless devices]
subject to the following constraints:
[Formula images: three constraints]
where:
[Formula images: three auxiliary definitions]
and the parameters are defined as follows:
ω_i: weight of the i-th wireless device;
μ: energy harvesting efficiency;
p: RF energy transmit power;
φ: number of computation cycles required to process one bit of data;
h_i: channel gain of the i-th wireless device;
k_i: energy efficiency coefficient of the i-th wireless device;
a: time coefficient;
v_μ: conversion efficiency;
b: bandwidth;
τ_j: time coefficient of the j-th wireless device;
N_0: number of wireless devices in the local processing mode;
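The objective and constraints above are given only as images. The sketch below assumes the rate expressions commonly used for wireless-powered mobile edge computing with binary offloading, over a unit-length frame: a local device spends its harvested energy μ·p·h_i·a on CPU cycles for the whole frame, giving r_L = (μ·p·h_i·a/k_i)^(1/3)/φ, and an offloading device spends its harvested energy transmitting during τ_j, giving r_O = (b·τ_j/v_μ)·log2(1 + μ·p·a·h_j²/(τ_j·noise)). The receiver noise power `noise`, the constraint a + Σ τ_j ≤ 1 with a, τ_j ≥ 0, and all helper names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SystemParams:
    mu: float     # energy harvesting efficiency
    p: float      # RF energy transmit power of the base station
    phi: float    # CPU cycles needed to process one bit
    b: float      # uplink bandwidth
    v_mu: float   # conversion efficiency (communication overhead factor)
    noise: float  # receiver noise power (assumed; not defined in the text)


def local_rate(h: float, k: float, a: float, sp: SystemParams) -> float:
    # Assumed: harvested energy mu*p*h*a is spent on CPU cycles over the whole frame.
    return (sp.mu * sp.p * h * a / k) ** (1.0 / 3.0) / sp.phi


def offload_rate(h: float, tau: float, a: float, sp: SystemParams) -> float:
    # Assumed: all harvested energy is spent transmitting during tau; h appears
    # twice (downlink energy transfer and uplink offloading on a reciprocal channel).
    if tau <= 0.0:
        return 0.0
    snr = sp.mu * sp.p * a * h * h / (tau * sp.noise)
    return sp.b * tau / sp.v_mu * math.log2(1.0 + snr)


def weighted_sum_rate(modes: Sequence[int], h: Sequence[float], w: Sequence[float],
                      k: Sequence[float], a: float, tau: Sequence[float],
                      sp: SystemParams) -> float:
    """Weighted sum rate for a binary mode vector (0 = local, 1 = offload)."""
    offload_time = sum(t for m, t in zip(modes, tau) if m == 1)
    if a < 0.0 or any(t < 0.0 for t in tau) or a + offload_time > 1.0 + 1e-9:
        raise ValueError("time allocation violates a + sum(tau_j) <= 1")
    total = 0.0
    for i, m in enumerate(modes):
        if m == 0:
            total += w[i] * local_rate(h[i], k[i], a, sp)
        else:
            total += w[i] * offload_rate(h[i], tau[i], a, sp)
    return total
```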
4) Find the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by the deep deterministic policy gradient (DDPG) method. The DDPG method consists of an execution unit (actor), a scoring unit (critic), and an environment. The mode selection M_0 and M_1 of all users is encoded as the state x_t required by the execution unit. In the current state, the execution unit takes an action a that changes the mode selection M_0 and M_1, so that the system enters the next state x_{t+1}, and it receives the reward r(x_t, a) returned by the environment. The scoring unit combines the state x_t, the action a, and the reward r(x_t, a) returned by the environment to score the execution unit, i.e., to indicate how good the action a taken by the execution unit in state x_t is. The goal of the execution unit is to obtain as high a score from the scoring unit as possible, while the goal of the scoring unit is to make each score it gives as close to the true value as possible, which it adjusts through the reward r(x_t, a). As the execution unit, the scoring unit, and the environment continuously interact and update, the mode selection M_0 and M_1 is continuously optimized until the scoring unit is updated to its optimum. The scoring unit is updated as follows:
$$S(x_t, a) = r(x_t, a) + \gamma\, S'(x_{t+1}, a') \qquad (4)$$
where the parameters are defined as follows:
x_t: state of the system at time t;
x_{t+1}: state of the system at time t+1;
a: action taken by the execution unit in the current state;
a′: action taken by the execution unit in the next state;
S(x_t, a): score given by the evaluation network when the execution unit takes action a in state x_t;
S′(x_{t+1}, a′): score given by the target network when the execution unit takes action a′ in state x_{t+1};
r(x_t, a): reward obtained by taking action a in state x_t;
γ: reward discount factor;
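Equation (4) has the form of the standard DDPG critic target. As a minimal sketch, with hypothetical callables `target_actor` and `target_critic` standing in for the target networks, the target for one transition could be computed as follows. Training the evaluation network to move S(x_t, a) toward this value (for example by minimizing the squared difference) is common DDPG practice, not something the text specifies.

```python
from typing import Callable, Sequence


def critic_target(reward: float,
                  next_state: Sequence[int],
                  target_actor: Callable[[Sequence[int]], Sequence[float]],
                  target_critic: Callable[[Sequence[int], Sequence[float]], float],
                  gamma: float) -> float:
    """Return r(x_t, a) + gamma * S'(x_{t+1}, a') as in equation (4)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    # a' is the action proposed for the next state by the (hypothetical) target actor.
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


# Toy stand-ins: the target actor flips every mode; the target critic sums the action.
y = critic_target(1.0, [0, 1, 1],
                  target_actor=lambda x: [1 - m for m in x],
                  target_critic=lambda x, a: float(sum(a)),
                  gamma=0.5)
```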
5) The mode selection M_0 and M_1 of all wireless devices is taken as the state x_t of the deep deterministic policy gradient (DDPG) method, and the action a changes the state x_t. The total computation rate of the system after the change is compared with a set standard value: if the total computation rate is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise to a negative value, and the system enters the next state x_{t+1}.
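The text leaves the exact action encoding and the reward magnitudes open. The sketch below assumes, hypothetically, that an action is a binary flip mask (a_i = 1 switches device i between local computing and offloading) and that the positive and negative rewards are +1 and -1; `rate_fn` stands for any routine that returns the total computation rate of a mode selection.

```python
from typing import Callable, List, Sequence, Tuple


def env_step(state: Sequence[int], action: Sequence[int],
             rate_fn: Callable[[List[int]], float], standard: float,
             pos_reward: float = 1.0, neg_reward: float = -1.0) -> Tuple[List[int], float]:
    """Apply a flip-mask action to the mode vector and reward the result."""
    if len(state) != len(action):
        raise ValueError("state and action must have the same length")
    # XOR: devices whose action bit is 1 switch mode (0 = local, 1 = offload).
    next_state = [s ^ a for s, a in zip(state, action)]
    total_rate = rate_fn(next_state)
    # Positive reward only if the new total rate exceeds the set standard value.
    reward = pos_reward if total_rate > standard else neg_reward
    return next_state, reward
```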
Further, in step 5), the iterative process of the deep deterministic policy gradient method is as follows:
Step 5.1: initialize the execution unit, the scoring unit, and the memory base of the method; the current system state is x_t, t is initialized to 1, and the iteration counter k is initialized to 1;
Step 5.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action a in state x_t;
Step 5.3: the action a changes the state x_t into the next state x_{t+1}, and the reward r(x_t, a) fed back by the environment is obtained;
Step 5.4: the experience is stored in the memory base in the format (x_t, a, r(x_t, a), x_{t+1});
Step 5.5: the scoring unit receives the action a, the state x_t, and the reward r(x_t, a), and gives the execution unit a score S(x_t, a);
Step 5.6: the execution unit updates its own parameters to maximize the score S(x_t, a), so that its next action achieves as high a computation rate as possible;
Step 5.7: the scoring unit samples past experience from the memory base, keeps learning, and updates its parameters so that its scores are as accurate as possible; set k = k + 1 and return to step 5.2;
Step 5.8: when k is greater than the given number of iterations K, the learning process ends and the optimal mode selection M_0 and M_1 is obtained.
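Steps 5.4 and 5.7 rely on a memory base of past transitions. A minimal replay-memory sketch follows; the bounded capacity, the uniform random sampling, and the class and method names are assumptions (they are typical of DDPG implementations), not details stated in the text.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Experience(NamedTuple):
    state: tuple       # x_t
    action: tuple      # a
    reward: float      # r(x_t, a)
    next_state: tuple  # x_{t+1}


class MemoryBase:
    """Bounded replay memory of (x_t, a, r(x_t, a), x_{t+1}) tuples."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) drops the oldest experience once the memory is full.
        self._buf: Deque[Experience] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state: Sequence, action: Sequence, reward: float,
              next_state: Sequence) -> None:
        # Step 5.4: store the transition in the stated format.
        self._buf.append(Experience(tuple(state), tuple(action), float(reward),
                                    tuple(next_state)))

    def sample(self, batch_size: int) -> List[Experience]:
        # Step 5.7: draw a uniform random minibatch without replacement.
        return self._rng.sample(list(self._buf), min(batch_size, len(self._buf)))

    def __len__(self) -> int:
        return len(self._buf)
```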
The technical concept of the invention is as follows: in an Internet of Things (IoT) network, a large number of wireless devices (WDs) capable of communication and computation are deployed. Owing to device size constraints and manufacturing cost, IoT devices (e.g., sensors) often carry batteries of limited capacity and energy-efficient, low-performance processors, so their limited lifetime and low computing power cannot support the growing number of new applications that require sustainable high-performance computation. Because of strict battery capacity constraints, minimizing energy consumption and extending the operational lifetime of wireless devices is a critical design goal in battery-powered wireless systems. Each energy-harvesting wireless device follows a binary computation offloading policy, i.e., the data set of a task is either processed locally or offloaded to a remote server. To maximize the total computation rate of all wireless devices, an optimal individual computation mode selection method is proposed.
The beneficial effects of the invention are as follows: an optimal mode selection is found by the deep deterministic policy gradient method, which maximizes the total computation rate of all wireless devices, minimizes energy consumption, and extends the operational lifetime of the wireless devices.
Drawings
FIG. 1 is a system model diagram.
FIG. 2 is a flow chart of the method for finding the optimal mode selection.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
Referring to FIG. 1 and FIG. 2, the mobile edge computing rate maximization method based on deep deterministic policy gradient maximizes the sum computation rate of all wireless devices, minimizes energy consumption, and extends the operational lifetime of the wireless devices. Based on a system model with multiple wireless devices (as shown in FIG. 1), the invention proposes an optimal individual computation mode selection method that decides which wireless devices offload their tasks to the base station. The method comprises the following steps (as shown in FIG. 2):
1) In a wirelessly powered edge computing system consisting of a base station and a plurality of wireless devices, the base station and each wireless device have a single antenna. A radio-frequency (RF) energy transmitter and an edge computing server are integrated in the base station, which is assumed to have a stable energy supply and can broadcast RF energy to all wireless devices. Each wireless device has an energy harvesting circuit and a rechargeable battery and can perform tasks using the stored harvested energy. In this wireless communication system, each wireless device must establish an association with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:
[Formula image: channel gain h_i]
where the parameters are defined as follows:
A_d: antenna gain;
π: pi, the ratio of a circle's circumference to its diameter;
f_c: carrier frequency;
d_i: distance between wireless device i and the base station;
d_e: path loss exponent;
2) Assume that the computing task of each wireless device is either executed on its local low-performance microprocessor or offloaded to the edge computing server, which has greater processing power, processes the task, and sends the result back to the wireless device. Assume further that the wireless devices follow a binary computation offloading rule, i.e., each wireless device must choose either the local computing mode or the offloading mode. Two non-overlapping sets M_0 and M_1 denote the wireless devices in the local computing mode and in the offloading mode, respectively, and the set M of all wireless devices is expressed as:
$$M = M_0 \cup M_1, \qquad M_0 \cap M_1 = \emptyset$$
3) The wireless devices in set M_0 can harvest energy and process their local tasks simultaneously, whereas the wireless devices in set M_1 can offload their tasks to the base station for processing only after harvesting energy. Assuming that the computing power and transmission capability of the base station are much stronger than those of the energy-harvesting wireless devices, each offloading wireless device exhausts its harvested energy during task offloading. The problem of maximizing the sum computation rate of all wireless devices is formulated as follows:
[Formula image: objective, the sum computation rate of all wireless devices]
subject to the following constraints:
[Formula images: three constraints]
where:
[Formula images: three auxiliary definitions]
and the parameters are defined as follows:
ω_i: weight of the i-th wireless device;
μ: energy harvesting efficiency;
p: RF energy transmit power;
φ: number of computation cycles required to process one bit of data;
h_i: channel gain of the i-th wireless device;
k_i: energy efficiency coefficient of the i-th wireless device;
a: time coefficient;
v_μ: conversion efficiency;
b: bandwidth;
τ_j: time coefficient of the j-th wireless device;
N_0: number of wireless devices in the local processing mode;
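For a small number of devices N, the optimal mode selection can also be found by exhaustively evaluating all 2^N binary assignments, which gives a reference point for checking what a learned policy converges to. The sketch assumes a hypothetical `rate_fn` that returns the sum computation rate of a mode vector (with the time allocation either fixed or optimized inside `rate_fn`); its cost grows as 2^N, so it is practical only for small N.

```python
import itertools
from typing import Callable, Optional, Tuple

Modes = Tuple[int, ...]


def best_mode_by_enumeration(n: int, rate_fn: Callable[[Modes], float]) -> Tuple[Modes, float]:
    """Evaluate all 2**n binary mode vectors and return the best one and its rate."""
    if n < 1:
        raise ValueError("need at least one wireless device")
    best_modes: Optional[Modes] = None
    best_rate = float("-inf")
    # itertools.product((0, 1), repeat=n) yields every local/offload assignment.
    for modes in itertools.product((0, 1), repeat=n):
        rate = rate_fn(modes)
        if rate > best_rate:
            best_modes, best_rate = modes, rate
    assert best_modes is not None
    return best_modes, best_rate
```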
4) Find the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by the deep deterministic policy gradient (DDPG) method. The DDPG method consists of an execution unit (actor), a scoring unit (critic), and an environment. The mode selection M_0 and M_1 of all users is encoded as the state x_t required by the execution unit. In the current state, the execution unit takes an action a that changes the mode selection M_0 and M_1, so that the system enters the next state x_{t+1}, and it receives the reward r(x_t, a) returned by the environment. The scoring unit combines the state x_t, the action a, and the reward r(x_t, a) returned by the environment to score the execution unit, i.e., to indicate how good the action a taken by the execution unit in state x_t is. The goal of the execution unit is to obtain as high a score from the scoring unit as possible, while the goal of the scoring unit is to make each score it gives as close to the true value as possible, which it adjusts through the reward r(x_t, a). As the execution unit, the scoring unit, and the environment continuously interact and update, the mode selection M_0 and M_1 is continuously optimized until the scoring unit is updated to its optimum. The scoring unit is updated as follows:
$$S(x_t, a) = r(x_t, a) + \gamma\, S'(x_{t+1}, a') \qquad (4)$$
where the parameters are defined as follows:
x_t: state of the system at time t;
x_{t+1}: state of the system at time t+1;
a: action taken by the execution unit in the current state;
a′: action taken by the execution unit in the next state;
S(x_t, a): score given by the evaluation network when the execution unit takes action a in state x_t;
S′(x_{t+1}, a′): score given by the target network when the execution unit takes action a′ in state x_{t+1};
r(x_t, a): reward obtained by taking action a in state x_t;
γ: reward discount factor;
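Equation (4) uses a separate target network S′. The text does not say how the target network is refreshed; DDPG implementations commonly use a soft (Polyak) update θ′ ← ρθ + (1 − ρ)θ′ with a small ρ. The sketch below applies it to flat parameter lists; the parameter is named `polyak` (a hypothetical name) to avoid confusion with the time coefficients τ_j.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float],
                polyak: float = 0.01) -> List[float]:
    """Return polyak * online + (1 - polyak) * target, element-wise."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 < polyak <= 1.0:
        raise ValueError("polyak must lie in (0, 1]")
    # Small polyak values make the target network track the online network slowly.
    return [polyak * w + (1.0 - polyak) * w_t for w, w_t in zip(online, target)]
```

With `polyak = 1.0` the target is simply overwritten by the online parameters, which corresponds to a hard copy.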
5) The mode selection M_0 and M_1 of all wireless devices is taken as the state x_t of the deep deterministic policy gradient (DDPG) method, and the action a changes the state x_t. The total computation rate of the system after the change is compared with a set standard value: if the total computation rate is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise to a negative value, and the system enters the next state x_{t+1}.
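DDPG produces continuous actions, while the mode selection is binary. The text does not specify how one is mapped to the other; one common, assumed choice is to squash each actor output with a sigmoid and threshold it at 0.5. The sketch uses a numerically stable sigmoid; both function names are hypothetical.

```python
import math
from typing import List, Sequence


def stable_sigmoid(z: float) -> float:
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_modes(actor_output: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map raw actor outputs to a binary mode vector (1 = offload, 0 = local)."""
    return [1 if stable_sigmoid(z) > threshold else 0 for z in actor_output]
```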
In step 5), the iterative process of the deep deterministic policy gradient method is as follows:
Step 5.1: initialize the execution unit, the scoring unit, and the memory base of the method; the current system state is x_t, t is initialized to 1, and the iteration counter k is initialized to 1;
Step 5.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action a in state x_t;
Step 5.3: the action a changes the state x_t into the next state x_{t+1}, and the reward r(x_t, a) fed back by the environment is obtained;
Step 5.4: the experience is stored in the memory base in the format (x_t, a, r(x_t, a), x_{t+1});
Step 5.5: the scoring unit receives the action a, the state x_t, and the reward r(x_t, a), and gives the execution unit a score S(x_t, a);
Step 5.6: the execution unit updates its own parameters to maximize the score S(x_t, a), so that its next action achieves as high a computation rate as possible;
Step 5.7: the scoring unit samples past experience from the memory base, keeps learning, and updates its parameters so that its scores are as accurate as possible; set k = k + 1 and return to step 5.2;
Step 5.8: when k is greater than the given number of iterations K, the learning process ends and the optimal mode selection M_0 and M_1 is obtained.
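Step 5.2 only states that the execution unit predicts an action. During training, DDPG usually perturbs the actor's output with exploration noise; the sketch below assumes zero-mean Gaussian noise with standard deviation `sigma` and clips the result to an assumed action range [0, 1]. Neither the noise type nor the range is given in the text, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def explore(action: Sequence[float], sigma: float, rng: random.Random,
            low: float = 0.0, high: float = 1.0) -> List[float]:
    """Add N(0, sigma^2) noise to each action component and clip to [low, high]."""
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in action]


# A seeded generator keeps exploration reproducible across runs.
noisy_action = explore([0.2, 0.9], sigma=0.1, rng=random.Random(0))
```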

Claims (2)

1. A mobile edge computing rate maximization method based on deep deterministic policy gradient, characterized in that the method comprises the following steps:
1) in a wirelessly powered edge computing system consisting of a base station and a plurality of wireless devices, the base station and each wireless device have a single antenna; a radio-frequency (RF) energy transmitter and an edge computing server are integrated in the base station, which is assumed to have a stable energy supply and can broadcast RF energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery and performs its tasks using the stored harvested energy; in this wireless communication system, each wireless device must establish an association with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:
[Formula image: channel gain h_i]
where the parameters are defined as follows:
A_d: antenna gain;
π: pi, the ratio of a circle's circumference to its diameter;
f_c: carrier frequency;
d_i: distance between wireless device i and the base station;
d_e: path loss exponent;
2) assuming that the computing task of each wireless device is either executed on its local low-performance microprocessor or offloaded to the edge computing server, which has greater processing power, processes the task, and sends the result back to the wireless device; assuming that the wireless devices follow the binary computation offloading rule, i.e., each wireless device must choose either the local computing mode or the offloading mode, two non-overlapping sets M_0 and M_1 are used to denote the wireless devices in the local computing mode and in the offloading mode, respectively, and the set M of all wireless devices is expressed as:
$$M = M_0 \cup M_1, \qquad M_0 \cap M_1 = \emptyset$$
3) the wireless devices in set M_0 can harvest energy and process their local tasks simultaneously, whereas the wireless devices in set M_1 can offload their tasks to the base station for processing only after harvesting energy; assuming that the computing power and transmission capability of the base station are much stronger than those of the energy-harvesting wireless devices, each offloading wireless device exhausts its harvested energy during task offloading, and the problem of maximizing the sum computation rate of all wireless devices is formulated as follows:
[Formula image: objective, the sum computation rate of all wireless devices]
subject to the following constraints:
[Formula images: three constraints]
where:
[Formula images: three auxiliary definitions]
and the parameters are defined as follows:
ω_i: weight of the i-th wireless device;
μ: energy harvesting efficiency;
p: RF energy transmit power;
φ: number of computation cycles required to process one bit of data;
h_i: channel gain of the i-th wireless device;
k_i: energy efficiency coefficient of the i-th wireless device;
a: time coefficient;
v_μ: conversion efficiency;
b: bandwidth;
τ_j: time coefficient of the j-th wireless device;
N_0: number of wireless devices in the local processing mode;
4) finding the optimal mode selection, i.e., the mode selection M_0 and M_1 of all wireless devices, by the deep deterministic policy gradient (DDPG) method; the DDPG method consists of an execution unit (actor), a scoring unit (critic), and an environment; the mode selection M_0 and M_1 of all users is encoded as the state x_t required by the execution unit; in the current state, the execution unit takes an action a that changes the mode selection M_0 and M_1, so that the system enters the next state x_{t+1}, and it receives the reward r(x_t, a) returned by the environment; the scoring unit combines the state x_t, the action a, and the reward r(x_t, a) returned by the environment to score the execution unit, i.e., to indicate how good the action a taken by the execution unit in state x_t is; the goal of the execution unit is to obtain as high a score from the scoring unit as possible, while the goal of the scoring unit is to make each score it gives as close to the true value as possible, which it adjusts through the reward r(x_t, a); as the execution unit, the scoring unit, and the environment continuously interact and update, the mode selection M_0 and M_1 is continuously optimized until the scoring unit is updated to its optimum, the scoring unit being updated as follows:
$$S(x_t, a) = r(x_t, a) + \gamma\, S'(x_{t+1}, a') \qquad (4)$$
where the parameters are defined as follows:
x_t: state of the system at time t;
x_{t+1}: state of the system at time t+1;
a: action taken by the execution unit in the current state;
a′: action taken by the execution unit in the next state;
S(x_t, a): score given by the evaluation network when the execution unit takes action a in state x_t;
S′(x_{t+1}, a′): score given by the target network when the execution unit takes action a′ in state x_{t+1};
r(x_t, a): reward obtained by taking action a in state x_t;
γ: reward discount factor;
5) the mode selection M_0 and M_1 of all wireless devices is taken as the state x_t of the deep deterministic policy gradient method, and the action a changes the state x_t; the total computation rate of the system after the change is compared with a set standard value: if the total computation rate is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise to a negative value, and the system enters the next state x_{t+1}.
2. The mobile edge computing rate maximization method based on deep deterministic policy gradient according to claim 1, wherein in step 5), the iterative process of the deep deterministic policy gradient method is as follows:
step 5.1: initializing the execution unit, the scoring unit, and the memory base of the method, wherein the current system state is x_t, t is initialized to 1, and the iteration counter k is initialized to 1;
step 5.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action a in state x_t;
step 5.3: the action a changes the state x_t into the next state x_{t+1}, and the reward r(x_t, a) fed back by the environment is obtained;
step 5.4: the experience is stored in the memory base in the format (x_t, a, r(x_t, a), x_{t+1});
step 5.5: the scoring unit receives the action a, the state x_t, and the reward r(x_t, a), and gives the execution unit a score S(x_t, a);
step 5.6: the execution unit updates its own parameters to maximize the score S(x_t, a), so that its next action achieves as high a computation rate as possible;
step 5.7: the scoring unit samples past experience from the memory base, keeps learning, and updates its parameters so that its scores are as accurate as possible; k is set to k + 1 and the process returns to step 5.2;
step 5.8: when k is greater than the given number of iterations K, the learning process ends and the optimal mode selection M_0 and M_1 is obtained.

Priority Applications (1)

Application number: CN201810342357.6A (CN108738045B); priority date: 2018-04-17; filing date: 2018-04-17; title: Mobile edge computing rate maximization method based on deep deterministic policy gradient

Publications (2)

CN108738045A, published 2018-11-02
CN108738045B (granted), published 2021-04-06

Family

ID=63938925

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109831236B * 2018-11-13 2021-06-01 University of Electronic Science and Technology of China Beam selection method assisted by Monte Carlo tree search
CN111026548B * 2019-11-28 2023-05-09 Electric Power Research Institute of State Grid Gansu Electric Power Company Test resource scheduling method for power communication equipment based on inverse deep reinforcement learning

Citations (6)

Publication number Priority date Publication date Assignee Title
CN107708135A * 2017-07-21 2018-02-16 Shanghai Jiao Tong University Resource allocation method applied to mobile edge computing scenarios
CN107734558A * 2017-10-26 2018-02-23 Beijing University of Posts and Telecommunications Multi-server-based mobile edge computing control and resource scheduling method
CN107846704A * 2017-10-26 2018-03-27 Beijing University of Posts and Telecommunications Resource allocation and base station service deployment method based on mobile edge computing
CN107872823A * 2016-09-28 2018-04-03 Wipro Limited Method and system for identifying a communication operating mode in a mobile edge computing environment
US9942825B1 * 2017-03-27 2018-04-10 Verizon Patent And Licensing Inc. System and method for lawful interception (LI) of Network traffic in a mobile edge computing environment
CN107911242A * 2017-11-15 2018-04-13 Beijing University of Technology Cognitive radio and edge computing method based on an industrial wireless network

Non-Patent Citations (1)

Title
Computation Rate Maximization for Wireless Powered Mobile-Edge Computing With Binary Computation Offloading; Suzhi Bi et al.; IEEE Transactions on Wireless Communications; 2018-04-09; full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant