CN108738045B

CN108738045B - Mobile edge computing rate maximization method based on deep deterministic policy gradient

Info

Publication number: CN108738045B
Application number: CN201810342357.6A
Authority: CN
Inventors: 黄亮; 冯旭; 钱丽萍; 吴远
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-04-17
Filing date: 2018-04-17
Publication date: 2021-04-06
Anticipated expiration: 2038-04-17
Also published as: CN108738045A

Abstract

A mobile edge computing rate maximization method based on the deep deterministic policy gradient method comprises the following steps: 1) computing the sum of the computation rates of all wireless devices in the system for a given mode selection; 2) defining the set of all wireless devices; 3) formulating the problem of maximizing the sum of the computation rates of all wireless devices; 4) finding an optimal mode selection by the deep deterministic policy gradient method; 5) taking the mode selections M₀ and M₁ of all wireless devices as the state x_t of the deep deterministic policy gradient method, where action a is a change made to state x_t; the total computation rate of the system after the change is compared with a set standard value, and if it is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise to a negative value, and the system enters the next state x_{t+1}. The invention maximizes the total computation rate of all wireless devices while ensuring user experience.

Description

Mobile edge computing rate maximization method based on deep deterministic policy gradient

Technical Field

The invention belongs to the field of communications, and particularly relates to a mobile edge computing communication system and a mobile edge computing rate maximization method based on the deep deterministic policy gradient method.

Background

The recent development of Internet of Things (IoT) technology is a key step towards truly intelligent and autonomous control, particularly in many important industrial and commercial systems. An IoT network deploys a large number of wireless devices (WDs) capable of communication and computation. Because of device size limits and manufacturing cost, IoT devices (e.g., sensors) often carry limited-capacity batteries and energy-efficient, low-performance processors, so their limited lifetime and low computing power cannot sustain the growing number of new applications that require high-performance computing, such as autonomous driving and augmented reality. Deploying wireless power transfer (WPT) can address both of these limitations, whereas frequent device battery failures not only disrupt the normal operation of individual wireless devices but can also significantly degrade overall network performance, e.g., the sensing accuracy of a wireless sensor network. Conventional wireless systems require frequent manual battery replacement, which is expensive and inconvenient, and because battery capacity is severely limited, minimizing energy consumption and extending the operating life of the wireless devices is a critical design goal in battery-powered wireless systems. Each energy-harvesting wireless device follows a binary computation offloading policy, i.e., the data set of a task is either processed locally or offloaded to a remote server. To maximize the total computation rate of all wireless devices, the optimal individual computation mode selection must be found.

Disclosure of Invention

To overcome the low sum computation rate of existing wireless devices, and to maximize the sum computation rate of all wireless devices by finding the optimal individual computation mode selection and system transmission time allocation, the invention provides a mobile edge computing rate maximization method based on the deep deterministic policy gradient method, which maximizes the sum computation rate of all wireless devices while ensuring user experience.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A mobile edge computing rate maximization method based on deep deterministic policy gradient, the method comprising the following steps:

1) in a wirelessly powered edge computing system consisting of a base station and a plurality of wireless devices, the base station and each wireless device have a separate antenna; a radio frequency energy transmitter and an edge computing server are integrated in the base station, and the base station is assumed to have a stable energy supply and to be able to broadcast radio frequency energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery, and performs its tasks using the stored harvested energy; in this wireless communication system, each wireless device needs to establish an association with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:

wherein, each parameter is defined as follows:

A_d: antenna gain;

π: the ratio of a circle's circumference to its diameter;

f_c: a carrier frequency;

d_i: distance between wireless device i and base station;

d_e: a path loss exponent;
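The channel gain in step 1) can be illustrated numerically. The sketch below is a minimal example that assumes the standard free-space path-loss form h_i = A_d · (c / (4π f_c d_i))^(d_e), with c the speed of light; this form is an assumption chosen to match the parameters listed above, and the function name and sample values are hypothetical.

```python
import math

SPEED_OF_LIGHT = 3e8  # m/s; assumed constant of the path-loss model

def channel_gain(antenna_gain, carrier_freq_hz, distance_m, path_loss_exp):
    """Channel gain h_i under an assumed free-space path-loss model."""
    ratio = SPEED_OF_LIGHT / (4 * math.pi * carrier_freq_hz * distance_m)
    return antenna_gain * ratio ** path_loss_exp

# Hypothetical sample values: A_d = 4.11, f_c = 915 MHz, d_e = 2.8
gains = [channel_gain(4.11, 915e6, d, 2.8) for d in (2.5, 5.0, 10.0)]
```

Under this assumed model the gain falls monotonically with distance, so devices far from the base station both harvest less energy and see a weaker link.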

2) assume that the computing task of each wireless device is either executed on its local low-performance microprocessor or offloaded to the edge computing server, which has greater processing power, processes the task, and then sends the result back to the wireless device; assume that the wireless devices adopt a binary computation offloading rule, i.e., each wireless device must choose either the local computation mode or the offloading mode; two non-overlapping sets M₀ and M₁ denote the wireless devices in the local computation mode and in the offloading mode, respectively, and the set M of all wireless devices is expressed as:

M = M₀ ∪ M₁
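The binary offloading rule of step 2) can be represented as one 0/1 entry per device. The sketch below is illustrative only: encoding local mode as 0 and offloading mode as 1, the 0-based device indices, and the function name are all assumptions.

```python
def split_modes(mode_vector):
    """Split a 0/1 mode vector into (M0, M1): local-mode and offload-mode device indices."""
    m0 = {i for i, m in enumerate(mode_vector) if m == 0}
    m1 = {i for i, m in enumerate(mode_vector) if m == 1}
    return m0, m1

modes = [0, 1, 1, 0]           # hypothetical selection for four devices
m0, m1 = split_modes(modes)
all_devices = m0 | m1          # the two sets together cover every device
```

Because each device carries exactly one bit, the two sets never overlap and their union is always the full device set.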

3) the wireless devices in set M₀ can harvest energy and process local tasks simultaneously, while the wireless devices in set M₁ can only offload their tasks to the base station for processing after harvesting energy; assuming that the computing power and transmission capability of the base station are much greater than those of the energy-harvesting wireless devices, each wireless device exhausts its harvested energy during task offloading, and the problem of maximizing the sum of the computation rates of all wireless devices is described as follows:

the constraint conditions are as follows:

in the formula:

wherein, each parameter is defined as follows:

ω_i: a transition weight for the ith wireless device;

μ: an energy collection efficiency;

p: radio frequency energy transmission power;

phi: the number of calculation cycles required to process each bit of data;

h_i: channel gain of the ith wireless device;

k_i: an energy efficiency coefficient for the ith wireless device;

a: a time coefficient;

v_μ: conversion efficiency;

b: a bandwidth;

τ_j: a time coefficient for the jth wireless device;

N₀: the number of wireless devices in the local processing mode;
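How the objective of step 3) aggregates over the two sets can be sketched as follows. This is a minimal example that assumes the objective is a weighted sum of per-device computation rates, as the weight ω_i suggests; the per-device local and offloading rates are taken as given inputs rather than computed from the parameters above, and all names and numbers are hypothetical.

```python
def weighted_sum_rate(weights, local_rates, offload_rates, mode_vector):
    """Sum of w_i * rate_i, using the offloading rate when mode == 1, else the local rate."""
    total = 0.0
    for w, r_local, r_off, mode in zip(weights, local_rates, offload_rates, mode_vector):
        total += w * (r_off if mode == 1 else r_local)
    return total

# Three hypothetical devices: devices 0 and 2 offload, device 1 computes locally.
total = weighted_sum_rate(
    weights=[1.0, 1.5, 1.0],
    local_rates=[2.0, 2.0, 2.0],
    offload_rates=[5.0, 1.0, 4.0],
    mode_vector=[1, 0, 1],
)
```

Any mode selection can be scored this way, which is what makes the mode vector usable as the quantity being searched over.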

4) an optimal mode selection, i.e., the mode selections M₀ and M₁ of all wireless devices, is found by the deep deterministic policy gradient method; the deep deterministic policy gradient method consists of an execution unit (actor), a scoring unit (critic), and an environment; the mode selections M₀ and M₁ of all users are encoded as the state x_t required by the execution unit; in the current state, the execution unit takes an action a that changes the mode selections M₀ and M₁, and the system enters the next state x_{t+1} while receiving the reward r(x_t, a) returned by the environment; the scoring unit combines the state x_t, the action a, and the reward r(x_t, a) returned by the environment to score the execution unit, i.e., to indicate whether taking action a in state x_t is good or bad; the goal of the execution unit is to make the score given by the scoring unit as high as possible, and the goal of the scoring unit is to make each score it gives the execution unit as close to the true value as possible, adjusted through the reward r(x_t, a); as the execution unit, the scoring unit, and the environment are updated through continuous interaction, the mode selections M₀ and M₁ are continuously optimized until the optimum is reached, wherein the scoring unit is updated as follows:

S(x_t, a) = r(x_t, a) + γS′(x_{t+1}, a′) (4)

wherein, each parameter is defined as follows:

x_t: the state of the system at time t;

x_{t+1}: the state of the system at time t+1;

a: the action taken by the execution unit in the current state;

a′: the action taken by the execution unit in the next state;

S(x_t, a): the score obtained when the evaluation network in the execution unit takes action a in state x_t;

S′(x_{t+1}, a′): the score obtained when the target network in the execution unit takes action a′ in state x_{t+1};

r(x_t, a): the reward obtained by taking action a in state x_t;

γ: the reward discount factor;
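Equation (4) can be evaluated directly once the reward and the target network's score are known. The sketch below is a minimal illustration; the function names and sample numbers are hypothetical, and treating the difference from the current score as the quantity the scoring unit reduces is an assumption based on common practice rather than something stated here.

```python
def critic_target(reward, gamma, next_score):
    """Right-hand side of equation (4): r(x_t, a) + gamma * S'(x_{t+1}, a')."""
    return reward + gamma * next_score

def critic_error(current_score, reward, gamma, next_score):
    """Gap between the target of equation (4) and the current score S(x_t, a)."""
    return critic_target(reward, gamma, next_score) - current_score

target = critic_target(reward=1.0, gamma=0.9, next_score=2.0)
error = critic_error(current_score=2.5, reward=1.0, gamma=0.9, next_score=2.0)
```

A discount factor γ closer to 0 makes the score depend mostly on the immediate reward, while a value closer to 1 gives more weight to the score of the next state.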

5) the mode selections M₀ and M₁ of all wireless devices serve as the state x_t of the deep deterministic policy gradient method, and action a is a change made to state x_t; the total computation rate of the system after the change is compared with a set standard value; if it is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise it is set to a negative value, and the system enters the next state x_{t+1}.
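The reward rule of step 5) is a simple threshold comparison. The sketch below is a minimal example; the magnitudes +1 and -1 are assumptions, since the text only requires a positive and a negative value.

```python
def reward(total_rate, standard_value, positive=1.0, negative=-1.0):
    """Positive reward if the new total computation rate exceeds the standard value, else negative."""
    return positive if total_rate > standard_value else negative

r_good = reward(total_rate=12.0, standard_value=10.0)
r_bad = reward(total_rate=8.0, standard_value=10.0)
```

A rate exactly equal to the standard value earns the negative reward, because the rule requires the rate to be strictly larger.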

Further, in step 5), the iterative process of the deep deterministic policy gradient method is as follows:

Step 5.1: initialize the execution unit, the scoring unit, and the memory base of the deep deterministic policy gradient method; the current system state is x_t, t is initialized to 1, and the iteration count k is initialized to 1;

Step 5.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action a in state x_t;

Step 5.3: action a changes state x_t into the next state x_{t+1}, and the reward r(x_t, a) fed back by the environment is obtained;

Step 5.4: the historical experience is stored in the memory base in the format (x_t, a, r(x_t, a), x_{t+1});

Step 5.5: the scoring unit receives the action a, the state x_t, and the reward r(x_t, a), and gives the execution unit a score S(x_t, a);

Step 5.6: the execution unit updates its own parameters to continuously maximize the score S(x_t, a), so that it takes high-scoring actions as often as possible in the future;

Step 5.7: the scoring unit samples historical experience from the memory base, learns continuously, and updates its parameters so that its scores are as accurate as possible; k is set to k + 1, and the process returns to step 5.2;

Step 5.8: when k is greater than the given number of iterations K, the learning process ends and the best mode selections M₀ and M₁ are obtained.
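The control flow of steps 5.1 to 5.8 can be sketched in a few lines. The example below is not the deep deterministic policy gradient method itself: it swaps the neural-network execution and scoring units for a score table with occasional random exploration (a tabular, Q-learning-style stand-in), so that the loop, the memory base, and the update toward equation (4) stay visible. The action encoding (the index of one device whose mode is flipped), the toy rate function, and all names and constants are assumptions.

```python
import random
from collections import deque

def run_mode_search(n_devices, rate_fn, standard_value, K, gamma=0.9, lr=0.1, seed=0):
    """Control flow of steps 5.1-5.8 with tabular stand-ins for the two units."""
    rng = random.Random(seed)
    memory = deque(maxlen=1000)                  # step 5.1: memory base
    scores = {}                                  # stand-in scoring unit: S[(state, action)]
    state = tuple(0 for _ in range(n_devices))   # all devices start in local mode
    best_state, best_rate = state, rate_fn(state)

    def score(s, a):
        return scores.get((s, a), 0.0)

    for k in range(1, K + 1):                    # steps 5.2 and 5.8: run while k <= K
        # Stand-in execution unit: usually the best-scored action, sometimes a random one.
        if rng.random() < 0.2:
            action = rng.randrange(n_devices)
        else:
            action = max(range(n_devices), key=lambda a: score(state, a))
        # Step 5.3: the action flips one device's mode; the environment returns a reward.
        next_state = tuple(m ^ 1 if i == action else m for i, m in enumerate(state))
        rate = rate_fn(next_state)
        reward = 1.0 if rate > standard_value else -1.0
        memory.append((state, action, reward, next_state))   # step 5.4
        # Steps 5.5-5.7: replay one stored experience and move its score toward eq. (4).
        s, a, r, s2 = rng.choice(memory)
        target = r + gamma * max(score(s2, a2) for a2 in range(n_devices))
        scores[(s, a)] = score(s, a) + lr * (target - score(s, a))
        if rate > best_rate:
            best_state, best_rate = next_state, rate
        state = next_state
    return best_state, best_rate

# Hypothetical toy rate: offloading device i contributes i + 1 units, local mode contributes 1.
toy_rate = lambda modes: sum((i + 1) if m == 1 else 1 for i, m in enumerate(modes))
best_state, best_rate = run_mode_search(3, toy_rate, standard_value=4.0, K=200)
```

In a full implementation the score table would be replaced by evaluation and target networks for both units, and the replay step would sample a batch from the memory base instead of a single experience.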

The technical conception of the invention is as follows: an Internet of Things network deploys a large number of wireless devices (WDs) capable of communication and computation. Because of device size constraints and manufacturing cost, IoT devices (e.g., sensors) often carry limited-capacity batteries and energy-saving, low-performance processors, so their limited lifetime and low computing power cannot sustain the growing number of new applications that require high-performance computing. Because battery capacity is strictly limited, minimizing energy consumption and extending the operating life cycle of the wireless devices is a critical design goal in battery-powered wireless systems. Each energy-harvesting wireless device follows a binary computation offloading policy, i.e., the data set of a task is either processed locally or offloaded to a remote server. To maximize the total computation rate of all wireless devices, an optimal individual computation mode selection method is proposed.

The invention has the following beneficial effects: an optimal mode selection is found by the deep deterministic policy gradient method, which maximizes the total computation rate of all wireless devices, minimizes energy consumption, and extends the operating life cycle of the wireless devices.

Drawings

FIG. 1 is a system model diagram.

Fig. 2 is a flow chart of a method of finding an optimal mode selection.

Detailed Description

The present invention is described in further detail below with reference to the attached drawing figures.

Referring to Fig. 1 and Fig. 2, a mobile edge computing rate maximization method based on deep deterministic policy gradient maximizes the sum computation rate of all wireless devices, minimizes energy consumption, and extends the operating life cycle of the wireless devices. Based on a system model with multiple wireless devices (as shown in Fig. 1), the invention proposes an optimal individual computation mode selection method that decides which wireless devices offload their tasks to the base station. The method comprises the following steps (as shown in Fig. 2):

1) in a wirelessly powered edge computing system consisting of a base station and a plurality of wireless devices, the base station and each wireless device have a separate antenna; a radio frequency energy transmitter and an edge computing server are integrated in the base station, and the base station is assumed to have a stable energy supply and to be able to broadcast radio frequency energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery, and can perform its tasks using the stored harvested energy; in this wireless communication system, each wireless device needs to establish an association with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:

wherein, each parameter is defined as follows:

A_d: antenna gain;

π: the ratio of a circle's circumference to its diameter;

f_c: a carrier frequency;

d_i: distance between wireless device i and base station;

d_e: a path loss exponent;

2) assume that the computing task of each wireless device is either executed on its local low-performance microprocessor or offloaded to the edge computing server, which has greater processing power, processes the task, and then sends the result back to the wireless device; assume that the wireless devices adopt a binary computation offloading rule, i.e., each wireless device must choose either the local computation mode or the offloading mode; two non-overlapping sets M₀ and M₁ denote the wireless devices in the local computation mode and in the offloading mode, respectively, and the set M of all wireless devices is expressed as:

M = M₀ ∪ M₁

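3) the wireless devices in set M₀ can harvest energy and process local tasks simultaneously, while the wireless devices in set M₁ can only offload their tasks to the base station for processing after harvesting energy; assuming that the computing power and transmission capability of the base station are much greater than those of the energy-harvesting wireless devices, each wireless device exhausts its harvested energy during task offloading, and the problem of maximizing the sum of the computation rates of all wireless devices is described as follows: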

the constraint conditions are as follows:

in the formula:

wherein, each parameter is defined as follows:

ω_i: a transition weight for the ith wireless device;

μ: an energy collection efficiency;

p: radio frequency energy transmission power;

phi: the number of calculation cycles required to process each bit of data;

h_i: channel gain of the ith wireless device;

k_i: an energy efficiency coefficient for the ith wireless device;

a: a time coefficient;

v_μ: conversion efficiency;

b: a bandwidth;

τ_j: a time coefficient for the jth wireless device;

N₀: the number of wireless devices in the local processing mode;

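4) an optimal mode selection, i.e., the mode selections M₀ and M₁ of all wireless devices, is found by the deep deterministic policy gradient method; the deep deterministic policy gradient method consists of an execution unit, a scoring unit, and an environment; the mode selections M₀ and M₁ of all users are encoded as the state x_t required by the execution unit; in the current state, the execution unit takes an action a that changes the mode selections M₀ and M₁, and the system enters the next state x_{t+1} while receiving the reward r(x_t, a) returned by the environment; the scoring unit combines the state x_t, the action a, and the reward r(x_t, a) to score the execution unit; as the execution unit, the scoring unit, and the environment are updated through continuous interaction, the mode selections M₀ and M₁ are continuously optimized until the optimum is reached, wherein the scoring unit is updated as follows: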

S(x_t, a) = r(x_t, a) + γS′(x_{t+1}, a′) (4)

wherein, each parameter is defined as follows:

x_t: the state of the system at time t;

x_{t+1}: the state of the system at time t+1;

a: the action taken by the execution unit in the current state;

a′: the action taken by the execution unit in the next state;

r(x_t, a): the reward obtained by taking action a in state x_t;

γ: the reward discount factor;

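5) the mode selections M₀ and M₁ of all wireless devices serve as the state x_t of the deep deterministic policy gradient method, and action a is a change made to state x_t; the total computation rate of the system after the change is compared with a set standard value; if it is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise it is set to a negative value, and the system enters the next state x_{t+1}.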

In step 5), the iterative process of the deep deterministic policy gradient method is as follows:

Step 5.1: initialize the execution unit, the scoring unit, and the memory base of the deep deterministic policy gradient method; the current system state is x_t, t is initialized to 1, and the iteration count k is initialized to 1;

Step 5.3: action a changes state x_t into the next state x_{t+1}, and the reward r(x_t, a) fed back by the environment is obtained;

Step 5.8: when k is greater than the given number of iterations K, the learning process ends and the best mode selections M₀ and M₁ are obtained.

Claims

1. A mobile edge computing rate maximization method based on deep deterministic policy gradient, characterized in that the method comprises the following steps:

1) in a wirelessly powered edge computing system consisting of a base station and a plurality of wireless devices, the base station and each wireless device have a separate antenna; a radio frequency energy transmitter and an edge computing server are integrated in the base station, and the base station is assumed to have a stable energy supply and to be able to broadcast radio frequency energy to all wireless devices; each wireless device has an energy harvesting circuit and a rechargeable battery, and performs its tasks using the stored harvested energy; in this wireless communication system, each wireless device needs to establish an association with the base station, and the channel gain h_i between wireless device i and the base station is calculated as follows:

wherein, each parameter is defined as follows:

A_d: antenna gain;

π: the ratio of a circle's circumference to its diameter;

f_c: a carrier frequency;

d_i: distance between wireless device i and base station;

d_e: a path loss exponent;

2) assume that the computing task of each wireless device is either executed on its local low-performance microprocessor or offloaded to the edge computing server, which has greater processing power, processes the task, and then sends the result back to the wireless device; assume that the wireless devices adopt a binary computation offloading rule, i.e., each wireless device must choose either the local computation mode or the offloading mode; two non-overlapping sets M₀ and M₁ denote the wireless devices in the local computation mode and in the offloading mode, respectively, and the set M of all wireless devices is expressed as:

M = M₀ ∪ M₁

3) the wireless devices in set M₀ can harvest energy and process local tasks simultaneously, while the wireless devices in set M₁ can only offload their tasks to the base station for processing after harvesting energy; assuming that the computing power and transmission capability of the base station are much greater than those of the energy-harvesting wireless devices, each wireless device exhausts its harvested energy during task offloading, and the problem of maximizing the sum of the computation rates of all wireless devices is described as follows:

the constraint conditions are as follows:

in the formula:

wherein, each parameter is defined as follows:

ω_i: a transition weight for the ith wireless device;

μ: an energy collection efficiency;

p: radio frequency energy transmission power;

phi: the number of calculation cycles required to process each bit of data;

h_i: channel gain of the ith wireless device;

k_i: an energy efficiency coefficient for the ith wireless device;

t: a time coefficient;

v_μ: conversion efficiency;

b: a bandwidth;

τ_j: a time coefficient for the jth wireless device;

N₀: the number of wireless devices in the local processing mode;

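4) an optimal mode selection, i.e., the mode selections M₀ and M₁ of all wireless devices, is found by the deep deterministic policy gradient method; the deep deterministic policy gradient method consists of an execution unit, a scoring unit, and an environment; the mode selections M₀ and M₁ of all users are encoded as the state x_t required by the execution unit; in the current state, the execution unit takes an action a that changes the mode selections M₀ and M₁, and the system enters the next state x_{t+1} while receiving the reward r(x_t, a) returned by the environment; the scoring unit combines the state x_t, the action a, and the reward r(x_t, a) returned by the environment to score the execution unit, i.e., to indicate whether taking action a in state x_t is good or bad; the goal of the execution unit is to make the score given by the scoring unit as high as possible, and the goal of the scoring unit is to make each score it gives the execution unit as close to the true value as possible, adjusted through the reward r(x_t, a); as the execution unit, the scoring unit, and the environment are updated through continuous interaction, the mode selections M₀ and M₁ are continuously optimized until the optimum is reached, wherein the scoring unit is updated as follows: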

S(x_t, a) = r(x_t, a) + γS′(x_{t+1}, a′) (4)

wherein, each parameter is defined as follows:

x_t: the state of the system at time t;

x_{t+1}: the state of the system at time t+1;

a: the action taken by the execution unit in the current state;

a′: the action taken by the execution unit in the next state;

r(x_t, a): the reward obtained by taking action a in state x_t;

γ: the reward discount factor;

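5) the mode selections M₀ and M₁ of all wireless devices serve as the state x_t of the deep deterministic policy gradient method, and action a is a change made to state x_t; the total computation rate of the system after the change is compared with a set standard value; if it is larger than the standard value, the current reward r(x_t, a) is set to a positive value, otherwise it is set to a negative value, and the system enters the next state x_{t+1}.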

2. The mobile edge computing rate maximization method based on deep deterministic policy gradient of claim 1, wherein in step 5), the iterative process of the deep deterministic policy gradient method is as follows:

Step 5.1: initialize the execution unit, the scoring unit, and the memory base of the deep deterministic policy gradient method; the current system state is x_t, t is initialized to 1, and the iteration count k is initialized to 1;

Step 5.3: action a changes state x_t into the next state x_{t+1}, and the reward r(x_t, a) fed back by the environment is obtained;

Step 5.4: the historical experience is stored in the memory base in the format (x_t, a, r(x_t, a), x_{t+1});

Step 5.5: the scoring unit receives the action a, the state x_t, and the reward r(x_t, a), and gives the execution unit a score S(x_t, a);

Step 5.6: the execution unit updates its own parameters to continuously maximize the score S(x_t, a), so that it takes high-scoring actions as often as possible in the future;

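Step 5.8: when k is greater than the given number of iterations K, the learning process ends and the best mode selections M₀ and M₁ are obtained.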