CN111465032A - Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Info

Publication number
CN111465032A
Authority
CN
China
Prior art keywords
task
network
classifier
body area
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010221507.5A
Other languages
Chinese (zh)
Other versions
CN111465032B (en)
Inventor
王力立
张戈
奚思遥
肖强
黄成
单梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202010221507.5A priority Critical patent/CN111465032B/en
Publication of CN111465032A publication Critical patent/CN111465032A/en
Application granted granted Critical
Publication of CN111465032B publication Critical patent/CN111465032B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22 Traffic simulation tools or models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W 52/02 Power saving arrangements
    • H04W 52/0209 Power saving arrangements in terminal devices
    • H04W 52/0212 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
    • H04W 52/0216 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave, using a pre-established activity schedule, e.g. traffic indication frame
    • H
    • H04
    • H04W
    • H04W 84/00 Network topologies
    • H04W 84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a task offloading method and system based on the A3C algorithm in a multi-wireless body area network environment. The method comprises the following steps: determining the network architecture of the multiple wireless body area networks and initializing the network parameters; training a task classifier with sampled physiological data to obtain a stable classifier model; training the network resource allocation problem with the deep-reinforcement-learning A3C algorithm to obtain a convergent decision network; and performing task offloading according to the obtained models: at each moment, tasks are first classified with the classifier model, and then user channel access and edge-server computing resource allocation are decided by the decision network. The method improves the delay and energy-consumption performance of task offloading across multiple wireless body area networks, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Description

Task unloading method and system based on A3C algorithm in multi-wireless body area network environment
Technical Field
The invention belongs to the field of wireless communication networks, and particularly relates to a task unloading method and system based on an A3C algorithm in a multi-wireless body area network environment.
Background
A wireless body area network is a wireless sensor network that takes the human body as its monitoring object. Because the human body is mobile, inter-network interference arises easily among multiple body area networks, and how to collect and manage data across these networks is an important research direction. Current research shows that body area networks are characterized by mobility, computation-intensive tasks, and low-latency requirements, and that task offloading can be assisted by edge computing: base stations equipped with edge servers are placed at the edge of multiple networks to collect and process tasks in a unified way. Because the body area network monitoring a given subject has strict requirements on delay and energy consumption, a reasonable task offloading method must be designed to guarantee low-delay, low-energy data transmission.
In existing research on data transmission between multiple body area networks and a data center, most algorithms are designed for generic communication networks and make no targeted use of the data characteristics and user characteristics of body area networks. In practice, however, the physiological data monitored by a body area network carries important practical meaning, and the movement trajectory of a body area network user has its own characteristics. Because existing offloading methods ignore these characteristics, they cannot meet the strict delay and energy-consumption requirements of wireless body area networks.
Disclosure of Invention
The invention aims to provide a task offloading method and system for a multi-wireless body area network environment, so that the task state and the mobility characteristics of each user are fully considered when the system offloads tasks, thereby achieving lower system delay and energy consumption.
The technical solution for realizing the purpose of the invention is as follows: a method for task offloading based on A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
step 1, constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task unloading by using an A3C algorithm to obtain a decision network;
step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
Further, in the network architecture of the multiple wireless body area networks in step 1, the network parameters include the user set D, the base station set S, the RGMM mobility model parameters of each user, the base station locations l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task categories β_d ∈ {0, 1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
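To make step 1 concrete, the following is a minimal parameter-initialization sketch in Python. The user count, base station count, deployment area, and channel model are illustrative assumptions (the embodiment below only mentions 20 users); the patent does not prescribe a particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes; the embodiment below mentions 20 users, the rest is illustrative.
NUM_USERS = 20        # |D|: number of body area network users d
NUM_STATIONS = 4      # |S|: number of edge base stations s
AREA_SIZE = 100.0     # side length of the deployment area in metres (assumed)

params = {
    # base station locations l_s = (x_s, y_s)
    "bs_locations": rng.uniform(0.0, AREA_SIZE, size=(NUM_STATIONS, 2)),
    # initial user locations; in the method these evolve under the RGMM mobility model
    "user_locations": rng.uniform(0.0, AREA_SIZE, size=(NUM_USERS, 2)),
    # channel gains h_{d,s}(t); a simple Rayleigh-fading placeholder
    "channel_gain": rng.rayleigh(scale=1.0, size=(NUM_USERS, NUM_STATIONS)),
    # task categories beta_d in {0, 1}, filled in later by the task classifier of step 2
    "task_category": np.zeros(NUM_USERS, dtype=int),
}
print({k: v.shape for k, v in params.items()})
```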
Further, training the classifier in step 2 to obtain the user task classifier comprises the following specific steps:
Step 2-1, estimating a stationary interval for each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n.
Step 2-2, adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task.
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
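As an illustration of steps 2-1 to 2-3, the sketch below estimates the stationary interval of one physiological feature with the t-distribution, labels samples, and trains a support vector machine classifier. It is a minimal sketch assuming a one-dimensional feature (e.g. heart rate), a significance level of 0.05, synthetic data, and the interval form mean ± t_{α,n-1}·s_x; none of these specifics are fixed by the patent.

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVC

def stationary_interval(samples: np.ndarray, alpha: float = 0.05):
    """Step 2-1: t-distribution stationary interval of one physiological feature
    (assumed form: mean +/- t_{alpha,n-1} * s_x)."""
    n = len(samples)
    mean, s_x = samples.mean(), samples.std(ddof=1)
    t_coeff = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return mean - t_coeff * s_x, mean + t_coeff * s_x

def label_samples(samples: np.ndarray, low: float, up: float) -> np.ndarray:
    """Step 2-2: label 0 = normal task (inside the interval), 1 = emergency task (outside)."""
    return ((samples < low) | (samples > up)).astype(int)

# Synthetic heart-rate readings: mostly resting values plus a short abnormal episode.
rng = np.random.default_rng(1)
readings = np.concatenate([rng.normal(72.0, 3.0, 200), rng.normal(115.0, 5.0, 20)])

low, up = stationary_interval(readings[:200])     # interval estimated from the normal data
labels = label_samples(readings, low, up)

# Step 2-3: train the SVM task classifier on (feature, label) pairs.
clf = SVC(kernel="rbf").fit(readings.reshape(-1, 1), labels)
print(np.round([low, up], 1), clf.predict([[70.0], [120.0]]))   # classify two new readings
```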
Further, in step 3, the resource allocation problem during task offloading is trained with the A3C algorithm, and the specific process includes:
Step 3-1, converting the resource allocation problem into a Markov decision problem. The Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
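As a small illustration of the reward of step 3-1, the sketch below computes r_t under the reading given above, i.e. each user's benefit K_d is the weighted relative improvement in delay and energy over the static-allocation baseline; the weight values and the numbers are assumptions for illustration only.

```python
def user_benefit(t_d, e_d, t_static, e_static, w_t=0.7, w_e=0.3):
    """K_d: weighted relative improvement of user d over the static allocation baseline."""
    return w_t * (t_static - t_d) / t_static + w_e * (e_static - e_d) / e_static

def reward(delays, energies, t_static, e_static):
    """r_t: system benefit, summed over all users d."""
    return sum(user_benefit(t, e, t_static, e_static) for t, e in zip(delays, energies))

# Example: 3 users, each 30% faster and 20% cheaper than the static baseline.
print(reward(delays=[0.7] * 3, energies=[0.8] * 3, t_static=1.0, e_static=1.0))
# 3 * (0.7 * 0.3 + 0.3 * 0.2) = 0.81
```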
Step 3-2, training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t). The advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
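The following is a compact sketch of the step 3-2 update (advantage estimation plus a policy-gradient step) written with PyTorch. The network sizes, the discretized action space, the single-trajectory update, and the synthetic data are assumptions for illustration; a full A3C implementation additionally runs several asynchronous workers that accumulate such gradients into a shared decision network.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Decision network: shared trunk, policy head (actor) and value head (critic)."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, num_actions)   # pi_omega(a_t | s_t)
        self.value_head = nn.Linear(64, 1)               # V(s_t)

    def forward(self, state):
        h = self.trunk(state)
        return torch.softmax(self.policy_head(h), dim=-1), self.value_head(h).squeeze(-1)

def a3c_update(net, optimizer, states, actions, rewards, gamma=0.99):
    """One update from an experience sequence (s_t, a_t, r_t):
    advantage A(s_t, a_t) = discounted return - V(s_t); loss mixes policy and value terms."""
    returns, g = [], 0.0
    for r in reversed(rewards):                  # discounted returns as a Q-value estimate
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)

    probs, values = net(states)
    log_prob = Categorical(probs).log_prob(actions)
    advantage = returns - values                  # A(s_t, a_t)

    policy_loss = -(log_prob * advantage.detach()).mean()   # ascent on E[grad log pi * A]
    value_loss = advantage.pow(2).mean()
    optimizer.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    optimizer.step()

# Illustrative use: 4-dimensional state {b_d, beta_d, l_d, E_d}, 8 candidate actions,
# discount factor and learning rate as in the embodiment below (0.99, 0.001).
net = ActorCritic(state_dim=4, num_actions=8)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
a3c_update(net, opt,
           states=torch.randn(16, 4),
           actions=torch.randint(0, 8, (16,)),
           rewards=[0.3] * 16)
```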
Further, in step 4, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and decision network; the specific process is: at each moment, tasks are classified with the trained task classifier, the state of the multi-body-area-network system is then fed into the decision network according to the classification result, and the network outputs the user channel access (i.e. which base station each user accesses) and the base station computing resource allocation.
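Putting the pieces of step 4 together, a per-moment offloading step could look like the hypothetical sketch below. It assumes a trained classifier clf and a trained decision network net with the interfaces used in the two sketches above, and a hand-built 4-element state vector whose second entry holds β_d(t); all of these are illustrative assumptions rather than details fixed by the patent.

```python
import torch

def offload_step(physio_sample, state_vector, clf, net):
    """Step 4: classify the task, then query the decision network for the offloading action."""
    task_class = int(clf.predict([[physio_sample]])[0])   # 0 = normal, 1 = emergency (beta_d)
    state_vector[1] = float(task_class)                   # write beta_d(t) into the state
    with torch.no_grad():
        probs, _ = net(torch.tensor(state_vector, dtype=torch.float32))
        action = int(torch.argmax(probs))                 # greedy choice at execution time
    return task_class, action  # the action index encodes the chosen base station / resource share

# e.g.: offload_step(95.0, [0.5, 0.0, 0.2, 0.8], clf, net)
```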
A task offloading system based on A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of the user and training a classifier according to the data to obtain a task classifier;
the decision network generation module is used for training the resource allocation problem during task unloading by utilizing an A3C algorithm to obtain a decision network;
and the task unloading module is used for unloading the tasks of the multi-wireless body area network according to the obtained task classifier and the decision network.
Compared with the prior art, the invention has the following notable advantages: 1) the data characteristics of the wireless body area networks and the mobility characteristics of the users are considered jointly, reducing the delay and energy consumption of system task offloading; 2) the A3C algorithm based on deep reinforcement learning is used to optimize the task offloading process of the multiple wireless body area networks, so the system can offload dynamically, intelligently, and autonomously even when the system environment is unknown.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
Fig. 1 is a flow diagram of a method for task offloading based on the A3C algorithm in a multi-wireless body area network environment, under an embodiment.
FIG. 2 is a flow diagram of training a task classifier in one embodiment.
Figure 3 is a diagram of a multi-wireless body area network architecture in one embodiment.
FIG. 4 is a graph of training benefit variation of the A3C algorithm in one embodiment.
FIG. 5 is a graph of variation in training benefit based on a greedy algorithm in one embodiment.
Detailed Description
In one embodiment, in conjunction with fig. 1, there is provided a method for task offloading based on A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
step 1, constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task unloading by using an A3C algorithm to obtain a decision network;
step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
Further, in one embodiment, the network parameters of the network architecture of the multiple wireless body area networks in step 1 include the user set D, the base station set S, the RGMM mobility model parameters of each user, the base station locations l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task categories β_d ∈ {0, 1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
Further, in one embodiment, with reference to fig. 2, the training of the classifier in step 2 to obtain the user task classifier includes:
Step 2-1, estimating a stationary interval for each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n.
Step 2-2, adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task.
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
Further, in one embodiment, the resource allocation problem during task offloading is trained by using an A3C algorithm in step 3 to obtain a decision network, and the specific process includes:
Step 3-1, converting the resource allocation problem into a Markov decision problem. The Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
Step 3-2, training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t). The advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
Further, in one embodiment, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and decision network in step 4; the specific process is: at each moment, tasks are classified with the trained task classifier, the state of the multi-body-area-network system is then fed into the decision network according to the classification result, and the network outputs the user channel access (i.e. which base station each user accesses) and the base station computing resource allocation.
A task offloading system based on A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of the user and training a classifier according to the data to obtain a task classifier;
the decision network generation module is used for training the resource allocation problem during task unloading by utilizing an A3C algorithm to obtain a decision network;
and the task unloading module is used for unloading the tasks of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, in one embodiment, the task classifier generating module includes:
A stationary interval setting unit, for estimating the stationary interval of each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n.
A task labeling unit, for adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task.
A classifier training unit, for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
Further, in one embodiment, the decision network generating module includes:
A decision network construction unit, configured to convert the resource allocation problem into a Markov decision problem, where the Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
A decision network training unit, for training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); the advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
In one embodiment, as a specific example, the present invention is further explained and verified, and the specific contents include:
First, a multi-wireless body area network system is established according to the architecture of Fig. 3, and the network parameters are initialized. Then, from the collected human physiological data, the stationary intervals are computed, the data labels are added, and the classifier of step 2 is trained. On this basis, the A3C-based task offloading method is trained.
According to step 3-1, the state s_t, action a_t, and reward r_t of the task offloading problem in this embodiment are modeled. Because delay requirements are stricter for a body area network aimed at health monitoring, the delay weight factor in step 3-1 is set larger than the energy-consumption weight factor.
The decision network is then trained according to step 3-2 using the A3C algorithm. Parameters in the algorithm are set as: the discount factor γ is 0.99, and the learning rate is 0.001.
In the training phase, after each offloading round finishes, the system state vector s_t is computed and fed into the decision network, which outputs the offloading decision for the next moment; the resulting delay and energy consumption are fed back to the decision network as reward values, these values are recorded, the advantage function A(s_t, a_t) is computed, and the decision network parameters are updated until the average reward converges.
Fig. 4 and Fig. 5 show the change in system delay and energy-consumption benefit when this embodiment adopts, respectively, the A3C-based offloading method (A3C-based Offloading and Joint Resource Allocation, AOJRA) and the conventional offloading method; the conventional method is an offloading method based on a greedy strategy (GOJRA).
In Fig. 4, over 3000 training cycles, the system benefit of the AOJRA method starts at around 0.8, rises rapidly as training continues, and stabilizes at around 7 after roughly 2000 training cycles. According to the definition of the system benefit function in step 3-1, a benefit value of 7 means that the total delay-and-energy benefit relative to the SORA method is 7. Since the embodiment has 20 users, the total benefit averaged over the users is 0.35, which means that, compared with the SORA method, the AOJRA method of the present invention improves the delay and energy-consumption performance of each user by 35% on average. A similar analysis shows that the GOJRA method of Fig. 5 improves the delay and energy-consumption performance of each user by 29% on average over the SORA method.
Compared with the traditional GOJRA method, the AOJRA method improves the users' delay and energy-consumption performance further: it considers not only the influence of the channel gain during task offloading but also the mutual interference among users transmitting data at the same time, and it can effectively avoid the network congestion that arises when a large number of users select the same base station for data transmission in the same period, as well as the increases in delay and energy consumption caused by a shortage of base station computing resources.
In conclusion, the method reduces the delay and energy consumption of system task offloading while taking into account the data characteristics of wireless body area networks and the mobility characteristics of users. The invention improves the ability of wireless body area networks to serve human life more promptly, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Claims (8)

1. A task offloading method based on the A3C algorithm in a multi-wireless body area network environment, characterized by comprising the following steps:
step 1, constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task unloading by using an A3C algorithm to obtain a decision network;
step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
2. The method of claim 1, wherein the network parameters of the network architecture of the plurality of wireless body area networks of step 1 include the user set D, the base station set S, the RGMM mobility model parameters of each user, the base station locations l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task categories β_d ∈ {0, 1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
3. The method for task offloading based on A3C algorithm in a multi-wireless body area network environment according to claim 1 or 2, wherein the step 2 of training the classifier to obtain the user task classifier comprises the following specific steps:
Step 2-1, estimating a stationary interval for each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n;
Step 2-2, adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task;
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
4. The method for task offloading based on A3C algorithm in a multi-radio body area network environment according to claim 3, wherein the step 3 trains a resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network, and the specific process includes:
Step 3-1, converting the resource allocation problem into a Markov decision problem. The Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
Step 3-2, training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); the advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
5. The method for task offloading based on the A3C algorithm in a multi-wireless body area network environment according to claim 4, wherein the step 4 of task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network comprises: at each moment, classifying tasks with the trained task classifier, feeding the state of the multi-body-area-network system into the decision network according to the classification result, and outputting from the network the user channel access (i.e. which base station each user accesses) and the base station computing resource allocation.
6. A task offloading system based on A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of the user and training a classifier according to the data to obtain a task classifier;
the decision network generation module is used for training the resource allocation problem during task unloading by utilizing an A3C algorithm to obtain a decision network;
and the task unloading module is used for unloading the tasks of the multi-wireless body area network according to the obtained task classifier and the decision network.
7. The system of claim 6, wherein the task classifier generation module comprises:
A stationary interval setting unit, for estimating the stationary interval of each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n;
A task labeling unit, for adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task;
A classifier training unit, for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
8. The system for task offloading based on the A3C algorithm in a multi-wireless body area network environment of claim 7, wherein the decision network generation module comprises:
A decision network construction unit, configured to convert the resource allocation problem into a Markov decision problem, where the Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
A decision network training unit, for training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); the advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
CN202010221507.5A 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment Active CN111465032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Publications (2)

Publication Number Publication Date
CN111465032A true CN111465032A (en) 2020-07-28
CN111465032B CN111465032B (en) 2023-04-21

Family

ID=71680230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221507.5A Active CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Country Status (1)

Country Link
CN (1) CN111465032B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241295A (en) * 2020-10-28 2021-01-19 深圳供电局有限公司 Cloud edge cooperative computing unloading method and system based on deep reinforcement learning
CN113645637A (en) * 2021-07-12 2021-11-12 中山大学 Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219101A (en) * 2018-09-21 2019-01-15 南京理工大学 Method for routing foundation based on Double moving average predicted method in wireless body area network
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219101A (en) * 2018-09-21 2019-01-15 南京理工大学 Method for routing foundation based on Double moving average predicted method in wireless body area network
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241295A (en) * 2020-10-28 2021-01-19 深圳供电局有限公司 Cloud edge cooperative computing unloading method and system based on deep reinforcement learning
CN113645637A (en) * 2021-07-12 2021-11-12 中山大学 Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111465032B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
Yu et al. Computation offloading for mobile edge computing: A deep learning approach
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN109302463B (en) Self-organizing cloud architecture and optimization method and system for edge computing
CN112995913B (en) Unmanned aerial vehicle track, user association and resource allocation joint optimization method
CN110928654B (en) Distributed online task unloading scheduling method in edge computing system
CN108924936B (en) Resource allocation method of unmanned aerial vehicle-assisted wireless charging edge computing network
CN107682443A (en) Joint considers the efficient discharging method of the mobile edge calculations system-computed task of delay and energy expenditure
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
CN113286329B (en) Communication and computing resource joint optimization method based on mobile edge computing
CN112835715B (en) Method and device for determining task unloading strategy of unmanned aerial vehicle based on reinforcement learning
CN114065963A (en) Computing task unloading method based on deep reinforcement learning in power Internet of things
CN111465032A (en) Task unloading method and system based on A3C algorithm in multi-wireless body area network environment
Zhou et al. Computation bits maximization in UAV-assisted MEC networks with fairness constraint
CN113286317B (en) Task scheduling method based on wireless energy supply edge network
CN112988285B (en) Task unloading method and device, electronic equipment and storage medium
Muslim et al. Reinforcement learning based offloading framework for computation service in the edge cloud and core cloud
CN106793031A (en) Based on the smart mobile phone energy consumption optimization method for gathering competing excellent algorithm
CN111026548A (en) Power communication equipment test resource scheduling method for reverse deep reinforcement learning
CN114567895A (en) Method for realizing intelligent cooperation strategy of MEC server cluster
WO2022242468A1 (en) Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium
CN115473896A (en) Electric power internet of things unloading strategy and resource configuration optimization method based on DQN algorithm
Chen et al. Augmented deep reinforcement learning for online energy minimization of wireless powered mobile edge computing
Bouzidi et al. HADAS: Hardware-aware dynamic neural architecture search for edge performance scaling
CN113821346B (en) Edge computing unloading and resource management method based on deep reinforcement learning
Chen et al. Traffic prediction-assisted federated deep reinforcement learning for service migration in digital twins-enabled MEC networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant